
Monday, 25 April 2022

Scraping Data off an Online Networked Learning Community

 

Gilly Salmon's five-stage model of online learning


On a recent Agile day at my workplace, my big boss talked about the importance of networks. It got me thinking harder about my past explorations of social network graphs and my failed attempts at scraping data off SgLDC (a private Facebook group managed by ETD). The SgLDC group has grown from fewer than 1,000 members when I first joined to about 21k members now (considering the MOE teacher population of about 33k, that number is pretty impressive).


from sketchplanations 

I was also recently introduced to Metcalfe's law, which states that the value of a network is proportional to the square of the number of its users. This is an important concept for the scaling-up efforts and processes of the prototypes that I am testing for Classroom of the Future. By that measure, the notional value of SgLDC is well beyond a million now: 21k users squared comes to about 441 million (although many members could still be considered passive users).
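To make the arithmetic concrete, here is a minimal Python sketch of Metcalfe's law. The function name `metcalfe_value` and the constant `k` are my own placeholders (the law only says "proportional to", so the absolute number has no real unit), and the member counts are the rough figures mentioned above.

```python
def metcalfe_value(users: int, k: float = 1.0) -> float:
    """Notional network value under Metcalfe's law: k * n^2.

    k is an arbitrary proportionality constant (assumed to be 1 here),
    so only ratios between two values are meaningful.
    """
    if users < 0:
        raise ValueError("users must be non-negative")
    return k * users ** 2


# Rough membership figures from the post (about 1,000 then, about 21,000 now).
value_then = metcalfe_value(1_000)
value_now = metcalfe_value(21_000)
growth = value_now / value_then

print(f"Notional value now: {value_now:,.0f}")
print(f"Growth in notional value: {growth:.0f}x")
```

The more useful number is probably the ratio: a 21-fold growth in members corresponds to a 441-fold growth in notional value, whatever `k` is.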



Thus, I explored various tools to see if I could extract data from Facebook groups or pages that I have access to. A colleague's previous attempt using RPA did not work so well, so I thought I could try with my newfound knowledge of TagUI. However, it was to no avail (or perhaps I approached it wrongly). Searching the Chrome Web Store, I found a few extensions that seemed to work, and one of them even helped to scroll down the FB page to load past posts (albeit imperfectly, and it takes a pretty long time). It is called Instant Web Scraper. There are 2 other alternatives: Scraper and Web Scraper. One thing to note, however, is that some intense data cleaning may be required after the time-consuming (but automated) scraping process.

I guess it is a good time to also mention that some sites do not allow web scraping and you may be banned from them if detected, although I think these extensions are not so easily detectable. Hence, please only scrape sites that do not have explicit rules against scraping and, where relevant, use the data with consideration for the privacy of the various users or stakeholders.
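For a sense of what that cleaning can look like, here is a rough Python sketch. It assumes the scraped export has been loaded as a list of dictionaries with `text`, `likes` and `comments` fields. Those field names, and the sample rows, are made up for illustration; the real columns depend on the extension and on the page layout.

```python
import re
from typing import Optional

# A number, optionally followed directly by a K/M suffix (e.g. "1.2K").
_COUNT_RE = re.compile(r"(\d+(?:\.\d+)?)([KkMm]?)\b")


def parse_count(raw: Optional[str]) -> int:
    """Turn strings such as '12', '1.2K' or '34 comments' into an int.

    Blank or unparseable values are treated as 0.
    """
    match = _COUNT_RE.search((raw or "").replace(",", ""))
    if not match:
        return 0
    number, suffix = match.groups()
    scale = {"": 1, "k": 1_000, "m": 1_000_000}[suffix.lower()]
    return int(round(float(number) * scale))


def clean_rows(rows):
    """Normalise whitespace, drop empty and duplicate posts, parse counts."""
    seen = set()
    cleaned = []
    for row in rows:
        text = " ".join((row.get("text") or "").split())
        if not text or text in seen:
            continue  # skip blanks and posts captured twice while scrolling
        seen.add(text)
        cleaned.append({
            "text": text,
            "likes": parse_count(row.get("likes")),
            "comments": parse_count(row.get("comments")),
        })
    return cleaned


# Made-up rows, only to show the kinds of mess a scrape can produce.
sample = [
    {"text": "  Sharing a lesson   idea ", "likes": "1.2K", "comments": "34 comments"},
    {"text": "Sharing a lesson idea", "likes": "1.2K", "comments": "34 comments"},
    {"text": "", "likes": "3", "comments": ""},
    {"text": "Quick question", "likes": None, "comments": "2"},
]
cleaned = clean_rows(sample)
```

Here `cleaned` keeps two posts: the duplicate and the empty row are dropped, and the counts become plain integers that can be sorted later.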

For me, I am still working on cleaning the data. So far it has been interesting to see that, over a period of a little more than half a year, there were more than 500 posts. One can then sort the posts by popularity (likes and comments) and identify patterns in what makes those posts or posters so popular and, in that way, effective at communicating messages.
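As a small sketch of that sorting step in Python, assume each cleaned post is a dictionary with `author`, `likes` and `comments` fields (hypothetical names). Popularity here is simply likes plus comments, which is my own simplification, and the sample posts are invented placeholders rather than real SgLDC data.

```python
from collections import defaultdict


def engagement(post) -> int:
    """Popularity score: likes plus comments (an assumed, unweighted measure)."""
    return post["likes"] + post["comments"]


def rank_posts(posts, top_n=10):
    """Return the top_n posts, most popular first."""
    return sorted(posts, key=engagement, reverse=True)[:top_n]


def engagement_by_author(posts):
    """Total engagement per poster, highest first."""
    totals = defaultdict(int)
    for post in posts:
        totals[post["author"]] += engagement(post)
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


# Invented placeholder posts, with anonymised poster labels.
posts = [
    {"author": "poster_a", "likes": 40, "comments": 5},
    {"author": "poster_b", "likes": 120, "comments": 30},
    {"author": "poster_a", "likes": 80, "comments": 12},
    {"author": "poster_c", "likes": 3, "comments": 0},
]

top_posts = rank_posts(posts, top_n=2)
by_author = engagement_by_author(posts)
```

Using anonymised labels such as `poster_a` instead of real names is one simple way to keep the analysis mindful of members' privacy.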
