Facebook Data Mining

As the Facebook IPO was approaching, I couldn’t stop thinking how much are they really worth.

Technologies: #GraphAPI #OLAP #Gephi #SNA #Tableau #Linux #JSON #PHP #MySQL #R


To answer this question, I concluded I need two variables:

  1. User activity worth
  2. User activity frequency

I understand Facebook, Inc. value not as a single number or a snapshot. It’s rather a complex process, an ever-changing cash flow. Yet, I needed to pick a place to start my estimations. The place was a marketing agency white paper I found during my research. Every couple of years Syncapse releases documents predicting social media engagement value. While not perfect, they became the basis for my estimates.

Picture above: A fragment of my Social Network visualized with Gephi Tookit. Visible community structure (who likes what).

The second part was harder. I needed to get random samples of Facebook users activity. Using SNA algorithms and computational statistics I could infer what was going on in a particular time frame in a selected area. In short, I solved this problem by writing a PHP Facebook App that would monitor user’s news feeds and their friends’ activities. Anonymized data was then siphoned to a MySQL cloud instance running an OLAP database. 52 randomly selected people agreed to take part in the research. Eleven days study and those 52 agents allowed me to gather around 90 000 records about 17 241 people.

This is how I accidently started my Master’s degree thesis. The database was later analysed using Gephi SNA Toolkit, Tableau and a bit of R programming.

So how much is Facebook really worth? You can see results in the presentation below (in Polish). If you have any questions about this project, feel free to contact me using social buttons on the left.

Click below to move the slides forward.

2018 Update

It appears that I used the same method as Cambridge Analytica to mine the Facebook data. CA is a notorious company mentioned recently by Mark Zuckerberg in his statement.

In 2013, a Cambridge University researcher named Aleksandr Kogan created a personality quiz app. It was installed by around 300,000 people who shared their data as well as some of their friends’ data. Given the way our platform worked at the time this meant Kogan was able to access tens of millions of their friends’ data.

The first difference is that I did that a year before them. The other differences are that I asked people for consent and did not pretend to give a ‘personality quizz’. I anonymized the mined data, and the whole situation convinced me rather to delete my own Facebook account than to make business out of systematic privacy violations.