Cambridge Analytica isn’t the only problem, but there is an answer

Apr 9, 2018

About two years ago, I was invited to participate in an event on the issues of research in problematic environments. I was joined by people who were scholars in anthropology and other social sciences. Many of them had experience working in extreme areas such as disaster relief, famine relief, and even the Ebola crisis. I discussed issues of data collection online.

The results of the event were all published for free by the University of Oxford, and my piece, titled “Big data and anthropology: Concerns for data collection in a new research context”, can be found online for free (this is the link to the paper). In the paper, I address a number of issues related to data security and research ethics. Keeping in mind that this paper was presented in 2015 and published in 2016, before the election was even finished and years before the Cambridge Analytica scandal, I want to bring to your attention one passage from page 78 (page 5 of the PDF from the link above):

“Accidental data collection
A second issue with data protection now arises. Specifically, when I grant access for an outside party to gather my data, by implication it also allows them to collect information about other individuals (i.e. my friends). This is the case even though there was no informed consent on the part of any other person besides myself. Given how many friends an individual is likely to have on a social network, what results is that informed consent has not been obtained for most of the ‘participants’ who have now become part of a study.”

This sort of accidental data collection has been addressed by Facebook in the past, as well as recently. However, the issue is still a very real part of what qualifies as “archival” research in most research ethics manuals. The fact is that if you put the information online, researchers, corporations, and governments have free, unfettered, and unvetted access to that information. This is not just an issue of Cambridge Analytica, either. This is an issue of ethics in the internet age.

Personally, my opinion is that if you don’t want it on the internet, don’t put it on the internet. The issue, however, is that it’s impossible to take something down from the internet after it’s been distributed or saved onto someone else’s server, unless you have direct access to it. Furthermore, it’s hard to imagine a mechanism for deleting previous content that you might no longer want, whether because of a change of life status (pictures of you and an ex put on social media before they became your ex) or just because you grew up (pictures with red solo cups come to mind), without there also being a mechanism to track everything you’ve done on the internet.

Overly simplified meme that misses the point. Still interesting though.

At the moment, there are still tons of organisations that can access our data. I believe that the reason we are so angry with Facebook over Cambridge Analytica is that we allowed Facebook to have our data, but not with the idea that it would be used to get Trump elected. The fact is that this is not what happened. The media has been horrible at reporting on Cambridge Analytica; Chris Kavanagh did a great piece on this. This isn’t the first time, though: similar companies have obtained our social media data to influence elections before. After all, this was heralded as a great move on the part of President Obama, whose data team went on to found BlueLabs, the liberal version of Cambridge Analytica. Yet, for some reason, when the NSA dragnet collects your data, or when Obama’s team uses data to influence elections, it is acceptable in the end. But when data collection is done for the Trump campaign, it’s an international conspiracy that gets a serious spotlight.

In the end, though, very few people seem to realise that you can use web scraping algorithms to access Facebook data even if you don’t have an “app” that accesses the data. So it doesn’t matter whether you are happy with Democrats or Republicans scraping your data, depending on your own political beliefs. The fact is that any Python coder worth half a sniff can put together a web spider that can comb through social networks on Facebook in an evening using the modules beautifulsoup, regex, and urllib… how do you think I did it? Heck, you can even use NetworkX and Gephi to make beautiful visualisations if you’re so inclined.

My Facebook network visualised in Gephi (laid out using the Force Atlas 2 algorithm; clusters coloured using Louvain modularity assignment)
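To give a rough sense of how little code the link-extraction step of such a spider needs, here is a minimal sketch. It is not the script I used: it swaps in Python’s built-in html.parser and re modules for beautifulsoup so it runs without installing anything, it makes no network requests, and the pages, the /profile/ URL pattern, and every function name in it are made-up assumptions for illustration only. It also shows the accidental data collection problem from the passage quoted earlier: people turn up in the data even though their own pages were never visited.

```python
from html.parser import HTMLParser
import re

# Hypothetical stand-in for downloaded pages: profile id -> HTML of that
# profile's friend list. Nothing here is fetched from a real site.
SAMPLE_PAGES = {
    "alice": '<ul><li><a href="/profile/bob">Bob</a></li>'
             '<li><a href="/profile/carol">Carol</a></li>'
             '<li><a href="/about">About</a></li></ul>',
    "bob": '<ul><li><a href="/profile/alice">Alice</a></li>'
           '<li><a href="/profile/dave">Dave</a></li></ul>',
}

# Assumed URL shape for profile links; a real site would differ.
PROFILE_LINK = re.compile(r"^/profile/([a-z0-9_]+)$")


class FriendLinkParser(HTMLParser):
    """Collects profile ids from <a href="/profile/..."> tags."""

    def __init__(self):
        super().__init__()
        self.friends = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        match = PROFILE_LINK.match(href)
        if match:
            self.friends.append(match.group(1))


def build_friend_graph(pages):
    """Return {profile: set of friends} for every page we were given."""
    graph = {}
    for profile, html in pages.items():
        parser = FriendLinkParser()
        parser.feed(html)
        graph[profile] = set(parser.friends)
    return graph


def uncrawled_people(graph):
    """People who appear in the data although their own page was never read."""
    seen = set().union(*graph.values()) if graph else set()
    return seen - set(graph)


graph = build_friend_graph(SAMPLE_PAGES)
# Undirected, de-duplicated edge list, the kind of thing a graph tool ingests.
edges = sorted({tuple(sorted((a, b))) for a, friends in graph.items() for b in friends})
print(edges)
print(sorted(uncrawled_people(graph)))
```

In this toy data only two pages are read, yet four people end up in the edge list. An edge list like this is what you would then hand to a tool such as NetworkX or Gephi for layout and clustering.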

This fact demonstrates how much bigger the problem is than Cambridge Analytica. In the paper I linked to above, I discuss how data privacy is a huge issue for academic researchers, and how in many cases research institutions like universities are simply not ready to deal with the ethical concerns raised by big data. For example, in my own research at the University of Oxford, the protocols for reviewing the ethical implications of research are the same whether you’re doing surveys online or asking people to allow you to harvest data from social media. In the case of information that is publicly available (such as Facebook posts you mark as “public”), the research counts as archival, so in many universities it isn’t even required that one submit for ethics approval. Yet millions of people’s data, the very data in question regarding Cambridge Analytica, was gathered by a researcher who abided by ethical protocols up until the point that the data was sold.

Logo for Minds.com, an open-source alternative to Facebook

Some people have given up on privacy, stating that social networking sites are a must in today’s world and it’s the price we pay for their “service”. This may be the case, but Facebook and Twitter are not the only social networking apps.

For example, if you want a social network that doesn’t collect data like Facebook, look at minds.com. It was created by fellow UVM alum Bill Ottman and friends and uses open source technology, so you even know how the algorithms behind their “news feed” work. It also pays its users. No, I’m not kidding: for years now, it has paid users for generating content, and it now uses technology similar to bitcoin to pay its users. Learn more here. It’s a well-known social networking site because of its openness to all political ideas and types of people and its adamant protection of user data. This has made it popular among data protection hawks and even Anonymous. Its founders even claim that it has been censored in Google searches because it threatens Google+. See the interview with their CEO and founder below:

If you want to send quick messages to friends, like on Twitter, but don’t want them to be public, as all tweets are, use an app like FireChat, which is available for Apple and Android. Remember, Facebook paid $22 billion for WhatsApp despite it turning only a few million in profit. Let’s not pretend they aren’t using that data; just look into their terms of service, and it’s clear that they are. In many ways, this shows just how valuable your data is.

I don’t think there is anything we can do to immediately fix this issue. Regulation is not the answer and I suspect that Zuckerberg’s push to regulate social networking sites is really just a corporate move to ensure market share domination in the years to come as the platform begins to slip and new generations fail to catch on to Facebook the way Millennials did.

I do, however, think there is something we can do in the future. Work with organisations like minds.com to develop new and more open social networking tools, or start your own organisation like minds.com. Create competition. Learn to code, or find someone who can help you to code your new big idea. There is a problem with data protection in social networking; the answer, though, has not yet been invented. I have faith in humanity and ingenuity that it will be found, but we have to work for it.


Written by Justin Lane