
Big data and privacy on the Web: how should human research be conducted on the Internet?

“They said, ‘You can’t mess with my emotions. It’s like messing with me. It’s mind control.’” That’s what Cornell communication and information science professor Jeffrey T. Hancock reported in a recent New York Times article about the public outcry over the now-infamous Facebook emotional manipulation study (read on for details). Hancock was surprised and dismayed by the response. He sees the advent of massive-scale sociology and psychology research on the Internet as a “new era,” and he has a point. The days of relying mostly on college students as research subjects may be coming to an end. But how should research be conducted in this new online setting? Is it even appropriate to use data from websites as it is collected now, with little, if any, user knowledge, and with informed consent existing only in the form of privacy policies that nobody reads?1 In this post I argue that the Internet is not the Wild West, and that Internet-based research should therefore not be allowed to sidestep established practices of informed consent. Furthermore, significant changes must be made so that these new research opportunities are realized in the best way possible.


Earlier this year Facebook, the social network with well over 1 billion users, found itself in hot water after publishing a study, conducted in collaboration with academic researchers (including Hancock, above), that sought to measure “emotional contagion” online.2
In January 2012, Facebook researchers altered the News Feed content of nearly 700,000 users without their knowledge or explicit consent. Users in a control group had random posts withheld from their News Feeds, irrespective of emotional content, while others had either some posts with a positive valence or some with a more negative tone hidden. The researchers found that, overall, omitting News Feed content – whether negative or positive – seemed to affect the emotional valence of users’ subsequent posts, and that users who saw fewer emotion-tinged posts went on to post fewer updates of their own. In effect, Facebook researchers were able to manipulate the emotions of hundreds of thousands of users simply by altering their News Feed content. They could study a phenomenon – emotional contagion – that is very difficult to study in person because of a variety of potential experimental confounds, and they achieved extraordinary statistical power in the process. But by not explicitly asking for consent, they left many users feeling, well, used.
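The filtering described above depends on automatically labeling each post’s emotional valence; the published study did this by counting emotion words with the LIWC software. As a rough illustration of that word-counting approach – with tiny, made-up word lists standing in for LIWC’s actual dictionaries, and function names of my own invention – the logic looks something like this:

```python
# Toy valence classifier in the spirit of LIWC-style word counting.
# The word sets below are illustrative stand-ins, not LIWC's real dictionaries.
POSITIVE = {"happy", "love", "great", "wonderful", "excited"}
NEGATIVE = {"sad", "angry", "terrible", "awful", "lonely"}

def classify_post(text: str) -> str:
    """Label a post 'positive', 'negative', or 'neutral' by counting emotion words."""
    words = text.lower().split()
    pos = sum(w.strip(".,!?") in POSITIVE for w in words)
    neg = sum(w.strip(".,!?") in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

def filter_feed(posts, withhold: str):
    """Sketch of an experimental condition: hide posts of one valence."""
    return [p for p in posts if classify_post(p) != withhold]
```

In the actual experiment, posts in the targeted valence category were withheld probabilistically rather than uniformly, but the core idea – classify by word counts, then filter one emotional category from the feed – is the same.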

In the paper, the authors state that would-be subjects gave informed consent for this research when creating an account. But this may not be true: one report alleges that the word “research” was only added to the site’s privacy policy months after the study was conducted. In any case, my own consent would have been given in the summer of 2005, nearly a decade ago. Even if I did agree to participate in research at that moment, I certainly don’t remember it, and I could not have had any idea what would be possible on the Internet of 2014. Would you provide your consent today for a study of your Internet behavior in January of 2023? The journal that published the study, Proceedings of the National Academy of Sciences, updated the article with an attached “Editorial Expression of Concern,” in which Editor-in-Chief Inder M. Verma noted that, as a private company, Facebook is not beholden to the US Department of Health and Human Services’ “Common Rule,” which guides best practices in human research and advises that, in addition to documenting informed consent, researchers should give subjects the ability to opt out of a trial. This, however, is where journals need to step in and enforce ethical standards regardless of the research setting.

In another example, Christian Rudder, a co-founder of the free online dating site OkCupid, drew fire recently for revealing the details of an internal study. The site told some users who appeared to be poorly matched that they were in fact well suited, and vice versa. His reaction to the uproar – which may be indicative of the current gulf between researchers at traditional brick-and-mortar institutions and Silicon Valley entrepreneurs (who may, pardon the pun, have their heads in the cloud) – was, effectively, “what’s the big deal?”

The prevailing business model of Web-based companies – offering services for free in exchange for information they hope to monetize – is fundamentally at odds with the ethos of human research. Setting aside (for the moment) the issue of informed consent, researchers should first and foremost avoid harming study participants physically, emotionally, or otherwise, and should endeavor to protect their subjects’ privacy rather than feed off of it. There does, however, seem to be an important difference between studying data that is passively collected from users – who should expect that they are forfeiting some amount of privacy in exchange for a free service like Facebook – and actively manipulating the information that users see and observing their responses.


These studies are not limited to social networks and dating sites, however, and are quite relevant to neuroscience. Lumosity, perhaps the largest brain-training website with more than 50 million users around the world, has embarked on the Human Cognition Project. Preliminary results from this project were published in a proof-of-concept study with Ns well over 100,000.3 However, Lumosity (which I have written about previously on this blog) is very up-front about the Project, and seems to be tapping into a widespread desire among the public to get involved in neuroscience and cognitive science research.

Academic researchers are taking advantage of crowdsourcing in neuroscience as well. MIT computational neuroscientist Sebastian Seung found a solution to the image-analysis bottleneck that was slowing down his group’s research: he developed Eyewire, which, in the project’s own words, is “a game to map the brain…Anyone can play and you need no scientific background. Over 130,000 people from 145 countries already do. Together we are mapping the 3D structure of neurons; advancing our quest to understand ourselves.” This work has already produced a milestone paper in Nature.4

The landscape of Internet research is rapidly changing and, while it certainly presents a number of ethical challenges in terms of privacy and informed consent, there is vast potential for breakthrough studies – of a kind that have never been possible before – to answer important questions. The first question, though, is how to create a safe, effective environment for research in a setting that is increasingly seen as insecure and where personal information is the most common currency. A necessary first step is to give would-be subjects the choice to opt out of a study. In addition, Internet privacy expert Helen Nissenbaum argues that we need a fundamental shift in the norms for conceptualizing privacy online. It is not reasonable to expect users to read every privacy policy for every website they visit and, even if they could, it is unlikely they could predict all the potential implications of those often nebulous terms.5 Moreover, some sites (such as Facebook) have become so widely used and so necessary for modern communication, with no comparable competitors, that it could be argued that users do not really have the choice to take their business elsewhere, and are therefore implicitly coerced into participating.

The Internet offers incredible research opportunities, particularly for understanding human behavior. But just because participants never actually step into the lab does not mean they don’t deserve the same protections as those who do. Besides, the literature is filled with clever, elegant experiments designed to keep participants from knowing the goals and hypotheses of a study – there is no reason that couldn’t also be done online. A major challenge facing academia is how best to partner with private companies that may be primarily interested in improving website functionality and, in doing so, collect large datasets of interest to researchers. Academics must also address whether it is ethical to use data collected by private companies – without real informed consent or the choice to opt out – to answer academic questions. Internet connectedness has also opened up new opportunities for “citizen scientists” to help expedite progress in neuroscience. Now is the time to codify ethical standards for these studies to ensure that growth in this area continues in the best way possible.


1. Kelley, P. G., Bresee, J., Cranor, L. F. & Reeder, R. W. in SOUPS ’09: Proceedings of the 5th Symposium on Usable Privacy and Security (ACM, 2009).

2. Kramer, A. D., Guillory, J. E. & Hancock, J. T. Experimental evidence of massive-scale emotional contagion through social networks. Proceedings of the National Academy of Sciences of the United States of America 111, 8788-8790, doi:10.1073/pnas.1320040111 (2014).

3. Sternberg, D. A. et al. The largest human cognitive performance dataset reveals insights into the effects of lifestyle factors and aging. Front Hum Neurosci 7 (2013).

4. Kim, J. S. et al. Space-time wiring specificity supports direction selectivity in the retina. Nature 509, 331-336, doi:10.1038/nature13240 (2014).

5. Nissenbaum, H. A Contextual Approach to Privacy Online. Daedalus 140, 32-48 (2011).

Want to cite this post?

Purcell, R. (2014). Big data and privacy on the Web: how should human research be conducted on the Internet? The Neuroethics Blog.

