Tuesday, November 22, 2016

Debating the Replication Crisis - Why Neuroethics Needs to Pay Attention

By Ben Wills

Ben Wills studied Cognitive Science at Vassar College, where his thesis examined cognitive neuroscience research on the self. He is currently a legal assistant at a Portland, Oregon law firm, where he continues to hone his interests at the intersections of brain, law, and society.

In 2010 Dana Carney, Amy Cuddy, and Andy Yap published a study reporting that assuming an expansive posture, or “power pose,” increases testosterone levels, risk tolerance, and feelings of power. The popular media and the public swooned at the idea that something as simple as standing like Wonder Woman could boost performance and confidence. A 2012 TED talk that co-author Amy Cuddy gave on her research has become the site’s second-most-watched video, with over 37 million views. Over the past year and change, however, the power pose effect has gradually fallen out of favor in experimental psychology. A 2015 replication by Ranehill et al., using a much larger sample, found that power posing affected only self-reported feelings of power, not hormone levels or risk tolerance. This past September, reflecting mounting evidence that power pose effects are overblown, co-author Dana Carney disavowed the construct, stating, “I do not believe that ‘power pose’ effects are real.”

What happened?

As the power pose saga illustrates, famous findings and the research practices that produce them are increasingly being called into question. Researchers are discovering that many attempts to replicate published results produce much smaller effects than the original studies, or no effects at all. Scientists have worried about this issue for some time, but as the publicity surrounding the rise and fall of the power pose shows, discussion of this “replication crisis” has unquestionably spilled over from scientists’ listservs into popular culture.

Though replicability issues pervade many areas of experimental science, cognitive neuroscience and psychology are particularly susceptible. One main reason is the large number, and large impact, of the choices researchers in these fields make about methodology, data collection, and analysis (collectively known as “researcher degrees of freedom”). The consequences of shoddy science in psychology are outsized as well. Perhaps more than most experimental disciplines, cognitive neuroscience and psychology directly shape popular culture, influencing how people interact and how they think of themselves. Phrenology, strict behaviorism, and the pathologizing of queerness are obsolete psychological doctrines that caused extensive harm before being roundly discredited. The shaky foundation of the power pose effect may be benign compared to the utter nonsense of phrenology, but both illustrate a clear need to make sure that psychological results are true and valid.

Enter neuroethics. From debates on the ethics of cognitive enhancement to guidelines for the ethical use of neuroimaging research in the courtroom, neuroethics sits squarely at the intersection of society and the mind and brain sciences. A comprehensive neuroethics must consider not only society’s engagement with technology and scientific results, but also the research process that produces those results. After all, a policy recommendation or ethical analysis is only as valuable as the data on which it is based. Consequently, neuroethics is obliged to keep an eye on the theories, methods, and findings of the mind and brain sciences. The replication crisis is a problem not only for psychology and society, but for neuroethics as well.

Though many regard this “replication crisis” as a major issue in the field, there is little consensus on just how big a problem it is or on how best to respond. These questions were the focus of a public debate hosted by the NYU Center for Mind, Brain, and Consciousness on Thursday, September 29, between Brian Nosek, a psychologist at the University of Virginia and the director of the Center for Open Science, and Jason Mitchell, a cognitive neuroscientist at Harvard. The debate was titled “Do Replication Projects Cast Doubt On Many Published Studies in Psychology?” However, because their goal was to home in on their starkest differences of opinion, the debaters focused on the process of replication rather than on quantifying how much psychological research fails to reproduce.

Brian Nosek, taking the affirmative position, presented first. He began by defining a replication attempt as a study identical to the original except for the order in which the two were run. Which study comes first, Nosek argued, is irrelevant for evaluating the results. At the same time, he acknowledged that no replication is truly exact, because of differences in time, place, sample, and other variables.

Nosek also expressed concern that the field, in judging a study’s value, often overemphasizes statistical significance, in particular the conventional p < .05 threshold that traditional statistics uses to declare an effect significant. While traditional statistics and p-values have their place, he argued, an overemphasis on statistical significance is at odds with best scientific practice. For one, just as there can be many reasons why a study fails to reach significance, there can be many reasons other than a true effect why an experiment yields a statistically significant result (for more on this, Nosek suggested reading Greenwald 1975, a piece that presages much of the current debate). The ways of falsely concluding that there is an effect are therefore more numerous than they might seem. This issue is exacerbated by scientific journals’ well-known bias toward flashy, statistically significant results, which gives researchers an incentive to take liberties with data collection and analysis in ways that make statistically significant results more likely.
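To make the researcher-degrees-of-freedom point concrete, here is a minimal Python sketch (an illustration, not an analysis presented at the debate or by Greenwald). It simulates hypothetical studies in which there is no true effect at all, but in which the researcher measures several outcomes and reports the study as a success if any one of them reaches p < .05. The function names, the 20 participants per group, and the assumptions that outcomes are independent and normally distributed with a known standard deviation are all simplifications chosen for illustration.

```python
import math
import random


def z_test_p(a, b, sigma=1.0):
    """Two-sided p-value for a difference in means, assuming a known SD (z-test)."""
    diff = sum(a) / len(a) - sum(b) / len(b)
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = diff / se
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|))
    return math.erfc(abs(z) / math.sqrt(2))


def study_is_significant(n_outcomes, n_per_group, rng, alpha=0.05):
    """One hypothetical study with NO true effect: test several independent
    outcome measures and call the study 'significant' if any one has p < alpha."""
    for _ in range(n_outcomes):
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if z_test_p(treated, control) < alpha:
            return True
    return False


def false_positive_rate(n_outcomes, n_studies=4000, n_per_group=20, seed=1):
    """Fraction of simulated null studies that end up 'significant'."""
    rng = random.Random(seed)
    hits = sum(
        study_is_significant(n_outcomes, n_per_group, rng) for _ in range(n_studies)
    )
    return hits / n_studies


for k in (1, 3, 5):
    simulated = false_positive_rate(k)
    analytic = 1 - 0.95 ** k  # exact rate for k independent tests at alpha = 0.05
    print(f"{k} outcome(s): simulated {simulated:.3f}, analytic {analytic:.3f}")
```

With one outcome, the simulated false-positive rate should hover near the nominal 5%; with k independent outcomes it should approach 1 - 0.95^k (about 23% for five), even though nothing real is being measured. Real outcome measures are usually correlated, which shrinks the inflation somewhat, but the direction is the same.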

Beyond encouraging questionable research practices, Nosek argued, overemphasizing statistical significance makes researchers prone to gloss over results that, while not significant, are nevertheless informative. Even replication “failures” are not total losses; they can contribute valuable information about the robustness of an effect.

Overall, Nosek pushed for a more inclusive understanding of what makes results in psychological research valuable, while making a strong case that journals’ publication bias and the misuse of researcher degrees of freedom, among other causes, have led to a startling number of questionable findings in psychological science. Given the somewhat dire results of replication projects so far (the original Reproducibility Project: Psychology, reported by the Open Science Collaboration in 2015, found that about 60% of studies failed to replicate, and a sneak peek Nosek offered of data from the sister Many Labs 2 project was not much different), Nosek argued that widespread replication is essential for the health of the field.

In presenting the opposing position, Jason Mitchell did not object to replication itself (indeed, he stated that it is very important, especially within labs) so much as to the methods of the Reproducibility Project: Psychology, which Nosek led. In seeking to reproduce findings, Mitchell stated, replication efforts overemphasize reproducing the minutiae of the procedure (direct replication) rather than capturing the essence of the original study (conceptual replication). The question psychologists seek to answer is rarely simply “what is the effect of stimulus x on mental state or behavior y?” but rather “what changes in mental state or behavior y are brought about by mental causes that are in turn effects of stimulus x?” In one example he gave, suppose you are studying the effect of mood on people’s tendency to socialize, with a happy mood elicited by listening to the Beach Boys and a sad mood elicited by listening to Adele. What you are fundamentally interested in is not the effect of Adele or the Beach Boys on socialization per se; you care about the songs only to the extent that they cause subjects to experience a certain mood. Mitchell argued that direct replication studies, by prioritizing the similarity of individual labs’ attempts and their faithfulness to the original procedure, are likely missing the forest for the trees.

Touching on the nominal topic of the debate, Mitchell also made the important point that some effects in psychology, though “real,” are much harder to elicit than others. That cognitive dissonance is difficult to elicit in no way means it is not a real effect. Funding, knowledge, expertise with the paradigm, and other resources all influence how likely researchers are to find a real effect. It follows, he argued, that it is difficult to get an accurate idea of the overall rate of replicability in the mind sciences, and that results from the Reproducibility Project and similar efforts cannot simply be extrapolated to the field at large.
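Mitchell’s point has a simple statistical counterpart: a real effect that comes out weaker in a given lab (because of less expertise with the paradigm, say, or a less effective manipulation) is detected less often at the same sample size. The sketch below computes the approximate statistical power of a two-group comparison for a few standardized effect sizes (Cohen’s d). It is an illustration rather than a calculation from the debate; the two-sided z-test approximation, the 50 participants per group, and the particular effect sizes are assumptions chosen for illustration.

```python
from statistics import NormalDist


def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test when the true
    standardized mean difference is d and each group has n_per_group subjects."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    shift = d * (n_per_group / 2) ** 0.5  # expected z-statistic if the effect is real
    # Probability that the observed z lands beyond either critical value
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


for d in (0.8, 0.5, 0.2):
    print(f"d = {d}: power with 50 per group = {power_two_sample(d, 50):.2f}")
```

Under these assumptions, an entirely real effect of d = 0.2 studied with 50 people per group would reach significance only about one time in six, so a single failed replication of such an effect says little on its own about whether the effect exists.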

As the hosts, David Chalmers and Ned Block, noted, there was little “bloodshed”: throughout the discussion, it was clear that the researchers respect each other and are committed to doing good science. They agreed on many fundamental issues, including that the mind and brain sciences are not without problems and that replication is a valuable tool. Further discussion would have helped illuminate their differing views on direct versus conceptual replication and on whether an overall rate of replicability in cognitive neuroscience and psychology can plausibly be estimated.

If this debate was any indication, neuroethics as a field may find some comfort in the general direction of the mind and brain sciences. Most researchers, both on stage and in the audience, seemed to take replication seriously, and the field is showing a clear push to self-correct. In the meantime, when it comes to psychological research, neuroethics may trust, but it must verify.


Carney, D. R., Cuddy, A. J. C., & Yap, A. J. (2010). Power poses: Brief nonverbal displays cause neuroendocrine change and increase risk tolerance. Psychological Science, 21, 1363-1368.

Farah, M. J. (2015). The unknowns of cognitive enhancement. Science, 350, 379-380. DOI: 10.1126/science.aad5893.

Greenwald, A. G. (1975). Consequences of prejudice against the null hypothesis. Psychological Bulletin, 82, 1-20.

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. DOI: 10.1126/science.aac4716.

Ranehill, E., Dreber, A., Johannesson, M., Leiberg, S., Sul, S., & Weber, R. A. (2015). Assessing the robustness of power posing: No effect on hormones and risk tolerance in a large sample of men and women. Psychological Science, 26(5), 653-656.

Want to cite this post?

Wills, B. (2016). Debating the Replication Crisis - Why Neuroethics Needs to Pay Attention. The Neuroethics Blog. Retrieved on , from http://www.theneuroethicsblog.com/2016/11/wills-title-pending.html


James Burkett said...

I'm quite surprised that there was no mention of Dan Gilbert's technical comment on the Reproducibility Project, which essentially proved that their results were due to overly stringent statistical methods and overly lax definitions of what counts as a "replication."

While an increased emphasis on rigor is always welcome, there is no reproducibility crisis.

Bob Mekowsky said...

What is strict behaviorism? Do you mean John Watson?