P-Hacker Confessions: Daryl Bem and Me

Behavior & Belief

Stuart Vyse

June 13, 2017

We Didn’t Know Any Better, p < .05

Cornell University psychologist Daryl Bem and I have something in common. Yes, we are both research psychologists, but that’s not what I mean.

For me, it started when I was just a young graduate student. Statistics courses are a standard part of graduate training in psychology, because statistical methods are still the coin of the realm in psychological research. Most graduate students are required to conduct empirical research as part of their doctoral dissertations, and if they go on to academic positions, they often continue to do quantitative studies throughout their careers. Training in statistics is important because statistical techniques are how we determine whether our results mean anything. Most of my graduate school classmates hated anything that looked like math, but, to my surprise, I discovered that I liked statistics courses. I took more of them than were required, and my relatively strong background in stats was an important factor in landing an academic position. (Let that be a lesson to any psychology students who might be reading this.) In graduate school, I coached my math-phobic friends on how to enter data into the computer and analyze it, and in my academic life, I did the same with students and colleagues.

With all this background, I got to be pretty good at statistical consulting, and as a result, needy researchers often came knocking. Publishing trends are gradually changing, but even now, most studies need to report statistically significant results to have any chance of getting published. Journal editors are much less interested in studies in which nothing happened, so everyone is on a quest to achieve the vaunted p (for probability) < .05 that indicates the findings are unlikely to have happened by chance. When a friend’s research or my own appeared to have come up short, I was pretty good at salvaging something from the rubble. I might suggest altering the design of the study by combining data from previously separated groups of participants, or massaging the numbers in some way. These were techniques I’d learned at my mentors’ knees, and although we had some inkling that we were fudging the results a bit, we consoled ourselves by openly reporting the steps we’d gone through and supplying some plausible-sounding justification for each manipulation. We didn’t think we were doing anything wrong.

But now we know better. Today, the process I just described is called “p-hacking,” a pejorative term that suggests an unethical manipulation of data in search of statistical significance. We also know that admitting you manipulated your data is not like going to confession. It doesn’t wash away the sins of p-hacking, and in the modern world of research it is no longer acceptable. Most reputable journals will not accept an article based on such questionable techniques. Recent research aimed at trying to replicate previously published psychology studies has demonstrated—shockingly—that a large number of classic phenomena cannot be reproduced, and the popularity of p-hacking is thought to be one of the culprits.
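To see why this kind of flexibility matters, here is a minimal simulation sketch, not a reconstruction of any real study. It assumes a hypothetical experiment with a treatment and a control condition run in two separate cohorts, with no true effect anywhere. The "planned" analyst tests only the pooled data; the "flexible" analyst also tests each cohort alone and reports whichever p-value is smallest. The cell size, the number of simulated studies, and the use of a normal approximation in place of a full t-test are all assumptions made for illustration.

```python
import random
from statistics import NormalDist, mean, variance


def two_sided_p(a, b):
    """Approximate two-sided p-value for a difference in means.

    Uses a normal approximation to Welch's test, which is reasonable
    for the moderately large samples simulated here.
    """
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(n_studies=3000, n_per_cell=40, alpha=0.05, seed=1):
    """Return (planned, flexible) false-positive rates when no effect exists."""
    rng = random.Random(seed)
    planned_hits = 0
    flexible_hits = 0
    for _ in range(n_studies):
        # Four cells (treatment/control in cohort 1 and cohort 2),
        # all drawn from the same distribution: the null is true.
        t1, c1, t2, c2 = (
            [rng.gauss(0, 1) for _ in range(n_per_cell)] for _ in range(4)
        )
        p_pooled = two_sided_p(t1 + t2, c1 + c2)
        p_cohort1 = two_sided_p(t1, c1)
        p_cohort2 = two_sided_p(t2, c2)
        # Planned analysis: one test, chosen in advance.
        planned_hits += p_pooled < alpha
        # Flexible analysis: keep whichever of the three tests looks best.
        flexible_hits += min(p_pooled, p_cohort1, p_cohort2) < alpha
    return planned_hits / n_studies, flexible_hits / n_studies


if __name__ == "__main__":
    planned, flexible = simulate()
    print(f"planned analysis false-positive rate:  {planned:.3f}")
    print(f"flexible analysis false-positive rate: {flexible:.3f}")
```

The planned rate should land near the nominal 5 percent, while the flexible rate should come out noticeably higher, even though every individual test looks legitimate and could be reported openly.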

Textbooks will have to be rewritten in light of this “reproducibility crisis,” but there has also been a very positive outcome. In just a few years, the standards for psychology research have been ratcheted up substantially with the introduction of the “open science” movement, which urges researchers to publicly state their plans for a study before they start data collection. Methods and results are also publicly posted so that it is harder to fudge your data and other researchers can reanalyze your results if they wish. Journals and professional organizations have quickly endorsed the principles of open science, which is spreading far beyond the field of psychology. I wrote about the open science movement in my December 2016 column, “The Parable of the Power Pose and How to Reverse It.”

Daryl Bem, p-Hacking, and the Paranormal

Daryl Bem is unquestionably one of the world’s most fascinating psychologists. His career spans five decades during which he’s made substantial contributions to the field. He got a BA in physics from Reed College, but when the civil rights movement got underway in the 1960s, he changed fields and earned a PhD in social psychology at the University of Michigan. He went on to teach at Stanford, Carnegie Mellon, Harvard, and ultimately Cornell University (Bem, D. n.d.). His early contributions include self-perception theory, which offered an alternative interpretation of the phenomenon we know as cognitive dissonance, and the exotic becomes erotic theory of sexual orientation, which suggested that biological differences in sexual orientation are mediated by early childhood experiences (Bem, D. 1996).

Even Bem’s personal life, about which he has been exceedingly open, is remarkable. When he first met his future wife, Sandra, in the mid-1960s, he reportedly told her two things: that he was from Colorado and that his sexual preference was primarily homoerotic. She replied that she had never met anyone from Colorado before (Nussbaum 1998).

From the beginning, they committed to a completely egalitarian and gender non-conforming partnership, long before this kind of marriage was fashionable, and they held to these principles while raising their two children. In 1972, the couple was profiled in an article called “A Marriage of Equals” for the first issue of Ms. magazine. Sandra became an eminent feminist scholar, author of the Bem Sex Role Inventory, and director of the women’s studies program at Cornell. In the mid-1990s the couple separated amicably but remained married, and both went on to have same-sex relationships (Nussbaum 1998). Through it all, they remained close, as Sandra recounted in her memoir “An Unconventional Family” (Bem, S. 1998). In 2009, when Sandra began to show the signs of Alzheimer’s disease, she resolved to take her own life before she became too ill to be capable of the act. Her death, which was documented in a New York Times Magazine story, happened on May 20, 2014, with Daryl at her side (Henig 2015).

Is Psi Real?

The foregoing already describes a remarkable life and career, but it is unlikely Daryl Bem will be best remembered for these things. His most widely publicized work has been in extrasensory perception, where he represents the rare combination of an accomplished mainstream psychologist who is also a believer.

Like much of his life and career, Bem’s path to ESP was unusual. Beginning when he was just a young boy, Daryl was fascinated with magic shows, and as a teenager he amused his friends with magic tricks. He continued to dabble in magic and used it for demonstrations in his classes, but unlike The Amazing Randi, Banachek, or Penn & Teller, Bem’s interest in magic eventually led him to belief rather than to skepticism.

At approximately 4m 30s into this video, Bem uses a magic trick to explain the importance of adequate controls in a psychology experiment.

Bem was invited to perform some mentalism tricks at a meeting of the Parapsychological Association, where he met Charles Honorton (Tsakiris 2012). Honorton had set up a lab where he was doing mental telepathy experiments using the Ganzfeld technique, which employed sensory deprivation to sift out noise in the hope of allowing weak telepathic signals to be received. Bem was impressed with Honorton’s work, and in 1994, he and Honorton coauthored a meta-analysis of Honorton’s Ganzfeld data titled “Does Psi Exist: Replicable Evidence for an Anomalous Process of Information Transfer.” The article, which appeared in the prestigious journal Psychological Bulletin, purported to show an overall correct reception rate of 32 percent, which was statistically greater than the chance expectation of 25 percent on a four-choice test (Bem and Honorton 1994).
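Whether a 32 percent hit rate counts as "statistically greater" than 25 percent depends heavily on how many sessions were run. The sketch below computes an exact one-sided binomial probability for a four-choice test. The session counts are hypothetical values chosen for illustration and are not taken from the Bem and Honorton article.

```python
from math import comb


def binomial_tail(hits, trials, chance=0.25):
    """Exact probability of at least `hits` correct guesses in `trials`
    attempts when each guess succeeds with probability `chance`."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )


if __name__ == "__main__":
    # Hypothetical session counts, both with a 32 percent hit rate.
    for hits, trials in [(16, 50), (128, 400)]:
        p = binomial_tail(hits, trials)
        print(f"{hits}/{trials} hits: one-sided p = {p:.4f}")
```

With the same 32 percent hit rate, the smaller hypothetical sample is consistent with guessing, while the larger one falls well below the .05 threshold. This is why pooling many sessions in a meta-analysis can turn a modest edge over chance into a significant result.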

As you might expect, Bem and Honorton’s Ganzfeld study caused quite a stir in the scientific community. Most of the earlier ESP research had been published in parapsychology journals that were viewed with skepticism by mainstream psychology. Here was a highly respected scientist publishing evidence of psi in one of psychology’s most respected journals, and for the first time ESP was getting a serious look.

Daryl Bem speaking at a CSICOP meeting in Buffalo, New York, in 1983. (Source: Wikipedia)

CSI Fellow Ray Hyman, who earlier published a joint paper with Honorton suggesting additional experimental controls for future Ganzfeld studies (Hyman and Honorton 1986), wrote a critique of Bem’s statistical methods in the same issue of Psychological Bulletin (Hyman 1994), and many other critiques followed. In 1999, CSI Fellow Richard Wiseman and coauthor Julie Milton published a meta-analysis of eleven new Ganzfeld studies, comprising more participants than in the Bem and Honorton article, and they found no significant psi effect (Milton and Wiseman 1999). These studies were followed by much back and forth (see, e.g., Storm and Ertel 2001), but the current state of the Ganzfeld debate does not support the reality of psi (Bierman et al. 2016).

A “receiver” in a Ganzfeld experiment sitting in a comfortable chair. Halves of ping pong balls cover the participant’s eyes, white noise is played into their ears, and the entire area is bathed in red light.

Feeling the Future

For his next big foray into psi research, Bem conducted his own research at Cornell over a period of ten years. The resulting 2011 paper, “Feeling the Future: Experimental Evidence for Anomalous Retroactive Influences on Cognition and Affect,” which included nine different experiments, was published in another very prestigious journal and purported to show evidence that future events could influence the present (Bem 2011). According to Bem, participants who were asked to predict whether an object would be behind one of two curtains showed choices that were influenced by events that happened after they made their selections.

This time the media reacted in a big way. Many newspaper articles were written about the study, and even before it appeared in print, Bem was invited on The Colbert Report. Stephen Colbert’s predictably comical interview with Bem is shown in the clip below.

As in the past, skeptics responded to Bem’s 2011 paper. CSI Fellow James Alcock wrote a highly critical review in Skeptical Inquirer, and Ray Hyman called the publication “crazy” in an article in the New York Times (Carey 2011). Finally, last summer, Bem and two coauthors reported the results of a pre-registered replication that was conducted using the more rigorous methods of open science (Engber 2017). Unfortunately, the results were negative: no evidence of a feeling-the-future effect.

Traditionally, Bem has been a staunch defender of his work, responding to every criticism, but lately he seems to be softening. In May 2017, Daniel Engber wrote an extensive article on Bem’s parapsychological research for Slate, in which Bem came remarkably close to an outright admission of p-hacking:

“I’m all for rigor,” he continued, “but I prefer other people do it. I see its importance—it’s fun for some people—but I don’t have the patience for it.” It’s been hard for him, he said, to move into a field where the data count for so much. “If you looked at all my past experiments, they were always rhetorical devices. I gathered data to show how my point would be made. I used data as a point of persuasion, and I never really worried about, ‘Will this replicate or will this not?’” (Engber 2017).

I hear a bit of weariness in this statement—perhaps the sentiments of a battered researcher near the end of his career—and to some extent, it is a feeling I can understand. The standards for research are changing rapidly, and after a career of doing things a certain way, it is a daunting task to retool in the new procedures of open science. Like Bem, my empirical research days are mostly behind me now, and part of me is relieved to get out before having to confront the new research realities. Open science is a welcome and hugely important development. It may end up being the salvation of psychological science. But, like Bem, part of me is happy to leave it to younger researchers.

If there is an upside for Bem, it might be that, in retrospect, he has been given substantial credit for stimulating the movement to tighten the standards for research. In the 2011 paper he chose methods that he knew would be easy for other researchers to use, and he has been quite vocal in encouraging replication. Some observers, including Jade Wu, who was a research assistant on the 2011 study, have voiced the opinion that this was all a deliberate plan to expose the failings of the field and spark a movement for reform: “I still think it’s possible that Daryl Bem did all of this as a way to make plain the problems of statistical methods in psychology,” Wu is quoted as saying in the Engber article (Engber 2017).

Bem brushes off these suggestions, minimizing his role: “I get more credit for having started the revolution in questioning mainstream psychological methods than I deserve,” Bem told Engber. “I was in the right place at the right time. The groundwork was already pre-prepared, and I just made it all startlingly clear” (Engber 2017).

Whatever Bem’s personal role might be in the new wave of rigor in psychology, two things are abundantly clear: p-hackers like Bem and me are out of business, and science is better for it.


Stuart Vyse

Stuart Vyse is a psychologist and author of Believing in Magic: The Psychology of Superstition, which won the William James Book Award of the American Psychological Association. He is also the author of Going Broke: Why Americans Can’t Hold On to Their Money. As an expert on irrational behavior, he is frequently quoted in the press and has made appearances on CNN International, the PBS NewsHour, and NPR’s Science Friday. He can be found on Twitter at @stuartvyse.