Meta-Analysis and the Filedrawer Effect

Victor Stenger

Volume 12.4, December 2002

In the natural sciences, an extraordinary phenomenon is normally not considered to be even tentatively established until it has been observed, in precisely the same form, in two or more independent experiments in which each experiment stands alone as being statistically significant and free of other errors. It is safe to say that this condition has not yet been achieved for psi, despite experiments going back to the 1850s. Any other phenomenon that had failed to be confirmed after all this time would have been long abandoned as a lost cause. But, since scientific evidence for psi would support the belief of so many people that they possess mental or spiritual powers transcending the limitations of matter, the search goes on.

While many parapsychologists admit that the existence of psi is still not conclusively demonstrated, a few have insisted that the evidence in its favor is now overwhelming. Since they are not able to make this assertion based on conventional scientific criteria, they invent other criteria. Current claims rest on a dubious procedure called meta-analysis in which the statistically insignificant results of many experiments are combined as if they were a single, controlled experiment.

Where several similar experiments are available, the meta-analyzed probability for the whole package resulting from chance is estimated using statistical techniques. If the combined result is statistically significant, then the phenomenon is regarded as confirmed. But this procedure is fraught with dangers. I cannot think of a single example of a new phenomenon that has been established by meta-analysis.
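To make the combining step concrete, here is a minimal sketch using Stouffer's method, one common way of pooling results. It is an assumption chosen for illustration; nothing above says which technique any particular parapsychologist used. The 25 studies and their z-scores are hypothetical, not real data.

```python
import math
from statistics import NormalDist

def stouffer_combined_p(z_scores):
    """Pool per-study z-scores into one z and a one-sided p-value (Stouffer's method)."""
    k = len(z_scores)
    z_combined = sum(z_scores) / math.sqrt(k)
    p_combined = 1.0 - NormalDist().cdf(z_combined)
    return z_combined, p_combined

# Hypothetical input: 25 studies, each with z = 1.0, so none is significant alone.
z_each = [1.0] * 25
p_single = 1.0 - NormalDist().cdf(1.0)
z_comb, p_comb = stouffer_combined_p(z_each)

print(f"per-study one-sided p = {p_single:.3f}")
print(f"combined z = {z_comb:.2f}, combined one-sided p = {p_comb:.2e}")
```

Each hypothetical study falls well short of the 5 percent level, yet the pooled z-score is 5. The calculation is only as trustworthy as the assumption that the studies are unbiased, independent, and all reported.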

The most prominent proponent of using meta-analysis to demonstrate the reality of psi is Dean Radin, whose 1997 book The Conscious Universe is subtitled The Scientific Truth of Psychic Phenomena. Radin asserts that when one looks at the aggregate of data collected over time, one can only conclude that psychic phenomena are scientifically validated.

For example, Radin notes that 186 ESP card tests involving four million trials were published worldwide from 1882 to 1939. He takes these results at face value, downplaying any possibilities of cheating or other plausible conventional explanations that skeptics have been able to uncover in virtually every case where sufficient information about the data and procedures has been made available. Making the disputable assumption that the ESP data are all trustworthy, Radin claims that the odds against chance are more than a billion trillion to one.

Radin is aware of the file-drawer effect, in which only positive results tend to get reported and negative ones are left in the filing cabinet. This obviously can greatly bias any analysis of combined results, and Radin cannot ignore it as blithely as he ignores other possible, non-paranormal explanations of the data. Even the most fervent parapsychologists recognize this problem. Meta-analysis incorporates a procedure for taking the file-drawer effect into account. Radin says it shows that more than 3,300 unpublished, unsuccessful reports would be needed for each published report in order to “nullify” the statistical significance of psi.
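The classic form of such a procedure is Rosenthal's "fail-safe N," which asks how many unreported studies averaging exactly zero effect would pull the pooled result below significance. The sketch below implements that textbook formula. Whether it matches the exact calculation Radin performed is an assumption, and the twelve z-scores are hypothetical.

```python
from statistics import NormalDist

def fail_safe_n(z_scores, alpha=0.05):
    """Rosenthal's fail-safe N: unreported null studies needed to erase significance.

    Key assumption built into the formula: the unreported studies
    have an average z-score of exactly zero.
    """
    z_crit = NormalDist().inv_cdf(1.0 - alpha)  # one-sided critical value
    k = len(z_scores)
    return (sum(z_scores) / z_crit) ** 2 - k

# Hypothetical input: 12 published studies, each with z = 2.0.
published = [2.0] * 12
n_fs = fail_safe_n(published)
print(f"fail-safe N = {n_fs:.1f}")
```

For this hypothetical input the formula calls for about 200 hidden studies. The number grows with the square of the summed z-scores, which is how very large fail-safe figures arise.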

In his review of Radin’s book for the journal Nature, statistics professor I.J. Good disputes this calculation, calling it “a gross overestimate.” He estimates that the number of unpublished, unsuccessful reports needed to account for the results by the file-drawer effect should be reduced to fifteen or fewer. How could two meta-analyses result in such a wide discrepancy? Somebody is doing something wrong, and in this case it is clearly Radin. He has not performed the file-drawer analysis correctly.

Douglas Stokes is a specialist in statistical analysis who is very sympathetic to the psi movement. In his book The Nature of Mind, Stokes considers the wide range of reports of psychic phenomena, from the anecdotal to the experimental. Although he concludes that psi has not been scientifically demonstrated, he still wants to believe in it based on the “compelling stories” he has heard and his own personal “spontaneous psi experiences.” Nevertheless, in an article published in Skeptical Inquirer (25[3], May/June 2001, 22-25), Stokes describes the fundamental errors that Radin and others have made in their calculations.

The file-drawer problem is not limited to meta-analyses but applies to single experiments as well. Stokes looked specifically at an experiment by Alan Vaughn and Jack Houck involving ESP-card guessing questionnaires sent to them by twelve subjects, each significant at the 5 percent level. The authors claimed that over 33,000 subjects would have had to be tested to produce the reported effect as a statistical fluctuation. Since they did not send out anywhere near that number of tests, they concluded that the net effect was real.

To test this calculation, Stokes simulated the experiment on a computer, which the authors also could have easily done, and indeed should have done. This is called a “Monte Carlo analysis.” I spent a good portion of my research career in particle physics doing such analyses, in which you try to learn all the possible sources of error in an experiment, statistical and systematic, by “doing the experiment in the computer.” This procedure is simple, straightforward, and does not rely on any problematic statistical techniques or packaged programs that the users apply blindly. As an example, Stokes generated random data for thirty subjects and selected out the twelve highest scores, all of which had statistical significances (p-values) of one percent or better. This left only eighteen in the file-drawer, not 33,000 as Vaughn and Houck claimed were needed. The experiment was designed so that the subjects knew their scores before mailing them in. One can easily imagine eighteen people with low scores not bothering to report.
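A Monte Carlo check of this kind takes only a few lines. The sketch below is not Stokes's program and does not try to reproduce his numbers. It assumes each subject's score can be summarized as a standard normal z-score under pure chance, and it borrows only the thirty-tested, twelve-reported split from the example above. The function name and its parameters are hypothetical.

```python
import math
import random
from statistics import NormalDist

def simulate_selective_reporting(n_tested=30, n_reported=12, n_runs=2000, seed=0):
    """Simulate pure-chance subjects when only the top scorers report.

    Returns (mean pooled z of the reported scores,
             fraction of runs where the reported scores look significant at 5%).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(0.95)
    pooled = []
    for _ in range(n_runs):
        # Every subject guesses at chance: z-scores are standard normal draws.
        scores = [rng.gauss(0.0, 1.0) for _ in range(n_tested)]
        # Only the highest scorers mail in their results.
        reported = sorted(scores, reverse=True)[:n_reported]
        # Pool the reported scores as if they were the whole experiment.
        pooled.append(sum(reported) / math.sqrt(n_reported))
    mean_z = sum(pooled) / n_runs
    frac_significant = sum(z > z_crit for z in pooled) / n_runs
    return mean_z, frac_significant

mean_z, frac_significant = simulate_selective_reporting()
print(f"mean pooled z of reported scores: {mean_z:.2f}")
print(f"fraction of runs that look significant: {frac_significant:.3f}")
```

No psi is built into the simulated data, so any apparent significance in the pooled result comes entirely from which scores were kept.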

Why this vast difference? In a detailed analysis of the file-drawer problem, Jeffrey Scargle has shown that a widely used technique called “Fail Safe File Drawer” analysis is deeply flawed because it assumes that the experiments in the file-drawer are unbiased. In fact, as we saw in the above case, they are biased, by definition.
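The size of that bias can be worked out in an idealized model, sketched below. The model is an assumption made for illustration and is not taken from Scargle's paper: there is no real effect, every study yields a standard normal z-score, and a study is published only if it reaches the one-sided 5 percent level.

```python
from statistics import NormalDist

def drawer_stats(alpha=0.05):
    """Mean z-scores of published and unpublished studies under a strict cutoff.

    Assumes no true effect and publication if and only if z exceeds
    the one-sided critical value for the given alpha.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha)
    # Means of a standard normal truncated above and below z_crit.
    mean_published = nd.pdf(z_crit) / (1.0 - nd.cdf(z_crit))
    mean_unpublished = -nd.pdf(z_crit) / nd.cdf(z_crit)
    # Unpublished studies expected for each published one.
    drawer_per_published = (1.0 - alpha) / alpha
    return mean_published, mean_unpublished, drawer_per_published

mean_pub, mean_unpub, ratio = drawer_stats()
print(f"mean z, published:   {mean_pub:.3f}")
print(f"mean z, unpublished: {mean_unpub:.3f}")
print(f"file-drawer studies per published study: {ratio:.0f}")
```

In this model the unpublished studies average slightly below zero rather than exactly zero, and nineteen of them per published study account for the published results completely. A calculation that treats the drawer as unbiased will therefore demand far more hidden studies than selection actually requires.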

I regard it as especially significant that psi sympathizer Stokes has spoken out against meta-analyses and file-drawer analyses. Critics of paranormal claims are often accused of being closed-minded, dogmatic worshippers of the “religion of scientism.” This accusation is false. We are not closed to any paranormal claim nor prejudiced against any individual adherent. Show us the evidence and we will consider it. However, we will steadfastly insist on applying the same rules that we would to claims for a new particle or a new drug. In particular, we refuse to agree to adopt new criteria, such as those proposed by Radin, just for the benefit of researchers in a field of study that cannot seem to get positive results any other way.

Victor J. Stenger was emeritus professor of physics and astronomy at the University of Hawaii and Visiting Fellow in Philosophy at the University of Colorado. He died on August 25, 2014. His final book was God and the Multiverse: Humanity’s Expanding View of the Cosmos, and his previous books include Not By Design, Physics and Psychics, The Unconscious Quantum, Timeless Reality: Symmetry, Simplicity, and Multiple Universes, and The Fallacy of Fine-Tuning: How the Universe is Not Designed for Humanity.