When Big Evidence Isn’t: The Statistical Pitfalls of Dean Radin’s Supernormal
Dean Radin’s new book claims that the scientific evidence for supernormal human abilities is now overwhelming. Radin relies upon meta-analyses and misrepresentations of published results to produce outlandish confidence numbers that work against the very belief he is trying to foster.
We all want superpowers. I don’t think there is a single person my age who didn’t, as a kid, close their eyes and reach out toward some random object, hoping to will it into their hands with The Force. And while most of us stopped trying to manifest supernatural powers, hardly any of us ceased wishing we had them. In that spirit, Dean Radin’s new book, Supernormal: Science, Yoga, and the Evidence for Extraordinary Psychic Abilities, attempts gamely to use science to revive our belief in our superpotential. That it fails to do so is a result of Radin’s tendency to laud showy results over responsible ones, leaving the inquisitive reader utterly unsure about what to trust and what to leave behind.
Most people coming to this book are looking for the evidence promised in the title. Unfortunately, they’re going to have to wait, as the first third of the book is primarily a drawn-out lament about parapsychology’s lack of recognition by the academic community and its rejection by skeptics. We are offered a hundred pages of Radin saying that skeptics are wrong rather than showing that they are, with repeated assurances that he’ll get to the proof…later. Had he retitled the book Dozens of Pages of Bitching about Michael Shermer, and Then Some Evidence about Yoga, he could have saved us all a great deal of time.
Eventually, however, Radin purges himself of bilious resentment and is ready to get down to the business of providing the long-awaited evidence. Or rather, nonevidence, for he begins with an analysis of the Big Siddhis: those manifestations of yogic mastery that sound highly improbable to modern ears, like levitation, invisibility, and self-duplication. He sets out to answer the reasonable question, “If the yoga sutras are a faithful account of the supernormal abilities of the human body, why can’t I YouTube, right now, a hundred videos of yogis making themselves disappear in controlled situations?”
Radin’s answer is that the sutras instruct yoga masters not to show off their perfection of the siddhis, lest it affect their ego. So they refuse to demonstrate their mastery, to keep themselves from the possibility of being tainted. This is a curiously selfish thing to do for a being supposedly concerned with minimizing humanity’s suffering. Faced with the opportunity to demonstrate the power of yoga to overcome the thousand natural shocks of fleshy existence, and thereby potentially help untold millions lead better lives, the yogi chooses to do nothing in order to preserve his personal purity. In the final analysis, we are left with two choices: either the Big Siddhis are real, and yogis are selfish jerks who maintain a conspiracy of secrecy because they value their own magnificent purity over helping their fellow man, or they aren’t, and these yogis are simply charlatans or charmingly delusional. Either way, it doesn’t look too good.
The real, statistically analyzed evidence doesn’t arrive until thirty pages later, with Radin’s chapter on precognition. Right out of the gate, he produces a study that supposedly demonstrates the existence of psi effects with odds against chance of ten million billion billion (10²⁵) to one! Except it’s not a study; it’s a meta-analysis carried out in 1989 by Charles Honorton and Diane Ferrari, in which they took all of the forced-choice precognition studies (think of Bill Murray’s experiment in the first scene of Ghostbusters but without the shocks) from 1935 to 1987 and combined them into one super-study. I was intrigued by the profoundly high number quoted by Radin, so I looked up the original paper and found that Radin’s representation left out several significant caveats.
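Headline odds like these come from pooling many studies into one number. The sketch below assumes Stouffer’s method, a common way of combining independent z-scores (it is an illustration, not a claim about exactly how Honorton and Ferrari computed their figure), and uses made-up inputs: 300 hypothetical studies, each with an unremarkable z of 0.6. Odds against chance are approximated as 1/p, which for small p is essentially the same as (1 − p)/p.

```python
import math

def stouffer_z(z_scores):
    """Pool independent study z-scores with Stouffer's unweighted method."""
    return sum(z_scores) / math.sqrt(len(z_scores))

def odds_against_chance(z):
    """One-tailed odds against chance for a z-score, approximated as 1/p."""
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability of a standard normal
    return 1.0 / p

# Hypothetical: 300 studies, each with an unremarkable z of 0.6.
single = odds_against_chance(0.6)            # about 3.6 to 1 for any one study
pooled_z = stouffer_z([0.6] * 300)           # 0.6 * sqrt(300), about 10.4
pooled_odds = odds_against_chance(pooled_z)  # roughly 7e24 to 1

print(f"one study: {single:.2f} to 1")
print(f"pooled: Z = {pooled_z:.2f}, odds about {pooled_odds:.2e} to 1")
```

Because the pooled Z grows with the square root of the number of studies, an astronomical figure says as much about how many studies were pooled as about how large the underlying effect is.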
For example, the studies gathered by Honorton and Ferrari are, by their own admission, “extremely heterogeneous,” with z-scores ranging from -5.1 (scoring far below chance) to 19.6 (scoring far above it). The studies are all over the map, with some highly uncharacteristic outliers. This made the authors uncomfortable, and they responsibly shaved off the top and bottom 10 percent of their data to get a better idea of what most of the studies were finding. That yields an effect size of .012, a little over half of their original result. This is very small, but it is still an interesting number. The fact that Radin decided not to report it, but rather to report the .02 result from the extremely heterogeneous data, is troubling. Why neglect a responsible but still intriguing result in favor of a larger result that even the original authors were uneasy about? And if this is how he reports data from readily available studies, how can I, as a reader trying to figure out what the evidence actually says, fully trust the numbers he reports from studies that are much harder to find? With his very first statistically measurable study, Radin made a choice that casts a pall on all the numbers he will go on to convey, numbers that are far less towering than ten million billion billion (10²⁵) to one.
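The trimming described above is easy to reproduce in miniature. The sketch computes a 10 percent trimmed mean (dropping the lowest and highest 10 percent before averaging). The ten effect sizes are invented, chosen only so that two outliers pull the plain mean toward .02 while the trimmed mean lands near .012; they are not Honorton and Ferrari’s data, and trimming per-study effect sizes is an assumption about how their trim worked.

```python
def trimmed_mean(values, proportion=0.10):
    """Mean after dropping `proportion` of the values from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values cut from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Invented per-study effect sizes with one low and one high outlier.
effect_sizes = [-0.05, 0.005, 0.008, 0.010, 0.010,
                0.012, 0.014, 0.015, 0.020, 0.160]

plain = sum(effect_sizes) / len(effect_sizes)   # about 0.020, pulled up by 0.160
trimmed = trimmed_mean(effect_sizes, 0.10)      # about 0.012, outliers removed
print(f"untrimmed: {plain:.4f}  trimmed: {trimmed:.4f}")
```

A single extreme study can move the untrimmed mean substantially, which is why the trimmed figure is the more cautious summary of what a typical study found.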
Ultimately, he decides that these forced-choice tests aren’t suitable for probing what is really happening with precognition and moves on to free-response experiments, in which a subject is asked to describe the randomly selected location of an agent miles upon miles away. Of the 653 tests performed at Princeton, he selected a single example to show just how successful these tests were. This is his polished gem, his shimmering exemplar of the very best possible result obtained. It’s worth quoting in full, because it lets us know what passed as a “success” in these trials, and therefore informs us of how confident we should be in the reported odds. This is what the subject described as the agent’s future location:
A rather strange yet persistent image of [the agent] inside a large bowl—a hemispheric indentation in the ground of some smooth man-made materials like concrete or cement. No color. Possibly covered with a glass dome. Unusual sense of inside/outside simultaneity. That’s all. It’s a large bowl. If it was full of soup [the agent] would be the size of a large dumpling!
Radin reports the actual location of the agent as “the radio telescope at Kitt Peak.” The problem is that there is more than one radio telescope at Kitt Peak, and they look rather different. The most significant is the VLBA radio telescope run by the National Radio Astronomy Observatory, which bears no resemblance to anything the subject mentioned. If you keep digging until you get the result you want, you stumble upon the much smaller KP12m radio telescope run by the Arizona Radio Observatory, which is as close as Radin is going to get. It features (1) no indentation in the ground of any sort, (2) a white fabric retractable dome, (3) the colors white and silver, and (4) a giant thirty-two-foot telescope sitting right in the center of it. The subject’s response, then, is a mixture of plainly wrong positive statements, an absence of the most significant feature of the observatory (namely, its telescope), and a somewhat correct statement about the rough shape of the building (it isn’t technically a hemisphere, but we can let that go).
This is not an average example chosen at random from the study—this is the single best piece of data that Radin could find from a study extending over two decades, and it is a mess. If this is the sort of thing that qualifies as a sing-it-upon-the-mountaintops success, little wonder, then, that he reports the odds against chance at a whopping thirty-three million to one. As American standardized testing learned years ago, the best way to improve results is to redefine success.
The rest of the section catalogs other studies with ever-declining odds against chance, a front-loading of favorable data that journalists tend to indulge in but scientists oughtn’t. The most telling detail is the disappearance of error bars from his graphs. Radin is all too happy to include them when they make the data look good, as in his skin conductivity experiments, but they tend to disappear where they would make the data look very bad, as in his lab’s occipital lobe experiments. It’s a curious omission, compounded by his weakness for plotting outlier-sensitive means in these graphs when medians would have been more responsible, if less dramatic.
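The difference between a mean and a median, and what an error bar adds, shows up clearly in a small sketch. The readings below are invented (not taken from Radin’s experiments) and include a single outlier; the error bar is assumed to be plus or minus one standard error of the mean, which is one common convention among several.

```python
import math
import statistics

def summarize(sample):
    """Mean, median, and standard error of the mean for one condition."""
    mean = statistics.mean(sample)
    median = statistics.median(sample)
    # Standard error of the mean: one common basis for +/- error bars.
    sem = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean, median, sem

# Invented readings where the baseline is about 1.0 and one trial is an outlier.
readings = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 6.0]

mean, median, sem = summarize(readings)
print(f"mean = {mean:.3f} +/- {sem:.3f} (SE), median = {median:.3f}")
```

Here one reading of 6.0 lifts the mean to about 1.6 while the median stays at 1.0, and the error bar of roughly ±0.6 shows how little that elevated mean should be trusted. Drop the error bar and plot only the mean, and the outlier looks like an effect.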
It is one of the deep ironies of this book that it begins with such extended flights of frustration about skeptics’ responses to parapsychological data, and then spends its middle section doing everything one can possibly do to rouse a skeptical response: misrepresenting reported data, lowering success criteria, and playing a somewhat loose game with how rigorously confidence information is presented. I find it all immensely irritating. I very much want to know what the data is, and how trustworthy it is. I would gladly take a small but sure result over a larger but fishy one, but there is something of the showman in Radin that gravitates toward the latter, leaving the reader who wants to think about the data, rather than merely accept Radin’s interpretation of it, with nothing to take away from the experience except frustration at what might have been. Throughout this book, Radin is his own worst enemy.
The subsequent sections largely repeat the pattern established in his precognition chapter. In the telepathy section, we get another handpicked trial example along the lines of the dismal radio observatory one, and it is only mildly more compelling. There are more meta-analyses that accomplish little more than casting doubt on the rigor of meta-analysis as a statistical method. Take, for example, this characterization of seven meta-analyses of ganzfeld telepathy experiments: “Of the seven meta-analyses, six reported statistically significant evidence with odds against chance ranging from a modest 20 to 1 to over a trillion to 1.” The seventh, he goes on to add, found no statistically significant evidence at all.
What are we to make of this egregious spread of reported odds? If I asked a chemist what the mass of a mole of oxygen is, and he said, “Studies suggest somewhere between one gram and a trillion grams,” I would have pretty good cause to doubt the mass-determining methods of chemistry. Meta-analyses appear to be an ideal tool for settling scientific debate: by combining dozens or hundreds of previous studies, they give us a truly massive set of trials to work our statistics on, and they seem to offer balance insofar as irregularly positive studies are often offset by uncharacteristically negative ones. However, there is a dire power within a meta-analysis, secretly wielded by its author, and it is this highly subjective element that gives each analysis its unique end result.
Put simply, the author gets to decide how much weight each experiment carries in the aggregate through his evaluation of its quality. Ray Hyman, the author of one of the seven ganzfeld meta-analyses mentioned by Radin, highlighted how this weighting process often bends data toward desired outcomes in a 1996 article for Skeptical Inquirer:
I did a meta-analysis of the original ganzfeld experiments as part of my critique of those experiments. My analysis demonstrated that certain flaws, especially quality of randomization, did correlate with outcome. Successful outcomes correlated with inadequate methodology. In his reply to my critique, Charles Honorton did his own meta-analysis of the same data. He too scored for flaws, but he devised scoring schemes different from mine. In his analysis, his quality ratings did not correlate with outcome. This came about because, in part, Honorton found more flaws in unsuccessful experiments than I did. On the other hand, I found more flaws in successful experiments than Honorton did. Presumably, both Honorton and I believed we were rating quality in an objective and unbiased way. Yet, both of us ended up with results that matched our preconceptions.
In other words, if you want a study to count less, you tend to find more flaws with it, and if you want it to count more, you tend to gloss over flaws that might exist. In a normal study, this power would wreak comparatively minor havoc, because the trial number is low enough that a modest result doesn’t lead to massive odds-against-chance numbers. However, when you exercise this power with millions of pieces of data, the impact is colossal, and the odds-against-chance skyrocket, resulting in “trillion to one” type numbers whose immensity belies their tenuousness.
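Hyman’s point can be made concrete with a weighted version of Stouffer’s method, which combines study z-scores as Σ wᵢzᵢ / √(Σ wᵢ²). The four studies and both sets of quality weights below are hypothetical, and this is not claimed to be the scheme Hyman or Honorton actually used. The sketch only shows how two plausible-looking quality ratings pull the pooled result apart, and how the same disagreement, applied to a database 100 times larger (simulated here by repeating the studies), multiplies both pooled z-scores by ten.

```python
import math

def weighted_stouffer(z_scores, weights):
    """Weighted Stouffer combination: sum(w*z) / sqrt(sum(w**2))."""
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator

def odds_against_chance(z):
    """One-tailed odds against chance, approximated as 1/p."""
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return 1.0 / p

# Hypothetical study results: two "successful" studies, two null ones.
z_scores = [2.5, 2.0, -0.5, 0.0]

# Rater A finds more flaws in the successful studies and down-weights them;
# rater B does the reverse. Both weightings are invented for illustration.
weights_a = [0.5, 0.5, 1.0, 1.0]
weights_b = [1.0, 1.0, 0.5, 0.5]

z_a = weighted_stouffer(z_scores, weights_a)  # about 1.11
z_b = weighted_stouffer(z_scores, weights_b)  # about 2.69

# Same disagreement over a database 100 times larger: both Zs scale by sqrt(100).
z_a_big = weighted_stouffer(z_scores * 100, weights_a * 100)  # about 11.1
z_b_big = weighted_stouffer(z_scores * 100, weights_b * 100)  # about 26.9

for label, z in [("A", z_a), ("B", z_b), ("A x100", z_a_big), ("B x100", z_b_big)]:
    print(f"rater {label}: Z = {z:.2f}, odds ~ {odds_against_chance(z):.3g} to 1")
```

With four studies, the two raters disagree about whether the result is even notable; at a hundred times the size, both report astronomical odds, yet the gap between their answers has grown rather than shrunk.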
There is perhaps phenomenal data within these studies, but Radin’s tendency to reach for the astronomical keeps him focused on meta-analyses wherever he can get them, and, as the wild variation in results shows, there just isn’t enough standardization of method in this approach yet for us, the readers, to really trust what is going on. Radin veers between hyper-specific reports of individual trials that are entirely unconvincing and massive billions-upon-billions meta-analyses that grow progressively less trustworthy in the telling, and thereby does himself a disservice. Every time I found myself wondering, “Is there really a new physical principle at work here, or perhaps an extension of a known but ill-defined one?” that curiosity was soon stamped out by the author’s eager proclivity for over-reaching.
Radin’s intention with this book is to convince us average folks of the existence of supernormal abilities. In my case, it didn’t succeed. After a certain point, I felt that my trust had been abused one too many times, and “Radin fatigue” set in, so that data that might have been convincing on its own became suspect when issuing from the pen of the fellow who had already taken me on so many rides. I come away from this book not convinced, but not unconvinced as to certain details either, and if Radin can adjust his expectations from “convincing people” to “not unconvincing them,” then, well, he might have a good night’s sleep yet.