Hyman’s Reply to Schwartz


Ray Hyman

Skeptical Inquirer Volume 27.3, May / June 2003

I cannot, of course, respond in detail within the allotted space to each of Schwartz’s arguments. Instead, I will comment on his major points and conclude with a general reaction to his rebuttal.

1. "Hyman resorts to . . . selectively ignoring important information that is inconsistent with his personal beliefs."

In preparing my critique of his research program, I not only read The Afterlife Experiments carefully, I also scrutinized in detail every report of his research that was available. It was not possible to discuss each separate piece of information in my critique. I took each item into account, however, in making my assessment of the research. I chose to focus my discussions on those items that Schwartz and his colleagues had emphasized as the strongest outcomes amongst their findings. I have refereed and reviewed research reports for more than fifty years for many of the major scientific publications and for major granting agencies. I applied the same standards to my evaluation of the afterlife experiments that I have used in my other assessments.

2. ". . . Hyman failed to mention the important historical fact that our mediumship research actually began with double-blind experimental designs."

As his example he refers to his experiment with the mediums Susy Smith and Laurie Campbell that "was completed almost a year before we conducted the more naturalistic multi-medium/multi-sitter experiments involving John Edward, Suzanne Northrop, George Anderson, Anne Gehman, and Laurie Campbell. The early Smith-Campbell double-blind studies did not suffer from possible subtle visual or auditory sensory leakage or rater bias — and strong positive findings were obtained."

This is a peculiar example to use as a model of a controlled, double-blind experiment. The experiment involved having Susy Smith, designated as Medium One, apparently contact four deceased persons: her own mother, William James, Linda Russek’s father, and Schwartz’s father. Smith made a drawing for each of these departed individuals, supposedly with their input. She also made a "control" drawing. Laurie Campbell, designated as Medium Two, was then requested to independently attempt to contact these departed individuals and, using the information obtained from them, to try to match each drawing to the associated departed individual. Campbell attempted to contact the departed entities during two sessions in the presence of three experimenters. Campbell is described as being "blind" to the personalities of the four departed individuals. However, Schwartz, who was not blind to the personalities of these entities, was not only present during these sessions but actively trying to convey this information (through "telepathy") to Campbell. This unnecessary blunder compromises whatever blinding would have existed between Medium Two and the personalities of the departed individuals. No psychic investigator would be surprised if Laurie Campbell came up with some correct information such as the gender and other descriptors of the departed individuals under these conditions.

Another defect of this phase of the experiment is that no provisions were made to use a systematic and objective method for assessing the accuracy of Medium Two’s descriptions. The evaluation of the information for this stage of the experiment was subjective.

During the sittings with Medium Two, all the experimenters were blind as to which drawing was associated with which departed individual. (Although it is plausible that one might be able to make some reasonable guesses, given the characters of each of the departed individuals, which type of drawing would go with each one.) Unfortunately, the experimenters then made another serious, and completely unnecessary, blunder when it came time to see if Medium Two could accurately match the drawings with the appropriate individual. The experimenters brought Medium Two and Medium One together. Medium One then displayed the drawings she had made to represent each individual. Medium Two then attempted to match the drawings to the appropriate sources in the presence of Medium One. Ironically, the experimenters openly admit that this could allow clues about the correct matching through the "Clever Hans" phenomenon. They dismiss this as a possibility because Campbell was able to correctly match only one of the five drawings to its appropriate source.

At this point in the experiment the report becomes especially murky. Presumably, the experiment has failed. However, the experimenters inexplicably have Medium Two try again to match the drawings to their appropriate source. This second attempt is made after she is shown an explicit summary of her comments about the pictures and the departed individuals. Campbell correctly matches the five drawings (including the control) in this second attempt. No reason is given for giving the medium two tries at matching the drawings, nor do the experimenters tell us how they justify asking the medium to redo her matching. Probably these and other questionable aspects of the procedure are moot given that the possibility of blinding was compromised.

Schwartz and his colleagues, in their published paper, describe this as an "exploratory study." The proceedings seem to have been improvised at each stage. Certainly, no competent investigator would plan to unnecessarily compromise experimental blinding at the two most critical points of the data collection. Nor does it make sense to design an experiment wherein the medium is given two chances at getting the matching correct. I simply was applying the principle of charity in not discussing this botched experiment.

3. "In an exploratory double-blind long-distance mediumship experiment . . . Hyman states ‘because nothing significant was found, the results do not warrant claiming a successful replication of previous findings.’ However, Hyman minimizes the fact that the number of subjects in this exploratory experiment was small (n=6). More importantly, Hyman fails to cite a(n) important conclusion that we reached in the discussion: If the binary 66 percent figure approximates (1) LC’s actual ability to conduct double-blind readings, coupled with (2) the six sitters’ ability, on the average, to score transcripts double-blind, the 66 percent figure would require only an n of 25 sitters to reach statistical significance (e.g. < .01)."

This part of Schwartz’s rebuttal, like all the other parts, strikes me as both bizarre and off the mark. First, we need to clear up some mistakes and/or misunderstandings. Schwartz confuses the sample statistic with the population (or hypothesized true value). Given twenty-five sitters and a sample outcome of seventeen correct identifications (success rate of 68 percent) of their actual readings (which, given the discrete nature of the binomial distribution, is the closest we can get to 66 percent correct), the one-tailed probability would be .054 and not less than .01 as Schwartz claims. Regardless of the correct probability value here, this has little to do with power. Schwartz is hypothesizing that the true (population) proportion of correct binary choices in this situation is close to the 67 percent (4 out of 6) that he observed in his sample. If, indeed, this value is correct, then, given his use of a one-tailed test and a significance level of .01, at least nineteen of the twenty-five sitters would have to choose correctly, and the probability of getting a significant outcome with twenty-five sitters would be slightly more than 0.22. To have a reasonable power (say close to 90 percent) one would need over 100 sitters.
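For readers who want to check the arithmetic, the figures in this paragraph can be reproduced with an exact binomial calculation. The following is only a minimal sketch in Python, not anything taken from the reports themselves: the function names are hypothetical, the chance rate is assumed to be 50 percent, and the "true" success rate is assumed to be exactly the observed 4 out of 6.

```python
from math import comb


def upper_tail(n, k, p):
    """Exact P(X >= k) when X follows a Binomial(n, p) distribution."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_count(n, alpha, p_null=0.5):
    """Smallest number of hits whose one-tailed p-value is <= alpha.

    Returns n + 1 if no attainable outcome is significant at alpha.
    """
    for k in range(n + 2):
        if upper_tail(n, k, p_null) <= alpha:
            return k


def power(n, p_true, alpha, p_null=0.5):
    """Chance of a significant result if the true hit rate is p_true."""
    return upper_tail(n, critical_count(n, alpha, p_null), p_true)


if __name__ == "__main__":
    # One-tailed p-value for 17 correct choices out of 25 under chance.
    print(round(upper_tail(25, 17, 0.5), 3))  # 0.054

    # Hits needed for significance at .01, and the resulting power,
    # assuming (hypothetically) a true rate of 4/6.
    print(critical_count(25, 0.01))
    print(power(25, 4 / 6, 0.01))

    # Power for a much larger sample under the same assumption.
    print(power(100, 4 / 6, 0.01))
```

Changing the assumed true rate passed to `power` shows how heavily the conclusion depends on a value that six sitters cannot pin down.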

Schwartz appears to be begging the question here. He begins by observing that four out of six sitters correctly identified which of two readings was meant for them. Because of the small sample, this outcome is consistent with a number of possibilities including the chance value of 50 percent. If he had obtained the same proportion of correct hits with a larger sample, then it would have been significant. However, since we cannot tell what the true proportion is from a sample outcome based on only six cases, we have no basis for predicting the outcome for a larger sample. His argument reduces to the trivial one: If the true proportion is 67 percent then we will be able to get a significant outcome with a larger sample. From his actual outcome, we can just as well say: If the true proportion is 50 percent (and this, too, is consistent with his data), then he will very likely not get a significant outcome with a larger sample.
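The point that four hits out of six cannot distinguish between these possibilities is easy to illustrate. The sketch below is an illustration under stated assumptions rather than an analysis from the original study: it computes how often at least four of six sitters would pick the correct reading under two hypothetical true rates, pure chance (50 percent) and the observed 4 out of 6.

```python
from math import comb


def prob_at_least(n, k, p):
    """Exact P(X >= k) when X follows a Binomial(n, p) distribution."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Chance alone: at least 4 of 6 correct binary choices.
    print(prob_at_least(6, 4, 0.5))  # 0.34375

    # Same outcome if the true rate really were 4/6 (a hypothetical value).
    print(prob_at_least(6, 4, 4 / 6))
```

An outcome that chance alone produces about one time in three gives no grounds for preferring one hypothesized rate over the other.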

I find it difficult to understand why Schwartz considers this point worthy of mention. Of course a binary outcome with only six trials has very low sensitivity. However, he did not rely on this outcome. He used two other measures, the number of dazzle shots and the hits and misses, which are clearly much more sensitive. These also failed to provide overall significance. For these measures (as well as for the actual choice of the relevant reading), the overall sensitivity would have been greatly enhanced if each sitter actually rated all six readings. In addition to greatly enhanced sensitivity, this would have avoided the unfortunate situation where each sitter was rating his or her own reading against a foil that differed for each rater. Another plus would have been the opportunity to determine which readings had more general appeal independent of any specific information peculiar to a given sitter.

In the longer rebuttal to my critique that he posted on the Web (see the reference in his rebuttal), Schwartz claims he actually predicted that GD would successfully differentiate his own reading from the accompanying foil reading. The claim that this particular outcome was predicted does not square with the opening sentence of the report wherein the experimenters state, "This paper reports an unanticipated replication and extension. . . ."

I have already pointed out in my critique how Schwartz has an unusually liberal interpretation of "replication." Not only is the statistical and experimental evidence suspect, but the qualitative analysis of the actual reading for GD in the second experiment does not overlap in any important respect with the reading in the earlier experiment. In particular, none of the apparently striking examples of names, events, and places that are reported for the first reading are in the second reading. I agree with Schwartz that the outcome of this "double blind" experiment is consistent with "individual differences in sitter characteristics." However, borrowing from Schwartz’s propensity to resort to Occam’s Razor, I believe it is prudent to suggest a much more mundane explanation. We need only make two very plausible and non-extraordinary assumptions to account for the results: 1) Luck: GD had a 50-50 chance of choosing the correct reading; 2) Rater bias: given that he has chosen the correct reading, he would show a strong response bias to give high marks to the chosen reading and low marks to the rejected one. Note that this is consistent with the qualitative evidence that I provided in my critique. However, note that the burden of proof is not upon the critic to show that this explanation is correct. Rather, the burden of proof should be on Schwartz to show, as the claimant, that he has ruled out this and other possible mundane explanations. This is what good experimental methodology, which is so far lacking in the afterlife experiments, is intended to accomplish.

Unfortunately, I do not have space to respond to other specifics of Schwartz’s rebuttal. In his rebuttal he attributes motives, preferences, and biases to me. These are based on assumptions unsupported by facts. For example, he characterizes me as "reluctantly" agreeing that fraud is unlikely. In fact, I have no reluctance at all to make such an assertion. He attributes certain preferences to me that are, in some cases, just not true. He also is factually incorrect on some matters. He says that I was one of the group of cold readers who declared that I could, with training, duplicate what his mediums had accomplished in his laboratory. This is wrong. I deliberately refrained from such a commitment. My major point during the meeting with him on cold reading was that the determination of whether his mediums are using cold reading is a separate matter from the question of whether they were conveying any information of a paranormal nature. If he wanted to study the role of cold reading in the readings given by his mediums, that was an experimental goal that was separate from determining if his mediums are providing evidence for the survival of consciousness.

Nor did I conclude, contrary to Schwartz’s implication, that his mediums were using cold reading. I did observe — and I specifically emphasized that this was a subjective opinion — that I could see little difference between the utterings of his mediums and those of the typical psychic reader. I want to emphasize again, it is not for me, or other critics, to show that his mediums are using cold reading or some other ploys. The burden of proof is on Schwartz to show that he has convincingly eliminated such possibilities.

So far as I can tell, Schwartz has really not answered my criticisms. A close reading reveals that he does not deny the various failings I have identified in his research. Instead, he defends the departures from proper experimental methodology on a number of grounds: 1) he and his colleagues were aware of these defects and actually admitted so in their reports (but such admissions do not somehow neutralize the defects); 2) there were practical reasons such as wanting to provide a more naturalistic context (but this does not excuse using inappropriate control comparisons, failing to correct for rater bias, using inappropriate probability and statistical computations, etc.); 3) some of the "defects" were deliberately included to check on certain questions (but this does not justify drawing strong conclusions); and 4) taken in their totality, the experiments somehow provide powerful evidence for anomalous communication even if the individual experiments are flawed (actually, repeatedly making similar mistakes from experiment to experiment compounds rather than compensates for the errors).

Despite the deficiencies in his experiments, Schwartz seems convinced that his mediums have provided, in some cases, specific and unique information including names, places, etc., that the critics cannot explain away. For one thing, these apparently specific items are much fuzzier than he believes. His examples are selected just because they appeared to contain such specifics. This raises the difficult question of how to actually assess how much of this is just coincidence. Furthermore, even the most specific and concrete match is problematical because practically no constraints are placed upon the sitter in finding a suitable match (e.g., it can be a dead or a living person; it can be someone close to the sitter or a mere acquaintance; etc.). No actual check is made as to how close the match actually is. My point here is that Schwartz really has provided us with nothing to explain. We do not know if he has produced anything worth taking seriously until he can convincingly demonstrate that he has obtained his data under methodologically appropriate conditions. Science demands this in the conventional fields of inquiry. We should demand no less from Schwartz.

Ray Hyman

Ray Hyman is professor emeritus of psychology, University of Oregon.