Skeptical Inquirer magazine, Mar 2003
Follow Up Reply
Ray Hyman
I cannot, of course, respond in detail within the allotted space to each of
Schwartz’s arguments. Instead, I will
comment on his major points and conclude with a general reaction to his
rebuttal.
1. "Hyman resorts to . . . selectively ignoring important information
that is inconsistent with his personal beliefs."
In preparing my critique of his research program, I not only read The
Afterlife Experiments carefully, I also scrutinized in detail every report of
his research that was available. It was not possible to discuss each separate
piece of information in my critique. I took each item into account, however, in
making my assessment of the research. I chose to focus my discussions on those
items that Schwartz and his colleagues had emphasized as the strongest outcomes
amongst their findings. I have refereed and reviewed research reports for more
than fifty years for many of the major scientific publications and for major
granting agencies. I applied the same standards to my evaluation of the
afterlife experiments that I have used in my other assessments.
2. ". . . Hyman failed to mention the important historical fact that
our mediumship research actually began with double-blind experimental
designs."
As his example he refers to his experiment with the mediums Susy Smith and
Laurie Campbell that "was completed almost a year before we conducted the
more naturalistic multi-medium/multi-sitter experiments involving John Edward,
Suzanne Northrop, George Anderson, Anne Gehman, and Laurie Campbell. The early
Smith-Campbell double-blind studies did not suffer from possible subtle visual
or auditory sensory leakage or rater bias -- and strong positive findings were
obtained."
This is a peculiar example to use as a model of a controlled, double-blind
experiment. The experiment involved having Susy Smith, designated as Medium
One, apparently contact four deceased persons: her own mother, William James,
Linda Russek’s father, and Schwartz’s father. Smith made a drawing for each of
these departed individuals supposedly with their input. She also made a
"control" drawing. Laurie Campbell, designated as Medium Two, was
then requested to independently attempt to contact these departed individuals
and, using the information obtained from them, to try to match each drawing to
the associated departed individual. Campbell attempted to contact the departed
entities during two sessions in the presence of three experimenters. Campbell
is described as being "blind" to the personalities of the four departed
individuals. However, Schwartz, who was not blind to the personalities of these
entities, was not only present during these sessions but actively trying to
convey this information (through "telepathy") to Campbell. This
unnecessary blunder compromises whatever blinding would have existed between
Medium Two and the personalities of the departed individuals. No psychic
investigator would be surprised if Laurie Campbell came up with some correct
information such as the gender and other descriptors of the departed
individuals under these conditions.
Another defect of this phase of the experiment is that no provisions were
made to use a systematic and objective method for assessing the accuracy of
Medium Two’s descriptions. The evaluation of the information for this stage of
the experiment was subjective.
During the sittings with Medium Two, all the experimenters were blind as to
which drawing was associated with which departed individual. (Although it is
plausible that one might be able to make some reasonable guesses, given the
characters of each of the departed individuals, which type of drawing would go
with each one.) Unfortunately, the experimenters then made another serious, and
completely unnecessary, blunder when it came time to see if Medium Two could
accurately match the drawings with the appropriate individual. The
experimenters brought Medium Two and Medium One together. Medium One then
displayed the drawings she had made to represent each individual. Medium Two
then attempted to match the drawings to the appropriate sources in the presence
of Medium One. Ironically, the experimenters openly admit that this could allow
clues about the correct matching through the "Clever Hans"
phenomenon. They dismiss this possibility because Campbell was able to
correctly match only one of the five drawings to its appropriate source.
At this point in the experiment the report becomes especially
murky. Presumably, the experiment has failed. However, the experimenters
inexplicably have Medium Two try again to match the drawings to their
appropriate source. This second attempt is made after she is shown an explicit
summary of her comments about the pictures and the departed individuals.
Campbell correctly matches the five drawings (including the control) in this
second attempt. No reason is offered for giving the medium two tries at matching
the drawings, nor do the experimenters tell us how they justify asking the
medium to redo her matching. Probably these and other questionable aspects of
the procedure are moot given that the possibility of blinding was compromised.
Schwartz and his colleagues, in their published paper, describe this as an
"exploratory study." The proceedings seem to have been improvised at
each stage. Certainly, no competent investigator would plan to unnecessarily
compromise experimental blinding at the two most critical points of the data
collection. Nor does it make sense to design an experiment wherein the medium
is given two chances at getting the matching correct. I was simply applying the
principle of charity in not discussing this botched experiment.
3. "In an exploratory double-blind long-distance mediumship experiment
. . . Hyman states 'because nothing significant was found, the results do not
warrant claiming a successful replication of previous findings.’ However, Hyman
minimizes the fact that the number of subjects in this exploratory experiment
was small (n=6). More importantly, Hyman fails to cite a(n) important
conclusion that we reached in the discussion: If the binary 66 percent figure
approximates (1) LC’s actual ability to conduct double-blind readings, coupled
with (2) the six sitters’ ability, on the average, to score transcripts
double-blind, the 66 percent figure would require only an n of 25 sitters to
reach statistical significance (e.g. < .01)."
This part of Schwartz’s rebuttal, like all the other parts, strikes me as
both bizarre and off the mark. First, we need to clear up some mistakes and/or
misunderstandings. Schwartz confuses the sample statistic with the population
(or hypothesized true value). Given twenty-five sitters and a sample outcome of
seventeen correct identifications (success rate of 68 percent) of their actual
readings (which, given the discrete nature of the binomial distribution, is the
closest we can get to 66 percent correct), the one-tailed probability would be
.054 and not less than .01 as Schwartz claims. Regardless of the correct
probability value here, this has little to do with power. Schwartz is
hypothesizing that the true (population) proportion of correct binary choices
in this situation is close to the 67 percent (4 out of 6) that he observed in
his sample. If, indeed, this value is correct, then, given his use of a
one-tailed test and a significance level of .01, the probability of getting a
significant outcome with twenty-five sitters would be slightly more than
0.54. To have a reasonable power (say close to 90 percent) one would need over
100 sitters.
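The .054 figure can be checked with an exact binomial tail sum. What follows is a minimal sketch in Python, assuming each sitter makes an independent binary choice with a 50 percent chance of success under the null hypothesis. The function names are illustrative and do not come from Schwartz's reports.

```python
from math import comb


def upper_tail(n: int, k: int, p: float = 0.5) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def smallest_significant_k(n: int, alpha: float, p: float = 0.5) -> int:
    """Return the smallest hit count whose one-tailed probability is below alpha."""
    for k in range(n + 1):
        if upper_tail(n, k, p) < alpha:
            return k
    raise ValueError("no hit count reaches the requested significance level")


# Seventeen correct choices out of twenty-five, tested against chance (p = 0.5).
p_value = upper_tail(25, 17)
print(f"one-tailed p for 17 of 25: {p_value:.3f}")  # prints 0.054

# Hit count needed with twenty-five sitters to fall below the .01 level.
needed = smallest_significant_k(25, 0.01)
print(f"hits needed for p < .01: {needed}")  # prints 19
```

Under these assumptions, at least nineteen correct choices out of twenty-five (76 percent) would be needed to fall below the .01 level, which is well above the 66 percent rate in question.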
Schwartz appears to be begging the question here. He begins by observing
that four out of six sitters correctly identified which of two readings was
meant for them. Because of the small sample, this outcome is consistent with a
number of possibilities including the chance value of 50 percent. If he had
obtained the same proportion of correct hits with a larger sample, then it
would have been significant. However, since we cannot tell what the true
proportion is from a sample outcome based on only six cases, we have no basis
for predicting the outcome for a larger sample. His argument reduces to the
trivial one: If the true proportion is 67 percent then we will be able to get a
significant outcome with a larger sample. From his actual outcome, we can just
as well say: If the true proportion is 50 percent (and this, too, is consistent
with his data), then he will very likely not get a significant outcome with a
larger sample.
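The claim that four hits out of six is consistent with both hypotheses can be illustrated numerically. This is a minimal sketch, again assuming independent binary choices; the helper names are hypothetical.

```python
from math import comb


def binom_pmf(n: int, k: int, p: float) -> float:
    """Return P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(n, i, p) for i in range(k, n + 1))


# Probability of at least four hits in six trials by chance alone.
chance_tail = upper_tail(6, 4, 0.5)
print(f"P(X >= 4 | p = 0.50) = {chance_tail:.3f}")  # prints 0.344

# Probability of exactly four hits under each hypothesized true proportion.
lik_chance = binom_pmf(6, 4, 0.5)
lik_two_thirds = binom_pmf(6, 4, 2 / 3)
print(f"P(X = 4 | p = 0.50) = {lik_chance:.3f}")      # prints 0.234
print(f"P(X = 4 | p = 0.67) = {lik_two_thirds:.3f}")  # prints 0.329
```

Chance alone produces four or more hits about one time in three, and the observed outcome is only modestly more probable under a true rate of two thirds than under a true rate of one half. Six trials cannot separate the two hypotheses.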
I find it difficult to understand why Schwartz considers this point worthy
of mention. Of course a binary outcome with only six trials has very low
sensitivity. However, he did not rely on this outcome. He used two other
measures, the number of dazzle shots and the hits and misses, which are clearly
much more sensitive. These also failed to provide overall significance. For
these measures (as well as for the actual choice of the relevant reading), the
overall sensitivity would have been greatly enhanced if each sitter actually
rated all six readings. In addition to greatly enhanced sensitivity, this would
have avoided the unfortunate situation where each sitter was rating his or her
own reading against a foil that differed for each rater. Another plus would
have been the opportunity to determine which readings had more general appeal
independent of any specific information peculiar to a given sitter.
In his longer rebuttal to my critique, which he posted on the Web (see his
reference in his rebuttal), Schwartz claims he actually predicted that GD would
successfully differentiate his own reading from the accompanying foil
reading. The claim that this particular outcome was predicted does not square
with the opening sentence of the report wherein the experimenters state,
"This paper reports an unanticipated replication and
extension. . . ."
I have already pointed out in my critique how Schwartz has an unusually
liberal interpretation of "replication." Not only is the statistical
and experimental evidence suspect, but the qualitative analysis of the actual
reading for GD in the second experiment does not overlap in any important
respect with the reading in the earlier experiment. In particular, none of the
apparently striking examples of names, events, and places that are reported for
the first reading are in the second reading. I agree with Schwartz that the
outcome of this "double blind" experiment is consistent with
"individual differences in sitter characteristics." However,
borrowing from Schwartz’s propensity to resort to Occam’s Razor, I believe it
is prudent to suggest a much more mundane explanation. We need only make two
very plausible and non-extraordinary assumptions to account for the results: 1)
Luck: GD had a 50-50 chance of choosing the correct reading; 2) Rater bias:
given that he has chosen the correct reading, he would show a strong response
bias to give high marks to the chosen reading and low marks to the rejected
one. Note that this is consistent with the qualitative evidence that I provided
in my critique. However, note that the burden of proof is not upon the critic
to show that this explanation is correct. Rather, the burden of proof should be
on Schwartz to show, as the claimant, that he has ruled out this and other
possible mundane explanations. This is what good experimental methodology,
which is so far lacking in the afterlife experiments, is intended to
accomplish.
Unfortunately, I do not have space to respond to other specifics of
Schwartz’s rebuttal. In his rebuttal he attributes motives, preferences, and
biases to me. These are based on assumptions unsupported by facts. For example,
he characterizes me as "reluctantly" agreeing that fraud is
unlikely. In fact, I have no reluctance at all to make such an assertion. He
attributes certain preferences to me that are, in some cases, just not true. He
also is factually incorrect on some matters. He says that I was one of the
group of cold readers who declared that I could, with training, duplicate what
his mediums had accomplished in his laboratory. This is wrong. I deliberately
refrained from such a commitment. My major point during the meeting with him on
cold reading was that the determination of whether his mediums are using cold
reading is a separate matter from the question of whether they are conveying
any information of a paranormal nature. If he wanted to study the role of cold
reading in the readings given by his mediums, that was an experimental goal
that was separate from determining if his mediums are providing evidence for
the survival of consciousness.
Nor did I conclude, contrary to Schwartz’s implication, that his mediums
were using cold reading. I did observe -- and I specifically emphasized that
this was a subjective opinion -- that I could see little difference between the
utterings of his mediums and those of the typical psychic reader. I want to
emphasize again, it is not for me, or other critics, to show that his mediums
are using cold reading or some other ploys. The burden of proof is on Schwartz
to show that he has convincingly eliminated such possibilities.
So far as I can tell, Schwartz has really not answered my criticisms. A
close reading reveals that he does not deny the various failings I have
identified in his research. Instead, he defends the departures from proper
experimental methodology on a number of grounds: 1) he and his colleagues were
aware of these defects and actually admitted so in their reports (but such
admissions do not somehow neutralize the defects); 2) there were practical
reasons such as wanting to provide a more naturalistic context (but this does
not excuse using inappropriate control comparisons, failing to correct for
rater bias, using inappropriate probability and statistical computations,
etc.); 3) some of the "defects" were deliberately included to check
on certain questions (but this does not justify drawing strong conclusions);
and 4) taken in their totality, the experiments somehow provide powerful
evidence for anomalous communication even if the individual experiments are
flawed (actually, repeatedly making similar mistakes from experiment to
experiment compounds rather than compensates for the errors).
Despite the deficiencies in his experiments, Schwartz seems convinced that
his mediums have provided, in some cases, specific and unique information
including names, places, etc., that the critics cannot explain away. For one
thing, these apparently specific items are much fuzzier than he believes. His
examples are selected just because they appeared to contain such
specifics. This raises the difficult question of how to actually assess how
much of this is just coincidence. Furthermore, even the most specific and
concrete match is problematical because practically no constraints are placed
upon the sitter in finding a suitable match (e.g., it can be a dead or a living
person; it can be someone close to the sitter or a mere acquaintance; etc.). No
actual check is made as to how close the match actually is. My point here is
that Schwartz really has provided us with nothing to explain. We do not know if
he has produced anything worth taking seriously until he can convincingly
demonstrate that he has obtained his data under methodologically appropriate
conditions. Science demands this in the conventional fields of inquiry. We
should demand no less from Schwartz.