<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
    xmlns:admin="http://webns.net/mvcb/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:content="http://purl.org/rss/1.0/modules/content/">
    
    <channel>
    
    <title>Special Articles - Committee for Skeptical Inquiry</title>
    <link>http://www.csicop.org/</link>
    <description></description>
    <dc:language>en</dc:language>
    <dc:rights>Copyright 2013</dc:rights>
    <dc:date>2013-05-21T20:27:18+00:00</dc:date>    


    <item>
      <title>Martin Gardner: A Polymath to the Nth Power</title>
      <pubDate>Thu, 06 Jan 2011 20:15:00 EDT</pubDate>
	<author>info@csicop.org (<![CDATA[Ray Hyman]]>)</author>
      <link>http://www.csicop.org/si/show/martin_gardner_a_polymath_to_the_nth_power</link>
      <guid>http://www.csicop.org/si/show/martin_gardner_a_polymath_to_the_nth_power</guid>
      <description><![CDATA[
        



			<p class="intro">Martin 
not only wrote the seminal textbook for the modern skeptical movement, 
but he was also central to the actual founding of the movement.</p>

<p>Persi Diaconis 
phoned me on May 17, 2010. He told me he recently spoke with Martin 
Gardner by phone. Among other things, they had talked about me. He also 
said that Martin sounded fine and seemed to be as cognitively sharp 
as always. I had not spoken with Martin for quite a while. I made a 
note on my calendar to call him on Saturday, May 22. On that Saturday, 
I was about to call Martin when I got a phone call from Martin’s son, 
James. James told me that his father had passed away a few moments earlier.</p>
<p>  Many 
persons—too many—would seek mystical meaning in this “coincidence.” 
Martin, of course, devoted much of his life to teaching us how easily 
our minds create meaning out of post hoc juxtapositions of random events. 
Although he thought that most believers were impervious to reason, he 
persevered in his quest to show that most, if not all, paranormal claims 
cannot be supported by the evidence. He felt that his background as 
a magician enabled him to explain how many alleged psychic occurrences 
were due to trickery or mundane causes.</p>
<p>  I 
first met Martin in 1950 at the home of Bruce Elliot in Greenwich Village 
in New York. Bruce published a magazine on magic, The 
Phoenix, and wrote several 
books about magic. Every Saturday he hosted a gathering for magicians 
who lived in New York or happened to be in the vicinity. I was twenty-one 
years old when I was invited to attend. This was the first time I met 
many celebrity magicians such as Dai Vernon, Jay Marshall, and Martin 
Gardner. </p>
<p>  Martin 
and I became good friends. I knew him as a magician, a creator of magic 
effects, and a writer of excellent books on magic. In addition, we shared 
an interest in investigating and challenging paranormal claims. Soon 
after our first meeting, Martin published his classic In 
the Name of Science (1952). 
The book was re-issued in 1957, with some updating, under the title Fads and Fallacies in the 
Name of Science. It 
serves as the prototype for modern skeptical criticism.</p>
<p>  From 
1958 to 1961, while I was doing psychological research for General Electric, 
I lived in Hartsdale, about twenty-five miles from Martin Gardner’s 
home on Euclid Avenue in Hastings-on-Hudson, New York. During this 
period my wife and I would get together with Martin and his wife, Charlotte, 
for dinner. I also was able to visit and talk with him about our mutual 
interests.</p>
<p>  When 
I moved to Oregon in 1961 to work at the University of Oregon, Martin 
phoned Jerry Andrus and told him I had moved into his neighborhood. 
He suggested that Jerry contact me. Jerry did and we became close friends 
until Jerry’s unfortunate death in August 2007. Martin and Jerry are 
the two most impressive individuals I have ever known. Both were essentially 
self-taught in magic, philosophy, science, and other areas.</p>
<p>  You 
can gain some insights into the range and impact of Martin’s productive 
life by reading the many obituaries that have appeared online. In the 
remaining few lines at my disposal, I will discuss only a couple of 
my many personal stories involving this Renaissance man.</p>
<p>  I 
have always been interested in how productive individuals organize their 
lives and manage their data. Soon after Martin’s operation for cataracts, 
I asked him how he managed to read and review so many books while continuing 
his prodigious literary output and maintaining a colossal correspondence. 
Martin told me that, in most cases, he did not actually read the books 
he reviewed. Instead, he simply scanned the index, which provided all 
the information he needed for his review.</p>
<p>  I 
was incredulous at first, but on second thought I realized that this 
was consistent with my research on information theory and redundancy. 
I had already discovered that I could scan the indices of textbooks 
in statistics, perception, and cognitive psychology and know all I needed 
to know about how the book handled its topic. For example, by noting 
the topics the author listed and, more importantly, the ones she did 
not, I could confidently guess her stance on various issues. This was 
because I already knew these areas quite well. Martin’s ability to 
exploit redundancy induced me to conduct research on speed reading. 
I discovered that graduates from speed reading classes who claim to 
be reading 1,000 or more words per minute are actually skipping over 
large chunks of text by exploiting redundancy. When they are given text 
to read from domains with which they are unfamiliar, their reading drops 
to the same speed as those who have never taken a special course.</p>
<p>  Martin 
not only wrote the seminal textbook for the modern skeptical movement, 
but he was also central to the actual founding of the movement. In December 
1972, I was sent by the Defense Department to observe Uri Geller and 
the researchers at the Stanford Research Institute (SRI). My report, 
which I shared with Martin, made it clear that nothing that this alleged 
psychic did had anything to do with the paranormal. Soon after that, 
Randi observed Geller at the offices of Time magazine in New York. He, too, 
saw through Geller’s pretensions.</p>
<p>  In 
1973, Randi phoned me from Portland, Oregon. He was touring with Alice 
Cooper and asked me to travel from Eugene to Portland to meet 
him. While I was in Portland, Randi reviewed our experiences with Geller 
and suggested that we get together with Martin Gardner and form a group 
to counter false claims of the paranormal. He suggested we call the 
group SIR (Sanity in Research), which evoked the acronym SRI.</p>
<p>  Randi 
and I soon afterwards spent a day with Martin at his home in Hastings-on-Hudson 
preparing a detailed document of the goals and hopes for our new group. 
In 1976, SIR joined forces with Paul Kurtz, who was already publishing 
skeptical articles in The Humanist, which he edited at that time. The 
resulting organization became known as CSICOP (now CSI), and the contemporary 
skeptical movement was born.</p>




      
      ]]></description>
    </item>

    <item>
      <title>The Eighth Gathering for (Martin) Gardner</title>
      <pubDate>Mon, 01 Sep 2008 13:19:00 EDT</pubDate>
	<author>info@csicop.org (<![CDATA[Ray Hyman]]>)</author>
      <link>http://www.csicop.org/si/show/eighth_gathering_for_martin_gardner</link>
      <guid>http://www.csicop.org/si/show/eighth_gathering_for_martin_gardner</guid>
      <description><![CDATA[
        



			
<p class="intro">
Who attends these gatherings? What takes place? How do they serve as a tribute to this remarkable man?</p>
<p>The Eighth Gathering for Gardner (G4G8) took place at the Ritz-Carlton Hotel in Atlanta, Georgia, from March 26 through March 30, 2008. A Gathering for Gardner occurs in Atlanta every two years, celebrating the many facets of the polymath Martin Gardner. Martin, who rarely attends public meetings, attended the first two gatherings. Although he has not come to the six gatherings held since, the organizers make sure that he receives a full report of all the presentations.</p>
<p>Attendance is by invitation only. To receive an invitation, a person must have some connection with Martin and share one or more of his myriad interests. At the earlier gatherings, most of the attendees were chosen by Martin himself. Since then, the criteria have become more inclusive. People are invited if their activities have been influenced or inspired by Martin and his writings. Most of the attendees have not met Martin personally, but many of them have communicated with him. Among many other amazing qualities, Martin somehow manages to engage in extended correspondence with hundreds of admiring disciples.</p>
<p>I first met Martin in 1950 at a &ldquo;sodality&rdquo; at Bruce Elliot&rsquo;s apartment in Greenwich Village, New York. Bruce Elliot was a magician who held these sodalities every Friday night. Every major New York magician usually attended. I was a young college student in transition from finishing my undergraduate degree and beginning my graduate work at The Johns Hopkins University in Baltimore. Because I had recently published some of my creations in a magic magazine, I got the attention of some of the New York magicians. They invited me to attend Bruce Elliot&rsquo;s sodality whenever I came to New York. I made the trip to New York just to attend.</p>
<p>At my first sodality I was dazzled by the gathering of famous magicians whom I knew by reputation but had never met in person. I met Jay Marshall, Dai Vernon, and other luminaries, including Martin. Because Martin and I shared interests, not only in magic but also in skeptically evaluating paranormal and other pseudoscientific claims, we became good friends. We began a correspondence that still continues. Later, when I worked for General Electric Company at its New York headquarters from 1958 through 1961, Martin and I were neighbors. We lived only a few miles from each other and frequently had dinner together.</p>
<p>Martin&rsquo;s mastery of magic; philosophy of science; recreational mathematics; optical illusions; the literary subtleties of L. Frank Baum, Lewis Carroll, and Arthur Conan Doyle; debunking pseudoscience; theology; and other topics has always impressed me. However, having attended all eight Gatherings for Gardner, I have become even more awed by the variety of subjects to which he contributed and in which many disciples credit him with inspiring their work. It is beyond my comprehension how one individual, without benefit of computers or assistants, can consistently write books, articles, reviews, and commentaries on so many different topics and at the same time maintain continuing correspondence with so many individuals around the world. And just as amazing is that the quality of the content of his writing and correspondence is consistently of the highest caliber.</p>
<p>So who attends these gatherings? What takes place? How do they serve as a tribute to this remarkable man? At the most recent gathering, registration began on Wednesday evening. Registrants assembled in many small groups. In some of the groups, individuals displayed new puzzles or challenged one another with puzzles. In other groups, magicians demonstrated new tricks. Discussions on a variety of topics were held. And, of course, participants reminisced about some of the previous attendees who have sadly passed away, such as Jerry Andrus and Jay Marshall. Very soon, news spread that Lennart Green, the dazzling magician and puzzle devotee who has been a fixture at all previous gatherings, would not be able to make it to this one. He had broken his hip on his way to the airport in Sweden. Although he was unable to attend, word reached us later that he was doing well.</p>
<p>Three hundred people registered for G4G8. In addition to participants from the United States, attendees came from England, Scotland, the Netherlands, Israel, Serbia, Italy, Japan, China, Denmark, Germany, Portugal, Spain, Hungary, and Canada. I suspect that several other countries might also have been represented.</p>
<p>The formal conference began at 8:30 Thursday morning, and later that evening we had dinner at the Sun Dial Restaurant, which sits atop the tower of the Westin Peachtree Hotel. The restaurant continually rotates so that each diner obtains a 360-degree panoramic view of Atlanta and its surroundings. We could see many boarded-up windows and other traces of the recent tornado that had swept through downtown Atlanta. Several magicians went from table to table entertaining the attendees. These included Dan Garrett, Thomas Fraps and Pit Hartling from Germany, and several others.</p>
<p>On Friday afternoon, chartered buses took the participants to Tom Rodgers&rsquo;s unique house for a Japanese lunch and dinner. Tom, who is the primary coordinator of the gathering, hired Japanese architects to construct his house in authentic Japanese style. His environs were landscaped with Japanese gardens, ponds, waterfalls, a tea house, and the like. Visitors were treated to Japanese taiko drumming, the erection of a group sculpture, a puzzle hunt, and a special performance by a dancer from Japan, as well as street magic.</p>
<div class="image left">
<img src="/uploads/images/si/Hyman-Gardner2.jpg" alt="Compass Points, a fifty-two-inch sculpture designed by George W. Hart, is composed of sixty stainless-steel pieces, 120 brackets, and 510 nuts and bolts." />
<p>Compass Points, a fifty-two-inch sculpture designed by George W. Hart, is composed of sixty stainless-steel pieces, 120 brackets, and 510 nuts and bolts.</p>
</div>
<p>The dinner on Saturday at the Ritz-Carlton featured a tribute to the late Jerry Andrus as well as a magic show performed by Dan Garrett (filling in for Lennart Green), Pit Hartling, Thomas Fraps, and Mark Mitton. The mathematician Arthur Benjamin performed his very impressive and entertaining lightning calculator act.</p>
<p>Throughout the gathering, an exhibition room and sales room were available for participants. The exhibition room displayed rare puzzles, puzzles made out of special woods or precious metals, kinetic art, optical illusions, and other interesting items. The sales room provided the opportunity to buy a variety of puzzles, books, and gadgets related to the various themes of the gathering.</p>
<p>The major focus of the gathering was the presentations&mdash;ranging in length from ten to thirty minutes (each speaker who finished before his or her allotted time was awarded one dollar). More than ninety presentations were given. At previous gatherings, the range of topics&mdash;all related to Martin Gardner&rsquo;s interests&mdash;included the construction of mazes, juggling, joggling (juggling while racing on foot), analysis of ancient puzzles, introduction of new puzzles, knot theory, the history of magic, mathematical magic, critiques of paranormal claims, paradoxes, new mathematical proofs, kinetic art, new and old optical illusions, various themes based on Escher&rsquo;s art and geometry, specially designed Frisbees, demonstration of a model aircraft kept aloft by the slight breeze created by a person walking, and other delights.</p>
<p>This year&rsquo;s contributions were no less varied. Several presentations focused on different ways to make mathematics fun for students&mdash;an endeavor that is dear to Martin&rsquo;s heart. At one extreme, we had some rather arcane presentations of new mathematical proofs, such as one that dealt with a new computerized proof. Thomas Banchoff showed an excerpt of <cite>Flatland: The Movie</cite>. Several presenters provided proofs for various puzzles. We had talks on soap bubbles, the Knight&rsquo;s Tour, puzzle food, Sudoku, and puzzle locks from India.</p>
<p>Adam Atkinson from the United Kingdom discussed &ldquo;Applications of Vampires in Law and Medicine.&rdquo; He used what we know about vampires from shows such as <cite>Buffy</cite> and <cite>Ultraviolet</cite> to suggest various ways they could be used to solve legal and medical problems. George Bohigian explained that in ancient times the ability to recognize and identify constellations and celestial bodies was used to test vision. The ability to detect the separation between the two stars that make up the double star in the Big Dipper was a common test. Bohigian showed how this ability correlated with the 20/20 line in the current Snellen visual acuity test. Arthur Benjamin performed and then taught us a wonderful card trick that depends on a subtle mathematical principle. And Michael Ecker regaled us with a survey of fun paradoxes such as &ldquo;going back in time and killing both of your parents before they met.&rdquo;</p>
<div class="image right">
<img src="/uploads/images/si/Hyman-Gardner.jpg" alt="Participants of the Gathering for Gardner pose next to the sculpture they helped assemble." />
<p>Participants of the Gathering for Gardner pose next to the sculpture they helped assemble.</p>
</div>
<p>Peter Lamont, from the University of Edinburgh, discussed &ldquo;The Rise of the Indian Rope Trick.&rdquo; His presentation was an interesting addendum to his book <cite>The Rise of the Indian Rope Trick: the Biography of a Legend</cite> (reviewed in SI, November/December 2005). According to historians, the trick was witnessed by Marco Polo, and the Viceroy of India offered in 1875 a large reward for a single performance of the trick. Lamont&rsquo;s careful research revealed that neither of these statements is true. The Indian Rope Trick, the stimulus for endless debates and speculation, was created as a journalistic hoax by John Wilkie that was published in the <cite>Chicago Daily Tribune</cite> on August 8, 1890. It was quickly picked up by newspapers around the world.</p>
<p>I was especially intrigued by the physicist David Finkelstein&rsquo;s &ldquo;Decoding D&uuml;rer.&rdquo; When D&uuml;rer created his famous engraving <cite>Melencolia</cite> in 1514, he was one of the most gifted artists of his time. He was also, like Leonardo da Vinci, talented in many other fields, such as technology, philosophy, and the like. Many, including myself, have found D&uuml;rer&rsquo;s engraving fascinating because of its many occult, Biblical, and other symbols. Among other images in the engraving is a 4 x 4 magic square. Many art historians and other scholars have speculated about the message they think D&uuml;rer was trying to convey. Finkelstein argues that the engraving contains a &ldquo;double message.&rdquo;</p>
<p>&ldquo;The overt message . . . is that absolute truth and beauty are inaccessible to the artist/scientist, causing the melancholy of the legend. The covert message, however, is that Natural Philosophy, Gateway I to Heaven, is superior to Mathematical and Theological Philosophy. The innocuous admission of the limitations of science veils a manifesto of the impending scientific revolution that would otherwise have been a capital offense,&rdquo; said Finkelstein. His analysis is compelling, but I suspect many scholars will not buy it.</p>
<p>As you can surmise from this small sample of the content of the presentations, the topics not only ranged widely but all dealt with challenging puzzles, paradoxes, mysteries, and other themes to which Martin Gardner has contributed or has inspired others to contribute. Each of the presenters gratefully acknowledged that they were inspired by Martin in pursuing these themes further.</p>




      
      ]]></description>
    </item>

    <item>
      <title>Anomalous Cognition? A Second Perspective</title>
      <pubDate>Tue, 01 Jul 2008 13:20:00 EDT</pubDate>
	<author>info@csicop.org (<![CDATA[Ray Hyman]]>)</author>
      <link>http://www.csicop.org/si/show/anomalous_cognition_a_second_perspective</link>
      <guid>http://www.csicop.org/si/show/anomalous_cognition_a_second_perspective</guid>
      <description><![CDATA[
        



			<p class="intro">Challenged by findings of leading parapsychologists that the evidence for anomalous cognition was not repeatable and has otherwise failed to meet scientific standards, participants in a conference on the subject simply ignored the challenge.</p>
<p>In the previous article, Amir Raz provides an interesting and insightful account of the Meeting of the Minds (MoM) conference. His report offers the viewpoint of a young neuroscientist who is newly encountering the world of parapsychology and its claims. I thought I might complement his description with a few comments from my perspective. I have been a critic&mdash;I hope a constructive one&mdash;of parapsychological claims for fifty years. Together, our two accounts can better convey some of the issues stemming from this meeting.</p>
<p>I was invited to speak as a representative of the skeptical community. As a presenter, I felt my responsibility was to directly address the issues in the statement of the meeting agenda and goals. These issues, as spelled out in advance by the organizers, were:</p>
<ul>
<li>to bring together a distinguished set of researchers to consider the state of the evidence for anomalous cognition</li>
<li>to discuss the methodological and theoretical challenges presented by such phenomena</li>
<li>to address sociological barriers that have constrained academic discussion of this topic</li>
<li>to examine the process and potential impact of the meeting.</li>
</ul>
<p>My attention was captured by the following two quotations from the agenda:</p>
<blockquote>
<p>&hellip;meta-analyses of several classes of experiments published in peer-reviewed journals suggest that some effects, while small in magnitude, are highly repeatable&hellip;.</p>
<p>Because of these and other reasons once commonly used to dismiss contemplation of anomalous cognition are becoming increasingly debatable, we believe the time has come to examine the taboo that has constrained serious scientific consideration of this evidence.</p>
</blockquote>
<p>The obvious subtext of this meeting statement can be summarized in three propositions:</p>
<ol>
<li>Psi (ESP and Psychokinesis) is real.</li>
<li>The evidence for psi is consistent and independently replicable.</li>
<li>The time has come for the scientific community to seriously consider the claims for psi.</li>
</ol>
<p>My presentation directly challenged each of these propositions. I did so by using data and arguments provided by leading figures in parapsychology. I began by considering parapsychological claims of having demonstrated an &ldquo;anomaly.&rdquo; Since the beginnings of modern science, the scientific community has repeatedly been confronted with claims of anomalies. Some, such as meteorites, discrepancies in the orbit of Uranus, discrepancies in the advancement of Mercury&rsquo;s perihelion, X-rays, and continental drift, eventually were shown not to be the result of mistaken observations or flawed methodology. Furthermore, they were supported by evidence that was consistent and independently verifiable. Given these circumstances, the claims were accepted, and scientific theories were appropriately modified to accommodate them.</p>
<p>Other claims, such as those for Martian canals, N-Rays, polywater, mitogenetic radiation, the &ldquo;discovery&rdquo; of the planet Vulcan, and cold fusion, were rejected because the evidence was inconsistent and could not be independently replicated. Interestingly, some of the defenders of these claims argued that the inconsistencies and failures of replication were properties of the claimed phenomena. The parapsychologists, who now admit that their evidence cannot be replicated, also argue that this failure to replicate is one of the unusual properties of psi!</p>
<p>I first addressed the apparent inconsistencies in parapsychological claims about the status of the evidence. Some parapsychologists, such as Jessica Utts and Dean Radin, repeatedly declare that the evidence for anomalous cognition is compelling and meets the most rigorous scientific standards of acceptability. Others, such as Dick Bierman, Walter Lucadou, J.E. Kennedy, and Robert Jahn, openly admit that the evidence for psi is inconsistent, irreproducible, and fails to meet acceptable scientific standards. I quoted Radin&rsquo;s statement in his 1997 book <cite>The Conscious Universe</cite> that &ldquo;we are forced to conclude that when psi research is judged by the same standards as any other scientific discipline then the results are as consistent as those observed in the hardest of the hard sciences&rdquo; (italics in the original). I also quoted from Jessica Utts&rsquo; 1995 Stargate report that, &ldquo;Using the standards applied to any other area of science, it is concluded that psychic functioning has been established.&rdquo; Both Radin and Utts were present during my presentation. Neither took this opportunity to retract these claims. I can only assume that they still stand behind these strong assertions.</p>
<p>I was hoping Radin and Utts would provide an explanation of how they can maintain such a position in the face of mounting evidence and arguments within the parapsychological community that the reality of psi cannot be justified according to accepted scientific standards. Dick Bierman, the Dutch parapsychologist, for example, carefully re-analyzed major meta-analyses of parapsychological research on mentally influencing the fall of dice, the Ganzfeld psi experiments, precognition with ESP cards, psychokinetic influence on RNGs, and mind over matter in biological systems (Bierman 2000). He looked especially at the relationship between effect size and the date when the studies in each of these research areas were conducted.</p>
<p>Bierman fitted a regression line to the data in each area. In all cases, the regression line revealed a consistent trend for the effect sizes to decrease with time and to eventually reach zero.<sup>1</sup> In addition to these linear trends from the meta-analyses, Bierman and other parapsychologists point to dramatic failures of direct attempts to replicate major parapsychological findings. These particular failed replications cannot be dismissed as being due to low power, which is the excuse commonly offered by Utts, Radin, and a few others. Bierman concluded, &ldquo;In spite of the fact that the evidence is very strong, these correlations are difficult to replicate.&rdquo;</p>
<p>Other major parapsychologists also agree with Bierman&rsquo;s conclusions. Lucadou put it this way, &ldquo;The usual classical criteria for scientific evidence are effect oriented. Experimental results of parapsychology seem unable to fulfill these requirements. One gets the impression that an erosion of evidence rather than an accumulation of evidence is taking place in parapsychology&rdquo; (Lucadou 2001). Kennedy put it this way, &ldquo;Many parapsychological writers have suggested that psi may be capricious or actively evasive. The evidence for this includes the unpredictable, significant reversal of direction for psi effects, the loss of intended psi effects while unintended secondary or internal effects occur, and the pervasive declines in effect for participants, experimenters, and lines of research. Also, attempts to apply psi typically result in a few very impressive cases among a much larger number of unsuccessful results. The term unsustainable is applicable because psi is sometime impressive and reliable, but then becomes actively evasive&rdquo; (Kennedy 2003).</p>
<p>As the preceding quotations indicate, many leading parapsychologists acknowledge that the existence of psi cannot be demonstrated with evidence that meets currently accepted scientific standards. Most critically, these standards include the essential ingredient that the evidence has to be capable of being reliably reproduced by independent investigators. Lacking this basic ingredient, a claim cannot be considered seriously by the scientific community. Above all, it is this basic standard that has made contemporary science the preeminent&mdash;and the only&mdash;method for gaining trustworthy knowledge. The parapsychologists who admit that the evidence for psi cannot achieve this standard, however, still believe that psi exists and, in most cases, want the scientific community to take their claim seriously. How can they justify such a position?</p>
<p>My presentation dealt directly with this issue. Again, I was hoping for some sort of explanation or justification of this demand for special treatment. It seems to me that the parapsychologists, especially the organizers of the MoM conference, were pleading for a special exemption from the standard scientific criteria. They want the scientific community to accept their claims without having to pass the usual tests. I pointed out that this was not going to happen. N-Rays, Martian canals, and other claims of anomaly that did not pass these tests occupy the junk heap of discarded science. Why should claims of psi be treated any differently?</p>
<p>Finally, I speculated about what might happen if the parapsychologists, especially the organizers of the conference, achieved their goal of getting the scientific community to take their claims seriously. For example, what if the National Academy of Sciences appointed a committee to examine in detail the current evidence for psi? The committee members would obviously find the same results that the parapsychological community has already uncovered. The effect size for psi, in every major research program in parapsychology, declines over time and reaches zero. Major attempts to directly replicate a key parapsychological finding, even when possessing adequate power, fail. The scientific community, when apprised of these findings, would dismiss the parapsychological claims with even more force than they now do. Rather than earning the respect that it seeks, parapsychology&rsquo;s reputation as a serious research program would suffer greatly.</p>
<p>I ended with a quotation from Martin Johnson, a respected parapsychologist of a previous generation:</p>
<blockquote>
<p>I must confess that I have some difficulties in understanding the logic of some parapsychologists when they proclaim the standpoint, that findings within our field have wide-ranging consequences for science in general, and especially for our world picture. It is often implied that the research findings within our field constitute a death blow to materialism. I am puzzled by this claim, since I thought that few people were really so unsophisticated as to mistake our concepts for reality. . . . I believe that we should not make extravagant and, as I see it, unwarranted claims about the wide-ranging consequences of our scattered, undigested, indeed rather &lsquo;soft&rsquo; facts, if we can speak at all about facts within our field. I firmly believe that wide-ranging interpretations based on such scanty data tend to give us, and with some justification, a bad reputation among our colleagues within the more established fields of science.</p>
</blockquote>
<p>Johnson wrote those words over thirty years ago. However, they apply with even more force today. In spite of the many new directions that parapsychology has taken since 1976, the only consistent feature of parapsychological evidence is its inconsistency.</p>
<p>As I indicated, I took the organizers&rsquo; agenda seriously. My presentation dealt directly with each of the issues raised by the organizers: the claim that psi was real and supported by replicable evidence; the implication that the scientific community was unfairly refusing to accept parapsychological claims; and the consequences of having the scientific community seriously consider such claims. I pointed to the apparent contradiction between the claims of the organizers that the evidence for psi was convincing and scientifically warranted and the admission by many contemporary parapsychologists that the evidence for psi does not and cannot meet scientific criteria. I suggested that if, indeed, the organizers succeed in their quest to gain the attention of the scientific community, the result would be a serious blow to the status of parapsychology.</p>
<p>I was expecting serious consideration of my specific challenges to the assumptions of the agenda. I also was expecting that the other presenters and discussants at the conference would deal directly with these issues. I wish I could relay to you how the presenters and the conference attendees responded. Unfortunately, there was a disconnect between the stated agenda and goals and what actually took place. As far as I could tell, I was the only presenter to directly address the issues spelled out in the meeting statement. The majority of presentations were irrelevant to the conference goals. Indeed, several appeared to actively distract from the goals.</p>
<p>The conference organizers and the parapsychologists in attendance failed to respond to, or even discuss, my challenges to the claims in the meeting statement. No one seemed bothered by the contradiction between the claim that the evidence for psi is rock solid and the admissions within the parapsychological community that the evidence for psi is capricious and irreproducible. I have no idea why the conference failed to follow its stated agenda. Perhaps I misunderstood. Maybe the agenda was advanced not as something for discussion but rather as a set of &ldquo;truths&rdquo; that were to be presupposed by the participants. I do not know.</p>
<p>What I do know is that although I attempted to put the stated goals of the agenda on the table for debate and discussion, no one seemed eager to do so. What I also know is that the conference agenda implies two requests that the parapsychological community is putting to the scientific community. Both of them are radical and unrealistic. The first request is that the scientific community accept the claim that psi is real. The second is that they do so by exempting parapsychologists from the requirement that they provide evidence according to acceptable scientific standards. Both requests amount to changing science as we know it. Obviously, the scientific community will not, and should not, acquiesce to these requests.</p>
<h2>Note</h2>
<ol>
<li>In two cases, Bierman suggests that after reaching zero, the effect size shows signs of increasing again. However, this is questionable and appears to be an artifact of fitting a polynomial to data where the zero effect size has existed for a while. Under such circumstances, a second-degree polynomial will better fit the data than will a linear regression line.</li>
</ol>
<h2>References</h2>
<ul>
<li>Bierman, Dick J. 2000. On the nature of anomalous phenomena: Another reality between the world of subjective consciousness and the objective world of physics? P. Van Loocke, ed., <cite>The Physical Nature of Consciousness</cite>. Benjamins Publishing: New York. pp. 269&ndash;292.</li>
<li>Johnson, Martin. 1976. Parapsychology and education. B. Shapin and L. Colby, eds. <cite>Education in Parapsychology</cite> 130&ndash;15. New York: Parapsychology Foundation.</li>
<li>Kennedy, J.E. 2003. The capricious, actively evasive, unsustainable nature of psi: A summary and hypotheses. <cite>The Journal of Parapsychology</cite> 67, 53&ndash;74.</li>
<li>Lucadou, Walter. 2001. Hans in luck: The currency of evidence in parapsychology, <cite>Journal of Parapsychology</cite> 65, p. 3.</li>
</ul>




      
      ]]></description>
    </item>

    <item>
      <title>Statistics and the Test of Natasha</title>
      <pubDate>Tue, 07 Jun 2005 09:02:00 EDT</pubDate>
	<author>info@csicop.org (<![CDATA[Ray Hyman]]>)</author>
      <link>http://www.csicop.org/specialarticles/show/statistics_and_the_test_of_natasha</link>
      <guid>http://www.csicop.org/specialarticles/show/statistics_and_the_test_of_natasha</guid>
      <description><![CDATA[
        



			<p>This commentary is a supplement to <a href="/specialarticles/show/testing_natasha/">my previous report</a> on CSICOP&rsquo;s test of Natasha Demkina&rsquo;s claim (Hyman, 2005). It is also a response to the many criticisms of our test. Some criticisms were reactions to the airing of the Discovery Channel&rsquo;s <cite>The Girl with the X-ray Eyes</cite>. At the time of this writing, this television program had aired a few times in Europe and Asia, but not in the United States. Nevertheless, it generated a debate in the media and on the internet. The reports of the test by Andrew Skolnick (2005) and me further provoked several emails and letters. Many focused on the statistical aspects of our test.</p>
<p>The initial drafts of my report on testing Natasha described the reasoning behind the statistics. My colleagues persuaded me to omit this statistical discussion. They said that such details would confuse the reader. Ironically, the majority of the critics focus on the statistics. The commentators make the following assertions: 1) our test set the critical value for &ldquo;success&rdquo; unreasonably high; 2) our test lacked sufficient power to detect a real effect; 3) Natasha&rsquo;s four correct matches out of seven should have sufficed to reject the null hypothesis (i.e., to justify a conclusion that the outcome was not due to chance).</p>
<p>Before I respond to these statistical issues, I will put our test in its proper context. A few critics said that we should never have conducted the test. The conditions were such that the test was bound to be flawed. I sympathize with this viewpoint. However, I would emphasize the following points:</p>
<ol>
<li>When I was asked to help design and conduct the test, CSICOP had already accepted the producer&rsquo;s invitation to supervise the test. Cancelling the test was not an option.</li>
<li>My colleagues, Andrew Skolnick and Richard Wiseman, agreed with me that, given the circumstances, we could not conduct a &ldquo;definitive&rdquo; test of Natasha&rsquo;s claim.</li>
<li>Instead, we could use the test for a preliminary screening. This screening could tell us whether we should continue investigating Natasha&rsquo;s claim. If the outcome showed that continuing was worthwhile then we would have gone to the next step. We would have advocated investigating Natasha&rsquo;s claim with more sophisticated and costly procedures.</li>
<li>Although Natasha apparently can &ldquo;see&rdquo; through many kinds of cloth (her subjects are fully clothed when she diagnoses them) she would not let us test her with an opaque screen between her and the subject. Within the parameters of our test, the best we could do would be to reduce possible external clues. We could do this by selecting subjects who were similar in age, apparent health, and other potential external clues about their internal state. In addition, we could try to keep the subjects from behaving in ways that might provide indications about their conditions. Such precautions, however, would not exclude the possibility of her picking up external, possibly subtle, indications of the subjects&rsquo; internal states.</li>
<li>Ideally, our test would allow us to decide whether the number of correct matches excluded the possibility both of chance and of the use of external clues. We recognized that this was not possible in our situation. So we devised our test to decide between two alternatives: 1) Natasha could get a sufficient number correct to make it worthwhile to pursue her claim with more adequate procedures; 2) Natasha could not get a sufficient number correct even in a task in which we stacked the deck in her favor. In the latter case, we would have no reason to pursue her claim further.</li>
<li>We made it clear to the producer what our test could and could not achieve. We expected that the program would make this clear to the viewers. When the producer interviewed me after the test, I emphasized that the test was simply a preliminary probe to see if Natasha&rsquo;s claim is worth additional investigation. To our dismay, the television program did not include our warnings about the limitations of our test. The creators of the program did many things well. By failing to include our cautions about the limitations of the test, however, they added to the confusions and misunderstandings.</li>
<li>The critics of our test, for the most part, claim that the outcome of our test does provide justification for taking her claim seriously. These critics believe that getting four correct matches in our test is impressive. So I will explain why this is not so.</li>
</ol>
<h2>Establishing the Test Criterion </h2>
<p>Some critics said that we had used the wrong probabilities. These critics apparently failed to realize that our test involved the matching problem. Consequently, their suggested alternatives were simply wrong. My commentary deals with those criticisms that recognized that our test involved the matching problem. The calculations of the probabilities for the matching problem are tricky. My main sources were Feller (1950) and Mosteller (1965).<sup><a href="#notes">1</a></sup> Richard Wiseman confirmed the results of my calculations with tables of probabilities for the matching problem that he found in the <cite>Journal of the Society for Psychical Research.</cite></p>
<p>The most difficult problem for me was finding a way to specify the matching distribution for the case where the expected mean was other than one. In our case, I wanted the distribution for the case where the expected value would be five. Fortunately, Persi Diaconis and Susan Holmes, both professors in the Department of Statistics at Stanford University, came to my rescue. They calculated the necessary Bayes factor that I required.</p>
<p>Conducting a good test involves two phases. The first is the <em>design</em> of the test. The second is the <em>execution</em> of the test. Although we had only a month to prepare, Andrew Skolnick, Richard Wiseman and I thought very carefully about the design. We agreed that the matching procedure was appropriate. Initially, we planned to use five subjects. Circumstances made it much harder to use a larger number. The requirements of the television show would not allow us to have Natasha evaluate many subjects. The logistics of finding a suitable set of subjects further restricted the number we could use. We also wanted to avoid the possibility of overworking Natasha.</p>
<p>However, I quickly realized that five subjects would not be enough to provide a reasonable opportunity for Natasha to display her powers. I insisted that we needed at least seven subjects for our test to have sufficient power. This placed even more demands upon Andrew and the people who were helping him to obtain subjects. Getting this number of suitable subjects within the available time frame required heroic efforts. It also clashed with our desire to find a group that differed in internal conditions, yet looked similar in outward appearance.</p>
<p>Andrew, Richard and I each independently agreed that the critical value for our test should be five correct matches. Interestingly, we each chose the same critical value, but for different reasons. Given the limitations of our test and the nature of Natasha&rsquo;s claim, Richard wanted the outcome to be large before he would recommend continued investigation. Andrew was concerned with the practical implications. He wanted Natasha&rsquo;s accuracy to be sufficiently high to justify her making medical diagnoses. I arrived at the critical value of five using Bayesian considerations which I will describe later in this commentary.</p>
<p>I believe the design was adequate. The <em>execution</em> of the test was less so. We had to rely on third parties to arrange crucial aspects of the test. These included finding a location for the test and assembling a suitable group of subjects. Despite the best efforts of our volunteers, we encountered last-minute problems. Two of the subjects dropped out of the test a few hours before it was to begin. Adding two subjects at the last moment aggravated our problem of getting a group that was homogeneous in outward appearance.</p>
<h2>The Matching Test </h2>
<p>The table below lists the probabilities for each possible outcome of our test. These outcomes range from 0 to 7. Note that six correct matches cannot occur in this test. I will leave it to the reader to figure out why. For our test, the third column is the most relevant. It gives the probability, for a given outcome, of getting that number <em>or more</em> correct if the matches are due to random guessing. We chose five as the critical value. The table indicates that the probability of getting five <em>or more</em> correct just by guessing is .0044. This is approximately one out of 227. Some of our critics presumably consider this overly stringent.</p>
<p>Natasha got four correct matches. The table shows that the probability of getting four or more correct is .0183. This is roughly two out of 100. Our critics argue that the probability of getting four or more correct is sufficiently low that we should have rejected the null hypothesis. These critics probably are operating under one or more of the following assumptions: 1) an outcome that has a probability less than .05 of occurring just by chance should be declared &ldquo;significant&rdquo; (this is a conventional procedure within the null hypothesis testing framework); 2) the probabilities listed in the last column of this table are the appropriate probabilities for judging if Natasha&rsquo;s claim is true; and 3) the outcomes are due either to chance or to Natasha&rsquo;s X-ray vision. The third assumption, as I have already discussed, is questionable. So are the other two.</p>
<div class="image center">
<table class="zebra">
<tr>
<th>Number of Correct Matches (y) </th>
<th>Probability of y </th>
<th>Probability of y <em>or more</em> correct matches </th>
</tr>
<tr>
<td>0 </td>
<td>.3679</td>
<td>1.0000</td>
</tr>
<tr>
<td>1 </td>
<td>.3681</td>
<td>.6321</td>
</tr>
<tr>
<td>2 </td>
<td>.1833</td>
<td>.2641</td>
</tr>
<tr>
<td>3 </td>
<td>.0625</td>
<td>.0808</td>
</tr>
<tr>
<td>4 </td>
<td>.0139</td>
<td>.0183</td>
</tr>
<tr>
<td>5 </td>
<td>.0042</td>
<td>.0044</td>
</tr>
<tr>
<td>6 </td>
<td>0</td>
<td>.0002</td>
</tr>
<tr>
<td>7 </td>
<td>.0002</td>
<td>.0002</td>
</tr>
</table>
</div>
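<p>For readers who want to check the table, the following is a minimal sketch of one way to compute the matching probabilities exactly. It is not necessarily the method I used with Feller and Mosteller; it simply counts, for seven subjects, how many of the 5,040 possible matchings have exactly <em>y</em> correct. The function names are my own illustrative choices.</p>

```python
from fractions import Fraction
from math import comb, factorial

def derangements(m):
    """Number of arrangements of m items with no item in its own place."""
    d = [1, 0]  # D(0) = 1, D(1) = 0
    for i in range(2, m + 1):
        d.append((i - 1) * (d[i - 1] + d[i - 2]))
    return d[m]

def matching_distribution(n):
    """Exact P(exactly k correct matches), k = 0..n, under random guessing."""
    total = factorial(n)
    # Choose which k items are matched correctly, then misplace all the others.
    return [Fraction(comb(n, k) * derangements(n - k), total)
            for k in range(n + 1)]

def tail_probabilities(pmf):
    """P(k or more correct matches) for each k."""
    return [sum(pmf[k:]) for k in range(len(pmf))]

pmf = matching_distribution(7)
tail = tail_probabilities(pmf)
mean = sum(k * p for k, p in enumerate(pmf))  # expected number correct

for k in range(8):
    print(f"{k}  {float(pmf[k]):.4f}  {float(tail[k]):.4f}")
```

<p>The printed values should agree with the table to within rounding in the last digit. The computed mean is exactly one correct match, which is the expected value under random guessing.</p>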
<h2>The Null Hypothesis Test </h2>
<p>The critics of our test seem to be operating within the Null Hypothesis Test (NHT) framework. This is understandable, because this framework has dominated the testing of hypotheses in the social and biological sciences for the past 75 years. Throughout this history, the NHT has been controversial. You can find an accessible introduction to this matter in Pigliucci (2002) and Stenger et al. (2003). I did not use the NHT framework to choose the criterion for our test. However, I will use this framework to discuss the rationale for choosing the critical value.</p>
<p>The NHT framework is the one most likely to be familiar to the critics. Our choice of five correct matches for the critical value makes sense in this framework and in others. As the name suggests, the NHT involves setting up a hypothesis to be tested. Usually the hypothesis is that the outcome will be consistent with chance. This null hypothesis has been called a &ldquo;straw man&rdquo; because the investigator usually wants to knock it down. In our test, the null hypothesis is that the outcome comes from a distribution that would result if the matches were just random guesses. The distribution on the null hypothesis for our test is the one given in the preceding table. If the null hypothesis is true in our situation, then, on average, we would expect Natasha to get one correct match. However, even if the null hypothesis is true, she could achieve any one of the possible outcomes from zero to seven. Some of these outcomes are much less likely to occur than others. Outcomes of zero, one, or two correct matches have high probabilities of occurring. Outcomes of five or seven correct are highly unlikely to occur.</p>
<p>R.A. Fisher introduced NHT in 1925. The underlying logic involved computing the probability of an observed result given that the null hypothesis is true. If this probability is sufficiently small, then the researcher can reject the null hypothesis. I want to emphasize two points about the NHT. First, the experimenter has usually framed the test so that he or she hopes to reject the null hypothesis. If the researcher can reject the null hypothesis, he or she states that the outcome is &ldquo;significant.&rdquo;<sup><a href="#notes">2</a></sup> The second point involves the critical region or &ldquo;level of significance.&rdquo; Fisher suggested using the .05 level for significance, and subsequent investigators have followed his advice.</p>
<p>If we had used the .05 level of significance for our test, we would have chosen four as the critical value. An outcome of three correct would be too low because when the null hypothesis is true the probability of three or more correct matches is .08. This value exceeds the .05 level. When the null hypothesis is true, the probability of four or more correct matches is .018. Because this value is less than the .05 level, this would warrant choosing four as the critical value. So we would choose four as the critical value if we were conducting our test at the conventional .05 level of significance. This is the reason that the critics are arguing that we should have declared Natasha&rsquo;s performance &ldquo;significant.&rdquo;</p>
<p>However, the proponents of the NHT traditionally insisted that the use of the .05 level was warranted only if the alternative hypothesis was plausible compared with the existing knowledge in a domain. They recognized that not all hypotheses or claims are equal. The alternative hypothesis, as contrasted with the null hypothesis, is the one the investigator is typically trying to confirm. The vast majority of such hypotheses are highly plausible and consistent with the existing body of theory and data in a given area of inquiry. It was for these plausible hypotheses that the pioneers of NHT advocated using the .05 level of significance.</p>
<p>These pioneers recognized that implausible hypotheses needed a stronger degree of evidence. For such implausible hypotheses, the recommendation was to use a stricter level of significance such as the .01 or the .001 level of significance. Indeed, statistical textbooks contain tables not only for doing significance tests at the .05 level, but also at the .01 and the .001 level. The idea is the familiar one: extraordinary claims require extraordinary proof. In a way, these advocates of the NHT were implicitly recognizing principles that are explicit in the Bayesian approach. Proponents of NHT not only recognized that the plausibility of the alternative hypothesis had to be taken into account. They also realized that the consequences of accepting the alternative hypothesis had to be considered. Again, this implicit recognition of the utility of the test decision is consistent with its explicit recognition in Bayesian approaches. Both these considerations relate to our test.</p>
<p>A paranormal claim, by definition, is one that is implausible or highly unlikely given the accepted scientific framework. Even J.B. Rhine acknowledged that ESP claims need to be tested at a more stringent level than the traditional .05. (Unfortunately, contemporary parapsychologists have departed from Rhine&rsquo;s advice and routinely test their paranormal claims at the .05 level.) Except for some parapsychologists, the vast majority of scientists would demand a level of significance of .001 or less. (Many physical scientists argue that such claims should be tested at an even more stringent level.) This implies that we should have used the .001 level. Unfortunately, to achieve this level of significance, we would have had to set the critical value at seven successful matches. As a compromise, I chose to use the .01 level. Setting the critical value at five or more correct matches achieves a significance level less than .01 (the probability of getting five or more correct matches given the null hypothesis is .004). This is greater than the desired .001 level but consistent with the .01 level.</p>
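<p>The critical values mentioned above for the .05, .01, and .001 levels can be checked with a short sketch like the following. It assumes the same random-guessing model as the table of matching probabilities and picks, for each level, the smallest attainable number of correct matches whose tail probability does not exceed that level. The function names are illustrative only.</p>

```python
from math import comb, factorial

def match_counts(n):
    """counts[k] = number of the n! possible matchings with exactly k correct."""
    d = [1, 0]  # d[m] = arrangements of m items with none in its own place
    for m in range(2, n + 1):
        d.append((m - 1) * (d[m - 1] + d[m - 2]))
    return [comb(n, k) * d[n - k] for k in range(n + 1)]

def critical_value(n, alpha):
    """Smallest attainable k with P(k or more correct) no greater than alpha."""
    counts = match_counts(n)
    total = factorial(n)
    for k in range(n + 1):
        if counts[k] == 0:
            continue  # skip outcomes that cannot occur (six of seven)
        if sum(counts[k:]) / total <= alpha:
            return k
    return None  # no outcome is rare enough at this level

for alpha in (0.05, 0.01, 0.001):
    print(alpha, critical_value(7, alpha))
# prints 4 for the .05 level, 5 for the .01 level, and 7 for the .001 level
```

<p>With seven subjects, only a perfect score reaches the .001 level, which is why that level would have left Natasha no margin for error.</p>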
<p>I decided against setting the critical level at seven because this would require Natasha to be 100% accurate in our test. We wanted to give her some leeway. More important, setting the critical value at seven would make it difficult to detect a true effect. On the other hand, I did not want to set the critical value at four because this would be treating the hypothesis that she could see into people&rsquo;s bodies as if it were highly plausible. The compromise was to set the value at five. This provides reasonable protection against falsely rejecting the null hypothesis. It also provides a reasonable level of power to detect evidence in favor of the alternative hypothesis. This is our next topic.</p>
<h2>Type I and Type II Errors </h2>
<p>Fisher&rsquo;s prescription for testing the null hypothesis recognizes only one type of error. The investigator can reject the null hypothesis when, in fact, it is true. By choosing a critical value, the experimenter can control the probability of making such an error. In typical tests of the null hypothesis, as I have discussed, this &ldquo;level of significance&rdquo; is set at .05. The statisticians Neyman and Pearson openly clashed with Fisher on this and other issues regarding testing hypotheses. Although the two approaches are logically incompatible, the NHT framework includes both Fisher&rsquo;s and Neyman-Pearson&rsquo;s procedures. Despite the continuing attacks on this hybrid of discrepant assumptions, the NHT method of testing hypotheses persists.</p>
<p>Neyman and Pearson argued that the testing of hypotheses could result in two types of errors. Type I errors occur when the investigator rejects the null hypothesis when it is, in fact, true. It is the probability of this error that the investigator controls by choosing a critical value for the test. Type II errors occur when the investigator fails to reject the null hypothesis when it is false. In most situations, the researcher has little direct control over the probability of a Type II error. The power of a test is the probability of avoiding a Type II error. To determine it, the researcher needs to know the expected value and the distribution under the alternative hypothesis.</p>
<p>Our test uses the matching procedure. The preceding table shows the probabilities that I calculated for the null hypothesis. To estimate the power of our test, we need to specify the expected value of the outcome if the alternative hypothesis is true. The alternative hypothesis, remember, is the one that we are comparing to the null hypothesis. We also have to specify the critical value.</p>
<p>We set the critical value at five. This means that Natasha would have had to make five or more correct matches for the outcome to be declared &ldquo;significant.&rdquo; In calculating power, I assumed that if Natasha&rsquo;s claim is true we could expect her to get at least five correct matches. (I explain my rationale for this in more detail below.) Under these conditions, the power is approximately .75. That is, if, in fact, the alternative hypothesis is true, the odds are better than 3:1 that our test will detect it. These odds are not as great as we would like, but they are adequate. They are much better than some critics claim.</p>
<p>Of course, we could have increased the power of our test even more by reducing the critical value to four. While such a reduction would have decreased the probability of a Type II error, it would have done so by increasing the probability of a Type I error. In any test of a hypothesis, the investigators must strike a balance between these two possible errors.</p>
<h2>Effect Size and Power </h2>
<p>Researchers increasingly emphasize &ldquo;effect size,&rdquo; especially because of the recent popularity of meta-analysis. The effect size refers to the difference between the value expected on the null hypothesis and that expected given the alternative hypothesis. In the NHT framework the focus is upon the probability of the outcome given that the null hypothesis is true. If the probability is high or moderate, then the investigator cannot justify rejecting the null hypothesis. If the probability is low, especially if it is equal to or less than the specified significance level, the investigator rejects the null hypothesis.</p>
<p>A criticism of NHT is that it fosters erroneous beliefs. One is that the lower the probability of the observed outcome, the more meaningful or important is the finding. Investigators can dismiss a large effect as unimportant because the probability of the outcome is large. The same investigators can hail a small and trivial outcome as important because the probability level is very low. The problem is that the probability of an outcome depends upon the sample size. A large effect can be non-significant if the sample size is small. Even a very small effect can produce a very small probability if the sample size is large.</p>
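<p>A hypothetical illustration, unrelated to our matching test, may make the dependence on sample size concrete. Suppose a subject guesses the outcome of fair coin tosses. The sketch below computes the exact chance probability of doing at least as well as observed, first for a large effect in a small sample (70 percent hits in 10 trials) and then for a small effect in a large sample (52 percent hits in 10,000 trials). The numbers and the function name are my own assumptions, chosen only for illustration.</p>

```python
from math import comb

def chance_tail(trials, hits):
    """Exact P(at least `hits` correct in `trials` fair 50/50 guesses)."""
    favourable = sum(comb(trials, j) for j in range(hits, trials + 1))
    return favourable / 2 ** trials

# Large effect, small sample: 7 hits in 10 trials (70 percent correct).
small_sample = chance_tail(10, 7)

# Small effect, large sample: 5,200 hits in 10,000 trials (52 percent correct).
large_sample = chance_tail(10_000, 5_200)

print(f"70% correct in 10 trials:     p = {small_sample:.4f}")
print(f"52% correct in 10,000 trials: p = {large_sample:.6f}")
```

<p>The first probability is 176/1024, or about .17, which is nowhere near the .05 level. The second falls well below .001, even though a hit rate of 52 percent is barely distinguishable from guessing.</p>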
<p>Researchers measure effect size in a variety of ways. However, all of them correct for sample size. Beyond removing the influence of sample size, the measures are standardized so that they are comparable across different studies. Focusing upon effect sizes can be helpful, but it is not a panacea. Precisely because the measures are standardized, they become detached from the specific context from which they arose. In many situations, it is this specific context that provides the basis for a realistic assessment of when an effect size is large or meaningful. Failure to consider the original context can result in such errors as treating effect sizes as equivalent just because they are the same &ldquo;size,&rdquo; or using an arbitrary, one-size-fits-all scale for deciding when an effect is small or large. At least some of our critics used such arbitrary, context-free measures to conclude that our test lacked adequate power.</p>
<p>Mindless application of &ldquo;effect size&rdquo; to our test can result in the false belief that four correct matches in our test corresponds to a large effect. The same reasoning leads to the equally false belief that five correct matches corresponds to a very large effect. If we carefully examine the context of our situation, I argue that five correct matches do not make a large effect. Further, I claim that four correct matches, given the context of our test, yield a weak and, if real, trivial effect. Let me explain.</p>
<p>To begin, I must emphasize that the matching test we gave Natasha was a highly simplified version of what she does during her typical diagnoses. Consider the following points:</p>
<ol>
<li>In her typical consultations with patients, Natasha allegedly has no prior knowledge about the client&rsquo;s problems. She has to scan the entire body, look at every organ, and even look for problems at the cellular level. In our task, we tell her exactly what the condition is that she should be looking for. On each trial, she not only knows what she is looking for, she also knows where to look for it.</li>
<li>In her typical consultations, she not only does not know where to look and what to look for, but she often has to diagnose conditions whose detection involves very subtle cues such as slight changes in texture or discoloration. In our task, we chose conditions whose detection was non-problematic. If her X-ray vision operates as she claims, she did not have to rely on subtle indications. We presented her with conditions that were clear-cut and unambiguous. We were not asking her to look for changes in cells, slight alterations in size or shape, or malfunctioning processes. Instead we presented her with conditions that should stand out boldly for a person with X-ray vision&mdash;a large hole in the skull; a sizeable portion of a lung missing; metal surgical staples in the chest; a hip replacement; etc.</li>
<li>In her typical consultations she has to make an <em>absolute judgment</em> about each condition she is evaluating. An absolute judgment is one in which she has to decide, say, if an organ deviates from its normal state without the benefit of a comparison example. Our task allowed her to make <em>comparative judgments</em>. On each trial, she had the benefit of six normal examples to contrast with the one deviant case for which she was looking. Perceptual psychologists have shown that comparative judgments are several orders of magnitude easier than absolute judgments.</li>
</ol>
<p>Her supporters vouch for the accuracy of her typical diagnoses (her mother even claimed that she never errs). Natasha informed the producer that our proposed test would be much less demanding than her typical reading. This is because she would not have to scan the entire body for each subject. So if her claim is correct, any one of the reasons listed above should ensure many correct matches on our test. Taken together, they should guarantee close to perfection.</p>
<p>These are the reasons why I felt justified in expecting almost perfect performance on our test if she really has the claimed X-ray vision. I softened this expectation to allow for less than perfection. I also lowered this expectation to guarantee adequate power for our test. Setting the expectation at five correct matches is the lowest value I could justify in our test. Four or fewer correct would simply be inconsistent with her claim.</p>
<p>The preceding observations assume an ideal situation. As everyone now realizes, our situation was not ideal. When we were planning the test, we knew it could not be ideal. We could not exclude the possibility of her picking up external clues with her normal vision. Realistically, the &ldquo;null&rdquo; distribution, as a result, would have an expected mean greater than one. Beyond getting one or more correct matches just by chance, we could expect her to get a few additional matches from external clues. These clues would be subtle if we had achieved our goal of making our group of subjects homogeneous in all external aspects. My previous report makes it clear, however, that the situation provided obvious clues that might have given her information about the subjects&rsquo; conditions. These possibilities provide additional reasons why getting four correct is not enough to show that Natasha has X-ray vision.</p>
<h2>A Bayesian Perspective </h2>
<p>Until now, I have discussed our test within the NHT framework. Within that framework, I found it easy to justify the criteria for our test. However, I chose the criterion for the test within a Bayesian framework. Within this framework, the probabilities for the matching procedure given in the preceding table are only part of the story. We have to consider explicitly the prior odds of both the null and the alternative hypotheses. The information provided by the outcome does not, in itself, provide us the probabilities that the null or the alternative hypothesis is correct. Instead, the information from the outcome is used to <em>revise</em> the prior odds.</p>
<p>A logical problem with the NHT was recognized by Fisher. For both the null and the alternative hypotheses we can calculate the probabilities for each possible outcome. For example, given the null hypothesis, the preceding table gives the probability of four correct matches as .0139. The probability of four correct matches given the alternative hypothesis is .1562. The problem is that the investigator is not interested in these probabilities. Rather, he or she wants to know the probability that the null or the alternative hypothesis is true given the outcome of the test. This is a subtle, but crucially important difference. The experiment or the test provides us with data (an outcome). We can compute the probability of this outcome given the hypotheses.</p>
<p>What we want, however, is the probability of the hypotheses given the outcome. This is the problem of &ldquo;inverse probabilities.&rdquo; Philosophers and statisticians engage in complicated and never-ending debates about whether such inverse probabilities can be justified. The difficulty is that we need to know the prior probabilities of null and alternative hypotheses before we can get the probabilities for these hypotheses after we have observed the outcome.</p>
<p>In our case, the Bayesian context requires us to specify two hypotheses to compare. In addition, we have to specify a prior probability that each is true. Consider the claim that Natasha has X-ray vision and can use this ability to diagnose medical conditions. What are the odds that this claim is true? The Bayesian approach is often criticized because the assignment of prior odds to hypotheses is subjective and arbitrary. This article is not the place to debate this matter. I only need to say that we have an empirical basis for assigning prior odds to Natasha&rsquo;s claim. The assignment does not need to be exact. A crude approximation will do.</p>
<p>Natasha&rsquo;s claim belongs to a large family of similar ones in which medical sensitives have declared that they could diagnose illness by &ldquo;seeing&rdquo; inside patients. Such claims go back as far as the early nineteenth century, when mesmerized individuals allegedly displayed such abilities. Since then, and continuing into our time, thousands of individuals have made these claims. Yet not one of these claims has withstood a scientific test. Natasha&rsquo;s claims and the anecdotes about her achievements place her in this class of medical sensitives. The probability that the claims of any individual in this class are true is obviously quite low. Indeed, given that not one of these claimants has produced scientific evidence in support of the claimed ability, it would be reasonable to assign odds of several thousand to one <em>against</em> the truth of the claim.</p>
<p>I took a more conservative approach. I decided to assume that the prior odds <em>in favor</em> of the null hypothesis were 99:1. This means that I was assuming prior odds of 99:1 <em>against</em> the alternative hypothesis. The null hypothesis in our test is that the average number of correct matches will be one. The alternative hypothesis is that the average number of correct matches will be five. These two hypotheses are <em>statistical hypotheses</em>. The statistical procedure uses the outcome of the test as a basis for deciding between these two statistical hypotheses. We should distinguish these statistical hypotheses from <em>conceptual</em> or <em>substance</em> hypotheses.<sup><a href="#notes">3</a></sup></p>
<p>In this context, the probabilities in the preceding table do not directly tell us how likely it is that the null hypothesis is true given a particular outcome. In the Bayesian framework, we also need to compare the probability of the outcome on the null hypothesis with the probability of the same outcome on the alternative hypothesis. For an outcome of four correct matches this comparison yields odds of approximately 11:1 <em>in favor</em> of the alternative hypothesis. These odds are called a likelihood ratio. They tell us that an outcome of four correct matches provides evidence in favor of the alternative.</p>
<p>The Bayesian approach combines the likelihood ratio (determined by the evidence provided in the test) with the prior odds for each hypothesis. This combination yields posterior odds for each hypothesis. The posterior odds represent how the prior odds were revised because of the evidence provided by the test. An outcome of four correct matches does revise the original odds so they move closer to the alternative. The question is, do they revise the original odds enough to reject the null hypothesis in favor of the alternative?</p>
<p>The answer in our case is &ldquo;no.&rdquo; The four correct matches lower the odds against the alternative hypothesis from 99:1 to 9:1. This is a big reduction, but not enough to shift the odds in favor of the alternative hypothesis. However, if the outcome had been five correct matches, the revised odds would have been close to 1:1. In other words, this latter outcome would have resulted in the conclusion that the odds are even that the alternative hypothesis is true. Although such an outcome still does not favor the alternative, I was willing to conclude that an outcome that reduced the original odds against the alternative from 99:1 to 1:1 was impressive enough to justify additional investigation of her claim.</p>
<p>Some critics of our test argued that we should have considered four correct outcomes as &ldquo;significant.&rdquo; Within the Bayesian framework, such an argument implicitly assumes that the prior odds in favor of Natasha&rsquo;s claim are 1:1. As I have explained, I think such an assumption is unreasonable. Before the test, I think all but her dedicated proponents would have placed the odds against her claim as much higher than 99:1. Even with my setting the prior odds at this modest level, the evidence provided by the outcome still fell far short of swinging the odds in her favor.</p>
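<p>The arithmetic behind these odds can be sketched in a few lines. This is only an illustration: the two probabilities for four correct matches and the 99:1 and 1:1 priors are taken from the text above, while the function and variable names are hypothetical.</p>

```python
def posterior_odds_against(prior_odds_against, p_outcome_null, p_outcome_alt):
    """Revise the odds against the alternative hypothesis after seeing an outcome."""
    # Likelihood ratio: how much more probable the outcome is under the alternative.
    likelihood_ratio = p_outcome_alt / p_outcome_null
    return prior_odds_against / likelihood_ratio

P_FOUR_NULL = 0.0139  # probability of four matches under the null (from the text)
P_FOUR_ALT = 0.1562   # probability of four matches under the alternative (from the text)

lr_four = P_FOUR_ALT / P_FOUR_NULL
print(f"likelihood ratio for four matches: {lr_four:.1f} to 1")

for prior in (99, 1):  # 99:1 is the prior used here; 1:1 is the critics' implicit prior
    post = posterior_odds_against(prior, P_FOUR_NULL, P_FOUR_ALT)
    print(f"prior {prior}:1 against, posterior {post:.2f}:1 against")
```

<p>Starting from 99:1, four matches leave the odds at roughly 8.8:1 against the alternative, which rounds to the 9:1 quoted above. Only an even prior turns the same outcome into odds of about 11:1 in favor. By the same arithmetic, an outcome would need a likelihood ratio of about 99 to bring prior odds of 99:1 down to even.</p>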
<p>As I previously mentioned, we designed our test to decide between two statistical hypotheses. The null hypothesis was that the outcome comes from a distribution with mean of one. This is the distribution we would expect if the number of correct matches is due to chance. The alternative hypothesis is that the outcome comes from a distribution whose mean is five. I have discussed the many different reasons why we concluded in favor of the null hypothesis. However, if we had decided in favor of the alternative hypothesis this would not be the same as confirming the hypothesis that Natasha has X-ray powers.</p>
<p>The statistical test enables us to decide (with a certain degree of confidence) between two statistical hypotheses. The alternative <em>statistical</em> hypothesis is that the observed outcome comes from a distribution whose mean is five. The major alternative <em>conceptual</em> hypothesis is that Natasha&rsquo;s correct matches are the result of her alleged X-ray vision. When we reject the null hypothesis, we are deciding that the statistical alternative is more likely to be true. This is not the same thing as confirming the conceptual alternative. This is because many other conceptual alternatives might be consistent with the statistical alternative.</p>
<p>Some possible conceptual alternatives in our situation are: 1) Natasha&rsquo;s correct matches are due to X-ray vision; 2) Natasha&rsquo;s correct matches are due to external clues; 3) Natasha&rsquo;s correct matches are due to a combination of external clues and X-ray vision. Because Natasha sees the subjects with her normal vision when she is allegedly using her X-ray powers, we cannot rule out the alternative conceptual hypothesis of reliance on external, normal clues. We designed our test to be the first step in a potentially sequential procedure. The first step would enable us to decide between two statistical hypotheses. If the outcome did not allow us to reject the null hypothesis, then it would provide no support for the alternative statistical hypothesis. Such an outcome would also provide no support for any of the conceptual hypotheses. Given such an outcome, we would have no reason to continue the investigation.</p>
<p>On the other hand, if the outcome allowed us to reject the null hypothesis, then it would provide support for the alternative statistical hypothesis. However, this would not be the same thing as supporting the conceptual hypothesis of paranormal X-ray powers. The alternative statistical hypothesis could be consistent with an array of possible conceptual alternatives. The most likely one would be that Natasha was using external clues to get her correct matches. Obviously, we would have to do further testing with extensive resources and clever procedures to eliminate many other conceptual possibilities before we could say that her matches were due to X-ray vision.</p>
<h2>What is the Difference Between Four and Five Correct Matches? </h2>
<p>Some critics, including Natasha herself, claim that her score of four correct matches is close enough to the criterion of five. They say we should give her credit for getting so close. The difference between four and five correct matches, however, is not trivial. This is because we are dealing with a discrete distribution with only seven possible values. For the situation as I set it up, an outcome of four correct guesses yields posterior odds of 9:1 in favor of the null hypothesis. An outcome of five correct guesses, on the other hand, yields posterior odds closer to 1:1.</p>
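<p>The gap between four and five also shows up in the chance distribution alone. The sketch below assumes the ideal case of pure chance matching and uses the standard derangement count; the helper names <code>p_exactly</code> and <code>p_at_least</code> are illustrative.</p>

```python
from math import comb, factorial

def derangements(m):
    """Number of permutations of m items with no item left in place."""
    return sum((-1) ** j * factorial(m) // factorial(j) for j in range(m + 1))

def p_exactly(k, n=7):
    """Chance probability of exactly k correct matches out of n."""
    return comb(n, k) * derangements(n - k) / factorial(n)

def p_at_least(k, n=7):
    """Chance probability of k or more correct matches out of n."""
    return sum(p_exactly(j, n) for j in range(k, n + 1))

possible = [k for k in range(8) if p_exactly(k) != 0]
print("possible scores:", possible)                           # [0, 1, 2, 3, 4, 5, 7]
print(f"P(4 or more by chance) = {p_at_least(4):.4f}")        # 0.0183
print(f"P(5 or more by chance) = {p_at_least(5):.4f}")        # 0.0044
```

<p>Exactly six correct is impossible, because a sixth correct match forces the seventh, which is why only seven values can occur. Moving the criterion from five to four would make a chance pass roughly four times as likely.</p>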
<h2>Conclusions </h2>
<p>This commentary has been somewhat technical and repetitious. I wanted to explain as fully as possible the reasons behind the planning and interpretation of our test. We devoted much thought to the planning of the test. My colleagues and I each agreed that the appropriate critical value was five correct matches. We each reached this conclusion for different reasons. The fact that we converged upon the same critical value suggests that this was a reasonable choice.</p>
<p>The limitations of our test were those of execution rather than design. All these limitations favored Natasha. The requirements of the test were much less demanding than what occurs in her typical diagnoses. These and other factors probably worked to increase the number of correct guesses. Despite these flaws, Natasha still could not achieve the number of correct matches required to pass the test. This number was one to which all parties had agreed in advance. All parties to the agreement were committed to the two possible conclusions. If she got five or more, we would have advocated further and more conclusive testing. If she got fewer than five, as was the case, we would drop any additional interest in her claim.</p>
<p>The outcome was insufficient to pass our criterion. Moreover, the specific correct matches and misses provided further evidence that weakens her claim. As I explained in my previous article, the pattern of her matches was inconsistent with the operation of X-ray vision. However, this pattern was fully consistent with the possibility that her matches relied upon external clues.</p>
<h2><a name="notes"></a>Notes</h2>
<ol>
<li>Diaconis and Holmes (2002) deal with the matching problem from a Bayesian perspective. This is interesting because I chose the critical value for our test based on elementary Bayesian reasoning. Their paper provides a basis for designing a test that could accommodate different probabilities for correctly matching each subject.</li>
<li>Almost from the beginning, critics of NHT have bemoaned the fact that the rejection of the null hypothesis is labeled a &ldquo;significant&rdquo; outcome. &ldquo;Significance&rdquo; implies importance. A statistically &ldquo;significant&rdquo; result simply means that the outcome was sufficiently different from the one expected on the null hypothesis that it equaled or surpassed a critical value. With small samples, the outcome has to be very different from the expected value to achieve &ldquo;significance.&rdquo; However, with sufficiently large samples, even a trivially small difference can achieve &ldquo;significance.&rdquo;</li>
<li>For a discussion of this distinction see Denis (2001).</li>
</ol>
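<p>The point in note 2 about sample size can be illustrated with a small sketch. The data are hypothetical (the same 51 percent hit rate against a chance rate of 50 percent, at two sample sizes), and the p-value uses a normal approximation rather than any procedure from our test.</p>

```python
from math import erfc, sqrt

def one_sided_p(successes, n, p0=0.5):
    """Normal-approximation p-value for a hit rate at least this far above p0."""
    z = (successes / n - p0) / sqrt(p0 * (1 - p0) / n)
    return 0.5 * erfc(z / sqrt(2))

# Hypothetical data: an identical 51% hit rate observed in a small and a large sample.
small = one_sided_p(51, 100)
large = one_sided_p(51_000, 100_000)
print(f"n = 100:     p = {small:.3f}")
print(f"n = 100,000: p = {large:.1e}")
```

<p>The one-point difference is nowhere near &ldquo;significant&rdquo; with 100 trials, yet the same difference falls far below the conventional .05 level with 100,000 trials.</p>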
<h2>References</h2>
<ul>
<li>Denis, D.J. (2001). Inferring the alternative hypothesis: risky business. <cite>Theory &amp; Science, 2</cite>. <a href="http://htpprints.yorku.ca/archive/00000234/" target="_blank">http://htpprints.yorku.ca/archive/00000234/</a></li>
<li>Diaconis, P., &amp; Holmes, S. (2002). A Bayesian peek into Feller Volume I. <cite>Indian Journal of Statistics</cite>, 64, 820-841.</li>
<li>Feller, W. (1950). <cite>An introduction to probability theory and its applications.</cite> (Vol. 1). New York: Wiley.</li>
<li>Hyman, R. (2005, May/June). Testing Natasha. <cite>Skeptical Inquirer, 29</cite> (No. 3), 27-33.</li>
<li>Mosteller, F. (1965). <cite>Fifty challenging problems in probability with solutions.</cite> New York: Dover.</li>
<li>Pigliucci, M. (2002, November/December). Hypothesis testing and the nature of skeptical investigations. <cite>Skeptical Inquirer, 26</cite> (No. 6), 27-30, 48.</li>
<li>Skolnick, A.A. (2005, May/June). Natasha Demkina: the girl with the normal eyes. <cite>Skeptical Inquirer, 29</cite> (No. 3), 34-37.</li>
<li>Stenger, V., Wasserman, L., Edis, T., Gat, Y., &amp; Pigliucci, M. (2003, March/April). Letters to the editor: testing hypotheses. <cite>Skeptical Inquirer, 27</cite> (No. 2), 68-69.</li>
</ul>




      
      ]]></description>
    </item>

    <item>
      <title>Testing Natasha</title>
      <pubDate>Sun, 01 May 2005 13:22:00 EDT</pubDate>
	<author>info@csicop.org (<![CDATA[Ray Hyman]]>)</author>
      <link>http://www.csicop.org/si/show/testing_natasha</link>
      <guid>http://www.csicop.org/si/show/testing_natasha</guid>
      <description><![CDATA[
        



			<p class="intro">Can a seventeen-year-old girl truly &ldquo;see&rdquo; inside a person&rsquo;s body? Ray Hyman and colleagues conducted tests to search for the truth inside <cite>The Girl with X-Ray Eyes</cite>.</p>
<p>Our assignment might seem straightforward. A seventeen-year-old Russian girl, Natasha Demkina, says she can look at people and &ldquo;see&rdquo; the status of their internal organs. The Discovery Channel asked Richard Wiseman, Andrew Skolnick, and me to test her claim for their television program, <cite>The Girl with X-ray Eyes</cite>. You might think that testing Natasha&rsquo;s claims would be routine. The test of a psychic claim, however, is rarely cut-and-dried. Most such claims do have much in common. Each also offers unique challenges. We had to conduct the test of Natasha&rsquo;s claim to fit the constraints of a television program. We had only a month to devise a protocol that would be acceptable to all parties. After everyone agreed to the procedure, we had less than a week to locate a testing site in New York City and to find seven willing and suitable test subjects. [<a href="#notes">1</a>]</p>
<h2>The Claim and Its Support</h2>
<p>Monica Garnsey, director and producer of the program, told us how Natasha operates and what she claimed to do. Many news sources and reports on the Internet described her accomplishments. (This information was consistent with what we observed when Natasha diagnosed volunteers at the Open Center in New York City the day before the test.) Garnsey e-mailed us the following information from Russia, where she was taping material for the television program:</p>
<blockquote>
<p>I double-checked a few things with her last night. Since the age of ten, a few days after having a religious dream, and also having had an operation to have her appendix removed that went wrong, swabs were left in her and she had to have another operation, Natasha has claimed to be able to see into people. . . . Natasha can see through clothing, but not see what someone is holding behind their back. She cannot see inside people if she shuts her eyes. Daylight is better. She does not need to talk to them to diagnose. She can also diagnose from a photograph. She usually scans people all over first, by making them stand up fully clothed and looking them up and down; delivers a general diagnosis; and then goes into more detail when the patients have discussed their concerns with her. She says she can <em>certainly</em> see ribs, heart, lungs, initially in general &ldquo;like in an anatomy book,&rdquo; but can see right down to the cell level if she concentrates. She says that she can examine the whole body, but it can give her a bad headache if she does too much. The idea of restricting the test to the chest area appeals [to her], though her claims extend further than that.</p>
</blockquote>
<p>Natasha&rsquo;s story is like thousands of other accounts. Alleged psychics and their supporters make claims that, if true, defy the physical limitations and laws of modern science. The proponents support the reality of these claims with testimonials of outstanding successes. They argue for the reality of the claim passionately and unreservedly. Although some proponents have had scientific training, none of the supporting evidence comes from well-controlled scientific studies.</p>
<p>In the long history of psychical research, not one of these claims has produced convincing scientific evidence for the existence of paranormal ability (see Joe Nickell&rsquo;s column in this issue, p. 18). A few researchers have claimed that they did have scientific proof for a paranormal claim. Scrutiny by other scientists, however, showed that the &ldquo;scientific proof&rdquo; had serious flaws. Furthermore, none of these claims could be independently replicated. 

</p><p>Natasha Demkina poses for photographs after being tested by CSICOP and CSMMH for the Discovery Channel program <cite>The Girl with X-ray Eyes</cite>. Her friend, Svetlana Skarbo, who acted as her translator, holds a cell phone over which they had sent and received text messages from unknown parties during the test (in violation of test protocols). On the left is Barrie Cassileth, Ph.D., Chief of Integrative Medicine Services at Memorial Sloan-Kettering Cancer Center, who helped to recruit subjects for this preliminary study. Photo credit: Andrew A. Skolnick.</p>
<p>The evidence supporting Natasha&rsquo;s abilities comes from selected anecdotes of reactions to her readings. No matter how subjectively compelling, the context of such readings makes it impossible to separate how much of the apparent success is due to such possibilities as: guessing; external clues from the client&rsquo;s physical appearance and observable behavior; feedback from the client&rsquo;s spoken and bodily reactions; or actual paranormal powers. A meaningful test would allow Natasha to show her powers and, simultaneously, control for guessing and the use of normal sensory clues.</p>
<h2>Problems With Testimonial Support of Natasha&rsquo;s Claims</h2>
<div class="image left">
<img src="/uploads/images/si/nat-01.jpg" alt="/uploads/images/si/nat-01.jpg" />
</div>
<p>The stories told by Natasha&rsquo;s proponents are consistent with her having X-ray vision. This does not show that she does have X-ray vision because the same stories are consistent with many other alternatives. Two possibilities are the following: 1) her statements have no connection with the client&rsquo;s condition but appear to do so because of luck, selective reporting, and/or other reasons that I will discuss; or 2) her statements accurately reflect the subject&rsquo;s condition, but this information comes through normal means such as the subject&rsquo;s appearance and behavior. Consider, first, the ways that her statements can falsely appear to describe the patient&rsquo;s condition. 

</p><p>For Discovery Channel publicity photos following CSICOP-CSMMH&rsquo;s test, Natasha Demkina examines the seven volunteer test subjects. The subjects are wearing opaque glasses to prevent communication through eye movement. The head test proctor, Ray Hyman, is sitting in the bottom right corner. Photo credit: Andrew A. Skolnick.</p>
<h2>She Might Have No Knowledge About the Client&rsquo;s Condition But Get Credit Anyway</h2>
<p>Natasha has been giving readings to a steady flow of clients for more than six years. By now the number of such readings is huge. Her supporters naturally emphasize the most striking examples of apparent hits. The number of diseases and internal parts that could be defective is limited. Some conditions, such as cancer and heart problems, are more common than others. We should expect that her supporters will find some examples of &ldquo;correct&rdquo; diagnoses. With so many diagnoses, a certain number will match the client&rsquo;s condition just by chance.</p>
<div class="image right">
<img src="/uploads/images/si/nat-03.jpg" alt="/uploads/images/si/nat-03.jpg" />
</div>
<p>To evaluate a diagnostic procedure properly we need to clearly decide what is a &ldquo;hit&rdquo; and what is a &ldquo;miss.&rdquo; Most important, we should set the criteria before we know the outcome. In Natasha&rsquo;s readings, no clear and objective standards were ever established. This allows for her generally vague utterances to be retrofitted to what the client or observer knows to be true. An example of such retrofitting occurred when Natasha was doing a reading in London. Dr. Chris Steele, described by <cite>The Daily Mail</cite> (January 29, 2004) as one of her champions, was observing. The newspaper quotes him as saying, &ldquo;Natasha doesn't know any medical terms at the moment. With one person this week she was trying to describe a kidney stone, and her translator came up with the words, 'sand' and 'gravel' before I suggested stones. When kidney stones start off, they do look like sand.&rdquo; Dr. Steele gives her credit for correctly diagnosing kidney stones. Yet we have no idea what Natasha was &ldquo;seeing&rdquo; or what she had in mind. Dr. Steele made the medical diagnosis, not Natasha.</p>
<p>Other features of Natasha&rsquo;s readings foster the illusion of accuracy. When she tells clients something that agrees with previous medical diagnoses, they credit her with a hit.</p>
<p>Similarly, when she tells the clients something that <em>disagrees</em> with previous medical diagnoses, they still credit her with a hit; the clients and her supporters argue that she picked up on something that the medical professionals missed. We witnessed some examples of this when we watched her giving readings to volunteers at the Open Center in New York. She told one volunteer that she saw a problem with his right shoulder. After the reading, this volunteer told Monica that he had not previously realized something was wrong with his shoulder. Neither his previous medical examinations nor anything in his experience suggested something was wrong with his shoulder. I thought, as a result, he might be skeptical about Natasha&rsquo;s claim. Instead, he was impressed. He decided she had detected a problem that neither he nor his doctors had noticed.</p>
<p>Natasha Demkina stands between her friend Svetlana Skarbo (who served as her translator instead of the one hired by Discovery Channel) and Richard Wiseman, who helped to design and conduct the CSICOP-CSMMH test. Photo credit: Andrew A. Skolnick.</p>
<h2>Possibilities of Natasha Picking Up Clues by Non-paranormal Means</h2>
<p>I have described just some ways that testimonials can appear to support Natasha&rsquo;s claim even if she is picking up no information about her clients. Those possibilities would suffice to make such testimonials useless as evidence for her ability. The testimonials become even more suspect when we realize how the circumstances of her readings allow her to pick up information about her client without having X-ray vision. Natasha is looking directly at her client when she does her diagnosis. This means that we cannot rule out the possibility that she is picking up clues from subtle (and not-so-subtle) client reactions. To make matters worse, the clients begin a session by asking Natasha questions about their concerns. This provides obvious clues about their condition. I watched one reading where the client began asking Natasha about her back. This narrows considerably the number of possibilities that Natasha needs to consider. Natasha can also gain considerable information from verbal exchanges with the client.</p>
<p>Another source of clues is how the clients react, both verbally and nonverbally, to her statements. Some of her clients say that they find it unsettling when Natasha is staring at them. This could enhance the tendency for individuals to react to her statements with subtle, unwitting bodily movements, breathing changes, pupil dilations, and other signs of emotional and cognitive states. Although psychological research has documented how humans frequently provide unconscious clues to their current thoughts and emotions, most people seem unaware of this possibility. The research also shows that subtle clues can influence us without our consciously realizing it.</p>
<p>One classic case involved the German horse Clever Hans. [<a href="#notes">2</a>] In the early twentieth century, Hans became a celebrity in Germany and throughout the world. People could ask him questions about addition, the identity of musical pieces, about foreign words, spelling, and many other topics. Hans would answer by tapping his hoof or by nudging an alphabet board with his nose. He usually was correct. Prominent educators certified that he had the intelligence and competence of a thirteen- or fourteen-year-old German student. Oskar Pfungst, a German psychologist, investigated Hans with exemplary thoroughness. He eventually discovered that Hans was clever only in having &ldquo;horse sense.&rdquo; Typically, a questioner would focus on the horse&rsquo;s right hoof, which Hans used to tap out the answer. When questioners focused on the hoof, they would almost imperceptibly lean forward and become tense as they watched the horse tap out the answer. This slight leaning and tensing were Hans&rsquo;s cues to begin tapping. When Hans had tapped the appropriate number of times, the questioner would unconsciously relax and move his or her head upwards very slightly. Often this movement was one millimeter or less. This was Hans&rsquo;s cue to stop tapping.</p>
<p>Pfungst then carried out experiments to confirm this finding. He played the role of Hans. He would invite people to stand beside him and think of a number. Pfungst would then begin tapping with his right hand. He would stop when he thought he detected a very slight bodily movement, usually a very slight displacement of the subject&rsquo;s head. These movements were extremely subtle, rarely more than a millimeter in extent. Pfungst amazed his volunteers, stopping his tapping at the number they had in mind.</p>
<p>Pfungst tried this experiment with twenty-five persons ranging in age from five years to adult. He succeeded in picking up cues from all but two of them. They insisted they were unaware of giving him any information. Pfungst used the same method to divine other kinds of thoughts the subjects had in mind. The subjects again denied that they had provided any clues about what they were thinking. Other psychological experiments have confirmed these results. Some skilled performers have made careers out of pretending to read minds when, in fact, they were relying upon subtle and unwitting clues provided by their volunteers.</p>
<p>Some reports supporting Natasha&rsquo;s claim describe outcomes consistent with the possibility that she is picking up such clues. For example, a Russian reporter says that he became a convert to Natasha&rsquo;s cause when she found the exact spot on his arm where he had fractured his wrist many years before. In another case, a reporter from a British tabloid validated Natasha&rsquo;s ability when Natasha succeeded in identifying the location of the fractures she had received in an accident. Both cases seem ideal for picking up the sorts of clues that Pfungst found that most people provide without realizing they are doing so.</p>
<p>What I have just written does not show that Natasha lacks X-ray vision. We do not know from the evidence offered by her proponents whether she does or does not have a paranormal capacity to see into people&rsquo;s bodies. What we do know is that the accounts that seem to support Natasha&rsquo;s claim are consistent with both normal and paranormal possibilities. We also know that nonparanormal mechanisms can and do operate in the real world. We <em>do not know</em> that paranormal ability, such as that claimed for Natasha, exists. So far, no one has displayed such ability with scientific credibility. Given these two possible explanations for Natasha&rsquo;s apparent successes, rationality tells us to bet on the nonparanormal one. We should demand convincing evidence that is scientifically acceptable before we give credence to the paranormal claim.</p>
<h2>The Test Protocol</h2>
<p>With input from Richard and me, Andrew wrote the test protocol, titled &ldquo;Test Design and Procedures for Preliminary Study of Natasha Demkina.&rdquo; The goal was to make every aspect of the test explicit. The protocol stated how we would conduct the test and how we would interpret the results. We wanted all parties to be clear about what would and would not be considered a &ldquo;successful&rdquo; outcome. What makes a scientific experiment or a test meaningful is just such an explicit commitment to the interpretation of the outcome <em>before we observe the data</em>. This is a critical distinction between the post hoc interpretation of testimonial evidence and the prior commitment to specified outcomes of a meaningful test. Natasha&rsquo;s defenders apparently fail to grasp this essential point.</p>
<p>The written protocol protects the interests of all parties. Natasha and her supporters had the opportunity to study the document, to suggest modifications, and finally to agree or disagree with its provisions. The protocol also protects the investigators against a variety of false accusations about how we conducted the test.</p>
<p>We made sure to include in the protocol the statement that the &ldquo;test is not in any way a definitive test. Deciding the truth of Natasha&rsquo;s claims with comfortable certainty is too simple and brief. It can only help to decide whether further studies of Natasha&rsquo;s claimed abilities are warranted.&rdquo; This statement is worth elaborating. Understanding what the test can and cannot do is essential. Even under ideal circumstances this test could not clearly decide if Natasha does or does not have X-ray vision. Any scientific hypothesis, especially a paranormal one, cannot be confirmed or disconfirmed by one test or one experiment. Scientific investigation requires a series of experiments. Each new experiment builds on the results of previous ones. The more we learn from the early experiments, the better we can understand what we need to control and what we can safely ignore. If the hypothesis is implausible and/or controversial, as Natasha&rsquo;s claim certainly is, then the original investigators must replicate their findings. In addition, independent investigators must also replicate the findings before they gain scientific credibility.</p>
<p>We knew that our test could not distinguish between two possibilities: (1) she can make correct matches using external clues; or (2) she can make correct matches using paranormal X-ray vision. The alternatives we could control or reduce were that she gets correct matches just by luck or that her correct matches are due to those factors that make vague statements seem like hits.</p>
<p>We were also aware that our test could only detect a large effect. Natasha&rsquo;s claim can be considered in several contexts. The testimonials imply that she is highly accurate. This has practical consequences. If clients are depending upon her for medical diagnoses, Natasha&rsquo;s readings should be reliable. Otherwise, she can do much harm. Of course, Natasha could possess paranormal powers, but they could be weak and erratic. Such unreliable and weak ability would be useless for medical diagnosis, but would still be of theoretical interest. We lacked the resources and time to try to detect such a weak effect. We used all our resources to obtain seven subjects. If we had been trying to test for a moderate or weak effect, we would have had to use many more subjects. Given the constraints of our task, this was impossible. Our test, then, was aimed at detecting a large effect. We reasoned that if she possessed the reliability of diagnosis that her proponents claimed, our test would reveal this. Such an effect would encourage us to investigate her abilities in more detail.</p>
<p>The outcome of the test could be from zero to seven correct matches. We set the criterion for success at five correct matches. We clearly stated this criterion in the test protocol and all parties agreed to this in advance. Although Natasha&rsquo;s mother says that her daughter never makes a mistake, we did not want to demand that Natasha perform perfectly. We wanted to give her some margin for error. Keep in mind that if she got five or more correct this would be consistent with her having the X-ray power that she claims. Yet it would also be consistent with the possibility that she was matching the target condition by normal means such as the appearance and behavior of the subjects.</p>
<h2>The Test</h2>
<p>Richard Wiseman, Andrew Skolnick, and I collaborated in designing the test. We arrived at a mutually satisfactory plan after exchanging several e-mails. The task of finding appropriate subjects, and coordinating the many details was left to Andrew. He had less than one week to accomplish all this. He had to do this from Amherst, more than 350 miles from New York City.</p>
<p>Austin Dacey, executive director of the Center for Inquiry-<cite>MetroNY</cite>, obtained an excellent set of rooms for the test at the City College of New York and helped recruit several subjects. Dr. Barrie Cassileth, Chief of Integrative Medicine Service at Memorial Sloan-Kettering Cancer Center, helped us with the daunting task of assembling seven appropriate and willing subjects. On the morning of the day of the test we learned that two of the subjects had withdrawn. Again, Andrew and Austin saved the day by finding two replacements at the last moment. (Andrew&rsquo;s separate article about certain aspects of the tests follows mine.)</p>
<p>During the test, we seated the seven subjects in a semicircle facing the chair where Natasha sat. Each volunteer had an internal condition that should be easy to detect if Natasha&rsquo;s claim is correct. The target conditions were as follows: One patient had metal surgical staples in his chest from open heart surgery; one had a section of her esophagus surgically removed; one had a large section of one lung removed; one had an artificial hip replacement; one had a missing appendix (we discovered afterwards that another subject also had a missing appendix, which he didn't mention when we recruited him. Natasha chose neither of these two as the one with the missing appendix); one had a large brain tumor removed and now has a large hole in his skull covered by a metal plate; and the final subject had none of these target conditions.</p>
<p>During the test, when Natasha was looking at the subjects, the subjects wore sunglasses whose lenses were covered with opaque tape. This prevented the subjects from knowing when Natasha was looking at them. This also prevented Natasha from picking up clues from their eye movements or pupillary dilations (which are a sign of emotional reaction). Before the test, I instructed and rehearsed the subjects on how to behave. They were to sit as still as possible when Natasha was in the room. If Natasha needed to observe them in a standing position, I would tell Natasha to turn her back while they stood up and when they sat again. We used similar precautions if Natasha needed to look at them in profile. These precautions reduced the possibility of reactions by the subjects from knowing which target condition Natasha was currently studying. We also wanted to reduce external movements (for example, the subject with a hip replacement might give herself away from her efforts to stand or to change the position of her body). [<a href="#notes">3</a>]</p>
<p>The test room was large and had chairs for our seven subjects, for Natasha and two interpreters. One interpreter was Natasha&rsquo;s friend Sveta Skarbo. We allowed her in the test room to make Natasha feel comfortable. The other interpreter was supplied by the Discovery Channel. Ideally, only I, as the head proctor, Richard Wiseman as my co-investigator, Natasha and the two interpreters, and the seven subjects should have been present during the test. The realities of television production and the requests of Natasha&rsquo;s companions forced us to compromise here, and in some matters of protocol. The test room also included a television crew of three persons from the production company (Shine, Ltd.); Austin Dacey, who was videotaping the proceedings for CSICOP; Joe Nickell as an observer; a still photographer from the Discovery Channel; and Will Stewart, a British journalist living in Russia who was acting as a representative for Natasha. Except for the subjects (and Austin Dacey), everyone in the test room, including myself, was blind to the condition of each subject.</p>
<p>A small room, attached to the rear of the test room, was used for briefing Natasha. We could retreat to this room when we wanted to discuss matters out of sight and hearing of the subjects. Because Andrew was in charge of recruiting the subjects and was not completely blind to their conditions, he stayed out of the testing room. He remained in the briefing room during the entire test (which lasted more than four hours). We used this room to brief Natasha before each of the six required matches (once she had made six matches, the seventh was determined by default). Before each trial Andrew gave her a clear description, along with images and diagrams, of the target condition that she was to match to a subject. We also discussed any of Natasha&rsquo;s questions or concerns in this room.</p>
<p>Andrew and I met with Natasha in this room before the test to review the procedure and to remind her about the details of the protocol. She had agreed to this protocol, which Monica had shown her five days previously. We reviewed each condition that we would ask her to detect. She expressed concerns about the removed appendix and the resected esophagus. She was worried that if the appendix had been removed long enough ago it might have grown back. Andrew assured her that appendices do not grow back. Her concern about the resected esophagus was that individuals might normally differ in the length of their esophagus and this could mislead her. Andrew told her that instead of the length she should look for the scar that completely encircled the place where the two ends of the resected esophagus had been surgically joined.</p>
<p>The test consisted of six trials. On each trial Andrew gave Natasha a test card that clearly described, in Russian and English, the condition she was to match to a subject. The card contained an illustration of the target organ or condition. Andrew also showed her relevant illustrations from an anatomy text. When she was satisfied, I accompanied Natasha to the test room, where she sat between the two interpreters and equidistant from each subject. After Natasha had studied the subjects for the given condition, she chose the subject she believed had the specified condition. She would circle the subject&rsquo;s number on the test card and both of us would sign the card. We then returned to the back room to prepare for the next condition and trial.</p>
<p>We wanted to make the test as comfortable and nonstressful for Natasha as possible. I made sure not to rush or pressure her in any way. I gave her all the time she wanted to make each match. She took one hour to make the first match, which was to find the subject who had a large section of the top of her left lung surgically removed. She required more than four hours to complete the matches of conditions to the seven subjects. Throughout this process I repeatedly asked her if she was comfortable and if we could do anything to make the process more agreeable to her. She could ask for a break in the proceedings whenever she wished. Her mother had decided to remain outside both the test and briefing rooms because she wanted to be with Natasha&rsquo;s younger sister. Midway through the proceedings, Natasha told us she would feel better if her mother could be in the briefing room. I immediately agreed to her request. [<a href="#notes">4</a>]</p>
<h2>The Outcome</h2>
<p>Natasha succeeded in correctly matching four target conditions out of a possible seven. Our protocol required that Natasha get five or more correct matches to &ldquo;pass&rdquo; our test.</p>
<p>Understandably, Natasha&rsquo;s supporters were disappointed. They expressed their misgivings about the test on the television documentary, in media interviews, on Web sites, and through e-mails. They accused the testers of bias and of deliberately manipulating the procedure to prevent Natasha from succeeding. Natasha has complained that if she had gotten five correct she would have been a success. Isn't four close enough?</p>
<p>Our answer is that five was the minimum score that everyone agreed upon. It was also the minimum score that would convince us of a possible ability to diagnose subjects with sufficient reliability to be useful. We designed our test to detect a large effect. We were looking for something that would distinguish Natasha&rsquo;s claims from many similar ones. We wanted a good reason to justify using the additional time and resources to investigate her ability further.</p>
<p>Although Natasha&rsquo;s score did not meet our criterion for &ldquo;success,&rdquo; it is possible that she can pick up information about the subject&rsquo;s condition. Some of her choices might show some accuracy on her part, although of a low level. If this is true, her correct matches could be the result of three possibilities:</p>
<ol>
<li>She gathered some information paranormally. That is, she can see into people&rsquo;s bodies, but imperfectly.</li>
<li>She gathered information by deliberately exploiting available clues such as outward appearances and behavior of the subjects.</li>
<li>She obtained information unconsciously from available clues. To me, this is the most likely explanation, other than chance or in addition to chance. Much recent work in psychology demonstrates implicit learning: how people unconsciously learn to exploit a variety of clues, often subtle ones.</li>
</ol>
<p>Both inherent and unforeseen limitations of our test provided possible clues to the target conditions for some subjects. I already discussed the daunting task of finding seven appropriate subjects. We had to settle for a less than optimal set of subjects. These subjects differed sufficiently in outward appearance to provide possible clues about their conditions. Another problem occurred through two violations of the test protocol. Together these problems created the possibility of identifying the target conditions (by external, normal means) for the following four subjects:</p>
<ol>
<li>The &ldquo;control&rdquo; subject, the one who had no internal medical condition, was obviously the youngest of the group. He also looked in good physical condition and appeared much healthier. He was a good candidate for the person with no defects.</li>
<li>The subject with the staples in his chest (because of major heart surgery) was male, the oldest of the group and looked the least healthy. He was an obvious choice for the person with the staples in his chest.</li>
<li>A breach of protocol occurred on the first trial. Natasha posed a question and her interpreter translated it aloud in front of the subjects. The question, contrary to our protocol, allowed the subjects to know that Natasha was looking for the subject with part of her lung removed. Here it was possible that, knowing which condition Natasha was looking for, the subject with the missing lung might have given herself away through bodily reaction.</li>
<li>After the test was over, I learned that Natasha and her companions, because of an apparent misunderstanding, had arrived at the test site before we had expected them. They waited outside the test building where they reportedly observed at least two of the test subjects climb the long flight of stairs and enter the test building. This breach of protocol may have provided them clues about which subjects did or did not have the artificial hip.</li>
</ol>
<p>We do not know if Natasha took advantage of the clues I've described in the previous four paragraphs. However, it is suggestive that these were just the four subjects for whom Natasha achieved her correct matches. The probability that she was relying upon nonparanormal clues increases when we consider her misses. She wrongly picked the subject who was wearing a baseball cap as the one who had the metal plate in his head. Conceivably, she picked this subject because one might assume (falsely in this case) that the subject was trying to cover a scar on his head. We should also emphasize that her failure to correctly match the subject with the metal plate in his head further argues against any fledgling paranormal powers. If she truly can see into bodies, she should have easily detected the large area of missing skull along with the metal plate covering the hole.</p>
<p>Our test included five subjects for whom external clues were available concerning their internal condition. The clues correctly pointed to the true target condition for four subjects. The external clue for the fifth subject falsely pointed to the hole in the skull. In each of these five cases Natasha made her choice consistent with how the external clue was pointing.</p>
<p>Because a single test, even one done under ideal conditions, cannot settle a paranormal claim, we conceived our test as the first stage of a potential series. The first stage would not necessarily rule out nonparanormal alternatives. If Natasha could pass the first stage, this would justify continuing onto the next stage. If she passed that stage, then we would continue studying her claim. On the other hand, if she failed at any of the early stages, this would end our interest in her claim.</p>
<p>Keep in mind that the burden of proof belongs to the parties making an extraordinary claim. Extraordinary claims require extraordinary proof. Our test had its limitations. None of these limitations, however, worked against Natasha&rsquo;s claim. If anything, they may have artificially enhanced her score. Our task was not to prove that Natasha does not have X-ray vision. Rather, Natasha and her supporters had the responsibility to show us that she could perform well enough to deserve further scientific investigation. This they failed to do.</p>
<h2>Acknowledgments</h2>

<p>I thank Richard Wiseman (University of Hertfordshire) and Andrew Skolnick (Commission for Scientific Medicine and Mental Health) for their many constructive criticisms of the earlier drafts of this paper. Richard convinced me to eliminate over half the material I had intended to include. This was a great improvement.</p>
<h2><a name="notes">Notes</a></h2>
<ol>
<li>We debated about how to refer to the seven volunteers who had conditions which Natasha had to detect. Each of the candidate terms such as <em>volunteer, participant, patient,</em> or <em>client</em> seemed ambiguous or not quite correct. Although not completely satisfactory, we decided to refer to these individuals as <em>subjects</em>.</li>
<li>Pfungst, O. 1911. <cite>Clever Hans</cite>. New York: Henry Holt &amp; Co. Also see Vogt, E.Z., and Hyman, R. 2000. <cite>Water Witching U.S.A</cite>. Chicago: University of Chicago Press.</li>
<li>Here is another compromise we had to make in the test. Ideally, everyone in the test situation should be blind as to the true target condition for each subject. In our case, the subjects were not blind to their own conditions. Because the subjects had to be in the test room and Natasha had to study them visually, the test lacked this blindness. The use of the opaque sunglasses hopefully kept the subjects blind as to which target condition Natasha was looking for on a given trial, but this is not completely satisfactory.</li>
<li>At the start of the test some initial confusion existed as to who would be allowed into the test and briefing rooms. This was quickly corrected and Natasha&rsquo;s mother and Will Stewart were given the option of staying in one of these rooms.</li>
</ol>




      
      ]]></description>
    </item>

    <item>
      <title>Hyman&#8217;s Reply to Schwartz</title>
      <pubDate>Thu, 01 May 2003 13:22:00 EDT</pubDate>
	<author>info@csicop.org (<![CDATA[Ray Hyman]]>)</author>
      <link>http://www.csicop.org/si/show/hymans_reply_to_schwartz</link>
      <guid>http://www.csicop.org/si/show/hymans_reply_to_schwartz</guid>
      <description><![CDATA[
        



			<p>I cannot, of course, respond in detail within the allotted space to each of <a href="/si/show/how_not_to_review_mediumship_research/">Schwartz&rsquo;s arguments</a>. Instead, I will comment on his major points and conclude with a general reaction to his rebuttal.</p>
<blockquote><p>1. &quot;Hyman resorts to . . . selectively ignoring important information that is inconsistent with his personal beliefs.&quot;</p></blockquote>
<p>In preparing my critique of his research program, I not only read The Afterlife Experiments carefully, I also scrutinized in detail every report of his research that was available. It was not possible to discuss each separate piece of information in my critique. I took each item into account, however, in making my assessment of the research. I chose to focus my discussions on those items that Schwartz and his colleagues had emphasized as the strongest outcomes amongst their findings. I have refereed and reviewed research reports for more than fifty years for many of the major scientific publications and for major granting agencies. I applied the same standards to my evaluation of the afterlife experiments that I have used in my other assessments.</p>
<blockquote><p>2. &quot;. . . Hyman failed to mention the important historical fact that our mediumship research actually began with double-blind experimental designs.&quot;</p></blockquote>
<p>As his example he refers to his experiment with the mediums Susy Smith and Laurie Campbell that &quot;was completed almost a year before we conducted the more naturalistic multi-medium/multi-sitter experiments involving John Edward, Suzanne Northrop, George Anderson, Anne Gehman, and Laurie Campbell. The early Smith-Campbell double-blind studies did not suffer from possible subtle visual or auditory sensory leakage or rater bias &mdash; and strong positive findings were obtained.&quot;</p>
<p>This is a peculiar example to use as a model of a controlled, double-blind experiment. The experiment involved having Susy Smith, designated as Medium One, apparently contact four deceased persons: her own mother, William James, Linda Russek&rsquo;s father, and Schwartz&rsquo;s father. Smith made a drawing for each of these departed individuals supposedly with their input. She also made a &quot;control&quot; drawing. Laurie Campbell, designated as Medium Two, was then requested to independently attempt to contact these departed individuals and, using the information obtained from them, to try to match each drawing to the associated departed individual. Campbell attempted to contact the departed entities during two sessions in the presence of three experimenters. Campbell is described as being &quot;blind&quot; to the personalities of the four departed individuals. However, Schwartz, who was not blind to the personalities of these entities, was not only present during these sessions but actively trying to convey this information (through &quot;telepathy&quot;) to Campbell. This unnecessary blunder compromises whatever blinding would have existed between Medium Two and the personalities of the departed individuals. No psychic investigator would be surprised if Laurie Campbell came up with some correct information such as the gender and other descriptors of the departed individuals under these conditions.</p>
<p>Another defect of this phase of the experiment is that no provisions were made to use a systematic and objective method for assessing the accuracy of Medium Two&rsquo;s descriptions. The evaluation of the information for this stage of the experiment was subjective.</p>
<p>During the sittings with Medium Two, all the experimenters were blind as to which drawing was associated with which departed individual. (Although it is plausible that one might be able to make some reasonable guesses, given the characters of each of the departed individuals, which type of drawing would go with each one.) Unfortunately, the experimenters then made another serious, and completely unnecessary, blunder when it came time to see if Medium Two could accurately match the drawings with the appropriate individual. The experimenters brought Medium Two and Medium One together. Medium One then displayed the drawings she had made to represent each individual. Medium Two then attempted to match the drawings to the appropriate sources in the presence of Medium One. Ironically, the experimenters openly admit that this could allow clues about the correct matching through the &quot;Clever Hans&quot; phenomenon. They dismiss this as a possibility because Campbell was able to correctly match only one of the five drawings to its appropriate source.</p>
<p>At this point in the experiment the report becomes especially murky. Presumably, the experiment has failed. However, the experimenters inexplicably have Medium Two try again to match the drawings to their appropriate source. This second attempt is made after she is shown an explicit summary of her comments about the pictures and the departed individuals. Campbell correctly matches the five drawings (including the control) in this second attempt. No reason is given for giving the medium two tries at matching the drawings, nor do the experimenters tell us how they justify asking the medium to redo her matching. Probably these and other questionable aspects of the procedure are moot given that the possibility of blinding was compromised.</p>
<p>Schwartz and his colleagues, in their published paper, describe this as an &quot;exploratory study.&quot; The proceedings seem to have been improvised at each stage. Certainly, no competent investigator would plan to unnecessarily compromise experimental blinding at the two most critical points of the data collection. Nor does it make sense to design an experiment wherein the medium is given two chances at getting the matching correct. I simply was applying the principle of charity in not discussing this botched experiment.</p>
<blockquote><p>3. &quot;In an exploratory double-blind long-distance mediumship experiment . . . Hyman states 'because nothing significant was found, the results do not warrant claiming a successful replication of previous findings.&rsquo; However, Hyman minimizes the fact that the number of subjects in this exploratory experiment was small (n=6). More importantly, Hyman fails to cite a(n) important conclusion that we reached in the discussion: If the binary 66 percent figure approximates (1) LC&rsquo;s actual ability to conduct double-blind readings, coupled with (2) the six sitters&rsquo; ability, on the average, to score transcripts double-blind, the 66 percent figure would require only an n of 25 sitters to reach statistical significance (e.g. &lt; .01).&quot;</p></blockquote>
<p>This part of Schwartz&rsquo;s rebuttal, like all the other parts, strikes me as both bizarre and off the mark. First, we need to clear up some mistakes and/or misunderstandings. Schwartz confuses the sample statistic with the population (or hypothesized true value). Given twenty-five sitters and a sample outcome of seventeen correct identifications (success rate of 68 percent) of their actual readings (which, given the discrete nature of the binomial distribution is the closest we can get to 66 percent correct) the one-tailed probability would be .054 and not less than .01 as Schwartz claims. Regardless of the correct probability value here, this has little to do with power. Schwartz is hypothesizing that the true (population) proportion of correct binary choices in this situation is close to the 67 percent (4 out of 6) that he observed in his sample. If, indeed, this value is correct, then, given his use of a one-tailed test and a significance level of .01, the probability of getting a significant outcome with twenty-five sitters would be slightly more than 0.54. To have a reasonable power (say close to 90 percent) one would need over 100 sitters.</p>
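<p>The one-tailed probability quoted above for seventeen correct identifications out of twenty-five can be checked with a few lines of code. This is a minimal sketch that assumes independent sitters, each with a 50 percent chance of picking the correct reading under the null hypothesis; the function name is illustrative.</p>

```python
from math import comb


def binom_upper_tail(n, k, p):
    """Probability of k or more successes in n independent trials with success rate p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


if __name__ == "__main__":
    # Seventeen or more correct choices out of twenty-five, by chance alone.
    tail = binom_upper_tail(25, 17, 0.5)
    print(round(tail, 3))  # 0.054
```

<p>The result, about .054, falls short of the .01 level that Schwartz cites.</p>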
<p>Schwartz appears to be begging the question here. He begins by observing that four out of six sitters correctly identified which of two readings was meant for them. Because of the small sample, this outcome is consistent with a number of possibilities including the chance value of 50 percent. If he had obtained the same proportion of correct hits with a larger sample, then it would have been significant. However, since we cannot tell what the true proportion is from a sample outcome based on only six cases, we have no basis for predicting the outcome for a larger sample. His argument reduces to the trivial one: If the true proportion is 67 percent then we will be able to get a significant outcome with a larger sample. From his actual outcome, we can just as well say: If the true proportion is 50 percent (and this, too, is consistent with his data), then he will very likely not get a significant outcome with a larger sample.</p>
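<p>To make concrete how little six cases can tell us, one can compare how probable the observed four out of six is under each of the two hypothetical true proportions. The sketch below is only an illustration of that comparison; the labels and the function name are assumptions introduced for the example.</p>

```python
from fractions import Fraction
from math import comb


def binom_pmf(n, k, p):
    """Probability of exactly k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


if __name__ == "__main__":
    hypotheses = {
        "chance (1/2)": Fraction(1, 2),
        "hypothesized (2/3)": Fraction(2, 3),
    }
    for label, p in hypotheses.items():
        # Exact fraction first, then a decimal for readability.
        prob = binom_pmf(6, 4, p)
        print(label, prob, round(float(prob), 3))
```

<p>Four correct out of six has a probability of 15/64 (about .23) if the true rate is 50 percent and 80/243 (about .33) if it is two-thirds. The ratio is only about 1.4, so the sample barely favors one value over the other.</p>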
<p>I find it difficult to understand why Schwartz considers this point worthy of mention. Of course a binary outcome with only six trials has very low sensitivity. However, he did not rely on this outcome. He used two other measures, the number of dazzle shots and the hits and misses, which are clearly much more sensitive. These also failed to provide overall significance. For these measures (as well as for the actual choice of the relevant reading), the overall sensitivity would have been greatly enhanced if each sitter actually rated all six readings. In addition to greatly enhanced sensitivity, this would have avoided the unfortunate situation where each sitter was rating his or her own reading against a foil that differed for each rater. Another plus would have been the opportunity to determine which readings had more general appeal independent of any specific information peculiar to a given sitter.</p>
<p>In his longer rebuttal to my critique which he posted on the Web (see his reference in his rebuttal) Schwartz claims he actually predicted that GD would successfully differentiate his own reading from the accompanying foil reading. The claim that this particular outcome was predicted does not square with the opening sentence of the report wherein the experimenters state, &quot;This paper reports an unanticipated replication and extension. . . .&quot;</p>
<p>I have already pointed out in my critique how Schwartz has an unusually liberal interpretation of &quot;replication.&quot; Not only is the statistical and experimental evidence suspect, but the qualitative analysis of the actual reading for GD in the second experiment does not overlap in any important respect with the reading in the earlier experiment. In particular, none of the apparently striking examples of names, events, and places that are reported for the first reading are in the second reading. I agree with Schwartz that the outcome of this &quot;double blind&quot; experiment is consistent with &quot;individual differences in sitter characteristics.&quot; However, borrowing from Schwartz&rsquo;s propensity to resort to Occam&rsquo;s Razor, I believe it is prudent to suggest a much more mundane explanation. We need only assume two very plausible and non-extraordinary assumptions to account for the results: 1) Luck: GD had a 50-50 chance of choosing the correct reading; 2) Rater bias: given that he has chosen the correct reading, he would show a strong response bias to give high marks to the chosen reading and low marks to the rejected one. Note that this is consistent with the qualitative evidence that I provided in my critique. However, note that the burden of proof is not upon the critic to show that this explanation is correct. Rather, the burden of proof should be on Schwartz to show, as the claimant, that he has ruled out this and other possible mundane explanations. This is what good experimental methodology, which is so far lacking in the afterlife experiments, is intended to accomplish.</p>
<p>Unfortunately, I do not have space to respond to other specifics of Schwartz&rsquo;s rebuttal. In his rebuttal he attributes motives, preferences, and biases to me. These are based on assumptions unsupported by facts. For example, he characterizes me as &quot;reluctantly&quot; agreeing that fraud is unlikely. In fact, I have no reluctance at all to make such an assertion. He attributes certain preferences to me that are, in some cases, just not true. He also is factually incorrect on some matters. He says that I was one of the group of cold readers who declared that I could, with training, duplicate what his mediums had accomplished in his laboratory. This is wrong. I deliberately refrained from such a commitment. My major point during the meeting with him on cold reading was that the determination of whether his mediums are using cold reading is a separate matter from the question of whether they were conveying any information of a paranormal nature. If he wanted to study the role of cold reading in the readings given by his mediums, that was an experimental goal that was separate from determining if his mediums are providing evidence for the survival of consciousness.</p>
<p>Nor did I conclude, contrary to Schwartz&rsquo;s implication, that his mediums were using cold reading. I did observe &mdash; and I specifically emphasized that this was a subjective opinion &mdash; that I could see little difference between the utterings of his mediums and those of the typical psychic reader. I want to emphasize again, it is not for me, or other critics, to show that his mediums are using cold reading or some other ploys. The burden of proof is on Schwartz to show that he has convincingly eliminated such possibilities.</p>
<p>So far as I can tell, Schwartz has really not answered my criticisms. A close reading reveals that he does not deny the various failings I have divulged in his research. Instead, he defends the departures from proper experimental methodology on a number of grounds: 1) he and his colleagues were aware of these defects and actually admitted so in their reports (but such admissions do not somehow neutralize the defects); 2) there were practical reasons such as wanting to provide a more naturalistic context (but this does not excuse using inappropriate control comparisons, failing to correct for rater bias, using inappropriate probability and statistical computations, etc.); 3) some of the &quot;defects&quot; were deliberately included to check on certain questions (but this does not justify drawing strong conclusions); and 4) that taken in their totality the experiments somehow provide powerful evidence for anomalous communication even if the individual experiments are flawed (actually, repeatedly making similar mistakes from experiment to experiment compounds rather than compensates for the errors).</p>
<p>Despite the deficiencies in his experiments, Schwartz seems convinced that his mediums have provided, in some cases, specific and unique information including names, places, etc., that the critics cannot explain away. For one thing, these apparently specific items are much fuzzier than he believes. His examples are selected just because they appeared to contain such specifics. This raises the difficult question of how to actually assess how much of this is just coincidence. Furthermore, even the most specific and concrete match is problematical because practically no constraints are placed upon the sitter in finding a suitable match (e.g., it can be a dead or a living person; it can be someone close to the sitter or a mere acquaintance; etc.). No actual check is made as to how close the match actually is. My point here is that Schwartz really has provided us with nothing to explain. We do not know if he has produced anything worth taking seriously until he can convincingly demonstrate that he has obtained his data under methodologically appropriate conditions. Science demands this in the conventional fields of inquiry. We should demand no less from Schwartz.</p>




      
      ]]></description>
    </item>

    <item>
      <title>How Not to Test Mediums: Critiquing the Afterlife Experiments</title>
      <pubDate>Wed, 01 Jan 2003 13:22:00 EDT</pubDate>
	<author>info@csicop.org (<![CDATA[Ray Hyman]]>)</author>
      <link>http://www.csicop.org/si/show/how_not_to_test_mediums_critiquing_the_afterlife_experiments</link>
      <guid>http://www.csicop.org/si/show/how_not_to_test_mediums_critiquing_the_afterlife_experiments</guid>
      <description><![CDATA[
        



			<p class="intro">Professor Gary Schwartz makes revolutionary claims that he has provided competent scientific evidence for survival of consciousness and&mdash;even more extraordinary&mdash;that mediums can actually communicate with the dead. He is badly mistaken. The research he presents is flawed, and in numerous ways. Probably no other extended program in psychical research deviates so much from accepted norms of scientific methodology as this one.<a href="#note_1">1</a></p>


<p>Gary Schwartz is professor of psychology, medicine, neurology, psychiatry, and surgery at the University of Arizona. After receiving his Ph.D. in personality psychology from Harvard University, he taught at Harvard and then at Yale University for twenty-eight years as a professor of psychology and psychiatry. He has published more than 400 scientific papers. He came to the University of Arizona in 1988 to do research on, among other things, the relationship between love and health. In 1993 he met Linda Russek and married her soon after. Linda was still grieving over the death of her father. Soon after she met Schwartz, Linda asked him, &quot;Do you think it is possible that my father is still alive?&quot;</p>
<p>That question triggered a research program to answer it and the more general question of survival of consciousness. At first the program was conducted in secret and then became public around 1997. Since 1997, Schwartz has reported a number of studies in which he and his coworkers have observed some talented mediums such as John Edward and George Anderson give readings to sitters in his laboratory. This work has attracted considerable attention because of Schwartz&rsquo;s credentials and position. Even more eye-opening is Schwartz&rsquo;s apparent endorsement of the mediums&rsquo; claims that they are actually communicating with the dead.</p>
<div class="image right">
<img src="/uploads/images/si/medium-schwartz.jpg" alt="Gary E. Schwartz" />
<p>Gary E. Schwartz</p>
</div>
<p>For Schwartz this conclusion follows from the famous principle known as Occam&rsquo;s Razor. Schwartz paraphrases Occam&rsquo;s principle as &quot;All things being equal, the simpler hypothesis is usually the correct one.&quot;<sup><a href="#note_2">2</a></sup> As Schwartz sees it, &quot;The best experiments [supporting the reality of communicating with the dead] can be explained away, only if one makes a whole series of assumptions. . . .&quot; These assumptions include:</p>
<ol>
<li>that mediums use detectives to gather some of their information;</li>
<li>that sitters falsely remember specific facts such as the names of relatives;</li>
<li>that the mediums are super guessers;</li>
<li>that mediums can interpret subtle cues such as changes in breathing to infer specific details about the sitter and her relatives; and</li>
<li>that the mediums use super telepathy to gather facts about the sitter&rsquo;s deceased friends and family.</li>
</ol>
<p>According to Schwartz, such assumptions create unnecessary complexity. &quot;However, if we were to apply Occam&rsquo;s Razor to the total set of data collected over the past hundred years, including the information you have read about in this book, there is a straightforward hypothesis that is elegant in its simplicity. This is the simple hypothesis that consciousness continues after death. This hypothesis accounts for all the data&quot; (p. 254).</p>
<div class="image left">
<img src="/uploads/images/si/medium-book.jpg" alt="book cover for the Afterlife Experiments" />
</div>

<p>Schwartz&rsquo;s new book <cite>The Afterlife Experiments</cite> presents evidence from a series of five reports in which Schwartz and his associates observed mediums give readings to sitters &quot;in stringently monitored experiments.&quot; Schwartz does admit that his experiments were not ideal. For example, only the very last in his sequence of studies used a truly double-blind format. Yet he insists that the mediums, although often wrong, consistently came up with specific facts and names about the sitters&rsquo; departed friends and relatives that the skeptics have been unable to explain away as fraud, cold reading, or lucky guesses. He provides several examples of such instances throughout the book. These examples demonstrate, he believes, that the readings given by his mediums are clearly different from those given by cold readers and less gifted psychics. &quot;If cold readings are easy to spot by anyone familiar with the techniques, the kinds of readings we have been getting,&quot; he asserts, &quot;in our laboratory are quite different in character.&quot;</p>
<h2>Could It Be Cold Reading?</h2>
<p>Now it so happens that I have devoted more than half a century to the study of psychic and cold readings. I have been especially concerned with why such readings can seem so concrete and compelling, even to skeptics. As a way to earn extra income, I began reading palms when I was in my teens. At first, I was skeptical. I thought that people believed in palmistry and other divination procedures because they could easily fit very general statements to their particular situation. To establish credibility with my clients, I read books on palmistry and gave readings according to the accepted interpretations for the lines, shape of the fingers, mounts, and other indicators. I was astonished by the reactions of my clients. My clients consistently praised me for my accuracy even when I told them very specific things about problems with their health and other personal matters. I even would get phone calls from clients telling me that a prediction that I had made for them had come true. Within months of my entry into palm reading, I became a staunch believer in its validity. My conviction was so strong that I convinced my skeptical high school English teacher by giving him readings and arguing with him. I later also convinced the head of the psychology department where I was an undergraduate.</p>
<p>When I was a sophomore, majoring in journalism, a well-known mentalist and trusted friend persuaded me to try an experiment in which I would deliberately read a client&rsquo;s hand opposite to what the signs in her hand indicated. I was shocked to discover that this client insisted that this was the most accurate reading she had ever experienced. As a result, I carried out more experiments with the same outcome. It dawned on me that something important was going on. Whatever it was, it had nothing to do with the lines in the hand. I changed my major from journalism to psychology so that I could learn why not only other people, but also I, could be so badly led astray. My subsequent career has focused on the reasons why cold readings can appear to be so compelling and seemingly specific.</p>
<p>Psychologists have uncovered a number of factors that can make an ambiguous reading seem highly specific, unique, and uncannily accurate. And once the observer or client has been struck with the apparent accuracy of the reading, it becomes virtually impossible to dislodge the belief in the uniqueness and specificity of the reading. Research from many areas demonstrates this finding. The principles go under such names as the fallacy of personal validation, subjective validation, confirmation bias, belief perseverance, the illusion of invulnerability, compliance, demand characteristics, false uniqueness effect, foot-in-the-door phenomenon, illusory correlation, integrative agreements, self-reference effect, the principle of individuation, and many, many others. Much of this is facilitated by the illusion of specificity that surrounds language. All language is inherently ambiguous and depends much more than we realize upon the context and nonlinguistic cues to fix its meaning in a given situation.</p>
<p>Again and again, Schwartz argues that the readings given by his star mediums differ greatly from cold readings. He provides samples of readings throughout the book. Although these samples were obviously selected because, in his opinion, they represent mediumship at its best, every one of them strikes me as no different in kind from those of any run-of-the-mill psychic reader and as completely consistent with cold readings. In August 2001, Schwartz assembled a panel of seven experts on cold reading, including me, to instruct him on the topic. We were shown videotapes of Suzane Northrop and John Edward giving readings in his laboratory. Most members of the panel were openly sympathetic to Schwartz&rsquo;s goals and program. Yet we all agreed that what we saw Northrop and Edward doing was no different from what we would expect from any cold reader.</p>
<p>I am sure that Professor Schwartz will strongly disagree with my observation that the readings he presents as strong evidence for his case very much resemble the sorts of readings we would expect from psychic readers in general and cold readers in particular. This disagreement between us, however, relies on subjective assessment. That is why we have widely accepted scientific methods to settle the issue. That is why it is important, especially for the sort of revolutionary claims that Schwartz wants to make, that they be backed up by competent scientific evidence. Throughout his 2002 book <cite>The Afterlife Experiments</cite>, Schwartz implies that he has already provided such evidence.</p>
<p>This, as I will explain, is badly mistaken. The research he presents is flawed. Probably no other extended program in psychical research deviates so much from accepted norms of scientific methodology as this one does.</p>
<h2>Is the Research Fundamentally Flawed?</h2>
<div class="image right">
<img src="/uploads/images/si/medium-1.jpg" alt="Gary E. Schwartz using a computer" />
<p>Gary E. Schwartz examines data from his experiments. Frames from <cite>Dateline NBC</cite></p>
<img src="/uploads/images/si/medium-2.jpg" alt="two people sitting with a screen between them" />
<p>One of the tested mediums, left, tries to get information from the sitter.</p>
<img src="/uploads/images/si/medium-3.jpg" alt="John Edward is tested by Gary Schwartz" />
<p>John Edward is tested by Gary Schwartz</p>
</div>
<p>Although never going so far as to claim his research methodology is ideal, Schwartz apparently believes it is adequate to justify his conclusion that his mediums are communicating with the dead. He writes, &quot;Skeptics who claim that this is some kind of fraud the mediums are working on us have nonetheless been unable to point out any error in our experimental technique to account for the results&quot; (p. xxii). Later he asserts, &quot;The data appear to be real. If there is a fundamental flaw in the totality of the research presented in these pages, the flaw has managed to escape the many experienced scientists who have carefully examined the work to date&quot; (p. 13).</p>
<p>These statements perplex me greatly. I have carefully itemized not one but several &quot;fundamental&quot; flaws in Schwartz&rsquo;s afterlife experiments. I confronted Schwartz with this listing of flaws at two public meetings where we shared the same platform. I also brought them up again at the panel on cold reading that he convened. The other members of the panel also pointed to flaws. And Wiseman and O'Keeffe<sup><a href="#note_3">3</a></sup>
pointed to serious problems with Schwartz&rsquo;s first two published studies in the areas of judging bias, control group biases, and sensory leakage. I would have to make this article almost as long as Schwartz&rsquo;s book to explain each flaw adequately. Because any one of these flaws by itself would suffice to invalidate his experiments as acceptable evidence, I will discuss only a few of them. First, I list the major types of flaws in the experiments described in his first four reports (I will deal with the fifth report separately below):</p>
<ol>
<li>Inappropriate control comparisons</li>
<li>Inadequate precautions against fraud and sensory leakage</li>
<li>Reliance on non-standardized, untested dependent variables</li>
<li>Failure to use double-blind procedures</li>
<li>Inadequate &quot;blinding&quot; even in what he calls &quot;single blind&quot; experiments</li>
<li>Failure to independently check on facts the sitters endorsed as true</li>
<li>Use of plausibility arguments to substitute for actual controls</li>
</ol>
<p>The preceding list refers to defects in the conduct of the experiments and in the gathering of the data. Other very serious problems appear in the way Schwartz interprets and presents the results of his research. These include:</p>
<ol start="8">
<li>The confusion of exploratory with confirmatory findings</li>
<li>The calculation of conditional probabilities that are inappropriate and grossly misleading</li>
<li>Creating non-falsifiable outcomes by reinterpreting failures as successes</li>
<li>Inflating significance levels by failing to adjust for multiple testing and by treating unplanned comparisons as if they were planned</li>
</ol>
<p>Other problems involve failure to use adequate randomization procedures, using only sitters who are predisposed to the survival hypothesis, inappropriate statistical tests, and other common defects that plague new research programs. Even if the research program were not compromised by these defects, the claims being made would require replication by independent investigators. Perhaps Schwartz&rsquo;s most serious misconception is seen in his attempt to shift the burden of proof from himself to the skeptics.</p>
<p>The worst mistake made by Schwartz and his colleagues was to publish the results they have obtained so far. Instead, they should have first tried to gather evidence for their hypothesis that would meet generally accepted scientific criteria. By submitting their very inadequate studies to public scrutiny and by demanding that skeptics &quot;explain away&quot; their defective data, they have lost credibility. In addition, the journals that did accept these studies for publication and Schwartz&rsquo;s panel of Friendly Devil&rsquo;s Advocates have also suffered greatly in credibility.</p>
<h2>Schwartz&rsquo;s Inadequate and Inappropriate Response to Criticisms</h2>
<p>Schwartz&rsquo;s responses to criticisms such as those made by Wiseman and O'Keeffe obscure rather than clarify matters.<sup><a href="#note_4">4</a></sup> For example, regarding his failure to provide safeguards against sensory leakage, he complains that Wiseman and O'Keeffe &quot;curiously did not mention that we were fully cognizant of such issues and were actively researching them at the time the Schwartz et al. paper was published.&quot; The fact that the researchers were aware that they had not provided adequate safeguards against sensory leakage does not in any way make their data more acceptable. Indeed, if they were aware of how to properly control for this flaw, it is even more inexcusable that they failed to do so. Why did they publish data they knew to be compromised and try to pass them off as legitimate science?</p>
<p>Indeed, Schwartz actually states that he deliberately allowed for some sensory leakage to see if &quot;the remaining subtle cues&quot; could explain the subsequent accuracy of the mediums&rsquo; statements. He also states that he wanted to begin with &quot;a semi-naturalistic design . . . to develop a professional relationship with the mediums. . . .&quot; If, in fact, this was his rationale for using an inadequate design, then he should have treated the study as a preliminary probe to see if the mediums could work under laboratory conditions. Such a preliminary or pilot study, however, should then be followed up with a formal, properly conducted experiment. Knowing how to properly control for sensory leakage in no way licenses the publishing of flawed data to support a hypothesis.</p>
<p>In defending himself against the charge of sensory leakage, Schwartz uses another tactic that violates acceptable scientific conduct. He tries to shift the burden of proof onto the skeptic: &quot;Skeptics who speculate that 'cold reading' can achieve similar results have a responsibility to show that identical findings can be obtained <em>under the conditions used in the Schwartz et al. research</em> (e.g., the single-blind sitter-silent condition that effectively rules out pre-experimental information and verbal feedback). We welcome such experiments.&quot;</p>
<p>Sorry, Professor Schwartz. The skeptics and the scientific community have no responsibility to show anything until you provide them with data collected according to well-established and acceptable standards. The responsibility is yours to first provide us with evidence for your hypothesis of survival of consciousness that is gathered according to the appropriate scientific standards which include controlling for sensory leakage; devising dependent variables that are relevant, reliable, and valid; and using control comparisons that are meaningful.</p>
<p>Schwartz&rsquo;s rejoinders to Wiseman and O'Keeffe&rsquo;s other two topics of criticism are even more disturbing. His response to the charge of possible judging bias is that, &quot;The purpose of the original Schwartz et al. experiments (2001) was <em>not</em> to rule out possible rater bias, but to minimize it.&quot; He again tries to shift the burden of proof to the skeptic, by arguing that it is implausible to speculate that his sitters would exhibit rater bias on such things as names, relationships, and the like. Indeed, it is highly plausible to me that some sitters might acquiesce to statements that are demonstrably false. However, science exists as a way to avoid arguments over plausibility. Minimizing rater bias is not the same as precluding it. If he wants to claim scientific acceptance for his evidence then he has to gather the data under conditions that eliminate or adequately correct for such bias. Even worse is his rejoinder to the claim that he used an inappropriate control group. &quot;The purpose of the original...experiments was not to include an ideal control group, but rather to address, and possibly rule out (or in) one possible explanation for the data&mdash;i.e., simple guessing.&quot;</p>
<p>This last statement is both confusing and wrong. I suspect that Schwartz means by &quot;an ideal control group&quot; one made up of individuals who are the same age and have the same sort of experience as his mediums. Since his actual control group consisted of undergraduate students who had no prior experience as mediums, the group was obviously not ideal in this sense. However, what Wiseman and O'Keeffe are criticizing is that this control group in no way provides a proper comparison or baseline for the &quot;accuracy ratings&quot; of the mediums by the sitters. This is for the simple reason that the control group was given a task that differed in very important ways from that of the mediums. There is no way that the results from this control group could provide a comparison or baseline for simple guessing.</p>
<p>The mediums are free to make statements about possible contacts, names, relations, causes of death, and other matters. In the earlier experiments they were given &quot;yes&quot; and &quot;no&quot; replies from the sitters and in later experiments they typically began a segment without feedback and then went through an additional segment with feedback. The sitters were free to find matches within the output of the medium to fit their particular circumstances. Later the sitter was given a transcript of the entire reading and rated each statement for how accurately it applied to her situation. The statements that got the highest rating were counted as hits. The proportion of such hits varied from approximately 73 to 90 percent in the earlier experiments and somewhat lower in the later ones.</p>
<p>In contrast, the control subjects were given a series of questions based on a reading given to their first sitter. Statements from the readings were converted into questions that could be answered in such a way that the answer could be scored correct or incorrect. For example, if the medium had correctly guessed the cause of the sitter&rsquo;s mother&rsquo;s death, a question given to the controls might be, &quot;What was the cause of her mother&rsquo;s death?&quot; Schwartz and his colleagues report that the average percentage of correct answers by the controls was 36 percent. Because the &quot;accuracy&quot; of the mediums was much higher, the researchers conclude that the mediums had access to true information that cannot be explained away as guessing.</p>
<p>Wiseman and O'Keeffe correctly point out that this is an inappropriate comparison. Although Schwartz claims that, if anything, the controls had an advantage over the mediums, the use of the results for the control groups as a baseline for the mediums is completely meaningless. Wiseman and O'Keeffe provide several reasons why. In addition to the reasons they give, a more fundamental one is that the score for the controls does not involve subjective ratings by the sitters <em>while the accuracy scores for the mediums depend entirely upon the judgment of these sitters</em>. We have no idea how well the mediums could do if given the same task as the controls. I strongly suspect they could not perform any better.</p>
<p>The accuracy score for the medium is completely dependent on the subjective decisions of the sitter. The very first example of a reading provided in this book begins as follows:</p>
<blockquote>The first thing being shown to me is a male figure that I would say as being above, that would be to me some type of father image. . . . Showing me the month of May. . . .They're telling me to talk about the Big H-um, the H connection. To me this an H with an N sound. So what they are talking about is Henna, Henry, but there&rsquo;s an HN connection. (p. xix)</blockquote>
<p>The sitter identified this description as applying to her late husband, Henry. His name was Henry, he died in the month of May and was &quot;affectionately referred to as the 'gentle giant.'&quot; The sitter was able to identify other statements by the medium as applying to her deceased spouse.</p>
<p>Note, however, the huge degree of latitude for the sitter to fit such statements to her personal situation. The phrase &quot;some type of father image&quot; can refer to her husband because he was also the father to her children. However, it could also refer to her own father, her grandfather, someone else&rsquo;s father, or any male with children. It could easily refer to someone without children such as a priest or father-like individual&mdash;including Santa Claus. It would have been just as good a match if her husband had been born in May, had married in May, had been diagnosed with a life-threatening illness in May, or considered May as his favorite month. The &quot;HN&quot; connection would fit just as well if the sitter&rsquo;s name were Henna or her husband had a dog named Hank.</p>
<p>Schwartz concludes that, &quot;No other person in the sitter&rsquo;s family fit the cluster of facts 'father image, Big H, Henry, month of May' except her late husband, Henry.&quot; Of course not! If that person, or any other, also found a match for their personal life, it too would be unique. When I put myself in the shoes of a possible sitter and try to fit the reading to my situation, I can find a good fit to my father, who was physically large, whose last name was Hyman, and who, like any human on this planet, experienced one or more notable events in the month of May. Other things in the reading also can easily be fitted to my father. Neither the original sitter nor anyone else would fit this cluster of facts! Schwartz makes much of the fact that the cluster of facts that a sitter extracts from a reading tends to be unique for that sitter. He even calculates the conditional probabilities of such a cluster occurring just by chance. Naturally, these conditional probabilities are extremely low, often with odds of over a trillion to one against chance.</p>
<p>The &quot;accuracy&quot; score for the medium, as calculated by the experimenters, depends critically on the sitter&rsquo;s ratings. This allows subjective validation<sup><a href="#note_5">5</a></sup> and uncontrolled rater biases to enter the picture on the side of the mediums. The sitters were deliberately selected because they were already disposed towards the survival hypothesis (that consciousness survives death). Given the statement &quot;some type of father image,&quot; the sitter easily fit this to her late husband who was the father of her children. For her, this would get the highest accuracy rating. A more skeptical sitter, realizing the ambiguity in the statement, might give it a lower rating. Given the statement &quot;showing me the month of May,&quot; the committed sitter would rate it accurate because her husband actually died in the month of May. A less committed sitter might rate it as less accurate because she realizes that this statement could apply to any significant event that happened to her husband, herself, or her family in May. From the example above, if I were a committed sitter receiving the same reading, I could see myself giving it a score of five out of five (or 100% accuracy) because my father (obviously a type of father image) experienced one or more significant events in May (<em>showing me the month of May</em>), was large and overweight, and was named Hyman (<em>about the Big H-um, the H connection...an H with an N sound</em>).</p>
<p>Compare this with the task confronting the control subjects. They would be given a series of questions based on this reading which might go as follows:</p>
<ol>
<li>What was the relation of the deceased to the sitter?</li>
<li>What was the name of the sitter&rsquo;s husband?</li>
<li>In what month did he die?</li>
<li>How was he described by his friends?</li>
</ol>
<p>The control students would have to come up with the answers <em>husband, Henry, May</em>, and <em>big</em> to get a perfect score. The likelihood of anyone, including the mediums, getting all these correct, or even a high percentage of them correct, is very small indeed. It is obvious that this is a completely different task from the one performed by the mediums. A strikingly obvious difference is that the sitter&rsquo;s judgments and biases are completely removed from the task given the controls. Indeed, it is just these potential biases and subjective judgments being made by the sitters that obviously cry out for controlling.</p>
<h2>Conditional Probabilities</h2>
<p>One way that Schwartz assesses the likelihood that his mediums are obtaining their &quot;hits&quot; just by chance guessing is to calculate conditional probabilities of getting a certain pattern of statements that would match the sitter&rsquo;s situation. In the excerpt from the reading I have been using as an example, he might estimate the probability of getting the gender of the sitter&rsquo;s husband as 1/2; the probability of indicating that he was dead as 1/2; the probability of correctly guessing that the deceased person was the sitter&rsquo;s husband as, perhaps, 1/6; the probability of guessing the month of death as 1/12; the probability of getting the correct name as 1/15; and the probability of knowing that he was described by friends as &quot;big&quot; as 1/20 (of course, the particular probability estimates in most of these cases have to be based on assumptions and guesswork, but Schwartz claims that he errs on the conservative side in making such estimates). The combined probability of correctly getting this particular pattern of matches just by chance would simply be the product of these separate probabilities. In my example, the probability of achieving this particular pattern of matches would be less than 1 out of 86,000.</p>
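<p>The arithmetic behind that figure can be checked with a minimal sketch. The six fractions are the illustrative estimates from the example above, not values taken from Schwartz&rsquo;s reports; the dictionary labels are names chosen here for readability; and the multiplication assumes the six items are independent, which is itself one of the assumptions at issue.</p>

```python
from fractions import Fraction
from math import prod

# Illustrative estimates from the example in the text (not Schwartz's data).
# The labels are hypothetical names chosen for readability.
estimates = {
    "gender of the deceased": Fraction(1, 2),
    "person is deceased": Fraction(1, 2),
    "deceased is the husband": Fraction(1, 6),
    "month of death": Fraction(1, 12),
    "first name": Fraction(1, 15),
    "described as big": Fraction(1, 20),
}

# Multiplying is only valid if the items are independent (an assumption).
combined = prod(estimates.values())

# The numerator is 1, so the denominator gives the "1 in N" odds directly.
odds_against = combined.denominator

print(f"combined probability: {combined} (1 in {odds_against:,})")
# prints: combined probability: 1/86400 (1 in 86,400)
```

<p>The product comes to 1 in 86,400, which agrees with the &quot;less than 1 out of 86,000&quot; stated above. The sketch only reproduces the multiplication; it says nothing about whether the multiplication is appropriate.</p>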
<p>Such a low probability would seem to clearly rule out chance as an explanation for the results. Most of Schwartz&rsquo;s actual calculations typically lead to probabilities of less than one out of a million or even millions. In one case he calculated the probability that the results could have been obtained by guessing as 1 in 2.6 trillion! If these calculations were appropriate they certainly would clearly rule out guessing as an explanation for the mediums&rsquo; apparent successes.</p>
<p>Probability, however, is a very slippery concept. Even experts have gone badly astray in trying to apply it to situations in the real world. Some of the reasons why Schwartz&rsquo;s conditional probability calculations are inappropriate and misleading in this context involve highly technical considerations concerning conditional probabilities, independence, sample spaces, and the like. However, you can realize something must be wrong here when you consider that these same types of calculations also provide very low probabilities for any set of matches that any person&mdash;the sitter or someone else&mdash;finds in a given reading. For example, the pattern of matches that I find in the sample reading with respect to my late father yields a probability of guessing that is so low as to also rule out chance. And this will be true for any pattern of matches that anyone can find in the same reading. One problem is that Schwartz&rsquo;s calculations do not take into account the enormous variety of possible combinations that could be extracted from a single reading. Each one would be unique to the person for whom that pattern makes sense.</p>
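<p>A rough sketch can illustrate how much this latitude matters for a single statement such as &quot;showing me the month of May.&quot; It compares the chance that one fixed event falls in May with the chance that at least one of many candidate events does. The count of twenty candidate events, the assumption that each is equally likely to fall in any month, and the assumption that the events are independent are all illustrative choices introduced here, not figures from the experiments.</p>

```python
def chance_of_some_match(p_single: float, n_candidates: int) -> float:
    """Probability that at least one of n independent candidates matches."""
    return 1.0 - (1.0 - p_single) ** n_candidates


# One fixed event (the husband's death) falling in May: 1 chance in 12.
strict = chance_of_some_match(1 / 12, 1)

# Any of 20 hypothetical events (births, deaths, weddings, or diagnoses
# among several relatives and friends) falling in May.
loose = chance_of_some_match(1 / 12, 20)

print(f"strict match: {strict:.3f}")
print(f"loose match:  {loose:.3f}")
```

<p>Under these assumptions the strict match has a chance of about 8 percent, while the loose match succeeds more than 80 percent of the time. Multiplying strict-match probabilities after the sitter has already chosen the interpretation therefore badly understates how easily a hit can arise.</p>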
<p>Ironically, such conditional probability calculations could be justified (with some important reservations) for the task given to the control students. Each question they were posed has an explicit answer. If we can make reasonable assumptions about the probability of getting each answer just by chance, and if we can assume that the answers to each question are independent of each other, then we might legitimately try to estimate the probability of getting all the answers correct by multiplying together the probabilities of correct answers for all the questions. Notice that we can do this only because we defined the total set of possibilities and have not selected, after the fact, just those questions that were answered correctly.</p>
<h2>Reliance on Uncorroborated Sitter Ratings</h2>
<p>This discussion of the reasons why the control comparison and the calculation of conditional probabilities are inappropriate points to one of the most serious weaknesses in this research program. The &quot;accuracy&quot; ratings of the mediums depend entirely upon the judgments of the individual sitters. Each sitter is solely responsible for validating the reading given to him or her. Each sitter is carefully chosen to be someone who is favorably disposed to the survival hypothesis and who wants the medium to be able to communicate with their departed family and friends. Schwartz admits that the &quot;accuracy&quot; ratings from sitters who are not so favorably disposed are much lower. Although this is consistent with rater bias, Schwartz has other explanations. He also believes that just as some mediums are &quot;white crows,&quot; there are also sitters who are &quot;white crows&quot;&mdash;that is, some sitters are prone to get especially good results. In other words, some sitters are more prone to give higher ratings of accuracy than do other sitters.</p>
<p>One simple explanation, consistent with Occam&rsquo;s Razor, is that some sitters are more susceptible to response biases. Schwartz, I am sure, will strongly disagree. This, again, highlights the need for properly conducted research that precludes or adequately corrects for such possible biases. This is why a properly conducted research program requires carefully standardized, reliable, and valid dependent variables; truly double-blind procedures; appropriate control comparisons; and proper controls for sensory leakage. All of these requirements, as I have explained, are lacking in the afterlife experiments.</p>
<p>Schwartz has tried to counter some of these criticisms by pointing to the fact that much of the information provided by the medium consists of factual material that can be independently checked (for example, specific names, relationships, careers, gender, etc.). Yet he has never bothered to make an independent check on these &quot;facts.&quot; He simply accepts the sitters&rsquo; statements. He argues that it is completely unreasonable to believe that one of his trusted sitters would say &quot;yes&quot; to a fact that was untrue. This, of course, is using a plausibility argument in the place of a control that should have been incorporated into the research. Perhaps it is unlikely that a sitter would acquiesce to a factual statement that she or he knows to be untrue. However, his own excerpts from readings given in his book provide one or more examples. In one case, one of his best sitters keeps acquiescing to John Edward&rsquo;s mistaken belief that her husband is dead, even though he is alive and sitting in the next room. As he does over and over again when he encounters what looks like a miss, Schwartz manages to find a convenient explanation for this peculiar situation. He suggests that this could be a case of precognition because the sitter&rsquo;s husband was killed in an accident some months after the reading.</p>
<h2>The Laurie Campbell &quot;White Crow&quot; Readings</h2>
<p>The book begins with a quotation from William James: &quot;In order to disprove the law that all crows are black, it is enough to find one white crow.&quot; James was interested in the possibility of psychic phenomena. He believed that it was sufficient to find one truly indisputable example of a psychic occurrence to demonstrate that violations of natural law were possible. Schwartz claims he has uncovered several white crows. The performance of his mediums, especially Laurie Campbell and John Edward, earns them the accolade, in his judgment, of &quot;white crow&quot; mediums. He has also found at least one &quot;white crow&quot; sitter in one of his participants, GD.</p>
<p>GD is a psychiatric social worker who lost his partner, Michael, to AIDS. GD discovered he had mediumistic powers and believed he was in contact with his deceased partner. He took part as one of three sitters in an experiment with the medium Laurie Campbell. The researchers reported that, &quot;Statistically significant evidence for anomalous information retrieval was found for each of the three sitters investigated in this experiment. However, it is the uniqueness and extraordinarily evidential nature of the particular reading highlighted in this detailed report that justifies focusing on this 'white crow' research reading.&quot; In other words, the researchers base their report entirely on the results with this one sitter. Although one of the criteria for the selection of the sitters was their willingness to rate the transcripts of their readings, such ratings were apparently not done at the time this report was written. The experimenters report that GD estimated that the information given by the medium was at least 90 percent accurate. Presumably this was simply a subjective estimate. In the previous experiments the &quot;accuracy&quot; rating was obtained by calculating the proportion of highly rated items among all of the rated items.</p>
<p>Schwartz et al. state that the complete reading took over an hour. They promised that the full transcript will be made available at some future date. So far, I have not seen it, so I cannot judge to what extent this reading might be qualitatively different from the readings that I have witnessed or read that have been given by Laurie Campbell. In the readings I am familiar with, Campbell throws out initials, names, and vague statements that appear to me to characterize the readings from the many psychic readers and mediums I have studied over the past sixty years. I witnessed a public demonstration by her at a conference sponsored by Gary Schwartz and Linda Russek in Tucson in March 2001. I have also carefully studied the complete transcripts of two readings by Campbell.</p>
<p>At first blush the reading given for GD appears qualitatively different. From what we are told, Campbell apparently stated that the recipient of the reading was named George (true) even though she was supposedly completely blind to his identity. She also correctly indicated that the primary deceased person for GD was a male named Michael (true). She also provided the name &quot;Alice&quot; and later, during the interactive part of the reading, correctly stated that this was GD&rsquo;s deceased aunt. Among the list of names she included in her reading was one that she said sounded like <em>Talya, Tiya,</em> or <em>Tilya</em>. GD has a friend that he calls &quot;Tallia.&quot; Campbell mentioned a deceased dog whose name began with an &quot;S.&quot; GD had a beloved dog with an &quot;S&quot; name (but not the name used by Campbell). Other names were also relevant including that of GD&rsquo;s father &quot;Bob.&quot; The researchers cite other qualitative hits that they believe provide powerful evidence that Campbell is getting information from a paranormal source.</p>
<p>This paranormal source, the authors argue, is not simply extrasensory perception based on GD&rsquo;s thoughts. This is because in the interactive phase of the reading &quot;not only were each of the <em>four primary people described accurately by Campbell, but four additional facts not known by GD</em> and later confirmed by sources close to GD indicated that exceptionally accurate information was obtained for GD&rsquo;s deceased and close friends.&quot; Because of this, Schwartz argues that the medium is most likely getting her information from the deceased individuals rather than from the sitter&rsquo;s thoughts. At the time of the reading, GD mistakenly thought that Campbell had erred by stating that the granddaughter of his aunt Alice was named &quot;Katherine&quot; because he believed the name was spelled &quot;Catherine.&quot; When GD later checked, he discovered that his cousin&rsquo;s name was indeed spelled with a K instead of the C that he was thinking during the reading. Another striking example is where Campbell said &quot;that M [Michael] showed her where he lived; somewhere in Europe, and his parents have a 'heavy accent&rsquo; (M was German). Laurie Campbell reported that M was showing her a big city, and then M was traveling through the countryside to his home. . . . Campbell claimed that M showed her an old, stone 'monastery&rsquo; on the edge of the river on the way to his parent&rsquo;s home. This information was not known to GD prior to the reading. After the reading, GD telephoned M&rsquo;s parents in Germany and learned that there was an old abbey church along the river&rsquo;s edge on the way to their house, and that they had held a service for M in this monastery-like stone building a few weeks prior to the experiment.&quot;</p>
<p>These are examples from this reading that Schwartz insists the skeptics cannot explain away in terms of normal causes such as guessing and cold reading, fraud, or unwitting sensory leakage. However, the experiment is compromised by so many serious defects that it would be futile for a skeptic to accept this challenge. This would be another example of placing the burden of proof on the wrong shoulders. Although the experimenters try to make a plausible argument against collusion between Campbell and GD, as well as against the possibility that Campbell might somehow have gotten access to the manuscript of GD&rsquo;s forthcoming book (a copy of which was in Schwartz&rsquo;s possession), the actual controls against such sensory leakage were not very convincing. Indeed, the authors partially acknowledge this defect. &quot;Since the exceptional nature of the data reported here was not anticipated ahead of time, the experiment did not include additional desirable controls. . . .&quot; Although I see no reason to assume that fraud did occur in this instance, I believe that the experimenters have an obligation to their mediums and sitters, as well as to the scientific community, to take all reasonable steps to preclude fraud as a possibility. By taking such steps they protect their subjects from any suspicions that might arise in this area.</p>
<p>The results would have become more interesting if they had been collected under double-blind conditions&mdash;that is, under conditions where Campbell, GD, and the experimenter, Schwartz, were all in ignorance of one another at the time of the reading. Schwartz calls the experiment &quot;single-blind&quot; because at the time of the reading (at least the first portions of it), GD did not know who the medium was and Campbell did not know who the sitter was and was separated from him by a thousand miles. Unfortunately, the experimenter, who <em>did</em> know the identity of the sitter as well as quite a bit of his personal history, was with Campbell at the time she was giving much of the reading. Psychical researchers have a long history of dismissing data collected with this weakness as non-evidential.</p>
<p>Probably the most serious weakness of this experiment is that its outcome relies entirely upon the uncorroborated judgments of the sitter GD. Again, Schwartz relies on plausibility arguments for the reliability and validity of GD&rsquo;s ratings of the reading. This is a major defect for many reasons. One is simple rater bias. Individuals can differ widely as to what they will or will not accept as valid for their personal situation. When Campbell says that she is hearing a name that sounds like <em>Talya, Tiya</em>, or <em>Tilya</em>, a sitter with a strict criterion might not accept this as referring to a friend whose name is <em>Tallia</em>. On the other hand, a sitter with a looser criterion and who is convinced that the medium is talking about his situation might accept Campbell&rsquo;s probe as referring to a friend with the name of <em>Tanya, Tina, Tilda, Tony, Dalia, Natalie</em>, or a variety of other possibilities. Schwartz may be right that it is unlikely that GD would misremember or misreport having a friend by the name of Tallia. However, if the outcome of this reading is so earth-shaking and scientifically revolutionary as he claims it is, I would think that he should at least make the effort to independently check on some of these facts.</p>
<p>This is especially true for &quot;facts&quot; that were unknown to GD at the time of the reading, but were later discovered by him to be true. For example, when GD called M&rsquo;s parents in Germany, how did the questioning take place? Did they speak in German or English? How well does GD speak German? How well do M&rsquo;s parents speak and understand English? Did GD ask the questions in a leading way? Certainly it would have been highly desirable for the experimenters to have independently communicated with M&rsquo;s parents. Indeed, it would have been better if they, rather than GD, did all the checking. Instead, everything depends upon GD. Such reliance on a single individual in such circumstances is called by psychologists &quot;the fallacy of personal validation.&quot;</p>
<h2>&quot;Replication&quot; of the Laurie Campbell/GD Reading in a Double-Blind Experiment</h2>
<p>What is required, of course, is a successful replication of these apparently spectacular results in a reading conducted under properly double-blind conditions. Indeed, this is precisely what Schwartz claims he has achieved. He and his colleagues finally conducted a double-blind experiment using Campbell as the medium and six sitters, one of whom was GD. During the readings, Campbell and the sitters had no contact and the two experimenters who were with Campbell were blind to the order in which the sitters were run. Later each sitter was sent two transcripts to judge. One was of the actual reading for that sitter and the other was of a reading given to another subject. The sitters were given no clues as to which was their actual reading. &quot;The question was, even under blind conditions, could the sitters determine which of the readings was theirs?&quot;</p>
<blockquote>The findings were breathtaking. Once again it was George Dalzell&rsquo;s [GD&rsquo;s] reading [that] stood out. . . . This provided incontrovertible evidence in response to the skeptics&rsquo; highly implausible argument against the single-blind study that the sitter would be biased in his or her ratings (for example, misrating his deceased loved ones&rsquo; names and relationships) because he knew the information was from his own reading. . . . The skeptics&rsquo; complaint becomes a completely and convincingly impossible argument in the case of the double-blind study. . . . It appeared to be the ultimate &quot;white crow&quot; design. . . . (p. 236)</blockquote>
<p>As these quotations reveal, Schwartz believes this double-blind experiment has put to rest all the skeptical arguments against his evidence. One of Schwartz&rsquo;s mantras in relation to his afterlife experiments is <em>let the data speak</em>. When I read the full report<sup><a href="#note_6">6</a></sup> of this &quot;ultimate 'white crow&rsquo; design,&quot; the data did speak loud and clear. However, the story the data told is just the opposite of the one that Professor Schwartz apparently hears.</p>
<p>The plan of the study was admirably simple. Campbell gave readings to the six sitters in an order that neither she nor the experimenter who was with her knew. In this way neither the medium nor the person in her presence was aware of who the sitter was at the time of the reading.<sup><a href="#note_7">7</a></sup> At the time of the reading, the sitter was physically separated from the medium. The medium gave her readings in Tucson, Arizona, while the sitters were in their homes in different parts of the country. Subsequently, each sitter was mailed two transcripts. One of the transcripts was the actual reading for that sitter and the other was from the reading of another sitter. Each sitter rated the two transcripts, not knowing which was the one actually intended for her or him, according to instructions provided by the researchers. The sitter first circled every item in the transcripts which they judged to be a &quot;dazzle shot.&quot; &quot;For you, a dazzle shot is some piece of information&mdash;whatever it is <em>to you</em>, that you experience as 'right on&rsquo; or 'wow&rsquo; or 'that&rsquo;s my family.&rsquo;&quot; Next, the sitter was instructed to go through the transcripts again and score each item as a hit, a miss, or unsure.</p>
<p>Finally, the sitter designated which of the two transcripts was the one that actually was intended for him or her.</p>
<p>The hypothesis was that if Campbell could truly access information from the sitter&rsquo;s departed acquaintances, this would show up on all three measures. In other words, the sitters would successfully pick their own reading from the two transcripts; they would record significantly more dazzle shots in their own transcripts as compared with the control transcripts; and they would find many more hits and fewer misses in the actual as opposed to the control transcript. <em>Each one of these three predictions failed</em>. Four of the sitters did correctly pick their own transcript, but this is consistent with the chance expectation of three successes. On the two more sensitive measures, there were no significant differences in number of dazzle shots or hits and misses.</p>
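<p>The claim that four correct picks out of six is consistent with chance can be checked with a short binomial calculation. The sketch below assumes each sitter independently has a 50/50 chance of picking his or her own transcript from the pair; the function name is hypothetical and the code is an illustration, not the analysis used in the report.</p>

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """Binomial probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_sitters = 6
expected = n_sitters * 0.5           # chance expectation: 3 correct picks
p_tail = prob_at_least(4, n_sitters)  # chance of 4 or more correct picks

print(expected)  # 3.0
print(p_tail)    # 0.34375
```

<p>Under these assumptions, pure guessing yields four or more correct picks roughly one time in three, far short of any conventional threshold for statistical significance.</p>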
<p>The authors admit that for the overall data, &quot;there was no apparent evidence of a reliable anomalous information retrieval effect.&quot; So how can they use these results to proclaim a &quot;breathtaking&quot; vindication of their previous findings? This is because, when they looked at the results separately for each sitter, they discovered that in the case of GD, who had been the star sitter in a previous experiment with Campbell, he not only successfully identified his own transcript but also found nine dazzle shots in this transcript and none in the control. The results for the hits and misses were equally striking. He found only a few misses in his own transcript and a large number of misses in the control. He found many hits in his own transcript and not a single one in the control transcript. Given this &quot;unanticipated replication,&quot; the authors hail the results as compelling support for their survival hypothesis. However, for anyone trained in statistical inference and experimental methodology, this will appear as just another blatant attempt to snatch victory out of the jaws of defeat. An accepted principle of research methodology is that the reporting of statistical significance from experimental findings derives meaning from the fact that the experimenter specifies <em>in advance</em> which comparisons he or she will test. If the experimenter plans to make many comparisons, then the criteria for statistical significance must be adjusted to take into account that the more comparisons that will be made the more chances there will be to find something &quot;significant&quot; just by chance. In the present case, it was obvious that the planned comparisons involved the overall differences between the ratings of the actual and the control transcripts. 
The authors do not indicate whether they intended to make adjustments for the fact that they were using three different measures, but, in any case, it does not matter because there were no meaningful differences on any of the three indicators.</p>
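<p>The point about adjusting for many comparisons can be made concrete. The sketch below assumes independent tests at a nominal 0.05 level and, purely for illustration, treats a search through six sitters on three measures as eighteen comparisons; that count is my assumption, not a figure from the report. It shows how quickly the chance of at least one spurious &quot;significant&quot; result grows, and the stricter per-test threshold that a Bonferroni correction would require.</p>

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m


def bonferroni_threshold(alpha, m):
    """Per-test significance level that keeps the overall rate near alpha."""
    return alpha / m


alpha = 0.05
comparisons = 6 * 3  # hypothetical: six sitters, three measures each

# Roughly a 60 percent chance of some spurious "hit" at the 0.05 level.
print(round(familywise_error_rate(alpha, comparisons), 2))  # 0.6

# Each individual test would need to clear this much stricter level.
print(bonferroni_threshold(alpha, comparisons))
```

<p>Bonferroni is only one of several accepted corrections, and a conservative one, but any of them would demand far stronger evidence from a single sitter picked out after the fact.</p>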
<p>Of course, these strictures do not preclude the investigators from noticing unexpected outcomes in their data. Such unplanned outcomes can serve as hypotheses for new experiments. When an experimenter finds unanticipated, but interesting, quirks in the data, he or she cannot draw conclusions until the surprise finding has been cross-validated with new data. The reason for this is simple. Any set of data that is reasonably complex will always, just by chance, display peculiarities. Some statisticians and methodologists do allow testing for unexpected findings by means of &quot;post hoc&quot; tests. Such tests require that the departures be much greater than those needed for planned comparisons before they can be declared &quot;significant.&quot; Furthermore, such post hoc tests on specific subparts of the data are typically licensed only when the overall tests are significant, which is not the case for the present situation.</p>
<p>So, by commonly accepted scientific practice, the experiment has failed to support the hypothesis it was planned to test. Furthermore, because nothing significant was found, the results do not warrant claiming a successful replication of previous findings. For scientific purposes, this is all that need be said. However, it may be edifying to discuss some additional reasons why the claim for a successful &quot;replication&quot; is highly suspect in the present case. Three of the six sitters for this experiment were selected just because Campbell had provided &quot;successful&quot; readings for them in previous experiments. They were included to see if she could do so again. For two of them, the authors admit that she failed. So it is only for GD that, in their view, she apparently succeeded.</p>
<p>Comparing the two readings that Campbell gave GD, I find little to support the claim that the second one replicates the apparent success of the first one. Although a full transcript of the first GD reading is still not available, what was included in the first report strongly suggests that the second reading cannot be considered to be aimed at the same individual for whom the first one was given. GD&rsquo;s major interest in mediumship is to establish contact with his deceased partner Michael. Campbell is given credit in the first reading for stating that there was a deceased friend named Michael and then later that he was the primary person for this sitter. The name Michael or a deceased partner does not come up in the second reading. Ironically, the name Michael does appear in the control reading. In the first reading Laurie Campbell mentions a strange name that sounded like <em>Talya, Tiya</em>, or <em>Tilya</em>. GD stated that he indeed had a friend (living) named Tallia. No such name appears in the second reading. Indeed, of the twenty names Campbell produced in the first reading only three come up in the second reading, and these are such common ones as <em>George, Robert</em> or <em>Bob</em>, and <em>Joe</em> or <em>Joseph</em>. In none of these three cases does she identify whether the person is living or dead or what relationship he has to GD. None of the &quot;specific&quot; facts that she apparently stated during the first reading come up in the second one.</p>
<p>Schwartz claims that rater bias could not have affected the ratings of this double-blind experiment. A look at GD&rsquo;s dazzle shots and his discussion of the hit and miss data suggests otherwise. His first dazzle shot is &quot;Bob or Robert.&quot; These names occur early in the reading in a statement that goes, &quot;And then I could feel like what I thought was like a divine presence and the feeling of a name Mary or Bob or Robert.&quot; This appears in a context with other names and other general statements, none of which even hints at a father. The second dazzle shot is &quot;George.&quot; Again this appears in a context with no hint that this could be referring to the sitter. Campbell states, &quot;I got like some names like a Lynn, or Kristie, a George.&quot; His third dazzle shot is the statement, &quot;I had the feeling of a presence of an Aunt.&quot; GD identifies this aunt as his aunt Alice, although Campbell does not provide the name Alice anywhere in the reading. I count at least twenty-seven names thrown out by Campbell during this second reading. Actually, she covers a much broader range of names because she typically casts a wide net with statements like: &quot;And an 'M&rsquo; name. More like a Margaret, or Martha, or Marie, something with an 'M.&rsquo;&quot; It is up to the sitter to find a match. As indicated by his dazzle shots, GD is strongly disposed to do so.</p>
<p>In his qualitative commentary, GD was obviously influenced in selecting one of the transcripts as his reading because it begins with the statement, &quot;I kept feeling the presence of a male.&quot; The control reading happens to begin with the statement, &quot;Now, um, to start with I felt like a woman&rsquo;s energy.&quot; GD wrote, &quot;I was impressed that the reading is gender specific and accurate. . . .&quot; Instead of assuming that Campbell was somehow conveying information to GD from his departed relatives, it is just as plausible to assume that once GD decided that the actual transcript was meant for him, then subjective validation took over and did the rest. There is, of course, a 50/50 chance that the actual reading is the one that GD will decide is meant for him. From then on, he would read that transcript as if it were truly describing his departed relatives and reject the other as not relevant.</p>
<p>This conjecture fits well with everything we know about subjective validation and the acceptance of personality sketches that one believes were meant for oneself. Is this far-fetched in GD&rsquo;s case? To me, it seems quite obvious just reading the transcript and looking at GD&rsquo;s ratings. The entire case for the reading&rsquo;s validity is based on the assumption that Campbell is describing GD&rsquo;s summer vacation home on Lake Erie in upstate New York. Given this assumption everything is then interpreted within this context. Of course, Campbell never states that she is describing a summer vacation home. It is GD who makes this connection. As just one of many examples of how GD is creative in making the reading fit his circumstances, he gives Campbell credit for having identified the color of their summer cottage which was painted yellow with white trim on the windows. Campbell does, at one point, say, &quot;And I kept getting colors of like yellow and white.&quot; This is in a context where she is talking about a woman who spends all her time in the kitchen. One could construe this as perhaps describing the interior colors of the kitchen, the woman&rsquo;s clothing, the old mixer she is described as using, among other possibilities. However, the statement is far removed from any mention of the exterior of the house as such. Earlier in the reading she mentions a white house. A little bit further on, she again mentions a house. She immediately follows this with &quot;And I kept seeing the colors of like grays and blues, but that looked real weathered.&quot; Obviously, if the house had been gray and blue, Campbell would have been given credit for a direct hit. GD manages to ignore this and gives Campbell credit for having correctly described the house as yellow and white.</p>
<p>Again, I suspect that Schwartz will disagree with my interpretation. After all, he has already gone on record that this study &quot;provided incontrovertible evidence in response to the skeptics&rsquo; highly implausible argument against the single-blind study that the sitter would be biased in his or her ratings (for example, misrating his deceased loved ones&rsquo; names and relationships) because he knew that this information was from his own reading.&quot; Nevertheless, the data are quite consistent with the possibility that all we have to do to account for his &quot;breathtaking&quot; findings is to assume that they are due to rater bias.</p>
<h2>Conclusions</h2>
<p>So what is the bottom line? <cite>The Afterlife Experiments</cite> presents a program of experiments, described in four reports, using mediums and sitters. The studies were methodologically defective in a number of important ways, not the least of which was that they were not double-blind. Despite these defects, the authors of the reports claim that their mediums were accessing information by paranormal means and that the application of Occam&rsquo;s Razor leads to the conclusion that the mediums are indeed in contact with the departed friends and relatives of the sitters. Schwartz&rsquo;s demand that the skeptics provide an alternative explanation for their results is clearly unwarranted because of the lack of scientifically acceptable evidence. A fifth report describes a study that was designed to be a true double-blind experiment. The outcome, by any accepted statistical and methodological standard, failed to support the hypothesis of the survival of consciousness. Yet the experimenters offer the results as a &quot;breathtaking&quot; validation of their claims about the existence of the afterlife. This is another unfortunate example of trying to snatch victory from the jaws of defeat.</p>
<h2>Notes</h2>
<ul>
<li><a name="note1"></a>
<p>Fans of Martin Gardner will recognize the similarity of this title to that of Martin&rsquo;s book <cite>How Not to Test a Psychic</cite> (1989, Prometheus Books). I thank Martin Gardner for agreeing to let me adapt his title for this review.</p></li>
<li><a name="note2"></a>
<p>The principle usually <em>attributed</em> to William of Occam is typically stated as</p>
<p>&ldquo;Entities are not to be multiplied beyond necessity.&rdquo;</p>
<p>This statement, as such, cannot be found in the extant writings of William. The principle was known before William was born. However, he did write many different statements that are consistent with the principle such as,</p>
<p>&ldquo;It is vain to do with more what can be done with fewer.&rdquo;</p>
<p>[Read more about <a href="/sb/9409/closeshave.html">Occam&rsquo;s Razor</a> in the <cite>Skeptical Briefs</cite> newsletter.]</p></li>
<li><a name="note3"></a>
<p>Wiseman, R., and C. O&rsquo;Keeffe. 2001. <a href="/si/2001-11/mediums.html">Accuracy and replicability of anomalous after-death communication across highly skilled mediums: A critique</a>. <a href="http://www.spr.ac.uk/expcms/index.php?section=1">The Paranormal Review</a>, 19: 3&ndash;6. (Also in the <cite>Skeptical Inquirer</cite>, November/December 2001.)</p></li>
<li><a name="note4"></a>
<p>Schwartz, G.E. 2001. Accuracy and replicability of anomalous after-death communication across highly skilled mediums: A call for balanced evidence-based skepticism. <a href="http://www.spr.ac.uk/expcms/index.php?section=1">The Paranormal Review</a>: 20.</p></li>
<li><a name="note5"></a>
<p>For discussion of this concept and for a very striking illustration of subjective validation in operation see Marks, D. (2000, second edition), <cite>The Psychology of the Psychic</cite>. Amherst, N.Y.: Prometheus Books.</p></li>
<li><a name="note6"></a>
<p>Schwartz, G.E., S. Geoffrion, J. Shamini, S. Lewis, and L. Russek. (Submitted to the <a href="http://www.spr.ac.uk/expcms/index.php?section=1">Journal of the Society for Psychical Research</a>.) Evidence of anomalous information retrieval between two research mediums: Replication in a double-blind design. (I obtained a copy of this report from Professor Schwartz in August 2001.)</p></li>
<li><a name="note7"></a>
<p>Unfortunately, the double-blind procedure was not ideal. The research coordinator, who was aware of the sitter&rsquo;s identity, phoned Laurie Campbell and the sitter just before the reading. In this way, the medium had contact with someone who was aware of the sitter&rsquo;s identity just prior to the reading.</p></li>

</ul>




      
      ]]></description>
    </item>

    <item>
      <title>Proper Criticism</title>
      <pubDate>Sun, 01 Jul 2001 13:22:00 EDT</pubDate>
	<author>info@csicop.org (<![CDATA[Ray Hyman]]>)</author>
      <link>http://www.csicop.org/si/show/proper_criticism</link>
      <guid>http://www.csicop.org/si/show/proper_criticism</guid>
      <description><![CDATA[
        



			<p class="intro" style="font-weight:normal;">This brief guide by psychology professor Ray Hyman, a member of the CSICOP Executive Council from the beginning, has for many years been used by Skeptical Inquirer&rsquo;s editorial staff and widely distributed to authors and others. It was printed some years ago in the CSICOP newsletter <a href="/sb/">Skeptical Briefs</a>, and it appears in Hyman&rsquo;s book <a href="/q/book/0879755040"><cite>The Elusive Quarry</cite></a>, but it has never actually been published in SI. We thought our 25th anniversary would be a good time to do so.<br /><br />
<strong>&mdash;The Editors</strong></p>
<p>Since the founding of CSICOP in 1976, and with the growing number of localized skeptical groups, the skeptic finds more ways to state his or her case. The broadcast and print media, along with other forums, provide more opportunities for us to be heard. For some of these occasions, we have the luxury of carefully planning and crafting our response, but most of the time we have to formulate our response on the spot. But regardless of the circumstance, the critic&rsquo;s task, if it is to be carried out properly, is both challenging and loaded with unanticipated hazards.</p>
<p>Many well-intentioned critics have jumped into the fray without carefully thinking through the various implications of their statements. They have sometimes displayed more emotion than logic, made sweeping charges beyond what they can reasonably support, failed to adequately document their assertions, and, in general, failed to do the homework necessary to make their challenges credible.</p>
<p>Such ill-considered criticism can be counterproductive for the cause of serious skepticism. The author of such criticism may fail to achieve the desired effect, may lose credibility, and may even become vulnerable to lawsuits. But the unfavorable effects have consequences beyond the individual critics, and the entire cause of skepticism suffers as a result. Even when the individual critic takes pains to assert that he or she is expressing his or her own personal opinion, the public associates the assertions with all critics.</p>
<p>During CSICOP&rsquo;s first decade of existence, members of the Executive Council often found themselves devoting most of their available time to damage control, precipitated by the careless remarks of fellow skeptics, instead of to the common cause of explaining the skeptical agenda.</p>
<p>Unfortunately, at this time, there are no courses on the proper way to criticize paranormal claims. So far as I know, no manuals or books or rules are currently available to guide us. Until such courses and guide books come into being, what can we do to ensure that our criticisms are both effective and responsible?</p>
<p>I would be irresponsible if I told you I had an easy solution. The problem is complicated and there are no quick fixes. But I do believe we all could improve our contributions to responsible criticism by keeping a few principles always in mind.</p>
<p>We can make enormous improvements in our collective and individual efforts by simply trying to adhere to those standards that we profess to admire and that we believe that many peddlers of the paranormal violate. If we envision ourselves as the champions of rationality, science, and objectivity, then we ought to display these very same qualities in our criticism. Just by trying to speak and write in the spirit of precision, science, logic, and rationality (those attributes we supposedly admire), we would raise the quality of our critiques by at least one order of magnitude.</p>
<p>The failure to consistently live up to these standards exposes us to a number of hazards. We can find ourselves going beyond the facts at hand. We may fail to communicate exactly what we intended. We can confuse the public about what skeptics are trying to achieve. We can unwittingly put paranormal proponents in the position of the underdogs and create sympathy for them. And, as I already mentioned, we can make the task much more difficult for other skeptics.</p>
<p>What, then, can skeptics do to upgrade the quality of their criticism? What follows are just a few suggestions. It is hoped they will stimulate further thought and discussion.

<ol>
<li>
<h4>Be prepared.</h4>
<p>Good criticism is a skill that requires practice, work, and level-headedness. Your response to a sudden challenge is much more likely to be appropriate if you have already anticipated similar challenges. Try to prepare in advance effective and short answers to those questions you are most likely to be asked. Be ready to answer why skeptical activity is important, why people should listen to your views, why false beliefs can be harmful, and the many similar questions that invariably are raised. A useful project would be to compile a list of the most frequently occurring questions along with possible answers.</p>
<p>Whenever possible try your ideas out on friends and &ldquo;enemies&rdquo; before offering them in the public arena. An effective exercise is to rehearse your arguments with fellow skeptics. Some of you can take the role of the psychic claimants while others play the role of critics. And, for more general preparation, read books on critical thinking, effective writing, and argumentation.</p>
</li>
<li>
<h4>Clarify your objectives.</h4>
<p>Before you try to cope with a paranormal claim, ask yourself what you are trying to accomplish. Are you trying to release pent-up resentment? Are you trying to belittle your opponent? Are you trying to gain publicity for your viewpoint? Do you want to demonstrate that the claim lacks reasonable justification? Do you hope to educate the public about what constitutes adequate evidence? Often our objectives, upon examination, turn out to be mixed. And, especially when we act impulsively, some of our objectives conflict with one another.</p>
<p>The difference between short-term and long-term objectives can be especially important. Most skeptics, I believe, would agree that our long-term goal is to educate the public so that it can more effectively cope with various claims. Sometimes this long-range goal is sacrificed because of the desire to expose or debunk a current claim.</p>
<p>Part of clarifying our objectives is to decide who our audience is. Hard-nosed, strident attacks on paranormal claims rarely change opinions, but they do stroke the egos of those who are already skeptics. Arguments that may persuade the readers of the National Enquirer may offend academics and important opinion-makers.</p>
<p>Try to make it clear that you are attacking the claim and not the claimant. Avoid, at all costs, creating the impression that you are trying to interfere with someone&rsquo;s civil liberties. Do not try to get someone fired from his or her job. Do not try to have courses dropped or otherwise be put in the position of advocating censorship. Being for rationality and reason should not force us into the position of seeming to be against academic freedom and civil liberties.</p>
</li>
<li>
<h4>Do your homework.</h4>
<p>Again, this goes hand in hand with the advice about being prepared. Whenever possible, you should not try to counter a specific paranormal claim without getting as many of the relevant facts as possible. Along the way, you should carefully document your sources. Do not depend upon a report in the media either for what is being claimed or for facts relevant to the claim. Try to get the specifics of the claim directly from the claimant.</p>
</li>
<li>
<h4>Do not go beyond your level of competence.</h4>
<p>No one, especially in our times, can credibly claim to be an expert on all subjects. Whenever possible, you should consult appropriate experts. We, understandably, are highly critical of paranormal claimants who make assertions that are obviously beyond their competence. We should be just as demanding on ourselves. A critic&rsquo;s worst sin is to go beyond the facts and the available evidence.</p>
<p>In this regard, always ask yourself if you really have something to say. Sometimes it is better to remain silent than to jump into an argument that involves aspects that are beyond your present competence. When it is appropriate, do not be afraid to say, &ldquo;I don't know.&rdquo;</p>
</li>
<li>
<h4>Let the facts speak for themselves.</h4>
<p>If you have done your homework and have collected an adequate supply of facts, the audience rarely will need your help in reaching an appropriate conclusion. Indeed, your case is made much stronger if the audience is allowed to draw its own conclusions from the facts. Say that Madame X claims to have psychically located Mrs. A&rsquo;s missing daughter and you have obtained a statement from the police to the effect that her contributions did not help. Under these circumstances it can be counterproductive to assert that Madame X lied about her contribution or that her claim was &ldquo;fraudulent.&rdquo; For one thing, Madame X may sincerely, if mistakenly, believe that her contributions did in fact help. In addition, some listeners may be offended by the tone of the criticism and become sympathetic to Madame X. However, if you simply report what Madame X claimed along with the response of the police, you not only are sticking to the facts, but your listeners will more likely come to the appropriate conclusion.</p>
</li>
<li>
<h4>Be precise.</h4>
<p>Good criticism requires precision and care in the use of language. Because, in challenging psychic claims, we are appealing to objectivity and fairness, we have a special obligation to be as honest and accurate in our own statements as possible. We should take special pains to avoid making assertions about paranormal claims that cannot be backed up with hard evidence. We should be especially careful in this regard when being interviewed by the media. Every effort should be made to ensure that the media understand precisely what we are and are not saying.</p>
</li>
<li>
<h4>Use the principle of charity.</h4>
<p>I know that many of my fellow critics will find this principle to be unpalatable. To some, the paranormalists are the &ldquo;enemy,&rdquo; and it seems inconsistent to lean over backward to give them the benefit of the doubt. But being charitable to paranormal claims is simply the other side of being honest and fair. The principle of charity implies that, whenever there is doubt or ambiguity about a paranormal claim, we should try to resolve the ambiguity in favor of the claimant until we acquire strong reasons for not doing so. In this respect, we should carefully distinguish between being wrong and being dishonest.</p>
<p>We often can challenge the accuracy or validity of a given paranormal claim. But rarely are we in a position to know if the claimant is deliberately lying or is self-deceived. Furthermore, we often have a choice in how to interpret or represent an opponent&rsquo;s arguments. The principle tells us to convey the opponent&rsquo;s position in a fair, objective, and non-emotional manner.</p>
</li>
<li>
<h4>Avoid loaded words and sensationalism.</h4>
<p>All these principles are interrelated. The ones previously stated imply that we should avoid using loaded and prejudicial words in our criticisms. If the proponents happen to resort to emotionally laden terms and sensationalism, we should avoid stooping to their level. We should not respond in kind.</p>
<p>This is not a matter of simply turning the other cheek. We want to gain credibility for our cause. In the short run, emotional charges and sensationalistic challenges might garner quick publicity. But most of us see our mission as a long-run effort. We would like to persuade the media and the public that we have a serious and important message to get across. And we would like to earn their trust as a credible and reliable source. Such a task requires always keeping in mind the scientific principles and standards of rationality and integrity that we would like to make universal.</p>
</li>
</ol></p>




      
      ]]></description>
    </item>

    <item>
      <title>The Evidence for Psychic Functioning: Claims vs. Reality</title>
      <pubDate>Fri, 01 Mar 1996 13:19:00 EDT</pubDate>
	<author>info@csicop.org (<![CDATA[Ray Hyman]]>)</author>
      <link>http://www.csicop.org/si/show/evidence_for_psychic_functioning_claims_vs._reality</link>
      <guid>http://www.csicop.org/si/show/evidence_for_psychic_functioning_claims_vs._reality</guid>
      <description><![CDATA[
        



			<p>The recent media frenzy over the Stargate report violated the truth. Sober scientific assessment has little hope of winning in the public forum when pitted against unsubstantiated and unchallenged claims of &ldquo;psychics&rdquo; and psychic researchers &mdash; especially when the claimants shamelessly indulge in hyperbole. While this situation may be depressing, it is not unexpected. The proponents of the paranormal have seized an opportunity to achieve by propaganda what they have failed to achieve through science.</p>
<p>Most of these purveyors of psychic myths should not be taken seriously. However, when one of the persons making extreme claims is Jessica Utts, who is a professor of statistics at the University of California at Davis, this is another matter. Utts has impressive credentials and she marshals the evidence for her case in an effective way. So it is important to look at the basis for what I believe are extreme claims, even for a parapsychologist. Here is what Utts writes in her report on the Stargate program:</p>
<blockquote><p>Using the standards applied to any other area of science, it is concluded that psychic functioning has been well established. The statistical results of the studies examined are far beyond what is expected by chance. Arguments that these results could be due to methodological flaws in the experiments are soundly refuted. Effects of similar magnitude to those found in government-sponsored research at SRI [Stanford Research Institute] and SAIC [Science Applications International Corporation] have been replicated at a number of laboratories across the world. Such consistency cannot be readily explained by claims of flaws or fraud. . . . [Psychic functioning] is reliable enough to be replicated in properly conducted experiments, with sufficient trials to achieve the long-run statistical results needed for replicability. . . . Precognition, in which the answer is known to no one until a future time, appears to work quite well. . . . There is little benefit to continuing experiments designed to offer proof, since there is little more to be offered to anyone who does not accept the current collection of data.</p></blockquote>
<p>For what it is worth, I happen to be one of those &ldquo;who does not accept the current collection of data&rdquo; as proving psychic functioning. Indeed, I do not believe that &ldquo;the current collection of data&rdquo; justifies the conclusion that an anomaly of any sort has been demonstrated, let alone a paranormal anomaly. Although Utts and I &mdash; in our capacities as coevaluators of the Stargate project &mdash; evaluated the same set of data, we came to very different conclusions. If Utts&rsquo;s conclusion is correct, then the fundamental principles that have so successfully guided the progress of science from the days of Galileo and Newton to the present must be drastically revised. Neither relativity theory nor quantum mechanics in their present versions can cope with a world that harbors the psychic phenomena so boldly proclaimed by Utts and her parapsychological colleagues.</p>
<p>So, it is worth looking at the evidence that Utts uses to buttress her case. Unfortunately, many of the issues that this evidence raises are technical or require long and tedious refutations. This is not the place to develop this lengthy rebuttal. Instead, I will briefly list the sources of Utts&rsquo;s evidence and try to provide at least one or two simple reasons why they do not, either singly or taken together, justify her conclusions. As I understand it, Utts supports her conclusion with the following sources of evidence:</p>
<h2>1. Meta-analyses of Previous Parapsychological Experiments</h2>
<p>In a meta-analysis, an investigator uses statistical tools to pool the data from a series of similar experiments published over a period of time that may involve several different investigators and laboratories. Although some or many of the individual experiments might have yielded weak or nonsignificant results, the pooled data can be highly significant from a statistical viewpoint. In addition to getting an overall measure of significance, the meta-analyses typically also grade each study for quality on one or more dimensions. The idea is to see if the successful outcomes are correlated with poor quality. If so, this counts against the evidence for paranormal functioning. If not, then this is proclaimed as evidence that the successful outcomes were not due to flaws.</p>
<p>In the four major meta-analyses of previous parapsychological research, the pooled data sets produced astronomically significant results while the correlation between successful outcome and rated quality of the experiments was essentially zero.</p>
<p>Much can be written at this point. The major point I would make, however, is that drawing conclusions from meta-analytic studies is like having your cake and eating it too. The same data are being used to generate and test a hypothesis. The proper use of meta-analysis is to generate hypotheses, which then must be independently tested on new data. As far as I know, this has yet to be done. The correlation between quality and outcome also must be suspect because the ratings are not done blindly.</p>
<p>As far as I can tell, I was the first person to do a meta-analysis on parapsychological data. I did a meta-analysis of the original ganzfeld experiments as part of my critique of those experiments. My analysis demonstrated that certain flaws, especially quality of randomization, did correlate with outcome. Successful outcomes correlated with inadequate methodology. In his reply to my critique, Charles Honorton did his own meta-analysis of the same data. He too scored for flaws, but he devised scoring schemes different from mine. In his analysis, his quality ratings did not correlate with outcome. This came about because, in part, Honorton found more flaws in unsuccessful experiments than I did. On the other hand, I found more flaws in successful experiments than Honorton did. Presumably, both Honorton and I believed we were rating quality in an objective and unbiased way. Yet, both of us ended up with results that matched our preconceptions.</p>
<p>So far, other than my meta-analysis, all the meta-analyses evaluating quality and outcome have been carried out by parapsychologists. We might reasonably expect that the findings will differ with skeptics as raters.</p>
<p>These are just two, but very crucial, reasons why the meta-analyses conducted so far on parapsychological data cannot be used as evidence for psi.</p>
<h2>2. The Original Ganzfeld Experiments</h2>
<p>These consisted of 42 experiments (by Honorton&rsquo;s count) of which 55 percent had been claimed as producing significant results in favor of ESP. My meta-analysis and evaluation of these experiments showed that this database did not justify concluding that ESP was demonstrated. Honorton&rsquo;s meta-analysis and rebuttal suggest otherwise. Utts naturally relies on Honorton&rsquo;s meta-analysis and ignores mine. In our joint paper, both Honorton and I agreed that there were sufficient problems with this original database that nothing could be concluded until further replications, conducted according to specified criteria, appeared.</p>
<h2>3. The Autoganzfeld Experiments</h2>
<p>This series of experiments, conducted over a period of six years, is so named because the collection of data was partially automated. When this set of experiments was first published in the Journal of Parapsychology in 1990, it was presented as a successful replication of the original ganzfeld experiments. Moreover, these experiments were said to have been conducted according to the criteria set out by Honorton and me. This indeed seemed to be the case with the strange exception of the procedure for randomizing targets at presentation and judging. Even in writing our joint paper, Honorton argued with me that careful randomization was not necessary in the ganzfeld experiments because each subject appears only once. I disagreed with Honorton, but even by his own reasoning, randomization is not as important if you believe that the subject is the sole source of the final judgment. But this was blatantly not the case in the autoganzfeld experiments. The experimenter, who was not so well shielded from the sender as the subject, interacted with the subject during the judging process. Indeed, during half of the trials the experimenter deliberately prompted the subject during the judging procedure. This means that the judgments from trial to trial were not strictly independent.</p>
<p>However, from the original published report, I had little reason to question the methodology of these experiments. What I did question was the claim that they were consistent with the original ganzfeld experiments. I pointed out a number of ways that the two outcomes were inconsistent. Not until I was asked to write a response to a new presentation of these experiments in the January 1994 issue of the Psychological Bulletin did I get an opportunity to scrutinize the raw data. Unfortunately, I did not get all of the data, especially the portion that I needed to make direct tests of the randomizing procedures. But my analyses of what I did get uncovered some peculiar and strong patterns in the data. All of the significant hitting was done on the second or later appearance of a target. If we examine the guesses against just the first occurrences of targets, the result is consistent with chance. Moreover, the hit rate rose systematically with each additional occurrence of a target. This suggests to me a possible flaw. Daryl Bem, the coauthor with Honorton of the Psychological Bulletin paper, responded that it might reveal another peculiarity of psychic phenomena. The reason why my finding is of concern is that all the targets were on videotape and played on tape players during presentation. At the very least, the peculiar pattern I identified suggests that we need to require that when targets and decoys are presented to the subjects for judging, they all have been run through the machine the exact same number of times. Otherwise there might be nonparanormal reasons why one of the video clips appears different to the subjects.</p>
<p>Subsequent to my response, I have learned about other possible problems with the autoganzfeld experiments. The point of this is to show that it takes time and critical scrutiny to realize that what at first seems like an airtight series of experiments has a variety of possible weaknesses. I concluded, and do so even more strongly now, that the autoganzfeld experiments constitute neither a successful replication of the original ganzfeld experiments nor a sufficient body of data to conclude that ESP has finally been demonstrated. This new set of experiments needs independent replication with tighter controls.</p>
<h2>4. Apparent Replications of the Autoganzfeld Experiments</h2>
<p>Utts points to some apparent replications of the ganzfeld experiments that have been reported at parapsychological meetings. The major one is a direct attempt to replicate the autoganzfeld experiments with better controls, done at the University of Edinburgh. The reported results were apparently significant but were due to just one of the three experimenters. The two experienced experimenters produced only chance hitting. There are some inconsistencies in these unpublished reports. Utts points to three different replications that were apparently successful. I have heard of at least two large-scale replications that were unsuccessful. None of these replications, however, has been reported in a refereed journal and none has had the opportunity to be critically scrutinized. So we cannot count these one way or the other at this time until we know the details.</p>
<h2>5. The SAIC Experiments</h2>
<p>Utts and I were hired as the evaluation panel to assess the results of 20 years of previously classified research on remote viewing and related ESP phenomena. In the time available to us, it was impossible to scrutinize carefully all of the documents generated by this program. Instead, we focused our efforts on evaluating the ten studies done at Science Applications International Corporation (SAIC) during the early 1990s. These were selected, in consultation with the principal investigator, as representing the best experiments in the set. These ten experiments included two that examined physiological correlates of ESP. The results were negative. Another study found a correlation between when a subject was being observed (via remote camera) and galvanic skin reactions. The remaining studies, in one way or another, dealt with various target and other factors that might influence remote viewing ability. In these studies the same set of viewers produced descriptions that were successfully matched against the correct target consistently better than chance (with some striking exceptions).</p>
<p>Neither Utts nor I had the time or resources to fully scrutinize the laboratory procedures or data from these experiments. Instead, we relied on what we could glean from reading the technical reports. Two of the experiments had recently been published in the Journal of Parapsychology. The difficulty here is that these newly declassified experiments have not been in the public arena for a sufficient time to have been carefully and critically scrutinized. As with the original ganzfeld database and the autoganzfeld experiments, it takes careful scrutiny and a period of a few years to find the problems of newly published or revealed parapsychological experiments. One obvious problem with the SAIC experiments is that the remote viewing results were all judged by one person &mdash; the director of the program. I believe that Utts agrees with me that we have to withhold judgments on these experiments until it can be shown that independent judges can produce the same results. Beyond this, we would require, as with any other set of newly designed experiments, replication by independent laboratories before we decide that the reported outcomes can be trusted.</p>
<h2>6. Prima Facie Evidence</h2>
<p>Utts and other parapsychologists also talk about prima facie evidence in connection with the operational stories of the psychics (or remote viewers) employed by the government. Everyone agrees there is no way to evaluate the accounts of these attempts to use input from remote viewers in intelligence activities. This is because the data were collected in haphazard and nonsystematic ways. No consistent records are available; no attempt was made to interrogate the viewers in nonsuggestive ways; no systematic attempts were made at the time to evaluate the results; and so on.</p>
<p>The attempts to evaluate these operational uses after the fact are included in the American Institutes for Research (A.I.R.) report and they do not justify concluding anything about the effectiveness or reality of remote viewing. Some stories, especially those involving cases that occurred long ago and/or that are beyond actual verification, have been put forth as evidence of apparently striking hits. The claim is that these remote viewers are right on &mdash; are actually getting true psychic signals &mdash; about 20 percent of the time.</p>
<p>Call it prima facie or whatever, none of this should be considered as evidence for anything. In situations where we do have some control comparisons, we find the same degree of hitting for wrong targets (when the judge does not realize it is the wrong target) as for the correct targets. A sobering example of this with respect to remote viewing can be found in David Marks and Richard Kammann&rsquo;s book The Psychology of the Psychic (Prometheus Books, Amherst, New York, 1980).</p>
<p>Psychologists, such as myself, who study subjective validation find nothing striking or surprising in the reported matching of reports against targets in the Stargate data. The overwhelming amount of data generated by the viewers is vague, general, and way off target. The few apparent hits are just what we would expect if nothing other than reasonable guessing and subjective validation are operating.</p>
<h2>7. Consistency Among the Different Sources</h2>
<p>Utts points to consistencies in effect sizes across the studies. More important, she points out several patterns such as bigger effect sizes with experienced subjects, etc. I do not have time or space to detail all the problems with these apparent consistencies. Many of them happen to relate to the fact that the average effect sizes in these cases are arbitrary combinations of heterogeneous sources. Moreover, where Utts detects consistencies, I find inconsistencies. I have documented some of these elsewhere; I will do so again in the near future.</p>
<h2>Conclusions</h2>
<p>When we examine the basis of Utts&rsquo;s strong claim for the existence of psi, we find that it relies on a handful of experiments that have been shown to have serious weaknesses after undergoing careful scrutiny, and another handful of experiments that have yet to undergo scrutiny or be successfully replicated. What seems clear is that the scientific community is not going to abandon its fundamental ideas about causality, time, and other principles on the basis of a handful of experiments whose findings have yet to be shown to be replicable and lawful.</p>
<p>Utts does assert that the findings from parapsychological experiments can be replicated with well-controlled experiments given adequate resources. But this is a hope or promise. Before we abandon relativity and quantum mechanics in their current formulations, we will require more than a promissory note. We will want, as is the case in other areas of science, solid evidence that these findings can, indeed, be produced under specified conditions.</p>
<p>Again, I do not have time to develop another part of this story. Even if Utts and her colleagues are correct and we were to find that we could reproduce the findings under specified conditions, this would still be a far cry from concluding that psychic functioning has been demonstrated. This is because the current claim is based entirely upon a negative outcome &mdash; the sole basis for arguing for ESP is that extra-chance results can be obtained that apparently cannot be explained by normal means. But an infinite variety of normal possibilities exist and it is not clear that one can control for all of them in a single experiment. You need a positive theory to guide you as to what needs to be controlled, and what can be ignored. Parapsychologists have not come close to this as yet.</p>




      
      ]]></description>
    </item>

    
    </channel>
</rss>