How to Think Critically V – Daylight Atheism

by Adam Lee on December 29, 2007

Double-Blind Tests

In a post back in October, I dissected the claims of a spam e-mail that landed in my inbox to promote the “Detox Box”, an expensive piece of snake oil claimed to be able to cure any disease through the power of pseudoscience. As is usual in these matters, an offended true believer showed up in the comments to proclaim her faith in the device:

The Detox Box works for me and I wouldn’t go without it. It is my personal results. I don’t need a double blind study to know that it works.

In fact, as I and others pointed out, you do need a double-blind test to know that this or any other medical treatment works. Personal testimony is not sufficient; it is among the least reliable kinds of evidence, and with good reason.

It should be no surprise that people often interpret the world according to their expectations. When we expect something to happen, it often does happen. For example, when people undergo a treatment which they expect to help with an illness, most do report feeling better afterwards. Along the same lines, doctors who are administering a treatment which they expect to help their patient will tend to notice signs of improvement thereafter. This is just the well-known placebo effect, but what’s not as well-known is just how powerful the placebo effect can be. For a dramatic example of the mind’s power to influence the body, I give you pseudocyesis – psychosomatic pregnancy.

“Every sign and symptom of pregnancy has been recorded in these patients except for three: You don’t hear heart tones from the fetus, you don’t see the fetus on ultrasound, and you don’t get a delivery,” Dr. Paulman said.

“Every sign and symptom of pregnancy” is literally accurate. Women with pseudocyesis see their bellies swell, exactly as you’d expect in a late-term pregnancy. Their breasts become enlarged and produce milk; their menstrual cycle stops; their hormone levels are elevated; they suffer food cravings and morning sickness. And when the time arrives, they have contractions and labor pains. So convincing are these signs that pseudocyesis often fools even experienced obstetricians, and sometimes goes undetected right up until the moment of labor, when it finally becomes obvious there’s nothing to be delivered. Some women have undergone unnecessary Caesarean sections because of it. (The usual explanation is that it occurs in women who desperately want to conceive children, but are unable to for some reason. It’s rarer these days thanks to ultrasound and other fetal imaging technologies which can provide the woman with incontrovertible proof that she isn’t pregnant.)

Human beings are fallible; of that there can be no doubt. When studying things that can’t easily be quantified, we’re all too susceptible to bias, both for and against. Even if we have the most honest intentions in the world, unconscious presuppositions may subtly skew our results without our even being aware of it. (58% of corporate CEOs are six feet tall or more; in the general population it’s only 15%. Apparently most people subconsciously think that tall people make better leaders.)

How do we compensate for our own biases? We can do it with a scientific technique that’s considered the gold standard in medicine and other fields of diagnosis: the double-blind test.

The double-blind test is fundamentally a very simple concept. To conduct a double-blind test, you start by gathering a group of volunteers. Each volunteer is assigned – randomly, which is important – to either the experimental group or the control group. Those in the experimental group receive the treatment that is being tested, while those in the control group receive a placebo that is identical to all outward appearances, but completely inert. The experimenters rate the progress of both groups and, at the conclusion of the test, crunch the numbers to determine whether the experimental group shows a statistically significant difference from the control group.
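For readers who like to see the mechanics, here is a minimal sketch of the two bookkeeping steps just described: random assignment, and checking whether the difference between groups is bigger than chance would produce. The function names `assign_groups` and `permutation_p_value` are my own illustrative choices, and the permutation test is just one simple way to "crunch the numbers"; real trials choose their statistics in advance and often use other methods.

```python
import random
from statistics import mean

def assign_groups(volunteers, rng):
    """Randomly split volunteers into experimental and control groups."""
    shuffled = list(volunteers)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def permutation_p_value(treated, control, rng, rounds=10_000):
    """Two-sided permutation test on the difference in group means.

    Repeatedly reshuffles the pooled scores and counts how often a random
    split differs at least as much as the split actually observed.
    """
    observed = abs(mean(treated) - mean(control))
    pooled = list(treated) + list(control)
    n = len(treated)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n]) - mean(pooled[n:])) >= observed:
            hits += 1
    return hits / rounds
```

A small p-value means a random relabeling of the volunteers would rarely produce a gap as large as the one observed, which is roughly what "statistically significant" is shorthand for.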

The crucial point of a double-blind test is that the volunteers don’t know whether they’re in the experimental or the control group, and neither do the scientists who are administering the treatment and observing the results. Generally, during the experiment, that knowledge is held by a third party who does not otherwise participate. Once the experiment is done and the results are in, the blinding is broken, and only then do the experimenters sift the data to determine whether there’s a significant difference between the two groups. (Sometimes this has to be modified, such as in double-blind studies of surgical treatments that compare the real operation with “sham surgery” which makes an incision but nothing more. In this case, one set of doctors performs the surgery and a different set, blinded to who got which treatment, evaluates the results.)
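The blinding itself can be sketched in a few lines. In this hypothetical example, each volunteer gets an opaque kit code; the volunteers and the experimenters see only the codes, while the key mapping codes to "treatment" or "placebo" stays with the third party until the end. The names `blind_allocation`, `unblind` and the `KIT-` label format are assumptions made for illustration, not any standard protocol.

```python
import random

def blind_allocation(volunteers, rng):
    """Return (public kit codes, secret key held by a third party)."""
    ids = list(volunteers)
    half = len(ids) // 2
    arms = ["treatment"] * half + ["placebo"] * (len(ids) - half)
    rng.shuffle(arms)  # random assignment
    codes = rng.sample(range(1000, 10000), len(ids))  # unique opaque numbers
    public = {v: f"KIT-{c}" for v, c in zip(ids, codes)}
    secret_key = {f"KIT-{c}": arm for c, arm in zip(codes, arms)}
    return public, secret_key

def unblind(outcomes_by_kit, secret_key):
    """Once all results are recorded, sort the outcomes by true group."""
    groups = {"treatment": [], "placebo": []}
    for kit, score in outcomes_by_kit.items():
        groups[secret_key[kit]].append(score)
    return groups
```

The design point is that `public` carries no hint of who got what, so neither a volunteer's hopes nor an experimenter's can leak into the recorded outcomes; `unblind` is called only after the data collection is closed.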

The point of a double-blind test is that it is designed so that no one’s beliefs, biases or expectations can influence the results. Since volunteers are assigned randomly to one of the two groups, there’s no possibility that, say, an unscrupulous medical researcher could assign the sicker volunteers to the control group and the healthier ones to the experimental group to make a treatment appear more effective. Since the volunteers don’t know which group they’re in, their expectations about what will happen can’t cause a placebo effect that would throw off the results. (More accurately, both the experimental and control groups should experience the placebo effect to the same degree. Thus, that effect can be statistically subtracted out, to see whether any residual effect is left over that could only be attributed to the efficacy of the treatment.) And since the researchers don’t know either, their expectations about the efficacy of their own treatment can’t skew their data collection.
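To make the "subtracting out" idea concrete, here is a toy calculation. The figures are invented purely for illustration and come from no real study, and `residual_effect` is a hypothetical helper name.

```python
def residual_effect(treated_rate, control_rate):
    """Improvement beyond what the placebo group also experienced."""
    return treated_rate - control_rate

# Hypothetical figures: 45 of 100 treated volunteers improve,
# versus 30 of 100 volunteers who received the placebo.
treated_rate = 45 / 100
control_rate = 30 / 100
print(round(residual_effect(treated_rate, control_rate), 2))  # prints 0.15
```

In this made-up case, 30 percentage points of the improvement would have happened anyway, and only the remaining 15 could be credited to the treatment, provided that difference also passes a significance test.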

In experiments that are not double-blind, the results are often affected by the participants’ or the researchers’ advance knowledge. For example, in the classic police lineup technique for identifying suspects, there are concerns that the detective present may purposefully or inadvertently prompt the witness. Double-blind lineups, where the detective overseeing does not know who the suspect is, have been shown to produce fewer false identifications.

A famous case where double-blind experiments revealed the truth was Clever Hans, the horse who could supposedly solve complex mathematics (giving the answers by tapping his hoof). In reality, he was responding to subtle, unconscious cues from the observers who were present and who knew the answer to the question. This fact was eventually exposed by the psychologist Oskar Pfungst, who conducted a double-blind test of the horse’s abilities by asking questions whose answers none of the onlookers knew.

When performed properly, the double-blind test is an invaluable tool for weeding out pseudoscience and falsehood of every kind. Witness the following immortal reply from a practitioner of “applied kinesiology”:

When these results were announced, the head chiropractor turned to me and said, “You see, that is why we never do double-blind testing anymore. It never works!”

Although it may not “work” to confirm presuppositions, double-blind testing most definitely does work to tell truth apart from falsehood. As such, it’s no surprise that advocates of delusion avoid it or denigrate it at every turn.