In his latest New Yorker piece, “The Truth Wears Off,” Jonah Lehrer directs our attention to the lack of reproducibility of results in scientific research. The problem is pervasive, he says:
…now all sorts of well-established, multiply confirmed findings have started to look increasingly uncertain. It’s as if our facts were losing their truth: claims that have been enshrined in textbooks are suddenly unprovable. This phenomenon doesn’t yet have an official name, but it’s occurring across a wide range of fields, from psychology to ecology. In the field of medicine, the phenomenon seems extremely widespread…
The Decline Effect, as Lehrer calls it, refers to scientists’ inability to reproduce reported results. The problem isn’t simple: it’s not just that different investigators or teams come up with conflicting information, or interpret the same raw data in disparate ways; over time, a single scientist may not be able to reproduce his or her own observations.
Lehrer begins his story with a target loaded with potential bias and conflicts of interest – a 2007 meeting in Brussels of scientists, shrinks and pharma executives contemplating the disappointing results in recent large clinical trials of blockbuster antipsychotic drugs like Abilify, Seroquel and Zyprexa. Initial reports, mainly from the early 1990s, which supported these drugs’ FDA approval and widespread use, turned out to present a too-positive story. Later studies indicate these agents are not as terrific as was advertised; new data call into question the drugs’ effectiveness and safety.
This is probably true, but it’s hardly surprising. It happens in oncology all the time: when drug companies sponsor the initial studies of new drugs they intend to sell, it’s sometimes (and unfortunately often) the case that the first reports are more promising than what really happens after a decade’s worth of less careful (i.e. more open) selection of patients taking an FDA-approved medication. Once the analysis includes a broader group of patients, whose doctors aren’t researchers with salaries supported by the drug makers, the likelihood of getting truthful reports of side effects and effectiveness shoots up.
So I don’t think Lehrer’s big-pharma example is a reasonable shot at the scientific method, per se. Rather, it’s a valid perspective on problems that arise when drug companies sponsor what’s supposed to be objective, scientific research.
Lehrer moves on to what might be a purer example of the decline effect. He tells the story of Jonathan Schooler, a now-tenured professor who found in the 1980s that describing a memory can weaken it – a phenomenon he called “verbal overshadowing.” The work is cited often, Lehrer says.
…But while Schooler was publishing these results in highly reputable journals, a secret worry gnawed at him: it was proving difficult to replicate his earlier findings. ‘I’d often still see an effect, but the effect just wouldn’t be as strong.’
Next, Lehrer steps back in history. He relates the story of Joseph Banks Rhine, a psychologist at Duke who in the early 1930s developed an interest in the possibility of extrasensory perception. (Yes, that would be ESP.) Rhine devised experiments to evaluate individuals’ capacity to guess which symbol-bearing cards might be drawn from a deck, before they’re drawn. The initial findings were uncanny: “Rhine documented these stunning results in his notebook and prepared several papers for publication. But then, just as he began to believe in the possibility of extrasensory perception, the student lost his spooky talent…”
Schooler, plagued with self-doubt about his published findings on human memory, as Lehrer tells it, embarked on an “ironic” attempt to replicate Rhine’s work on ESP. In 2004, he set up experiments in which he flashed images and asked subjects to identify them; then he randomly selected some of the images for a second showing, to see whether those had been more likely to be identified in the first round.
“The craziness of the hypothesis was the point,” Lehrer says. “But he wasn’t testing extrasensory powers; he was testing the decline effect.” He continues:
‘At first, the data looked amazing, just as we’d expected,’ Schooler says. ‘I couldn’t believe the amount of precognition we were finding. But then, as we kept on running subjects, the effect size’ – a standard statistical measure – ‘kept on getting smaller and smaller.’ The scientists eventually tested more than two thousand undergraduates …’We found this strong paranormal effect, but it disappeared on us.’
OK, are we talking science, or X-Files? I find this particular episode – both in its original, Depression-era version and in Schooler’s remake – fascinating, even thought-provoking. But it doesn’t change my confidence in the scientific method one iota.
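One conventional, unspooky account of the vanishing ESP effect is simple regression to the mean: screen enough subjects and someone will score far above chance by luck alone, then drift back to chance on retest, just as Rhine’s star student did. Here’s a minimal simulation sketching that dynamic – all of the numbers (deck odds, session length, pool size) are hypothetical, not drawn from Rhine’s or Schooler’s actual protocols:

```python
import random

random.seed(42)

CHANCE = 0.2      # Zener-style deck: 1-in-5 chance per guess (assumption)
N_CARDS = 50      # guesses per testing session (illustrative)
N_STUDENTS = 200  # pool screened in each simulated "study"
N_STUDIES = 200   # repetitions of the whole screen-then-retest procedure

def session_score():
    """Number of correct guesses in one session, purely by chance."""
    return sum(random.random() < CHANCE for _ in range(N_CARDS))

best_first, best_retest = 0.0, 0.0
for _ in range(N_STUDIES):
    scores = [session_score() for _ in range(N_STUDENTS)]
    star = max(scores)        # the "gifted" subject who catches our eye...
    retest = session_score()  # ...scores at chance when tested again
    best_first += star / N_STUDIES
    best_retest += retest / N_STUDIES

print(f"star subjects' screening average: {best_first:.1f} / {N_CARDS}")
print(f"retest average for those stars:   {best_retest:.1f} / {N_CARDS}")
```

The screening average lands well above the chance score of 10, while the retest average falls right back to it – a “decline effect” produced by nothing more than selecting on a lucky streak.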
He moves on to consider a zoologist in Uppsala, Sweden, whose Nature-published theories on “fluctuating asymmetry” – linking physical symmetry to barn swallows’ mating preferences, aesthetics and genetics – haven’t stood the test of time. After an initial blitz of confirmatory reports and curious, related findings, the observed results diminished. Another scientist, said to have been very enthusiastic about the subject, tried to reproduce them with studies of symmetry in male horned beetles and couldn’t find an effect. The researcher laments:
‘But the worst part was that when I submitted these null results I had difficulty getting them published. The journals only wanted confirming data. It was too exciting an idea to disprove…’
Next, Lehrer advances to a more general discussion of bias in scientific publishing, which he says can only partly explain the decline effect. Intellectual fads and journal editors’ preferences for new and positive results lead to imbalance in reporting; publication bias favors positive clinical trials over negative or inconclusive results. No argument here –
Still, the problem goes deeper. Lehrer interviews Richard Palmer, a biologist in Alberta who’s used a statistical method called a funnel plot to evaluate trends in published research findings. What happens, Palmer says, is that researchers are disposed (or vulnerable?, ES) to selective reporting based on their unconscious perceptions of truth and uneven enthusiasm for particular concepts. He gives an example:
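For readers unfamiliar with the tool Palmer used: a funnel plot charts each study’s reported effect size against its precision (small, imprecise studies at the bottom, large ones at the top). Unbiased literature scatters symmetrically around the true effect; if the small studies reporting null or negative results are missing, the funnel looks lopsided. A minimal sketch with made-up study data – the numbers below are hypothetical, chosen to mimic what selective reporting looks like:

```python
import math

# Hypothetical per-study results: (observed effect size, sample size).
# Small studies with null effects are conspicuously absent, as they
# would be if journals only published the exciting ones.
studies = [
    (0.80, 12), (0.65, 18), (0.55, 30), (0.45, 60),
    (0.40, 90), (0.35, 150), (0.30, 240), (0.28, 400),
]

# Each funnel-plot point: effect size (x) vs precision (y), taking
# precision as sqrt(n) since standard error shrinks like 1/sqrt(n).
points = [(effect, math.sqrt(n)) for effect, n in studies]

# Crude asymmetry check: if small (low-precision) studies report
# systematically larger effects than big ones, suspect selective
# reporting. Here "small" means n < 100.
small = [e for e, p in points if p < math.sqrt(100)]
large = [e for e, p in points if p >= math.sqrt(100)]
bias_hint = sum(small) / len(small) - sum(large) / len(large)

print(f"mean effect, small studies: {sum(small) / len(small):.2f}")
print(f"mean effect, large studies: {sum(large) / len(large):.2f}")
print(f"small-vs-large gap suggesting bias: {bias_hint:.2f}")
```

In this toy dataset the small studies average a much bigger effect than the large ones, the same asymmetry that leads meta-analysts to suspect the missing null results Palmer describes.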
…While acupuncture is widely accepted as a medical treatment in various Asian countries, its use is much more contested in the West. These cultural differences have profoundly influenced the results of clinical trials. Between 1966 and 1995, there were forty-seven studies of acupuncture in China, Taiwan, and Japan, and every single trial concluded that acupuncture was an effective treatment. During the same period, there were ninety-four clinical trials of acupuncture in the United States, Sweden, and the U.K., and only fifty-six percent of these studies found any therapeutic benefits.
These discrepant reports support the idea that scientists see data in ways that confirm their preconceived notions. “Our beliefs are a form of blindness,” Lehrer writes. In Wired he quotes Paul Simon: “A man sees what he wants to see and disregards the rest.” The point is clear.
Nearing the end, Lehrer draws on and extends David Freedman’s November Atlantic feature, “Lies, Damned Lies, and Medical Science,” on the critical, outstanding oeuvre of John Ioannidis, a Stanford epidemiologist who elucidates falsehoods in published research.
Re-reading these two articles together, as I did this morning, can be disheartening. “Trust no one,” I recalled. It seems like many – and possibly most – published research papers are untrue, or at least exaggerated or misleading. But on further and closer review, maybe the evidence for pervasive untruths is not so solid.
In sum, “The Truth Wears Off,” in last week’s Annals of Science, offers valuable ideas – the decline effect (new), the statistician’s funnel plot (not new, but needing attention) and publication bias (tiresome, but definitely relevant). The ESP story is an obvious weak link in the author’s argument, as is the article’s reliance, to some degree, on psychological models and findings in relatively soft fields of research. Physics, genetics, molecular biology and ultimately most aspects of cancer medicine, I know and hope, can be measured, tested and reported objectively.
My approach to new information is always to keep in mind who my sources are – whether they’re the authors of an article I’m reading or a doctor recommending a procedure for someone in my family – and the limitations of my own experience. I’m skeptical about new drugs and medical tools, but determinedly open-minded.
The problem is this: if we close our minds to all new findings, we’ll never learn anything. Nor will we ever get better. Sometimes scientific reports are accurate, life-saving or even paradigm-shifting; if only we could know which those are –
“When the experiments are done, we still have to choose what to believe,” Lehrer concludes.
He’s right; I agree. Our choices, though, should be informed – through literacy, multiple sources of information, and common sense.