A Closer Look at the Details on Mammography, in Between the Lines

Recently I wrote a review of Between the Lines, a helpful handbook on bio-medical statistics authored by an acquaintance and colleague, Dr. Marya Zilberberg. In that post, I mentioned my concern about some of the assumptions and statements on mammography. One thing I liked about the book, in the abstract, is the author's effort to streamline the discussion so that the reader can follow the concepts. But simplification and rounding numbers, "for ease of presentation" (p. 29), can distort facts, significantly and in ways that some primary care doctors and journalists might not appreciate. And so I offer what I hope is a clarification, or at least an extension of my colleague's work, for purposes of helping women understand the potential benefits and risks of mammography.

In the section on mammography (pp. 28-31), the author rounds down the incidence of breast cancer in women between the ages of 40 and 50 years, from "1 in 70" (1.43%) to "1 in 100" (1%). As any marketing professional might remind us, this small change represents a 30% drop (0.43/1.43) in the stated rate of breast cancer in women of that age group. This difference – of 30%, or 43%, depending on how you look at it – will factor into any calculation of the number of false positives (FP) relative to true positives, and so into the positive predictive value (PPV) of the test.

For women ages 40-49 (per 10,000)        Have breast cancer    Don't have breast cancer
If estimate is 1 in 100 (1.0%)           100                   9,900
If estimate is 1 in 70 (1.43%)           143                   9,857

Keep in mind that this same proportional difference would apply to any BC screening considerations – in terms of the number of women affected, and the potential benefits and costs – for the 22,996,493 women between the ages of 40 and 49 counted in the 2010 U.S. Census.

My colleague estimates, fairly for this younger age group of women (who are relatively disposed to fast-growing tumors), that the screening technology (mammography) picks up only 80% of cases; 20% go undetected. In other words, the test is 80% sensitive; the false negative (FN) rate is 20%. In this same section, she takes the FP rate to be 10%. Let's accept this (unacceptably high) FP rate for now, for the sake of discussion.

As considered in Between the Lines:

If FP rate is 10%, prevalence 1 in 100    Really have BC    Don't have BC    Total
Mammography +                             80                990              1,070
Mammography –                             20                8,910            8,930
Total                                     100               9,900            10,000

But the above numbers aren’t valid, because the disease affects over 1 in 70 women in this age bracket. Here’s the same table with a prevalence of 1 in 70 women with BC:

If FP rate is 10%, prevalence 1 in 70     Really have BC    Don't have BC    Total
Mammography +                             114               986              1,100
Mammography –                             29                8,871            8,900
Total                                     143               9,857            10,000

In this closer approximation to reality, the number of true positives is 114, and false positives 986, among 1,100 abnormal screening results. Now, the PPV of an abnormal mammogram is 114/(114+986) = 10.4%. So the main statistical point – apart from the particulars of this discussion – is that a seemingly slight rounding down can have a big impact on a test's calculated and perceived value. By adjusting the BC rate to its prevalence of approximately 1 in 70 women between 40 and 49 years, we've raised the PPV from 7.5% to 10.4%.

Here I must admit that I, too, have rounded, although I did so conservatively and only very slightly. I adopted a 1 in 70 approximation (1.43%) instead of 1 in 69 (1.45%), as indicated on the NCI website. If we repeat the table and figures using a 1 in 69 (1.45%) prevalence rate and the same 10% FP rate, the PPV rises a tad, to 10.5%.

Now, let's consider a different scenario: What if the false positive rate were 6%, as has been observed among sub-specialist radiologists who work mainly in breast cancer screening?

If FP rate is 6%, prevalence 1 in 70      Really have BC    Don't have BC    Total
Mammography +                             114               591              705
Mammography –                             29                9,266            9,295
Total                                     143               9,857            10,000

As you can see, if we use a FP rate of 6% in our calculations, the total number of FPs drops to 591 among 10,000 women screened. In this better-case scenario, the PPV of the test would be 114/(114+591) = 16%. Still, that's not great – and I'd argue that public health officials, insurers and patients should be pushing for FP rates closer to 2 or 3% – but that's beside my colleague's point and her generally instructive work.
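
For readers who want to check the arithmetic, here's a minimal Python sketch (my own, not from the book) that reproduces the tables above. The prevalence, sensitivity (80%) and false-positive rates (10% and 6%) are the assumptions discussed in this post; everything else follows from them.

```python
# Rough sketch: 2x2 screening tables and PPV for the scenarios in this post.
def screening_table(prevalence, sensitivity, fp_rate, n_screened=10_000):
    """Return (TP, FP, FN, TN, PPV) for a cohort of n_screened women."""
    with_cancer = prevalence * n_screened
    without_cancer = n_screened - with_cancer

    true_pos = sensitivity * with_cancer      # cancers the mammogram picks up
    false_neg = with_cancer - true_pos        # cancers it misses
    false_pos = fp_rate * without_cancer      # healthy women with abnormal results
    true_neg = without_cancer - false_pos

    ppv = true_pos / (true_pos + false_pos)   # chance an abnormal result is cancer
    return true_pos, false_pos, false_neg, true_neg, ppv

scenarios = [
    ("1 in 100, FP 10%", 1 / 100, 0.80, 0.10),   # PPV ~ 7.5%
    ("1 in 70,  FP 10%", 1 / 70,  0.80, 0.10),   # PPV ~ 10.4%
    ("1 in 69,  FP 10%", 1 / 69,  0.80, 0.10),   # PPV ~ 10.5%
    ("1 in 70,  FP  6%", 1 / 70,  0.80, 0.06),   # PPV ~ 16%
]

for label, prev, sens, fp in scenarios:
    tp, fp_n, fn, tn, ppv = screening_table(prev, sens, fp)
    print(f"{label}: TP = {tp:.0f}, FP = {fp_n:.0f}, PPV = {ppv:.1%}")
```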

My second concern has to do with language, and making the consequences of false positives seem worse than they really are. On page 29, the author writes: "So, going back to the 10,000 women being screened, of 9,900 who do NOT have cancer… 10%, or 990 individuals will still be diagnosed as having cancer." The fact is, the overwhelming majority of women with positive mammograms won't receive a cancer diagnosis. Rather, they'll be told they have "an abnormal result, or a finding that suggests the possibility of cancer and needs further evaluation," or something along those lines. It would be unusual in practice to jump from a positive mammogram straight to a breast cancer diagnosis. There are steps between, and every patient and journalist should be aware of those.


Finally, if I were to write what I really think, apart from and beyond Between the Lines – I'd suggest the FP rate should be no higher than 2 or 3% in 2012. This is entirely feasible using extant technology, if we were to change just two aspects of mammography practice in the U.S. First, require that all mammograms be performed by breast radiologists who get extra training and focus in their daily work almost exclusively on breast imaging. Second, make sonograms – which, together with mammograms, enhance the specificity of BC screening in women with dense breasts – universally available to supplement the radiologists' evaluations of abnormal mammograms and dense breasts in younger women.

By implementing these two changes, essentially supporting the practice of sub-specialists in breast radiology, we could significantly lower the FP rate in breast cancer screening. The “costs” of those remaining FPs could be minimized by judicious use of sonograms, needle biopsies and other measures to reduce unnecessary surgery and over-treatment. Over the long haul, we need to educate doctors not to over-treat early stage disease, but that goes far beyond this post and any one woman’s analysis of mammography’s effectiveness.

All for now,
ES


Reading Between the Lines, and Learning from an Epidemiologist

Early on in Between the Lines, a breezy new book on medical statistics by Dr. Marya Zilberberg, the author encourages her readers to “write, underline, highlight, dog-ear and leave sticky notes.” I did just that. Well, with one exception; I didn’t use a highlighter. That’s partially due to my fear of chemicals, but mainly because we had none in my home.

I enjoyed reading this book, perhaps more than I'd anticipated. Maybe that's because I find the subject of analyzing quantitative data, in itself, dull. But this proves an easy read: it's short and not boring. The author avoids minutiae. Although I'm wary of simplified approaches – because, as she points out, the devil is often in the details of any study – this tack serves the reader who might otherwise drop off this topic. Her style is informal. The examples she chooses to illustrate points on medical studies are relevant to what you might find in a current journal or this morning's newspaper.

Over the past year or two, I have gotten to know Dr. Zilberberg, just a bit, as a blogging colleague and on-line associate. This book gave me the chance to understand her perspective. Now, I can better “see” where she’s coming from.

There's a lot anyone with an early high school math background, or a much higher level of education, might take away from this work. For doctors who've attended four-year med schools and, of course, know their stats well (I'm joking, TBC*), this book provides an eminently readable review of basic concepts – sensitivity, specificity, types of evidence, types of trials, Type II errors, etc. For others – pharmacy students, journalists and anyone looking for an accessible source of information on terms like "accuracy" or HTE (heterogeneous treatment effect) – Between the Lines will fill you in.

The work reads as a skinny, statistical guidebook with commentary. It includes a few handy tables – on false positives and false negatives (Chapter 3), types of medical studies (Chapter 14), and relative risk (Chapter 19). There’s considered discussion of bias, sources of bias, hypothesis testing and observational studies. In the third chapter the author uses lung cancer screening scenarios to effectively explain terms like accuracy, sensitivity and specificity in diagnostic testing, and the concept of positive predictive value.

Though short, this is a thoughtful, non-trivial work with insights. In a segment on hierarchies of evidence, for example, the author admits “affection for observational data.” This runs counter to some epidemiologists’ views. But Zilberberg defends it, based on the value of observational data in describing some disease frequencies, exposures, and long-term studies of populations. In the same chapter, she emphasizes knowing – and stating – the limits of knowledge (p. 37): “…I do think we (and the press) do a disservice to patients, and to ourselves, and to the science if we are not upfront about just how uncertain much of what we think we know is…”

Mammography is, not surprisingly, one of the few areas where I take issue with some of the author's statements. For purposes of this post and mini-review, I'll leave it at that, because I think this is a helpful book overall and in many particulars.

Dr. Zilberberg cites a range of other sources on statistics, medical studies and epistemology. One of my favorite quotes appears early on, from the author directly. She considers the current, "layered" system of disseminating medical information through translators – mainly physicians, who interpret for patients, and journalists, who interpret for the public. She writes: "I believe that every educated person must at the very least understand how these interpreters of medical knowledge examine, or should examine, it to arrive at the conclusions."

This book sets the stage for richer, future discussions of clinical trials, cancer screening, evidence-based medicine, informed consent and more. It’s a contribution that can help move these dialogues forward. I look ahead to a continued, lasting and valuable conversation.

 —

*TBC = to be clear


A JAMA Press Briefing on CER, Helicopters and Time for Questions

This week the Journal of the American Medical Association, JAMA, held a media briefing on its current Comparative Effectiveness Research (CER) theme issue. The event took place at the National Press Club. A doctor, upon entering that building, might do a double-take while waiting for the elevator, curious that the journalists occupy the 13th floor – a floor that's absent in some hospitals.

CER is a big deal in medicine now. Dry as it is, it's an investigative method that any doctor, health care maven, politician contemplating reform or, maybe, patient would want to know about. The gist of CER is that it exploits large data sets – like SEER data or Medicare billing records – to examine outcomes in huge numbers of people who've had one or another intervention. An advantage of CER is that its results are more likely generalizable, i.e. applicable in the "real world." A long-standing criticism of randomized trials – held by most doctors, and the FDA, as the gold standard for establishing efficacy of a drug or procedure – is that patients in research studies tend to get better, or at least more meticulous, clinical care.

The JAMA program began with an intro by Dr. Phil Fontanarosa, a senior editor and author of an editorial on CER, followed by 4 presentations. The subjects were, on paper, shockingly dull: on carboplatin and paclitaxel w/ and w/out bevacizumab (Avastin) in older patients with lung cancer; on survival in adults who receive helicopter vs. ground-based EMS service after major trauma; a comparison of side effects and mortality after prostate cancer treatment by 1 of 3 forms of radiation (conformal, IMRT, or proton therapy); and – to cap it off – a presentation on PCORI‘s priorities and research agenda.

I learned from each speaker. They brought life to the topics! Seriously – the scene made me realize the value of meeting and hearing from the researchers directly, in person. But, NTW, on ML today we'll skip over the oncologist's detailed report and move to the second story:

Dr. Adil Haider, a trauma surgeon at Johns Hopkins, spoke on helicopter-mediated saves of trauma patients. Totally cool stuff; I’d rate his talk “exotic” – this was as far removed from the kind of work I did on molecular receptors in cancer cells as I’ve ever heard at a medical or journalism meeting of any sort –

Haider indulged the audience, and grabbed my attention, with a bit of history: HEMS, which stands for helicopter-EMS, goes back to the Korean War, like in M*A*S*H. The real-life surgeon-speaker at the JAMA news briefing played a music-replete video showing a person hit by a car and rescued by helicopter. While he and other trauma surgeons see value in HEMS, it's costly and not necessarily better than GEMS (Ground-EMS). Helicopters tend to draw top nurses, and they deliver patients to Level I or II trauma centers, he said, all of which may favor survival and other, better outcomes after serious injury. Accidents happen; previous studies have questioned the helicopters' benefit.

The problem is, there’s been no solid randomized trial of HEMS vs. GEMS, nor could there be. (Who’d want to get the slow pick-up with a lesser crew to a local trauma center?) So these investigators did a retrospective cohort study to see what happens when trauma victims 15 years and older are delivered by HEMS or GEMS. They used data from the National Trauma Data Bank (NTDB), which includes nearly 62,000 patients transported by helicopter and over 161,000 patients transported by ground between 2007 and 2009. They selected patients with ISS (Injury severity scores) above 15. They used a “clustering” method to control for differences among trauma centers, and otherwise adjusted for degrees of injury and other confounding variables.

“It’s interesting,” Haider said. “If you look at the unadjusted mortality, the HEMS patients do worse.” But when you control for ISS, you get a 16% increase in odds of survival if you’re taken by helicopter to a Level I trauma center. He referred to Table 3 in the paper.  This, indeed, shows a big difference between the “raw” and adjusted data.

In a supplemental video provided by JAMA (starting at 60 seconds in):

"When you first look, across the board, you'll see that actually more patients transported by helicopter, in terms of just the raw percentages, actually die." – Dr. Samuel Galvagno (DO, PhD), the study's first author.

The video immediately cuts to the senior author, Haider, who continues:

But when you do an analysis controlling for how severely these patients were injured, the chance of survival improves by about 30 percent, for those patients who are brought by helicopter…

Big picture:

What’s clear is that how investigators adjust or manipulate or clarify or frame or present data – you choose the verb – yields differing results. This capability doesn’t just pertain to data on trauma and helicopters. In many Big Data situations, researchers can cut information to impress whatever point they choose.

The report offers a case study of how researchers can use elaborate statistical methods to support a clinical decision in a way that few doctors who read the results are in a position to grasp, or to know whether the conclusions are valid or not.
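
To make the raw-versus-adjusted distinction concrete, here's a minimal, hypothetical sketch – invented counts, not the NTDB data – of how a transport mode can look worse in the crude comparison yet better once you stratify by injury severity, simply because the sicker patients are the ones flown.

```python
# Hypothetical counts (NOT the NTDB data): helicopter crews carry more of the
# severely injured, so crude and severity-stratified comparisons diverge.

# (died, survived) by transport mode, within each injury-severity stratum
strata = {
    "high ISS (severe injury)":  {"HEMS": (350, 650), "GEMS": (250, 350)},
    "low ISS (moderate injury)": {"HEMS": (10, 190),  "GEMS": (280, 3720)},
}

def odds_ratio(a_died, a_surv, b_died, b_surv):
    """Odds of death in group A relative to group B."""
    return (a_died / a_surv) / (b_died / b_surv)

# Crude (unadjusted) comparison: pool all patients together
hems_died = sum(s["HEMS"][0] for s in strata.values())
hems_surv = sum(s["HEMS"][1] for s in strata.values())
gems_died = sum(s["GEMS"][0] for s in strata.values())
gems_surv = sum(s["GEMS"][1] for s in strata.values())
crude = odds_ratio(hems_died, hems_surv, gems_died, gems_surv)
print(f"Crude OR for death, HEMS vs GEMS: {crude:.2f}  (HEMS looks worse)")

# Stratified comparison: within each level of injury severity
for name, s in strata.items():
    adjusted = odds_ratio(*s["HEMS"], *s["GEMS"])
    print(f"{name}: OR = {adjusted:.2f}  (HEMS looks better)")
```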

A concluding note –

I appreciated the time allotted for Q&A after the first 3 research presentations. There's been recent, legitimate questioning of the value of medical conferences. This week's session, sponsored by JAMA, reinforced to me the value of meeting study authors in person, and of having the opportunity to question them about their findings. This is crucial; I know it from my prior experience in cancer research, when I didn't ask enough hard questions of some colleagues, in public. For the future, at places like TEDMED – where I've heard there was no attempt to allow for Q&A – the audience's concerns can reveal problems in theories and published data and, constructively, help researchers fill in those gaps, ultimately to bring better-quality information, from any sort of study, to light.


What Does it Mean if Primary Care Doctors Get the Answers Wrong About Screening Stats?

Last week the Annals of Internal Medicine published a new report on how doctors (don’t) understand cancer screening stats. This unusual paper reveals that some primary care physicians – a majority of those who completed a survey – don’t really get the numbers on cancer incidence, 5-year survival and mortality.

An accompanying editorial by Dr. Virginia Moyer, a Professor of Pediatrics and current Chair of the USPSTF, drives home two messages in its title: What We Don't Know Can Hurt Our Patients: Physician Innumeracy and Overuse of Screening Tests. Dr. Moyer is right, to a point. Because if doctors who counsel patients on screening don't know what they're speaking of, they may provide misinformation and cause harm. But she overstates the study's implications by emphasizing the "overuse of screening tests."

The report shows, plainly and painfully, that too many doctors are confused and even ignorant of some statistical concepts. Nothing more, nothing less. The new findings have no bearing on whether or not cancer screening is cost-effective or life-saving.

What the study does suggest is that med school math requirements should be upped and made more rigorous, counter to the trend. And that we should do a better job educating students and reminding doctors about relevant concepts including lead-time bias, overdiagnosis and – as highlighted in two valuable blogs just yesterday, NPR Shots and Reporting on Health Antidote – the Number Needed to Treat, or NNT.
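
As a refresher on the last of those concepts, here's a tiny sketch of the NNT calculation, with made-up event rates chosen only for illustration.

```python
# Number Needed to Treat, from (invented) event rates in untreated vs. treated groups.
control_event_rate = 0.05   # 5% of the unscreened/untreated have the bad outcome
treated_event_rate = 0.04   # 4% of the screened/treated do

arr = control_event_rate - treated_event_rate    # absolute risk reduction: 1 percentage point
rrr = arr / control_event_rate                   # relative risk reduction: 20%
nnt = 1 / arr                                    # 100 people treated to spare one event

print(f"ARR = {arr:.1%}, RRR = {rrr:.0%}, NNT = {nnt:.0f}")
```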

The Annals paper has yielded at least two unfortunate outcomes. One, which there’s no way to get around, is the clear admission of doctors’ confusion. In the long term, this may be a good thing, like admitting a medical error and then having QA improve as a consequence. But meanwhile some doctors at their office desks and lecterns don’t realize what they don’t know, and there’s no clear remedy in sight.

Dr. Moyer, in her editorial, writes that medical journal editors should carefully monitor reports to ensure that results aren’t likely misinterpreted. She says, in just one half-sentence, that medical educators should improve teaching on this topic. And then she directs the task of stats-ed to media and journalists, who, she advises, might follow the lead of the “watchdog” HealthNewsReview. I don’t see that as a solution, although I agree that journalists should know as much as possible about statistics and limits of data about which they report.

The main problem elucidated in this article is a failure in medical education. The cat's out of the bag now. The WSJ Health Blog covered the story. Most doctors are baffled, says Fox News. On its home page, the Dartmouth Institute for Health Policy & Clinical Practice links to a Reuters article that's landed on the NIH/NLM-sponsored MedlinePlus (accessed 3/15/12). This embarrassment further compromises individuals' confidence in the doctors they would, and sometimes need to, rely on.

We lie, we cheat, we steal, we are confused… What else can doctors do wrong?

The second, and I think unnecessary, problematic outcome of this report is that it’s been used to argue against cancer screening. In the editorial Dr. Moyer indulges an ill-supported statement:

…several analyses have demonstrated that the vast majority of women with screen-detected breast cancer have not had their lives saved by screening, but rather have been diagnosed early with no change in outcome or have been overdiagnosed.

The problem of overdiagnosis, which comes up a lot in the paper, is over-emphasized, at least as it relates to breast cancer, colon cancer and some other tumors. I have never seen a case of vanishing invasive breast cancer. In younger women, low-grade invasive tumors are relatively rare. So overdiagnosis isn't applicable in BC, at least for women who are not elderly.

In the second paragraph Dr. Moyer outlines, in an unusual mode for the Annals, a cabal-like screening lobby:

 …powerful nonmedical forces may also lead to enthusiasm for screening, including financial interests from companies that make tests or testing equipment or sell products to treat the conditions diagnosed and more subtle financial pressures from the clinicians whose daily work is to diagnose or treat a condition. If fewer people are diagnosed with a disease, advocacy groups stand to lose contributions and academics who study the disease may lose funding. Politicians may wish to appear responsive to powerful special interests…

While she may be right that there are some influential and self-serving interests and corporations that push aggressively, and maybe too aggressively, for cancer screening, it may also be that some forms of cancer screening are indeed life-saving tools that should be valued by our society. I think, also, that she goes too far in insinuating that major advocacy groups push for screening because they stand to lose funding otherwise.

I’ve met many cancer agency workers, some founders, some full-time, paid and volunteer helpers – with varied priorities and goals – and I honestly believe that each and every one of those individuals hopes that the problem of cancer killing so many non-elderly individuals in our society will go away. It’s beyond reason to suggest there’s a hidden agenda at any of the major cancer agencies to “keep cancer going.” There are plenty of other worthy causes to which they might give their time and other resources, like education, to name one.

Which leads me back to the original paper, on doctors’ limited knowledge –

As I read the original paper the first time, I considered what would happen if you tested 412 practicing primary care physicians about hepatitis C screening, strains, and whether or not there’s a benefit to early detection and treatment of that common and sometimes pathologic virus, or about the use of aspirin in adults with high blood pressure and other risk factors for heart disease, or about the risks and benefits of drugs that lower cholesterol.

It seems highly unlikely that physicians' uncertainty is limited to conceptual aspects of cancer screening stats. Knowing that, you'd have to wonder why the authors did this research, and why the editorial pushes the message of over-screening so hard.


What is the Disease Control Rate in Oncology?

Last week I came upon a new term in the cancer literature: the Disease Control Rate. The DCR refers to the total proportion of patients whose disease responds to, or at least remains stable on, a treatment.

In oncology terms: The DCR is the sum of complete responses (CR) + partial responses (PR) + stable disease (SD).

Another way of explaining it: Some people with cancer have measurable, growing tumors. For example, a man might have a sarcoma with multiple metastases in the lung that are evidently progressing. If the patient starts a new treatment and the lung mets don’t shrink but stop getting bigger, that might be considered a stabilizing effect from the therapy, and his response would be included in the DCR.
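
In code, the calculation is trivial; here's a minimal sketch with made-up trial counts, shown only to make the definition concrete.

```python
# Disease control rate from (invented) best-response counts in a 100-patient trial.
responses = {
    "CR": 5,    # complete responses
    "PR": 20,   # partial responses
    "SD": 30,   # stable disease
    "PD": 45,   # progressive disease -- not counted toward the DCR
}

n = sum(responses.values())
response_rate = (responses["CR"] + responses["PR"]) / n              # 25%
dcr = (responses["CR"] + responses["PR"] + responses["SD"]) / n      # 55%

print(f"Response rate (CR + PR): {response_rate:.0%}")
print(f"Disease control rate (CR + PR + SD): {dcr:.0%}")
```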


Breast Cancer Stats: Notes from the 2012 ACS Report, and a Key Question

Earlier this month, the ACS released its annual report on Cancer Facts and Figures. The document, based largely on analyses of SEER data from the NCI, estimates that approximately 229,000 adults in the U.S. will receive a diagnosis of invasive breast cancer (BC) this year. The disease affects just over 2,000 men annually; 99% of cases arise in women. Non-invasive, aka in situ or Stage 0 BC, including DCIS, will be found in approximately 63,000 individuals.

The slightly encouraging news is that BC mortality continues to decline. This year, the number of expected deaths from BC is just under 40,000. From the ACS document: “Steady declines in breast cancer mortality among women since 1990 have been attributed to a combination of early detection and improvements in treatment.”

Survival data, from the report:

For all women diagnosed with BC, the 5-year relative survival rate has risen from 63% in the 1960s to 90% today. At 10 years, for women of all stages combined, the relative survival is 82%, and at 15 years, 77%. Traditional staging still matters: For women with localized BC (that has not spread to lymph nodes or elsewhere outside of the breast), the 5-year relative survival is 99%. For women with lymph node involvement, 5-year relative survival is 84%.

For those with metastatic disease, 5-year relative survival is 23%. The report cautions: these “stats don’t reflect recent advances in detection and treatment. For example, 15-year relative survival is based on patients diagnosed as early as 1990.”

Since 1990, we’ve seen testing and widespread use of (no longer) new drugs like Herceptin, taxane-type chemotherapies, aromatase inhibitors and other meds in women with MBC. In addition, it’s possible that better palliative care and supportive strategies, along with more effective treatments for infectious and other complications, may have extended survival.

What we’ve got to ask, and about which data are remarkably elusive, is this: What is the median survival for women with metastatic BC (MBC) in 2012?

Your author has spoken with several leading, national authorities on the subject, and no one has provided a clear answer. The reason for this informational hole is that SEER data includes the incidence of new cases at each stage, and mortality from the disease, but does not include numbers on stage conversion – when a woman who had early-stage disease relapses with Stage IV (MBC). There's astonishingly little current data on how long women live, on average, after relapsing.

20 years ago, oncology fellows learned that the median survival of women with MBC was around 3 years. Now, that is pretty much still what doctors tell patients, but there’s a sense that the picture is no longer so bleak. Much of what we know about survival of women with MBC comes from clinical trials of patients with particular subtypes (e.g. Her2+ or negative disease). That information, on subtypes and responsiveness to particular drugs, is crucial. But we also need to know the big picture, i.e. exactly – give or take a few thousand women – how many are alive now with MBC?

This information might inform research funding and the planning of medical and social services, besides improving our understanding of the course of the illness and the extent of the problem. And if survival has indeed improved, that measurement, straightforward as it should be, might offer hope to those living with the disease today.


Science Takes a Double Hit in the Press, Maybe

In his latest New Yorker piece The Truth Wears Off, Jonah Lehrer directs our attention to the lack of reproducibility of results in scientific research. The problem is pervasive, he says:

…now all sorts of well-established, multiply confirmed findings have started to look increasingly uncertain. It’s as if our facts were losing their truth: claims that have been enshrined in textbooks are suddenly unprovable. This phenomenon doesn’t yet have an official name, but it’s occurring across a wide range of fields, from psychology to ecology. In the field of medicine, the phenomenon seems extremely widespread…

The Decline Effect, as Lehrer calls it, refers to scientists’ inability to reproduce reported results. The problem isn’t simple: it’s not just that different investigators or teams come up with conflicting information, or interpret the same raw data in disparate ways; over time, a single scientist may not be able to reproduce his or her own observations.

Lehrer begins his story with a target loaded with potential bias and conflicts of interest – a 2007 meeting in Brussels of scientists, shrinks and pharma executives contemplating the disappointing results in recent large clinical trials of blockbuster antipsychotic drugs like Abilify, Seroquel and Zyprexa. Initial reports, mainly from the early 1990s, which supported these drugs’ FDA approval and widespread use, turned out to present a too-positive story. Later studies indicate these agents are not as terrific as was advertised; new data call into question the drugs’ effectiveness and safety.

This is probably true, but it’s hardly surprising. It happens in oncology all the time – when drug companies sponsor the initial studies of new drugs they intend to sell, it’s sometimes (and unfortunately often) the case that the initial reports are more promising than what really happens after a decade’s worth of less careful (i.e. more open) selection of patients who take an FDA-approved medication. Once you include a broader group of patients in the analysis, whose doctors aren’t researchers whose salaries are supported by the drug makers, the likelihood of getting truthful reports of side effects and effectiveness shoots up.

So I don’t think Lehrer’s big-pharma example is a reasonable shot at the scientific method, per se. Rather, it’s a valid perspective on problems that arise when drug companies sponsor what’s supposed to be objective, scientific research.

Lehrer moves on to what might be a purer example of the decline effect. He tells the story of Jonathan Schooler, a now-tenured professor who discovered in the 1980s that the act of describing a memory can weaken it – the “verbal overshadowing” effect. The work is cited often, Lehrer says.

…But while Schooler was publishing these results in highly reputable journals, a secret worry gnawed at him: it was proving difficult to replicate his earlier findings. ‘I’d often still see an effect, but the effect just wouldn’t be as strong.’

Next, Lehrer steps back in history. He relates the story of Joseph Banks Rhine, a psychologist at Duke who in the early 1930s developed an interest in the possibility of extrasensory perception. (Yes, that would be ESP.) Rhine devised experiments to evaluate individuals’ capacity to guess which symbol-bearing cards might be drawn from a deck, before they’re drawn. The initial findings were uncanny: “Rhine documented these stunning results in his notebook and prepared several papers for publication. But then, just as he began to believe in the possibility of extrasensory perception, the student lost his spooky talent…”

Schooler, plagued with self-doubt about his published findings on human memory, as Lehrer tells it, embarked on an “ironic” attempt to replicate Rhine’s work on ESP. In 2004, he set up experiments in which he flashed images and asked a subject to identify them; next he randomly selected some of those images for a second showing, to see whether they were more likely to have been identified in the first round.

“The craziness of the hypothesis was the point,” Lehrer says. “But he wasn’t testing extrasensory powers; he was testing the decline effect.” He continues:

‘At first, the data looked amazing, just as we’d expected,’ Schooler says. ‘I couldn’t believe the amount of precognition we were finding. But then, as we kept on running subjects, the effect size’ – a standard statistical measure – ‘kept on getting smaller and smaller.’ The scientists eventually tested more than two thousand undergraduates …’We found this strong paranormal effect, but it disappeared on us.’

OK, are we talking science, or X-Files? I find this particular episode – both in its original, depression-era version and in Schooler’s modern remake – fascinating, even thought-provoking. But neither changes my confidence in the scientific method one iota.

He moves on to consider a zoologist in Uppsala, Sweden, who published in Nature on symmetry and barn swallows’ mating preferences, aesthetics and genetics; his theories on “fluctuating asymmetry” haven’t stood the test of time. After an initial blitz of confirmatory reports and curious, related findings, the observed results diminished. Another scientist, said to have been very enthusiastic about the subject, tried to reproduce the findings in studies of symmetry in male horned beetles but couldn’t find an effect. The researcher laments:

‘But the worst part was that when I submitted these null results I had difficulty getting them published. The journals only wanted confirming data. It was too exciting an idea to disprove…’

Next, Lehrer advances toward a more general discussion on bias in scientific publishing. This can only partly explain the decline effect, he says. Intellectual fads and journal editors’ preferences for new and positive results lead to imbalance in reporting. Publication bias distorts the reporting of positive clinical trials over negative or inconclusive results. No argument here –

Still, the problem goes deeper. Lehrer interviews Richard Palmer, a biologist in Alberta who’s used a statistical method called a funnel plot to evaluate trends in published research findings. What happens, Palmer says, is that researchers are disposed (or vulnerable?, ES) to selective reporting based on their unconscious perceptions of truth and uneven enthusiasm for particular concepts. He gives an example:

…While acupuncture is widely accepted as a medical treatment in various Asian countries, its use is much more contested in the west. These cultural differences have profoundly influenced the results of clinical trials. Between 1966 and 1995, there were forty-seven studies of acupuncture in China, Taiwan, and Japan, and every single trial concluded that acupuncture was an effective treatment. During the same period, there were ninety-four clinical trials of acupuncture in the United States, Sweden, and the U.K., and only fifty-six percent of these studies found any therapeutic benefits.

These discrepant reports suggest that scientists see data in ways that confirm their preconceived ideas. “Our beliefs are a form of blindness,” Lehrer writes. In Wired he quotes Paul Simon: “A man sees what he wants to see and disregards the rest.” The point is clear.
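
For readers unfamiliar with the funnel plot Palmer mentions, here’s a minimal sketch using simulated studies (not his data): each point is one study’s estimated effect plotted against its standard error; a lopsided funnel, with the small unimpressive studies missing, is the visual signature of selective reporting.

```python
# Simulated funnel plot: many small studies, a few large ones, and a crude
# form of publication bias that drops small studies with unimpressive results.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
true_effect = 0.2
n_studies = 300
se = rng.uniform(0.05, 0.5, n_studies)        # small studies -> large standard errors
effects = rng.normal(true_effect, se)         # each study's estimated effect

# "Publication bias": large studies always appear; small ones only if positive
published = (se < 0.15) | (effects > true_effect)

plt.scatter(effects[published], se[published], s=12)
plt.axvline(true_effect, linestyle="--", label="true effect")
plt.gca().invert_yaxis()                      # most precise studies at the top
plt.xlabel("estimated effect size")
plt.ylabel("standard error")
plt.title("Funnel plot, simulated (note the missing lower-left corner)")
plt.legend()
plt.show()
```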

Nearing the end, Lehrer draws on and extends upon David Freedman’s November Atlantic feature, Lies, Damned Lies, and Medical Science, on the critical, outstanding oeuvre of John Ioannidis, a Stanford epidemiologist who elucidates falsehoods in published research.

Re-reading these two articles together, as I did this morning, can be disheartening. “Trust no one,” I recalled. Seems like many – and possibly most – published research papers are untrue or at least exaggerated and/or misleading. But on further and closer review, maybe the evidence for pervasive untruths is not so solid.

In sum, The Truth Wears Off, in last week’s Annals of Science, offers valuable ideas – the decline effect (new), the statistician’s funnel plot (not new, but needing attention) and publication bias (tiresome, but definitely relevant). The ESP story is an obvious weak link in the author’s argument, as is the article’s emphasis and reliance, to some degree, on psychological models and findings in relatively soft fields of research. Physics, genetics, molecular biology and ultimately most aspects of cancer medicine, I know and hope, can be measured, tested and reported objectively.

My approach to new information is always to keep in mind who my sources are – whether the authors of an article I’m reading or a doctor who’s making a recommendation about a procedure for someone in my family – and the limitations of my own experiences. I’m skeptical about new drugs and medical tools, but determinedly open-minded.

The problem is this: if we close our minds to all new findings, we’ll never learn anything. Nor will we ever get better. Sometimes scientific reports are accurate, life-saving or even paradigm-shifting; if only we could know which those are –

“When the experiments are done, we still have to choose what to believe,” Lehrer concludes.

He’s right; I agree. Our choices, though, should be informed – through literacy, multiple sources of information, and common sense.

—–


Word of the Week: floccinaucinihilipilificationism

ML learned a new word upon reading the newspaper: floccinaucinihilipilificationism. According to the New York Times, the late Senator Patrick Moynihan prided himself on coining the 32-letter mouthful, by which he meant “the futility of making estimates on the accuracy of public data.”

Some brief history:

Sometime around 1981, Moynihan invented the word by adding “ism” to an older, 29-letter English word, floccinaucinihilipilification – defined as “the action or habit of estimating as worthless” in a 1971 edition of the Oxford English Dictionary:

from the Oxford English Dictionary (1971)

You can find an open discussion of the roots of floccinaucinihilipilificationism on Wiktionary, which includes hard-to-decipher, clickable audiofiles – just in case you want to try saying the word out loud. Moynihan used the word in the title of his 1981 New Yorker review of a book by economist John Kenneth Galbraith.

More accessible is a somewhat dull, but worth-a-listen, clip of Moynihan discussing the word’s history in a debate on the budget deficit in July, 1999, on C-SPAN. From the Congressional Record:

“Floccinaucinihilipilification is now the second longest word in the Oxford Dictionary. It is from a debate in the House of Commons in the 18th century meaning the futility of budgets. They never come out straight…I added “ism” to refer to the institutional nature of this, so it became floccinaucinihilipilificationism. It is no joke. One never gets it right. It is not because one cannot, one does not try…

It seems to me the term, which was intended for the realm of economics and its projections, might bear also on the intricacies of vast amounts of data in science and health, data mining, and understanding the limitations of medical studies and related analyses. But I’m extrapolating here, for sure.

As I read the late Senator’s words, about his word, it seems maybe he’s suggesting that we could “get it right,” i.e. sort out data in a way that has real value, if we try harder to do so.

But who knows?


A Small Study Offers Insight On Breast Cancer Patients’ Capacity and Eagerness to Participate in Medical Decisions

Last week the journal Cancer published a small but noteworthy report on women’s experiences with a relatively new breast cancer decision tool called Oncotype DX. This lab-based technology, which has not received FDA approval, takes a piece of a woman’s tumor and, by measuring expression of 21 genes within, estimates the likelihood, or risk, that her tumor will recur.

As things stand, women who receive a breast cancer diagnosis face difficult decisions regarding the extent of surgery they should undergo (see the New York Times article of last week, with over 200 people weighing in on this ultra-sensitive matter). Once the surgeon has removed the tumor, choices about chemotherapy, hormone modifiers, radiation and other possible treatments challenge even the most informed patients among us.

Oncotype DX and similar techniques, like the FDA-approved Mammaprint, provide a more detailed molecular profile of a malignancy than what’s provided by conventional pathology labs. For women who have early-stage (non-metastatic), estrogen-receptor positive (ER+) breast cancer, this test provides risk-assessment that’s personalized, based on gene expression in the individual’s tumor.

Oncotype DX has been commercially available since 2004. The test “reads” three levels of risk for breast cancer recurrence at 10 years: “low” if the predicted recurrence rate is 11% or less, “intermediate” if the estimated rate falls between 12% and 21%, and “high” if the risk for recurrence is greater than 21%.
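
The three-tier read-out is simple enough to express as a rule; here’s a minimal sketch using the cut-offs stated above (this is my own illustration, not Genomic Health’s algorithm, which computes the underlying recurrence score from the 21-gene panel).

```python
# Map a predicted 10-year recurrence risk (%) to the three categories described above.
def oncotype_risk_category(recurrence_risk_percent: float) -> str:
    if recurrence_risk_percent <= 11:
        return "low"
    if recurrence_risk_percent <= 21:
        return "intermediate"
    return "high"

for risk in (7, 15, 30):
    print(f"{risk}% predicted recurrence at 10 years -> {oncotype_risk_category(risk)} risk")
```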

The investigators, based at the University of North Carolina, Chapel Hill, identified women eligible for the study who had an ER+, Stage I or II breast cancer removed and tested with the Oncotype Dx tool between 2004 and 2009. The researchers sent surveys to 104 women, of whom 78 completed the questionnaires and 77 could be evaluated for the study. They distributed the surveys between December, 2008 and May, 2009.

Several factors limit the study results, including the small number of participants and the fact that the women were treated at just one medical center (where the oncologists were, presumably, familiar with Oncotype Dx). The patients were predominantly Caucasian, the majority had a college degree and most were financially secure (over 60% had a household income of greater than $60,000). Nonetheless, the report is interesting and, if confirmed by additional and larger studies involving other complex test results in cancer treatment decisions, has potentially broad implications for communication between cancer patients and their oncologists.

Some highlights of the findings:

1. The overwhelming majority of women (97% of the survey respondents) recalled receiving information about the Oncotype Dx test from their oncologists. Two-thirds (67%) of those women reported they “understood a large amount or all” of what the doctors told them about their recurrence risk based on the test results.

2. Nearly all of the respondents (96%) said they would undergo the test if they had to decide again, and 95% would recommend the test to other women in the same situation.

3. Over three-quarters, 76% “found the test useful” because it determined whether there was a high chance their cancer would come back.

4. The majority of respondents (71%) accurately recalled their recurrence risk, citing a number within 4% of that indicated by their personal test results.

Taken together, these findings support that a majority of women with breast cancer whose oncologists shared with them these genomic testing results, and who filled out the surveys, had good or excellent recall of the Oncotype Dx reports and felt that the test was helpful.

As an aside, the women were asked to rate their preferences regarding their personal input in medical decisions. Among the 77 respondents, 38% indicated they prefer to have an active role in medical decisions (meaning that they prefer to make their own decisions regardless of the doctor’s opinion or after “seriously considering” the doctor’s opinion) and 49% indicated they like a shared role, together with their doctors, in medical decisions. Only 13% of the women said they “prefer to leave the decision to <the> doctor.”

What’s striking is that among these women with early-stage breast cancer, 85% said they like to be involved in medical decisions. And 96% said they’d undergo the test again. Most of the women, despite imperfect if not frankly limited numeracy and literacy (as detailed in the publication), felt they understood the gist of what their doctors had told them, and indeed correctly answered questions about the likelihood of their tumor’s recurrence.

The results are encouraging, overall, about women’s eagerness to participate in medical decisions, and their capacity to benefit from information derived from complex, molecular tests.

*The capacity of Oncotype Dx to accurately assess the risk of breast cancer recurrence has been evaluated in previous, published studies including a 2004 publication in The New England Journal of Medicine and a 2006 paper in the Journal of Clinical Oncology. The test is manufactured, run and marketed by Genomic Health, based in Redwood City, California.

The National Cancer Institute lists an ongoing trial for women with hormone receptor-positive, node-positive breast cancer that includes evaluation with the Oncotype Dx tool.


Beware the Power of Data Handling in Politics (and Medicine)

Into my Google Reader this morning came a post from Bioephemera (an intriguing blog at the interface of art and science). Scientist-artist-blogger Jessica Palmer offers a provocative clip featuring Alex Lundry, a self-described conservative political pollster, data-miner and data visualizer.

Video: Alex Lundry on “Chart Wars: The Political Power of Data Visualization”

Some excerpts:

“These charts are meant to illustrate the political power of data visualization. It’s a discipline that’s only just beginning to bloom as a messaging vehicle…

“So what changed, why now?” he asks rhetorically.

“Well of course the internet…What’s really changed is data. We capture more data, we store more data and more data is available to us in machine-readable parsable format. So it’s really gotten to the point where anybody with a computer can create a data visualization easily enough…

“Here are a few quick lessons in graphical literacy…You’ll see that messing around with the origin and axis can make unimpressive growth look pretty amazing, right?…

——

Scary stuff. We’re vulnerable to brainwashing by pie graphs with pretty colors. Men are hired to collect and represent data with a particular aim. And there’s more to come this way, faster than ever, via Twitter.

So why here? Why a Medical Lesson?

Because the same is true for health information.

——

One of the first rules of medicine is knowing your sources. Before you make a decision, consider: did you read or hear about a treatment in a textbook, in a reputable journal, at a scientific meeting or over lunch with a representative from a pharmaceutical company?

Immersed in data as we are, it’s tempting to grasp at the best-presented material regardless of its intrinsic value. Nifty graphs can persuade or fool even the best of us.
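
To see how little it takes, here’s a minimal sketch (made-up numbers) of Lundry’s origin-and-axis trick: the same modest growth plotted twice, once with a truncated y-axis and once starting from zero.

```python
# The same (invented) series, plotted with a truncated vs. a zero-based y-axis.
import matplotlib.pyplot as plt

years = [2006, 2007, 2008, 2009, 2010]
values = [98, 99, 100, 101, 103]            # about 5% total growth

fig, (ax_trunc, ax_zero) = plt.subplots(1, 2, figsize=(8, 3))

ax_trunc.plot(years, values, marker="o")
ax_trunc.set_ylim(97, 104)                  # truncated origin: growth looks dramatic
ax_trunc.set_title("Truncated y-axis")

ax_zero.plot(years, values, marker="o")
ax_zero.set_ylim(0, 110)                    # zero-based origin: growth looks modest
ax_zero.set_title("Zero-based y-axis")

plt.tight_layout()
plt.show()
```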

For patients:

1. Know your doctor – be aware of industry ties, academic connections and other sources of pressure to perceive or publish results as more clear-cut than they really are;

2. Distinguish ads from articles about health – the difference is not always clear, especially on-line;

3. Read the fine print and identify the perspective of who’s depicting “data” in charts and graphs – when medical information comes onto your TV screen or magazine page, there’s a good chance someone’s got something to sell you.

For doctors:

1. Remember the difference between peer-reviewed journals and PeerView Press (a CME company with a host of industry sponsors, one of many such that provide free, neatly-packaged information targeted to busy doctors);

2. Take the trouble to read the methods and statistical sections of published papers in your field – your patients are counting on you to discern good studies from bad;

3. Don’t forget we’re human, too. We’re vulnerable, drawn to promising new results –

Mind those origins and axes!

