Archive for May, 2020
The Prosecutor’s Fallacy Illustrated
Posted by Bill Storage in Probability and Risk on May 7, 2020
“The first thing we do, let’s kill all the lawyers.” – Shakespeare, Henry VI, Part 2, Act IV
My last post discussed the failure of most physicians to infer the chance a patient has the disease given a positive test result where both the frequency of the disease in the population and the accuracy of the diagnostic test are known. The probability that the patient has the disease can be hundreds or thousands of times lower than the accuracy of the test. The problem in reasoning that leads us to confuse these very different likelihoods is one of several errors in logic commonly called the prosecutor’s fallacy. The important concept is conditional probability. By that we mean simply that the probability of x has a value and that the probability of x given that y is true has a different value. The shorthand for probability of x is p(x) and the shorthand for probability of x given y is p(x|y).
“Punching, pushing and slapping is a prelude to murder,” said prosecutor Scott Gordon during the trial of OJ Simpson for the murder of Nicole Brown. Alan Dershowitz countered with the argument that the probability of domestic violence leading to murder was very remote. Dershowitz (not prosecutor but defense advisor in this case) was right, technically speaking. But he was either as ignorant as the physicians interpreting the lab results or was giving a dishonest argument, or possibly both. The relevant probability was not the likelihood of murder given domestic violence, it was the likelihood of murder given domestic violence and murder. “The courtroom oath – to tell the truth, the whole truth and nothing but the truth – is applicable only to witnesses,” said Dershowitz in The Best Defense. In Innumeracy: Mathematical Illiteracy and Its Consequences. John Allen Paulos called Dershowitz’s point “astonishingly irrelevant,” noting that utter ignorance about probability and risk “plagues far too many otherwise knowledgeable citizens.” Indeed.
The doctors’ mistake in my previous post was confusing
P(positive test result) vs.
P(disease | positive test result)
Dershowitz’s argument confused
P(husband killed wife | husband battered wife) vs.
P(husband killed wife | husband battered wife | wife was killed)
In Reckoning With Risk, Gerd Gigerenzer gave a 90% value for the latter Simpson probability. What Dershowitz cited was the former, which we can estimate at 0.1%, given a wife-battery rate of one in ten, and wife-murder rate of one per hundred thousand. So, contrary to what Dershowitz implied, prior battery is a strong indicator of guilt when a wife has been murdered.
As mentioned in the previous post, the relevant mathematical rule does not involve advanced math. It’s a simple equation due to Pierre-Simon Laplace, known, oddly, as Bayes’ Theorem:
P(A|B) = P(B|A) * P(A) / P(B)
If we label the hypothesis (patient has disease) as D and the test data as T, the useful form of Bayes’ Theorem is
P(D|T) = P(T|D) P(D) / P(T) where P(T) is the sum of probabilities of positive results, e.g.,
P(T) = P(T|D) * P(D) + P(T | not D) * P(not D) [using “not D” to mean “not diseased”]
Cascells’ phrasing of his Harvard quiz was as follows: “If a test to detect a disease whose prevalence is 1 out of 1,000 has a false positive rate of 5 percent, what is the chance that a person found to have a positive result actually has the disease?”
Plugging in the numbers from the Cascells experiment (with the parameters Cascells provided shown below in bold and the correct answer in green):
- P(D) is the disease frequency = 0.001 [ 1 per 1000 in population ] therefore:
- P(not D) is 1 – P(D) = 0.999
- P(T | not D) = 5% = 0.05 [ false positive rate also 5%] therefore:
- P(T | D) = 95% = 0.95 [ i.e, the false negative rate is 5% ]
Substituting:
P(T) = .95 * .001 + .999 * .05 = 0.0509 ≈ 5.1% [ total probability of a positive test ]
P(D|T) = .95 * .001 / .0509 = .0019 ≈ 2% [ probability that patient has disease, given a positive test result ]
Voila.
I hope this seeing is believing illustration of Cascells’ experiment drives the point home for those still uneasy with equations. I used Cascells’ rates and a population of 100,000 to avoid dealing with fractional people:

Extra credit: how exactly does this apply to Covid, news junkies?
Edit 5/21/20. An astute reader called me on an inaccuracy in the diagram. I used an approximation, without identifying it. P = r1/r2 is a cheat for P = 1 – Exp(- r1/r2). The approximation is more intuitive, though technically wrong. It’s a good cheat, for P values less that 10%.
Note 5/22/20. In response to questions about how this sort of thinking bears on coronavirus testing -what test results say about prevalence – consider this. We really have one equation in 3 unknowns here: false positive rate, false negative rate, and prevalence in population. A quick Excel variations study using false positive rates from 1 to 20% and false neg rates from 1 to 3 percent, based on a quick web search for proposed sensitivity/specificity for the Covid tests is revealing. Taking the low side of the raw positive rates from the published data (1 – 3%) results in projected prevalence roughly equal to the raw positive rates. I.e., the false positives and false negatives happen to roughly wash out in this case. That also leaves P(d|t) in the range of a few percent.

Innumeracy and Overconfidence in Medical Training
Posted by Bill Storage in History of Science on May 4, 2020
Most medical doctors, having ten or more years of education, can’t do simple statistics calculations that they were surely able to do, at least for a week or so, as college freshmen. Their education has let them down, along with us, their patients. That education leaves many doctors unquestioning, unscientific, and terribly overconfident.
A disturbing lack of doubt has plagued medicine for thousands of years. Galen, at the time of Marcus Aurelius, wrote, “It is I, and I alone, who has revealed the true path of medicine.” Galen disdained empiricism. Why bother with experiments and observations when you own the truth. Galen’s scientific reasoning sounds oddly similar to modern junk science armed with abundant confirming evidence but no interest in falsification. Galen had plenty of confirming evidence: “All who drink of this treatment recover in a short time, except those whom it does not help, who all die. It is obvious, therefore, that it fails only in incurable cases.”
Galen was still at work 1500 years later when Voltaire wrote that the art of medicine consisted of entertaining the patient while nature takes its course. One of Voltaire’s novels also described a patient who had survived despite the best efforts of his doctors. Galen was around when George Washington died after five pints of bloodletting, a practice promoted up to the early 1900s by prominent physicians like Austin Flint.
But surely medicine was mostly scientific by the 1900s, right? Actually, 20th century medicine was dragged kicking and screaming to scientific methodology. In the early 1900’s Ernest Amory Codman of Massachusetts General proposed keeping track of patients and rating hospitals according to patient outcome. He suggested that a doctor’s reputation and social status were poor measures of a patient’s chance of survival. He wanted the track records of doctors and hospitals to be made public, allowing healthcare consumers to choose suppliers based on statistics. For this, and for his harsh criticism of those who scoffed at his ideas, Codman was tossed out of Mass General, lost his post at Harvard, and was suspended from the Massachusetts Medical Society. Public outcry brought Codman back into medicine, and much of his “end results system” was put in place.
20th century medicine also fought hard against the concept of controlled trials. Austin Bradford Hill introduced the concept to medicine in the mid 1920s. But in the mid 1950s Dr. Archie Cochrane was still fighting valiantly against what he called the God Complex in medicine, which was basically the ghost of Galen; no one should question the authority of a physician. Cochrane wrote that far too much of medicine lacked any semblance of scientific validation and knowing what treatments actually worked. He wrote that the medical establishment was hostile the idea of controlled trials. Cochrane fought this into the 1970s, authoring Effectiveness and Efficiency: Random Reflections on Health Services in 1972.
Doctors aren’t naturally arrogant. The God Complex is passed passed along during the long years of an MD’s education and internship. That education includes rights of passage in an old boys’ club that thinks sleep deprivation builds character in interns, and that female med students should make tea for the boys. Once on the other side, tolerance of archaic norms in the MD culture seems less offensive to the inductee, who comes to accept the system. And the business of medicine, the way it’s regulated, and its control by insurance firms, pushes MDs to view patients as a job to be done cost-effectively. Medical arrogance is in a sense encouraged by recovering patients who might see doctors as savior figures.
As Daniel Kahneman wrote, “generally, it is considered a weakness and a sign of vulnerability for clinicians to appear unsure.” Medical overconfidence is encouraged by patients’ preference for doctors who communicate certainties, even when uncertainty stems from technological limitations, not from doctors’ subject knowledge. MDs should be made conscious of such dynamics and strive to resist inflating their self importance. As Allan Berger wrote in Academic Medicine in 2002, “we are but an instrument of healing, not its source.”
Many in medical education are aware of these issues. The calls for medical education reform – both content and methodology – are desperate, but they are eerily similar to those found in a 1924 JAMA article, Current Criticism of Medical Education.
Covid19 exemplifies the aspect of medical education I find most vile. Doctors can’t do elementary statistics and probability, and their cultural overconfidence renders them unaware of how critically they need that missing skill.
A 1978 study, brought to the mainstream by psychologists like Kahnemann and Tversky, showed how few doctors know the meaning of a positive diagnostic test result. More specifically, they’re ignorant of the relationship between the sensitivity and specificity (true positive and true negative rates) of a test and the probability that a patient who tested positive has the disease. This lack of knowledge has real consequences In certain situations, particularly when the base rate of the disease in a population is low. The resulting probability judgements can be wrong by factors of hundreds or thousands.
In the 1978 study (Cascells et. al.) doctors and medical students at Harvard teaching hospitals were given a diagnostic challenge. “If a test to detect a disease whose prevalence is 1 out of 1,000 has a false positive rate of 5 percent, what is the chance that a person found to have a positive result actually has the disease?” As described, the true positive rate of the diagnostic test is 95%. This is a classic conditional-probability quiz from the second week of a probability class. Being right requires a), knowing Bayes Theorem, and b), being able to multiply and divide. Not being confidently wrong requires only one thing: scientific humility – the realization that all you know might be less than all there is to know. The correct answer is 2% – there’s a 2% likelihood the patient has the disease. The most common response, by far, in the 1978 study was 95%, which is wrong by 4750%. Only 18% of doctors and med students gave the correct response. The study’s authors observed that in the group tested, “formal decision analysis was almost entirely unknown and even common-sense reasoning about the interpretation of laboratory data was uncommon.”
As mentioned above, this story was heavily publicized in the 80s. It was widely discussed by engineering teams, reliability departments, quality assurance groups and math departments. But did it impact medical curricula, problem-based learning, diagnostics training, or any other aspect of the way med students were taught? One might have thought yes, if for no reason than to avoid criticism by less prestigious professions having either the relevant knowledge of probability or the epistemic humility to recognize that the right answer might be far different from the obvious one.
Similar surveys were done in 1984 (David M Eddy) and in 2003 (Kahan, Paltiel) with similar results. In 2013, Manrai and Bhatia repeated Cascells’ 1978 survey with the exact same wording, getting trivially better results. 23% answered correctly. They suggesting that medical education “could benefit from increased focus on statistical inference.” That was 35 years after Cascells, during which, the phenomenon was popularized by the likes of Daniel Kahneman, from the perspective of base-rate neglect, by Philip Tetlock, from the perspective of overconfidence in forecasting, and by David Epstein, from the perspective of the tyranny of specialization.
Over the past decade, I’ve asked the Cascells question to doctors I’ve known or met, where I didn’t think it would get me thrown out of the office or booted from a party. My results were somewhat worse. Of about 50 MDs, four answered correctly or were aware that they’d need to look up the formula but knew that it was much less than 95%. One was an optometrist, one a career ER doc, one an allergist-immunologist, and one a female surgeon – all over 50 years old, incidentally.
Despite the efforts of a few radicals in the Accreditation Council for Graduate Medical Education and some post-Flexnerian reformers, medical education remains, as Jonathan Bush points out in Tell Me Where It Hurts, basically a 2000 year old subject-based and lecture-based model developed at a time when only the instructor had access to a book. Despite those reformers, basic science has actually diminished in recent decades, leaving many physicians with less of a grasp of scientific methodology than that held by Ernest Codman in 1915. Medical curriculum guardians, for the love of God, get over your stodgy selves and replace the calculus badge with applied probability and statistical inference from diagnostics. Place it later in the curriculum later than pre-med, and weave it into some of that flipped-classroom, problem-based learning you advertise.