Deficient Discipleship in Environmental Science

Bear with me here.

Daniel Oprean’s “Portraits of Deficient Discipleship” (Kairos, 2024) argues that Matthew 8:18–27 presents three kinds of failed or immature discipleship, each corrected by Jesus’s response.

Oprean reads verses 19–20 as discipleship without cost. The “enthusiastic scribe” volunteers to follow Jesus but misunderstands the teacher he’s addressing. His zeal lacks awareness of cost. Jesus’s lament about having “nowhere to lay his head,” Oprean says, reveals that true discipleship entails homelessness, marginalization, and suffering.

As an instance of discipleship without commitment (vv. 21–22), a second disciple hesitates. His request to bury his father provokes Jesus’s radical command: “Follow me, and let the dead bury their own dead.” Oprean takes this as divided loyalty, a failure of commitment even among genuine followers.

Finally comes discipleship without hardships (vv. 23–27). The boat-bound disciples obey but panic in the storm. Their fear shows lack of trust. Jesus rebukes their “little faith.” His calming of the sea becomes a paradigm of faith maturing only through trial.

Across these scenes, Matthew’s Jesus confronts enthusiasm without realism, religiosity without surrender, faith without endurance. Authentic discipleship, Oprean concludes, must include cost, commitment, and hardship.

Oprean’s essay is clear and perfectly conventional evangelical exegesis. The tripartite symmetry – cost, commitment, hardship – works neatly, though it imposes a moral taxonomy on what Matthew presents as narrative tension (a pale echo of Mark’s deeper ironies). Each scene may concern not moral failure but stages of revelation: curiosity, obedience, awe. By moralizing them, Oprean flattens Matthew’s literary dynamism and theological ambiguity for devotional ends.

His dependence on the standard commentators – Gundry, Keener, Bruner – keeps him in the well-worn groove. There’s no attention to Matthew’s redactional strategy, the eschatological charge of “Son of Man” in v. 20, or the symbolic link between the sea miracle and Israel’s deliverance. The piece is descriptive, not interpretive; homiletic rather than analytic. The unsettling portrait of discipleship becomes a sermon outline about piety instead of a crisis in perception.

Fair enough, you say – there’s nothing wrong with devotional writing. True. The problem is devotional writing costumed as analysis and published as scholarship. He isn’t interrogating the text. If he were, he’d ask: Why does Matthew place these episodes together? How does “Son of Man” invoke Danielic or apocalyptic motifs? What does the sea episode reveal about Jesus’s authority over creation itself? Instead, Oprean turns inward, toward exhortation.

It’s an odd hybrid genre – half sermon, half commentary – anchored in evangelical assumptions about the text’s unity and moral purpose. Critical possibilities are excluded from the start. There’s no discussion of redactional intent, no engagement with Second-Temple expectations of the huios tou anthrōpou, no awareness that “stilling the sea” echoes both Genesis and Exodus motifs of creation and deliverance.

This is scholarship only in the confessional sense of “biblical studies,” where the aim is to explain what discipleship should mean according to current theological norms. It’s homiletics, not analysis.


But my quarrel isn’t really with Oprean. He’s the symptom, not the cause. His paper stands for a broader phenomenon – pseudo-scholarship: writing that borrows the visual grammar of academic work (citations, subheadings, DOIs, statistical jargon) while serving ideological ends.

You can find parallels across the sciences. In the early 2000s, string theory was on the altar. Articles in Foundations of Physics or in Studies in History and Philosophy of Modern Physics carried the trappings of rigor but were effectively apologias for the “beauty” of untestable theories. “Mathematical consistency,” we were told, “is experimental evidence.” The logic matches Oprean’s: inward coherence replaces external test.

Climate science has its mirror image in policy-driven venues like Energy & Environment or think-tank white papers formatted as peer-reviewed studies. They reproduce the scaffolding of scholarship while narrowing inquiry to confirm prior skepticism.

The rhetorical pattern is the same:

  1. Scholarly mimicry: heavy citation and technical diction confer legitimacy.
  2. Rhetorical closure: conclusions are known before the analysis begins.
  3. Audience reassurance: readers are not challenged but comforted.
  4. Boundary play: the work hovers between analysis and advocacy, critique and catechism.

This month’s Sage journal offers a case that makes Oprean look like Richard Feynman. “Dynamic Effect of Green Financing, Economic Development, Renewable Energy and Trade Facilitation on Environmental Sustainability in Developing and Developed Countries” by Usman Ali et al. exhibits the same performative scholarship. The surface polish of method and technical vocabulary hides an absence of real inquiry.

Written in the formal cadence of econometrics – Dynamic Fixed Effects, GEE, co-integration, Sargan tests – it brandishes its methods as credentials rather than arguments. No model specifications, variable definitions, or theoretical tensions appear. “Dynamic” and “robustness” are prestige words, not analytic ones.

Ali’s paper deploys three grand frameworks – Sustainable Development Theory, Innovation Theory, and the Environmental Kuznets Curve – as if piling them together produced insight. But these models conflict! The EKC’s inverted-U relationship between income and pollution is empirically shaky, and no attempt is made to reconcile contradictions. The gesture is interdisciplinary theater: breadth without synthesis.

At least Oprean’s homiletics are harmless. Ali’s conclusion doubles as policy: developed countries must integrate renewables – “science says so.” It’s a sermon in technocratic garb.

Across these domains, and unfortunately many others, we see the creeping genre of methodological theater: environmental-finance papers that treat regressions as theology; equations and robustness tests as icons of faith. The altar may change – from Galilee to global sustainability – but the liturgy is the same.


“The separation of state and church must be complemented by the separation of state and science, that most recent, most aggressive, and most dogmatic religious institution.” Paul Feyerabend, Against Method, 1975

Science clergy


The End of Science Again

Dad says enough of this biblical exegesis and hermeneutics nonsense. He wants more science and history of science for iconoclasts and Kuhnians. I said that if prophetic exegesis was good enough for Isaac Newton – who spent most of his writing life on it – it’s good enough for me. But to keep the family together around the spectroscope, here’s another look at what’s gone terribly wrong with institutional science.

It’s been thirty years since John Horgan wrote The End of Science, arguing that fundamental discovery was nearing its end. He may have overstated the case, but his diagnosis of scientific fatigue struck a nerve. Horgan claimed that major insights – quantum mechanics, relativity, the big bang, evolution, the double helix – had already given us a comprehensive map of reality unlikely to change much. Science, he said, had become a victim of its own success, entering a phase of permanent normality, to borrow Thomas Kuhn’s term. Future research, in his view, would merely refine existing paradigms, pose unanswerable questions, or spin speculative theories with no empirical anchor.

Horgan still stands by that thesis. He notes the absence of paradigm-shifting revolutions and a decline in disruptive research. A 2023 Nature study analyzed forty-five million papers and nearly four million patents, finding a sharp drop in genuinely groundbreaking work since the mid-twentieth century. Research increasingly consolidates what’s known rather than breaking new ground. Horgan also raises the philosophical point that some puzzles may simply exceed our cognitive reach – a concern with deep historical roots. Consider consciousness, quantum interpretation, or other problems that might mark the brain’s limits. Perhaps AI will push those limits outward.

Students of History of Science will think of Auguste Comte’s famous claim that we’d never know the composition of the stars. He wasn’t stupid, just cautious. Epistemic humility. He knew collecting samples was impossible. What he couldn’t foresee was spectrometry, where the wavelengths of light a star emits reveal the quantum behavior of its electrons. Comte and his peers could never have imagined that; it was data that forced quantum mechanics upon us.

The same confidence of finality carried into the next generation of physics. In 1874, Philipp von Jolly reportedly advised young Max Planck not to pursue physics, since it was “virtually a finished subject,” with only small refinements left in measurement. That position was understandable: Maxwell’s equations unified electromagnetism, thermodynamics was triumphant, and the Newtonian worldview seemed complete. Only a few inconvenient anomalies remained.

Albert Michelson, in 1894, echoed the sentiment. “Most of the grand underlying principles have been firmly established,” he said. Physics had unified light, electricity, magnetism, and heat; the periodic table was filled in; the atom looked tidy. The remaining puzzles – Mercury’s orbit, blackbody radiation – seemed minor, the way dark matter does to some of us now. He was right in one sense: he had interpreted his world as coherently as possible with the evidence he had. Or had he?

Michelson’s remark came after his own 1887 experiment with Morley – the one that failed to detect Earth’s motion through the ether and, in hindsight, cracked the door to relativity. The irony is enormous. He had already performed the experiment that revealed something was deeply wrong, yet he didn’t see it that way. The null result struck him as a puzzle within the old paradigm, not a death blow to it. The idea that the speed of light might be constant for all observers, or that time and space themselves might bend, was too far outside the late-Victorian imagination. Lorentz, FitzGerald, and others kept right on patching the luminiferous ether.

Logicians will recognize the case for pessimistic meta-induction here: past prognosticators have always been wrong about the future, and inductive reasoning says they will be wrong again. Horgan may think his case is different, but I can’t see it. He was partially right, but overconfident about completeness – treating current theories as final, just as Comte, von Jolly, and Michelson once did.

Where Horgan was most right – territory he barely touched – is in seeing that institutions now ensure his prediction. Science stagnates not for lack of mystery but because its structures reward safety over risk. Peer review, grant culture, and the fetish for incrementalism make Kuhnian normal science permanent. Scientific American canned Horgan soon after The End of Science appeared. By the mid-90s, the magazine had already crossed the event horizon of integrity.

While researching his book, Horgan interviewed Edward Witten, already the central figure in the string-theory marketing machine. Witten rejected Kuhn’s model of revolutions, preferring a vision of seamless theoretical progress. No surprise. Horgan seemed wary of Witten’s confidence. He sensed that Witten’s serene belief in an ever-tightening net of theory was itself a symptom of closure.

From a Feyerabendian perspective, the irony is perfect. Paul Feyerabend would say that when a scientific culture begins to prize formal coherence, elegance, and mathematical completeness over empirical confrontation, it stops being revolutionary. In that sense, the Witten attitude itself initiates the decline of discovery.

String theory is the perfect case study: an extraordinary mathematical construct that’s absorbed immense intellectual capital without yielding a falsifiable prediction. To a cynic (or realist), it looks like a priesthood refining its liturgy. The Feyerabendian critique would be that modern science has been rationalized to death, more concerned with internal consistency and social prestige than with the rude encounter between theory and world. Witten’s world has continually expanded a body of coherent claims – they hold together, internally consistent. But science does not run on a coherence model of truth. It demands correspondence. (Coherence vs. correspondence models of truth was a big topic in analytic philosophy in the last century.) By correspondence theory of truth, we mean that theories must survive the test against nature. The creation of coherent ideas means nothing without it. Experience trumps theory, always – the scientific revolution in a nutshell.

Horgan didn’t say – though he should have – that Witten’s aesthetic of mathematical beauty has institutionalized epistemic stasis. The problem isn’t that science has run out of mysteries, as Horgan proposed, but that its practitioners have become too self-conscious, too invested in their architectures to risk tearing them down. Galileo rolls over.

Horgan sensed the paradox but never made it central. His End of Science was sociological and cognitive; a Feyerabendian would call it ideological. Science has become the very orthodoxy it once subverted.


Grains of Truth: Science and Dietary Salt

Science doesn’t proceed in straight lines. It meanders, collides, and battles over its big ideas. Thomas Kuhn’s view of science as cycles of settled consensus punctuated by disruptive challenges is a great way to understand this messiness, though later approaches, like Imre Lakatos’s structured research programs, Paul Feyerabend’s radical skepticism, and Bruno Latour’s focus on science’s social networks, have added their worthwhile spins. This piece takes a light look, using Kuhn’s ideas with nudges from Feyerabend, Lakatos, and Latour, at the ongoing debate over dietary salt, a controversy that’s nuanced and long-lived. I’m not looking for “the truth” about salt, just watching science in real time.

Dietary Salt as a Kuhnian Case Study

The debate over salt’s role in blood pressure shows how science progresses, especially when viewed through the lens of Kuhn’s philosophy. It highlights the dynamics of shifting paradigms, consensus overreach, contrarian challenges, and the nonlinear, iterative path toward knowledge. This case reveals much about how science grapples with uncertainty, methodological complexity, and the interplay between evidence, belief, and rhetoric, even when relatively free from concerns about political and institutional influence.

In The Structure of Scientific Revolutions, Kuhn proposed that science advances not steadily but through cycles of “normal science,” where a dominant paradigm shapes inquiry, and periods of crisis that can result in paradigm shifts. The salt–blood pressure debate, though not as dramatic in consequence as Einstein displacing Newton or as ideologically loaded as climate science, exemplifies these principles.

Normal Science and Consensus

Since the 1970s, medical authorities like the World Health Organization and the American Heart Association have endorsed the view that high sodium intake contributes to hypertension and thus increases cardiovascular disease (CVD) risk. This consensus stems from clinical trials such as the 2001 DASH-Sodium study, which demonstrated that reducing salt intake significantly (from 8 grams per day to 4) lowered blood pressure, especially among hypertensive individuals. This, in Kuhn’s terms, is the dominant paradigm.

This framework – “less salt means better health” – has guided public health policies, including government dietary guidelines and initiatives like the UK’s salt reduction campaign. In Kuhnian terms, this is “normal science” at work. Researchers operate within an accepted model, refining it with meta-analyses and randomized controlled trials, seeking data to reinforce it, and treating contradictory findings as anomalies or errors. Public health campaigns, like the AHA’s recommendation of less than 2.3 g/day of sodium, reflect this consensus. Governments’ involvement embodies institutional support.

Anomalies and Contrarian Challenges

However, anomalies have emerged. For instance, a 2016 study by Mente et al. in The Lancet reported a U-shaped curve: both very low (less than 3 g/day) and very high (more than 5 g/day) sodium intakes appeared to be associated with increased CVD risk. This challenged the linear logic (“less salt, better health”) of the prevailing model. Although the differences in intake were not vast, the implications questioned whether current sodium guidelines were overly restrictive for people with normal blood pressure.

The video Salt & Blood Pressure: How Shady Science Sold America a Lie mirrors Galileo’s rhetorical flair, using provocative language such as “shady science” to challenge the establishment. Like Galileo’s defense of heliocentrism, contrarians in the salt debate (researchers like Mente) amplify anomalies to question dogma, sometimes exaggerating flaws in early studies (e.g., Lewis Dahl’s rat experiments) or alleging conspiracies (e.g., pharmaceutical influence). More in Feyerabend’s view than in Kuhn’s, this exaggeration and rhetoric might be desirable. It’s useful. It provides the challenges that the paradigm should be able to overcome to remain dominant.

These challenges haven’t led to a paradigm shift yet, as the consensus remains robust, supported by RCTs and global health data. But they highlight the Kuhnian tension between entrenched views and emerging evidence, pushing science to refine its understanding.

Framing the issue as a contrarian challenge might go something like this:

Evidence-based medicine sets treatment guidelines, but evidence-based medicine has not translated into evidence-based policy. Governments advise lowering salt intake, but that advice is supported by little robust evidence for the general population. Randomized controlled trials have not strongly supported the benefit of salt reduction for average people. Indeed, we see evidence that low salt might pose as great a risk.

[Figure: Sodium Intake vs. Cardiovascular Disease Risk. Based on Mente (2016) and O’Donnell (2014).]

Methodological Challenges

The question “Is salt bad for you?” is ill-posed. Evidence and reasoning say this question oversimplifies a complex issue: sodium’s effects vary by individual (e.g., salt sensitivity, genetics), diet (e.g., processed vs. whole foods), and context (e.g., baseline blood pressure, activity level). Science doesn’t deliver binary truths. Modern science gives probabilistic models, refined through iterative testing.

While randomized controlled trials (RCTs) have shown that reducing sodium intake can lower blood pressure, especially in sensitive groups, observational studies show that extremely low sodium is associated with poor health. This association may reflect reverse causality, where the direction of cause and effect is misread: the data may simply reveal that sicker people eat less, not that they are harmed by low salt. This complexity reflects the limitations of study design and the challenges of isolating causal relationships in real-world populations. The above graph is a fairly typical dose-response curve for any nutrient.

The salt debate also underscores the inherent difficulty of studying diet and health. Total caloric intake, physical activity, genetic variation, and compliance all confound the relationship between sodium and health outcomes. Few studies look at salt intake as a fraction of body weight. If sodium recommendations were expressed as sodium density (mg/kcal), it might help accommodate individual energy needs and eating patterns more effectively.
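To make the sodium-density idea concrete, here is a minimal sketch. The 2,300 mg figure is the AHA limit mentioned earlier; the two energy intakes and the function name `sodium_density` are made up for illustration and don't come from any study.

```python
def sodium_density(sodium_mg: float, energy_kcal: float) -> float:
    """Return sodium density in mg of sodium per kcal of energy intake."""
    if energy_kcal <= 0:
        raise ValueError("energy_kcal must be positive")
    return sodium_mg / energy_kcal


# Hypothetical daily intakes: (sodium in mg, energy in kcal).
# Both people eat exactly the 2,300 mg/day absolute limit.
people = {
    "smaller, sedentary adult": (2300, 1600),
    "larger, active adult": (2300, 3200),
}

densities = {
    name: sodium_density(mg, kcal) for name, (mg, kcal) in people.items()
}

for name, density in densities.items():
    print(f"{name}: {density:.2f} mg/kcal")
```

The same absolute intake works out to twice the sodium density for the person eating half the calories, which is the distinction a fixed mg/day guideline can't express.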

Science as an Iterative Process

Despite flaws in early studies and the polemics of dissenters, the scientific community continues to refine its understanding. For example, Japan’s national sodium reduction efforts since the 1970s have coincided with significant declines in stroke mortality, suggesting real-world benefits to moderation, even if the exact causal mechanisms remain complex.

Through a Kuhnian lens, we see a dominant paradigm shaped by institutional consensus and refined by accumulating evidence. But we also see the system’s limits: anomalies, confounding variables, and methodological disputes that resist easy resolution.

Contrarians, though sometimes rhetorically provocative or methodologically uneven, play a crucial role. Like the “puzzle-solvers” and “revolutionaries” in Kuhn’s model, they pressure the scientific establishment to reexamine assumptions and tighten methods. This isn’t a flaw in science; it’s the process at work.

Salt isn’t simply “good” or “bad.” The better scientific question is more conditional: How does salt affect different individuals, in which contexts, and through what mechanisms? Answering this requires humility, robust methodology, and the acceptance that progress usually comes in increments. Science moves forward not despite uncertainty, disputation and contradiction but because of them.


After the Applause: Heilbron Rereads Feyerabend

A decade ago, in a Science, Technology and Society (STS) roundtable, I brought up Paul Feyerabend, who was certainly familiar to everyone present. I said that his demand for a separation of science and state – his call to keep science from becoming a tool of political authority – seemed newly relevant in the age of climate science and policy entanglement. Before I could finish the thought, someone cut in: “You can’t use Feyerabend to support republicanism!”

I hadn’t made an argument. Feyerabend was being claimed as someone who belonged to one side of a cultural war. His ideas were secondary. That moment stuck with me, not because I was misunderstood, but because Feyerabend was. And maybe he would have loved that. He was ambiguous by design. The trouble is that his deliberate opacity has hardened, over time, into distortion.

Feyerabend survives in fragments and footnotes. To some, he’s the folk hero who overturned Method and danced on its ruins. To others, he’s a cautionary tale: the man who gave license to science denial, epistemic relativism, and rhetorical chaos. You’ll find him invoked in cultural studies and critiques of scientific rationality, often with little more than the phrase “anything goes” as evidence. He’s also been called “the worst enemy of science.”

Against Method is remembered – or reviled – as a manifesto for intellectual anarchy. But “manifesto” doesn’t fit at all. It didn’t offer a vision, a list of principles, or a path forward. It has no normative component. It offered something stranger: a performance.

Feyerabend warned readers in the preface that the book would contradict itself, that it wasn’t impartial, and that it was meant to persuade, not instruct. He said – plainly and explicitly – that later parts would refute earlier ones. It was, in his words, a “tendentious” argument. And yet neither its admirers nor its critics have taken that warning seriously.

Against Method has become a kind of Rorschach test. For some, it’s license; for others, sabotage. Few ask what Feyerabend was really doing – or why he chose that method to attack Method. A few of us have long argued that Against Method has been misread. It was never meant as a guidebook or a threat, but as a theatrical critique staged to provoke and destabilize something that badly needed destabilizing.

That, I was pleased to learn, is also the argument made quietly and precisely in the last published work of historian John Heilbron. It may be the most honest reading of Feyerabend we’ve ever had.

John once told me that, unlike Kuhn, he had “the metabolism of a historian,” a phrase that struck me later as a perfect self-diagnosis: patient, skeptical, and slow-burning. He’d been at Berkeley when Feyerabend was still strutting the halls in full flair – the accent, the dramatic pronouncements, the partying. John didn’t much like him. He said so over lunch, on walks, at his house or mine. Feyerabend was hungry for applause, and John disapproved of his personal appetites and the way he flaunted them.

And yet… John’s recent piece on Feyerabend – the last thing he ever published – is microscopically delicate, charitable, and clear-eyed. John’s final chapter in Stefano Gattei’s recent book, Feyerabend in Dialogue, contains no score-settling, no demolition. Just a forensic mind trained to separate signal from noise. If Against Method is a performance, Heilbron doesn’t boo it offstage. He watches it again, closely, and tells us how it was done. Feyerabend through Heilbron’s lens is a performance reframed.

If anyone was positioned to make sense of Feyerabend, rhetorically, philosophically, and historically, it was Heilbron – Thomas Kuhn’s first graduate student, a lifelong physicist-turned-historian, and an expert on both early modern science and quantum theory’s conceptual tangles. His work on Galileo, Bohr, and the Scientific Revolution was always precise, occasionally sly, and never impressed by performance for performance’s sake.

That care is clearest in his treatment of Against Method’s most famous figure: Galileo. Feyerabend made Galileo the centerpiece of his case against scientific method – not as a heroic rationalist, but as a cunning rhetorician who won not because of superior evidence, but because of superior style. He compared Galileo to Goebbels, provocatively, to underscore how persuasion, not demonstration, drove the acceptance of heliocentrism. In Feyerabend’s hands, Galileo became a theatrical figure, a counterweight to the myth of Enlightenment rationality.

Heilbron dismantles this with the precision of someone who has lived in Galileo’s archives. He shows that while Galileo lacked a modern theory of optics, he was not blind to his telescope’s limits. He cross-checked, tested, and refined. He triangulated with terrestrial experiments. He understood that instruments could deceive, and worked around that risk with repetition and caution. The image of Galileo as a showman peddling illusions doesn’t hold up. Galileo, flaws acknowledged, was a working proto-scientist, attentive to the fragility of his tools.

Heilbron doesn’t mythologize Galileo; his 2010 Galileo makes that clear. But he rescues Galileo from Feyerabend’s caricature. In doing so, he models something Against Method never offered: a historically grounded, philosophically rigorous account of how science proceeds when tools are new, ideas unstable, and theory underdetermined by data.

To be clear, Galileo was no model of transparency. He framed the Dialogue as a contest between Copernicus and Ptolemy, though he knew Tycho Brahe’s hybrid system was the more serious rival. He pushed his theory of tides past what his evidence could support, ignoring counterarguments – even from Cardinal Bellarmine – and overstating the case for Earth’s motion.

Heilbron doesn’t conceal these. He details them, but not to dismiss. For him, these distortions are strategic flourishes – acts of navigation by someone operating at the edge of available proof. They’re rhetorical, yes, but grounded in observation, subject to revision, and paid for in methodological care.

That’s where the contrast with Feyerabend sharpens. Feyerabend used Galileo not to advance science, but to challenge its authority. More precisely, to challenge Method as the defining feature of science. His distortions – minimizing Galileo’s caution, questioning the telescope, reimagining inquiry as theater – were made not in pursuit of understanding, but in service of a larger philosophical provocation. This is the line Heilbron quietly draws: Galileo bent the rules to make a case about nature; Feyerabend bent the past to make a case about method.

In his final article, Heilbron makes four points. First, that the Galileo material in Against Method – its argumentative keystone – is historically slippery and intellectually inaccurate. Feyerabend downplays empirical discipline and treats rhetorical flourish as deception. Heilbron doesn’t call this dishonest. He calls it stagecraft.

Second, that Feyerabend’s grasp of classical mechanics, optics, and early astronomy was patchy. His critique of Galileo’s telescope rests on anachronistic assumptions about what Galileo “should have” known. He misses the trial-based, improvisational reasoning of early instrumental science. Heilbron restores that context.

Third, Heilbron credits Feyerabend’s early engagement with quantum mechanics – especially his critique of von Neumann’s no-hidden-variables proof and his alignment with David Bohm’s deterministic alternative. Feyerabend’s philosophical instincts were sharp.

And fourth, Heilbron tracks how Feyerabend’s stance unraveled – oscillating between admiration and disdain for Popper, Bohr, and even his earlier selves. He supported Bohm against Bohr in the 1950s, then defended Bohr against Popper in the 1970s. Heilbron doesn’t call this hypocrisy. He calls it instability built into the project itself: Feyerabend didn’t just critique rationalism – he acted out its undoing. If this sounds like a takedown, it isn’t. It’s a reconstruction – calm, slow, impartial. The rare sort that shows us not just what Feyerabend said, but where he came apart.

Heilbron reminds us what some have forgotten and many more never knew: that Feyerabend was once an insider. Before Against Method, he was embedded in the conceptual heart of quantum theory. He studied Bohm’s challenge to Copenhagen while at LSE, helped organize the 1957 Colston symposium in Bristol, and presented a paper there on quantum measurement theory. He stood among physicists of consequence – Bohr, Bohm, Podolsky, Rosen, Dirac, and Pauli – all struggling to articulate alternatives to an orthodoxy, the Copenhagen Interpretation, that they found inadequate.

With typical wit, Heilbron notes that von Neumann’s no-hidden-variables proof “was widely believed, even by people who had read it.” Feyerabend saw that dogma was hiding inside the math – and tried to smoke it out.

Late in life, Feyerabend’s provocations would ripple outward in unexpected directions. In a 1990 lecture at Sapienza University, Cardinal Joseph Ratzinger – later Pope Benedict XVI – quoted Against Method approvingly. He cited Feyerabend’s claim that the Church had been more reasonable than Galileo in the affair that defined their rupture. When Ratzinger’s 2008 return visit was canceled due to protests about that quotation, the irony was hard to miss. The Church, once accused of silencing science, was being silenced by it, and stood accused of quoting a philosopher who spent his life telling scientists to stop pretending they were priests.

We misunderstood Feyerabend not because he misled us, but because we failed to listen the way Heilbron did.


Anarchy and Its Discontents: Paul Feyerabend’s Critics

(For and against Against Method)

Paul Feyerabend’s 1975 Against Method and his related works made bold claims about the history of science, particularly the Galileo affair. He argued that science progressed not because of adherence to any specific method, but through what he called epistemological anarchism. He said that Galileo’s success was due in part to rhetoric, metaphor, and politics, not just evidence.

Some critics, especially physicists and historically rigorous philosophers of science, have pointed out technical and historical inaccuracies in Feyerabend’s treatment of physics. Here are some examples of the alleged errors and distortions:

Misunderstanding Inertial Frames in Galileo’s Defense of Copernicanism

Feyerabend argued that Galileo’s arguments for heliocentrism were not based on superior empirical evidence, and that Galileo used rhetorical tricks to win support. He claimed that Galileo simply lacked any means of distinguishing heliocentric from geocentric models empirically, so his arguments were no more rational than those of Tycho Brahe and other opponents.

His critics responded by saying that Galileo’s arguments based on the phases of Venus and Jupiter’s moons were empirically decisive against the Ptolemaic model. This is unarguable, though whether Galileo had empirical evidence to overthrow Tycho Brahe’s hybrid model is a much more nuanced matter.

Critics like Ronald Giere, John Worrall, and Alan Chalmers (What Is This Thing Called Science?) argued that Feyerabend underplayed how strong Galileo’s observational case actually was. They say Feyerabend confused the issue of whether Galileo had a conclusive argument with whether he had a better argument.

This warrants some unpacking. Specifically, what makes an argument – a model, a theory – better? Criteria might include:

  • Empirical adequacy – Does the theory fit the data? (Bas van Fraassen)
  • Simplicity – Does the theory avoid unnecessary complexity? (Carl Hempel)
  • Coherence – Is it internally consistent? (Paul Thagard)
  • Explanatory power – Does it explain more than rival theories? (Wesley Salmon)
  • Predictive power – Does it generate testable predictions?  (Karl Popper, Hempel)
  • Fertility – Does it open new lines of research? (Lakatos)

Some argue that Galileo’s model (Copernicanism, heliocentrism) was obviously simpler than Brahe’s. But simplicity opens another can of philosophical worms. What counts as simple? Fewer entities? Fewer laws? More symmetry? Copernicus had a simpler planetary order but required a moving Earth. And Copernicus still relied on epicycles, so heliocentrism wasn’t empirically simpler at first. Given the evidence of the time, a static Earth can be seen as simpler; you don’t need to explain the lack of wind and the “straight” path of falling bodies. Ultimately, this point boils down to aesthetics, not math or science. Galileo and later Newtonians valued mathematical elegance and unification. Aristotelians, the church, and Tychonians valued intuitive compatibility with observed motion.

Feyerabend also downplayed Galileo’s use of the principle of inertia, which was a major theoretical advance and central to explaining why we don’t feel the Earth’s motion.

Misuse of Optical Theory in the Case of Galileo’s Telescope

Feyerabend argued that Galileo’s use of the telescope was suspect because Galileo had no good optical theory and thus no firm epistemic ground for trusting what he saw.

His critics say that while Galileo didn’t have a fully developed theory of optics (he had no wave theory of light, for example), his empirical testing and calibration of the telescope were rigorous by the standards of the time.

Feyerabend is accused of anachronism – judging Galileo’s knowledge of optics by modern standards and therefore misrepresenting the robustness of his observational claims. Historians like Mario Biagioli and Stillman Drake point out that Galileo cross-verified telescope observations with the naked eye and used repetition, triangulation, and replication by others to build credibility.

Equating All Theories as Rhetorical Equals

Feyerabend in some parts of Against Method claimed that rival theories in the history of science were only judged superior in retrospect, and that even “inferior” theories like astrology or Aristotelian cosmology had equal rational footing at the time.

Historians like Steven Shapin (How to be Antiscientific) and David Wootton (The Invention of Science) say that this relativism erases real differences in how theories were judged even in Galileo’s time. While not elaborated in today’s language, Galileo and his rivals clearly saw predictive power, coherence, and observational support as fundamental criteria for choosing between theories.

Feyerabend’s polemical, theatrical tone often flattened the epistemic distinctions that working scientists and philosophers actually used, especially during the Scientific Revolution. His analysis of “anything goes” often ignored the actual disciplinary practices of science, especially in physics.

Failure to Grasp the Mathematical Structure of Physics

Scientists – those broad enough to know who Feyerabend was – often claim that he misunderstood or ignored the role of mathematics in theory-building, especially in Newtonian mechanics and post-Galilean developments. In Against Method, Feyerabend emphasizes metaphor and persuasion over mathematics. While this critique is valuable when aimed at the rhetorical and political sides of science, it underrates the internal mathematical constraints that shape physical theories, even for Galileo.

Imre Lakatos, his friend and critic, called Feyerabend’s work a form of “intellectual sabotage”, arguing that he distorted both the history and logic of physics.

Misrepresenting Quantum Mechanics

Feyerabend wrote about Bohr and Heisenberg in Philosophical Papers and later essays. Critics like Abner Shimony and Mario Bunge charge that Feyerabend misrepresented or misunderstood Bohr’s complementarity as relativist, when Bohr’s position was more subtle and aimed at objective constraints on language and measurement.

Feyerabend certainly fails to understand the mathematical formalism underpinning quantum mechanics. This weakens his broader claims about theory incommensurability.

Feyerabend’s erroneous critique of Niels Bohr is seen in his 1958 Complementarity:

“Bohr’s point of view may be introduced by saying that it is the exact opposite of [realism]. For Bohr the dual aspect of light and matter is not the deplorable consequence of the absence of a satisfactory theory, but a fundamental feature of the microscopic level. For him the existence of this feature indicates that we have to revise … the [realist] ideal of explanation.” (more on this in an upcoming post)

Epistemic Complaints

Beyond criticisms that he failed to grasp the relevant math and science, Feyerabend is accused of selectively reading or distorting historical episodes to fit the broader rhetorical point that science advances by breaking rules, and that no consistent method governs progress. Feyerabend’s claim that in science “anything goes” can be seen as epistemic relativism, leaving no rational basis to prefer one theory over another or to prefer science over astrology, myth, or pseudoscience.

Critics say Feyerabend blurred the distinction between how theories are argued (rhetoric) and how they are justified (epistemology). He is accused of conflating persuasive strategy with epistemic strength, thereby undermining the very principle of rational theory choice.

Some take this criticism to imply that methodological norms are the sole basis for theory choice. Feyerabend’s “anarchism” may demolish authority, but is anything left in its place except a vague appeal to democratic or cultural pluralism? Norman Levitt and Paul Gross, especially in Higher Superstition: The Academic Left and Its Quarrels with Science (1994), argue this point, along with saying Feyerabend attacked a caricature of science.

Personal note/commentary: In my view, Levitt and Gross did some great work, but Higher Superstition isn’t it. I bought the book shortly after its release because I was disgusted with weaponized academic anti-rationalism, postmodernism, relativism, and anti-science tendencies in the humanities, especially those that claimed to be scientific. I was sympathetic to Higher Superstition’s mission but, on reading it, was put off by its oversimplifications and lack of philosophical depth. Their arguments weren’t much better than those of the postmodernists. Critics of science in the humanities overreached and argued poorly, but they were responding to legitimate concerns in the philosophy of science. Specifically:

  • Underdetermination – Two incompatible theories often fit the same data. Why do scientists prefer one over another? As Kuhn argued, social dynamics play a role.
  • Theory-laden Observations – Observations are shaped by prior theory and assumptions, so science is not just “reading the book of nature.”
  • Value-laden Theories – Choosing public health metrics like life expectancy and morbidity (as opposed to autonomy or quality of life) builds value judgments into epidemiology.
  • Historical Variability of Consensus – What’s considered rational or obvious changes over time (phlogiston, luminiferous ether, miasma theory).
  • Institutional Interest and Incentives – String theory’s share of limited research funding, climate science in service of energy policy and social agenda.
  • The Problem of Reification – IQ as a measure of intelligence has been reified in policy and education, despite deep theoretical and methodological debates about what it measures.
  • Political or Ideological Capture – Marxist-Leninist science and eugenics were cases where ideology shaped what counted as science.

Higher Superstition and my unexpected negative reaction to it are what brought me to the discipline of History and Philosophy of Science.

Conclusion

Feyerabend exaggerated the uncertainty of early modern science, downplayed the empirical gains Galileo and others made, and misrepresented or misunderstood some of the technical content of physics. His mischievous rhetorical style made it hard to tell where serious argument ended and performance began. Rather than offering a coherent alternative methodology, Feyerabend’s value lay in exposing the fragility and contingency of scientific norms. He made it harder to treat methodological rules as timeless or universal by showing how easily they fracture under the pressure of real historical cases.

In a following post, I’ll review the last piece John Heilbron wrote before he died, Feyerabend, Bohr and Quantum Physics, which appeared in Stefano Gattei’s Feyerabend in Dialogue, a set of essays marking the 100th anniversary of Feyerabend’s birth.

Paul Feyerabend. Photo courtesy of Grazia Borrini-Feyerabend.


John Heilbron Interview – June 2012

In 2012, I spoke with John Heilbron, historian of science and Professor Emeritus at UC Berkeley, about his career, his work with Thomas Kuhn, and the legacy of The Structure of Scientific Revolutions on its 50th anniversary. We talked late into the night. The conversation covered his shift from physics to history, his encounters with Kuhn and Paul Feyerabend, and his critical take on the direction of Science and Technology Studies (STS).

The interview marked a key moment. Kuhn and Feyerabend’s legacies were under fresh scrutiny, and STS was in the midst of redefining itself, often leaning toward sociological frameworks at the expense of other approaches.

Thirteen years later, in 2025, this commentary revisits that interview to illuminate its historical context, situate Heilbron’s critiques, and explore their relevance to contemporary STS and broader academic debates.

Over more than a decade, I had ongoing conversations with Heilbron about the evolution of the history of science – history of the history of science – and the complex relationship between History of Science and Science, Technology, and Society (STS) programs. At UC Berkeley, unlike at Harvard or Stanford, STS has long remained a “Designated Emphasis” rather than a department or standalone degree. Academic conservatism in departmental structuring, concerns about reputational risk, and questions about the epistemic rigor of STS may all have contributed to this decision. Moreover, Berkeley already boasted world-class departments in both History and Sociology.

That 2012 interview, the only one we recorded, brought together themes we’d explored over many years. Since then, STS has moved closer to engaging with scientific content itself. But it still draws criticism, both from scientists and from a public that misunderstands it. In 2012, the field was still heavily influenced by sociological models, particularly the Strong Programme and social constructivism, which stressed how scientific knowledge is shaped by social context. One of the key texts in this tradition, Shapin and Schaffer’s Leviathan and the Air-Pump (1985), argued that even Boyle’s experiments weren’t simply about discovery but about constructing scientific consensus.

Heilbron pushed back against this framing. He believed it sidelined the technical and epistemic depth of science, reducing STS to a sociological critique. He was especially wary of the dense, abstract language common in constructivist work. In his view, it often served as cover for thin arguments, especially from younger scholars who copied the style but not the substance. He saw it as a tactic: establish control of the conversation by embedding a set of terms, then build influence from there.

The influence of Shapin and Schaffer, Heilbron argued, created the impression that STS was dominated by a single paradigm, ironically echoing the very Kuhnian framework they analyzed. His frustration with a then-recent Isis review reflected his concern that constructivism had become doctrinaire, pressuring scholars to conform to its methods even when irrelevant to their work. His reference to “political astuteness” pointed to the way in which key figures in the field successfully advanced their terminology and frameworks, gaining disproportionate influence. While this gave them intellectual clout, Heilbron saw it as a double-edged sword: it strengthened their position while encouraging dogmatism among followers who prioritized jargon over genuine analysis.


Bill Storage: How did you get started in this curious interdisciplinary academic realm?

John Heilbron: Well, it’s not really very interesting, but I was a graduate student in physics but my real interest was history. So at some point I went down to the History department and found the medievalist, because I wanted to do medieval history. I spoke with the medievalist and he said, “well, that’s very charming but you know the country needs physicists and it doesn’t need medievalists, so why don’t you go back to physics.” Which I duly did. But he didn’t bother to point out that there was this guy Kuhn in the History department who had an entirely different take on the subject than he did. So finally I learned about Kuhn and went to see him. Since Kuhn had very few students, I looked good; and I gradually worked my way free from the Physics department and went into history. My PhD is in History; and I took a lot of history courses and, as I said, history really is my interest. I’m interested in science too of course but I feel that my major concerns are historical and the writing of history is to me much more interesting and pleasant than calculations.

You entered that world at a fascinating time, when history of science – I’m sure to the surprise of most of its scholars – exploded onto the popular scene. Kuhn, Popper, Feyerabend and Lakatos suddenly appeared in The New Yorker, Life Magazine, and The Christian Century. I find that these guys are still being read, misread and misunderstood by many audiences. And that seems to be true even for their intended audiences – sometimes by philosophers and historians of science – certainly by scientists. I see multiple conflicting readings that would seem to show that at least some of them are wrong.

Well if you have two or more different readings then I guess that’s a safe conclusion. (Laughs.)

You have a problem with multiple conflicting truths…? Anyway – misreading Kuhn…

I’m more familiar with the misreading of Kuhn than of the others. I’m familiar with that because he was himself very distressed by many of the uses made of his work – particularly the notion that science is no different from art or has no stronger basis than opinion. And that bothered him a lot.

I don’t know your involvement in his work around that time. Can you tell me how you relate to what he was doing in that era?

I got my PhD under him. In fact my first work with him was hunting up footnotes for Structure. So I knew the text of the final draft well – and I knew him quite well during the initial reception of it. And then we all went off together to Copenhagen for a physics project and we were all thrown together a lot. So that was my personal connection and then of course I’ve been interested subsequently in Structure, as everybody is bound to be in my line of work. So there’s no doubt, as he says so in several places, that he was distressed by the uses made of it. And that includes uses made in the history of science particularly by the social constructionists, who try to do without science altogether or rather just to make it epiphenomenal on political or social forces.

I’ve read opinions by others who were connected with Kuhn saying there was a degree of backpedaling by Kuhn in the 1970s. The implication there is that he really did intend more sociological commentary than he later claimed. Now I don’t see evidence of that in the text of Structure, and incidents like his telling Freeman Dyson that he (Kuhn) was not a Kuhnian would suggest otherwise. Do you have any thoughts on that?

I think that one should keep in mind the purpose of Structure, or rather the context in which it was produced. It was supposed to have been an article in this encyclopedia of unified science and Kuhn’s main interest was in correcting philosophers. He was not aiming for historians even. His message was that the philosophy practiced by a lot of positivists and their description of science was ridiculous because it didn’t pay any attention to the way science was actually done. So Kuhn was going to tell them how science was done, in order to correct philosophy. But then much to his surprise he got picked up by people for whom it was not written, who derived from it the social constructionist lesson that we’re all familiar with. And that’s why he was an unexpected rebel. But he did expect to be rebellious; that was the whole point. It’s just that the object of his rebellion was not history or science but philosophy.

So in that sense it would seem that Feyerabend’s question on whether Kuhn intended to be prescriptive versus descriptive is answered. It was not prescriptive.

Right – not prescriptive to scientists. But it was meant to be prescriptive to the philosophers – or at least normalizing – so that they would stop being silly and would base their conception of scientific progress on the way in which scientists actually went about their business. But then the whole thing got too big for him and he got into things that, in my opinion, really don’t have anything to do with his main argument. For example, the notion of incommensurability, which was not, it seems to me, in the original program. And it’s a logical construct that I don’t think is really very helpful, and he got quite hung up on that and seemed to regard that as the most important philosophical message from Structure.

I wasn’t aware that he saw it that way. I’m aware that quite a few others viewed it like that. Paul Feyerabend, in one of his last books, said that he and Kuhn kicked around this idea of incommensurability in 1960 and had slightly different ideas about where to go with it. Feyerabend said Kuhn wanted to use it historically whereas his usage was much more abstract. I was surprised at the level of collaboration indicated by Feyerabend.

Well they talked a lot. They were colleagues. I remember parties at Kuhn’s house where Feyerabend would show up with his old white T shirt and several women – but that’s perhaps irrelevant to the main discussion. They were good friends. I got along quite well with Feyerabend too. We had discussions about the history of quantum physics and so on. The published correspondence between Feyerabend and Lakatos is relevant here. It’s rather interesting in that the person we’ve left out of the discussion so far, Karl Popper, was really the lighthouse for Feyerabend and Lakatos, but not for Kuhn. And I think that anybody who wants to get to the bottom of the relationship between Kuhn and Feyerabend needs to consider the guy out of the frame, who is Popper.

It appears Feyerabend was very critical of Kuhn and Structure at the time it was published. I think at that point Feyerabend was still essentially a Popperian. It seems Feyerabend reversed position on that over the next decade or so.

Yes, at the time in question, around 1960, when they had these discussions, I think Feyerabend was still very much in Popper’s camp. Of course like any bright student, he disagreed with his professor about things.

How about you, as a bright student in 1960 – what did you disagree with your professor, Kuhn, about?

Well I believe in the proposition that philosophers and historians have different metabolisms. And I’m metabolically a historian and Kuhn was metabolically a philosopher – even though he did write history. But his most sustained piece of history of science was his book on black body theory; and that’s very narrowly intellectualist in approach. It’s got nothing to do with the themes of the structure of scientific revolutions – which does have something to say for the historian – but he was not by practice a historian. He didn’t like a whole lot of contingent facts. He didn’t like archival and library work. His notion of fun was to take a few texts and just analyze and reanalyze them until he felt he had worked his way into the mind of their author. I take that to be a necromantic feat that’s not really possible.

I found that he was a very clever guy and he was excellent as a professor because he was very interested in what you were doing as soon as it was something he thought he could make some use of. And that gave you the idea that you were engaged in something important, so I must give him that. On the other hand he just didn’t have the instincts or the knowledge to be a historian and so I found myself not taking much from his own examples. Once I had an argument with him about some way of treating a historical subject and I didn’t feel that I got anything out of him. Quite the contrary; I thought that he just ducked all the interesting issues. But that was because they didn’t concern him.

James Conant, president of Harvard who banned communists, chair of the National Science Foundation, etc.: how about Conant’s influence on Structure?

It’s not just Conant. It was the whole Harvard circle, of which Kuhn was part. There was this guy, Leonard Nash; there was Gerald Holton. And these guys would get together and talk about various things having to do with the relationship between science and the public sphere. It was a time when Conant was fighting for the National Science Foundation and I think that this notion of “normal science” in which the scientists themselves must be left fully in charge of what they’re doing in order to maximize the progress within the paradigm to bring the profession swiftly to the next revolution – that this is essentially the Conant doctrine with respect to the ground rules of the National Science Foundation, which is “let the scientists run it.” So all those things were discussed. And you can find many bits of Kuhn’s Structure in that discussion. For example, the orthodoxy of normal science in, say, Bernard Cohen, who didn’t make anything of it of course. So there’s a lot of this Harvard group in Structure, as well as certain lessons that Kuhn took from his book on the Copernican Revolution, which was the textbook for the course he gave under Conant. So yes, I think Conant’s influence is very strong there.

So Kuhn was ultimately a philosopher where you are a historian. I think I once heard you say that reading historical documents does not give you history.

Well I agree with that, but I don’t remember that I was clever enough to say it.

Assuming you said it or believe it, then what does give you history?

Well, reading them is essential, but the part contributed by the historian is to make some sense of all the waste paper he’s been reading. This is essentially a construction. And that’s where the art, the science, the technique of the historian comes into play, to try to make a plausible narrative that has to satisfy certain rules. It can’t go against the known facts and it can’t ignore the new facts that have come to light through the study of this waste paper, and it can’t violate rules of verisimilitude, human action and whatnot. But otherwise it’s a construction and you’re free to manipulate your characters, and that’s what I like about it.

So I take it that’s where the historian’s metabolism comes into play – avoidance of leaping to conclusions with the facts.

True, but at some point you’ve got to make up a story about those facts.

Ok, I’ve got a couple questions on the present state of affairs – and this is still related to the aftermath of Kuhn. From attending colloquia, I sense that STS is nearly a euphemism for sociology of science. That bothers me a bit, possibly because I’m interested in the intersection of science, technology and society. Looking at the core STS requirements on Stanford’s website, I see few courses listed that would give a student any hint of what science looks like from the inside.

I’m afraid you’re only too right. I’ve got nothing against sociology of science, the study of scientific institutions, etc. They’re all very good. But they’re tending to leave the science out, and in my opinion, the further they get from science, the worse their arguments become. That’s what bothers me perhaps most of all – the weakness of the evidentiary base of many of the arguments and conclusions that are put forward.

I thought we all learned a bit from the Science Wars – thought that sort of indeterminacy of meaning and obfuscatory language was behind us. Either it’s back, or it never went away.

Yeah, the language part is an important aspect of it, and even when the language is relatively comprehensible as I think it is in, say, constructivist history of science – by which I mean the school of Schaffer and Shapin – the insistence on peculiar argot becomes a substitute for thought. You see it quite frequently in people less able than those two guys are, who try to follow in their footsteps. You get words strung together supposedly constituting an argument but which in fact don’t. I find that quite an interesting aspect of the business, and very astute politically on the part of those guys because if you can get your words into the discourse, why, you can still hope to have influence. There’s a doctrinaire aspect to it. I was just reading a favorable book review in the current Isis by one of the fellow travelers of this group. The book was not written by one of them. The review was rather complimentary but then at the end says it is a shame that this author did not discuss her views as related to Schaffer and Shapin. Well, why the devil should she? So, yes, there’s issues of language, authority, and poor argumentation. STS is afflicted by this, no doubt.


John Heilbron and I at The Huntington in 2014


Covid Response – Signs of Statistical Success

In a recent post, I suggested that the Covid response demonstrated success in several areas of statistical reasoning, including clear communication of mRNA vaccine efficacy, data-driven ICU triage using the SOFA score, and the use of wastewater epidemiology. The following points support this claim.

Risk Communication in Vaccine Trials (1)
The early mRNA vaccine announcements in 2020 offered clear statistical framing by emphasizing a 95% relative risk reduction in symptomatic Covid for vaccinated individuals compared to placebo, sidelining raw case counts for a punchy headline. While clearer than many public health campaigns, this focus omitted absolute risk reduction and uncertainties about asymptomatic spread, falling short of the full precision needed to avoid misinterpretation.

Pfizer/BioNTech’s November 18, 2020, press release announced a 95% efficacy for its mRNA vaccine (BNT162b2) in preventing symptomatic Covid-19, based on 170 cases (162 in the placebo group, 8 in the vaccinated group) in a trial of ~43,538 participants. Moderna’s November 16, 2020, press release reported a 94.5% efficacy for its mRNA vaccine (mRNA-1273), based on 95 cases (90 placebo, 5 vaccinated) in a 30,000-participant trial. Both highlighted relative risk reduction (RRR) as the primary metric. For Pfizer, placebo risk was ~0.88% (162/18,325), vaccinated risk was ~0.04% (8/18,198), yielding ~95% RRR.

The focus omitted absolute risk reduction (ARR), as described by Brown in Outcome Reporting Bias in COVID mRNA Vaccine Clinical Trials. ARR is the difference in event rates between placebo and vaccinated groups. For Pfizer, the ~0.88% placebo risk (162/18,325) minus the ~0.04% vaccinated risk (8/18,198) gives an ARR of ~0.84%. Moderna’s ARR was ~0.57% (90/15,000 = 0.6% placebo risk, 5/15,000 = 0.03% vaccinated risk). Neither Pfizer’s nor Moderna’s November 2020 press releases mentioned ARR, focusing solely on RRR. The NEJM publications (Polack, 2020; Baden, 2021) reported RRR and case counts but not ARR explicitly. Both CDC and WHO messaging in 2020 emphasized efficacy rates, not ARR (e.g., CDC’s “Vaccine Effectiveness,” December 2020).
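The gap between the two metrics is easy to check by hand. The following is a minimal sketch in Python, not part of either trial's analysis: the function name risk_metrics is made up for illustration, and the per-arm denominators are the approximate ones quoted above. It uses crude proportions, so the results can differ slightly from the published efficacy figures.

```python
def risk_metrics(cases_placebo, n_placebo, cases_vaccine, n_vaccine):
    """Return crude placebo risk, vaccine risk, ARR and RRR as fractions."""
    risk_placebo = cases_placebo / n_placebo
    risk_vaccine = cases_vaccine / n_vaccine
    arr = risk_placebo - risk_vaccine   # absolute risk reduction
    rrr = arr / risk_placebo            # relative risk reduction
    return {
        "risk_placebo": risk_placebo,
        "risk_vaccine": risk_vaccine,
        "arr": arr,
        "rrr": rrr,
    }


# Case counts and approximate arm sizes as quoted in the text above.
pfizer = risk_metrics(162, 18_325, 8, 18_198)
moderna = risk_metrics(90, 15_000, 5, 15_000)

for name, m in (("Pfizer", pfizer), ("Moderna", moderna)):
    print(f"{name}: RRR = {m['rrr']:.1%}, ARR = {m['arr']:.2%}")
```

The same case counts give a relative reduction near 95% and an absolute reduction under 1%, because the placebo group's own risk over the trial period was under 1% to begin with.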

The focus omitted uncertainties about asymptomatic spread, as described by Oran & Topol in Prevalence of Asymptomatic SARS-CoV-2 Infection (2020). Pfizer and Moderna trials primarily measured efficacy against symptomatic Covid, with no systematic testing for asymptomatic infections in initial protocols. Pfizer later included N-antibody testing for a subset, but this was not reported in November 2020. Studies (e.g., Oran & Topol, 2020) estimated 40-50% of infections were asymptomatic, but vaccine effects on this were unknown. A CDC report (December 2020) noted uncertainty about transmission.

While generally positive, framing fell short of the precision needed to avoid misinterpretation. The RRR focus without ARR or baseline risk context could exaggerate benefits. High-visibility figures like Bill Gates amplified vaccine optimism, fostering overconfidence in transmission control. For Pfizer, a 95% RRR contrasted with a 0.84% ARR, which was less emphasized. The lack of clarity about transmission led to public misconceptions, with surveys (e.g., Kaiser Family Foundation, January 2021) showing that many people believed vaccines would prevent transmission.

Clinical Triage via Quantitative Models (2)
During peak ICU shortages, hospitals adopted the SOFA score, originally a tool for assessing organ dysfunction, to guide resource allocation with a semi-objective, data-driven approach. While an improvement over ad hoc clinical judgment, SOFA faced challenges like inconsistent application and biases that disadvantaged older or chronically ill patients, limiting its ability to achieve fully equitable triage.

The SOFA score, developed to assess organ dysfunction in critically ill patients, was widely adopted during the Covid pandemic to guide ICU triage and resource allocation in hospitals facing overwhelming demand. Studies and guidelines from 2020–2022 document its use.

Several articles described how SOFA scores were incorporated into triage protocols in hospitals in New York, Italy, and Spain to prioritize patients for ventilators and ICU beds, e.g., Fair allocation of scarce medical resources in the time of Covid (NEJM), Adult ICU triage during the Covid pandemic (Lancet), and A framework for rationing ventilators… (Critical Care Medicine).

A 2022 study in Critical Care reported variability in how SOFA was implemented, with some hospitals modifying the scoring criteria or weighting certain organ systems differently, leading to discrepancies in patient prioritization (Maves, 2022). A 2021 analysis in BMJ Open found that SOFA’s application varied due to differences in clinician training, data availability (e.g., incomplete lab results), and local protocol adaptations, which undermined its reliability in some settings (Cook, 2021).

Still, the SOFA score’s design and application introduced biases that disproportionately disadvantaged older adults and patients with chronic illnesses. A 2020 study in The Lancet pointed out that SOFA scores often penalize patients with pre-existing organ dysfunction, as baseline comorbidities (common in older or chronically ill patients) result in higher scores, suggesting worse outcomes even if acute illness was treatable (Grasselli, 2020). A 2021 article in JAMA Internal Medicine criticized SOFA-based triage for its lack of adjustment for age or chronic conditions, noting that older patients were frequently deprioritized due to higher baseline SOFA scores, even when their acute prognosis was favorable (Wunsch, 2021).

Wastewater Epidemiology (3)
Public health researchers used viral RNA in wastewater to monitor community spread, reducing the sampling biases of clinical testing. This statistical surveillance, conducted outside clinics, offered high public health relevance but faced biases and interpretive challenges that tempered its precision.

Wastewater-based epidemiology (WBE) emerged as a critical tool during the Covid pandemic to monitor SARS-CoV-2 RNA in wastewater, providing a population-level snapshot of viral prevalence. Infected individuals, including symptomatic, asymptomatic, and presymptomatic cases, shed viral RNA in their feces, which is detectable in wastewater, enabling community-wide surveillance.

The Centers for Disease Control and Prevention (CDC) launched the National Wastewater Surveillance System (NWSS) in September 2020 to coordinate tracking of SARS-CoV-2 in wastewater across the U.S., transforming local efforts into a national system. A 2020 study in Nature Biotechnology demonstrated that SARS-CoV-2 RNA concentrations in primary sewage sludge in New Haven, Connecticut, tracked the rise and fall of clinical cases and hospital admissions, confirming WBE’s ability to monitor community spread. Similarly, a 2021 study in Scientific Reports monitored SARS-CoV-2 RNA in wastewater from Frankfurt, Germany, showing correlations with reported cases.

Globally, WBE was applied in countries like India, Australia, and the Netherlands, with a 2021 systematic review in ScienceDirect reporting SARS-CoV-2 detection in 29.2% of 26,197 wastewater samples across 34 countries. These studies highlight WBE’s scalability but also underscore challenges in standardizing methods across diverse settings, which could affect data reliability.

Clinical testing for SARS-CoV-2 exposed biases, including selective sampling, testing fatigue, and underreporting from home-based rapid tests. WBE mitigates these by capturing viral RNA from entire communities, including asymptomatic and untested individuals. A 2021 article in Clinical Microbiology Reviews noted that WBE avoids selective population sampling biases, as it does not depend on individuals seeking testing or healthcare access. Daily wastewater sampling provides data comparable to random testing of hundreds of individuals, but is more cost-effective and less invasive.

In practice, WBE’s ability to detect viral RNA in wastewater from diverse populations was demonstrated in settings like university dormitories, where early detection prompted targeted clinical testing.

Next time, I’ll explain why I believe several other aspects of statistical reasoning in the Covid response were poorly handled, some even deeply flawed.


Statistical Reasoning in Healthcare: Lessons from Covid-19

For centuries, medicine has navigated the tension between science and uncertainty. The Covid pandemic exposed this dynamic vividly, revealing both the limits and possibilities of statistical reasoning. From diagnostic errors to vaccine communication, the crisis showed that statistics is not just a technical skill but a philosophical challenge, shaping what counts as knowledge, how certainty is conveyed, and who society trusts.

Historical Blind Spot

Medicine’s struggle with uncertainty has deep roots. In antiquity, Galen’s reliance on reasoning over empirical testing set a precedent for overconfidence insulated by circular logic. If his treatments failed, it was because the patient was incurable. Enlightenment physicians, like those who bled George Washington to death, perpetuated this resistance to scrutiny. Voltaire wrote, “The art of medicine consists in amusing the patient while nature cures the disease.” The scientific revolution and the Enlightenment inverted Galen’s hierarchy, yet the importance of that reversal is often neglected, even by practitioners. Even in the 20th century, pioneers like Ernest Codman faced ostracism for advocating outcome tracking, highlighting a medical culture that prized prestige over evidence. While evidence-based practice has since gained traction, a statistical blind spot persists, rooted in training and tradition.

The Statistical Challenge

Physicians often struggle with probabilistic reasoning, as shown in a 1978 Harvard study where only 18% correctly applied Bayes’ Theorem to a diagnostic test scenario (a disease with 1/1,000 prevalence and a 5% false positive rate yields a ~2% chance of disease given a positive test). A 2013 follow-up showed marginal improvement (23% correct). Medical education, which prioritizes biochemistry over probability, is partly to blame. Abusive lawsuits, cultural pressures for decisiveness, and patient demands for certainty further discourage embracing doubt, as Daniel Kahneman’s work on overconfidence suggests.
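The ~2% figure follows directly from Bayes' Theorem. A minimal sketch is below; the function name is hypothetical, and it assumes a perfectly sensitive test (every diseased patient tests positive), which the scenario as summarized above does not state explicitly.

```python
def positive_predictive_value(prevalence, false_positive_rate, sensitivity=1.0):
    """P(disease | positive test) via Bayes' Theorem."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)

# Scenario from the text: 1/1,000 prevalence, 5% false positive rate.
ppv = positive_predictive_value(prevalence=0.001, false_positive_rate=0.05)
print(f"P(disease | positive) = {ppv:.1%}")
```

In a group of 1,000 people, roughly 50 healthy people test positive for each true case, which is why the intuitive answer of 95% is so far off.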

Neil Ferguson and the Authority of Statistical Models

Epidemiologist Neil Ferguson and his team at Imperial College London produced a model in March 2020 predicting up to 500,000 UK deaths without intervention. The US figure could top 2 million. These weren’t forecasts in the strict sense but scenario models, conditional on various assumptions about disease spread and response.

Ferguson’s model was extraordinarily influential, shifting the UK and US from containment to lockdown strategies. It also drew criticism for opaque code, unverified assumptions, and the sheer weight of its political influence. His eventual resignation from the UK’s Scientific Advisory Group for Emergencies (SAGE) over a personal lockdown violation further politicized the science.

From the perspective of history of science, Ferguson’s case raises critical questions: When is a model scientific enough to guide policy? How do we weigh expert uncertainty under crisis? Ferguson’s case shows that modeling straddles a line between science and advocacy. It is, in Kuhnian terms, value-laden theory.

The Pandemic as a Pedagogical Mirror

The pandemic was a crucible for statistical reasoning. Successes included the clear communication of mRNA vaccine efficacy (95% relative risk reduction) and data-driven ICU triage using the SOFA score, though both had limitations. Failures were stark: clinicians misread PCR test results by ignoring pre-test probability, echoing the Harvard study’s findings, while policymakers fixated on case counts over deaths per capita. The “6-foot rule,” based on outdated droplet models, persisted despite disconfirming evidence, reflecting resistance to updating models, inability to apply statistical insights, and institutional inertia. Specifics of these issues are revealing.

Mostly Positive Examples:

  • Risk Communication in Vaccine Trials (1)
    The early mRNA vaccine announcements in 2020 offered clear statistical framing by emphasizing a 95% relative risk reduction in symptomatic COVID-19 for vaccinated individuals compared to placebo, sidelining raw case counts for a punchy headline. While clearer than many public health campaigns, this focus omitted absolute risk reduction and uncertainties about asymptomatic spread, falling short of the full precision needed to avoid misinterpretation.

  • Clinical Triage via Quantitative Models (2)
    During peak ICU shortages, hospitals adopted the SOFA score, originally a tool for assessing organ dysfunction, to guide resource allocation with a semi-objective, data-driven approach. While an improvement over ad hoc clinical judgment, SOFA faced challenges like inconsistent application and biases that disadvantaged older or chronically ill patients, limiting its ability to achieve fully equitable triage.

  • Wastewater Epidemiology (3)
    Public health researchers used viral RNA in wastewater to monitor community spread, reducing the sampling biases of clinical testing. This statistical surveillance, conducted outside clinics, offered high public health relevance but faced biases and interpretive challenges that tempered its precision.

Mostly Negative Examples:

  • Misinterpretation of Test Results (4)
    Early in the COVID-19 pandemic, many clinicians and media figures misunderstood diagnostic test accuracy, misreading PCR and antigen test results by overlooking pre-test probability. This caused false reassurance or unwarranted alarm, though some experts mitigated errors with Bayesian reasoning. This was precisely the type of mistake highlighted in the Harvard study decades earlier.

  • Cases vs. Deaths (5)
    One of the most persistent statistical missteps during the pandemic was the policy focus on case counts, devoid of context. Case numbers ballooned or dipped not only due to viral spread but due to shifts in testing volume, availability, and policies. COVID deaths per capita rather than case count would have served as a more stable measure of public health impact. Infection fatality rates would have been better still.

  • Shifting Guidelines and Aerosol Transmission (6)
    The “6-foot rule” was based on outdated models of droplet transmission. When evidence of aerosol spread emerged, guidance failed to adapt. Critics pointed out the statistical conservatism in risk modeling, its impact on mental health and the economy. Institutional inertia and politics prevented vital course corrections.

(I’ll defend these six examples in another post.)

A Philosophical Reckoning

Statistical reasoning is not just a mathematical tool – it’s a window into how science progresses, how it builds trust, and its special epistemic status. In Kuhnian terms, the pandemic exposed the fragility of our current normal science. We should expect methodological chaos and pluralism within medical knowledge-making. Science during COVID-19 was messy, iterative, and often uncertain – and that’s in some ways just how science works.

This doesn’t excuse failures in statistical reasoning. It suggests that training in medicine should not only include formal biostatistics, but also an eye toward history of science – so future clinicians understand the ways that doubt, revision, and context are intrinsic to knowledge.

A Path Forward

Medical education must evolve. First, integrate Bayesian philosophy into clinical training, using relatable case studies to teach probabilistic thinking. Second, foster epistemic humility, framing uncertainty as a strength rather than a flaw. Third, incorporate the history of science – figures like Codman and Cochrane – to contextualize medicine’s empirical evolution. These steps can equip physicians to navigate uncertainty and communicate it effectively.

Conclusion

Covid was a lesson in the fragility and potential of statistical reasoning. It revealed medicine’s statistical struggles while highlighting its capacity for progress. By training physicians to think probabilistically, embrace doubt, and learn from history, medicine can better manage uncertainty – not as a liability, but as a cornerstone of responsible science. As John Heilbron might say, medicine’s future depends not only on better data – but on better historical memory, and the nerve to rethink what counts as knowledge.


______

All who drink of this treatment recover in a short time, except those whom it does not help, all of whom die. It is obvious, therefore, that it fails only in incurable cases. – Galen


Extraordinary Popular Miscarriages of Science, Part 6 – String Theory

Introduction: A Historical Lens on String Theory

In 2006, I met John Heilbron, widely credited with turning the history of science from an emerging idea into a professional academic discipline. While James Conant and Thomas Kuhn laid the intellectual groundwork, it was Heilbron who helped build the institutions and frameworks that gave the field its shape. Through John I came to see that the history of science is not about names and dates – it’s about how scientific ideas develop, and why. It explores how science is both shaped by and shapes its cultural, social, and philosophical contexts. Science progresses not in isolation but as part of a larger human story.

The “discovery” of oxygen illustrates this beautifully. In the 18th century, Joseph Priestley, working within the phlogiston theory, isolated a gas he called “dephlogisticated air.” Antoine Lavoisier, using a different conceptual lens, reinterpreted it as a new element – oxygen – ushering in modern chemistry. This was not just a change in data, but in worldview.

When I met John, Lee Smolin’s The Trouble with Physics had just been published. Smolin, a physicist, critiques string theory not from outside science but from within its theoretical tensions. Smolin’s concerns echoed what I was learning from the history of science: that scientific revolutions often involve institutional inertia, conceptual blind spots, and sociopolitical entanglements.

My interest in string theory wasn’t about the physics. It became a test case for studying how scientific authority is built, challenged, and sustained. What follows is a distillation of 18 years of notes – string theory seen not from the lab bench, but from a historian’s desk.

A Brief History of String Theory

Despite its name, string theory is more accurately described as a theoretical framework – a collection of ideas that might one day lead to testable scientific theories. This alone is not a mark against it; many scientific developments begin as frameworks. Whether we call it a theory or a framework, it remains subject to a crucial question: does it offer useful models or testable predictions – or is it likely to in the foreseeable future?

String theory originated as an attempt to understand the strong nuclear force. In 1968, Gabriele Veneziano introduced a mathematical formula – the Veneziano amplitude – to describe the scattering of strongly interacting particles such as protons and neutrons. In 1971, Pierre Ramond incorporated supersymmetry into this approach, giving rise to superstrings that could account for both fermions and bosons. In 1974, Joël Scherk and John Schwarz discovered that the theory predicted a massless spin-2 particle with the properties of the hypothetical graviton. This led them to propose string theory not as a theory of the strong force, but as a potential theory of quantum gravity – a candidate “theory of everything.”

Around the same time, however, quantum chromodynamics (QCD) successfully explained the strong force via quarks and gluons, rendering the original goal of string theory obsolete. Interest in string theory waned, especially given its dependence on unobservable extra dimensions and lack of empirical confirmation.

That changed in 1984 when Michael Green and John Schwarz demonstrated that superstring theory could be anomaly-free in ten dimensions, reviving interest in its potential to unify all fundamental forces and particles. Researchers soon identified five mathematically consistent versions of superstring theory.

To reconcile ten-dimensional theory with the four-dimensional spacetime we observe, physicists proposed that the extra six dimensions are “compactified” into extremely small, curled-up spaces – typically represented as Calabi-Yau manifolds. This compactification allegedly explains why we don’t observe the extra dimensions.

In 1995, Edward Witten introduced M-theory, showing that the five superstring theories were different limits of a single 11-dimensional theory. By the early 2000s, researchers like Leonard Susskind and Shamit Kachru began exploring the so-called “string landscape” – a space of perhaps 10^500 (1 followed by 500 zeros) possible vacuum states, each corresponding to a different compactification scheme. This introduced serious concerns about underdetermination – the idea that available empirical evidence cannot determine which among many competing theories is correct.

Compactification introduces its own set of philosophical problems. Critics Lee Smolin and Peter Woit argue that compactification is not a prediction but a speculative rationalization: a move designed to save a theory rather than derive consequences from it. The enormous number of possible compactifications (each yielding different physics) makes string theory’s predictive power virtually nonexistent. The related challenge of moduli stabilization – specifying the size and shape of the compact dimensions – remains unresolved.

Despite these issues, string theory has influenced fields beyond high-energy physics. It has informed work in cosmology (e.g., inflation and the cosmic microwave background), condensed matter physics, and mathematics (notably algebraic geometry and topology). How deep and productive these connections run is difficult to assess without domain-specific expertise that I don’t have. String theory has, in any case, produced impressive mathematics. But mathematical fertility is not the same as scientific validity.

The Landscape Problem

Perhaps the most formidable challenge string theory faces is the landscape problem: the theory allows for an enormous number of solutions – on the order of 10^500. Each solution represents a possible universe, or “vacuum,” with its own physical constants and laws.

Why so many possibilities? The extra six dimensions required by string theory can be compactified in myriad ways. Each compactification, combined with possible energy configurations (called fluxes), gives rise to a distinct vacuum. This extreme flexibility means string theory can, in principle, accommodate nearly any observation. But this comes at the cost of predictive power.

Critics argue that if theorists can forever adjust the theory to match observations by choosing the right vacuum, the theory becomes unfalsifiable. On this view, string theory looks more like metaphysics than physics.

Some theorists respond by embracing the multiverse interpretation: all these vacua are real, and our universe is just one among many. The specific conditions we observe are then attributed to anthropic selection – we could only observe a universe that permits life like us. This view aligns with certain cosmological theories, such as eternal inflation, in which different regions of space settle into different vacua. But eternal inflation can exist independent of string theory, and none of this has been experimentally confirmed.

The Problem of Dominance

Since the 1980s, string theory has become a dominant force in theoretical physics. Major research groups at Harvard, Princeton, and Stanford focus heavily on it. Funding and institutional prestige have followed. Prominent figures like Brian Greene have elevated its public profile, helping transform it into both a scientific and cultural phenomenon.

This dominance raises concerns. Critics such as Smolin and Woit argue that string theory has crowded out alternative approaches like loop quantum gravity or causal dynamical triangulations. These alternatives receive less funding and institutional support, despite offering potentially fruitful lines of inquiry.

In The Trouble with Physics, Smolin describes a research culture in which dissent is subtly discouraged and young physicists feel pressure to align with the mainstream. He worries that this suppresses creativity and slows progress.

Estimates suggest that between 1,000 and 5,000 researchers work on string theory globally – a significant share of theoretical physics resources. Reliable numbers are hard to pin down.

Defenders of string theory argue that it has earned its prominence. They note that theoretical work is relatively inexpensive compared to experimental research, and that string theory remains the most developed candidate for unification. Still, the issue of how science sets its priorities – how it chooses what to fund, pursue, and elevate – remains contentious.

Wolfgang Lerche of CERN once called string theory “the Stanford propaganda machine working at its fullest.” As with climate science, 97% of string theorists agree that they don’t want to be defunded.

Thomas Kuhn’s Perspective

The logical positivists and Karl Popper would almost certainly dismiss string theory as unscientific due to its lack of empirical testability and falsifiability – core criteria in their respective philosophies of science. Thomas Kuhn would offer a more nuanced interpretation. He wouldn’t label string theory unscientific outright, but would express concern over its dominance and the marginalization of alternative approaches. In Kuhn’s framework, such conditions resemble the entrenchment of a paradigm during periods of normal science, potentially at the expense of innovation.

Some argue that string theory fits Kuhn’s model of a new paradigm, one that seeks to unify quantum mechanics and general relativity – two pillars of modern physics that remain fundamentally incompatible at high energies. Yet string theory has not brought about a Kuhnian revolution. It has not displaced existing paradigms, and its mathematical formalism is often incommensurable with traditional particle physics. From a Kuhnian perspective, the landscape problem may be seen as a growing accumulation of anomalies. But a paradigm shift requires a viable alternative – and none has yet emerged.

Lakatos and the Degenerating Research Program

Imre Lakatos offered a different lens, seeing science as a series of research programs characterized by a “hard core” of central assumptions and a “protective belt” of auxiliary hypotheses. A program is progressive if it predicts novel facts; it is degenerating if it resorts to ad hoc modifications to preserve the core.

For Lakatos, string theory’s hard core would be the idea that all particles are vibrating strings and that the theory unifies all fundamental forces. The protective belt would include compactification schemes, flux choices, and moduli stabilization – all adjusted to fit observations.

Critics like Sabine Hossenfelder argue that string theory is a degenerating research program: it absorbs anomalies without generating new, testable predictions. Others note that it is progressive in the Lakatosian sense because it has led to advances in mathematics and provided insights into quantum gravity. Historians of science are divided. Johansson and Matsubara (2011) argue that Lakatos would likely judge it degenerating; Cristin Chall (2019) offers a compelling counterpoint.

Perhaps string theory is progressive in mathematics but degenerating in physics.

The Feyerabend Bomb

Paul Feyerabend, whom Lee Smolin knew from his time at Harvard, was the iconoclast of 20th-century philosophy of science. Feyerabend would likely have dismissed string theory as a dogmatic, aesthetic fantasy. He might write something like:

“String theory dazzles with equations and lulls physics into a trance. It’s a mathematical cathedral built in the sky, a triumph of elegance over experience. Science flourishes in rebellion. Fund the heretics.”

Even if this caricature overshoots, Feyerabend’s tools offer a powerful critique:

  1. Untestability: String theory’s predictions remain out of reach. Its core claims – extra dimensions, compactification, vibrational modes – cannot be tested with current or even foreseeable technology. Feyerabend challenged the privileging of untested theories (e.g., Copernicanism in its early days) over empirically grounded alternatives.

  2. Monopoly and suppression: String theory dominates intellectual and institutional space, crowding out alternatives. Eric Weinstein recently said, in Feyerabendian tones, “its dominance is unjustified and has resulted in a culture that has stifled critique, alternative views, and ultimately has damaged theoretical physics at a catastrophic level.”

  3. Methodological rigidity: Progress in string theory is often judged by mathematical consistency rather than by empirical verification – an approach reminiscent of scholasticism. Feyerabend would point to Johannes Kepler’s early attempt to explain planetary orbits using a purely geometric model based on the five Platonic solids. Kepler devoted 17 years to this elegant framework before abandoning it when observational data proved it wrong.

  4. Sociocultural dynamics: The dominance of string theory stems less from empirical success than from the influence and charisma of prominent advocates. Figures like Brian Greene, with their public appeal and institutional clout, help secure funding and shape the narrative – effectively sustaining the theory’s privileged position within the field.

  5. Epistemological overreach: The quest for a “theory of everything” may be misguided. Feyerabend would favor many smaller, diverse theories over a single grand narrative.

Historical Comparisons

Proponents point out that other landmark predictions emerged from mathematics well before their experimental confirmation, and they compare string theory to those cases. Examples include:

  1. Planet Neptune: Predicted by Urbain Le Verrier based on irregularities in Uranus’s orbit, observed in 1846.
  2. General Relativity: Einstein predicted the bending of light by gravity in 1915, confirmed by Arthur Eddington’s 1919 solar eclipse measurements.
  3. Higgs Boson: Predicted by the Standard Model in the 1960s, observed at the Large Hadron Collider in 2012.
  4. Black Holes: Predicted by general relativity, first direct evidence from gravitational waves observed in 2015.
  5. Cosmic Microwave Background: Predicted by the Big Bang theory (1922), discovered in 1965.
  6. Gravitational Waves: Predicted by general relativity, detected in 2015 by the Laser Interferometer Gravitational-Wave Observatory (LIGO).

But these examples differ in kind. Their predictions were always testable in principle and ultimately tested. String theory, in contrast, operates at the Planck scale (~10^19 GeV), far beyond what current or foreseeable experiments can reach.

Special Concern Over Compactification

A concern I have not seen discussed elsewhere – even among critics like Smolin or Woit – is the epistemological status of compactification itself. Would the idea ever have arisen apart from the need to reconcile string theory’s ten dimensions with the four-dimensional spacetime we experience?

Compactification appears ad hoc, lacking grounding in physical intuition. It asserts that dimensions themselves can be small and curled – yet concepts like “small” and “curled” are defined within dimensions, not of them. Saying a dimension is small is like saying that time – not a moment in time, but time itself – can be “soon” or short in duration. It misapplies the very conceptual framework through which such properties are understood. At best, it’s a strained metaphor; at worst, it’s a category mistake and conceptual error.

This conceptual inversion reflects a logical gulf that proponents overlook or ignore. They say compactification is a mathematical consequence of the theory, not a contrivance. But without grounding in physical intuition – a deeper concern than empirical support – compactification remains a fix, not a forecast.

Conclusion

String theory may well contain a correct theory of fundamental physics. But without any plausible route to identifying it, string theory as practiced is bad science. It absorbs talent and resources, marginalizes dissent, and stifles alternative research programs. It is extraordinarily popular – and a miscarriage of science.


Intertemporal Choice, Delayed Gratification and Empty Marshmallow Promises

Everyone knows about the marshmallow test. Kids were given a marshmallow and told that they’d get a second one if they resisted eating the first one for a while. The experimenter then left the room and watched the kids endure marshmallow temptation. Years later, the kids who had been able to fight temptation were found to have higher SAT scores, better jobs, less addiction, and better physical fitness than those who succumbed. The meaning was clear: early self-control, whether innate or taught, is key to later success. The test results and their interpretation were, scientifically speaking, too good to be true. And in most ways they weren’t true.

That wrinkle doesn’t stop the marshmallow test from being trotted out weekly on LinkedIn and social sites where experts and moralists opine. That trotting out comes with behavioral economics lessons, dripping with references to Kahneman, Ariely and the like about our irrationality as we face intertemporal choices, as they’re known in the trade. When adults choose an offer of $1,000 today over an offer for $1,400 to be paid in one year, even when they have no pressing financial need, they are deemed irrational or lacking self-control, like the marshmallow kids.

The famous marshmallow test was done by Walter Mischel in the 1960s through 1980s. Not only did subsequent marshmallow tests fail to show as much correlation between waiting for the second marshmallow and a better life, but, more importantly, similar tests for at least twenty years have pointed to a more salient result, one which Mischel was aware of, but which got lost in popular retelling. Understanding the deeper implications of the marshmallow tests, along with a more charitable view of kids who grabbed the early treat, requires digging down into the design of experiments, Bayesian reasoning, and the concept of risk neutrality.

Intertemporal choice tests like the marshmallow test involve choices between options that involve different payoffs at different times. We face these choices often. And when we face them in the real world, our decision process is informed by memories and judgments about our past choices and their outcomes. In Bayesian terms, our priors incorporate this history. In real life, we are aware that all contracts, treaties, and promises for future payment come with a finite risk of default.

In intertemporal choice scenarios, the probability of the deferred payment actually occurring is always less than 100%. That probability is rarely known and is often unknowable. Consider choices A and B below. This is how the behavioral economists tend to frame the choices.

Choice A | Choice B
$1,000 now | $1,400 paid next year

But this framing ignores an important feature of any real-world, non-hypothetical intertemporal choice situation: the probability that the deferred payment in choice B is actually made is always less than 100%. In the above example, even risk-neutral choosers (those indifferent among choices having the same expected value) would pick choice A over choice B if they judge the probability of non-default (actually getting the deferred payment) to be below a certain threshold.

Choice A | Choice B | Choice C
$1,000 now | $1,400 in one year, P = 0.99 | $1,400 in one year, P = 0.70
Expected value = $1,000 | Expected value = $1,386 | Expected value = $980

As shown above, if choosers believe the likelihood of the deferred payment to be less than about 71% (the break-even point, $1,000 / $1,400), they cannot be called irrational for choosing choice A.
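The expected values and the break-even probability can be checked in a few lines. This is a minimal sketch for a risk-neutral chooser; the function names are hypothetical, and it ignores discounting and inflation, as the example above does.

```python
def expected_value(amount, probability_of_payment):
    """Expected payoff for a risk-neutral chooser."""
    return amount * probability_of_payment

def break_even_probability(amount_now, amount_later):
    """Probability of payment at which the deferred offer matches the sure one."""
    return amount_now / amount_later

for label, amount, p in [("A", 1000, 1.00), ("B", 1400, 0.99), ("C", 1400, 0.70)]:
    print(f"Choice {label}: expected value = ${expected_value(amount, p):,.0f}")

threshold = break_even_probability(1000, 1400)
print(f"Deferred payment wins only if P(payment) > {threshold:.1%}")
```

Any chooser whose estimate of the payment probability falls below that threshold is maximizing expected value by taking the money now.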

Lack of Self Control – or Rational Intuitive Bayes?

Now for the final, most interesting twist in tests like the marshmallow test, almost universally ignored by those who cite them. Unlike my example above where the wait time is one year, in the marshmallow tests, the time period during which the subject is tempted to eat the first marshmallow is unknown to the subject. Subjects come into the game with a certain prior – a certain belief about the probability of non-default. But, as intuitive Bayesians, these subjects update the probability they assign to non-default, during their wait, based on the amount of time they have been waiting. The speed at which they revise their probability downward depends on their judgment of the distribution of wait times experienced in their short lives.
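The updating described here can be sketched with Bayes' rule. In the sketch below, the prior of 0.8, the exponential wait-time model, and the 5-minute mean are all illustrative assumptions, not values from any marshmallow study. A reliable adult returns after a random delay; an unreliable one never returns.

```python
import math

def posterior_reliable(prior, survival_at_t):
    """P(promise will be kept | nothing has arrived after waiting time t).

    survival_at_t is P(wait > t) for an adult who does eventually deliver.
    """
    still_waiting_and_reliable = prior * survival_at_t
    still_waiting_and_default = 1.0 - prior  # a defaulter never shows up
    return still_waiting_and_reliable / (
        still_waiting_and_reliable + still_waiting_and_default
    )

def exponential_survival(t, mean_wait):
    """Assumed wait-time model for reliable adults: P(wait > t)."""
    return math.exp(-t / mean_wait)

for minutes in (0, 5, 15):
    s = exponential_survival(minutes, mean_wait=5.0)
    print(f"after {minutes:2d} min: P(second marshmallow) = {posterior_reliable(0.8, s):.3f}")
```

The longer nothing happens, the more the evidence favors default, even for a child who started out fairly trusting.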

If kids in the marshmallow tests have concluded, based on their experience, that adults are not dependable, choice A makes sense; they should immediately eat the first marshmallow, since the second one may never materialize. Kids who endure temptation for a few minutes only to give in and eat their first marshmallow are seen as both irrational and incapable of self-control.

But if those kids adjust their probability judgments that the second marshmallow will appear based on a prior distribution that is not a normal distribution (for example, if as intuitive Bayesians they model wait times imposed by adults as a power-law distribution), then their eating the first marshmallow after some test-wait period makes perfect sense. They rightly conclude, on the basis of available evidence, that wait times longer than some threshold period may be very long indeed. These kids aren’t irrational, and self-control is not their main problem. Their problem is that they have been raised by irresponsible adults who have displayed a tendency both to default on payments and to fulfill promises late, with delays that follow power-law distributions.
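To make the contrast concrete, here is a rough Python sketch of what an intuitive Bayesian might compute while waiting. Everything specific in it is an assumption of mine for illustration, not something taken from the experiments: the Pareto form of the power law, the parameter values, the prior of 0.9, and the function names. It compares a power-law belief about wait times with a normal one, and it updates the probability that the reward will ever arrive.

```python
import math


def pareto_survival(t, t_min, alpha):
    """P(T > t) for a Pareto (power-law) wait time with minimum t_min."""
    return 1.0 if t <= t_min else (t_min / t) ** alpha


def pareto_expected_remaining(t, t_min, alpha):
    """E[T - t | T > t] for a Pareto wait time; requires alpha > 1."""
    if t < t_min:
        return alpha * t_min / (alpha - 1) - t
    return t / (alpha - 1)  # grows in proportion to time already waited


def normal_expected_remaining(t, mu, sigma):
    """E[T - t | T > t] for a normal wait time (truncated-normal mean)."""
    z = (t - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    survival = 0.5 * math.erfc(z / math.sqrt(2))
    return mu + sigma * pdf / survival - t


def posterior_paid(prior_paid, survival_t):
    """Bayes update: P(reward ever arrives | still waiting at time t).
    survival_t is P(T > t) given that the reward does eventually arrive."""
    waiting_and_paid = prior_paid * survival_t
    return waiting_and_paid / (waiting_and_paid + (1 - prior_paid))


# Hypothetical parameters, in minutes.
T_MIN, ALPHA = 1.0, 1.5   # power-law belief about adult-imposed waits
MU, SIGMA = 5.0, 2.0      # normal belief about adult-imposed waits
PRIOR = 0.9               # initial trust that the adult will deliver

for t in (2, 5, 10, 15):
    print(
        f"waited {t:>2} min | "
        f"power law: {pareto_expected_remaining(t, T_MIN, ALPHA):5.1f} more | "
        f"normal: {normal_expected_remaining(t, MU, SIGMA):5.1f} more | "
        f"P(paid): {posterior_paid(PRIOR, pareto_survival(t, T_MIN, ALPHA)):.2f}"
    )
```

Under the normal belief, the expected remaining wait shrinks the longer the child has waited, so holding out keeps looking better. Under the power-law belief it grows with every minute, and trust that the marshmallow will ever appear falls, so giving up partway through is a consistent response to the evidence rather than a lapse.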

Subsequent marshmallow tests support this. In 2013, psychologist Laura Michaelson, after running more sophisticated versions of the marshmallow test, concluded “implications of this work include the need to revise prominent theories of delay of gratification.” Actually, tests going back over 50 years have shown similar results (A.R. Mahrer, The role of expectancy in delayed reinforcement, 1956).

In three recent posts (first, second, third) I suggested that behavioral economists and business people who follow them are far too prone to seeing innate bias everywhere, when they are actually seeing rational behavior through their own bias. This is certainly the case with the common misuse of the marshmallow tests. Interpreting these tests as rational behavior in light of subjects’ experience is a better explanatory theory, one more consistent with the evidence, and one that coheres with other explanatory observations, such as humans’ capacity for intuitive Bayesian belief updates.

Charismatic pessimists about human rationality twist the situation so that their pessimism is framed as good news, in the sense that they have at least illuminated an inherent human bias. That pessimism, however cheerfully expressed, is both misguided and harmful. Their failure to mention the more nuanced interpretation of marshmallow tests is dishonest and self-serving. The problem we face is not innate, and it is mostly curable. Better parenting can fix it. The marshmallow tests measure parents more than they measure kids.

Walter Mischel died in 2018. I heard his 2016 talk at the Long Now Foundation in San Francisco. He acknowledged the relatively weak correlation between marshmallow test results and later success, and he mentioned that descriptions of his experiments in the popular press were rife with errors. But his talk still focused almost solely on the self-control aspect of the experiments. He missed a great opportunity to help disseminate a better story about the role of parents’ trustworthiness and reliability in children’s delayed gratification.

A better description of the way we really work through intertemporal choices would require going deeper into risk neutrality and how, even for a single person, our departure from risk neutrality – specifically risk-appetite skewness – varies between situations and across time. I have enjoyed doing some professional work in that area. Getting it across in a blog post is probably beyond my current blog-writing skills.
