by Paul Alper
Bernoulli’s Fallacy, Statistical Illogic and the Crisis of Modern Science by Aubrey Clayton.
“My goal with this book is not to broker a peace treaty; my goal is to win the war.” (Preface p xv)
“We should no more be teaching p-values in statistics courses than we should be teaching phrenology in medical schools.” (p239)
It is possible or even probable that many a PhD or journal article in the softer sciences has got by through misunderstanding probability and statistics. Clayton’s book aims to expose the shortcomings of a fallacy first attributed to the 17th century mathematician Jacob Bernoulli, but relied on repeatedly for centuries afterwards, despite the 18th century work of statistician Thomas Bayes, and exemplified in the work of RA Fisher, the staple of so many social science primers on probability and statistics.
In the midst of the frightening Cold War, I attended a special lecture at the University of Wisconsin-Madison on 12 February 1960 by Fisher, the most prominent statistician of the 20th century; he was touring the United States and other countries. I had never heard of him; indeed, although I was by then in grad school, my undergraduate experience had been entirely deterministic: apply a voltage then measure a current, apply a force then measure acceleration, etc. Not a hint, not a mention of variability, noise, or random disturbance. The general public’s common currency in 1960 did not include such terms as random sample, statistical significance, and margin of error.
However, Fisher was speaking on the hot topic of that day: was smoking a cause of cancer? Younger readers may wonder how in the world this was a debatable subject when, in hindsight, it is so strikingly obvious. Well, it was not obvious in 1960, and the history of inflight smoking indicates how difficult it was to turn the tide, and how many years it took. Fisher’s tour of the United States was sponsored by the tobacco industry, but it would be wrong to conjecture that he was being hypocritical. And not just because he was a smoker himself.
Fisher believed that mere observations were insufficient for concluding that A causes B; it could be that B causes A or that C is responsible for both A and B. He insisted upon experimental and not mere observational evidence. According to Fisher, it could be that some underlying physical problem led people to smoke, rather than smoking causing the underlying problem; or that some other cause such as pollution was to blame. According to Fisher, establishing experimentally that smoking causes cancer would require assigning children at random either to smoke or not to smoke and then, as time goes by, noting the incidence of cancer in each of the two groups.
However, according to Clayton, Fisher himself, just like Jacob Bernoulli, had it backwards when it came to analysing experiments. If Fisher and Bernoulli can make this mistake, it is easy for others to fall into this trap because ordinary language keeps tripping us up. Clayton devotes much effort to showing examples, such as the famous Prosecutor’s Fallacy. The fallacy was exemplified in the UK by the infamous Meadow case and is discussed at length by Clayton; a prosecution expert witness made unsustainable assertions about the probability of innocence being “one in 73 million”.
The Bayesian way of looking at things is to consider the probability that a person is guilty, given the evidence. This is not the same as the probability of the evidence, given that the person is guilty, which is the ‘frequentist’ approach adopted by Fisher; the two can be wildly different numerically. Another example comes from the medical world, where there is confusion between the probability of having a disease, given a positive test for the disease, and the probability of a positive test, given the disease:
Prob (Disease | Test Positive) ; the Bayesian way of looking at things
Prob (Test Positive | Disease) ; the frequentist approach
The patient is interested in the former but is often quoted the latter, known as the sensitivity of the test; the two can be markedly different, depending on the base rate of the disease. Suppose the base rate is one in 1,000, the test sensitivity is 90% and, as the figures here imply, the test also wrongly flags about 10% of healthy people. Then for every 1,000 people tested, about 100 will be false positives, against roughly one true positive. A Bayesian would therefore conclude correctly that, given a positive test, a false positive is about 100 times more likely than actually having the disease. In other words, the hypothesis that the person has the disease is not supported by the data/evidence. However, a frequentist might mistakenly say that if you test positive there is a 90% chance that you have the disease.
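The arithmetic behind this example can be checked with a few lines of Python. This is only a sketch: the 10% false-positive rate is an assumption (it is the figure needed to produce roughly 100 false positives per 1,000 tests), and the function and variable names are illustrative.

```python
def prob_disease_given_positive(base_rate, sensitivity, false_positive_rate):
    """Bayes' rule: Prob(Disease | Test Positive)."""
    # Fraction of everyone tested who is ill AND tests positive.
    true_positives = base_rate * sensitivity
    # Fraction of everyone tested who is healthy AND tests positive.
    false_positives = (1 - base_rate) * false_positive_rate
    return true_positives / (true_positives + false_positives)


base_rate = 1 / 1000          # one person in 1,000 has the disease
sensitivity = 0.90            # Prob(Test Positive | Disease)
false_positive_rate = 0.10    # ASSUMED: Prob(Test Positive | No Disease)

posterior = prob_disease_given_positive(base_rate, sensitivity, false_positive_rate)
print(f"Prob(Test Positive | Disease) = {sensitivity:.1%}")
print(f"Prob(Disease | Test Positive) = {posterior:.1%}")
```

Under these assumed figures the probability of disease after a positive test comes out at a little under 1%, nowhere near the 90% sensitivity.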
The quotation from page xv of Clayton’s preface, which begins this essay, shows how much Clayton, a Bayesian, is determined to counter Bernoulli’s fallacy and set things straight. Fisher’s frequentist approach still finds favor among social scientists because his setup, no matter how flawed, is an easy recipe to follow. Assume a straw-man hypothesis such as ‘no effect’, take data to obtain a so-called p-value and, in the mechanical manner suggested by Fisher, if the p-value is low enough, reject the straw man. The winner is then declared to be the opposite of the straw man, namely that the effect/hypothesis/contention/claim is real.
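The recipe can be sketched in Python for a made-up coin-tossing experiment. The data (60 heads in 100 tosses), the 0.05 threshold and the function name are all hypothetical, chosen only for illustration; none of them comes from Clayton’s book.

```python
from math import comb


def one_sided_p_value(successes, trials, null_prob=0.5):
    """Prob(at least `successes` successes in `trials`) if the straw man is true."""
    return sum(
        comb(trials, k) * null_prob**k * (1 - null_prob) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical data: 60 heads in 100 tosses; straw man: the coin is fair.
p_value = one_sided_p_value(60, 100)
threshold = 0.05  # conventional cut-off, assumed here

print(f"p-value = {p_value:.4f}")
print("reject the straw man" if p_value < threshold else "do not reject the straw man")
```

Note what the number is: the probability of data at least this extreme, given the straw man. It is not the probability of the straw man, given the data, which is the quantity a Bayesian would ask for.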
Fisher, a founder and not just a follower of the eugenics movement, was, as I once wrote, “a genius, and difficult to get along with.” Upon reflection, I changed the conjunction to an implication: “a genius, therefore difficult to get along with.” His son-in-law on 12 February 1960 was George Box, also a famous statistician (among other things, the author of the famous phrase in statistics, “all models are wrong, some are useful”), who had just been appointed head of the University of Wisconsin’s statistics department. Unlike Fisher, Box was a very agreeable and kindly person and, as evidence of those qualities, I note that he was on the committee that approved my PhD thesis, a writing endeavour of mine which I hope is never unearthed for future public consumption.
All of that was a long time ago, well before the Soviet Union collapsed, only for us to see today’s military rise of Russia. Tobacco use and sales throughout the world are much reduced while cannabis acceptance is on the rise. Statisticians have since moved on to consider and solve much weightier computational problems via the rubric of so-called Data Science. I was in my mid-twenties and I doubt that there were many people younger than I was at that Fisher presentation, so I am on track to be the last one alive who heard a lecture by Fisher disputing smoking as a cause of cancer. He died in Australia in 1962, a month after my 26th birthday, but his legacy, reputation and contribution live on and hence the fallacy of Bernoulli as well.
Paul Alper is an emeritus professor at the University of St. Thomas, having retired in 1998. For several decades, he regularly contributed Notes from North America to Higher Education Review. He is almost the exact age of Woody Allen and the Dalai Lama and thus, was fortunate to be too young for some wars and too old for other ones. In the 1990s, he was awarded a Nike sneaker endorsement which resulted in his paper, Imposing Views, Imposing Shoes: A Statistician as a Sole Model; it can be found at The American Statistician, August 1995, Vol 49, No. 3, pages 317 to 319.