
The Society for Research into Higher Education



The Doorknob and the Door(s): Why Bayes Matters Now

By Paul Alper

When I was young, there was a sort of funny story about someone who invented the doorknob but died young and poor because the door had yet to be invented. Perhaps the imagery is backwards, in that the door existed but was useless until the doorknob came into being, but I will stick with the doorknob coming first in time. Bear with me as I attempt to show the relevance of this to the current meteoric rise of Bayesianism, a philosophy and concept several centuries old.

In a previous posting, “Statistical Illogic: the fallacy of Jacob Bernoulli and others,” I reviewed Aubrey Clayton’s book, Bernoulli’s Fallacy. He shows in great detail how easy it is to confuse what we really should want

Prob(Hypothesis | Evidence)    (Bayesianism)

with

Prob(Evidence | Hypothesis)    (Frequentism)
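For reference, the two quantities are linked by Bayes’ Theorem. Written in the same notation as above, where Prob(Hypothesis) is the prior belief before seeing the evidence and Prob(Evidence) is the overall probability of the evidence, it reads:

```latex
\mathrm{Prob}(\text{Hypothesis} \mid \text{Evidence})
  = \frac{\mathrm{Prob}(\text{Evidence} \mid \text{Hypothesis}) \; \mathrm{Prob}(\text{Hypothesis})}
         {\mathrm{Prob}(\text{Evidence})}
```

The denominator is the normalising term; it is what makes the revised probabilities sum to one.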

A classic instance of Bayesian revision in higher education would be the famous example at the Berkeley campus of the University of California. In the 1970s, it was alleged that there was discrimination against women applying to graduate school. Indeed, the overall admission rate for men was higher than that for women. But, according to https://www.refsmmat.com/posts/2016-05-08-simpsons-paradox-berkeley.html, the simple explanation

“is that women tended to apply to the departments that are the hardest to get into, and men tended to apply to departments that were easier to get into. (Humanities departments tended to have less research funding to support graduate students, while science and engineer departments were awash with money.) So women were rejected more than men. Presumably, the bias wasn’t at Berkeley but earlier in women’s education, when other biases led them to different fields of study than men.”
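The arithmetic behind that explanation can be seen in a small sketch. The numbers below are invented purely for illustration and are NOT Berkeley’s actual figures; the two department labels and the helper `overall_rate` are likewise hypothetical. In this toy version women are admitted at a higher rate than men in each department, yet at a lower rate overall, because most women apply to the harder department.

```python
# Hypothetical (applied, admitted) counts; NOT the real Berkeley data.
applications = {
    "easier department": {"men": (80, 48), "women": (20, 13)},
    "harder department": {"men": (20, 4), "women": (80, 20)},
}

def overall_rate(sex):
    """Admission rate for one sex, pooled over all departments."""
    applied = sum(dept[sex][0] for dept in applications.values())
    admitted = sum(dept[sex][1] for dept in applications.values())
    return admitted / applied

# Department by department, women do slightly better than men...
for name, dept in applications.items():
    men_rate = dept["men"][1] / dept["men"][0]
    women_rate = dept["women"][1] / dept["women"][0]
    print(name, "men:", men_rate, "women:", women_rate)

# ...but pooled over departments, men do better than women.
print("overall men:", overall_rate("men"), "women:", overall_rate("women"))
```

The reversal comes entirely from where the applications go, not from how any one department treats its applicants.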

Clayton’s examples, such as the Prosecutor’s Fallacy and medical testing confusion, give no hint of how analytically difficult it was to perform the calculations of Bayes’ Theorem in complicated situations. Except for a paragraph or two on pages 297 and 298, he makes no reference to how and why Bayesian calculations can now be done numerically on very complicated, important, real-life problems in physics, statistics, machine learning and many other fields; hence the proliferation of Bayesianism.

For the record, the one and only picture of the Reverend Thomas Bayes is generally considered apocryphal; ditto regarding the one and only picture of Shakespeare. Bayes died in 1761 and his eponymous theorem was presented to The Royal Society in 1763 by Richard Price, an interesting character in his own right.

What has changed since the inception of Bayes’ Theorem more than two centuries ago, the doorknob if you will, is the advent of the door: World War II and the computer. At Los Alamos, New Mexico, the place that gave us the atom bomb, five people were confronted with a complicated problem in physics. In a paper published in 1953 they came up with a numerical method of a kind now known as MCMC, which stands for Markov Chain Monte Carlo, and which can be used to carry out the calculations that Bayes’ Theorem calls for. Their particular numerical way of doing things is referred to as the “Metropolis Algorithm”, named after Nicholas Metropolis, the individual whose name was alphabetically first.

To give the flavour but not the details of the Metropolis algorithm, I will use a well-done, simple example I found on the web. It does not need the Metropolis algorithm and can be solved by a straightforward application of Bayes’ Theorem; I then show an inferior numerical technique before indicating how Metropolis would do it. The simple illustrative example is taken from the excellent web video, Bayes theorem, the geometry of changing beliefs (which in turn is taken from work by the psychologists Daniel Kahneman, a Nobel laureate, and Amos Tversky).

The example starts with ‘Steve’, a shy, retiring individual. We are asked to say which is more likely: that he is a librarian, or a farmer? Many people will say ‘librarian’, but that is to ignore how many librarians and how many farmers there are in the general population. The example suggests there are 20 times as many farmers as librarians, so we start with 10 librarians and 200 farmers and no one else. Consequently,

Prob(Librarian) = 10/(10 + 200) = 1/21

Prob(Farmer) = 200/(10 + 200) = 20/21

The video has 4 of the 10 librarians being shy and 20 of the 200 farmers being shy; a calculation shows how the evidence revises our thinking:

Prob(Librarian | shyness) = 4/(4 + 20) = 4/24 = 1/6

Prob(Farmer | shyness) = 20/(4 + 20) = 20/24 = 5/6

‘Steve’ is NOT more likely to be a librarian. The probability that ‘Steve’ is a librarian is actually one in six. The Bayesian revision has been calculated; note that the results are normalised, that is, they sum to one. The 5 to 1 ratio, 20/4, trivially leads to 5/6 and 1/6. Normalisation is important in order to calculate the mean, variance, etc. In more complicated scenarios in many dimensions normalisation remains vital but is difficult to obtain.
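The same revision can be written out as prior times likelihood, divided by a normalising term. The following is a minimal sketch using only the counts quoted from the video; the function name `posterior` and the dictionary layout are my own choices for illustration.

```python
from fractions import Fraction

def posterior(counts, shy_counts):
    """Return Prob(group | shy) for each group, as exact fractions."""
    total = sum(counts.values())
    # Prior: each group's share of the whole population.
    prior = {g: Fraction(n, total) for g, n in counts.items()}
    # Likelihood: the share of each group that is shy.
    likelihood = {g: Fraction(shy_counts[g], counts[g]) for g in counts}
    # Unnormalised posterior: prior times likelihood.
    unnormalised = {g: prior[g] * likelihood[g] for g in counts}
    # Normalising term: Prob(shy) across the whole population.
    evidence = sum(unnormalised.values())
    return {g: unnormalised[g] / evidence for g in counts}

counts = {"Librarian": 10, "Farmer": 200}
shy_counts = {"Librarian": 4, "Farmer": 20}
result = posterior(counts, shy_counts)
print(result["Librarian"], result["Farmer"])  # prints 1/6 5/6
```

With only two groups the normalising term is a one-line sum; in many dimensions that sum becomes an integral that is often impractical to compute directly, which is the difficulty referred to above.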

The problem of normalisation can be solved numerically, though not yet in the Metropolis way. Picture a 24-sided die. At each roll of the die, record whether the side that comes up is 1, 2, 3 or 4 and, if so, call it Librarian. If any other number, 5 to 24, comes up, call it Farmer. Do this (very) many thousands of times and roughly 1/6 of the rolls will be Librarian and 5/6 will be Farmer. This sampling procedure is deemed independent in that a given roll of the die does not depend on the rolls that took place before. Unfortunately, this straightforward independent sampling procedure does not work well on more involved problems in higher dimensions.
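The die-rolling procedure is easy to imitate on a computer. This is a rough sketch, in which the function name `roll_die_estimate`, the number of rolls and the fixed seed are all arbitrary choices of mine rather than anything taken from the video.

```python
import random

def roll_die_estimate(n_rolls, seed=0):
    """Estimate Prob(Librarian | shy) by rolling a 24-sided die n_rolls times."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    librarian = 0
    for _ in range(n_rolls):
        face = rng.randint(1, 24)  # each face 1..24 is equally likely
        if face <= 4:              # faces 1 to 4 stand for the 4 shy librarians
            librarian += 1
    return librarian / n_rolls

estimate = roll_die_estimate(100_000)
print(estimate)  # should be close to 1/6, i.e. about 0.167
```

Each roll ignores all earlier rolls, which is what makes the samples independent.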

Metropolis uses a specific dependent sampling procedure, in which the choice of where to go next depends on where you are now but not on how you got there, i.e. the places you visited previously play no role. Such a situation is called a Markov process, a concept which dates from the early 20th century. If we know how to transition from one state to another, we typically seek the long-run probability of being in each state. In the Librarian/Farmer problem, there are only two states, Librarian and Farmer, with heights 4 and 20 (the numbers of shy librarians and shy farmers). The Metropolis algorithm says: begin in one of the states, Librarian or Farmer, and toss a fair coin which proposes a move to one state or the other. Accept this move as long as you do not go down. So, moves from Librarian to Librarian, Farmer to Farmer or Librarian to Farmer are always accepted. Moving from Farmer to Librarian may be accepted or not; the choice depends on the relative heights: the bigger the drop, the less likely the move is to be accepted. Metropolis says: take the ratio, 4/20, and compare it to a random number between zero and one. If the random number is less than 4/20, move from Farmer to Librarian; if not, stay at Farmer. Repeat the procedure (very) many, many times.

Typically, there is a burn-in period, so the first batch of iterations is ignored; from then on we count the fraction of iterations spent in the Librarian state and in the Farmer state, which yields approximately 1/6 and 5/6.
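The two-state procedure just described can be sketched in a few lines. This is an illustration under my own assumptions (a fair coin as the proposal, Farmer as the starting state, and arbitrary step and burn-in counts), not a reconstruction of the original Los Alamos code; the names `WEIGHTS` and `metropolis` are hypothetical.

```python
import random

# Unnormalised "heights": the numbers of shy librarians and shy farmers.
WEIGHTS = {"Librarian": 4, "Farmer": 20}

def metropolis(n_steps, burn_in, seed=0):
    """Run the two-state Metropolis chain; return the fraction of time in each state."""
    rng = random.Random(seed)
    state = "Farmer"  # arbitrary starting state
    visits = {"Librarian": 0, "Farmer": 0}
    for step in range(n_steps):
        # Fair coin: propose either state with probability 1/2.
        proposal = rng.choice(["Librarian", "Farmer"])
        # Ratio of heights; 1 or more means the move does not go down.
        ratio = WEIGHTS[proposal] / WEIGHTS[state]
        # Always accept if ratio >= 1; otherwise accept with probability ratio.
        if rng.random() < ratio:
            state = proposal
        # Only count states once the burn-in period is over.
        if step >= burn_in:
            visits[state] += 1
    kept = n_steps - burn_in
    return {s: count / kept for s, count in visits.items()}

fractions = metropolis(n_steps=200_000, burn_in=10_000)
print(fractions)  # Librarian should be near 1/6, Farmer near 5/6
```

Note that only the ratio of the heights, 4/20, ever enters the calculation. The normalising term is never needed, which is what makes the approach attractive when that term is hard to obtain.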

Multiple thousands of iterations today take no time at all; back in the 1940s and early 1950s, computing was in its infancy and one wonders how many weeks it took to get a run which today would be done in seconds. But, so to speak, a door was being constructed.

In 1970, Hastings introduced an additional term so that, for complex cases, the proposals and acceptances would better capture more complex, involved “terrain” than this simple example. In keeping with the doorknob and door imagery, Metropolis-Hastings is a better door, allowing us to visit more complicated, elaborate terrain more assuredly and more quickly. An even newer door, inspired by problems in physics, is known as Hamiltonian MCMC (also called Hamiltonian Monte Carlo). It is even more complicated, but it is still a door, related to previous MCMC doors. There are many web sites and videos attempting to explain the details of these algorithms, but it is not easy to follow the logic of every step. Suffice to say, however, the impact is enormous and justifies the resurgence of Bayesianism.
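To give a flavour of the Hastings term, here is a hedged sketch on the same Librarian/Farmer example. The extra term is commonly described as a correction for proposals that are not symmetric; to show it, I assume a deliberately lopsided proposal that suggests Librarian 80% of the time. That 80/20 split, and the names `WEIGHTS`, `PROPOSAL` and `metropolis_hastings`, are my own inventions for illustration.

```python
import random

WEIGHTS = {"Librarian": 4, "Farmer": 20}       # unnormalised heights
PROPOSAL = {"Librarian": 0.8, "Farmer": 0.2}   # assumed lopsided proposal

def metropolis_hastings(n_steps, burn_in, seed=0):
    """Two-state Metropolis-Hastings chain with an asymmetric proposal."""
    rng = random.Random(seed)
    state = "Farmer"  # arbitrary starting state
    visits = {"Librarian": 0, "Farmer": 0}
    for step in range(n_steps):
        # Propose Librarian with probability 0.8, otherwise Farmer.
        proposal = "Librarian" if rng.random() < PROPOSAL["Librarian"] else "Farmer"
        # Height ratio times the Hastings correction, which divides out
        # the bias of the proposal (reverse move over forward move).
        ratio = (WEIGHTS[proposal] * PROPOSAL[state]) / (
            WEIGHTS[state] * PROPOSAL[proposal]
        )
        # Accept with probability min(1, ratio).
        if rng.random() < min(1.0, ratio):
            state = proposal
        if step >= burn_in:
            visits[state] += 1
    kept = n_steps - burn_in
    return {s: count / kept for s, count in visits.items()}

mh_fractions = metropolis_hastings(n_steps=200_000, burn_in=10_000)
print(mh_fractions)  # still near 1/6 and 5/6 despite the biased proposal
```

Without the correction, the chain would spend too much time at Librarian simply because Librarian is proposed more often; with it, the long-run fractions come back to roughly 1/6 and 5/6.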

Paul Alper is an emeritus professor at the University of St. Thomas, having retired in 1998. For several decades, he regularly contributed Notes from North America to Higher Education Review. He is almost the exact age of Woody Allen and the Dalai Lama and thus was fortunate to be too young for some wars and too old for other ones. In the 1990s, he was awarded a Nike sneaker endorsement which resulted in his paper, Imposing Views, Imposing Shoes: A Statistician as a Sole Model; it can be found at The American Statistician, August 1995, Vol 49, No. 3, pages 317 to 319.