srhe

The Society for Research into Higher Education



The Doorknob and the Door(s): Why Bayes Matters Now

By Paul Alper

When I was young, there was a sort of funny story about someone who invented the doorknob but died young and poor because the door had yet to be invented. Perhaps the imagery is backwards, in that the door existed but was useless until the doorknob came into being, but I will stick with the doorknob coming first. Bear with me as I attempt to show the relevance of this to the current meteoric rise of Bayesianism, a philosophy and concept several centuries old.

In a previous posting, “Statistical Illogic: the fallacy of Jacob Bernoulli and others,” I reviewed the book Bernoulli’s Fallacy by Aubrey Clayton. He shows in great detail how easy it is to confuse what we really should want

Prob(Hypothesis | Evidence)     Bayesianism

with

Prob(Evidence | Hypothesis)     Frequentism
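
The two quantities are not interchangeable, but they are linked. As a reminder, and using H for Hypothesis and E for Evidence (shorthand assumed here, not Clayton's notation), the standard statement of Bayes' Theorem is:

```latex
\[
\operatorname{Prob}(H \mid E)
  = \frac{\operatorname{Prob}(E \mid H)\,\operatorname{Prob}(H)}{\operatorname{Prob}(E)}
\]
```

The denominator, Prob(E), is the normalising constant; it is this term that becomes hard to compute in complicated problems.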

A classic instance of Bayesian revision in higher education would be the famous example at the Berkeley campus of the University of California. In the 1970s, it was alleged that there was discrimination against females applying to graduate school. Indeed, the overall male admission rate was higher than the overall female admission rate. But, according to https://www.refsmmat.com/posts/2016-05-08-simpsons-paradox-berkeley.html, the simple explanation

“is that women tended to apply to the departments that are the hardest to get into, and men tended to apply to departments that were easier to get into. (Humanities departments tended to have less research funding to support graduate students, while science and engineer departments were awash with money.) So women were rejected more than men. Presumably, the bias wasn’t at Berkeley but earlier in women’s education, when other biases led them to different fields of study than men.”

Clayton’s examples, such as the Prosecutor’s Fallacy and medical testing confusion, give no hint of how analytically difficult it was to perform the calculations of Bayes’ Theorem in complicated situations. Except for a paragraph or two on pages 297 and 298, he makes no reference to how and why Bayesian calculations can now be done numerically on very complicated, important, real-life problems in physics, statistics, machine learning and many other fields; hence the proliferation of Bayesianism.

For the record, the one and only picture of the Reverend Thomas Bayes is generally considered apocryphal; ditto regarding the one and only picture of Shakespeare. Bayes died in 1761 and his eponymous theorem was presented to The Royal Society in 1763 by Richard Price, an interesting character in his own right.

What has changed since the inception of Bayes’ Theorem more than two centuries ago, the doorknob if you will, is the advent of the door: World War II and the computer. At Los Alamos, New Mexico, the place that gave us the atom bomb, five people were confronted with a complicated problem in physics and, in a paper published in 1953, came up with a numerical way of solving it via an approach now known as MCMC, which stands for Markov Chain Monte Carlo, and which has since been applied to Bayes’ Theorem. Their particular numerical way of doing things is referred to as the “Metropolis Algorithm”, named after Nicholas Metropolis, the individual whose name was alphabetically first.

To give the flavour but not the details of the Metropolis algorithm, I will use a well-done, simple example I found on the web, one which does not need the Metropolis algorithm and can be solved using straightforward Bayes’ Theorem; I then show an inferior numerical technique before indicating how Metropolis would do it. The simple illustrative example is taken from the excellent web video, Bayes theorem, the geometry of changing beliefs (which in turn is taken from work by the psychologists Daniel Kahneman, a Nobel laureate, and Amos Tversky).

The example starts with ‘Steve’, a shy, retiring individual. We are asked to say which is more likely – that he is a librarian, or a farmer? Many people will say ‘librarian’, but that is to ignore how many librarians and how many farmers there are in the general population. The example suggests there are 20 times as many farmers as librarians, so we start with 10 librarians and 200 farmers and no one else. Consequently,

Prob(Librarian) is 10/(10 + 200) = 1/21

Prob(Farmer) is 200/(10 + 200) = 20/21

The video has 4 of the 10 librarians being shy and 20 of the 200 farmers being shy; a calculation shows how the evidence revises our thinking:

Prob(Librarian | shyness) = 4/(4 + 20) = 4/24 = 1/6

Prob(Farmer | shyness) = 20/(4 + 20) = 20/24 = 5/6

‘Steve’ is NOT more likely to be a librarian; the probability that he is a librarian is actually one in six. The Bayesian revision has been calculated, and note that the results are normalised: the 5 to 1 ratio, 20/4, trivially leads to 5/6 and 1/6, which sum to one. Normalisation is needed in order to calculate the mean, variance, etc. In more complicated scenarios in many dimensions, normalisation remains vital but is difficult to obtain.
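
For readers who like to check the arithmetic, here is a minimal Python sketch of the same revision. It is not taken from the video; the function name `bayes_update` and the dictionary layout are my own illustrative choices. It multiplies each prior by the chance of shyness in that group and then normalises.

```python
from fractions import Fraction


def bayes_update(prior, likelihood):
    """Multiply prior by likelihood for each hypothesis, then normalise."""
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalised.values())  # the normalising constant
    return {h: weight / evidence for h, weight in unnormalised.items()}


# Priors from the head counts: 10 librarians and 200 farmers.
prior = {"Librarian": Fraction(10, 210), "Farmer": Fraction(200, 210)}
# Chance of being shy within each group: 4 of 10 and 20 of 200.
likelihood = {"Librarian": Fraction(4, 10), "Farmer": Fraction(20, 200)}

posterior = bayes_update(prior, likelihood)
for hypothesis, probability in posterior.items():
    print(hypothesis, probability)
# Librarian 1/6
# Farmer 5/6
```

Using exact fractions rather than floating point keeps the output identical to the hand calculation.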

The problem of normalisation can be solved numerically but not yet in the Metropolis way. Picture a 24-sided die. At each roll of the die, record whether the side that comes up is a number 1, 2, 3 or 4 and call it Librarian. If any other number, 5 to 24, comes up, call it Farmer. Do this (very) many thousands of times and roughly 1/6 of those tosses will be Librarian and 5/6 will be Farmer. This sampling procedure is deemed independent in that a given current toss of the die does not depend on what tosses took place before. Unfortunately, this straightforward independent sampling procedure does not work well on more involved problems in higher dimensions.
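
The die-rolling procedure can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name `estimate_librarian_share`, the number of rolls and the fixed seed are all hypothetical choices, not part of the original example.

```python
import random


def estimate_librarian_share(n_rolls, seed=0):
    """Roll a fair 24-sided die n_rolls times; faces 1-4 count as Librarian."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    librarian_rolls = 0
    for _ in range(n_rolls):
        face = rng.randint(1, 24)  # each roll is independent of the others
        if face <= 4:
            librarian_rolls += 1
    return librarian_rolls / n_rolls


share = estimate_librarian_share(200_000)
print(f"Librarian: {share:.3f}, Farmer: {1 - share:.3f}")
```

With a couple of hundred thousand rolls the estimate should sit close to 1/6, though the exact digits depend on the seed.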

Metropolis does a specific dependent sampling procedure, in which the choice of where to go next does depend on where you are now but not on how you got there; that is, the previous places you visited play no role. Such a situation is called a Markov process, a concept which dates from the early 20th century. If we know how to transition from one state to another, we typically seek the long-run probability of being in each state. In the Librarian/Farmer problem, there are only two states, Librarian and Farmer, and the “height” of each state is its count of shy people: 4 for Librarian and 20 for Farmer. The Metropolis algorithm says begin in one of the states, Librarian or Farmer, and toss a two-sided die (a coin) which proposes a move. Accept this move as long as you do not go down. So, moving from Librarian to Librarian, Farmer to Farmer or Librarian to Farmer is always accepted. Moving from Farmer to Librarian may be accepted or not; the choice depends on the relative heights – the bigger the drop, the less likely the move is to be accepted. Metropolis says: take the ratio, 4/20, and compare it to a random number between zero and one. If the random number is less than 4/20, move from Farmer to Librarian; if not, stay at Farmer. Repeat the procedure (very) many, many times.

Typically, there is a burn-in period, so the first batch of iterations is ignored; from then on we count the fraction of iterations spent in the Librarian state and in the Farmer state, which yields the 1/6 and 5/6.
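
The two-state walk just described can be sketched in Python as follows. This is a toy illustration under stated assumptions rather than the Los Alamos code: the function name `metropolis`, the burn-in length of 1,000, the starting state and the seed are my own hypothetical choices. Note that only the unnormalised heights 4 and 20 are used; the 1/6 and 5/6 emerge from the counting.

```python
import random

# Unnormalised "heights": the number of shy people in each group.
HEIGHTS = {"Librarian": 4, "Farmer": 20}


def metropolis(n_steps, burn_in=1_000, seed=0):
    """Run the two-state Metropolis walk; return the share of time in each state."""
    rng = random.Random(seed)
    state = "Librarian"  # arbitrary starting state
    counts = {name: 0 for name in HEIGHTS}
    for step in range(n_steps):
        # The "two-sided die": propose either state with equal chance.
        proposal = rng.choice(["Librarian", "Farmer"])
        ratio = HEIGHTS[proposal] / HEIGHTS[state]
        # Moves that do not go down (ratio >= 1) are always accepted, since
        # rng.random() is below 1; a drop is accepted with probability ratio.
        if rng.random() < ratio:
            state = proposal
        if step >= burn_in:  # ignore the burn-in iterations
            counts[state] += 1
    kept = n_steps - burn_in
    return {name: count / kept for name, count in counts.items()}


shares = metropolis(200_000)
print(shares)
```

Successive states here are correlated, unlike the rolls of the 24-sided die, so somewhat more iterations are needed for the same accuracy.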

Multiple thousands of iterations today take no time at all; in the years during and just after World War II, computing was in its infancy, and one wonders how many weeks it took to get a run which today would be done in seconds. But, so to speak, a door was being constructed.

In 1970, Hastings introduced an additional term so that, for complex cases, the proposals and acceptances would better capture more complex, involved “terrain” than this simple example. In keeping with the doorknob and door imagery, Metropolis-Hastings is a better door, allowing us to visit more complicated, elaborate terrain more assuredly and more quickly. An even newer door, inspired by problems in physics, is known as Hamiltonian MCMC. It is even more complicated, but it is still a door, related to previous MCMC doors. There are many web sites and videos attempting to explain the details of these algorithms, but it is not easy going to follow the logic of every step. Suffice to say, however, the impact is enormous and justifies the resurgence of Bayesianism.
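
For the curious, the additional term can be written down compactly. In the usual textbook notation, which is assumed here rather than taken from Hastings' paper, π is the (possibly unnormalised) height of a state and q(x' | x) is the probability of proposing a move to x' when currently at x. The probability of accepting the move is then:

```latex
\[
\alpha(x \to x')
  = \min\!\left(1,\;
      \frac{\pi(x')\, q(x \mid x')}{\pi(x)\, q(x' \mid x)}
    \right)
\]
```

When the proposal is symmetric, as with the fair coin in the Librarian/Farmer example, the q terms cancel and what remains is the plain Metropolis ratio, such as 4/20.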

Paul Alper is an emeritus professor at the University of St. Thomas, having retired in 1998. For several decades, he regularly contributed Notes from North America to Higher Education Review. He is almost the exact age of Woody Allen and the Dalai Lama and thus, was fortunate to be too young for some wars and too old for other ones. In the 1990s, he was awarded a Nike sneaker endorsement which resulted in his paper, Imposing Views, Imposing Shoes: A Statistician as a Sole Model; it can be found at The American Statistician, August 1995, Vol 49, No. 3, pages 317 to 319.



Bayes, Keynes, King and Fritz: fake news and academics with fixed ideas

By Paul Alper

“When the facts change, I change my mind. What do you do, sir?” – John Maynard Keynes

On the Fritz – Unknown. Attested from 1902, originally meaning “in a bad way” or “in bad condition”, malfunctioning of an appliance. Perhaps from German name Fritz, or by onomatopoeia (here, imitating the sound of electric sparks jumping).

In statistics, Bayesianism plays an important role.  According to Wikipedia, Thomas Bayes was an English statistician, philosopher and Presbyterian minister who is known for formulating a specific case of the theorem that bears his name: Bayes’ theorem. Bayes never published what would become his most famous accomplishment; his notes were edited and published after his death.

Bayes’ theorem is the way of combining what one currently believes with the new data in order to come up with an updated belief. In fact, this is how things like machine learning and weather forecasting are done successfully. Eventually, enough new data can override/supplement initial beliefs. Keynes was thus a Bayesian, albeit a couple of hundred years after Bayes.

Giving up a previous position is never easy – people are not machines – and note that the Keynesian quotation is, in fact, apocryphal. To illustrate further how resistant to updating humans are, consider Karen King, a distinguished Harvard professor, as discussed in Ariel Sabar’s book, Veritas: A Harvard Professor, A Con Man and the Gospel of Jesus’s Wife.

Karen King would seem to me to be a classic case of someone who, despite (because of?) being very accomplished, just cannot accept that her idée fixe – Mary Magdalen was both wife and lead disciple of Jesus – could be wrong. In passing, I should note that my long-standing personal idée fixe is that academics in general suffer the same affliction. She is also typical in that when evidence arises that counters her convictions, she actively attempts to dismiss its importance. For details regarding her (mis)weighing of the data, see Mark Oppenheim’s review of Sabar’s book in the New York Times.

The other main character in the book is the con man, Walter Fritz, who may indeed, for all I know, be an expert on Bayesianism. He seems to be knowledgeable in Egyptology, papyrology and sex video productions and, for good measure, he was the head of the Stasi museum in Berlin. More importantly, he knew how to exploit the weakness of his mark. He delivered to King precisely what she wanted to be true. Sabar expends a great deal of shoe-leather journalism to find the gaps in Fritz’s storyline and King’s willingness to be a believer. Almost right to the end, she never was onto Fritz or heard the electric sparks jumping.

But apparently this failure to hear the sparks jumping also afflicts the half of the US who believe deeply, truly and incorrectly that the 2020 election was rigged by The Deep State, an entity so shadowy that finding no evidence of its existence is further proof of its existence. Currently, because of the mob raid on the United States Capitol, Republican legislators are in a Bayesian quandary as to if, when and how to leave the Trump ship. The new data really aren’t all that different from the old data but after a while, as they say, enough is enough, especially when fascist-type behavior is captured on video immediately after deep-south Georgia elected two Democrats to the United States Senate. I am tempted to repeat my favorite quotation from a TV program of my youth:

“You can fool some of the people all of the time, and all of the people some of the time—and them’s pretty good odds.”
