
The Society for Research into Higher Education



The Doorknob and the Door(s): Why Bayes Matters Now

By Paul Alper

When I was young, there was a funny story about someone who invented the doorknob but died young and poor because the door had yet to be invented. Perhaps the imagery is backwards, in that the door existed but was useless until the doorknob came into being, but I will stick with the doorknob coming first in time. Bear with me as I attempt to show the relevance of this to the current meteoric rise of Bayesianism, a philosophy and concept several centuries old.

In a previous posting, “Statistical Illogic: the fallacy of Jacob Bernoulli and others,” I reviewed the book Bernoulli’s Fallacy by Aubrey Clayton. He shows in great detail how easy it is to confuse what we really should want

Prob(Hypothesis | Evidence)                                       Bayesianism

with

Prob(Evidence | Hypothesis)                                       Frequentism
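
The two quantities are linked by Bayes Theorem itself, which is what allows evidence to revise a hypothesis:

Prob(Hypothesis | Evidence) = Prob(Evidence | Hypothesis) × Prob(Hypothesis) / Prob(Evidence)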

A classic instance of Bayesian revision in higher education is the famous example from the Berkeley campus of the University of California. In the 1970s, it was alleged that there was discrimination against women applying to graduate school. Indeed, the overall admission rate for men was higher than the overall admission rate for women. But, according to https://www.refsmmat.com/posts/2016-05-08-simpsons-paradox-berkeley.html, the simple explanation

“is that women tended to apply to the departments that are the hardest to get into, and men tended to apply to departments that were easier to get into. (Humanities departments tended to have less research funding to support graduate students, while science and engineer departments were awash with money.) So women were rejected more than men. Presumably, the bias wasn’t at Berkeley but earlier in women’s education, when other biases led them to different fields of study than men.”
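
Since the point turns on how aggregation can reverse within-group comparisons, here is a minimal Python sketch with made-up numbers (not the actual Berkeley data): in each hypothetical department women are admitted at an equal or higher rate than men, yet their overall rate is lower because most of them applied to the harder department.

```python
# Hypothetical admissions figures chosen only to illustrate Simpson's
# paradox; they are NOT the real Berkeley data.
applicants = {
    # department: {group: (admitted, applied)}
    "easy": {"men": (80, 100), "women": (18, 20)},
    "hard": {"men": (5, 20), "women": (25, 100)},
}

def rate(admitted, applied):
    return admitted / applied

# Within each department, women do at least as well as men...
for dept, groups in applicants.items():
    for group, (adm, app) in groups.items():
        print(f"{dept} {group}: {rate(adm, app):.2f}")

# ...yet the aggregate rate for women is lower, because most women
# applied to the department that is hardest to get into.
for group in ("men", "women"):
    adm = sum(applicants[d][group][0] for d in applicants)
    app = sum(applicants[d][group][1] for d in applicants)
    print(f"overall {group}: {rate(adm, app):.2f}")
```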

Clayton’s examples, such as the Prosecutor’s Fallacy and medical testing confusion, give no hint of how analytically difficult it was to perform the calculations of Bayes Theorem in complicated situations. Except for a paragraph or two on pages 297 and 298, he makes no reference to how and why Bayesian calculations can now be done numerically on very complicated, important, real-life problems in physics, statistics, machine learning and many other fields, hence the proliferation of Bayesianism.

For the record, the one and only picture of the Reverend Thomas Bayes is generally considered apocryphal; ditto regarding the one and only picture of Shakespeare. Bayes died in 1761 and his eponymous theorem was presented to the Royal Society in 1763 by Richard Price, an interesting character in his own right.

What has changed since the inception of Bayes Theorem more than two centuries ago, the doorknob if you will, is the advent of the door: World War II and the computer. At Los Alamos, New Mexico, the place that gave us the atom bomb, five people were confronted with a complicated problem in physics, and they came up with a numerical way of solving Bayes Theorem via an approach known as MCMC, which stands for Markov Chain Monte Carlo. Their particular numerical way of doing things is referred to as the “Metropolis Algorithm”, named after Nicholas Metropolis, the author whose name was alphabetically first.

To give the flavour but not the details of the Metropolis algorithm, I will use a well-done, simple example I found on the web which does not use or need the Metropolis algorithm but can be solved with straightforward Bayes Theorem; I then show an inferior numerical technique before indicating how Metropolis would do it. The simple illustrative example is taken from the excellent web video, Bayes theorem, the geometry of changing beliefs (which in turn draws on work by the psychologists Daniel Kahneman, a Nobel laureate, and Amos Tversky).

The example starts with ‘Steve’, a shy, retiring individual. We are asked to say which is more likely: that he is a librarian, or a farmer? Many people will say ‘librarian’, but that is to ignore how many librarians and how many farmers there are in the general population. The example suggests there are 20 times as many farmers as librarians, so we start with 10 librarians, 200 farmers and no one else. Consequently,

Prob(Librarian) = 10/(10 + 200) = 1/21

Prob(Farmer) = 200/(10 + 200) = 20/21

The video has 4 of the 10 librarians being shy and 20 of the 200 farmers being shy; a calculation shows how the evidence revises our thinking:

Prob(Librarian | shyness) = 4/(4 + 20) = 4/24 = 1/6

Prob(Farmer | shyness) = 20/(4 + 20) = 20/24 = 5/6

‘Steve’ is NOT more likely to be a librarian; the probability that he is a librarian is actually one in six. Bayesian revision has been carried out, and note that the results are normalised: the 5 to 1 ratio, 20/4, trivially leads to 5/6 and 1/6. Normalisation is important in order to calculate the mean, variance and so on. In more complicated scenarios, in many dimensions, normalisation remains vital but is difficult to obtain.
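
For readers who want to check the arithmetic, here is a minimal Python sketch of this exact calculation, using the counts from the video (10 librarians and 200 farmers, of whom 4 and 20 respectively are shy):

```python
# Exact Bayesian revision for the librarian/farmer example.
librarians, farmers = 10, 200          # prior counts (1/21 and 20/21 of the population)
shy_librarians, shy_farmers = 4, 20    # how many in each group are shy

# Posterior = (shy members of a group) / (all shy people); this is
# Bayes Theorem with the normalising constant made explicit.
total_shy = shy_librarians + shy_farmers
p_librarian_given_shy = shy_librarians / total_shy   # 4/24  = 1/6
p_farmer_given_shy = shy_farmers / total_shy         # 20/24 = 5/6

print(p_librarian_given_shy, p_farmer_given_shy)     # 0.1666... 0.8333...
```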

The problem of normalisation can be solved numerically, but not yet in the Metropolis way. Picture a 24-sided die. At each roll of the die, record whether the side that comes up is a 1, 2, 3 or 4 and call it Librarian; if any other number, 5 to 24, comes up, call it Farmer. Do this (very) many thousands of times and roughly 1/6 of those tosses will be Librarian and 5/6 will be Farmer. This sampling procedure is independent, in that the current toss of the die does not depend on the tosses that took place before. Unfortunately, this straightforward independent sampling procedure does not work well on more involved problems in higher dimensions.
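
Here is a minimal sketch of that independent sampling scheme, with Python's standard random module standing in for the 24-sided die:

```python
import random

random.seed(0)
trials = 200_000
librarian_hits = 0

for _ in range(trials):
    roll = random.randint(1, 24)       # one toss of the 24-sided die
    if roll <= 4:                      # faces 1-4 count as "Librarian"
        librarian_hits += 1

# Every toss is independent of all previous tosses.
print(librarian_hits / trials)         # close to 1/6, about 0.1667
print(1 - librarian_hits / trials)     # close to 5/6, about 0.8333
```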

Metropolis uses a specific dependent sampling procedure, in which the choice of where to go next depends on where you are now but not on how you got there; that is, the previous places you visited play no role. Such a situation is called a Markov process, a concept which dates from the early 20th century. If we know how to transition from one state to another, we typically seek the long-run probability of being in each state. In the Librarian/Farmer problem, there are only two states, Librarian and Farmer. The Metropolis algorithm says: begin in one of the states, Librarian or Farmer, and toss a two-sided die which proposes a move. Accept this move as long as you do not go down. So moving from Librarian to Librarian, Farmer to Farmer or Librarian to Farmer is accepted. Moving from Farmer to Librarian may be accepted or not; the choice depends on the relative heights – the bigger the drop, the less likely the move is to be accepted. Metropolis says: take the ratio, 4/20, and compare it to a random number between zero and one. If the random number is less than 4/20, move from Farmer to Librarian; if not, stay at Farmer. Repeat the procedure (very) many, many times.

Typically, there is a burn-in period, so the first batch of iterations is ignored; from then on we count the fraction of iterations spent in the Librarian state and in the Farmer state, which yields the 1/6 and 5/6.
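
The procedure just described, burn-in included, can be sketched in a few lines of Python; the unnormalised ‘heights’ of the two states are simply the shy counts, 4 and 20:

```python
import random

random.seed(0)

# Unnormalised "heights" of the two states: the shy counts from the example.
height = {"Librarian": 4, "Farmer": 20}

state = "Librarian"                   # arbitrary starting state
burn_in, iterations = 1_000, 200_000
counts = {"Librarian": 0, "Farmer": 0}

for i in range(iterations):
    proposal = random.choice(["Librarian", "Farmer"])   # the two-sided die

    # Uphill or level moves are always accepted; downhill moves are accepted
    # with probability equal to the ratio of the heights (here 4/20).
    if random.random() < min(1.0, height[proposal] / height[state]):
        state = proposal

    if i >= burn_in:                  # ignore the burn-in period
        counts[state] += 1

kept = iterations - burn_in
print(counts["Librarian"] / kept)     # close to 1/6
print(counts["Farmer"] / kept)        # close to 5/6
```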

Many thousands of iterations today take no time at all; back in World War II, computing was in its infancy, and one wonders how many weeks it took to get a run which today would be done in seconds. But, so to speak, a door was being constructed.

In 1970, Hastings introduced an additional term so that, for complex cases, the proposals and acceptances would better handle more complicated, involved “terrain” than in this simple example. In keeping with the doorknob and door imagery, Metropolis-Hastings is a better door, allowing us to visit more complicated, elaborate terrain more assuredly and more quickly. An even newer door, inspired by problems in physics, is known as Hamiltonian MCMC. It is even more complicated, but it is still a door, related to the previous MCMC doors. There are many websites and videos attempting to explain the details of these algorithms, but it is not easy going to follow the logic of every step. Suffice it to say, however, the impact is enormous and justifies the resurgence of Bayesianism.
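
To hint at what the Hastings term adds without the full details, here is a hedged sketch of the general Metropolis-Hastings acceptance step. The target density and the asymmetric proposal below are illustrative stand-ins of my own choosing, not anything from the original papers, but they show where the extra correction enters.

```python
import math
import random

random.seed(0)

def target(x):
    # Unnormalised target density on x > 0 (an exponential shape);
    # purely illustrative - any positive function would do.
    return math.exp(-x) if x > 0 else 0.0

def proposal_logpdf(y, x, s=0.5):
    # Log-density of a lognormal random-walk proposal q(y | x), i.e.
    # y = x * exp(s * Normal(0, 1)). It is asymmetric, which is exactly
    # when Hastings' extra term matters.
    z = (math.log(y) - math.log(x)) / s
    return -math.log(y * s * math.sqrt(2.0 * math.pi)) - 0.5 * z * z

x = 1.0
total, kept = 0.0, 0
for i in range(50_000):
    y = x * math.exp(0.5 * random.gauss(0.0, 1.0))        # propose a move
    # Metropolis-Hastings acceptance: the target ratio times the Hastings
    # correction q(x | y) / q(y | x) for the asymmetric proposal.
    log_alpha = (math.log(target(y)) - math.log(target(x))
                 + proposal_logpdf(x, y) - proposal_logpdf(y, x))
    if random.random() < math.exp(min(0.0, log_alpha)):
        x = y
    if i >= 1_000:                                        # burn-in
        total += x
        kept += 1

print(total / kept)    # should be near 1, the mean of the exponential target
```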

Paul Alper is an emeritus professor at the University of St. Thomas, having retired in 1998. For several decades, he regularly contributed Notes from North America to Higher Education Review. He is almost the exact age of Woody Allen and the Dalai Lama and thus was fortunate to be too young for some wars and too old for others. In the 1990s, he was awarded a Nike sneaker endorsement which resulted in his paper, Imposing Views, Imposing Shoes: A Statistician as a Sole Model; it can be found in The American Statistician, August 1995, Vol. 49, No. 3, pages 317 to 319.



Understanding the value of EdTech in higher education

by Morten Hansen

This blog is a re-post of an article first published on universityworldnews.com. It is based on a presentation to the 2021 SRHE Research Conference, as part of a Symposium on Universities and Unicorns: Building Digital Assets in the Higher Education Industry organised by the project’s principal investigator, Janja Komljenovic (Lancaster). The support of the Economic and Social Research Council (ESRC) is gratefully acknowledged. The project introduces new ways to think about and examine the digitalisation of the higher education sector. It investigates new forms of value creation and suggests that value in the sector increasingly lies in the creation of digital assets.

EdTech companies are, on average, priced modestly, although some have earned strong valuations. We know that valuation practices normally reflect investors’ belief in a company’s ability to make money in the future. We are, however, still learning about how EdTech generates value for users, and how to take account of such value in the grand scheme of things.


Valuation and deployment of user-generated data

EdTech companies are not competing with the likes of Google and Facebook for advertisement revenue. That is why phrases such as ‘you are the product’ and ‘data is the new oil’ yield little insight when applied to EdTech. For EdTech companies, strong valuations hinge on the idea that technology can bring use value to learners, teachers and organisations – and that they will eventually be willing to pay for such benefits, ideally in the form of a subscription. EdTech companies try to deliver use value in multiple ways, such as deploying user-generated data to improve their services. User-generated data are the digital traces we leave when engaging with a platform: keyboard strokes and mouse movements, clicks and inactivity.


The value of user-generated data in higher education

The gold standard for unlocking the ‘value’ of user-generated data is to bring about an activity that could otherwise not have arisen. Change is brought about through data feedback loops. Loops consist of five stages: data generation, capture, anonymisation, computation and intervention. Loops can be long and short.


For example, imagine that a group of students is assigned three readings for class. Texts are accessed and read on an online platform. Engagement data indicate that all students spent time reading text 1 and text 2, but nobody read text 3. As a result of this insight, come next semester, text 3 is replaced by a more ‘engaging’ text. That is a long feedback loop.


Now, imagine that one student is reading one text. The platform’s machine learning programme generates a rudimentary quiz to test comprehension. Based on the student’s answers, further readings are suggested or the student is encouraged to re-read specific sections of the text. That is a short feedback loop.
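
Purely as a thought experiment, the two loops can be sketched in code. Everything below is hypothetical, the function names and numbers are mine rather than any real platform's, but it shows the long loop intervening on next semester's reading list and the short loop intervening on an individual student in real time.

```python
# Hypothetical sketch of the two feedback loops; nothing here corresponds
# to a real EdTech platform or dataset.

# Long loop: aggregate engagement data -> intervene on next semester's syllabus.
engagement = {"text_1": 120, "text_2": 95, "text_3": 0}   # minutes read, all students
syllabus = [text for text, minutes in engagement.items() if minutes > 0]
syllabus.append("replacement_text")        # swap in a more 'engaging' text
print(syllabus)

# Short loop: one student's quiz answers -> an immediate, individual suggestion.
def short_loop(quiz_score, threshold=0.7):
    """Return an intervention for a single student based on a comprehension quiz."""
    if quiz_score < threshold:
        return "re-read section 2"         # hypothetical targeted suggestion
    return "suggested further reading"

print(short_loop(quiz_score=0.5))
```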


In reality, most feedback loops do not bring about activity that could not have happened otherwise. It is not as if a professor could not learn, through conversation, which texts are better liked by students, what points are comprehended, and so on. What is true, though, is that the basis and quality of such judgments shift. Most importantly, so does the cost structure that underpins judgment.


The more automated feedback loops are, the greater the economy of scale. ‘Automation’ refers to the decoupling of additional feedback loops from additional labour inputs. ‘Economies of scale’ means that the average cost of delivering feedback loops decreases as the company grows.


Proponents of machine learning and other artificial intelligence approaches argue that the use value of feedback loops improves with scale: the more users engage in the back-and-forth between generating data, receiving intervention and generating new data, the more precise the underlying learning algorithms become in predicting what interventions will ‘improve learning’.


The platform learns and grows with us

EdTech platforms proliferate because they are seen to deliver better value for money than the human-centred alternative. Cloud-based platforms are accessed through subscriptions without transfer of ownership. The economic relationship is underwritten by law and continued payment is legitimated through the feedback loops between humans and machines: the platform learns and grows with us, as we feed it.


Machine learning techniques certainly have the potential to improve the efficiency with which we organise certain learning activities, such as particular types of student assessment and monitoring. However, we do not know which values to mobilise when judging intervention efficacy: ‘value’ and ‘values’ are different things.


In everyday talk, we speak about ‘value’ when we want to justify or critique a state of affairs that has a price: is the price right, too low, or too high? We may disagree on the price, but we do agree that something is for sale. At other times we reject the idea that a thing should be for sale, like a family heirloom, love or education. If people tell us otherwise, we question their values. This is because values are about relationships and politics.


When we ask about the values of EdTech in higher education, we are really asking: what type of relations do we think are virtuous and appropriate for the institution? What relationships are we forging and replacing between machines and people, and between people and people?


When it comes to the application of personal technology, we have valued convenience, personalisation and seamlessness by forging very intimate but easily forgettable machine-human relations. This could happen in the EdTech space as well. Speech-to-text recognition, natural language processing and machine vision are examples of how bonds can be built between humans and computers, aiding feedback loops by making worlds of learning computable.


Deciding which learning relations to make computable, I argue, should be driven by values. Instead of seeing EdTech as a silver bullet that simply drives learning outcomes, it is more useful to think of it as technology that mediates learning relations and processes: what relationships do we value as important for students, and when is technology helpful or unhelpful in establishing those? In this way, values can help guide the way we account for the value of EdTech.

Morten Hansen is a research associate on the Universities and Unicorns project at Lancaster University, and a PhD student at the Faculty of Education, University of Cambridge, United Kingdom. Hansen specialises in education markets and has previously worked as a researcher at the Saïd Business School in Oxford.