SRHE Blog

The Society for Research into Higher Education


Mr Sherwood v The Office of Qualifications and Examinations Regulation[1]


[1] The ITV programme ‘Mr Bates v The Post Office’ was shown on British TV during the first week of January 2024 and has generated a media firestorm in the UK and a swift government response. Those, probably mostly outside the UK, who are unfamiliar with the story might like to read this explainer from Private Eye before reading this editorial. Or just Google it.

by Rob Cuthbert

Mr Sherwood, you’re the only one who’s been reporting these problems …

We have complete confidence that our system is robust.

This is a story of injustice on a massive scale, over a long period. The story of someone affronted by the unfairness who refused to give up, even though the authorities lined up to oppose him and try to make him go away. A story which has not yet attracted the attention it seems to deserve, given the way it affects the lives of tens of thousands of people who put their faith in a flawed system.

Every year a new group of tens of thousands of people is subjected to the same injustice. Most of them have no idea that they might have been unfairly treated, and if they try to use official procedures for complaint and recompense, most will fail. The authorities’ repeated mantra is that the system is ‘the best and fairest way’.

It could be, but it isn’t. And one person’s attempts to make things better have been met with denial, opposition, obfuscation, and the use of official processes to discourage media attention, by a public agency which is “independent of government”.

The Office of Qualifications and Examinations Regulation (Ofqual) is charged with regulating and maintaining standards and confidence in GCSEs, A levels, AS levels, and vocational and technical qualifications. Ten years ago Ofqual were aware of some potential problems in grading. To determine the extent of the problem, they took entire cohorts of GCSE, AS and A level scripts and re-marked them, comparing the marks given by an ordinary examiner with the re-marks given by a senior examiner. Eventually this led to two careful and scholarly reports: Marking Consistency Metrics in 2016 and Marking Consistency Metrics – An Update in 2018.

The reports showed varying reliability in the grades awarded by examiners, compared with the ‘true’ or ‘definitive’ grade awarded by a senior examiner. Dennis Sherwood, an independent analyst and consultant, interpreted Ofqual’s measurements of grade reliability as a consequence of what he termed ‘fuzziness’. Fuzziness is the range around a senior examiner’s ‘definitive’ mark that contains the ‘legitimate’ marks given by an ordinary examiner. The 2018 report found that grades for, say, English and History are much less reliable than those for Maths and Physics. In Sherwood’s terms, the ‘fuzziness’ of the marks associated with English and History is greater than for Maths and Physics.

Problems arise when a marking range straddles a grade boundary. For example, if a script is legitimately marked anywhere in the range from 38 to 42, but a grade boundary is set at 40, then more than one grade could result from that one script, depending on who marks it and how. Ofqual have admitted that this is the case:

“…more than one grade could well be a legitimate reflection of a student’s performance and they would both be a sound estimate of that student’s ability at that point in time based on the available evidence from the assessment they have undertaken.” (Ofqual, 2019).

The 2016 report says: “… the wider the grade boundary locations, the greater the probability of candidates receiving the definitive grade.” GCSEs have nine grades plus unclassified, and A-levels have six plus unclassified, meaning grade widths are inevitably narrower than, for example, university degree classifications with just four plus fail. With comparatively narrow grade widths more candidates will be close to a boundary. In other words, however good the marking is, grading for many candidates will not always give a ‘true’ or ‘definitive’ grade.
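To make the straddling arithmetic concrete, here is a minimal sketch in Python under simple assumptions of my own: hypothetical grade boundaries spaced ten marks apart, marks out of 100 treated as equally likely, and a ‘fuzziness’ of plus or minus two marks. The function names (possible_grades, share_near_boundary) and all the numbers are illustrative, not Ofqual’s data or model; the sketch only shows that a legitimate mark range crossing a boundary allows more than one grade, and that narrower grade widths leave more candidates near a boundary.

```python
# Illustrative sketch only: hypothetical boundaries and a +/-2 'fuzziness',
# not Ofqual's data or model.

def possible_grades(definitive_mark, fuzziness, boundaries):
    """Grades reachable by any 'legitimate' mark within +/- fuzziness
    of the definitive mark; `boundaries` maps grade -> minimum mark."""
    grades = set()
    for mark in range(definitive_mark - fuzziness, definitive_mark + fuzziness + 1):
        # the awarded grade is the highest boundary not exceeding the mark
        grade = max((g for g, b in boundaries.items() if mark >= b),
                    key=lambda g: boundaries[g], default="U")
        grades.add(grade)
    return grades

# Hypothetical GCSE-style boundaries every 10 marks (grade 4 starting at 40).
gcse = {"U": 0, "1": 10, "2": 20, "3": 30, "4": 40,
        "5": 50, "6": 60, "7": 70, "8": 80, "9": 90}

# The example above: marks 38-42 are all legitimate, boundary at 40.
print(possible_grades(40, 2, gcse))          # two grades possible: 3 and 4

def share_near_boundary(fuzziness, boundaries, max_mark=100):
    """Fraction of definitive marks (0..max_mark, treated as equally likely)
    whose legitimate range straddles a boundary, i.e. allows more than one grade."""
    hits = sum(len(possible_grades(m, fuzziness, boundaries)) > 1
               for m in range(max_mark + 1))
    return hits / (max_mark + 1)

# Hypothetical degree-style classification with wider grade widths.
degree = {"Fail": 0, "Third": 40, "2:2": 50, "2:1": 60, "First": 70}

print(share_near_boundary(2, gcse))    # ~0.36 with nine narrow grades
print(share_near_boundary(2, degree))  # ~0.16 with four wider classes
```

With these made-up numbers roughly a third of marks sit within two marks of a GCSE-style boundary, against roughly a sixth for the wider degree-style classes; the true proportions depend on the actual mark distributions, boundaries and subject-specific fuzziness, which the Ofqual reports cited above set out to measure.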

This situation is admitted by Ofqual and has been known for more than five years, since the 2018 Report. Dr Michelle Meadows, formerly Ofqual’s Executive Director for Strategy, Risk and Research, said in evidence to the House of Lords Education for 11-16 year olds Committee (2023) on 30 March 2023:

“It’s really important that people don’t put too much weight on any individual grade. … I know, unfortunately, that a lot of weight is placed on particular GCSEs for progression, maths and English being the obvious ones. In maths that is less problematic because the assessment in maths is generally highly reliable. In English that is problematic. This is not a failure of our GCSE system. This is the reality of assessment. It is the same around the world. There is no easy fix, I am afraid. It is how we use the grades that needs to change rather than creating a system of lengthy assessments.” (emphasis added).

Dame Glenys Stacey, Ofqual’s Chief Regulator until 2016, was reappointed as Acting Chief Regulator after the departure of Sally Collier in the aftermath of the 2020 results, and she said in 2020 (House of Commons Education Committee, 2020a: Q1059):

“It is interesting how much faith we put in examination and the grade that comes out of that. We know from research, as I think Michelle mentioned, that we have faith in them, but they are reliable to one grade either way.” (emphasis added)

According to Ofqual’s own research, we have a national system of grading that is only 95% reliable – and then only if you accept that grades are reliable within plus or minus a grade. The problem is that most people use grades more precisely than that. If you don’t get a grade 4 or above in GCSE English or Mathematics, you may be allowed to progress to educational routes post-16, but you must take a resit alongside your next phase of study, and will not be allowed to continue if your resit grade is still 3 or below. If you miss out by just one grade at A-level, your chosen university may reject you. Although marking meets the best international standards, grading still contains much individual unfairness. That means many students may miss out on their preferred university, be forced to wait a year to try again, or decide not to enter higher education at all.

We know this mainly because of the efforts of Dennis Sherwood, who started writing about problems with grading five years ago. Sherwood’s analyses attracted media attention, but his findings were often rejected by Ofqual: for example, in Camilla Turner’s Daily Telegraph report of 25 August 2018, an Ofqual spokesman was quoted as saying that Mr Sherwood’s research is “entirely without merit” and has drawn “incorrect conclusions” (Turner, 2018).

Ofqual tried to shut down Sherwood’s commentaries, and complained to the Independent Press Standards Organisation (IPSO) about a Sunday Times article headlined ‘Revealed – A-level results are 48% wrong’ published on 11 August 2019. IPSO’s finding upheld the complaint, but only on the narrow grounds that the newspaper had not made it sufficiently clear that the use of the word ‘wrong’ was the newspaper’s, and not Ofqual’s, characterisation of the research. However the IPSO ruling said:

“It was not significantly misleading to report that 48% of grades could be “wrong”, in circumstances where the research indicated that, in 48% of cases, a senior examiner could have awarded a different grade to that awarded by the examiner who had marked the paper. The complainant had accepted that different grades could be awarded as a result of inconsistencies in marking, but disagreed with the characterisation of the research which had been adopted by the publication.”

Sherwood’s argument has never been refuted. Ofqual, with its statutory responsibility to maintain public confidence in qualifications, was trying to ignore or attack stories that ‘one grade in four is wrong’. That tactic might have succeeded, were it not for Covid. The story of the infamous examinations algorithm, ultimately abandoned, need not be repeated here. However, it showed, first, that few parents, and indeed few teachers, understood how the grading system worked. Secondly, Ofqual’s defence of the flawed 2020 algorithm was so focused on the collective unfairness of grade inflation from one year to the next that they failed to recognise that their ‘solution’ moved grading from a national competition to an intensely local one. That made individual unfairnesses very visible; there was a public outcry, and the algorithm was abandoned. Individual unfairness in grading persists – but has reverted to its former obscurity.

Dennis Sherwood accordingly wrote a book, Missing the Mark, which I reviewed for HEPI, setting out his arguments in detail. It seemed to be persuading more people in the educational media to give his arguments the space they deserved. He was no longer entirely alone: a small group (including me) found his arguments convincing, and support from various media, notably the HEPI blog, gave him space to make his case. However, as in the case of Mr Bates and the Post Office, there were still just a few individuals ranged against the forces of Ofqual and (some of) the educational establishment.

On 8 June 2023 I wrote ‘If A-level grades are unreliable, what should admission officers do?’ for HEPI, arguing that universities should recognise the limited reliability of A-level grades by giving candidates the benefit of the doubt, uplifting all achieved results by one grade. That blog was perhaps provocative, but it did at least recognise the problem and suggest a short-term fix. My 2020 explanation of the algorithm had become the most-read HEPI blog ever, and I was invited, as I had been every year since 2020, to contribute a further blog to HEPI, to be published near to A-level results day. My follow-up to the June blog advised students and parents how to respond if they had fallen short of an offer they had accepted. I submitted it to HEPI, but it was not accepted. HEPI did, however, publish a blog by one of its trustees, Mary Curnock Cook, on 14 August, the Monday before results day on Thursday.

Curnock Cook is the widely respected former head of UCAS. She began:

In this blog, I want to provide some context and challenge to two erroneous statements that are made about exam grades:

  • That ‘one in four exam grades is wrong’
  • That grades are only reliable to ‘within one grade either way’

She asserted that the statement ‘one in four exam grades is wrong’ was a ‘gross misunderstanding’, but then said:

“In many subjects there will be several marks either side of the definitive mark that are equally legitimate. They reflect the reality that even the most expert and experienced examiners in a subject will not always agree on the precise number of marks that an essay or longer answer is worth. But those different marks are not ‘wrong’.”

In other words, as admitted by Ofqual, more than one grade could be a ‘legitimate’ assessment of the outcome for an individual. Huy Duong, another critic of the 2020 algorithm, had been widely quoted in the media in 2020 after he predicted the exact outcomes of the algorithm a week before the publication of results. He commented on Curnock Cook’s blog:

“… a lot of this is simply playing with words … whichever definitions of ‘wrong’ and ‘rights’ the establishment chooses to use, it is irrefutable that students are subjected to a grade lottery … If, as the author and the establishment contend, for a given script, both “Pass” and “Fail” are equally legitimate, then for the student’s certificate to state only either “Pass” or “Fail”, that certificate is stating a half truth.”

Curnock Cook then addressed the supposedly ‘erroneous’ statement that “grades are only reliable to ‘within one grade either way’” – the statement made by Glenys Stacey as Acting Chief Regulator – saying:

“Some commentators have chosen to weaponise this statement in a way that shows poor understanding of the concepts underpinning reliable and valid assessment and risks doing immense damage to students and to public confidence in our exam system.” 

How it is that Sherwood’s analysis shows ‘poor understanding’ is not explained. On the contrary, he seems to have a clear understanding of what Ofqual themselves have admitted. Curnock Cook said the claim about reliability had been taken out of context, but the relevant context is not international tests of collective grading reliability; it is the way universities and individual students actually use the grades.

Curnock Cook’s blog was welcomed by influential commentators like Jonathan Simons of Public First, a government favourite for research and PR, and some educationists such as Geoff Barton of the Association of School and College Leaders. She said that talking about unreliable grades “risks doing immense damage to students and to public confidence in our exam system”. Indeed it does, but the risk lies not in pointing out that the emperor has no clothes. The real risk is in not changing the system which remains unfair to so many individuals. The emperor still has no clothes, and it is time to redress things.

Most people who suffer injustice in grading do not even know it has happened. For individuals who do know, most will find that using official procedures to complain or appeal is expensive, and unlikely to change the outcome. In his campaign to illuminate the problem Mr Sherwood, like Mr Bates, met denial, opposition and the use of official processes to discourage the media from continuing to cover the story. People in the organisations concerned know how the system actually works, but they don’t want it to be widely known, for the sake of public confidence in the system. Groupthink puts collective inter-cohort ‘fairness’ ahead of fairness to every individual in every cohort. There was even, in 2020, blind faith in a computer system which was later proved to be faulty.

Public confidence in the qualifications and examinations system is of course absolutely vital. But the need for public confidence does not mean that individual unfairness on a large scale should be tolerated and ignored. There are several possible solutions to the problems of grading unreliability, and many would have little direct cost. HE institutions would have to take even greater care in using grades, as part of their wider assessment of the potential and abilities of candidates for their courses. That is a small price to pay for maintaining public confidence in a national system which everyone could be proud of for its fairness as well as its international standing.

This editorial draws on my article first published in The Oxford Magazine No 458, ‘Maintaining public confidence in an unfair system – the case of school examination grades’, and uses some parts of the text with permission.

Rob Cuthbert is Emeritus Professor of Higher Education Management, University of the West of England, and Joint Managing Partner, Practical Academics. Email rob.cuthbert@btinternet.com; Twitter @RobCuthbert.