Ofqual | SRHE Blog

May 15, 2026
by SRHE News Blog

Complaining to the OIA

by GR Evans

The Office of the Independent Adjudicator (OIA) has stressed in its Annual Report that the system it operates is under strain. The expectation that universities would offer a route for students to make complaints became a requirement at the turn of the century as providers began to recognise the existence of a ‘student contract’. That made the student a ‘consumer’ of the ‘higher education provider’. ‘Complaints procedures’ for students to use began to appear alongside ‘grievance procedures’ for employees. Scrutinising the performance of higher education providers in that task falls to the OIA.

The OIA was created as a company in 2003 and began work as a voluntary scheme. It was designated as operator of a student complaints scheme in 2005. Its current ‘members’ are various sector bodies including Universities UK and GuildHE. Its Board, headed by the actual Adjudicator, includes student representatives.

It first needed to show itself to be independent. The OIA faced criticism early on when a petition with 43 signatures called for its abolition, complaining that it was a ‘biased, unreasonable, and non-impartial organisation’. The petition called for:

Full evidence-based investigation into student complaints, fully independent of the University’s internal processes, and in accordance with existing educational and non-educational law,

and ‘a public enquiry into all decisions made against student complaints, by the OIAHE since its inception’, with new rules:

to provide full legal aid cover for all students whose employment prospects are, or may have been, damaged as a result of their adverse experience with a public educational institution, and who remain unemployed as a result.

This was not followed through in those express terms. The stated objective of the process now followed by the OIA is to ‘put the student back in the position they would have been in if the problem hadn’t occurred’.

Meeting that demand presents difficulties in two respects. The relationships of students to their ‘higher education provider’ have changed. They are its ‘members’ in the case of Oxford and Cambridge but in other providers a governing body of between twelve and twenty-four constitute the ‘members’ under the Further and Higher Education Act 1992. Elsewhere they are likely to be, in effect, paying customers ‘buying’ a course. There is a contract and if the provider does not fulfil its part, the student may complain and seek redress in the form of repayment of fees.

A sense of student entitlement may arise from the sheer cost to a student. In England, tuition fees for the academic year 2026-7 will rise to £9,790 for standard full-time courses, £11,750 for full-time accelerated courses and £7,335 for part-time courses, for providers with a Teaching Excellence Framework award and an Access and Participation Plan. That will increase for the year 2027-8 to £10,050 for standard full-time courses, £12,060 for full-time accelerated courses and £7,530 for part-time courses. Costs for ‘maintenance’ and accommodation are additional.

The procedures to be followed in making a complaint have needed repeated updating. Key terms have had to be defined. For example, the Annual Report of Oxford’s Sexual Harassment and Violence Support Service reports ‘an increasing complexity of cases, and those requiring a longer duration of support’. Where there is a complaint it recognises the need for clarity as to whether a dispute is a ‘University’ or a ‘college’ matter, noting ‘a marked increase in college-based, student-to-student reports of reported incidents’. The University is therefore improving its provision for training to ensure that those with responsibilities for students are clear about what constitutes ‘consent’.

Nationally, is the system now simply overloaded? The OIA published its Annual Report in April, recording the scale of the rise in the number of complaints it receives. In 2008 the OIA received 900 complaints against an England and Wales enrolment denominator of 2,117,535 – a rate of 42.5 complaints per 100,000 students. In 2025 there were 4,234 complaints, an increase of 17 per cent from the previous year. The 4,234 complaints in 2025 ‘translate’, it says, ‘to roughly 165.8 per 100,000 in 2025’. In October 2025 alone there had been 516 complaints, recorded as the busiest single month in its history. In the face of this demand the OIA resolved 3,950 cases within six months and brought the average case handling time down to 81 days.
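The rates quoted above can be checked with a few lines of arithmetic. The sketch below uses only the figures in the paragraph; the 2025 enrolment and the previous year's total are not given in the post, so the values derived for them are implied estimates rather than reported numbers.

```python
# Figures quoted in the post; nothing else is taken from the OIA report.
complaints_2008 = 900
enrolment_2008 = 2_117_535
complaints_2025 = 4_234
rate_2025 = 165.8       # complaints per 100,000 students, as quoted
increase_2025 = 0.17    # 17 per cent rise on the previous year


def per_100k(complaints: int, students: int) -> float:
    """Complaints per 100,000 enrolled students."""
    return complaints / students * 100_000


rate_2008 = per_100k(complaints_2008, enrolment_2008)

# Derived values (NOT stated in the post): the enrolment implied by the
# quoted 2025 rate, and the previous year's total implied by a 17% rise.
implied_enrolment_2025 = complaints_2025 / rate_2025 * 100_000
implied_complaints_2024 = complaints_2025 / (1 + increase_2025)

print(f"2008 rate: {rate_2008:.1f} per 100,000")
print(f"2025 rate as a multiple of 2008: {rate_2025 / rate_2008:.1f}")
print(f"Implied 2025 enrolment: {implied_enrolment_2025:,.0f}")
print(f"Implied 2024 complaints: {implied_complaints_2024:,.0f}")
```

On these figures the complaint rate has roughly quadrupled since 2008, which is a sharper rise than the raw counts alone would suggest if enrolment has also grown.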

Stress-points are evident. Its Report notes that the complaints the OIA receives ‘prematurely’ are brought by students who ‘have begun the process but feel that they have waited too long for a decision’:

most of the complaints raised with us prematurely are brought by students who have begun the process but feel that they have waited too long for a decision. Delays are a symptom of a system under strain and may be one impact of the financial challenges facing providers.

Jim Dickinson’s blog for WonkHE on 26 April 2026 pointed to further evidence, arguing that the fact that 42% of complainants now disclose a disability could mean a sector which is still structurally unable to accommodate them. So even if the growth in complaints may reflect an increasing sense of entitlement among students, the OIA suggests that where the Adjudicator makes recommendations – or requires compensation to be made – that is ‘an indication that a student has not received the service they expect at a time when fees and cost of living pressures are increasing’.

The continuing multiplication of ‘alternative providers’ seems likely to lead to more complaining. They may admit unqualified students and be imperfectly regulated. The OIA publishes a list of ‘case summaries’ on providers where problems have emerged. The ‘worked example’ given in the OIA’s Report is that of Brit College, on which the OIA had already published concerns as of ‘public interest’ in November 2025.

The OIA had made Recommendations and had reported the College’s refusal to comply with its Recommendations to its Board in September 2025 and shared information about the complaint with the Office for Students (OfS), Department for Education (DfE) and Ofqual. None of this led to reform. Companies House reports that Brit College Ltd is subject to Receiver Action, with its accounts and confirmation statement overdue and apparently heading for liquidation.

There seems, then, to be a question as to the effectiveness of the OIA not in terms of its work but in terms of its powers, where a provider of higher education falls beyond the reach of a complaints procedure.

SRHE member GR Evans is Emeritus Professor of Medieval Theology and Intellectual History in the University of Cambridge.

December 17, 2025
by SRHE News Blog

Walk on by: the dilemma of the blind eye

by Dennis Sherwood

Forty years on…

I don’t remember much about my experiences at work some forty-odd years ago, but one event I recall vividly is the discussion provoked by a case study at a training event. The case was simple, just a few lines:

Sam was working late one evening, and happened to walk past Pat’s office. The door was closed, but Sam could hear Pat being very abusive to Alex. Some ten minutes later, Sam saw Alex sobbing.

What might Sam do?

What should Sam do?

Quite a few in the group said “nothing”, on the grounds that whatever was going on was none of Sam’s business. Maybe Pat had good grounds to be angry with Alex and if the local culture was, let’s say, harsh, what’s the problem? Nor was there any evidence that Alex’s sobbing was connected with Pat – perhaps something else had happened in the intervening time.

Others thought that the least Sam could do was to ask if Alex was OK, and offer some comfort – a suggestion countered by the “it’s a tough world” brigade.

The central theme of the conversation was then all about culture. Suppose the culture was supportive and caring. Pat’s behaviour would be out of order, even if Pat was angry, and even if Alex had done something Pat had regarded as wrong.

So what might – and indeed should – Sam do?

Should Sam confront Pat? Or inform Pat’s boss?

What if Sam is Pat’s boss? In that case then, yes, Sam should confront Pat: failure to do so would condone bad behaviour, which, in this culture, would be a ‘bad thing’.

But if Sam is not Pat’s boss, things are much more tricky. If Sam is subordinate to Pat, confrontation is hardly possible. And informing Pat’s boss could be interpreted as snitching or trouble-making. Another possibility is that Sam and Pat are peers, giving Sam ‘the right’ to confront Pat – but only if peer-to-peer honesty and mutual pressure are ‘allowed’. Which they might not be, for many, even benign, cultures are in reality networks of mutual ‘non-aggression treaties’, in which ‘peers’ are monarchs in their own realms – so Sam might deliberately choose to turn a blind eye to whatever Pat might be doing, for fear of setting a precedent that would allow Pat, or indeed Ali or Chris, to poke their noses into Sam’s own domain.

And if Sam is in a different part of the organisation – or indeed from another organisation altogether – then maybe Sam’s safest action is back where we started. To do nothing. To walk on by.

Sam is a witness to Pat’s bad behaviour. Does the choice to ‘walk on by’ make Sam complicit too, albeit at arm’s length?

I’ve always thought that this case study, and its implications, are powerful – which is probably why I’ve remembered it over so long a time.

The truth about GCSE, AS and A level grades in England

I mention it here because it is relevant to the main theme of this blog – a theme that, if you read it, makes you a witness too. Not, of course, to ‘Pat’s’ bad behaviour, but to another circumstance which, in my opinion, is a great injustice doing harm to many people – an injustice that ‘Pat’ has got away with for many years now, not only because ‘Pat’s peers’ have turned a blind eye – and a deaf ear too – but also because all others who have known about it have chosen to ‘walk on by’.

The injustice of which I speak is the fact that about one GCSE, AS and A level grade in every four, as awarded in England, is wrong, and has been wrong for years. Not only that: in addition, the rules for appeals do not allow these wrong grades to be discovered and corrected. So the wrong grades last for ever, as does the damage they do.

To make that real, in August 2025, some 6.5 million grades were awarded, of which around 1.6 million were wrong, with no appeal. That’s an average of about one wrong grade ‘awarded’ to every candidate in the land.

Perhaps you already knew all that. But if you didn’t, you do now. As a consequence, like Sam in that case study, you are a witness to wrong-doing.

It’s important, of course, that you trust the evidence. The prime source is Ofqual’s November 2018 report, Marking Consistency Metrics – An update, which presents the results of an extensive research project in which very large numbers of GCSE, AS and A level scripts were in essence marked twice – once by an ‘assistant’ examiner (as happens in ‘ordinary’ marking each year), and again by a subject senior examiner, whose academic judgement is the ultimate authority, and whose mark, and hence grade, is deemed ‘definitive’, the arbiter of ‘right’.

Each script therefore had two marks and two grades, enabling those grades to be compared. If they were the same, then the ‘assistant’ examiner’s grade – the grade that is on the candidate’s certificate – corresponds to the senior examiner’s ‘definitive’ grade, and is therefore ‘right’; if the two grades are different, then the assistant examiner’s grade is necessarily ‘non-definitive’, or, in plain English, wrong.

You might have thought that the number of ‘non-definitive’/wrong grades would be small and randomly distributed across subjects. In fact, the key results are shown on page 21 of Ofqual’s report as Figure 12, reproduced here:

Figure 1: Reproduction of Ofqual’s evidence concerning the reliability of school exam grades

To interpret this chart, I refer to this extract from the report’s Executive Summary:

The probability of receiving the ‘definitive’ qualification grade varies by qualification and subject, from 0.96 (a mathematics qualification) to 0.52 (an English language and literature qualification).

This states that 96% of Maths grades (all varieties, at all levels), as awarded, are ‘definitive’/right, as are 52% of those for Combined English Language and Literature (a subject available only at A level). Accordingly, by implication, 4% of Maths grades, and 48% of English Language and Literature grades, are ‘non-definitive’/wrong. Maths grades, as awarded, can therefore be regarded as 96% reliable; English Language and Literature grades as 52% reliable.

Scrutiny of the chart will show that the heavy black line in the upper blue box for Maths maps onto about 0.96 on the horizontal axis; the equivalent line for English Language and Literature maps onto 0.52. The measures of the reliability of the grades for each of the other subjects are designated similarly. Ofqual’s report does not give any further numbers, but Table 1 shows my estimates from Ofqual’s Figure 12:

Subject	Probability of ‘definitive’ grade	Probability of ‘non-definitive’ grade
Maths (all varieties)	96%	4%
Chemistry	92%	8%
Physics	88%	12%
Biology	85%	15%
Psychology	78%	22%
Economics	74%	26%
Religious Studies	66%	34%
Business Studies	66%	34%
Geography	65%	35%
Sociology	63%	37%
English Language	61%	39%
English Literature	58%	42%
History	56%	44%
Combined English Language and Literature (A level only)	52%	48%

Table 1: My estimates of the reliability of school exam grades, as inferred from measurements of Ofqual’s Figure 12.

Ofqual’s report does not present any corresponding information for each of GCSE, AS or A level separately, nor any analysis by exam board. Also absent is a measure of the all-subject overall average. Given, however, the maximum value of 96%, and the minimum of 52%, the average is likely to be somewhere in the middle, say, in the seventies; in fact, if each subject is weighted by its cohort, the resulting average over the 14 subjects shown is about 74%. Furthermore, if other subjects – such as French, Spanish, Computing, Art… – are taken into consideration, the overall average is most unlikely to be greater than 82% or less than 66%, suggesting that an overall average reliability of 75% for all subjects is a reasonable estimate.

That’s the evidence that, across all subjects and levels, about 75% of grades, as awarded, are ‘definitive’/right and 25% – one in four – are ‘non-definitive’/wrong – evidence that has been in the public domain since 2018. But evidence that has been much disputed by those with vested interests.
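The averaging step can be sketched in a few lines. The percentages below are the Table 1 estimates; the cohort sizes used for the post's weighting are not reproduced in the post, so the `weights` argument is a placeholder and only the equal-weight baseline is computed here. The 6.5 million total is the August 2025 figure quoted earlier in the post.

```python
# 'Definitive' grade probabilities (%) as estimated in Table 1.
reliability = {
    "Maths (all varieties)": 96,
    "Chemistry": 92,
    "Physics": 88,
    "Biology": 85,
    "Psychology": 78,
    "Economics": 74,
    "Religious Studies": 66,
    "Business Studies": 66,
    "Geography": 65,
    "Sociology": 63,
    "English Language": 61,
    "English Literature": 58,
    "History": 56,
    "Combined English Language and Literature": 52,
}


def weighted_mean(values, weights):
    """Mean of values, each weighted by the matching entry in weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


values = list(reliability.values())

# Equal weights are a baseline only: real cohort sizes (not given in the
# post) would be passed as `weights` to reproduce the weighted figure.
unweighted = weighted_mean(values, [1] * len(values))

# Applying the post's overall estimate to the August 2025 total.
grades_awarded = 6_500_000
overall_reliability = 0.75
wrong_grades = grades_awarded * (1 - overall_reliability)

print(f"Equal-weight average over {len(values)} subjects: {unweighted:.1f}%")
print(f"Estimated wrong grades: {wrong_grades:,.0f}")
```

The equal-weight average is a little below the cohort-weighted figure of about 74% cited above; reproducing that figure exactly would need the entry numbers for each subject.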

Ofqual’s results are readily explained. We all know that different examiners can, legitimately, give the same answer (slightly) different marks. As a result, the script’s total mark might lie on different sides of a grade boundary, depending on who did the marking. Only one grade, however, is ‘definitive’.

Importantly, there are no errors in the marking studied by Ofqual – in fact, Ofqual’s report mentions ‘marking error’ just once, and then in a rather different context. All the grading discrepancies measured in Ofqual’s research are therefore attributable solely to legitimate differences in academic opinion. And since the range of legitimate marks is far narrower in subjects such as Maths and Physics, as compared to English Literature and History, then the probability that an ‘assistant’ examiner’s legitimate mark might result in a ‘non-definitive’ grade will be much higher for, say, History as compared to Physics. Hence the sequence of subjects in Ofqual’s Figure 12.

As regards appeals, in 2016, Ofqual – in full knowledge of the results of this research (see paragraph 28 of this Ofqual Board Paper, dated 18 November 2015) – changed the rules, requiring that a grade can be changed only if a ‘review of marking’ discovers a ‘marking error’. To quote an Ofqual ‘news item’ of 26 May 2016:

Exam boards must tell examiners who review results that they should not change marks unless there is a clear marking error. …It is not fair to allow some students to have a second bite of the cherry by giving them a higher mark on review, when the first mark was perfectly appropriate. This undermines the hard work and professionalism of markers, most of whom are teachers themselves. These changes will mean a level-playing field for all students and help to improve public confidence in the marking system.

This assumes that the legitimate marks given by different examiners are all equally “appropriate”, and identical in every way.

This assumption, however, is false: if one of those marks corresponds to the ‘definitive’ grade, and another to a ‘non-definitive’ grade, they are not identical at all. Furthermore, as already mentioned, there is hardly any mention of marking errors in Ofqual’s November 2018 report. All the grade discrepancies they identified can therefore only be attributable to legitimate differences in academic opinion, and so cannot be discovered and corrected by the rules that have been in place since 2016.

Over to you…

So, back to that case study.

Having read this far, like Sam, you have knowledge of wrong-doing – not Pat tearing a strip off Alex, but Ofqual awarding some 1.5 million wrong grades every year. All with no right of appeal.

What are you going to do?

You’re probably thinking something like, “Nothing”, “It’s not my job”, “It’s not my problem”, “I’m in no position to do anything, even if I wanted to”.

All of which I understand. No, it’s certainly not your job. And it’s not your problem directly, in that it’s not you being awarded the wrong grade. But it might be your problem indirectly – if you are involved with admissions, and if grades play a material role, you may be accepting a student who is not fully qualified (in that the grade on the certificate might be too high), or – perhaps worse – rejecting a student who is (in that the grade on the certificate is too low). Just to make that last point real, about one candidate in every six with a certificate showing AAA for A level Physics, Chemistry and Biology in fact truly merited at least one B. If such a candidate took a place at Med School, for example, not only is that candidate under-qualified, but a place has also been denied to a candidate with a certificate showing AAB but who merited AAA.
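The post does not show the working behind the ‘one in every six’ figure. One set of assumptions that yields a number of that size is sketched below: it takes the Table 1 ‘non-definitive’ probabilities for the three subjects, treats the subjects as independent, and assumes (hypothetically) that half of all wrong grades are too high. None of these assumptions is stated in the post, and the Table 1 figures are subject-wide rather than specific to grade A.

```python
from math import prod

# 'Non-definitive' probabilities from Table 1.
p_wrong = {"Physics": 0.12, "Chemistry": 0.08, "Biology": 0.15}

# ASSUMPTION (not from the post): a wrong grade is equally likely to be
# too high or too low, so half of the wrong grades are over-grades.
share_too_high = 0.5


def p_at_least_one_overgraded(p_wrong_by_subject, share_high):
    """Chance that at least one of the grades is higher than merited,
    treating subjects as independent (also an assumption)."""
    p_none = prod(1 - p * share_high for p in p_wrong_by_subject.values())
    return 1 - p_none


p = p_at_least_one_overgraded(p_wrong, share_too_high)
print(f"P(at least one over-graded A) = {p:.3f}, about one in {1 / p:.1f}")
```

Changing `share_too_high` shows how sensitive the result is to that assumption, which is why it is worth stating explicitly.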

And although you, as an individual, are indeed not in a position to do anything about it, you, collectively, surely are.

HE is, by far, the largest and most important user of A levels. And relying on a ‘product’ that is only about 75% reliable. HE, collectively, could put significant pressure on Ofqual to fix this, if only by printing “OFQUAL WARNING: THE GRADES ON THIS CERTIFICATE ARE ONLY RELIABLE, AT BEST, TO ONE GRADE EITHER WAY” on every certificate – not my statement, but one made by Ofqual’s then Chief Regulator, Dame Glenys Stacey, in evidence to the 2 September 2020 hearing of the Education Select Committee, and in essence equivalent to the fact that about one grade in four is wrong. That would ensure that everyone is aware of the fact that any decision, based on a grade as shown on a certificate, is intrinsically unsafe.

But this – or some other solution – can happen only if your institution, along with others, acts accordingly. And that can happen only if you, and your colleagues, band together to influence your department, your faculty, your institution.

Yes, that is a bother. Yes, you do have other urgent things to do.

If you do nothing, nothing will happen.

But if you take action, you can make a difference.

Don’t just walk on by.

Dennis Sherwood is a management consultant with a particular interest in organisational cultures, creativity and systems thinking. Over the last several years, Dennis has also been an active campaigner for the delivery of reliable GCSE, AS and A level grades. If you enjoyed this, you might also like https://srheblog.com/tag/sherwood/.

January 22, 2024
by SRHE News Blog

Mr Sherwood v The Office of Qualifications and Examinations Regulation[1]

[1] The ITV programme ‘Mr Bates v The Post Office’ was shown on British TV during the first week of January 2024 and has generated in the UK a media firestorm and a swift government response. Those, probably mostly outside the UK, who are unfamiliar with the story might like to read this explainer from Private Eye before reading this editorial. Or just Google it.

by Rob Cuthbert

Mr Sherwood, you’re the only one who’s been reporting these problems …

We have complete confidence that our system is robust.

This is a story of injustice on a massive scale, over a long period. The story of someone affronted by the unfairness who refused to give up, even though the authorities lined up to oppose him and try to make him go away. A story which has not yet attracted the attention it seems to deserve, given the way it affects the lives of tens of thousands of people who put their faith in a flawed system.

Every year a new group of tens of thousands of people are subject to the same repeated injustice. Most of them have no idea that they might have been unfairly treated. If they try to use official procedures for complaint and recompense most of them will fail. The authorities’ repeated mantra is that the system is ‘the best and fairest way’.

It could be, but it isn’t. And one person’s attempts to make things better have been met with denial, opposition, obfuscation, and the use of official processes to discourage media attention, by a public agency which is “independent of government”.

The Office of Qualifications and Examinations Regulation (Ofqual) is charged with regulating and maintaining standards and confidence in GCSEs, A levels, AS levels, and vocational and technical qualifications. Ten years ago Ofqual were aware of some potential problems in grading. To determine the extent of the problem, they took entire cohorts of GCSE, AS and A Level scripts and re-marked them, comparing the marks given by an ordinary examiner to comparable re-marks given by a senior examiner. Eventually this led to two careful and scholarly reports: Marking Consistency Metrics in 2016 and Marking Consistency Metrics – An Update in 2018.

The reports showed varying reliability in the grades awarded by examiners, compared with the ‘true’ or ‘definitive’ grade awarded by a senior examiner. Dennis Sherwood, an independent analyst and consultant, interpreted Ofqual’s measurements of grade reliability as a consequence of what he termed ‘fuzziness’. Fuzziness is the range around a senior examiner’s ‘definitive’ mark that contains the ‘legitimate’ marks given by an ordinary examiner. The 2018 report found that grades for, say, English and History are much less reliable than those for Maths and Physics. In Sherwood’s terms, the ‘fuzziness’ of the marks associated with English and History is greater than for Maths and Physics.

Problems arise when a marking range straddles a grade boundary. For example, if a script is legitimately marked in a range from 38-42, but a grade boundary is set at 40, then more than one grade could result from that one script, depending on who marks it and how. Ofqual have admitted that this is the case:

“…more than one grade could well be a legitimate reflection of a student’s performance and they would both be a sound estimate of that student’s ability at that point in time based on the available evidence from the assessment they have undertaken.” (Ofqual, 2019).
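The straddling effect in the 38-42 example can be made concrete with a short sketch. It assumes, as a labelled simplification, that a mark equal to the boundary earns the upper grade; the grade names are placeholders, not real GCSE or A level grades.

```python
def grade_for(mark: int, boundary: int) -> str:
    """Grade for a mark, ASSUMING the boundary mark itself earns the upper grade."""
    return "upper grade" if mark >= boundary else "lower grade"


def legitimate_grades(low: int, high: int, boundary: int) -> dict:
    """Map every legitimate mark in [low, high] to the grade it would earn."""
    return {mark: grade_for(mark, boundary) for mark in range(low, high + 1)}


# The example from the text: legitimate marks 38-42, boundary at 40.
outcomes = legitimate_grades(38, 42, 40)
for mark, grade in outcomes.items():
    print(mark, grade)

# More than one distinct grade means the range straddles the boundary.
straddles = len(set(outcomes.values())) > 1
print("Range straddles the boundary:", straddles)
```

A script whose legitimate range sits wholly on one side of the boundary, say 41-45 against a boundary of 40, yields a single grade whoever marks it, which is why narrower ranges (as in Maths) produce more reliable grades.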

The 2016 report says: “… the wider the grade boundary locations, the greater the probability of candidates receiving the definitive grade.” GCSEs have nine grades plus unclassified, and A-levels have six plus unclassified, meaning grade widths are inevitably narrower than, for example, university degree classifications with just four plus fail. With comparatively narrow grade widths more candidates will be close to a boundary. In other words, and however good the marking is, grading for many candidates will not always give a ‘true’ or ‘definitive’ grade.

This situation is admitted by Ofqual and has been known for more than five years, since the 2018 Report. Dr Michelle Meadows, formerly Ofqual’s Executive Director for Strategy, Risk and Research, said in evidence to the House of Lords Education for 11-16 year olds Committee (2023) on 30 March 2023:

“It’s really important that people don’t put too much weight on any individual grade. … I know, unfortunately, that a lot of weight is placed on particular GCSEs for progression, maths and English being the obvious ones. In maths that is less problematic because the assessment in maths is generally highly reliable. In English that is problematic. This is not a failure of our GCSE system. This is the reality of assessment. It is the same around the world. There is no easy fix, I am afraid. It is how we use the grades that needs to change rather than creating a system of lengthy assessments.” (emphasis added).

Dame Glenys Stacey, Ofqual’s Chief Regulator until 2016, was reappointed as Acting Chief Regulator after the departure of Sally Collier in the aftermath of the 2020 results, and she said in 2020 (House of Commons Education Committee, 2020a: Q1059):

“It is interesting how much faith we put in examination and the grade that comes out of that. We know from research, as I think Michelle mentioned, that we have faith in them, but they are reliable to one grade either way.” (emphasis added)

According to Ofqual’s own research, we have a national system of grading that is only 95% reliable – and then only if you accept that grades are reliable within plus or minus a grade. The problem is that most people use grades more precisely than that. If you don’t get a grade 4 or above in GCSE English or Mathematics, you may be allowed to progress to educational routes post-16, but you must take a resit alongside your next phase of study, and will not be allowed to continue if your resit grade is still 3 or below. If you miss out by just one grade at A-level, your chosen university may reject you. Although marking meets the best international standards, grading still contains much individual unfairness. That means many students may miss out on their preferred university, be forced to wait a year to try again, or decide not to enter higher education at all.

We know this mainly because of the efforts of Dennis Sherwood, who started writing about problems with grading five years ago. Sherwood’s analyses attracted media attention but often his findings were rejected by Ofqual, for example in Camilla Turner’s Daily Telegraph report of 25 August 2018, when an Ofqual spokesman was quoted as saying: ‘Mr Sherwood’s research is “entirely without merit” and has drawn “incorrect conclusions”’ (Turner, 2018).

Ofqual tried to shut down Sherwood’s commentaries, and complained to the Independent Press Standards Organisation (IPSO) about a Sunday Times article headlined ‘Revealed – A-level results are 48% wrong’ published on 11 August 2019. IPSO’s finding upheld the complaint, but only on the narrow grounds that the newspaper had not made it sufficiently clear that the use of the word ‘wrong’ was the newspaper’s, and not Ofqual’s, characterisation of the research. However the IPSO ruling said:

“It was not significantly misleading to report that 48% of grades could be “wrong”, in circumstances where the research indicated that, in 48% of cases, a senior examiner could have awarded a different grade to that awarded by the examiner who had marked the paper. The complainant had accepted that different grades could be awarded as a result of inconsistencies in marking, but disagreed with the characterisation of the research which had been adopted by the publication.”

Sherwood’s argument has never been refuted. Ofqual, with its statutory responsibility to maintain public confidence in qualifications, was trying to ignore or attack stories that ‘one grade in four is wrong’. That tactic might have succeeded, were it not for Covid. The story of the infamous examinations algorithm, ultimately abandoned, need not be repeated here. However it showed, first, that few parents and indeed teachers understood how the grading system worked. Secondly, Ofqual’s defence of the flawed 2020 algorithm was so focused on the collective unfairness of grade inflation between one year and the next that they failed to recognise that their ‘solution’ moved grading from a national competition to an intensely local one. That made individual unfairnesses very visible, there was a public outcry and the algorithm was abandoned. Individual unfairness in grading persists – but has reverted to its former obscurity.

Dennis Sherwood accordingly wrote a book, Missing the Mark, which I reviewed for HEPI, setting out his arguments in detail. It seemed to be persuading more in the educational media to give his arguments the space they deserved. He was no longer entirely alone, with a small group (including me) finding his arguments convincing. Support from various media, notably the HEPI blog, gave him space to make his argument. However, as in the case of Mr Bates and the Post Office, there were still just a few individuals ranged against the forces of Ofqual and (some of) the educational establishment.

On 8 June 2023 I wrote ‘If A-level grades are unreliable, what should admission officers do?’ for HEPI, arguing that universities should recognise the limited reliability of A-level grades by giving candidates the benefit of the doubt, uplifting all achieved results by one grade. That blog was perhaps provocative but it did at least recognise the problem and suggest a short-term fix. My 2020 explanation about the algorithm had become the most-read HEPI blog ever, and I was invited, as I had been every year since 2020, to contribute a further blog to HEPI, to be published near to A-level results day. My follow-up to the June blog advised students and parents how to respond if they had fallen short of an offer they had accepted. I submitted it to HEPI but it was not accepted. HEPI did however publish a blog by one of its trustees, Mary Curnock Cook, on 14 August, the Monday before results day on Thursday.

Curnock Cook is the widely-respected former head of UCAS. She began:

“In this blog, I want to provide some context and challenge to two erroneous statements that are made about exam grades:

That ‘one in four exam grades is wrong’
That grades are only reliable to ‘within one grade either way’”

She asserted that the statement ‘one in four exam grades is wrong’ was a ‘gross misunderstanding’, but then said:

“In many subjects there will be several marks either side of the definitive mark that are equally legitimate. They reflect the reality that even the most expert and experienced examiners in a subject will not always agree on the precise number of marks that an essay or longer answer is worth. But those different marks are not ‘wrong’.”

In other words, as admitted by Ofqual, more than one grade could be a ‘legitimate’ assessment of the outcome for an individual. Huy Duong, another critic of the 2020 algorithm, had been widely quoted in the media in 2020 after he predicted the exact outcomes of the algorithm a week before the publication of results. He commented on Curnock Cook’s blog:

“… a lot of this is simply playing with words … whichever definitions of ‘wrong’ and ‘rights’ the establishment chooses to use, it is irrefutable that students are subjected to a grade lottery … If, as the author and the establishment contend, for a given script, both “Pass” and “Fail” are equally legitimate, then for the student’s certificate to state only either “Pass” or “Fail”, that certificate is stating a half truth.”

Curnock Cook then addressed the supposedly ‘erroneous’ statement that “grades are only reliable to ‘within one grade either way’” – the statement made by Glenys Stacey as Chief Regulator – saying:

“Some commentators have chosen to weaponise this statement in a way that shows poor understanding of the concepts underpinning reliable and valid assessment and risks doing immense damage to students and to public confidence in our exam system.”

How it is that Sherwood’s analysis shows ‘poor understanding’ is not explained. On the contrary, he seems to have a clear understanding of what Ofqual themselves have admitted. Curnock Cook said the claim about reliability had been taken out of context, but the context is not international tests of collective grading reliability, but the way universities and individual students actually use the grades.

Curnock Cook’s blog was welcomed by influential commentators like Jonathan Simons of Public First, a government favourite for research and PR, and some educationists such as Geoff Barton of the Association of School and College Leaders. She said that talking about unreliable grades “risks doing immense damage to students and to public confidence in our exam system”. Indeed it does, but the risk lies not in pointing out that the emperor has no clothes. The real risk is in not changing the system which remains unfair to so many individuals. The emperor still has no clothes, and it is time to redress things.

Most people who suffer injustice in grading do not even know it has happened. For individuals who do know, most will find that using official procedures to complain or appeal is expensive, and unlikely to change the outcome. In his campaign to illuminate the problem Mr Sherwood, like Mr Bates, met denial, opposition and the use of official processes to discourage the media from continuing to cover the story. People in the organisations concerned know how the system actually works, but they don’t want it to be widely known, for the sake of public confidence in the system. Groupthink puts collective inter-cohort ‘fairness’ ahead of fairness to every individual in every cohort. There was even, in 2020, blind faith in a computer system which was later proved to be faulty.

Public confidence in the qualifications and examinations system is of course absolutely vital. But the need for public confidence does not mean that individual unfairness on a large scale should be tolerated and ignored. There are several possible solutions to the problems of grading unreliability, and many would have little direct cost. HE institutions would have to take even greater care in using grades, as part of their wider assessment of the potential and abilities of candidates for their courses. That is a small price to pay for maintaining public confidence in a national system which everyone could be proud of for its fairness as well as its international standing.

This editorial draws on my article first published in The Oxford Magazine No 458, ‘Maintaining public confidence in an unfair system – the case of school examination grades’, and uses some parts of the text with permission.

Rob Cuthbert is Emeritus Professor of Higher Education Management, University of the West of England and Joint Managing Partner, Practical Academics rob.cuthbert@btinternet.com. Twitter @RobCuthbert

March 9, 2021
by SRHE News Blog

Some different lessons to learn from the 2020 exams fiasco

by Rob Cuthbert

The problems with the algorithm used for school examinations in 2020 have been exhaustively analysed, before, during and after the event. The Royal Statistical Society (RSS) called for a review, after its warnings and offers of help in 2020 had been ignored or dismissed. Now the Office for Statistics Regulation (OSR) has produced a detailed review of the problems, Learning lessons from the approach to developing models for awarding grades in the UK in 2020. But the OSR report only tells part of the story; there are larger lessons to learn.

The OSR report properly addresses its limited terms of reference in a diplomatic and restrained way. It is far from an absolution – even in its own terms it is at times politely damning – but in any case it is not a comprehensive review of the lessons which should be learned; it is a review of the lessons for statisticians to learn about how other people use statistics. Statistical models are tools, not substitutes for competent management, administration and governance. The report makes many valid points about how the statistical tools were used, and how their use could have been improved, but the key issue is one of meta-perspective: no-one was addressing the big picture sufficiently. An obsession with consistency of ‘standards’ obscured the need to consider the wider human and political implications of the approach. In particular, it is bewildering that no-one in the hierarchy of control was paying sufficient attention to two key differences. First, national ‘standardisation’ or moderation had been replaced by a system which pitted individual students against their classmates, subject by subject and school by school. Second, 2020 students were condemned to live within the bounds not of the nation’s, but their school’s, historical achievements. The problem was not statistical, nor anything to do with the algorithm; it lay in the way the problem itself had been framed – as many commentators pointed out from an early stage. The OSR report (at 3.4.1.1) said:

“In our view there was strong collaboration between the qualification regulators and ministers at the start of the process. It is less clear to us whether there was sufficient engagement with the policy officials to ensure that they fully understood the limitations, impacts, risks and potential unintended consequences of the use of the models prior to results being published. In addition, we believe that, the qualification regulators could have made greater use of opportunities for independent challenge to the overall approach to ensure it met the need and this may have helped secure public confidence.”

To put it another way: the initial announcement by the Secretary of State was reasonable and welcome. When Ofqual proposed that ranking students and tying each school’s results to its past record was the only way to do what the SoS wanted, no-one in authority was willing either to change the approach, or to make the implications sufficiently transparent for the public to lose confidence at the start, in time for government and Ofqual to change their approach.

The OSR report repeatedly emphasises that the key problem was a lack of public confidence, concluding that:

“… the fact that the differing approaches led to the same overall outcome in the four countries implies to us that there were inherent challenges in the task; and these challenges meant that it would have been very difficult to deliver exam grades in a way that commanded complete public confidence in the summer of 2020 …”

“Very difficult”, but, as Select Committee chair Robert Halfon said in November 2020, things could have been much better:

“the “fallout and unfairness” from the cancellation of exams will “have an ongoing impact on the lives of thousands of families”. … But such harm could have been avoided had Ofqual not buried its head in the sand and ignored repeated warnings, including from our Committee, about the flaws in the system for awarding grades.”

As the 2021 assessment cycle comes closer, attention has shifted to this year’s approach to grading, when once again exams will not feature except as a partial and optional extra. When the interim Head of Ofqual, Dame Glenys Stacey, appeared before the Education Select Committee, Schools Week drew some lessons which remain pertinent, but there is more to say. An analysis of 2021 by George Constantinides, a professor of digital computation at Imperial College whose 2020 observations were forensically accurate, has been widely circulated and equally widely endorsed. He concluded in his 26 February 2021 blog that:

“the initial proposals were complex and ill-defined … The announcements this week from the Secretary of State and Ofqual have not helped allay my fears. … Overall, I am concerned that the proposed process is complex and ill-defined. There is scope to produce considerable workload for the education sector while still delivering a lack of comparability between centres/schools.”

The DfE statement on 25 February kicks most of the trickiest problems down the road, and into the hands of examination boards, schools and teachers:

“Exam boards will publish requirements for schools’ and colleges’ quality assurance processes. … The head teacher or principal will submit a declaration to the exam board confirming they have met the requirements for quality assurance. … exam boards will decide whether the grades determined by the centre following quality assurance are a reasonable exercise of academic judgement of the students’ demonstrated performance. …”

Remember in this context that Ofqual acknowledges “it is possible for two examiners to give different but appropriate marks to the same answer”. Independent analyst Dennis Sherwood and others have argued for alternative approaches which would be more reliable, but there is no sign of change.

Two scenarios suggest themselves. In one, where this year’s results are indeed pegged to the history of previous years, school by school, we face the prospect of overwhelming numbers of student appeals, almost all of which will fail, leading no doubt to another failure of public confidence in the system. The OSR report (3.4.2.3) notes that:

“Ofqual told us that allowing appeals on the basis of the standardisation model would have been inconsistent with government policy which directed them to ‘develop such an appeal process, focused on whether the process used the right data and was correctly applied’.”

Government policy for 2021 seems not to be significantly different:

“Exam boards will not re-mark the student’s evidence or give an alternative grade. Grades would only be changed by the board if they are not satisfied with the outcome of an investigation or malpractice is found. … If the exam board finds the grade is not reasonable, they will determine the alternative grade and inform the centre. … Appeals are not likely to lead to adjustments in grades where the original grade is a reasonable exercise of academic judgement supported by the evidence. Grades can go up or down as the result of an appeal.” (emphasis added)

There is one crucial exception: in 2021 every individual student can appeal. Government no doubt hopes that this year the blame will all be heaped on teachers, schools and exam boards.

The second scenario seems more likely and is already widely expected, with grade inflation outstripping the 2020 outcome. There will be a check, says DfE, “if a school or college’s results are out of line with expectations based on past performance”, but it seems doubtful whether that will be enough to hold the line. The 2021 approach was only published long after schools had supplied predicted A-level grades to UCAS for university admission. Until now there has been a stable relationship between predicted grades and examination outcomes, as Mark Corver and others have shown. Predictions exceed actual grades awarded by consistent margins; this year it will be tempting for schools simply to replicate their predictions in the grades they award. Indeed, it might be difficult for schools not to do so, without leaving their assessments subject to appeal. In the circumstances, the comments of interim Ofqual chief Simon Lebus that he does not expect “huge amounts” of grade inflation seem optimistic. But it might be prejudicial to call this ‘grade inflation’, with its pejorative overtones. Perhaps it would be better to regard predicted grades as indicators of what each student could be expected to achieve at something close to their best – which is in effect what UCAS asks for – rather than when participating in a flawed exam process. Universities are taking a pragmatic view of possible intake numbers for 2021 entry, with Cambridge having already introduced a clause seeking to deny some qualified applicants entry in 2021 if demand exceeds the number of places available.

The OSR report says that Ofqual and the DfE:

“… should have placed greater weight on explaining the limitations of the approach. … In our view, the qualification regulators had due regard for the level of quality that would be required. However, the public acceptability of large changes from centre assessed grades was not tested, and there were no quality criteria around the scale of these changes being different in different groups.” (3.3.3.1)

The lesson needs to be applied this year, but there is more to say. It is surprising that there was apparently such widespread lack of knowledge among teachers about the grading method in 2020 when there is a strong professional obligation to pay attention to assessment methods and how they work in practice. Warnings were sounded, but these rarely broke through to dominate teachers’ understanding, despite the best efforts of education journalists such as Laura McInerney, and teachers were deliberately excluded from discussions about the development of the algorithm-based method. The OSR report (3.4.2.2) said:

“… there were clear constraints in the grade awarding scenario around involvement of service delivery staff in quality assurance, or making the decisions based on results from a model. … However, we consider that involvement of staff from centres may have improved public confidence in the outputs.”

There were of course dire warnings in 2020 to parents, teachers and schools about the perils of even discussing the method, which undoubtedly inhibited debate, but even before then exam processes were not well understood:

“… notwithstanding the very extensive work to raise awareness, there is general limited understanding amongst students and parents about the sources of variability in examination grades in a normal year and the processes used to reduce them.” (3.2.2.2)

My HEPI blog just before A-level results day was aimed at students and parents, but it was read by many thousands of teachers, and anecdotal evidence from the many comments I received suggests it was seen by many teachers as a significant reinterpretation of the process they had been working on. One teacher said to Huy Duong, who had become a prominent commentator on the 2020 process: “I didn’t believe the stuff you were sending us, I thought it [the algorithm] was going to work”.

Nevertheless the mechanics of the algorithm were well understood by many school leaders. FFT Education Datalab was analysing likely outcomes as early as June 2020, and reported that many hundreds of schools had engaged them to assess their provisional grade submissions, some returning with a revised set of proposed grades for further analysis. Schools were seduced into, or reduced to, trying to game the system, feeling they could not change the terrifying and ultimately ridiculous prospect of putting all their many large cohorts of students in strict rank order, subject by subject. Ofqual were victims of groupthink; too many people who should have known better simply let the fiasco unfold. Politicians and Ofqual were obsessed with preventing grade inflation, but – as was widely argued, long in advance – public confidence depended on broader concerns about the integrity and fairness of the outcomes.

In 2021 we run the same risk of loss of public confidence. If that transpires, the government is positioned to blame teacher assessments and probably reinforce a return to examinations in their previous form, despite their known shortcomings. The consequences of two anomalous years of grading in 2020 and 2021 are still to unfold, but there is an opportunity, if not an obligation, for teachers and schools to develop an alternative narrative.

At GCSE level, schools and colleges might learn from emergency adjustments to their post-16 decisions that there could be better ways to decide on progression beyond GCSE. For A-level/BTEC/IB decisions, schools should no longer be forced to apologise for ‘overpredicting’ A-level grades, which might even become a fairer and more reliable guide to true potential for all students. Research evidence suggests that “Bright students from poorer backgrounds are more likely than their wealthier peers to be given predicted A-level grades lower than they actually achieve”. Such disadvantage might diminish or disappear if teacher assessments became the dominant public element of grading; at present too many students suffer the sometimes capricious outcomes of final examinations.

Teachers’ A-level predictions are already themselves moderated and signed off by school and college heads, in ways which must to some extent resemble the 2021 grading arrangements. There will be at least a two-year discontinuity in qualification levels, so universities might also learn new ways of dealing with what might become a permanently enhanced set of differently qualified applicants. In the longer term HE entrants might come to have different abilities and needs, because of their different formation at school. Less emphasis on preparation for examinations might even allow more scope for broader learning.

A different narrative could start with an alternative account of this year’s grades – not ‘standards are slipping’ or ‘this is a lost generation’, but ‘grades can now truly reflect the potential of our students, without the vagaries of flawed public examinations’. That might amount to a permanent reset of our expectations, and the expectations of our students. Not all countries rely on final examinations to assess eligibility to progress to the next stage of education or employment. By not wasting the current crisis we might even be able to develop a more socially just alternative which overcomes some of our besetting problems of socioeconomic and racial disadvantage.

Rob Cuthbert is an independent academic consultant, editor of SRHE News and Blog and emeritus professor of higher education management. He is a Fellow of the Academy of Social Sciences and of SRHE. His previous roles include deputy vice-chancellor at the University of the West of England, editor of Higher Education Review, Chair of the Society for Research into Higher Education, and government policy adviser and consultant in the UK/Europe, North America, Africa, and China.

October 22, 2020
by SRHE News Blog

Policymaking in a pandemic

By Rob Cuthbert

Policymaking in a pandemic must be decisive, transparent and inclusive (1)

After Secretary of State Gavin Williamson announced in March that there would be no GCSE or A-level examinations in Summer 2020, higher education focused at first on whether it would be desirable or even possible for students to begin the new year in Autumn 2020, with particular doubts over international students’ ability and willingness to travel. With the number of UK 18-year-olds in a demographic trough we expected extreme pressure on universities at the exposed end of the market, and there was much talk about the ten or 12 or 14 institutions said to be already especially financially vulnerable. The response of a number of institutions was to make tens of thousands of conditional offers unconditional, reducing uncertainty for themselves and also for their potential students. But ‘conditional unconditional’ offers, even in the market decreed by the government, never seemed to respect the integrity of student choice; it seemed reasonable that they should be outlawed, but government and the OfS went much further.

OfS published its regulation on unconditional offers on 4 May (updated on 17 August 2020, after A-level results by algorithm were announced), enabling OfS to take “… action against higher education providers that use offer-making practices which would not be in the interests of students and the wider higher education sector in these exceptional circumstances.” These included: “Other unconditional offers to UK students that could materially affect the stability and integrity of the English higher education sector …”, which in theory might have threatened selective institutions aiming to hoover up home students to compensate for a possible shortfall of international students, regardless of the effects on universities less well-placed in the market. But after the government imposed temporary student number controls no-one was in much doubt that the target was precisely those less well-placed, in case students dared to choose them rather than those higher up the league tables. Government policy is that student choice is paramount, but only if students choose the institutions which the government think they should choose.

On 16 July the DfE announced a ‘restructuring regime’ in response to Covid-19, a mixture of University Strategic Planning 101 and oddly selective messages about the specific requirements to be satisfied by the minority of universities expected to need ‘support’. The Secretary of State’s foreword said: “Public funding for courses that do not deliver for students will be reassessed. … all universities must, of course, demonstrate their commitment to academic freedom and free speech, as cornerstones of our liberal democracy. … The funding of student unions should be proportionate and focused on serving the needs of the wider student population rather than subsidising niche activism and campaigns. Vice-chancellor pay has for years faced widespread public criticism … equally concerning is the rapid growth over recent decades of spending on administration more broadly, which should be reversed.”

The announcement was much criticised but it receded from view as the threat of ‘restructuring’ diminished. Demand for HE with a 2020 start remained strong, with UCAS numbers higher than expected. The intentions of international students were still in doubt, but attention shifted to the slow-motion shambles of A-levels, and the hardly less shambolic, though less remarked, handling of International Baccalaureate and technical and vocational qualifications. Ofqual and DfE remained committed to their A-levels algorithm, doubling down on the assertion that it was the fairest way to determine grades in this unprecedented situation. This was despite the growing clamour of expert opinion pointing out the many faults and unfairnesses in the approach determined by Ofqual. The DfE/Ofqual response might have seemed resolutely decisive, but was neither transparent nor inclusive. A series of blogs from HEPI and many others provided more transparency than the government and Ofqual statements which had led most people to believe wrongly that ‘teachers are determining grades’ and ‘there is a robust appeal system’.

Scottish Higher assessments followed a similar approach to the English but were announced on 6 August, a week ahead of A-levels. Facing mass public protest, First Minister Nicola Sturgeon admitted on 10 August they had got it wrong; education minister John Swinney the next day announced they would abandon their algorithm and use only Centre-Assessed Grades (CAGs), a reaction which ticked the decisive/transparent/inclusive boxes, albeit after the last minute. The Scots decision sent the English DfE into panic mode. Gavin Williamson had repeatedly nailed his colours to the this-algorithm-is-robust-and-fair mast; he would not follow Scotland’s lead, and there was no sensible alternative. So he went for something that wasn’t sensible – the announcement late on Tuesday night (11 August, just 36 hours before students would get their grades) that students could use mock grades under certain circumstances instead of the algorithm’s grades. It was a decision made without consultation with anyone, so not at all inclusive, and certainly less than decisive, but at least it seemed transparent.

For thousands of students who had taken mocks, it sounded like blessed relief. Not only could they apparently now make an individual appeal (something previously ruled out), they knew it would succeed. But that was late Tuesday night. By Wednesday morning Ofqual, Schools Minister Nick Gibb and Universities Minister Michelle Donelan were doing their best to dilute and obscure the message, saying only that mocks might form part of the grounds for an appeal and even suggesting that not many appeals were expected. Schools and colleges, who had only that day received their students’ grades with shock and horror, pointed out the huge variability and complete lack of standardisation of mocks even within one school, let alone across the whole sector. Williamson stood firm on his ‘triple lock’ – mocks or algorithm grades or Autumn exams. It was presented as a solution for all, when it was nothing of the sort. He had announced that Ofqual (who had not been consulted in advance) would issue guidance on how the new appeals system would work; Ofqual understandably said they would need a few days to work out how to operationalise the process. They issued advice on the amended appeals process by early afternoon on Saturday, suggesting (correctly) that CAGs were a more reliable basis for judgment than mock exams. Then very late on Saturday evening Ofqual withdrew its advice, saying that the Ofqual board would review it and another statement would follow ‘in due course’. Speculation centred on the suspicion that it was the mention of CAGs that might have caused the Department for Education to tell Ofqual to change tack, mostly because of a report in The Sunday Telegraph by the well-briefed Camilla Turner. This was the position at midday on Sunday.

The next day (Monday 17 August) came the final climbdown, as Williamson confirmed that England would follow Scotland in using CAGs rather than the grades determined by the algorithm. Universities were left scrambling to cope with the U-turn, and many students were left wondering whether they still had the place they originally wanted, as many in-demand courses had naturally been filled as usual very soon on the day of the announcement of results, 13 August. Former NUS President and chair of BPP University Aaron Porter wrote for Schools Week on 18 August 2020 about the consequences of government ‘passing the buck’ to universities to sort out the A-levels fiasco, and Education Select Committee chair Robert Halfon called for the abolition of Ofqual.

Universities minister Michelle Donelan wrote to universities on 20 August 2020 confirming the lifting of all student number controls and the establishment of a task force to oversee clearing and admissions for 2020. She said: “The interests of students were at the heart of the change in awarding results … we all agree that providers should: (1) Honour all offers accepted to date. (2) Honour all offers made and met through the new arrangements for both firm and insurance offers where students would like to take them, wherever this is possible.” That ‘wherever this is possible’ gave everyone a get-out clause, while doing its best to shift the blame away from government and onto the universities, but the blame game picked up speed. A VC’s diary in The Guardian on 21 August 2020 accused government ministers of incompetence and lack of compassion, and it was clear that universities could hardly be blamed for the A-levels mess. Ofqual’s attempts to shift the blame onto schools and colleges were equally unconvincing. It had emerged that the Royal Statistical Society had much earlier offered Ofqual the services of the redoubtable Guy Nason (Imperial) and the statistically legendary Sharon Witherspoon, but the RSS had declined to sign the non-disclosure agreement which Ofqual had proposed. Roger Taylor, chair of Ofqual, wrote to the RSS on 21 August 2020 saying “nothing to see here, you were being much too picky” (we paraphrase), but the next morning Stian Westlake of the RSS was on Radio 4 Today saying the NDA was far too broad and vague to be acceptable.

The first head rolled: Ofqual chief executive Sally Collier stepped down on 25 August with immediate effect; Collier’s predecessor Glenys Stacey was drafted as an interim replacement. Ofqual were summoned to an Education Select Committee hearing on 3 September, and Roger Taylor released a statement just hours before the hearing, memorably summed up by Committee chair Robert Halfon as saying “Not me, guv”. Taylor, it emerges, is also chair of the Centre for Data Ethics and Innovation, which advises the government on artificial intelligence – presumably not including what the Prime Minister called Ofqual’s ‘mutant algorithm’. Taylor made various promises to the Committee of transparency, of which some remain unfulfilled. It was reported that Taylor had kept his chair’s role because he threatened to publish all the correspondence between DfE and Ofqual, showing how much DfE had known all along about the algorithm and its effects.

Samantha Booth reported for Schools Week on 21 August 2020 that Susan Acland-Hood “has been appointed as second permanent secretary at the DfE for six weeks, temporarily leaving her role as chief executive of the HM Courts and Tribunals Service. The government said she will work “closely” with permanent secretary Jonathan Slater and “support” the department’s response to this year’s exam results.” Slater’s position was said to be under threat, and sure enough, Slater’s departure was confirmed on 26 August, with Acland-Hood becoming his permanent successor.

Taylor, against the odds, remains as Ofqual chair. In an unusual step, the respected Institute for Government Director Bronwen Maddox called for Secretary of State for Education Gavin Williamson to resign, in her 27 August 2020 blog. “The misjudgements in education have been some of the worst the government has made since the start of the pandemic. They were avoidable, given the time available to plan … they are serious in their impact on children’s education, the gap in achievement between social groups and the ability of the nation to get back to work. At the heart of these misjudgements are decisions that could only be made by politicians, not civil servants.” Senior Tory backbencher Bernard Jenkin said Williamson had “lost the trust of his officials to such an extent that he can no longer serve effectively in the cabinet”, according to a report by Toby Helm and Michael Savage in The Observer on 23 August 2020. My HEPI blog on 16 August 2020 about the A-levels debacle said: “for five months the Government and Ofqual have been too secretive, made bad choices, refused to listen to constructive criticism, tried to tough it out and then made the wrong concessions too late.” Not decisive, not transparent, not inclusive, and not how to make policy in a pandemic.

(1) That was the view of Ramathi Bandaranayake and Merl Chandana (both at LIRNEasia, a regional digital policy think tank based in Colombo, Sri Lanka) on the LSE Impact Blog on 1 October 2020.

Rob Cuthbert is Emeritus Professor of Higher Education Management, University of the West of England and Joint Managing Partner, Practical Academics

