by Ian McNay
Research Excellence Framework 2021
The irritations researchers experience when working with secondary data are exemplified in looking at the REF 2021 results and comparing them with those from 2014. The 2021 results by Unit of Assessment (UoA) are laid out on screen with all four profiles in one line across the page, so four UoAs fit on one page. When you try to print, or at least when I do, they are laid out in a single column, so one UoA takes a full page. To add to that, the text preceding the tabulations takes just enough space to put the name of the HEI at the bottom of the page and the profiles on the next page. I know, I should have checked before pressing ‘print’. So they take 80+ pages, lots of paper, lots of ink, but I can’t work with screen-based data. My bad, perhaps.
When I access the 2014 results the four profiles – overall, outputs, impact, environment – are listed in four separate documents, within which English HEIs are listed first, then Scotland, Wales and Northern Ireland. The 2021 listings take a unionist view, starting with Aberdeen rather than Anglia Ruskin. Clicking through to UoA pages pops up a message saying ‘this page is not currently available’. I do find another route to access them.
I will first give the summary of results, set alongside those from 2014, against advice, but one role of the REF is to demonstrate more and better research. Encouraging that has never been set as an objective – the sole purpose for a long time was ‘to inform funding’ – but the constant improvement implied by the figures is the basis for getting more money out of the Treasury. One of the principles the funding bodies set way back was continuity, yet there has never been an exercise that has replicated its predecessor. This time, following the Stern Report, there were at least 12 major changes in requirements and processes. More are promised after the Future Research Assessment Programme (FRAP) consultation reports. One of those changes was to give greater recognition to inter-disciplinary research. The report of the Interdisciplinary Research Advisory Panel (IDAP) at the end of June claimed that treatment was more visible and equitable, but that much still needs to be done. Panels are still learning how to treat work beyond their boundaries, and institutions are reluctant to submit such work because it has tended to receive lower grades than work within the single disciplines that constitute its elements.
A coincidence of timing led to a disturbing voice in my head as I read the reports from Main Panel C, covering Social Sciences, and the Education panel. The Main Panel asserts that “throughout the assessment process Main Panel C and its sub-panels ensured adherence to published ‘Panel criteria and working methods’ and consistency in assessment standards through a variety of means [and so] has full confidence in the robustness of the processes followed and the outcomes of the assessment in all its sub-panels.” The mantra was repeated in different forms by the Education sub-panel: “Under the guidance and direction from the main panel and the REF team, the sub-panel adhered to the published REF 2021 ‘Panel criteria and working methods’ in all aspects of its processes throughout the planning and assessment phases.” “The protocol requiring sub-panel members [with declared conflicts of interest] to leave panel meeting discussions was strictly followed for all parts of the REF assessment.” “A transparent process on the reconciliation of grades and conversion of grades to the status of panel agreed grades was documented and signed off by panel members”. And so on again and again. The voice in my head? “Any gatherings that took place, did so observing the Covid protocols and regulations at all times. There were no breaches.” Work by Neyland et al (2019), based on interviews with 2014 panel members, suggests that all records were destroyed at the end of the processes and that reconciliation was used to ensure conformity to the dominant view of the elite power holders who define both what research is and what constitutes quality. The brief description of the moderation process in Education suggests that this may have been repeated. There were four members from modern universities on the Education panel, out of 20; and one of the 13 assessors.
There were none on Main Panel C, just as there had been none on the Stern Committee, despite a commitment from HEFCE early in the last decade that diversity of membership would reflect institutional base.
Executive Chair of Research England David Sweeney was confident that universities had ‘behaved responsibly’ and also ‘played by the rules’, which prevented the importing of highly rated researchers from around the globe and required all staff with significant responsibility for research to be submitted. (I should declare an interest: David claims his participation in a programme I ran resulted in his changing the course of his career and led him to HEFCE and now UKRI. I accept the responsibility, but not the blame.)
It is surprising, then, that one easily spotted deviation from the framework, not commented upon by the panels (despite a footnote on intent in the ‘Summary Report across the four main panels’), was on that requirement that ‘all staff with a significant responsibility for research’ should be submitted. I took that to be mandatory, and it led to many staff being moved to ‘teaching only’ contracts. Yet, in Education, only 42 of the 83 submissions met that criterion, eight of them from modern universities. Four submitted more than 50%: a mix of Liverpool Hope, the OU, Ulster, and Leeds (at 95%). 25 fell between 25% and 49%, and 24 had 24% or below. All those in the last two groups are post-92 designations. Special mention for the University of the Highlands and Islands with … 605%. There were other overshoots: in History, Cambridge submitted 170%, Oxford 120%, perhaps linked to college staff not based in a department. UHI submitted 110%, but that was only 7.3 people.
The commitment to equity was also not met according to the Equality, Diversity and Inclusion Panel: “Although many institutions had successfully implemented several gender-related initiatives, there was much less attention given to other protected groups. The panel therefore had little confidence that the majority of institutional environments would be sufficiently mature in terms of support for EDI within the next few years”.
| Statistics: ‘key facts’ | 2014 | 2021 |
| --- | --- | --- |
| Impact case studies | 6,975 | 6,781 |
So: more submissions and many more staff submitted, but fewer outputs and case studies, reducing the evidence base for judging quality. At Main Panel level, Panel C was the only one to have more UoA submissions, more outputs and more case studies. It had the biggest increase in staff submitted – 63%. The other three panels all received fewer outputs and case studies, despite staff numbers increasing by between 34% and 47%.
The Main Panel C feedback acknowledges that the apparent increase in quality can be attributed in part to the changes in the rules. It also credits the ‘flourishing research base’ in HEIs, but a recent report from DBEIS making international comparisons of the UK research base shows that between 2016 and 2020, the UK publication share declined by 2.7% a year, its citation share by 1.4% a year, its field-weighted impact by 0.2% a year and its highly-cited publication share by 4.5% a year. The 2020 QS league tables show elite UK universities drifting downwards despite favourable funding and policy treatment aimed at achieving the exact opposite. I suggest that better presentation of REF impact case studies, and investment in promoting that internally, contributed to the grade inflation in impact.
Note that 4* overall grades are significantly enhanced by ratings in impact and environment, confirming the shift to assessing units, not individuals. Ratings in both impact and environment are in multiples of either 12.5% (one eighth) or 16.7% (one sixth), in contrast to outputs, where they go to decimal points. The 2014 approach to impact assessment attracted serious and severe criticism from Alis Oancea (Oxford) and others because of the failure to do any audit of exaggerated claims, some of them outrageous in extent. This time seems to have been better on both sides. There is still some strategic management of staff numbers – the number of units submitting just under 20 or 30 staff was many times higher than the number submitting one more, which would have required an extra case study. Some staff may, then, have lost out and been re-classified as not engaged in research.
I won’t claim things leap out from the stats but there are some interesting figures, many attributable to the many changes introduced after Stern. The number of staff (FTE) submitted went up by over 50%, to 2,168, but the number of outputs went down by 4.5%, from 5,526 to 5,278. Under the new rules, not all those submitted had to have four outputs, and for 2021, in Education, 1,192 people – 51% of the headcount of 2,330 – had only one output submitted. 200 submitted four, and 220 five. The gaming was obvious and anticipated – get the most out of your best staff, prune the lower rated items from middle ranking output and get the best one from people not previously submitted to reach the average required of 2.5 per FTE, and get close to 100% participation. Interestingly, in Education, output grades from new researchers had the same profile as those from more longstanding staff, though more – 65% – submitted only one, with 21 – 7% – submitting four or five. Across all panels there was little or no change in the numbers of new researchers. 199 former staff in Education also had outputs submitted, where similar selectivity could operate; 28 had four or five submitted.
Within Main Panel C, Education had the poorest quality profile: the lowest combined 3* and 4* score, and by far the highest 1* score (7%), against a Panel C average of 3%. Where it did score well was in the rate of increase of doctoral degree awards, where it was clearly top in number and in ‘productivity’ per FTE staff member. Between 2013-14 and 2019-20, annual numbers went up from 774 to 964, almost 25%. I postulate that that links to the development of EdD programmes with admission of students in group cohorts rather than individually.
| | 2014 | 2021 |
| --- | --- | --- |
| Impact case studies | 218 | 232 |
Environment obviously posed problems. Income generation was a challenge, and crowded buildings from growth in student numbers may have reduced study space for researchers. In 2014 the impact assessors raised queries about the value for money of such a time-consuming exercise; their feedback took just over a page and dealt with organisational structures and processes for promoting impact, not their outcomes. This time it was much fuller and more helpful in developmental terms.
Learn for next time, when, of course, the panel and its views may be different…
Two universities – Oxford and UCL – scored 100% 4* for both impact and environment, moving the UCL 4* score from 39.6% for outputs to 62% for overall quality. That is a big move. Nottingham, which had 2×100% in 2014, dropped on both, to 66.7% in impact and 25% for environment. The total number of 100% scores was seven for impact, up from four; four for environment, down from eight. The two UoAs scoring 0% overall (and therefore in all components) in 2014 moved up. Only two scored zero at 4* for impact and in no other component, one being a pre-92 institution. 17 got their only zero in environment, five being pre-92ers, including Kent, which did get 100% … at grade 1*, and Roehampton, which nevertheless came high in the overall ratings. Dundee, Goldsmiths and Strathclyde had no 4* rating in either impact or environment, along with 30 post-92 HEIs.
Those getting the highest grades demonstrated originality, significance and rigour in diverse ways, with no strong association with any particular methods, and including theoretical and empirical work. A high proportion of research employing mixed methods was world leading or internationally excellent.
Outputs about professional practice did get some grades across the range, but (as in 2014) some were limited to descriptive or experiential accounts and got lower grades. Lower graded outputs in general showed ‘over-claiming of contribution to knowledge; weak location in a field; insufficient attention to the justification of samples or case selection; under-development of criticality and analytical purchase’. No surprises there.
Work in HE had grown since 2014, with strong work with a policy focus, drawing on sociology, economics and critical policy studies. Also strong were outputs on internationalisation, including student and staff mobility. The panel sought more work on this, on technological change, decolonisation and ‘related themes’, on the re-framing of young people as consumers in HE, and on links to the changing nature of work, especially through digital disruption. They encouraged more outputs representing co-production with key stakeholders. They noted concentrations of high quality work in history and philosophy in some smaller submissions. More work on teaching and learning had been expected – had they not remembered that it was banned from impact cases last time, which might have acted as a deterrent until that was changed over halfway into the period of the exercise? – with notable work on ICT in HE pedagogy and professional learning. What they did get, since it was the exemplification of world-class quality by the previous panel, were strong examples of the use of longitudinal data to track long-term outcomes in education, health, well-being and employment, including world-class data sets submitted as outputs.
The strongest case studies:
- Provided a succinct summary so that the narrative was strong, coherent and related to the template
- Clearly articulated the relationship between impact claims and the underpinning research
- Provided robust evidence and testimonials, judiciously used
- Not only stated how research had had an impact on a specific area, but demonstrated both reach and significance.
There was also outstanding and very considerable impact on the quality of research resources, research training and educational policy and practice in HEIs themselves, which was often international in reach and contributed to the quality of research environments. So, we got to our bosses, provided research evidence and got them to do something! A quintessential impact process. Begin ‘at home’.
The panel’s concerns on environment were over vitality and sustainability. They dismissed the small fall in performance, but noted that 16 of the 83 HEIs assessed were not in the 2014 exercise – implying scapegoats, but Bath – a high scorer – was one of those. The strongest submissions:
- Had convincing statements on strategy, vision and values, including for impact and international activities
- Showed how previous objectives had been addressed and set ambitious goals for the future
- Linked the strategy to operations with evidence and examples from researchers themselves
- Were analytical not just descriptive
- Showed how researchers were involved in the submission
- Included impressive staff development strategies covering well-being (a contrast to reports from Wellcome and UNL researchers among others about stress, bullying and discrimination)
- Were from larger units, better able to be sustained
- Had high levels of collaborative work and links to policy and practice.
But… some institutions listed constraints to strategic delivery without saying what they had done to respond; some were poor on equity beyond gender and on support for PGRs and contract researchers. The effect of ‘different institutional histories’ (ie length of time being funded and accumulating research capital) was noted, but without allowance being made, unlike approaches to contextual factors in undergraduate student admissions. The total research funding recorded was also down on the period before the 2014 exercise, causing concern about sustainability.
The somewhat smug satisfaction of the panels and the principals in the exercise was not matched by the commentariat. For me, the most crucial was the acknowledgement by Bahram Bekhradnia that the REF “has become dysfunctional over time and its days must surely be numbered in its present form”. Bahram had instituted the first ‘full-blown’ RAE in 1991-2 when he was at HEFCE. (Another declaration of interest: he gave me a considerable grant to assess its impact (!) on staff and institutional behaviour. Many of the issues identified in my report are still relevant.) First, he is concerned about the impact on teaching, which “has no comparable financial incentives”, and where TEF and the NSS have relatively insignificant impact. Second, in a zero-sum game, much effort, which improves performance, gets no reward, yet institutions cannot afford to get off the treadmill – something not anticipated when the RAE started – so wasted effort will continue for fear of slipping back. I think that effort needs re-directing in many cases to develop partnerships with users to improve impact and provide an alternative source of funding. Third, concentration of funding is now such that differentiation at the top is not possible, so risking international ratings: “something has to change, but it is difficult to know what”.
Jonathan Adams balanced good and bad: “Assessment has brought transparent direction and management of resources [with large units controlling research, not doing it], increased output of research findings, diversification of research portfolios [though some researchers claim pressure to conform to mainstream norms], better international collaboration and higher relative citation impact [though note the DBEIS figures above]. Against that could be set an unhealthy obsession with research achievements and statistics at the expense of broader academic values, cutthroat competition for grants, poorer working conditions, a plethora of exploitative short-term contracts and a mental health crisis among junior researchers”.
After a policy-maker and a professor, a professional – Elizabeth Gadd, Research Policy Manager at Loughborough, reflecting on the exercise after results day, and hoping to have changed role before the next exercise. She is concerned that churning the data, reducing a complex experience for hundreds of people to sets of numbers, gets you further from the individuals behind it. The emphasis on high scorers hides what an achievement 2* (“internationally recognised”) is: it supports many case studies, and may be an indication of emergent work that needs support to develop further to a higher grade next time, or of work by early career researchers. To be fair, the freedom of how to use unhypothecated funds can allow that at institutional level, but such commitment to development (historic or potential) is not built in to assessment or funding, and there are no appeals against gradings. She agonised over special circumstances, which drew little in rating terms despite any sympathy. The invisible cost of scrutinising and supporting such cases is not counted in the costs of the exercise. (When I was a member of a sub-panel, I was paid to attend meetings. Time on assessing outputs was unpaid; it was deemed to be part of an academic’s life, paid by the institution, but as I was already working more hours than my fractional post allowed, I did my RAE work in private time.)
There are many other commentaries on WonkHE, HEPI and Research Professional sites, but there is certainly an agenda for further change, which the minister had predicted, and which the FRAP committee will consider. Their consultation period finished in May, before the results came out – of course – but their report may be open to comment. Keep your eyes open. SRHE used to run post-Assessment seminars. We might have one when that report appears.
SRHE Fellow Ian McNay is emeritus professor at the University of Greenwich.