by Kathryna Kwok and Siân Alsop
As a sector, we have long known that a student’s ethnicity is related to the average class of degree they are awarded. Students from Black, Asian and minority ethnic (or ‘BAME’) backgrounds can expect to leave UK universities with lower classifications than their White counterparts – up to 20% lower (UUK & NUS, 2019). In a welcome shift away from focusing on the role that student background might play, universities have been paying attention to how institutional factors such as curricula, assessment and staffing decisions might contribute to this awarding gap. But not much is really changing. The OfS (2022) reports that although the gap has narrowed somewhat in the last five years, Black students in particular continue to receive degree outcomes 17.4% lower than those of White students. At this point, it is clear that sweeping, quick-fix or isolated interventions are not fit for purpose: the mechanisms underpinning awarding disparities are complex and entrenched. If we are serious about change, a sharper focus and a multi-pronged approach to unpicking the many ways in which we disadvantage certain students seem more likely to help.
In this vein, we took as our starting point the need to get granular – to dig into ‘who’, as binary ‘BAME’/White distinctions are uninformative at best, and ‘how’, as a degree award is the culmination of many parts. We focused on understanding differences in awarded marks at the module level. But what is the best way to operationalise module mark differences between two groups? One obvious option is simply to calculate the difference in means (ie mean of group A – mean of group B). However, this can be misleading, because it ignores how much marks vary within each group: a five-mark gap between two groups whose marks are tightly clustered reflects a much clearer separation than the same gap between groups whose marks range widely.
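To make the concern concrete, the sketch below compares two hypothetical pairs of groups (the marks are invented for illustration) that share the same five-mark gap in means but differ in spread. The `difference_index` helper is our own minimal sketch, assuming the unequal-variance (Welch) form of the t-statistic; the function names are illustrative, not taken from the authors’ report.

```python
from math import sqrt
from statistics import mean, stdev


def mean_difference(a, b):
    """Raw gap: mean of group A minus mean of group B."""
    return mean(a) - mean(b)


def difference_index(a, b):
    """Mean difference scaled by spread and size.
    ASSUMPTION: Welch (unequal-variance) form of the t-statistic."""
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se


# Hypothetical marks, invented for illustration: both pairs differ by 5 marks.
tight_a = [62, 63, 64, 65, 66]
tight_b = [57, 58, 59, 60, 61]
spread_a = [45, 55, 65, 75, 85]
spread_b = [40, 50, 60, 70, 80]

print(mean_difference(tight_a, tight_b), mean_difference(spread_a, spread_b))  # 5 5
print(round(difference_index(tight_a, tight_b), 2))    # 5.0
print(round(difference_index(spread_a, spread_b), 2))  # 0.5
```

The raw gap is identical for both pairs, while the scaled measure is ten times larger for the tightly clustered pair, which is the distinction a simple difference in means cannot show.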
Our solution, as we describe in our 2019 SRHE Research Award final report, was to use the formula for the t-statistic, a measure of the difference between two groups’ means scaled by the groups’ variation and size. In the context of calculating module mark gaps, we refer to this as the difference index (DI), to avoid confusion with t-values calculated as part of other statistical tests. The formula for the DI uses three quantities for each group: n, the number of students in the group; s, the standard deviation of their marks; and x̄, the group mean.
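Assuming the unequal-variance (Welch) form of the t-statistic, which needs only the quantities listed above, the DI for groups A and B could be written as below. This is a reconstruction rather than a quotation of the authors’ formula: a pooled-variance t-statistic is the other common form, though for two groups of equal size the two give identical values, and either reproduces the worked figures given later in this article.

$$
\mathrm{DI} = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\dfrac{s_A^{2}}{n_A} + \dfrac{s_B^{2}}{n_B}}}
$$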
A larger absolute value indicates a larger difference between the two group means (with group size and variation held constant). A positive value means that group A outperformed group B, while a negative value means that group B outperformed group A. If multiple comparisons are made against one common baseline group, we recommend consistently positioning the baseline group as group A, so that the signs of the resulting DIs can be compared directly.
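A minimal Python sketch of the DI computed from raw module marks, again assuming the Welch form of the t-statistic. The `compare_to_baseline` helper is hypothetical and simply applies the sign convention above, always passing the baseline group as group A; the marks are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def difference_index(group_a, group_b):
    """DI for two lists of module marks.
    ASSUMPTION: Welch-form t-statistic (no pooled variance).
    Positive: group A's mean is higher; negative: group B's mean is higher."""
    if len(group_a) < 2 or len(group_b) < 2:
        raise ValueError("each group needs at least two marks to estimate spread")
    se = sqrt(stdev(group_a) ** 2 / len(group_a) + stdev(group_b) ** 2 / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def compare_to_baseline(marks_by_group, baseline):
    """Hypothetical helper: DI of the baseline (always group A) against every other group."""
    base = marks_by_group[baseline]
    return {
        f"{baseline} v {name}": difference_index(base, group_marks)
        for name, group_marks in marks_by_group.items()
        if name != baseline
    }


# Hypothetical marks for one module, invented for illustration only.
marks = {
    "White": [68, 72, 65, 70, 75, 62, 71],
    "Asian": [64, 70, 61, 66, 69, 60, 67],
    "Black": [58, 66, 60, 63, 55, 61, 64],
}

results = compare_to_baseline(marks, "White")
for pairing, di in results.items():
    print(f"{pairing}: DI = {di:.2f}")
```

Swapping the arguments flips the sign but not the magnitude, which is why fixing the baseline as group A matters when several pairings are reported side by side.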
The DI offers a straightforward yet nuanced way to operationalise module mark gaps using data that universities already routinely collect. As a measure of module-level differences, it also offers a way to characterise, monitor and investigate the awarding gap at a granular level – something which percentage point differences in final degree outcomes are much less able to do.
This said, the DI does have its disadvantages. It is not as intuitively interpretable as a simple difference in means; this is perhaps the trade-off for the added nuance it offers. As an illustration, for two groups each with 10 students and an equal spread of module marks (SD = 10.00), a difference of five marks equates to a DI of 1.12, and a difference of ten marks equates to a DI of 2.24. Because the DI scales with group size, the same five-mark difference would produce a larger DI if each group had more students. Another disadvantage is that the DI can only compare two groups at a time, so a separate calculation has to be performed for each pairing, and analyses involving multiple groups (or multiple combinations of groups) can quickly become unwieldy.
In our case study, which we describe in the report linked above, we used regression modelling to investigate whether module characteristics (eg level, credit value, exam weight) could predict DI. This required us to fit one regression model for each ethnicity pairing (White v Asian, White v Black, White v Mixed, White v Other). One of our findings was that module characteristics significantly predicted DI only between White and Black students, which we note ‘highlight[ed] the importance of recognising the heterogeneity of student experiences and needs’ (Kwok & Alsop, 2021: 16). We hope to conduct similar analyses with a much larger dataset, ideally from multiple institutions, which would let us use multilevel regression techniques to capture and explore granular differences in the marks awarded to different groups of students more elegantly.
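The illustrative figures in the paragraph above can be checked from summary statistics alone. This sketch assumes the Welch form of the DI (for equal group sizes the pooled form gives the same numbers); the group means are hypothetical, since only their difference matters. It also shows the group-size effect, and a hypothetical `marks_gap_from_di` helper that converts a DI back into marks, which may ease the interpretability trade-off when group sizes and spreads are reported alongside the DI.

```python
from math import sqrt


def di_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """DI from summary statistics. ASSUMPTION: Welch-form t-statistic."""
    return (mean_a - mean_b) / sqrt(sd_a**2 / n_a + sd_b**2 / n_b)


def marks_gap_from_di(di, sd_a, n_a, sd_b, n_b):
    """Hypothetical helper: the mean difference in marks implied by a given DI."""
    return di * sqrt(sd_a**2 / n_a + sd_b**2 / n_b)


# The illustration in the text: 10 students per group, SD = 10 in both.
# Means of 65/60 and 70/60 are hypothetical; only the gap matters.
print(round(di_from_summary(65, 10, 10, 60, 10, 10), 2))  # 1.12
print(round(di_from_summary(70, 10, 10, 60, 10, 10), 2))  # 2.24

# Same five-mark gap with 40 students per group: quadrupling n doubles the DI.
print(round(di_from_summary(65, 10, 40, 60, 10, 40), 2))  # 2.24

# Reading a DI back in marks for a given group size and spread.
print(round(marks_gap_from_di(1.12, 10, 10, 10, 10), 1))  # 5.0
```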
To our knowledge, there is no measure that is systematically being used to operationalise the ethnic awarding gap at a level more granular than final degree outcome. We argue that this limits universities’ ability to understand the awarding gap and to identify what can be done to address it at an institutional level. We believe that the DI offers a solution. It can be used by researchers, as we have done, to investigate which curriculum-related factors may be contributing to the awarding gap. Those who teach can use the DI to explore module mark differences in their programmes, both within and between cohorts. Those with access to institutional-level data can investigate these trends on an even larger scale, for instance to explore whether there are particular areas or time points where modules have consistently high or low (or negative) DIs. The impact of the pandemic can also be explored in this way, for example by comparing data from student cohorts either side of Covid-19. Further, while this article has discussed the DI in the context of the ethnic awarding gap, it can also be used to compare students grouped in other ways. It is important that institutions use the data they already have to understand awarding discrepancies in a clear and sufficiently granular way, and the DI would help accomplish this.
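As a sketch of the institutional monitoring described above, the code below computes a DI for each module and cohort, with one baseline group as group A, then lists modules whose DI exceeds a threshold in every cohort. Everything here is an assumption for illustration: the Welch form of the DI, the data layout keyed by (module, cohort), the invented marks, and the cut-off of 2.0, which is arbitrary. Because the DI grows with group size, comparisons across modules of very different sizes call for care.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def difference_index(group_a, group_b):
    """ASSUMPTION: Welch-form t-statistic used as the DI."""
    se = sqrt(stdev(group_a) ** 2 / len(group_a) + stdev(group_b) ** 2 / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical marks keyed by (module, cohort); invented for illustration only.
marks = {
    ("M1", 2019): {"White": [70, 66, 74, 68], "Black": [60, 58, 65, 61]},
    ("M1", 2020): {"White": [69, 72, 65, 70], "Black": [62, 59, 63, 60]},
    ("M2", 2019): {"White": [64, 60, 67, 63], "Black": [63, 61, 66, 62]},
    ("M2", 2020): {"White": [61, 66, 63, 65], "Black": [64, 60, 62, 66]},
}


def di_by_module(marks, group_a, group_b):
    """Return {module: {cohort: DI}}, with group_a always positioned as group A."""
    table = defaultdict(dict)
    for (module, cohort), groups in marks.items():
        table[module][cohort] = difference_index(groups[group_a], groups[group_b])
    return dict(table)


def consistently_high(table, threshold):
    """Modules whose DI exceeds the (arbitrary) threshold in every cohort observed."""
    return sorted(
        module
        for module, by_cohort in table.items()
        if all(di > threshold for di in by_cohort.values())
    )


table = di_by_module(marks, "White", "Black")
for module, by_cohort in sorted(table.items()):
    print(module, {cohort: round(di, 2) for cohort, di in sorted(by_cohort.items())})
print(consistently_high(table, threshold=2.0))  # ['M1']
```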
Kat Kwok is an Educational Researcher at the Oxford Centre for Academic Enhancement and Development at Oxford Brookes University. She is interested in using quantitative methods to investigate race and gender disparity in higher education, and the student experience. Kat recently started her PhD at Coventry University, where she will use a mixed-methods approach to investigate the relationships between feedback and student characteristics, and the impact of feedback.
Dr Siân Alsop is Research Fellow in the Centre for Global Learning at Coventry University. She is a corpus linguist whose research areas include attainment disparities in higher education, feedback, and the language of lectures. Siân was previously a Lecturer in Academic Writing and has worked on a number of projects relating to academic discourse, including the development of the British Academic Written English (BAWE) corpus and the Engineering Lecture Corpus (ELC).
May 5, 2022 at 5:02 pm
I really can’t see what is new here: the proposed DI is just the t-statistic. The authors state that the DI is not as intuitive as a simple difference in means. So why not just present the two means (or their difference) and use the t-test to compare them statistically, rather than present the unintuitive DI measure?
Furthermore, there are a number of assumptions that must be satisfied before the t-statistic (or DI) is valid, which the authors have not considered. The two groups should have equal variance before the pooled variance can be used. The data in each group should be normally distributed. The two groups should be independent, and the authors need to consider whether the mean is the most appropriate statistic to compare. In the example provided, the authors suggest using the DI on a sample of two groups with 10 students. In my opinion it would not be appropriate to use the DI (t-statistic) on such a small sample, as the data will not be normally distributed and the mean may not be the most appropriate summary measure.
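The commenter’s point about pooled variance can be checked numerically. The sketch below compares Student’s pooled-variance t-statistic with Welch’s unequal-variance version using hypothetical summary figures: the two coincide when group sizes are equal, but can diverge when both sizes and spreads differ, as may happen when one group is much smaller than the baseline. The Welch-Satterthwaite degrees of freedom are included because a Welch test would use them in place of n_A + n_B − 2.

```python
from math import isclose, sqrt


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """t-statistic without assuming equal variances."""
    return (mean_a - mean_b) / sqrt(sd_a**2 / n_a + sd_b**2 / n_b)


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Student's t-statistic, which assumes both groups share one variance."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / sqrt(pooled_var * (1 / n_a + 1 / n_b))


def welch_df(sd_a, n_a, sd_b, n_b):
    """Welch-Satterthwaite approximation to the degrees of freedom."""
    va, vb = sd_a**2 / n_a, sd_b**2 / n_b
    return (va + vb) ** 2 / (va**2 / (n_a - 1) + vb**2 / (n_b - 1))


# Equal group sizes (hypothetical figures): identical even with unequal SDs.
print(isclose(welch_t(65, 8, 10, 60, 14, 10), pooled_t(65, 8, 10, 60, 14, 10)))  # True

# Unequal sizes and SDs (hypothetical figures): the statistics diverge.
print(round(welch_t(65, 8, 120, 60, 14, 15), 2), round(pooled_t(65, 8, 120, 60, 14, 15), 2))
print(round(welch_df(8, 120, 14, 15), 1))
```

When the smaller group also has the larger spread, as in the second case, the pooled statistic overstates the gap relative to Welch’s, so reporting which form was used matters.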
When reporting attainment gaps between groups, I would recommend presenting the means or medians for each group and using the appropriate statistical test (t-test or Mann-Whitney for two groups, and ANOVA or Kruskal-Wallis for multiple groups) to assess whether the difference between the groups is statistically significant. Multiple regression methods could then be used to establish the independent impact of each variable, having accounted for possible confounders.
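The Mann-Whitney approach mentioned above works on ranks rather than means. This stdlib-only sketch computes the U statistic for group A, using average ranks for ties, plus U/(n_A·n_B), the proportion of cross-group pairs in which the group A student scores higher (ties counting as half). It stops short of a p-value; for testing, an established implementation such as `scipy.stats.mannwhitneyu` is the safer choice. The marks are invented for illustration.

```python
def midranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j (0-based) share ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """U for group A: pairs (a, b) with a > b, ties counting as 1/2."""
    n_a = len(group_a)
    ranks = midranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    return rank_sum_a - n_a * (n_a + 1) / 2


# Hypothetical module marks, invented for illustration.
group_a = [70, 65, 65, 80]
group_b = [60, 65, 72]
u = mann_whitney_u(group_a, group_b)
print(u, u / (len(group_a) * len(group_b)))  # 8.0 0.6666666666666666
```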