
The potential of automated text analysis for higher education research

by Stijn Daenekindt

Together with Jeroen Huisman, I recently published an article in which we mapped the field of research on higher education. In a previous blogpost we reflected on some key findings, but only briefly mentioned the method we used to analyze the abstracts of 16,928 research articles (which totals to over 2 million words). Obviously we did not read all these texts ourselves. Instead, we applied automated text analysis. In the current blogpost, I will discuss this method to highlight its potential for higher education research.

Automated text analysis holds tremendous potential for research into higher education. This is because higher education institutions, ie our research subjects, ‘live’ in a world that is dominated by the written word. Much of what happens in and around higher education institutions eventually gets documented. Indeed, higher education institutions produce an enormous amount and variety of texts, eg grant proposals, peer reviews and rejection letters, academic articles and books, course descriptions, mission statements, commission reports, evaluations of departments and universities, policy reports, etc. Obviously, higher education researchers are aware of the value of these documents, and they have offered many insightful case studies based on close reading of them. However, for some types of research questions, analysing a small sample of texts just doesn’t do the job. When we want to analyse amounts of text data that are unfeasible for close reading by humans, automated text analysis can help us.

There are various forms of automated text analysis. One of the most popular techniques is topic modelling. This machine learning technique automatically extracts clusters of words (ie topics) by analysing patterns of word co-occurrence in documents to reveal latent themes. Two basic principles underlie a topic model. The first is that each document consists of a mixture of topics. So, imagine that we have a topic model that differentiates two topics: document A could consist of 20% topic 1 and 80% topic 2, while document B might consist of 50% topic 1 and 50% topic 2. The second principle of topic modelling is that every topic is a mixture of words. Imagine that we fit a topic model on every edition of a newspaper over the last ten years. A first possible topic could include words such as ‘goal’, ‘score’, ‘match’, ‘competition’ and ‘injury’. A second topic, then, could include words such as ‘stock’, ‘dow_jones’, ‘investment’, ‘stock_market’ and ‘wall_street’. The model can identify these clusters of words because they often co-occur in texts. That is, the word ‘goal’ is far more likely to co-occur with the word ‘match’ in a document than with the word ‘dow_jones’.
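
To make these two principles concrete, here is a minimal sketch in Python. It uses scikit-learn’s LDA implementation on an invented toy corpus; the corpus, the library and the choice of two topics are illustrative assumptions on my part, not the setup used in our article.

```python
# Illustrative only: a toy corpus and a two-topic model, not the
# 16,928-abstract analysis from the article.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "goal score match competition injury",                   # 'sports' words
    "stock investment stock_market wall_street dow_jones",   # 'finance' words
    "goal match score stock investment",                     # a mixed document
]

# Represent each document as a bag of word counts.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)

# Fit a topic model that differentiates two topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)

# Principle 1: each row is one document's mixture over the two topics
# (the exact percentages depend on the fit).
print(doc_topics.round(2))
```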

Topic models allow us to reveal the structure of large amounts of textual data by identifying topics. A topic is basically a set of words; more formally, it is expressed as a set of word probabilities. To learn what a latent theme is about, we can order all the words in decreasing probability. The two illustrative topics (see the previous paragraph) clearly deal with the general themes ‘sports’ and ‘financial investments’. In this way, what topic models do with texts closely resembles what exploratory factor analysis does with survey data, ie revealing latent dimensions that structure the data. But how is the model able to find interpretable topics? As David Blei explains, and this may help to develop a more intuitive understanding of the method, topic models trade off two goals: (a) the model tries to assign the words of each document to as few topics as possible, and (b) the model tries, in each topic, to assign high probability to as few words as possible. These goals are at odds. If the model allocates all the words of one document to one single topic, then (b) becomes unrealistic. If, on the other hand, every topic consists of just a few words, then (a) becomes unrealistic, because each document will need many topics to cover its words. It is by trading off both goals that the topic model is able to find interpretable sets of tightly co-occurring words.
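
Continuing the sketch above (it reuses the fitted `lda` and `vectorizer` objects from that snippet), the lines below illustrate the interpretation step: each topic is a distribution over the vocabulary, and ordering the words in decreasing probability reveals what the latent theme is about. The normalisation of scikit-learn’s raw topic weights into probabilities is part of the illustration.

```python
import numpy as np

# Principle 2: each topic is a distribution over the whole vocabulary.
vocab = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    probs = weights / weights.sum()      # normalise weights to word probabilities
    top = np.argsort(probs)[::-1][:5]    # indices of the five most probable words
    print(f"topic {k}:", [vocab[i] for i in top])
```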

Topic models focus on the co-occurrence of words in texts. That is, they model the probability that a word co-occurs with another word anywhere in a document. To the model, it does not matter whether ‘score’ and ‘match’ are used in the same sentence, or whether one appears at the beginning of the document and the other at the end. This places topic modelling in the larger group of ‘bag-of-words approaches’, a group of methods that treat documents as … well … bags of words. Ignoring word order is a way to simplify and reduce the text, which yields various nice statistical properties. On the other hand, this approach may result in the loss of meaning. For example, the sentences ‘I love teaching, but I hate grading papers’ and ‘I hate teaching, but I love grading papers’ obviously have different meanings, but this difference is invisible to bag-of-words techniques.
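
A small sketch of what ‘bag of words’ means in practice: the two sentences from the example above receive exactly the same representation once word order is thrown away. Here I again use scikit-learn as an assumed stand-in for whatever preprocessing pipeline one prefers.

```python
from sklearn.feature_extraction.text import CountVectorizer

sentences = [
    "I love teaching, but I hate grading papers",
    "I hate teaching, but I love grading papers",
]

# Each sentence becomes a vector of word counts; word position is ignored.
vectorizer = CountVectorizer()
bags = vectorizer.fit_transform(sentences).toarray()

print(vectorizer.get_feature_names_out())
print(bags)
print((bags[0] == bags[1]).all())  # True: the opposite meanings are invisible
```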

So, while bag-of-words techniques are very useful for classifying texts and understanding what the texts are about, the results will not tell us much about how topics are discussed. Other methods from the larger family of automated text analysis are better equipped for this. For example, sentiment analysis allows one to analyse opinions, evaluations and emotions. Another method, word embedding, focuses on the context in which a word is embedded. More specifically, the method finds words that share similar contexts. By subsequently inspecting a word’s nearest neighbours, ie the words that often occur close to our word of interest, we get an idea of what that word means in the text. These are just a few examples of the wide range of existing methods of automated text analysis, and each of them has its pros and cons. Choosing between them ultimately comes down to finding the optimal match between a research question and a specific method.
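
As an illustration of the word-embedding idea, the sketch below trains a tiny Word2Vec model with gensim and asks for a word’s nearest neighbours. The corpus is a made-up stand-in (real applications need large collections of text before the neighbours become meaningful), and the parameter names follow gensim 4.x.

```python
from gensim.models import Word2Vec

# A made-up micro-corpus; real use requires far more text.
corpus = [
    ["the", "striker", "scored", "a", "goal", "in", "the", "match"],
    ["the", "keeper", "conceded", "a", "goal", "late", "in", "the", "match"],
    ["investors", "followed", "the", "stock_market", "on", "wall_street"],
    ["the", "dow_jones", "fell", "as", "the", "stock_market", "closed"],
]

# Learn a vector for each word from the contexts in which it appears.
model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, seed=0)

# Inspect a word's nearest neighbours to get an idea of its meaning.
print(model.wv.most_similar("goal", topn=3))
```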

More collections of electronic text become available every day. These massive collections present great opportunities for research on higher education, but at the same time they present us with a problem: how can we analyse them? Methods of automated text analysis can help us understand these large collections of documents. These techniques, however, do not replace humans and close reading. Rather, as aptly phrased by Justin Grimmer and Brandon Stewart, these methods are ‘best thought of as amplifying and augmenting careful reading and thoughtful analysis’. When automated text analysis is used in this way, the opportunities are endless, and I hope to see higher education researchers embrace them (more) in the future.

Stijn Daenekindt is a Postdoctoral Researcher at Ghent University (Department of Sociology). He has a background in sociology and in statistics and has published in various fields of research. Currently, he works at the Centre for Higher Education Governance Ghent. You can find an overview of his work at his Google Scholar page.



The (future) state of higher education research?

by Stijn Daenekindt and Jeroen Huisman

Parallel to the exponential growth of research on higher education, we see an increasing number of scientific contributions aiming to take stock of our field of research. Such stock-taking activities range from reflective and possibly somewhat impressionistic thoughts of seasoned scholars to in-depth reviews of salient higher education themes. Technological advancements (such as easy electronic access to research output and an increasingly broader set of analytical tools) obviously have made life easier for analysts. We recently embarked upon a project to explore the thematic diversity in the field of research in higher education. The results have recently been published in Higher Education. Our aim was to thematically map the field of research on higher education and to analyse how our field has evolved over time.

For this endeavour, we wanted our analysis to be large-scale. We aimed at including a number of articles that would do justice to the presumed variety in research into higher education. We did not, however, want the scale of our analysis to jeopardize the depth of our analysis. Therefore, we decided not to limit our analyses to, for example, an analysis of citation patterns or of keywords. Finally, to forestall bias (stemming from our personal knowledge about and experience in the field), we applied an inductive approach. These criteria led us to collect 16,928 journal articles on higher education published between 1991 and 2018 and to analyse each article’s abstract by applying topic modelling. Topic modelling is a method of automated text analysis and a follow-up blogpost (also on srheblog.com) will address the method. For now, it suffices to know that topic modelling is a machine learning technique that automatically analyses the co-occurrence of words to detect themes/topics and to find structure in a large collection of text.

In this blogpost, we present a glimpse of our findings and some additional thoughts for further discussion. In our analysis, we differentiate 31 research topics which inductively emerged from the data. For example, we found topics dealing with university ranking and performance, sustainability, substance use of college students, research ethics, etc. The bulk of these research topics were studied at the individual level (16 topics), with far fewer at the organisational (5) and system level (3). A final set of topics related either clearly to disciplines (eg teaching psychology) or to more generic themes (methods, academic writing, ethics). This evidences the richness of research into higher education. Indeed, our field of research certainly is not limited in terms of perspectives and unleashes “the whole shebang” of possible perspectives to gain new insights into higher education.

The existence of different perspectives also carries potential dangers, however. Studies applying a certain approach to higher education (say, a system-level approach) may suffer from tunnel vision and lose sight of individual- and organisation-level aspects of higher education. This is problematic because processes at the different levels are obviously related to one another. In our analysis we find that studies indeed tend to focus on one level. For example, system-level topics tend to be combined exclusively with other system-level topics. This should not come as a big surprise, but there is potential danger in it, and it may hamper the development of a more integrated field of research on higher education.

In our analysis, we also find a certain reluctance to combine topics located at the same level. For example, topics on teaching practices are very rarely combined with topics on racial and ethnic minorities, even though both topics are situated at the individual level. To us, this was surprising, as the combination of ethnicity and educational experiences is a blossoming field in the sociology of education. That topics at the same level are only rarely combined is less understandable than the fact that topics at different levels are rarely combined. We hope that our analysis helps other researchers to identify gaps in the literature and motivates them to address these gaps.

A second finding we wish to address here relates to specialisation. Our analysis suggests a trend of specialisation in our field of research: when we look at the number of topics combined in articles, we see that topic diversity declines over time. On the one hand, this is not that surprising. Back in 1962, Kuhn already argued that the system of modern science encourages researchers towards further specialisation. So it makes sense that, over time and parallel to the growth of the field of research on higher education, researchers specialise more and demarcate their own topics of expertise. On the other hand, it may be considered a problematic evolution, as it can hamper the development of our field towards further maturity.

But what should we think of the balance between healthy expansion and specialisation, on the one hand, and inefficient fragmentation, on the other? We lean towards evaluating the current state of higher education research as moving towards fragmentation. Other researchers, such as Malcolm Tight, Bruce Macfarlane and Sue Clegg, have similarly lamented the fragmented nature of our field of research. Our analysis adds to this by showing the trends over time: we observe more specialisation (not necessarily bad), but there are also signs of disintegration over time (not good). Other analyses we are currently carrying out also indicate thematic disintegration and suggest clear methodological boundaries. It looks like many researchers focusing on the same topic remain in their “comfort zone” and use a limited set of methods. For sure, many methodological choices are functional (as in fit-for-purpose), but the lack of diversity is striking. Moreover, we see that many higher education researchers stick to rather traditional techniques (surveys, interviews, case studies) and that new methods hardly get picked up in our field. A final observation is that we hardly see methodological debates in our field. In related disciplines we often see healthy methodological discussions that improve the available “toolkit”. In our field, scholars appear to shy away from such discussions, which suggests methodological conservatism and/or methodological tunnel vision.

There are still many things to investigate before we arrive at a full assessment of the state of the art. One important question is how our field compares to other fields or disciplines. But if we accept the idea of fragmentation, it is pertinent to start thinking about how to combat it. Reversing this trend is obviously not straightforward, but here are a few ideas. Individual scholars could try to get out of their comfort zone by applying other perspectives to their favourite research object and/or by applying their favourite perspective to new research topics. Relatedly, researchers should be encouraged to use techniques less commonly used in our field and see whether they yield different outcomes (vignettes, experimental designs, network analysis, QCA/fuzzy logic, [auto-]ethnography and, of course, topic models). In addition, journal editors could be more flexible and inclusive in terms of the formats of the submissions they consider. For example, they could explicitly welcome submissions in the format of ‘commentaries/a reply to’. This would stimulate debate, open up the floor for increased cross-fertilisation of research into higher education and, in general, signal the maturity of research into higher education. Finally, there is scope for alternative peer review processes. Currently, only editors (and sometimes peer reviewers, when they see the outcome of a peer review process) gain full insight into the feedback offered by peers. If we made these processes more visible to a broader readership, eg through open peer review (which can still be double-blind), we would gain much more insight into methodological and theoretical debates, which would definitely support the healthy growth of our field.

This post is based on the article: Daenekindt, S and Huisman, J (2020) ‘Mapping the scattered field of research on higher education. A correlated topic model of 17,000 articles, 1991–2018’ Higher Education, 1-17. Stijn Daenekindt is a Postdoctoral Researcher at Ghent University (Department of Sociology). SRHE Fellow Jeroen Huisman is a Full Professor at Ghent University (Department of Sociology).


Weirdos and misfits? I’ve met a few…

By Paul Temple

Perhaps, like me, you’ve had some harmless fun recently in drawing up a mental list of the “weirdos and misfits…with odd skills” you know in university life who might work with Dominic Cummings at Number 10. (In a few cases, I couldn’t decide who I’d feel sorriest for.) Now that Brexit has been “done”, it seems that Cummings plans to “turn the UK into a leading centre for science, putting it at the cutting edge of artificial intelligence, robotics and climate change” and needs some hired help. (This and other quotes come from a Financial Times profile of Cummings of 18/19 January 2020, said to have been fact-checked by its subject.)

The irony here, presumably unintended, would be almost funny if it wasn’t completely maddening. I’d be surprised if you could find a single working research scientist in the country who doesn’t view Brexit, so far as science is concerned, somewhere on a spectrum from “unfortunate” to “utter disaster”. Certainly, if there are any Brexiteer scientists working at UCL they’ve kept a very low profile indeed over the past few years. And now the man who has done as much as anyone to damage UK academic work by destroying our links with European partners calmly tells us that his “new agenda” – sensibly distancing himself from the tedious details of working out a new trade deal with the EU – is to achieve a scientific renaissance.

But Cummings, it seems, is thinking beyond the UK merely becoming better at science than it has so far managed when working collaboratively with European science networks. Cummings, an Oxford ancient and modern history graduate, clearly considers that he possesses the skills to apply science “to understanding and solving public policy problems”. This is probably what most social scientists, if pressed, would say they are trying to do, but I don’t think that the humdrum problems that most of us work on are what Cummings has in mind. Instead, “his inspiration is the US government’s Manhattan Project…[and how] the failing NASA bureaucracy [became] an organisation that could put a man on the moon…[he also plans] to set up a civilian version of the US Defense Advanced Research Projects Agency”. Big, shiny projects are what he wants.

I’ve used the Manhattan Project as a case study in my teaching, and I’ve no doubt that much can be learned from it. It helped that J Robert Oppenheimer was both a world-class physicist and, as it turned out, a world-class project director, who was able to work with a multi-national group of scientific egoists in a collection of army huts in the New Mexico desert and produce the world’s first atomic explosion within 28 months of starting work. But Oppenheimer knew what he had to do, had a fair idea about how to go about it, and could call on all the resources of the world’s scientific and engineering superpower. It doesn’t at all detract from his achievements to say that the Manhattan Project was in a certain sense straightforward compared to, say, improving health care or reducing crime for a large population. Leaving aside resource limitations, knowing “what works” in these and other areas of social policy has a different meaning to knowing “what works” in nuclear engineering or rocket design. Habermas described this difference in terms of “the ideology of technique”. Even defining what “improved health care” might look like will be contested, as will its measures of success. Nobody doubted that they’d know a nuclear explosion when it happened. (Actually, Oppenheimer might have agreed that quantum mechanics and problems in social policy do have something in common: if you think you understand what it is you’re observing, you’ve got it wrong.)

So my guess is that the clever Oxford humanities graduate, with no formal training in either natural or social science, is going to become very frustrated in attempting to apply methods from the former to try to solve complex problems in the domain of the latter. Paradoxically (or maybe not), this puts me in mind of the education research that I had some acquaintance with in the afterlife of the old Soviet Union. There, the necessary assumption was that if enough data were collected, and the precepts of scientific Marxism-Leninism were correctly applied to them, then a definitive solution to whatever the problem was would be found. There had to be a “scientific” answer to every question, if only you did enough work on it. To suggest otherwise would be, literally, unthinkable in a Marxist worldview.

Still, perhaps Cummings will show that answers to problems in big science do in fact read across to social policy: after all, compared to making Brexit the tremendous national success story that we’ve been assured it will be, it should be quite easy.

SRHE member Paul Temple is Honorary Associate Professor, Centre for Higher Education Studies, UCL Institute of Education, University College London. See his latest paper ‘University spaces: Creating cité and place’, London Review of Education, 17(2): 223–235 at https://doi.org/10.18546



Why should we care about comparative higher education?

by Ariane de Gayardon

In contrast to comparative education, whose history dates back to the beginning of the 19th century, comparative higher education is a relatively recent research endeavour, originating in the 1970s and 1980s. This early period gave us the first comparative instruments, still widely used today, as lenses to analyse national higher education systems. These include Clark’s triangle of coordination (1983), Altbach’s use of the concept of centre and periphery (1981) and Trow’s definition of elite, mass and universal systems (1973). Early on, then, comparative higher education proved very successful in increasing our understanding of higher education globally. But what has it accomplished since?

While there are many users of comparative higher education – that is, researchers whose research could be considered comparative – there is still little critical writing on comparative higher education research. The debate is alive, led by individual researchers including Kosmützky, Bleiklie and Välimaa. However, there is little acknowledgement of their efforts by users of comparative research, showing a clear divide between efforts to conceptualise and theorise comparative research in higher education and actual research practice. As a result, the field of comparative higher education lacks rigour, as exemplified by the absence of appropriate rationales for sampling choices – why particular countries are included – in the vast majority of comparative papers (Kosmützky, 2016). This puts comparative higher education at odds with comparative studies in other disciplines, which have focused on the comparative method as a way to establish causality or improve generalisation.

What researchers in comparative higher education have failed to achieve in the past 40 years is to elevate comparative studies in higher education to a (sub-)field of study. An academic field is built on the emergence of two dynamics: an intellectual debate and an institutional structure (Manzon, 2018). The debate around comparative higher education has been focused on proposing conceptual and theoretical frameworks, but it remains marginal. Additionally, questions still to be raised and answered include the objectives and purpose of comparative higher education, as well as what unites researchers undertaking comparative projects. At the same time, there is a lack of academic space for this debate to happen. Comparative higher education lacks specific journals (with the exception of the Journal of Comparative and International Higher Education), societies and associated conferences, and research centres. Unlike comparative education, it has not yet permeated the teaching function of higher education, with an absence of textbooks and dedicated degrees (although some courses do exist). Comparative higher education therefore remains on the margins, a practice of research that is still to be properly understood.

This deficit of reflective and critical thinking on comparative higher education matters. The use of comparative higher education for cross-country comparisons remains essential in understanding higher education systems. It provides unique settings to deepen our knowledge of higher education phenomena through the way they manifest in different environments and in contact with different cultures. This leads to improved theorisation of higher education phenomena that transcends borders, helping to fight assumptions and opening new avenues for conceptualising higher education. Consequently, it helps us understand our own higher education system better, through knowledge of the ‘other’ and combatting “comparative chauvinism” and “comparative humility” (Teichler, 2014). And because comparative higher education is not limited to international comparisons, it provides an opportunity to increase our knowledge of within-system variations through tools to analyse both the local and the global in higher education.

Comparative higher education research is also of tremendous importance for evidence-based policy. Higher education policies are still decided at the country (or state) level in most parts of the globe, which means that comparison is essential to understand the consequences of different policies. Policy evaluation in higher education needs comparative studies, international and historical ones in particular. Understanding higher education policies beyond the national context is also important in a world where policy borrowing and lending is prevalent. Knowledge of how different policies adapt to different environments helps prevent the spread of seemingly successful policies that would have detrimental consequences if translated elsewhere.

Finally, higher education research already evolves in an international context. Higher education stakeholders – students and faculty in particular – are mobile across borders, while knowledge knows no national boundaries. As a result, the vast majority of researchers in higher education have frames of reference that extend beyond their national context. This means that much higher education research might be unintentionally comparative. This is problematic in two ways. First, recognising and understanding the way you do research is important for achieving rigour. Second, researchers might not properly acknowledge their positionality and bias if they do not reflect on what they know and do not know about higher education globally.

After 40 years of existence, it might be time to stop and reflect on comparative higher education research and decide what its mission is. To do so, we can rely on the extensive research and debate in the field of comparative education, as well as a robust literature on comparative studies, which together provide a strong basis for the construction of a field of comparative higher education. This reflection will help strengthen the higher education research done comparatively, leading to a tremendous increase in our knowledge of higher education generally.

References

Altbach, PG (1981) ‘The university as center and periphery’ Teachers College Record, 82(4): 601-621

Clark, B (1983) The higher education system: Academic organization in cross-national perspective Berkeley, CA: University of California Press

Kosmützky, A (2016) ‘The precision and rigor of international comparative studies in higher education’ in Theory and Method in Higher Education Research (pp 199-221), Emerald Group Publishing Limited

Manzon, M (2018) ‘Origins and traditions in comparative education: challenging some assumptions’, Comparative Education, 54(1): 1-9

Teichler, U (2014) ‘Opportunities and problems of comparative higher education research: The daily life of research’ Higher Education, 67(4): 393-408

Trow, M (1973) Problems in the transition from elite to mass higher education Berkeley, CA: Carnegie Commission on Higher Education

Ariane de Gayardon is a Senior Research Associate in the Centre for Global Higher Education at the UCL Institute of Education and is Assistant Editor of the Journal of Studies in International Education