The Society for Research into Higher Education


Fair use or copyright infringement? What academic researchers need to know about ChatGPT prompts

by Anita Toh

As scholarly research into, and with, generative AI tools like ChatGPT becomes more prevalent, it is crucial for researchers to understand how copyright, fair use and generative AI intersect in research. While there is much discussion about the copyrightability of generative AI outputs and the legality of generative AI companies’ use of copyrighted material as training data (Lucchi, 2023), there has been relatively little discussion about copyright in relation to user prompts. In this post, I share an interesting discovery about the use of copyrighted material in ChatGPT prompts.

Imagine a situation where a researcher wishes to conduct a content analysis on specific YouTube videos for academic research. Does the researcher need to obtain permission from YouTube or the content creators to use these videos?

As per YouTube’s guidelines, researchers do not require explicit copyright permission if they are using the content for “commentary, criticism, research, teaching, or news reporting,” as these activities fall under the umbrella of fair use (Fair Use on YouTube – YouTube Help, 2023).

What about this scenario? A researcher wants to compare the types of questions posed by investors on the reality television series, Shark Tank, with questions generated by ChatGPT as it roleplays an angel investor. The researcher plans to prompt ChatGPT with a summary of each Shark Tank pitch and ask ChatGPT to roleplay as an angel investor and ask questions. In this case, would the researcher need to obtain permission from Shark Tank or its production company, Sony Pictures Television?

In my exploration, I discovered that it is indeed crucial to obtain permission from Sony Pictures Television. ChatGPT’s terms of service emphasise that users should “refrain from using the service in a manner that infringes upon third-party rights. This explicitly means the input should be devoid of copyrighted content unless sanctioned by the respective author or rights holder” (Fiten & Jacobs, 2023).

I therefore initiated communication with Sony Pictures Television, seeking approval to incorporate Shark Tank videos in my research. However, my request was declined by Sony Pictures Television in California, citing “business and legal reasons”. Undeterred, I approached Sony Pictures Singapore, only to receive a reaffirmation that Sony cannot endorse my proposed use of their copyrighted content “at the present moment”. They emphasised that any use of their copyrighted content must strictly align with the Fair Use doctrine.

This raises the question: why doesn’t the proposed research align with fair use? My initial understanding was that the fair use doctrine allows re-users to use copyrighted material without permission from the rights holders for news reporting, criticism, review, educational and research purposes (Copyright Act 2021 Factsheet, 2022).

In the absence of further responses from Sony Pictures Television, I searched the web for answers.

Two findings emerged which could shed light on Sony’s reservations:

  • ChatGPT’s terms highlight that “user inputs, besides generating corresponding outputs, also serve to augment the service by refining the AI model” (Fiten & Jacobs, 2023; OpenAI Terms of Use, 2023).
  • OpenAI is currently facing legal action from various authors and artists alleging copyright infringement (Milmo, 2023). They contend that OpenAI used their copyrighted content to train ChatGPT without their consent. In addition, the New York Times is contemplating legal action against OpenAI for the same reason (Allyn, 2023).

These revelations point to a potential rationale behind Sony Pictures Television’s reluctance: while use of their copyrighted content for academic research might be considered fair use, introducing this content into ChatGPT could infringe upon the non-commercial stipulations (What Is Fair Use?, 2016) inherent in the fair use doctrine.

In conclusion, the landscape of copyright laws and fair use in relation to generative AI tools is still evolving. While previously researchers could rely on the fair use doctrine for the use of copyrighted material in their research work, the availability of generative AI tools now introduces an additional layer of complexity. This is particularly pertinent when the AI itself might store or use data to refine its own algorithms, which could potentially be considered a violation of the non-commercial use clause in the fair use doctrine. Sony Pictures Television’s reluctance to grant permission for the use of their copyrighted content in association with ChatGPT reflects the caution that content creators and rights holders are exercising in this new frontier. For researchers, this highlights the importance of understanding the terms of use of both the AI tool and the copyrighted material prior to beginning a research project.

Anita Toh is a lecturer at the Centre for English Language Communication (CELC) at the National University of Singapore (NUS). She teaches academic and professional communication skills to undergraduate computing and engineering students.


What do artificial intelligence systems mean for academic practice?

by Mary Davis

I attended and made a presentation at the SRHE Roundtable event ‘What do artificial intelligence systems mean for academic practice?’ on 19 July 2023. The roundtable brought together a wide range of perspectives on artificial intelligence: philosophical questions, problematic results, ethical considerations, the changing face of assessment and practical engagement for learning and teaching. The speakers represented a range of UK HEI contexts, as well as Australia and Spain, and a variety of professional roles including academic integrity leads, lecturers of different disciplines and emeritus professors.

The day began with Ron Barnett’s fierce defence of the value of authorship and the concerns about what it means to be a writer in a Chatbot world. Ron argued that use of AI tools can lead to an erosion of trust; the essential trust relationship between writer and reader in HE and wider social contexts such as law may disintegrate and with it, society. Ron reminded us of the pain and struggle of writing and creating an authorial voice that is necessary for human writing. He urged us to think about the frameworks of learning such as ‘deep learning’ (Ramsden), agency and internal story-making (Archer) and his own ‘Will to Learn’, all of which could be lost. His arguments challenged us to reflect on the far-reaching social consequences of AI use and opened the day of debate very powerfully.

I then presented the advice I have been giving to students at my institution, drawing on my analysis of student declarations of AI use. I had categorised these declarations using a traffic light system: appropriate use (eg checking and fixing a text before submission); at risk use (eg paraphrasing and summarising); and inappropriate use (eg using assignment briefs as prompts and submitting the output as own work). I got some helpful feedback from the audience that the traffic lights provided useful navigation for students. Coincidentally, the next speaker, Angela Brew, also used a traffic light system to guide students with AI. She argued for the need to help students develop a scholarly mindset, and for staff to stop teaching as in the 18th century, with universities as foundations of knowledge. Instead, she proposed that everyone at university should be a discoverer, a learner and producer of knowledge, as a response to AI use.

Stergios Aidinlis provided an intriguing insight into practical use of AI as part of a law degree. In his view, generative AI can be an opportunity to make assessment currently fit for purpose. He presented a three-stage model of learning with AI: stage 1, using AI to produce a project pre-mortem to tackle a legal problem as pre-class preparation; stage 2, using AI as a mentor to help students solve a legal problem in class; and stage 3, using AI to evaluate the technology after class. Stergios recommended Mollick and Mollick (2023) for ideas to help students learn to use AI. His presentation stood out in terms of practical ideas and made me think about whether suitable AI tools are available for all students to be able to do tasks like this.

The next session by Richard Davies, one of the roundtable convenors, took a philosophical direction in considering what a ‘student’s own work’ actually means, and how we assess a student’s contribution. David Boud returned the theme to assessment and argued that three elements are always necessary: assuring learning outcomes have been met (summative assessment), enabling students to use information to aid learning (formative assessment) and building students’ capacity to evaluate their learning (sustainable assessment). He argued for a major re-design of assessment, that still incorporates these elements but avoids tasks that are no longer viable.

Liz Newton presented guidance for students which emphasised positive ways to use AI, such as using it for planning or teaching, which concurred with my session. Maria Burke argued for ethical approaches to the use of AI that incorporate transparency, accountability, fairness and regulation, and promote critical thinking within an AI context. Finally, Tania Alonso presented her ChatGPTeaching project with seven student rules for use of ChatGPT, such as proposing use only for areas of the student’s own knowledge.

The roundtable discussion was lively and our varied perspectives and experiences added a lot to the debate; I believe we all came away with new insights and ideas. I especially appreciated the opportunity to look at AI from practical and philosophical viewpoints. I am looking forward to the ongoing sessions and forum discussions. Thanks very much to SRHE for organising this event.

Dr Mary Davis is Academic Integrity Lead and Principal Lecturer (Education and Student Experience) at Oxford Brookes University. She has been a researcher of academic integrity since 2005 and has carried out extensive research on plagiarism, use of text-matching tools, the development of source use, proofreading and educational responses to academic conduct issues. Her recent research has focused on inclusion in academic integrity. She is on the Board of Directors of the International Center for Academic Integrity and co-chair of the International Day of Action for Academic Integrity.


Understanding the value of EdTech in higher education

by Morten Hansen

This blog is a re-post of an article first published on It is based on a presentation to the 2021 SRHE Research Conference, as part of a Symposium on Universities and Unicorns: Building Digital Assets in the Higher Education Industry organised by the project’s principal investigator, Janja Komljenovic (Lancaster). The support of the Economic and Social Research Council (ESRC) is gratefully acknowledged. The project introduces new ways to think about and examine the digitalising of the higher education sector. It investigates new forms of value creation and suggests that value in the sector increasingly lies in the creation of digital assets.

EdTech companies are, on average, priced modestly, although some have earned strong valuations. We know that valuation practices normally reflect investors’ belief in a company’s ability to make money in the future. We are, however, still learning about how EdTech generates value for users, and how to take account of such value in the grand scheme of things.

Valuation and deployment of user-generated data

EdTech companies are not competing with the likes of Google and Facebook for advertisement revenue. That is why phrases such as ‘you are the product’ and ‘data is the new oil’ yield little insight when applied to EdTech. For EdTech companies, strong valuations hinge on the idea that technology can bring use value to learners, teachers and organisations – and that they will eventually be willing to pay for such benefits, ideally in the form of a subscription. EdTech companies try to deliver use value in multiple ways, such as deploying user-generated data to improve their services. User-generated data are the digital traces we leave when engaging with a platform: keyboard strokes and mouse movements, clicks and inactivity.

The value of user-generated data in higher education

The gold standard for unlocking the ‘value’ of user-generated data is to bring about an activity that could otherwise not have arisen. Change is brought about through data feedback loops. Loops consist of five stages: data generation, capture, anonymisation, computation and intervention. Loops can be long and short.

For example, imagine that a group of students is assigned three readings for class. Texts are accessed and read on an online platform. Engagement data indicate that all students spent time reading text 1 and text 2, but nobody read text 3. As a result of this insight, come next semester, text 3 is replaced by a more ‘engaging’ text. That is a long feedback loop.
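The long loop in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the student names, reading times, function names and the replacement text are all hypothetical, and the hashing step is a simple stand-in for whatever anonymisation a real platform would apply.

```python
import hashlib

READING_LIST = ["text 1", "text 2", "text 3"]

def generate_events():
    # Stage 1, data generation: hypothetical reading sessions (minutes spent).
    return [
        {"student": "alice", "text": "text 1", "minutes": 25},
        {"student": "alice", "text": "text 2", "minutes": 18},
        {"student": "bob", "text": "text 1", "minutes": 30},
        {"student": "bob", "text": "text 2", "minutes": 12},
    ]

def capture(events):
    # Stage 2, capture: keep only events that record some reading time.
    return [e for e in events if e["minutes"] > 0]

def anonymise(events):
    # Stage 3, anonymisation: replace names with a short one-way hash.
    # (Assumption: a stand-in only; real anonymisation is more involved.)
    return [
        {**e, "student": hashlib.sha256(e["student"].encode()).hexdigest()[:8]}
        for e in events
    ]

def compute_engagement(events, reading_list):
    # Stage 4, computation: total minutes spent on each assigned text.
    totals = {text: 0 for text in reading_list}
    for e in events:
        totals[e["text"]] += e["minutes"]
    return totals

def intervene(reading_list, totals, replacement):
    # Stage 5, intervention: swap out any text that nobody read.
    return [replacement if totals[t] == 0 else t for t in reading_list]

events = anonymise(capture(generate_events()))
totals = compute_engagement(events, READING_LIST)
next_semester = intervene(READING_LIST, totals, "a more engaging text")
print(totals)
print(next_semester)
```

Here the intervention only takes effect the following semester, which is what makes the loop long: the five stages run once per course cycle rather than once per reading session.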

Now, imagine that one student is reading one text. The platform’s machine learning programme generates a rudimentary quiz to test comprehension. Based on the student’s answers, further readings are suggested or the student is encouraged to re-read specific sections of the text. That is a short feedback loop.

In reality, most feedback loops do not bring about activity that could not have happened otherwise. A professor could, after all, learn through conversation which texts are better liked by students, what points are comprehended, and so on. What is true, though, is that the basis and quality of such judgments shift. Most importantly, so does the cost structure that underpins judgment.

The more automated feedback loops are, the greater the economy of scale. ‘Automation’ refers to the decoupling of additional feedback loops from additional labour inputs. ‘Economies of scale’ means that the average cost of delivering feedback loops decreases as the company grows.
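A small worked example makes the arithmetic concrete. It assumes a simple cost model that is not from the project itself: a fixed cost for building the automated loop plus a small marginal cost per additional user. The function name and all figures are hypothetical.

```python
def average_cost(fixed_cost, marginal_cost, users):
    # Average cost per user = (fixed cost + marginal cost * users) / users.
    # Assumption: a simple linear cost model, used purely for illustration.
    return (fixed_cost + marginal_cost * users) / users

# Hypothetical figures: 100,000 to build the loop, 0.50 per additional user.
small_platform = average_cost(100_000, 0.5, 1_000)
large_platform = average_cost(100_000, 0.5, 100_000)
print(small_platform)
print(large_platform)
```

The more automated the loop, the smaller the marginal cost per user, and the faster the average cost falls towards it as the user base grows.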

Proponents of machine learning and other artificial intelligence approaches argue that the use value of feedback loops improves with scale: the more users engage in the back-and-forth between generating data, receiving intervention and generating new data, the more precise the underlying learning algorithms become in predicting what interventions will ‘improve learning’.

The platform learns and grows with us

EdTech platforms proliferate because they are seen to deliver better value for money than the human-centred alternative. Cloud-based platforms are accessed through subscriptions without transfer of ownership. The economic relationship is underwritten by law and continued payment is legitimated through the feedback loops between humans and machines: the platform learns and grows with us, as we feed it.

Machine learning techniques certainly have the potential to improve the efficiency with which we organise certain learning activities, such as particular types of student assessment and monitoring. However, we do not know which values to mobilise when judging intervention efficacy: ‘value’ and ‘values’ are different things.

In everyday talk, we speak about ‘value’ when we want to justify or critique a state of affairs that has a price: is the price right, too low, or too high? We may disagree on the price, but we do agree that something is for sale. At other times we reject the idea that a thing should be for sale, like a family heirloom, love or education. If people tell us otherwise, we question their values. This is because values are about relationships and politics.

When we ask about the values of EdTech in higher education, we are really asking: what type of relations do we think are virtuous and appropriate for the institution? What relationships are we forging and replacing between machines and people, and between people and people?

When it comes to the application of personal technology we have valued convenience, personalisation and seamlessness by forging very intimate but easily forgettable machine-human relations. This could happen in the EdTech space as well. Speech-to-text recognition, natural language processing and machine vision are examples of how bonds can be built between humans and computers, aiding feedback loops by making worlds of learning computable.

Deciding on which learning relations to make computable, I argue, should be driven by values. Instead of seeing EdTech as a silver bullet that simply drives learning outcomes, it is more useful to think of it as technology that mediates learning relations and processes: what relationships do we value as important for students, and when is technology helpful and unhelpful in establishing those? In this way, values can help guide the way we account for the value of EdTech.

Morten Hansen is a research associate on the Universities and Unicorns project at Lancaster University, and a PhD student at the Faculty of Education, University of Cambridge, United Kingdom. Hansen specialises in education markets and has previously worked as a researcher at the Saïd Business School in Oxford.