Real World Data Science - Why large language models should come with a content warning
Real World Data Science
News and views
Why large language models should come with a content warning
The outputs of LLMs seem impressive, but users need to be wary of possible bias, plagiarism and model ‘hallucinations’
Report an issue
Anyone who has ever been set a writing task will probably have wished at some point that somebody else could write it for them. As a journalist of 20-plus years, the thought has certainly crossed my mind more than a few times. Which probably explains why a recent headline in Nature caught my attention: “Could AI help you to write your next paper?”
The article, by Matthew Hutson, looks at how researchers are using artificial intelligence (AI) tools built on large language models (LMMs) as “assistants”. Starting with a prompt, such as “Write a headline for a blog post about large language models being used by academic researchers as research assistants”, an LLM will produce a text output. For example, using the same prompt with OpenAI’s GPT-3 , I got:
“New research assistants: large language models”
Asked to “Write a headline for a blog post that critiques academic researchers’ use of large language models as research assistants”, GPT-3 produced:
“Too much reliance on large language models can hinder research”
And when I asked “Why can too much reliance on large language models hinder research?”, GPT-3 wrote:
“If academic researchers rely too heavily on large language models as research assistants, they may become reliant on the models and may not be able to conduct research without them. This can hinder research because the researchers may not be able to think for themselves and may not be able to come up with new ideas.”
A fair point, I suppose. But I sense there’s more to this story, and rather than continue quizzing GPT-3, I sat down with Detlef Nauck, a member of the Real World Data Science Editorial Board and head of AI and data science research for BT’s Applied Research Division, to ask a few more questions.
Thanks for joining me today, Detlef. To start, could you give a brief overview of these large language models, what they are, and how they work?
Detlef Nauck (DN): Essentially, LLMs match sequences to sequences. Language is treated as a sequence of patterns, and this is based on word context similarity. The way these things work is that they either reuse or create a word vector space, where a word is mapped to something like a 300-dimensional vector based on the context it’s normally found in. In these vector spaces, words like “king” and “queen”, for example, would be very similar to each other, because they appear in similar contexts in the written texts that are used to train these models. Based on this, LLMs can produce coherent sequences of words.
But the drawback of this approach is that these models have bias, because they are trained with biased language. If you talk about “women”, for example, and you look at which job roles are similar to “women” in a vector space, you find the stereotypically “female” professions but not technical professions, and that is a problem. Let’s say you take the word vector for “man” and the word vector for “king”, and you subtract “man” and then add this to “woman”, then you end up with “queen”. But if you do the same with “man”, “computer scientist”, and “woman”, then you end up maybe at “nurse” or “human resources manager” or something. These models embed the typical bias in society that is expressed through language.
The other issue is that LLMs are massive. GPT-3 has something like 75 billion parameters, and it cost millions to train it from scratch. It’s not energy efficient at all. It’s not sustainable. It’s not something that normal companies can afford. You might need something like a couple of hundred GPUs [graphics processing units] running for a month or so to train an LLM, and this is going to cost millions in cloud environments if you don’t own the hardware yourself. Large tech companies do own the hardware, so for them it’s not a problem. But the carbon that you burn by doing this, you could probably fly around the globe once. So it’s not a sustainable approach to building models.
Also, LLMs are quite expensive to use. If you wanted to use one of these large language models in a contact centre, for example, then you would have to run maybe a few hundred of them in parallel because you get that many requests from customers. But to provide this capacity, the amount of memory needed would be massive, so it is probably still cheaper to use humans – with the added benefit that humans actually understand questions and know what they are talking about.