#141 – Richard Ngo on large language models, OpenAI, and striving to make the future go well

Large language models like GPT-3, and now ChatGPT, are neural networks trained on a large fraction of all text available on the internet to do one thing: predict the next word in a passage. This simple technique has led to something extraordinary — black boxes able to write TV scripts, explain jokes, produce satirical poetry, answer common factual questions, argue sensibly for political positions, and more. Every month their capabilities grow.
But do they really ‘understand’ what they’re saying, or do they just give the illusion of understanding?
Today’s guest, Richard Ngo, thinks that in the most important sense they understand many things. Richard is a researcher at OpenAI — the company that created ChatGPT — who works to foresee where AI advances are going and develop strategies that will keep these models from ‘acting out’ as they become more powerful, are deployed and ultimately given power in society.
One way to think about ‘understanding’ is as a subjective experience. Whether it feels like something to be a large language model is an important question, but one we currently have no way to answer.
However, as Richard explains, another way to think about ‘understanding’ is as a functional matter. If you really understand an idea, you’re able to use it to reason and draw inferences in new situations. And that kind of understanding is observable and testable.
One experiment conducted by AI researchers suggests that language models have some of this kind of understanding.
If you ask any of these models what city the Eiffel Tower is in and what else you might do on a holiday to visit the Eiffel Tower, they will say Paris and suggest visiting the Palace of Versailles and eating a croissant.
One would be forgiven for wondering whether this might all be accomplished merely by memorising word associations in the text the model has been trained on. To investigate this, the researchers found the part of the model that stored the connection between ‘Eiffel Tower’ and ‘Paris,’ and flipped that connection from ‘Paris’ to ‘Rome.’
If the model just associated some words with one another, you might expect this to leave it mistaken about the location of the Eiffel Tower while still answering other questions as before. However, this one flip was enough to switch its answers to many other questions as well. Now if you ask it what else you might visit on a trip to the Eiffel Tower, it will suggest visiting the Colosseum and eating pizza, among other changes.
Another piece of evidence comes from the way models are prompted to give responses to questions. Researchers have found that telling models to talk through problems step by step often significantly improves their performance, which suggests that models are doing something useful with that extra “thinking time”.
Richard argues, based on this and other experiments, that language models are developing sophisticated representations of the world which can be manipulated to draw sensible conclusions — maybe not so different from what happens in the human mind. And experiments have found that, as models get more parameters and are trained on more data, these types of capabilities consistently improve.
We might feel reluctant to say a computer understands something the way that we do. But if it walks like a duck and it quacks like a duck, we should consider that maybe we have a duck, or at least something sufficiently close to a duck that it doesn’t matter.
In today’s conversation, host Rob Wiblin and Richard discuss the above, as well as:
- Could speeding up AI development be a bad thing?
- The balance between excitement and fear when it comes to AI advances
- Why OpenAI focuses its efforts where it does
- Common misconceptions about machine learning
- How many computer chips it might require to be able to do most of the things humans do
- How Richard understands the ‘alignment problem’ differently from other people
- Why ‘situational awareness’ may be a key concept for understanding the behaviour of AI models
- Which efforts to positively shape the development of AI Richard is and isn’t excited about
- The AGI Safety Fundamentals course that Richard developed to help people learn more about this field
Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript below.
Producer: Keiran Harris
Audio mastering: Milo McGuire and Ben Cordell
Transcriptions: Katy Moore
