I would recommend everyone who has calibrated intuitions about AI timelines spend some time doing stuff with real robots and it will probably … how should I put this? … further calibrate your intuitions in quite a humbling way.
Dactyl is an AI system that can manipulate objects with a human-like robot hand. OpenAI Five is an AI system that can defeat humans at the video game Dota 2. The strange thing is they were both developed using the same general-purpose reinforcement learning algorithm.
How is this possible and what does it show?
In today’s interview Jack Clark, Policy Director at OpenAI, explains that from a computational perspective using a hand and playing Dota 2 are remarkably similar problems.
A robot hand needs to hold an object, move its fingers, and rotate it to the desired position. In Dota 2 you control a team of several different people, moving them around a map to attack an enemy.
Your hand has 20 or 30 different joints to move. The number of main actions in Dota 2 is 10 to 20, as you move your characters around a map.
When you’re rotating an object in your hand, you sense its friction, but you don’t directly perceive the entire shape of the object. In Dota 2, you’re unable to see the entire map and perceive what’s there by moving around, metaphorically ‘touching’ the space.
Read our new in-depth article on becoming an AI policy specialist: The case for building expertise to work on US AI policy, and how to do it
This is true of many apparently distinct problems in life. With the right general-purpose software, different sensory inputs can be compressed down to a fundamental computational problem that we already know how to solve.
OpenAI used an algorithm called Proximal Policy Optimization (PPO), which is fairly robust — in the sense that you can throw it at many different problems, not worry too much about tuning it, and it will do okay.
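For readers who want a concrete picture, PPO's central idea is a clipped objective that discourages any single update from changing the policy too much. The Python below is a minimal sketch of that objective only; the function and argument names are illustrative assumptions, not OpenAI's code, and a full implementation would also need a neural network policy, a value-function loss, and advantage estimation.

```python
import math


def ppo_clipped_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (higher is better).

    new_logp / old_logp: log-probabilities of the taken actions under the
    updated and the data-collecting policy. advantages: how much better each
    action was than expected. All names here are hypothetical.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the new and old policy for this action.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the clipped and unclipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Nothing in this function refers to hands or to Dota 2: it only sees action probabilities and advantage estimates, which is one way to see why the same algorithm can be pointed at both tasks.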
Jack emphasises that this algorithm wasn’t easy to create, and they were incredibly excited about it working on both tasks. But he also says that the creation of such increasingly ‘broad-spectrum’ algorithms has been the story of the last few years, and that the invention of software like PPO will have unpredictable consequences, heightening the huge challenges that already exist in AI policy.
Today’s interview is a mega-AI-policy-quad episode; Jack is joined by his colleagues Amanda Askell and Miles Brundage, on the day they released their fascinating and controversial large general language model GPT-2.
They discuss:
- What are the most significant changes in the AI policy world over the last year or two?
- How much is the field of AI policy still in the phase of just doing research and figuring out what should be done, versus actually trying to change things in the real world?
- What capabilities are likely to develop over the next 5, 10, 15, or 20 years?
- How much should we focus on the next couple of years, versus the next couple of decades?
- How should we approach possible malicious uses of AI?
- What are some of the potential ways OpenAI could make things worse, and how can they be avoided?
- Publication norms for AI research
- Where do we stand in terms of arms races between countries or different AI labs?
- The case for creating a newsletter
- Should the AI community have a closer relationship to the military?
- Working at OpenAI vs. working in the US government
- How valuable is Twitter in the AI policy world?
Rob is then joined by two of his colleagues — Niel Bowerman and Michelle Hutchinson — to quickly discuss:
- The reaction to OpenAI’s release of GPT-2
- Jack’s critique of our US AI policy article
- How valuable are roles in government?
- Where do you start if you want to write content for a specific audience?
Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type 80,000 Hours into your podcasting app. Or read the transcript below.
The 80,000 Hours Podcast is produced by Keiran Harris.