Transcript
Cold open [00:00:00]
Ken Goldberg: A one-dimensional system is you’re just moving along a line, two dimensions is you’re moving in a plane, and then three dimensions is you’re moving in space.
But then for robots, you also have to think about: you have an object in space, and you have its position in space. But then there’s also its orientation in space. That’s three more degrees of freedom. So there’s six degrees of freedom for an object moving in space.
But now if you add all the nuances, let’s say, of a hand with fingers, you now have maybe 20 degrees of freedom. So each time you do that, you think of a higher-dimensional space, and each one of those is much bigger. It grows exponentially with the number of degrees of freedom.
And if you look at language, it’s actually a beautiful example of this, because it’s really linear: it’s one-dimensional. Language is a sequence, a string of words. And the set of words is relatively small: something like 20,000 words are typically used in, say, English. So you’re now looking at all the different combinations of those 20,000 words in a linear sequence. A huge, vast number — but it’s much smaller than the combination of motions that can occur in space. Because those are many infinities greater than the number of sentences that can be put together.
Luisa’s intro [00:01:19]
Luisa Rodriguez: Hi listeners, this is Luisa Rodriguez, one of the hosts of The 80,000 Hours Podcast.
In today’s episode, I speak with roboticist Ken Goldberg about how far away we are from a Jetsons-like future with robot helpers in every aspect of society, and whether the current buzz around robotics is realistic or just wishful thinking.
We talk about:
- What the cutting-edge robots of today can do, and the areas where they still struggle.
- Why it’s actually much easier for machines to learn language than it is for them to learn to manipulate objects in the real world.
- Where we should expect to see robot labour in the coming decades — like warehouses, agriculture, and medicine — and how much human workers should be worried.
- Plus lots more.
As you’ll hear, Ken doesn’t put much weight on AI causing an intelligence explosion, and we didn’t spend our time debating that, because I wanted to learn as much as I could from him about the current state of the art and bottlenecks in robotics. But if you want to hear the case for an intelligence explosion and haven’t listened to it yet, I highly recommend episode #191 – Carl Shulman on the economy and national security after AGI.
All right, without further ado, I bring you Ken Goldberg.
The interview begins [00:02:51]
Luisa Rodriguez: Today I’m speaking with Ken Goldberg. Ken is a professor and the William S. Floyd Jr. Distinguished Chair in Engineering at UC Berkeley, a cofounder and chief scientist at robotics startups Ambi Robotics and Jacobi Robotics, and also a well-known artist. Thanks for coming on the podcast, Ken.
Ken Goldberg: Thank you, Luisa. It’s a pleasure.
General purpose robots and the “robotics bubble” [00:03:11]
Luisa Rodriguez: So I’m hoping to talk about the impact robots will have on society and why we don’t have better robots already. But first, AI seems increasingly on its way to becoming general purpose. Is robotics headed in the same direction? To what extent is it on track to kind of master a bunch of human tasks?
Ken Goldberg: Good. All right. This is something I think is really important, and I feel somewhat of a professional responsibility to provide some perspective on this. Because, like everyone else, I’m watching the news. I’m seeing the announcements by Elon Musk and Jensen Huang from Nvidia, and this sort of huge wave of optimism around robots. And I don’t want to be negative, but I think it’s really important to inject some realism into the conversation.
What I mean by that is that we are making progress. There’s a lot of reasons to be excited about these new developments — in particular, deep learning and generative AI and these large models — but there’s also a number of really deep and fundamental challenges in robotics. My specialty is in robotics, so I can really speak to that and why I feel that some of the expectations are exaggerated, and I worry that there’s a consequence to that — which is, if we’re not careful, these exaggerations will essentially blow up and we’ll have a huge decline. Basically, a bubble will burst, and then there’ll be a lot of negative response. Whiplash, essentially. So I want to put things into perspective.
Luisa Rodriguez: Great. Yeah, I want to ask about the bubble. First, for anyone who doesn’t follow the news as closely, what are the kinds of things — you mentioned Tesla and Nvidia — that are kind of giving the sense of robotics hype?
Ken Goldberg: Well, there’s these various waves of excitement around AI and robotics over the years, and generally, I want to be supportive of those. Obviously, they’re very good for our field, in that they bring a lot of attention and interest and resources for us to do research. So I’m not opposed to those.
But what’s happened is that in the last six months, there’s been a huge surge of interest in humanoids in particular. Jensen Huang went on stage and was surrounded by nine different humanoids and basically said, this is coming and this is the future. And he’s a very trusted authority because he was right about GPUs, and his company is one of the fastest-growing — if not the fastest-growing — and most successful companies in the world. And similarly, Elon Musk has been talking about humanoids for several years. He’s very compelling, and he’s been very successful in space, and he’s made Tesla very successful. So I think people trust them, and they believe that they have a vision of what’s coming.
I think this also meshes with the ongoing and longstanding challenge, which is that science fiction has been talking about and portraying realistic, humanlike robots for generations. So that’s in our sort of subconscious: we’ve seen these things before, so they don’t feel unreal to us, and we’re sort of like, “It seems like it should happen any day. Why hasn’t it happened already?” That’s the big question: “What are we waiting for? What’s taking so long?” And that’s the fundamental question I want to be able to address.
Luisa Rodriguez: Great. OK, so you’re worried about there being a bubble. Can you kind of flesh out the negative consequences if that bubble were to burst? So if there was a bunch of robotics hype, and then in the next decade the hype didn’t play out? And I’m sure the field will advance, but maybe we don’t get the kinds of advances people are expecting — given, for example, the advances in LLMs recently. What would be the cost of that?
Ken Goldberg: So there’s a well known pattern known as the Gartner hype cycle, which basically looks like there’s a big rise of expectations, then there’s a peak of expectations, then there’s a trough of disillusionment where the expectations are not met, and people really end up feeling very frustrated and essentially betrayed that they were led to believe something that doesn’t pan out. And then essentially, they throw the baby out with the bathwater: they say this whole thing is just hype, and there’s nothing there.
And then the field goes into what we would call a “winter season,” and then a lot of funding dries up, interest dries up — and that may last quite a long time until the next cycle starts.
Luisa Rodriguez: Right.
Ken Goldberg: But also, generally what happens is that there’s some renewed development years later that is actually real and comes gradually — and doesn’t ever live up to the great expectations, but lives up to something that’s reasonably important and significant.
A perfect example of this is the Internet pattern. So the early days of the internet — and you’re too young to remember this — there was huge excitement: this is going to change the world, world peace, all these things. Then there was a bubble, and it burst around the year 2000, and all these companies went bankrupt. There was a lot of sense that this was just all hype. And that somewhat lasted for years, and then it slowly ramped up. And now, of course, the internet is very significant, but obviously also has a number of unintended consequences.
But what I want to say is, language models have had this huge burst of development. So what I see is that technology does not develop in terms of exponential growth. That is extremely rare. The counterexample everyone will talk about is Moore’s law in terms of microchips. And that is truly surprising — shocking that it’s continued this long, and it keeps doubling every 18 months. That is amazing. But I think it’s very hard to find another example of anything like that. So for other things, we want that exponential. People like to talk about it, but the reality is it almost never works out to be exponential.
Now, in artificial intelligence, it’s complicated because of the fact that this is a technology that everybody, in a sense, feels they know — because they’ve seen it on TV and in books. I grew up with The Jetsons. Jetpacks and robots were around the corner. So that’s where these things are very visible. Whereas some technologies, like quantum computing, are something people don’t have much experience with. They don’t know, they don’t have expectations around it because that’s not really portrayed very commonly, or it’s not very easy to visualise. But robots are, so we all have a history of robotics in our minds.
And in fact, it goes way back to the ancient Egyptians and ancient Greeks. The Pygmalion story, up through the golem, Frankenstein, all these. There’s a beautiful history of archetypes around this idea of artificial creatures that almost always run amok and cause great suffering to their creators. That archetype is repeated over and over again in so many different stories, so it’s really deeply rooted, at least in the Western imagination. We also fear them. We want them, but we also fear them. So there’s this very complex relationship we have with robots.
Luisa Rodriguez: So yeah, this kind of vision people have — that is very influenced by media of humanoid robots who help us do all sorts of things, like in The Jetsons — is your view that it’s going to take a long time to get to anything like that, and that we’re currently kind of expecting too much in the short term? Or is your view more that that’s just a pretty unrealistic expectation for any time in the foreseeable… I don’t know, several decades?
Ken Goldberg: I think there’s certainly grounds for optimism. There’s been a number of breakthroughs, and I can go through what I see as breakthroughs over the last 20 years.
What I also see is there’s gaps between our ability to do certain things — like manipulation tasks: those are very challenging, and we can go into why those are challenging — and that it’s not clear that any of those breakthroughs are going to solve those problems. Now, there may be new breakthroughs, and I would love those to happen. But I also want to be realistic about the challenges that are here.
I think the big point is: although Tesla and Nvidia and others are saying we now have all the ingredients and it’s going to happen, I don’t think that’s true. We do not have the ingredients. We have new ingredients — there’s several new technologies that we can put into place — but those ingredients are not sufficient to solve the problems that I believe are still incredibly challenging, which is manipulating objects the way humans do.
How training robots is different than training large language models [00:14:01]
Luisa Rodriguez: Yeah, I want to come back to what robots can and can’t do in a bunch of detail in a bit. But the vague picture I’ve gotten from your work and from learning a bit about this topic is just that it feels to me like we’ve achieved something closer to general-purpose thinking machines in LLMs than we have in robotics, as far as I’ve seen.
And we’ll talk about examples of this, but it’s that things like motion, depth perception, and grasping — which is the thing that you work on — are all so much more difficult than I would have expected when I started to watch videos of the state of the art of these types of tasks. And one, I’m curious if you kind of agree that that’s roughly right. But also, if it is, why has it been so much easier to create algorithms that can imitate human language than it has been to create robots that can imitate human motions, like grasping?
Ken Goldberg: Good. OK, so there’s no doubt that the breakthrough of large language models is one of these punctuated moments, where it’s a major, major breakthrough. And I absolutely want to acknowledge the importance of that.
My sense of that is that it is largely that this model is very good at interpolation — meaning that if you give it enough data, it is able to interpolate: that is, respond in ways that are consistent with those patterns in the past. And it’s not just repeating; it’s definitely creating things that are novel. I would say it’s astoundingly good at things that surprise me — for example, creativity, in the sense of being able to write poetry or come up with novel sentences and essentially stories.
But this is what I mean by “interpolation”: that it’s able to take things it’s seen and combine them in interesting ways. Now, it has to have sufficient data to be able to do this. I think the big surprise is that the amount of data that’s been collected on the internet, the terabytes of data that’s out there, has turned out to be sufficient to achieve this level of surprising performance.
Luisa Rodriguez: So the analogous thing with robotics would be something like, take all the data we have — let’s say videos that involve grasping — and see if you can use that to train a model of what it means to grasp, and teach a robot to do that? Is it just the case that that hasn’t worked, or doesn’t seem like it will work for reasons that might not be obvious?
Ken Goldberg: Good, so you’re right. That’s actually the hope, and that’s part of the excitement right now: that we can do an analogous thing with robotics. In other words, if we see enough sequences of robots manipulating, then it can learn to manipulate novel scenarios. So this is what’s behind these major efforts. One of them is headed by Google DeepMind. That’s the cross-embodiment or robot transformer project, which is to collect vast datasets — and in this case, the data has to be a combination of words, video, and control signals to the robot. That is, we need to know what the robot is doing in parallel with the images and text.
So those examples you can’t find on YouTube or anywhere, because on YouTube you have videos — and it’s true, you have many videos of humans manipulating things — but we then have to extract the motions of the human hands, and then figure out how to map those onto robots. So there’s a lot of challenges, and even just seeing what’s happening in a scene is very challenging.
That’s the hope, though, and there’s a lot of excitement around it. There’s these major efforts underway to collect large datasets, and some very preliminary results that suggest that something can work. But this is where I’ve been particularly sceptical, I would say, of the extrapolation that many people are doing, saying, “OK, we’ve seen this result. Now, in the next year or so, this will start generalising, and you’ll have emergent general-purpose robots.”
And I can go through what I see as my concerns or my scepticism about this, but I’m giving this talk that’s called “Is data all we need? Large robot models and good old-fashioned engineering.” In a nutshell, that sort of tells you where I’m thinking — and a lot of roboticists are agreeing with me — that data is not all we need; that we actually need to combine it with engineering, because the data is not available at the scale that we would need, because it’s a very high-dimensional problem.
Luisa Rodriguez: Yeah, I want to understand that bit better: the difference between the physical problems and the problem that is stringing words together to make coherent thoughts and creative thoughts.
One explanation for this is called Moravec’s paradox. I’ll try to give my understanding of it and then you can tell me where I’m wrong and add to it. So it’s kind of an evolutionary explanation: older skills like motion and depth perception have had more time for natural selection to hone a particular design, whereas more recent skills like language seem hard to us — not because they’re inherently difficult, but because they’re just relatively newer to us. Is that an explanation that you buy, or put weight on?
Ken Goldberg: Well, yes and no. First of all, I want to separate Moravec’s paradox into the paradox itself, and then the possible explanations for the paradox.
Moravec’s paradox can simply be summed up as: What’s hard for humans, like lifting heavy objects, is easy for robots; and what’s easy for humans, like stacking some blocks on a table or cleaning up after a dinner party, is very hard for robots. That’s the paradox, and it’s been true for 35 years, since Hans Moravec — who’s still around; he’s based in Pittsburgh — observed this, and correctly labelled it as a paradox. Because it is counterintuitive, right? Why should we have this differential? But it’s still true today. So the paradox itself is undeniable, I think.
But why this paradox holds is a very interesting question. And you raised one possible explanation about evolution: that humans have had the benefit of millions of years of evolution. We’ve evolved these reactions and these sensory capabilities that give us that fundamental substrate that lets us perform these tasks. And you’re right in that speaking language is much more recent, comparatively.
I can tell you from one perspective: let’s take it from the dimensionality perspective. This means how many degrees of freedom do you have in a system? So a one-dimensional system is you’re just moving along a line, two dimensions is you’re moving in a plane, and then three dimensions is you’re moving in space.
But then for robots, you also have to think about: you have an object in space, and you have its position in space. So here’s my glasses, and they’re in a position in space, but then there’s also their orientation in space. So that’s three more degrees of freedom. So there’s six degrees of freedom for an object moving in space. That’s what we typically talk about with robots. That’s why robots need at least six joints to be able to achieve an arbitrary position and orientation in space. So that’s six degrees of freedom.
But now if you add all the nuances, let’s say, of a hand with fingers, you now have maybe 20 degrees of freedom, right? So each time you do that, you think of a higher-dimensional space, and each one of those is much bigger. It grows exponentially with the number of degrees of freedom. That is an exponential. There’s no doubt about it. That’s a real exponential.
And if you look at language, it’s actually a beautiful example of this, because it’s really linear: it’s one-dimensional. Language is a sequence, a string of words. And the set of words is relatively small: something like 20,000 words are typically used in, say, English. So you’re now looking at all the different combinations of those 20,000 words in a linear sequence. So there’s many — a huge, vast number — but it’s much smaller than, say, the combination of motions that can occur in space. Because those are many infinities greater than the number of sentences that can be put together.
Luisa Rodriguez: That was so beautifully explained for me. I feel like I had some understanding of what your answer might be here, but that was super, super clear and helped me understand way better.
Ken Goldberg: Well, I appreciate that. I think it’s hard to wrap our heads around the complexity of these problem spaces. We call it the “state space” of different problems. So when we look at a sentence or something produced by ChatGPT, it’s very novel and interesting and surprising to us, and it’s hard for us to step back and realise, well, but it’s seen so many examples of that pattern, similar patterns, so it’s reached a critical mass and is now able to start to generalise.
We talk about generalisation “within distribution” and “outside the distribution.” So you understand that difference: “within distribution” is you’ve seen a whole bunch of points. I actually think of it as a target: you’re doing target practice, and you’re putting a bunch of points all around, and now you want to pick a new point that’s somewhere within those points you’ve sampled, and you’re pretty well able to hit those points.
But now, all of a sudden, you have a target that’s way off: 20 yards off to the left. Well, now all those samples are not helping you, because you’re in a new domain you haven’t seen before. So that’s called “out-of-distribution” generalisation, and that’s very hard because you don’t have any basis to go on there. So you need to be able to sample well within the distribution.
In that case, the dartboard is two-dimensional, because you’re taking those samples. And in robotics, imagine that being 20 or more degrees of freedom. Now you need a lot of samples there, and that’s why reaching that objective is very daunting: you need vast, unimaginable quantities of examples to be able to do something analogous to what has been done with language.
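As a rough illustration of the point Ken is making here — not anything from his research — here’s a toy sketch of how fast the number of “cells” you’d need samples in grows with the dimension of the state space, assuming each degree of freedom is discretised into just 10 bins (an arbitrary choice for illustration):

```python
# Toy illustration of the "curse of dimensionality" point above.
# Discretising each degree of freedom into 10 bins is an arbitrary choice;
# the takeaway is only how fast the cell count grows with dimension.

def coverage_cells(num_dims: int, bins_per_dim: int = 10) -> int:
    """Number of grid cells if each dimension is split into `bins_per_dim` bins."""
    return bins_per_dim ** num_dims

for dims, label in [(2, "dartboard"), (6, "rigid object pose"), (20, "hand with fingers")]:
    print(f"{label:>18} ({dims:2d} dims): {coverage_cells(dims):.0e} cells")
```

Even coarse coverage of a 20-dimensional space needs astronomically more samples than the two-dimensional dartboard.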
Luisa Rodriguez: Right. It makes tonnes of sense when I imagine a large language model having strings of text. I mean, it just is a single dimension. Whereas when I imagine a robot trying to learn from YouTube videos, I’m like, how do you exactly define what that hand is doing to lift something up? Like, I’m looking at it two dimensionally and so it feels like it should be not that complicated. I just can see the image of the hand.
But when I think about what’s actually happening, it’s thumbs, it’s joints, it’s ligaments, it’s muscles — and all of that is happening in a way that’s, one, hard to see, and two, hard to notice in us as humans because it’s so subconscious for us. But actually, it just is fundamentally incredibly complex.
So it seems ridiculous to me, honestly, that we have large language models, but we don’t have robots that could really reliably crack an egg, for example. But as soon as you actually break down what cracking an egg is, I start to actually be able to grapple with why.
Ken Goldberg: Good. Well, Luisa, I think you just articulated why there’s this excitement out there, because I think many people see exactly what you just described, which is: language seems so complex, and now we’ve had this breakthrough, so it seems like a very short step to getting robotics to do something similar.
And as you said, just being able to watch a human cracking an egg, the perception of that is so nuanced and complex. We don’t have the ability now to watch a video and tell you how all the things inside that video are moving around. Humans do that, as you said, subconsciously — we just do it effortlessly. And it’s because we somehow have this ability to look at this two-dimensional image, and in our minds, expand it into perceiving it as a three-dimensional experience. And that’s amazing.
Luisa Rodriguez: Right. And somehow we’re doing that with like shadows, and depth perception, and all sorts of things — in ways that if I were to try to imagine what that meant to mathematically model that, or to try to put a bunch of points on a grid or something, it becomes actually kind of a mind-blowing problem.
Ken Goldberg: Right. It is mind-blowing. But one thing I want to also point out is that there’s one glimmer of hope in this, which is that we have seen that it is possible to compress images dramatically with neural networks. This is something called an autoencoder, and it’s a beautiful result. It’s a very simple idea: you take an image on one end and the same image on the other end of a neural network, and you basically try and train the network to go down and squeeze that image down into something very small, and then reconstruct the large image from that. It turns out that you can train that with a fairly reasonable number of images, and then it’s able to do that.
Now, that is amazing, because what it’s saying is it’s somehow finding structure in these images that it’s able to recreate. When you look at the space of all the images out there, that’s unimaginably large. But what we’re able to show is that with this autoencoder you can squeeze this down to, say, 100 or 200 bits, which is amazingly compact. It says that all these images have a structure built into them.
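To make the autoencoder idea concrete, here is a minimal sketch in PyTorch — not Ken’s code; the layer sizes and the 128-number bottleneck are arbitrary illustrative choices:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Squeeze an image down to a small code, then reconstruct it."""
    def __init__(self, image_dim: int = 64 * 64, code_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(image_dim, 1024), nn.ReLU(),
            nn.Linear(1024, code_dim),          # the compact "bottleneck"
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 1024), nn.ReLU(),
            nn.Linear(1024, image_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Train so the output matches the input: the loss is reconstruction error.
images = torch.rand(32, 64 * 64)        # stand-in for a batch of flattened images
optimizer.zero_grad()
reconstruction = model(images)
loss = nn.functional.mse_loss(reconstruction, images)
loss.backward()
optimizer.step()
```

The interesting part is the bottleneck: if reconstruction still works after squeezing through it, the images must have a lot of shared structure.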
Luisa Rodriguez: Right. I’m going to see if I understand that, and I’m going to try an analogy. Tell me if it’s totally going in the wrong direction. But if I were to try to describe an image very literally, maybe I would try to describe every point, every pixel as a colour or something. But I could instead try to distil it by describing concepts or a structure in it. And I can do that with language. And maybe these systems do that in some other way that doesn’t look like language to me. But instead of describing the pixels, you can describe, “It’s a cat on a red background,” which is a much simpler, more concise way of describing the image, and then you can build it back up. Is that a reasonable analogy?
Ken Goldberg: It is, yes. And you’re saying by just a few words, you can describe an image. The old adage is an image is worth 1,000 words, right? But even 1,000 words is far more compact than the representation if you had to represent an image in full detail.
The other way you can think about it is this: we can take images and we can blur them and compress them down. So I can take a very pixelated version of you right now, and I’d still be able to see who it is, that it’s a young woman, and she’s talking to me and smiling. So you can see all these things, because you can pick up things with far less information than the full detail of the image.
And the reason I bring this up is that it is a sign that, when we talk about all the dimensions and everything else, it may turn out that we don’t need all those dimensions and all that accuracy to do these things. So that’s what I say is a glimmer of hope, because that is significant. And if we are able to really compress things, for example, motions may be similar — robot motions and actions of objects might be similar: we actually don’t need all these dimensions.
The evidence for this is that somehow the human brain, of course, is the existence proof that these things are doable. And maybe that’s where the breakthrough might happen, and we might be able to suddenly represent things in a much more compact way. And now, all of a sudden, you have a much smaller state space to work with, and now you need far fewer examples to have a good coverage of that state space. And that could lead to another breakthrough.
And this comes back to what I was saying at the beginning, which is that there are some big open questions right now. The development of large language models is a huge one, and it has made a huge breakthrough in natural language processing, but that alone is not sufficient to solve the robotics problem. These other breakthroughs are needed, and I just don’t know when they’re going to come. I think the danger is that we expect… The hype, as I’m saying, is saying, “OK, we have all the tools we need. It’s going to happen in a couple of years.” And my point is: no, we are missing some key tools. I don’t know when those are going to come. It might be 10, 20, 30 years before we get those.
Luisa Rodriguez: Yep. OK. Yeah, that’s really helpful and concrete — I guess it’s not a prediction you’re making, but it’s something like: those are plausible timelines, too. The next couple of years isn’t the inevitable timeline.
Ken Goldberg: Exactly. I’m not saying this will never happen, and it may happen sooner than I’m saying — I would welcome that; I’d be happy to be wrong — but I also think we have to be calibrated: if we start to say things are going to happen, but they require certain breakthroughs that we don’t have in place, then that’s speculation. And history shows that you just can’t will that into being. You can’t just say, “We need this. It’s going to happen.” Breakthroughs take time.
What can robots do today? [00:34:35]
Luisa Rodriguez: Bringing it back to where we are with robotics now, I feel like I now have a sense of kind of a set of tasks that robots can and can’t do right this moment, but I still don’t feel like I have a great sense of the big picture. So I’d love to get more of an overview from you. In general, what are the kind of characteristics of tasks that robots have been able to master?
Ken Goldberg: I would start by saying there’s two areas that have been very successful for robots in the past decade. The first is quadrotors — that is, drones. That was a major set of breakthroughs that were really in the hardware and control areas, and that allowed these flying robots, drones, to be able to hover stably in space. And then once you could get them to hover, you could start to control their movement very precisely. And that has been a remarkable set of developments. And now you see spectacular results of this. If you’ve seen some of these drone sky exhibitions — where they’re in formations, moving around in three dimensions — it’s incredible.
Luisa Rodriguez: Yeah, they’re incredible.
Ken Goldberg: Yeah. And that has also been very useful for inspection and for photography. They’re extremely widely used in Hollywood, and even in home sales — typically, drones fly around and give you these aerial views. So that’s been a really big development, and there’s a lot of beautiful technology behind that.
Another one, interestingly, is quadrupeds, which are four-legged robots. Those are the dogs, like you see from Boston Dynamics, which was really a pioneer there, but now there’s many of those. In fact, there’s a Chinese company, Unitree, that sells one for about $2,000 on eBay — and it’s amazing, because they’ve pretty much gotten very similar functionality. It’s not as robust, and it’s smaller and more lightweight, but it has the ability to walk over very complex terrain. You can take it outside and it will climb over rubble and rocks and things like that very well. And it just works out of the box.
Luisa Rodriguez: Cool!
Ken Goldberg: So that was, again, the result of a number of things: new advances in motors and the hardware, but also in the control. And here, there were a lot of nuances in being able to control the legs, and learning played a key role there. In particular, there’s this technique called model predictive control — which is an older technique, but, combined with deep learning, it was able to address that problem and basically get these systems to be able to walk over very complex terrain and adapt, and even jump over things. And you’ve probably seen that they can fall, roll down some stairs, and get up and keep going. So they can roll over, and they can, in some cases, do a backflip, which is really incredible.
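For readers curious what “model predictive control” looks like in code, here is a very rough random-shooting sketch on a made-up one-dimensional system: simulate many candidate action sequences through a model, execute only the first action of the cheapest one, then re-plan. The dynamics, cost, and horizon below are invented for illustration; real legged-robot controllers are far more sophisticated, but this is the general shape of the idea:

```python
import numpy as np

rng = np.random.default_rng(0)

def dynamics(state: np.ndarray, action: float) -> np.ndarray:
    """Toy stand-in for a learned or physics-based model of the robot."""
    position, velocity = state
    velocity = velocity + 0.1 * action
    position = position + 0.1 * velocity
    return np.array([position, velocity])

def cost(state: np.ndarray) -> float:
    """Penalise distance from the target position and excess speed."""
    position, velocity = state
    return (position - 1.0) ** 2 + 0.1 * velocity ** 2

def mpc_action(state: np.ndarray, horizon: int = 10, candidates: int = 256) -> float:
    """Random-shooting MPC: sample action sequences, keep the cheapest rollout."""
    best_action, best_cost = 0.0, np.inf
    for _ in range(candidates):
        actions = rng.uniform(-1.0, 1.0, size=horizon)
        s, total = state.copy(), 0.0
        for a in actions:
            s = dynamics(s, a)
            total += cost(s)
        if total < best_cost:
            best_cost, best_action = total, actions[0]
    return best_action  # execute only the first action, then re-plan

state = np.array([0.0, 0.0])
for _ in range(50):
    state = dynamics(state, mpc_action(state))
print("final state:", state)
```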
Luisa Rodriguez: Yeah, I think I’ve seen a video of that one, and it really, really surprised me. It’s funny, because I would have thought that that set of problems would be really hard. I’ve seen the quadrupedal robot Spot — which is the one by Boston Dynamics — walk across rocks and snow, and through, like you said, this really rough terrain. And that seems like it would have been a really hard problem to me. Is this an area where it’s not as hard as it looks? Or is it hard, but we’ve just focused really hard on it and made a bunch of progress?
Ken Goldberg: Well, that’s a great question. So I’ll address both of these.
First, in the quadrotor, the beauty of it is that you’re moving in open space, so you don’t have to make contact with anything. It turns out that’s much easier to do, just avoiding everything and staying stable. It’s nontrivial, but it’s not as hard as making contacts.
Now, in walking — in legged robots, quadrupeds — you do have to make contacts. So you’re right. I would say that’s a harder problem, and it was solved subsequent to quadrotors. And it’s been interesting. I think you’re right: it was surprising that you could get these to be so stable and so adaptive and to run so fast. I would say it’s interesting because you do have to make contacts; you have to coordinate, in this case, four legs; and maintain balance and move them — you know, adapt their movements quite rapidly to maintain balance, because you have gravity pulling on you. So as you’re moving, you have to be very responsive.
But it turned out that that was in some way learnable — that you could see enough conditions that it would adapt — and that it was not too hard. We talked about the training space — the dimensionality of the problem, the state space. And for these robots, it seems that there’s some tolerance. In other words, there’s a number of conditions, but they’re all somewhat similar, in that the legs don’t have to learn every single possible condition; they can sort of approximate the conditions that they’re feeling. So if the robot is tilted just slightly and it does some motion, that usually works, even if the exact arrangement of pebbles or sticks or things underneath it is not quite the same. So that means that the state space is smaller than you might think, in that case.
Luisa Rodriguez: Yeah, that makes sense. It’s like stumbling around is pretty good, relative to the really kind of high-level balance that we have when we walk and run. Stumbling around gets you a good chunk of the way there. Whereas, I guess for cracking an egg, there’s not much room for error.
Challenges for progress: fault tolerance, multidimensionality, and perception [00:41:00]
Ken Goldberg: Let’s talk about that, because I like what you said: “it gets you pretty far.” That’s a great way to put it, Luisa. That’s exactly right. The way to think about this is: How far do you need to go to get performance that is tolerable?
So there’s these two aspects: one is the dimensionality of the state space, and the second is fault tolerance. So in quadrotors, drones, they’re fairly low-dimensional, because you’re just dealing with the six-dimensional motion in that space. You’re just floating around. And you’re fairly fault tolerant, in the sense that if you’re not perfectly balanced in one location and you stray a little bit, you can recover from that easily. Now, of course, if you bump into another quadrotor, that’s very bad, and you could crash. So there’s not tolerance to everything, but certainly a lot of things you can tolerate. And the same, I think, is true for walking.
Now, let’s talk about manipulation, and your cracking the egg example. First, I want to raise an interesting point, which is: when we talk about eggs, people often think that it’s amazing if your robot can pick up an egg without cracking it. But I challenge you: if you take an egg out of the refrigerator and just put it in your hand and try to crack it by squeezing, it’s very hard to crack an egg that way. It’s easy to crack it as soon as you tap it onto something, but if you just squeeze it, it’s evolved amazingly well to be very resistant to just being crushed. So when you see a robot pick up an egg, don’t be super impressed, OK? That’s not that big of a deal.
But now, if you want to see a robot actually cracking the egg and getting the yolk out, like the way a chef would crack an egg, that’s manipulation. And that’s where I say it’s very hard, because now you have contacts, and you have friction, and you have deformations, and you have to manage a lot of nuance so that the things don’t drop. And that turns out to be very, very tricky, and even higher dimensional and less fault tolerant, because a one-millimetre error in the position of your contact often makes the difference between holding something and dropping it.
Luisa Rodriguez: Yeah, that makes a lot of sense. That actually makes me curious. I like that we can think about it in terms of tasks that are very multidimensional and very fault intolerant in one quadrant, and then the diagonal quadrant being very fault tolerant and fewer dimensions. Are there other things in the really difficult quadrant, that are very fault intolerant and very multidimensional?
Ken Goldberg: I really like that, Luisa. It’s a great way of thinking about it. So let’s imagine we have two axes. We have low-dimensional to high-dimensional as the horizontal axis going from left to right: low is on the left, and high-dimensional is going off to the right. And then we have another axis, which is how tolerant the system is to faults. So going up is very tolerant, and going down is very intolerant. So then in the upper left, you might have something like drones: they’re tolerant and they’re lower dimensional, because they just have the six dimensions. And dogs, because they have legs, have a few more dimensions, but they’re still fairly low dimensional, and they’re also fairly tolerant, as we were saying.
Now, logistics is where you have a robot picking up packages. I would put that in the upper-right quadrant, which is where you have things that are very tolerant and a little bit higher dimension, because you actually are making these contacts. But the kind of manipulation you’re asking about, like picking up things at home, that’s interesting, because that’s where you get into higher dimensions, because you have complex objects and more nuances. So that may also be in the upper-right quadrant, but further to the right of logistics.
And then something like surgery is in the lower right. That is very difficult; that’s where you have very high dimensions and it’s fairly intolerant to errors. So those are really hard things, these things in the lower-right: high dimension, and not fault-tolerant environments. And that’s where I think a lot of manipulation [is]. If you’re thinking about things like even just cleaning up your dinner table, you might say it’s very difficult because it is fault intolerant in that you can drop glassware and break it, so that’s not acceptable. So these kinds of problems are very challenging.
And I would add one element also. Maybe it’s a third axis in this nice categorisation.
Luisa Rodriguez: Just in case anyone thought our quadrants were too easy, too simple.
Ken Goldberg: Yeah, exactly. So let’s go into a third dimension, and say that we have the challenge of sensing the environment and the perception difficulty. That would be where some things are much easier to perceive — if you have a nice, fairly static and opaque environment, and clear contrast between things. And then you have things where there’s a dynamic environment — let’s say underwater, or actually even traffic, where things are moving around you — and that makes it really challenging.
Or in a surgery, the whole body is actually moving, because you’re breathing and your heart’s beating. So doing a surgery is very challenging. So that dimension, perception in the body, is very tricky with fluids and blood and glistening materials: finding things and perceiving them is very difficult. So that axis is another one.
Luisa Rodriguez: Yeah, I’m really glad you brought that up, because when I was learning a little bit about the state of things in preparation for speaking with you, sensing came up, and I was like, sensing can’t be that hard still. But then you take something like looking at a glass… I don’t know how we do it! If you look at a glass, it’s mostly clear. You’re just getting a couple of places where light is reflecting in a slightly different way that makes it easier to see.
But actually, if I were to try to tell a robot how to perceive a glass, it would seem like an extremely hard problem to me. And it seems like there just are loads of different ways this kind of thing can be hard, whether it’s clarity, or something like contrast, or different lights and reflections. So yeah, another example of this paradox, I guess.
Ken Goldberg: Definitely. And it’s interesting, because we understand roughly how glasses are shaped and what their properties are, so we kind of have a feel for them. And then the other thing that I believe is really important is how humans operate: we also — like robots — have limited perception abilities and limited control abilities, right? So we make errors, but what we’ve learned — somehow, over years of evolution — is how to compensate for these errors.
So when we pick up a glass, we don’t just put our gripper right up next to it and close it. We sort of scoop it. And by that, I mean we reach around it in a way that as we close our fingers, we’re going to be able to compensate for small errors. That’s what we call a “robust grasp” — one that will work even though there’s uncertainty in the conditions. And that’s really what I’ve been very interested in on the research front: how can we build robust manipulation that will work, even though we have these errors that are inevitable and inherent in the problem?
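One common way to formalise that kind of robustness — a generic sketch of the idea, not Ken’s actual research code — is to score a candidate grasp by how often it still succeeds when the estimated object pose is perturbed, i.e. a Monte Carlo estimate of success under uncertainty. The toy success model, noise level, and numbers below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def grasp_succeeds(grasp_center: np.ndarray, object_center: np.ndarray,
                   tolerance: float = 0.01) -> bool:
    """Toy success model: the grasp works if it lands within `tolerance`
    metres of the object centre. Real models use contact and friction analysis."""
    return float(np.linalg.norm(grasp_center - object_center)) < tolerance

def robustness(grasp_center: np.ndarray, estimated_object_center: np.ndarray,
               pose_noise_std: float = 0.005, samples: int = 1000) -> float:
    """Monte Carlo estimate of success probability under pose uncertainty."""
    successes = 0
    for _ in range(samples):
        true_center = estimated_object_center + rng.normal(0.0, pose_noise_std, size=2)
        successes += grasp_succeeds(grasp_center, true_center)
    return successes / samples

estimate = np.array([0.30, 0.10])              # where perception thinks the object is
print(robustness(estimate, estimate))          # grasp aimed dead-on
print(robustness(estimate + 0.008, estimate))  # grasp aimed slightly off
```

A grasp that keeps a high score despite the perturbations is “robust” in the sense Ken describes.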
Luisa Rodriguez: Yeah. Just to make sure I understand that, it’s something like, if someone were telling me to pick up a cup, they wouldn’t tell me the coordinates. I would move my hand toward the cup until I bumped it, kind of, very lightly, but my hand would get the feedback that I’ve now touched it, and it would curl around it. And that’s a much more robust thing to do than, possibly, get the coordinates slightly wrong and then knock the glass over.
Ken Goldberg: Right, exactly. Exactly. Now, here’s an interesting thing you just brought up, which is touch. The sense of touch.
Luisa Rodriguez: Of course!
Ken Goldberg: A lot of times robots don’t have that. And in the logistics environment, you don’t really have a sense of touch with these suction cups. You don’t get feedback from them, except you do know if you’ve picked the object up or not because of the way the airflow works. You can tell if it’s been picked up. But humans are making use of very nuanced tactile touch sensing, and that is something that we’ve struggled to reproduce. It’s very difficult to get a very high-accuracy tactile sensor that can tell different shades of pressure, and also find edges and, let’s say, textures on objects.
Now, humans are very good at that. In fact, you can put something much smaller than a human hair on a surface, and you rub your finger across it and you can detect that. It’s incredible. I believe it’s at the micron level that you can detect a crack in something with your fingers.
So we don’t know how that works; we can’t reproduce it. But one thing that has been very exciting in the last few years is optical tactile sensors. The name for this is often “GelSight.” What they have is a gel, and then a camera behind the gel — so the gel presses on something, and the camera inside is looking up at it. So as you make contact with that device, you can sort of see the contact, if that makes sense.
Luisa Rodriguez: Right, right. The gel kind of compresses as the pressure of the thing pushes into it.
Ken Goldberg: Right. It’s sort of like you see the negative side of it. You’re looking from inside, and you see the gel deform, and so you can see very fine details. Let’s say you press on a penny. You can see the whole outline of the head or whatever is printed on the penny.
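A toy sketch of how you might read a contact patch off a sensor like that, assuming you simply compare the current gel image to a reference image of the unloaded gel. Real GelSight pipelines do considerably more (photometric stereo to recover the surface shape, for instance); the simulated “penny press” here is just placeholder data:

```python
import numpy as np

def contact_mask(current: np.ndarray, reference: np.ndarray,
                 threshold: float = 0.1) -> np.ndarray:
    """Pixels where the gel image differs enough from the unloaded reference
    are treated as being in contact."""
    difference = np.abs(current.astype(float) - reference.astype(float))
    return difference.mean(axis=-1) > threshold   # average over colour channels

# Stand-in images: a blank reference and a frame with a pressed-in circle.
reference = np.zeros((240, 320, 3))
current = reference.copy()
ys, xs = np.ogrid[:240, :320]
current[(ys - 120) ** 2 + (xs - 160) ** 2 < 30 ** 2] = 0.5  # simulated press

mask = contact_mask(current, reference)
print("contact pixels:", int(mask.sum()))
```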
Luisa Rodriguez: That’s really cool.
Ken Goldberg: Yeah, it’s really nice. Now, those sensors have become increasingly popular. The price is coming down. There’s a number of companies making them. The challenge is, though, that they’re kind of bulky because you have to have the camera positioned behind it. It’s fairly large — larger than a human thumb, say. And then they also tend to be prone to small errors, because the surface gets abraded over time, so it sort of wears down.
Luisa Rodriguez: Yeah, that makes sense.
Ken Goldberg: There’s a lot of excitement in robotics about these, and we’re working with them to try and understand their properties, and think about how those will affect grasping and manipulation in the future.
Recent breakthroughs in robotics [00:52:32]
Luisa Rodriguez: I guess while we’re on the topic of kind of recent advances that are making some of these problems easier, if you had to name one of the most exciting breakthroughs in the field of robotics from the last three years, what would it be?
Ken Goldberg: Well, the big ones over the past, I want to say 12 years, both have to do with depth. The first one is deep learning. That was a huge breakthrough. I really think of it as two waves of AI: we’ve had deep learning that came around 2012, and then almost a decade later there’s generative AI. So those are two fairly distinct breakthroughs.
Now, there’s another one in robotics, which came just around the time of deep learning, which is depth sensors. These are cameras that had actually existed before, but they became very popular when Microsoft put out this thing called the Kinect, which was actually an extension of some results that came out of a lab — basically that you could use structured light to perceive the depth of points in space. Microsoft used it for games, so you could stand in front of your TV, and you could move around, and the thing would sense where your body was in space. So you could swing a tennis racket and things like that. Xbox was using it.
Luisa Rodriguez: Right. Just to make sure I understand the advance, it’s like if you point a big floodlight at a person, and then ask them to move around, you could measure the brightness or something, and that would give you a sense of how far away that person was or where they were, or how their body was blocking the light, and therefore where they were?
Ken Goldberg: OK, let me give you a better example. The first wave of this was like, imagine projecting a grid of lines in infrared over the scene. With the right camera, you would see that grid, and you know what the grid is supposed to look like. And now you see how the grid is distorted, and that gives you an idea about where those points are in space, so you can build that up. Actually, you know what? It’s really exactly like what you see in a map.
Luisa Rodriguez: In a contour map.
Ken Goldberg: The contour map. Yeah, exactly. That’s what they were able to do: you could point a camera now and build a contour map of the scene. So if it’s a person standing there, you could see their body and their face.
And being able to do that fast and at lower cost has been another big breakthrough in robotics, because now you have a way of getting depth information, and that changed the field dramatically. It was a crucial component to be able to essentially perceive the three-dimensional structure of points in space. So that’s an area that I think is huge and not often acknowledged, but that’s the big advantage that’s sitting behind the scenes in a lot of robotic advances.
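To give a flavour of why depth sensors mattered so much for robotics: once you have a depth image, turning it into a 3D point cloud is a few lines of standard pinhole-camera geometry. The intrinsics and the flat-wall depth image below are invented, roughly in the range of a consumer depth camera:

```python
import numpy as np

def depth_to_point_cloud(depth: np.ndarray, fx: float, fy: float,
                         cx: float, cy: float) -> np.ndarray:
    """Back-project a depth image (metres per pixel) into 3D camera coordinates
    using the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    height, width = depth.shape
    v, u = np.indices((height, width))    # pixel row and column indices
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]       # drop pixels with no depth reading

# Illustrative intrinsics and a fake depth image of a flat wall 1.5 m away.
depth = np.full((480, 640), 1.5)
cloud = depth_to_point_cloud(depth, fx=580.0, fy=580.0, cx=320.0, cy=240.0)
print(cloud.shape)                        # (307200, 3)
```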
Luisa Rodriguez: Cool. So that’s one. Is there another one worth talking about?
Ken Goldberg: Sure. Tactile sensing is one. I would say this GelSight development has really been interesting. I think we’re still at the early stage of being able to apply it.
But one area that we’re very excited about is this multimodal learning. This is very new and relates more to [generative] AI. But in gen AI, you have a language model, as we talked about, and you also have vision language models that are co-trained using contrastive loss. The idea being that you can now link words and images together so that you can describe something in words and it will generate an image, or vice versa, which is really exciting. Now you can take an image and feed it into gen AI, and it will describe what is in that image. This is very exciting.
For example, I don’t know how much you’ve played with ChatGPT-4V, but basically the latest version of ChatGPT has this ability. So you could just take an image of your table and say, “What’s here?” And it will remarkably tell you, “I see a glass over to the left of a plate with a hard-boiled egg…” and it’ll just go on and on, and tell you really incredible detail. So that’s cross-training between vision and language.
What people are also doing now is cross-training with tactile images, so now you have all three. What’s kind of cool with that is that then you can say, “I feel something with my tactile sensor. What is it? What am I feeling?” — and it can describe it in words, or maybe generate an image that’s consistent with that data.
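The “contrastive loss” Ken mentions can be sketched in a few lines: embed each modality, then train so that matched pairs (an image and its caption, or a tactile reading and its description) score higher than mismatched pairs within a batch. This is a generic CLIP-style sketch, not any particular lab’s code; the random tensors stand in for encoder outputs:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(image_embeddings: torch.Tensor,
                     text_embeddings: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE-style loss: row i of each batch is a matched pair."""
    image_embeddings = F.normalize(image_embeddings, dim=-1)
    text_embeddings = F.normalize(text_embeddings, dim=-1)
    logits = image_embeddings @ text_embeddings.T / temperature
    targets = torch.arange(logits.shape[0])          # pair i matches pair i
    loss_i = F.cross_entropy(logits, targets)        # image -> text direction
    loss_t = F.cross_entropy(logits.T, targets)      # text -> image direction
    return (loss_i + loss_t) / 2

# Stand-in embeddings for a batch of 8 matched (image, caption) pairs.
image_emb = torch.randn(8, 512, requires_grad=True)
text_emb = torch.randn(8, 512, requires_grad=True)
loss = contrastive_loss(image_emb, text_emb)
loss.backward()
```

Adding a tactile encoder trained against the same batch is, conceptually, just another pair of these loss terms.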
Luisa Rodriguez: That’s incredible.
Ken Goldberg: Yeah. And actually, now we’re working on something where we incorporate sound. Sound is really interesting, because imagine a really sensitive microphone: as you’re doing manipulation, it will hear subtle interactions. So if we can start correlating those with the motions, then that actually could end up being a very useful modality.
I don’t think humans use sound because we’re just not that sensitive. But it seems like it would be useful to have microphones — and they’re inexpensive, and we know how to make them — and put those on each of the fingers, and just as you’re manipulating things, use that.
Luisa Rodriguez: Yeah, instead of using my hand to bump the glass and notice that I’ve bumped it because of the touch sensation, I noticed that I bumped it because of tiny microphones that have heard the bump. That’s amazing.
Ken Goldberg: Exactly. And there’s another thing which is related to that, which is vibration. So we can detect very fine vibration with our fingers too, and that’s subtle. But you can imagine that a microphone can do that too. So as you make contact, you hear this little bump, but then it has a frequency associated with it, which you can use to determine what the material was.
Luisa Rodriguez: Oh, that’s so cool.
Ken Goldberg: Isn’t that interesting? So you could tell the difference between glass and wood and metal, say. And there’s interesting work going on there.
The new idea is even to introduce another modality, which might be temperature or humidity or moisture sensors, so you can have all these things built into it. And the cross-training is where you take all those examples, and you say, let’s train a network to be aware of all those different channels. And you give it examples where they’re all happening simultaneously, so it starts to link them together. Then you can pull one of them out, and the others will replace the one that’s missing — they fill in the blanks. That’s called cross-modal or multimodal learning. It’s a huge advantage, and very exciting. We’re just at the beginning of being able to use that.
And here’s where I want to come back to the vision language models. I think this does two things for robotics. One is that it means that we can start talking to robots using natural language. And I don’t want to be overly biassed toward English; it could be French or Mandarin Chinese or whatever. But you can now use words and say, “Pick up the tall tumbler on the table.” And somehow these systems are able to now interpret that, and light up the tall tumbler on the table, because it’s seen other tumblers in different environments, and it correlates that visual with that tumbler.
So we have something called “language-embedded radiance fields,” which is related to another advance, which is called “NeRFs”: neural radiance fields. These were developed just about three years ago. This is a breakthrough both within computer vision and graphics, and that was a technique for being able to take a number of images of a scene and then build a model that can reconstruct any other viewpoint of that scene. That’s called a NeRF.
Luisa Rodriguez: Wow.
Ken Goldberg: So that gives you an ability to essentially look around in a scene and see it from different viewpoints. It’s really powerful, and it’s being used in a number of applications in filmmaking and animation and things.
But it also turns out that we can adapt that, and now we can combine these language elements into it. So now we can look in that scene and start to identify things in the scene. You could say a word, like, “Find the elements of electricity,” and now all the wires and plugs and things will light up and basically get highlighted. It’s really interesting, surprising.
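At its core, a NeRF is a small network that maps a 3D point (and, in the full method, a viewing direction) to a colour and a density, trained so that volume-rendering it along camera rays reproduces the input photos. Here is a bare-bones sketch of just the field itself — with arbitrary layer sizes, viewing direction omitted, and the ray sampling and rendering that make it actually work left out:

```python
import torch
import torch.nn as nn

def positional_encoding(x: torch.Tensor, num_frequencies: int = 10) -> torch.Tensor:
    """Map coordinates to sines/cosines of increasing frequency, which helps
    the MLP represent fine spatial detail."""
    features = [x]
    for i in range(num_frequencies):
        features.append(torch.sin((2.0 ** i) * x))
        features.append(torch.cos((2.0 ** i) * x))
    return torch.cat(features, dim=-1)

class TinyNeRF(nn.Module):
    """Maps an encoded 3D point to an RGB colour and a volume density.
    (Viewing direction is omitted here for brevity.)"""
    def __init__(self, num_frequencies: int = 10):
        super().__init__()
        in_dim = 3 * (1 + 2 * num_frequencies)
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 4),            # 3 colour channels + 1 density
        )

    def forward(self, points: torch.Tensor):
        out = self.mlp(positional_encoding(points))
        rgb = torch.sigmoid(out[..., :3])
        density = torch.relu(out[..., 3])
        return rgb, density

# Query the field at some 3D points; rendering a new viewpoint then means
# integrating these colours and densities along each camera ray.
rgb, density = TinyNeRF()(torch.rand(1024, 3))
```

Language-embedded variants attach language features to the same kind of field, which is what lets you ask the scene a question in words and get regions highlighted.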
Luisa Rodriguez: It seems really big. It seems like one of the things closest to this more general-purpose thing that I think people might imagine is on the horizon, but maybe isn’t so close on the horizon. But this feels like it’s in that direction.
Ken Goldberg: Definitely. So I don’t want to sound negative. I want to really applaud and acknowledge the wonderful things that are happening. But also, at the same time, just point out that these don’t mean that we’ve solved all the problems. Just to set the expectations, as we said.
But this ability is really interesting, because now you can just look at a scene, you can ask a question in that scene, and these new models will be able to identify what parts of the scene are relevant, and identify what might need to be done. So if you said, “Clear the table,” it might be able to find each object, and then we can apply techniques to be able to grasp that object and avoid other objects, et cetera.
So it’s a combination of these breakthroughs in deep learning, as well as these new sensors and developments in hardware that can come together. And I would say also the good old-fashioned engineering: that is so important to remember and to keep in mind. But combinations of all those things, I think, are what’s going to increase our ability to do these manipulation tasks.
Luisa Rodriguez: Cool.
Barriers to making better robots: hardware, software, and physics [01:03:13]
Luisa Rodriguez: OK, let’s leave that there for now. I want to better understand the kind of barriers to making better and more versatile robots. We’ve kind of talked about the ways in which the problem is complex, and I feel like I have a good handle on that. But I’m curious what the specific technological bottlenecks are.
In your TED Talk, Why don’t we have better robots yet? — which is really good, and we’ll link to it — you talk through three areas that each pose their own challenges: hardware, software, and then physics. Let’s take those in turn. To start, to what extent is the main barrier to better robots the hardware?
Ken Goldberg: OK. I think a misperception that many people have — and I understand where it comes from — is that if we want to manipulate things, we should start with a hand like humans have. So there’s these robot hands that have five fingers and are kind of similar to human hands.
It turns out that that’s extremely difficult, because the human hand is so complex and versatile and has so much nuance. But if you build a robot hand with all those joints, you have a very high-dimensional system. Again, there’s a lot of moving parts, and it’s very difficult to control all those joints accurately. Generally, you have to use cables, and then cables tend to be inaccurate because they stretch. And there’s something called hysteresis, where if you pull in one direction and you push in another direction, they don’t come back to the same point.
Then they’re also very heavy, these hands, and so every ounce that’s in the hand means you can pick up less weight with it. So that’s a big issue.
And then, most of all, it’s cost. So these are extremely expensive: $100,000 for a five-fingered hand today.
Luisa Rodriguez: Whoa.
Ken Goldberg: So I’m a big fan of very simple grippers — a very simple parallel jaw, just basically a pincher. You can do amazing things with those. The example I talk about in the TED Talk is that surgical robots just have very simple grippers, and surgeons work with them all the time. And then, of course, every day, half of the Earth uses chopsticks to pick up food. Those are very capable. You can do a lot with simple grippers. So that’s one way I think about the hardware: we don’t need to build these complex hands.
But the other issue there is the control, which is that no matter what hand you have, if it’s on the end of a robot, you have all these motors that have to be controlled precisely to get both of those jaws to a particular point in space. And each of those motors has a very small error associated with it. And by that I mean that you can command the motor to go to a specific point — and mathematically, we can tell where we want that motor to be — but that motor may not actually be there. It’s because there are gears — small errors in the gears — and friction, and other things that cause problems. So what you have is a precision problem: how close can you get to a desired point in space?
So that’s a problem, but there’s a really nice breakthrough there, which is that you can use deep learning to compensate for those errors. We’ve had some really nice success, especially with surgical robots, being able to do that in the last couple of years. So we can actually address that to some degree, but not perfectly.
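To make that idea concrete, here is a toy Python sketch of calibrating out a repeatable joint error with a simple regression. It is an illustration with invented numbers, not the method used on the surgical robots; a small neural network would play the same role as the polynomial here.

```python
import numpy as np

# Toy model of a repeatable joint error: the motor overshoots by a small,
# angle-dependent amount (gear eccentricity plus a constant offset).
def actual_angle(commanded):
    return commanded + 0.01 * np.sin(3 * commanded) + 0.004

rng = np.random.default_rng(0)

# 1. Calibration: command a sweep of angles, measure where the joint actually
#    ends up (in practice with an external tracker), and record the error.
cmd = np.linspace(-np.pi, np.pi, 200)
err = actual_angle(cmd) - cmd + rng.normal(0, 0.0005, cmd.size)   # plus measurement noise

# 2. Fit a simple model of the error as a function of the commanded angle.
coeffs = np.polyfit(cmd, err, deg=7)
predict_err = np.poly1d(coeffs)

# 3. At run time, pre-compensate: command (target - predicted error).
targets = rng.uniform(-np.pi, np.pi, 1000)
naive_miss = np.abs(actual_angle(targets) - targets)
compensated_miss = np.abs(actual_angle(targets - predict_err(targets)) - targets)

print(f"mean error, naive:       {naive_miss.mean():.5f} rad")
print(f"mean error, compensated: {compensated_miss.mean():.5f} rad")
```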
So that’s the first real challenge. The second is perception. Perception is quite difficult with cameras: even if you have a stereo camera, you still can’t really build a map of where everything is in space. It’s just very difficult. And I know that sounds surprising, because humans are very good at this. In fact, even with one eye, we can navigate and we can clear the dinner table.
But it seems that we’re building in a lot of understanding and intuition about what’s happening in the world and where objects are and how they behave. For robots, it’s very difficult to get a perfectly accurate model of the world and where things are. So again, if you’re going to go manipulate or grasp an object, a small error in that position will maybe have your robot crash into the object, a delicate wine glass, and probably break it. So the perception and the control are both problems.
Then the other one, which is more subtle and I think really interesting, is the physics. The nuance there is that you can imagine, just take a pencil or pen, put it in front of you on a flat table, and then just push it with one finger forward. What will happen is it’ll move for a moment, and then it’ll rotate away from your finger as you push it. Now, why does that happen? It turns out that it really depends on the very microscopic details of the surface of your table and the shape of your pencil. Those are essentially making contacts and breaking contacts as you’re pushing it along.
The nature of those contacts is impossible to perceive, because they’re underneath the pencil. So you, by looking down, can’t see what’s going on, so you really can’t predict how that pencil is going to move. And this is an example of really an undecidable problem. I mean, it’s unsolvable. And I like to say you don’t have to go into quantum physics to find the very difficult problems that inherently cannot be solved. It’s not a matter of getting better sensors or better physics models, but we just can never predict that. We’ll never be able to, because it depends on these conditions which we can’t perceive.
So we have to find ways to compensate for all three of these factors: the control errors, the perception errors, and the physics errors, I call it. And humans do it. We have an existence proof. We know it can be done. And humans have the same limitations. So how do we do it? That’s the million-dollar question, or billion-dollar question.
Luisa Rodriguez: Yeah. No kidding. Well, that made me want to ask: you said it’s an unsolvable problem, and maybe I’m just kind of misunderstanding, but do you mean that it’s unsolvable with the perception we have now for robotics? Because I don’t know exactly where the pencil I push on the table is going to end up, but I do manipulate pencils and make pretty good predictions about what would happen if I nudged it forward.
Ken Goldberg: Yeah. So you do. But what I mean is that you cannot predict exactly where that pencil is going to move if you start to push it.
Luisa Rodriguez: So humans can’t do it as well, this thing that you’re talking about?
Ken Goldberg: Right, right. No one can. I’m saying that no model in the universe can solve this problem, because it depends on what’s happening at the really almost submicroscopic level that is basically going to influence how that pencil responds to being pushed. It’s friction, but in a very nuanced way. And in friction, we have this model we all learn in college or in high school called Coulomb friction. It’s a reasonable approximation, but it’s a rough approximation to the real world, and it doesn’t really describe what happens when you push an object across a surface, how it’s going to move. So this is known to be a very nuanced and subtle problem, and it’s right in front of us.
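To see how sensitive this is, here is a toy Monte Carlo in Python (an illustration, not a physics simulation). It uses the rough rule from planar pushing mechanics that the direction the object rotates depends on which side of the push line its centre of friction lies, where the centre of friction is the pressure-weighted centre of the support contacts, and that pressure distribution is exactly the part nobody can observe.

```python
import numpy as np

rng = np.random.default_rng(1)

# A pencil lying along the y-axis, 15 cm long, pushed at its midpoint straight
# along +y, so the push line is x = 0.
candidate_y = np.linspace(-0.075, 0.075, 30)   # possible contact spots (metres)
trials = 10_000
cof_right_of_push_line = 0

for _ in range(trials):
    # Unknown microscopic details: only some spots actually touch the table,
    # each carrying a random share of the pencil's weight, and each offset a
    # fraction of a millimetre from the pencil's nominal axis.
    touching = rng.random(candidate_y.size) < 0.3
    if not touching.any():
        touching[rng.integers(candidate_y.size)] = True
    pressure = rng.random(candidate_y.size) * touching
    pressure /= pressure.sum()
    contact_x = rng.normal(0.0, 0.0005, candidate_y.size)

    centre_of_friction_x = float(pressure @ contact_x)
    if centre_of_friction_x > 0:
        cof_right_of_push_line += 1

frac = cof_right_of_push_line / trials
print(f"centre of friction right of the push line: {frac:.1%}")
print(f"centre of friction left of the push line:  {1 - frac:.1%}")
# Roughly 50/50: each case makes the pencil rotate a different way, and
# nothing observable from above tells you which case you're in.
```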
And you’re right: we solve it. “Solve it” means not to predict it exactly; what we do is we compensate for the error by the scooping motion, where we move our fingers in a way around the object. And we have a word for that: we call it “caging.” Caging is where you put your fingers around an object in such a way that it is caged, meaning that it can’t escape. It can move around, it can rattle around inside that cage, but it can’t get out of the cage. Once you put your fingers in a cage around the object, now you start to close your fingers, there’s nowhere for the object to go, so it generally will end up in your [hand]. You’ll be able to pick it up.
Luisa Rodriguez: Yeah. If I’m catching a small ball, the motion is going to be for my fingers to come around as much of it as they can, so that when I clasp, even if I can’t quite move it in the exact direction I want, the fingers are just going to trap it.
Ken Goldberg: Right. And you know, a ball is a great example. I don’t want to use baseball because that’s such an American thing, but you have this glove that’s big. And we sometimes think about this as a funnel: that you funnel the uncertainty down, and it sort of uses natural physics. A funnel is a beautiful example. If you want to funnel a bunch of rice into a tiny opening, you use a funnel and then the rice sort of manages to find its way down just using physics.
Luisa Rodriguez: Right. And you’re not predicting how to get the rice to fall at the right angle into the jar or whatever.
Ken Goldberg: Exactly. You’re just setting up physical constraints such that it sort of works. And that is really the secret, I think, to robotics. If we can figure out how to do those kinds of things — generalised funnels — then we can start to move up that ladder of the dexterity of humans.
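Caging also has a crisp geometric flavour. Here is a toy check in Python, under the simplification of point fingers and a disc-shaped object, of the "can't escape" condition described above: the fingers wrap all the way around the disc, and no gap between neighbouring fingers is wide enough for it to slip through.

```python
import numpy as np

def is_caged(finger_xy, disc_centre, disc_radius):
    """Toy caging test: point fingers around a disc of known radius.

    The disc is caged if (1) no finger is inside it, (2) the fingers wrap all
    the way around its centre (every angular gap is under 180 degrees), and
    (3) every gap between neighbouring fingers is narrower than the disc's
    diameter, so the disc can never slip out between two fingers.
    """
    f = np.asarray(finger_xy, dtype=float) - np.asarray(disc_centre, dtype=float)
    if np.any(np.linalg.norm(f, axis=1) <= disc_radius):
        return False                                    # a finger penetrates the disc

    f = f[np.argsort(np.arctan2(f[:, 1], f[:, 0]))]     # fingers in angular order
    angles = np.arctan2(f[:, 1], f[:, 0])
    gaps = np.diff(np.append(angles, angles[0] + 2 * np.pi))
    if np.any(gaps >= np.pi):
        return False                                    # fingers don't surround the disc

    neighbour_gap = np.linalg.norm(np.roll(f, -1, axis=0) - f, axis=1)
    return bool(np.all(neighbour_gap < 2 * disc_radius))

# Four fingers close around a 3 cm disc: caged. It can rattle, but not escape.
print(is_caged([(0.04, 0), (0, 0.04), (-0.04, 0), (0, -0.04)], (0, 0), 0.03))  # True
# The same arrangement spread twice as wide: the disc can slip between fingers.
print(is_caged([(0.08, 0), (0, 0.08), (-0.08, 0), (0, -0.08)], (0, 0), 0.03))  # False
```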
Luisa Rodriguez: Yeah, that makes sense. Are there other kinds of hardware, software, physics challenges worth talking about?
Ken Goldberg: Well, the challenges of perception get especially difficult when you have reflections and transparencies. That becomes extremely difficult, because any sensor depends on the behaviour of light, and when light is unpredictable… For example, underwater is very challenging. And if the lighting itself is changing or moving around, that turns out to be complicated in its own right.
So a lot of times we’ll be working in the lab, we’ll get something working, and then that afternoon we’ll have a sponsor come in and it won’t work. And we’re like, what happened? Well, the lighting has changed, just because it’s later in the day, and that very subtle change of lighting has caused our sensors to stop working. It’s because they’ve adjusted to one lighting condition, and now the conditions have changed even slightly… Humans are amazingly good at adapting to changes of lighting, but that’s a challenge for robots.
And this, by the way, is one of the huge challenges in autonomous vehicles, because there are these very rare lighting conditions where you get glare, and that’s where the system doesn’t see something, or sees something it’s never seen before, and it crashes. So with these variations in lighting, the human eye is incredibly adaptive to changes, and unconsciously we can manage. But for robots, that is still a challenge.
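The failure mode is easy to reproduce in miniature: tune any fixed threshold or exposure setting under one lighting condition and it quietly breaks under another. A toy Python illustration with invented numbers, standing in for a real detector:

```python
import numpy as np

rng = np.random.default_rng(2)

def scene(ambient):
    """Fake 1-D 'image': noisy background at the ambient level, plus one
    object that is 12 units brighter than its surroundings."""
    img = rng.normal(ambient, 2.0, 200)
    img[90:110] += 12.0
    return img

# Morning: tune a fixed threshold halfway between background and object.
morning = scene(ambient=100.0)
threshold = 106.0
print("morning, object pixels found:  ", int((morning > threshold).sum()))    # ~20

# Afternoon: the room is dimmer, but the code and threshold are unchanged.
afternoon = scene(ambient=88.0)
print("afternoon, object pixels found:", int((afternoon > threshold).sum()))  # ~0
```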
Luisa Rodriguez: That’s another case where it just would never have crossed my mind that the lighting of the time of day would be part of the robotics challenge.
Ken Goldberg: Oh, it is.
Luisa Rodriguez: It would never have occurred to me that my body is doing anything particularly interesting at sunset relative to noon.
Ken Goldberg: Right. It’s very true. And actually, it happens all the time in industry, where you have these systems in environments with humans, and the lighting changes. And sometimes it can be subtle things. Even changes in temperature and humidity have a big effect in robotic environments. For our logistics systems, the packages can change: they can swell. Sometimes it gets very hot in these environments, and the packages swell. So imagine a nice flat box, but now it’s swollen, so it’s not flat anymore. So when the suction cups come in, it’s almost curved or spherical.
Luisa Rodriguez: Slightly convex or something.
Ken Goldberg: Yeah, convex. So when you put your suction cups on it, they don’t make contact cleanly. And it worked yesterday, but it doesn’t work today because it was just more humid and hotter.
Future robots in home care, logistics, food production, and medicine [01:16:35]
Luisa Rodriguez: Yeah. I’m still personally very curious about how robots are going to be integrated into the real world in the near future. Are there going to be any cases of near-term integration in particular sectors?
Ken Goldberg: Well, for example, I think the huge area of demand is going to be home care — that is, taking care of people who are ageing. And the reason this is a relatively near-term need is just the demographics. We have an ageing population in almost every country, and a shortage of workers — which is why I’m not worried about people being massively unemployed; I think we’re going to have plenty of jobs and a shortage of human workers for a long time.
But in these home care types of settings, that’s where there’s already a huge shortage. And this is going to increase dramatically in the next decade. It’s inevitable. This is, by the way, not a speculation, because it’s about demographics. We know how many people were born and died, and we know what the population looks like. We’re on a collision course for a massive imbalance in the age distribution of the population. And as that starts to evolve, we’re going to need help there.
Now, I can imagine this fairly clearly, because I’m getting closer to that age, and my mother is already there. I think about today: it’s very hard to find help, to afford help, to have human workers help in the home. And there’s a lot of concern about finding someone who’s trustworthy and capable and reliable. That combination is rare. So the alternative is that I would love to have a robot that could do some of these things around my house.
So what would that be? Well, things like just keeping the house clean, decluttering. That would be really helpful. I would love to be able to have things just kept orderly. Actually, people talk about robots cooking meals for you. I don’t really want that; I like cooking. But I don’t like cleaning up, so I would love the robot to do that. And then it would just straighten up your desk and pick up clothes, put them away, fold your laundry, make your bed — you know, all those kinds of things around the house that I think would be very welcomed and beneficial.
And I think we’re getting closer to some of those capabilities, and that would be very helpful, and there’s huge demand for it. The challenge, of course, is going to be cost, because you need to produce these at a price that people can afford. But if you get a real breakthrough in the capabilities, then these costs might come down with volume. It’s like cars: they’re very complex and have a lot of moving parts, but most people can afford a car. Maybe not a new one. But you get the economy of scale, right? So that could happen.
The hope is that, if we start to bootstrap and these robots are out there, then we could collect data as they move around and do their work, and those data can start to help fill this huge gap we’re talking about, and then these robots will get better over time. So I think that’s one category down the road.
There’s one thing that’s already happening, and that’s logistics. Logistics is basically moving materials around, the transportation of materials. There’s a huge economy around this, and the way most people see it is in their delivery service: they order something online, and a package comes to them. Logistics is basically how that happens — how whatever you ordered gets transported to your door.
And there’s a lot of steps: it has to be found in a warehouse, it has to be put into a box or package, and then it has to be delivered to a central place, and then to a more localised place, and then all the way down to the delivery person who drives it over and drops it on your front porch. So that whole thing. And by the way, it exists in business too, because businesses are constantly ordering things and storing things, et cetera.
So this whole giant area of logistics has been around for a long time and is very important, but most people, most consumers, don’t really see it.
Luisa Rodriguez: You don’t see it, yeah.
Ken Goldberg: It’s invisible. It’s sort of an invisible economy that’s out there, but it’s all the things that go on behind the scenes to get your packages to you on time. So that’s an area that has become more challenging because e-commerce continues to rise. People really like this idea of ordering online for all kinds of reasons, and it’s addictive in the sense that once you get hooked on it, you start doing it more and more. And COVID gave it a big boost: by necessity, we had to do a lot more ordering online. So what happened is there was a real surge in the amount of logistics that needs to be handled.
It turns out that the current, recent breakthroughs in robots are very well-matched to handling that problem. By that, I mean sorting packages, moving packages around, that is something robots are capable of. And they’re actually starting to be adopted in many different places. And that’s not science fiction, that’s not the future; that’s happening today.
Luisa Rodriguez: Right. What are the specific subtasks that they’re doing that they can do? Because it sounds a lot like grasping to me, but robots can’t reliably grasp yet.
Ken Goldberg: Great question, Luisa. So it’s very difficult for a robot to, say, clear a dinner table. And that’s because dinner tables are complex: they have a lot of fragile objects, transparent objects, shiny objects. Those are all particularly challenging for robots right now.
But packages have some nice qualities to them. Generally, they’re opaque, and they usually have some flat surfaces, so they can be grasped. It’s still nontrivial. For example, if you dump a bunch of packages into a bin, the boxes are all different, just randomly arranged. And now figuring out how you can grasp those objects is nontrivial. And when I say “grasp” them, usually it’s done with suction cups. You have to reach in, and you have to find out where you can put the suction cup so that it won’t bump into something else or break something else. So that problem is actually harder than you might think: how to get objects out of a bin with suction cups. I know it sounds easy, but —
Luisa Rodriguez: It does, it just sounds ridiculous to me! Yet I did watch a video of suction cups sorting packages, and then when you try to empathise, like, “What would I do?”, you do notice that you have to think about where the suction cup goes to distribute the weight, and how not to bump other things, as you said. It really is just a lesson for me in how complex things are that seem incredibly simple to me. But anyways, it sounds like that is a thing that we’re doing better?
Ken Goldberg: Yes. So if you had to do it — and by the way, humans do this all the time — you reach in with two hands, and you reach your fingers around this box and you lift it up, and you sort of push other things away with your wrist as you do it to get that object out of that box and then put it there.
Now, robots generally are operating with one hand, so that’s one big challenge. Second of all, they have this huge wrist: if you look at a robot, it’s very bulky, with these big motors near the wrist and also this big suction head — which, by the way, is not just one suction cup; it usually has up to a dozen suction cups on it. So it’s this big, huge thing that it’s trying to reach in with.
And now it has to approach that box from the right direction. If the box is on its side and, you know, just a corner of it, you have to kind of come in at the right angle. And there’s a lot of other boxes, so you have to figure out where to reach in to get the box. Then, as you said, very importantly, once you get a hold of it and you start pulling, now all of a sudden it can slip off, because the suction cups can be sheared off, and if you twist it, it will fall.
So a lot of nuances, but we’ve made a huge amount of progress on that. My company, Ambi Robotics, is doing that every day, and we have very good tools for being able to lift objects out of bins.
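For a sense of what "finding where you can put the suction cup" involves, here is a rough Python sketch of one common recipe: scan a top-down depth image for patches that are flat enough for the cup to seal and not blocked by a taller neighbour. It is a simplification for illustration, not Ambi's system; real pipelines typically put learned models on top of cues like these.

```python
import numpy as np

def suction_candidates(depth, cup_radius_px=8, flatness_tol=0.004, clearance=0.02):
    """Score suction-grasp points on a top-down depth image (metres,
    smaller value = closer to the camera, i.e. a higher surface).

    A pixel is a candidate if the patch under the cup is nearly planar (so
    the cup can seal) and nothing in a wider ring around it sticks up much
    higher (so the approach isn't blocked by a neighbouring package).
    """
    h, w = depth.shape
    r, R = cup_radius_px, 2 * cup_radius_px
    scores = np.full((h, w), -np.inf)
    for y in range(R, h - R):
        for x in range(R, w - R):
            cup = depth[y - r:y + r + 1, x - r:x + r + 1]
            spread = cup.max() - cup.min()              # depth variation under the cup
            if spread > flatness_tol:
                continue                                # too curved or tilted to seal
            around = depth[y - R:y + R + 1, x - R:x + R + 1]
            if around.min() < cup.mean() - clearance:
                continue                                # a taller neighbour blocks the approach
            scores[y, x] = -spread                      # flatter is better
    return scores

# depth = ...  # a 2-D array from an overhead depth camera
# scores = suction_candidates(depth)
# best_y, best_x = np.unravel_index(np.argmax(scores), scores.shape)
```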
The other step is scanning objects. And this is also a little trickier than you might think. When you go to the supermarket, you know how the objects get scanned for the barcode? It turns out there’s a nuanced aspect to that. Nowadays you often have to do it yourself at the self-checkout, and you realise it’s kind of interesting: you have to find the barcode visually and then stick it in front of the scanner. That’s tricky, because a lot of times the barcode is not obvious, and you have to find it while, as we said, holding the object up with a suction cup.
But sometimes you scan it and you don’t see the barcode. And that could be for two reasons. Even if you have multiple cameras around, the barcode might be covered up by the gripper, which is a big problem. Or the barcode is damaged — it got smudged or scraped; there are a million ways that happens — and then you can’t read it.
Now that all has to happen. And then, if you’ve read it correctly, you have to drop the package somewhere it gets moved to a separate bin or bag, depending on what the barcode says — because you’re trying to distribute everything back out, so that it ultimately gets to the delivery person who’s going to carry it and drop it at your house. So you have to get it into their bag, into the vehicle that’s associated with your house.
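The decode itself is mostly a solved library problem; the robotics part is deciding what to do when no camera can read the code. A minimal sketch, assuming the pyzbar and OpenCV Python packages, with the camera and regrasp interfaces left as placeholders:

```python
import cv2
from pyzbar import pyzbar

def try_decode(bgr_image):
    """Return the first barcode string found in the image, or None."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    results = pyzbar.decode(gray)
    return results[0].data.decode("utf-8") if results else None

def scan_and_route(camera_views, regrasp, max_attempts=3):
    """If no camera sees the barcode (gripper occlusion, smudged label),
    reorient the package and try again; after a few attempts, fall back to
    a human. `camera_views` and `regrasp` stand in for the real perception
    and motion interfaces."""
    for _ in range(max_attempts):
        for view in camera_views():           # images from the cameras around the cell
            code = try_decode(view)
            if code is not None:
                return code                   # use this to route the package to its bag
        regrasp()                             # expose a different face of the package
    return None                               # divert to manual handling
```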
So those steps, we’re actually making enormous progress there. And those things are practical and cost effective and they’re being adopted, and it’s really exciting to watch that area be transformed by machines now.
So Ambi Robotics has these machines in a number of facilities now, and they’re basically helping workers to be more productive. The workers love these machines, and they’re getting raises, and the machines are eliminating the task that nobody wants to do: reaching into the heavy bin and lifting out heavy things. Now their job is to maintain these machines, so each person can get much more work done. So the job becomes more pleasant, and the whole shipping centre can handle more things.
So there’s still plenty of jobs to go around. And I also want to be clear: I’m not saying it doesn’t eliminate some jobs. But we’re not talking about wiping out all jobs.
Luisa Rodriguez: Interesting. Yeah, thanks for that take. I actually want to come back to how robots might affect the labour market in a bit. But first, I’d like to ask about the integration of robots into other sectors. What roles might robots play in agriculture and food production in the near future?
Ken Goldberg: OK, good. That’s another one. Like logistics, it’s challenging because there are not enough workers, and it’s also seasonal. And by the way, logistics is seasonal too: around the holidays, you have a huge surge in packages coming out. Agriculture is very seasonal, where there’s this harvest period, and all of a sudden you need 100 times more workers than you needed three weeks ago. And the harvests often happen simultaneously in many different places, so everybody’s rushing to get workers. It would be great to have machines that can help with harvesting. There are also machines that help with weeding.
One of the things we’ve been really excited about is the idea of managing and pruning plants when we’re growing in polyculture environments. This is called “intercropping.” It’s becoming increasingly popular, though it’s actually an old technique — it’s the way people still tend to garden. The contrast here is between polyculture and monoculture.
Most industrial farms are monoculture, just like row after row of soybeans, the same thing. The challenge is that you have to use a fairly high amount of pesticides and fertilisers and water. If you use polyculture, you’re using the natural benefits of plants helping each other, so that a lot of times you can get away with far less pesticides, fertilisers, and water if you use this polyculture. It’s very popular, for example, if you look in the wine industry: organic wines are often grown with this intercropping, with lots of different crops around the vineyards, so that they kind of help pollinate and support the vineyard, the grapes.
But these require much more labour, because you have to prune things, and everything is changing at different rates. So that’s an area where I think robots could be really interesting and be able to move through an environment and trim and maintain farms.
Luisa Rodriguez: That sounds really hard to me. That’s one where I actually do have a visceral sense of how challenging it is.
Ken Goldberg: Are you a gardener? Do you have a garden?
Luisa Rodriguez: I do have a garden. I basically am just relying on easy plants, and it rains a lot in the UK, so the garden does pretty well. But knowing what to prune, how to come at the right angles, traversing outdoor environments, that all does sound much harder. Is that far away, or am I just really inexperienced at gardening?
Ken Goldberg: No, no, it’s a hard problem. And you’re not alone. I think Michael Pollan has this great quote — “Nature abhors a garden” — which is basically saying, you know, gardens are very hard to grow. And England has a great, rich history of beautiful gardens — manicured and just aesthetically beautiful. And of course there are European gardens that are semi-wild, and all these things: the whole beautiful history of gardening for aesthetic purposes.
But in terms of farming and growing edible plants, you have a lot of other considerations. And it is very complex, because pests come in and all of a sudden something’s eating your plant. You don’t know what to do about that. You have to prune — even something like lettuce, how you continuously prune it to take off the leaves that are edible but keep the plant growing. A lot of nuances.
So you’re right: this is beyond the current capabilities of robots. But in the future, you could imagine something like this — and this is something we’re working on in the lab — being able to, let’s say, find stems on a plant and then prune particular leaves, dead leaves, or maybe extract the healthy leaves that you want to harvest. That would be interesting. So I think you can imagine that coming.
And there are a lot of benefits. You could have these systems increase food production, especially for fresh produce, which would be very desirable. And they could reduce water usage — because of climate change, water is an enormous issue right now, and we’ve had major droughts in many areas. And this is persistent; it’s not going away. So being able to have indoor farms with robot assistants would be very helpful. And I do think we’re going to see this increase over even the next decade.
Luisa Rodriguez: OK, wow. Interesting. Let’s take one more sector.
Ken Goldberg: How about surgery? Can we talk about that? That’s one I’m very excited about, and here’s why: surgery — and healthcare more broadly — is really interesting because there’s a very big difference between different surgeons. The skill level varies widely.
So in particular, let’s say something like suturing, which is sewing up a wound. If it’s done right, if it’s done well, that means you have a nice, even distribution of the forces on the wound. Then you get very efficient healing and little scarring. But if you don’t do it so well, it’s a little sloppy, then you can have a lot of complications: things don’t heal properly, and you get a big, ugly scar. My father-in-law summed this up: he was a surgeon, and he said he could look at a scar, and he knew which surgeon did the operation. Because he took a lot of pride in his ability to suture and his skill in that, and he said he knew all the surgeons and how good they were.
So coming to robotics, this would be something where I think robots could help — again, not replacing the surgeon, but imagine that you’re in the operating room and we have these systems now, these robot assistants. And the way they work now is it’s completely controlled by the human surgeon — so it’s like a puppet, but there’s no sense of robotics there at all. But what we’re calling “augmented dexterity” is that we can introduce a little bit of robotics into these contexts.
And it’s very analogous to self-driving cars. We’re not saying we’re going to have a fully self-driving surgeon, right? Where the surgeon is off golfing, and the robot is doing the surgery. No, the surgeon is always going to be there. I think we’ll need surgeons for my lifetime, the foreseeable future — 30, 40 years at least. But the surgeon could be augmented by having this system that would allow them to perform this surgery, or the suturing, in a very nice, optimised way — so that the suture is very clean and even, and results in fast healing and limited scarring.
So that’s another thing we’re working on, and I think that is going to come also in the next decade.
Luisa Rodriguez: Cool. So what is the robotics problem of suturing, really concretely?
Ken Goldberg: It’s hard. It’s very tricky. And it starts with the needle, which is very difficult to even perceive, because it’s the classic needle in a haystack. Surgical needles are curved: they’re like the letter C, for a variety of reasons, but mainly because you’re sewing into skin. So they’re always curved like that, but they’re very thin and shiny. So they have all these problems we talked about, where just finding the needle is hard.
So we’ve been working on a number of techniques using ultraviolet light and other things to train a robot to be able to perceive needles. Those seem to be showing promise, so we’re very excited about that. We have a demonstration where a robot can hand a needle back and forth between two robot grippers 50 times in a row. I’m very excited about that.
The next step, though, once you find the needle, is to be able to move it into position, so that you can find the right points for the entry and the exit in the wound, and then push the needle through. Then you have to grasp the tip of the needle with the other gripper and pull it right through, and get the right amount of tension — because you want to get it just right so it holds the wound but doesn’t pinch off the blood circulation — and then you have to hand it back to the first jaw to do another stitch.
And then you have this interesting problem, Luisa, which is thread management: the thread tends to get in your way.
Luisa Rodriguez: I’m laughing, because again, I would never have labelled these things as tasks or subtasks, but there are so many. But sorry, you were saying? Thread management gets in the way.
Ken Goldberg: Thread management is a really complex issue, because again, the thread is almost impossible to see. It’s even harder than the needle. And the thread tends to get in the way in the most fiendish ways: it gets tangled up. Human surgeons are very good at managing this, so we watch videos of human surgeons to see how they do it, and we’re trying to develop strategies for robots that can do something [similar]. So we’ve gotten up to being able to do six stitches in a row with the robot fully autonomous, no human in the loop. That’s a brand new result; we just presented it, in fact, yesterday at a conference.
Luisa Rodriguez: Congratulations.
Ken Goldberg: Thank you. So that’s one step, but it’s still not ready for human application. But I think that’s another area where we’re going to be able to get some levels of autonomy, augmenting surgeons’ abilities: augmented dexterity for surgeons. And you can see that that can have big benefits for healthcare.
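The pipeline described above decomposes naturally into a fixed sequence of subtasks per stitch, which is roughly how autonomous suturing prototypes are structured. Here is a schematic Python sketch; the step names and the recovery hook are illustrative, not the lab's actual code, and all of the real perception and control sits behind `execute`.

```python
from enum import Enum, auto

class SutureStep(Enum):
    FIND_NEEDLE = auto()       # perceive the thin, shiny, curved needle
    GRASP_NEEDLE = auto()      # pick it up with the first gripper
    INSERT = auto()            # choose entry/exit points and drive the needle through
    REGRASP_TIP = auto()       # grab the emerging tip with the second gripper
    PULL_AND_TENSION = auto()  # pull the thread through to just the right tension
    MANAGE_THREAD = auto()     # keep the slack thread out of the workspace
    HANDOFF = auto()           # pass the needle back for the next stitch

def run_stitches(n_stitches, execute, recover):
    """Schematic control loop: run the subtask sequence once per stitch.
    `execute(step)` returns True on success; `recover(step)` is whatever
    fallback the system (or a supervising surgeon) provides."""
    for _ in range(n_stitches):
        for step in SutureStep:
            while not execute(step):
                recover(step)
```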
Luisa Rodriguez: Yeah, of course. Just because you’ve piqued my curiosity, why do you think we won’t have fully autonomous robot surgeons in the next 30 or 40 years?
Ken Goldberg: So the issue here is fault tolerance. I’m glad you brought it up, because this is why self-driving cars are particularly complicated and challenging: a small fault, a small error, could be quite disastrous, as you know. If you’re driving along a cliff, a small error and you go over the side. Or you bump into a stroller and run over a kid. So driving is very challenging because of that, in contrast to logistics — because in logistics, if you drop a package, it’s no big deal. In fact, it happens all the time; they expect it to happen a fair amount of the time. So if something like 1% of packages get dropped, that’s OK, it’s not a big deal. You can live with it.
But driving is not very fault tolerant; in surgery, even less so. You have to be really careful because you don’t want to puncture an organ or something, or sew two things together that shouldn’t be sewn together, right? So there’s a huge consequence.
The other thing is perception. Inside the body, it’s very challenging, because oftentimes there’s blood, or if it’s a laparoscopic surgery, you’re constantly trying to work around the blood and staunch it so that you can see what’s going on. And this is where, just as you were describing watching someone crack an egg, surgeons have developed really good intuition — because they know what the organs are, they know what they should look like, how they’re positioned, how thick or rough they are, and what their surfaces and materials are like.
So they have very good intuition behind that, so they can operate. Sometimes you cut a blood vessel and the whole volume fills with blood, and now you have to find that blood vessel and clamp it, so that you can stop the blood. And that’s like reaching into a sink filled with murky water and finding the thing, right? Surgeons are very good at that, and it’s a lot of nuance.
So the perception problem is extremely difficult, because everything is deformable. Deformable materials are particularly difficult for robots. We talked about cracking an egg or clearing a dinner table: generally, all those objects are rigid. But when you start getting into deformable things — like cables or fabrics or bags, or the human body — all of a sudden, everything is bending and moving in very complex ways. And that’s very hard to model, simulate, or perceive.
Luisa Rodriguez: Right. Yeah, I’m just finding it fascinating how the categories of things that are really troublesome, thorny problems for robots are just not what I’d expect. I mean, the fact that we’re making progress on suturing, but it gets really complicated as soon as an organ… You know, you could move it and it’s hard to predict how it’s going to look when it moves or where it’s going to be. It is just unexpected and really interesting.
Ken Goldberg: Absolutely. And as you’re saying this, I’m thinking of, going back to the kitchen — you know, kitchen workers in restaurants — there’s so much nuance going on there. If you’re chopping vegetables or you’re unpacking things. Let’s say every single chicken breast is slightly different. So being able to manipulate those kinds of things, and then clean surfaces, and wipe things, and stir — there’s so many complex nuances.
So I think it’s going to be a long time before we have really fully automated kitchen systems. And the same is true for plumbers, carpenters, and electricians. Anyone who’s basically doing these kinds of manual tasks, fixing a car, they require a vast amount of nuance. So those jobs are going to be very difficult to automate.
How might robot labour affect the job market? [01:44:27]
Luisa Rodriguez: Yeah. Well, that’s exactly what I want to ask next. How much do you expect the job market to be changed or affected with the increased presence of robots?
Ken Goldberg: So here’s where I also think there’s a huge amount of fear out there, and it’s really important to reassure workers that this is not imminent and that what they do is very valuable and safe from automation. I think everyone’s been saying for years that we’re going to have lights-out factories, and that humans will just sit around and have all this leisure time — maybe even like Wall-E or something, right?
But no, we’re very far from that. The fact is that there are so many nuances to what humans do in jobs — and AI and what it means for office workers is a whole other realm. I think there are certainly many aspects of jobs that can be automated. For example, transcribing this interview is a perfect example: in the past it had to be done by a human; now you can get a machine to do most of it, and then you tune it. You still have to fine-tune it, but you get a lot of the basic elements done. So we have a lot of tools that take over certain aspects of our jobs, and because every other aspect of our job needs more attention than we have time for, we can spend our time on that.
The same is true for so many of these things we’re talking about. So for gardeners, et cetera, there’ll still be a need for the gardener to do the more subtle things, but maybe some machine will be out there doing the lawn. We’re getting closer to automated lawn mowers. And for kitchen workers, I think certain things in the kitchen may be automated. Obviously, we have dishwashers; we’ve had certain automation for a long time. But that doesn’t mean we don’t need human workers. So I think we’re going to need them for things that are much more subtle.
So I’m sort of optimistic about the job market. I think this demographic thing is the biggest factor: we have a shortage of human workers, people who are of working age. So I don’t think there’s going to be the kind of unemployment that people are talking about or fearing, for the foreseeable future.
Luisa Rodriguez: Yeah. If in, let’s say, 10 or 20 years, robots are super widespread and are causing real job displacement, what do you think played out differently to how you expected?
Ken Goldberg: OK, so let’s say one of these breakthroughs we’re talking about happens, and all of a sudden robots are capable of learning from YouTube videos and repeating anything they watch. Or maybe you demonstrate, “This is how I want to chop these vegetables,” and now it’s able to repeat that reliably. So if we got those breakthroughs, then you could imagine that you’d have these robots. Another factor we haven’t even touched on, which is not as interesting, is just the cost, the financials of doing this. But let’s say that gets finessed, too.
So now all of a sudden, you have these robots, and they’re actually pretty capable, and we’re seeing them increasingly being put to use and actually doing something useful. Then I think it will be interesting. I think that would change our perceptions of them. My own sense is that we would find new work for humans to do, that we would basically shift toward other things that are more subtle, let’s say maybe it’s healthcare and things like that. We have a shortage of humans that can do those things, and also teaching and childcare. And there’s a lot of things where we are just still shorthanded. So I think that people will find new jobs, but some of these things might be automated.
I guess the extreme form of this is that you have a robot that can do anything that a human can do and you just have them doing it all. And then what? We hang out, we can spend time playing music and writing poetry and doing all the fun stuff. And it’s an interesting prospect. Maybe we’ll drive ourselves crazy because we’ll have so much free time, you know?
Luisa Rodriguez: Yeah, I do find that question really hard to actually grapple with. On the one hand, I do feel kind of terrified of losing my job and sense of purpose and ways to fill my time. And on the other hand, I’m like, probably I could fill my time. Probably I could find ways to be pretty fulfilled.
Ken Goldberg: Yeah. I mean, for what you do — as a journalist, as a content creator — I think there’s always going to be an audience. People want to hear things, and they’re going to want human innovation there. I think in the arts and humanities, there’s going to be a lot of safety. People talk about journalists being replaced. I don’t see that at all, because what journalists bring is novel and interesting perspectives. They’re not just summarising some bunch of facts — it’s true that that could be replaced — but coming up with novel insights into scenarios is very complex and nuanced, and I think people will want that for a long time.
So I think humans are going to have a very positive future for a very long time, and that the fear is exaggerated. I guess it’s a mix of optimism and fear that suddenly we’re going to have this automated world around us. It’s not something that I think we should be thinking about as at all imminent. Really, I see it as a part of a process that’s been happening for hundreds of years, since the Industrial Revolution. There’s going to be steps, but what seems to always happen is humans adapt and shift our attention and our time to other things that are somehow more constructive, in most cases — although with social media and the addictive nature of TikTok and other things, who knows? I don’t know.
Robotics and art [01:51:28]
Luisa Rodriguez: Let’s turn to our last topic, which is robotics and art. So you’re an engineer, you’re a roboticist, but you’re also an artist. And a lot of your art is kind of at the intersection of robotics and art. One of your pieces is called the Telegarden. It’s a live garden tended by a robot controlled by over 100,000 people via the internet. Can you say a bit about how it works?
Ken Goldberg: Sure. It’s still, I would say, my favourite project. It started in the early days of the internet, and I had been making art with robots, painting with robots, and doing installations when the internet came out in 1993 — or when I learned of it — and I suddenly saw that I wanted to contribute. We had a robot in the lab, and I wanted to build a robot system that people could interact with. So my students and I started working on this. It evolved into the Telegarden. We liked the idea of the contrast between the natural physicality of a garden and this digital world of robots and the internet.
What we didn’t expect was how many people were going to get interested in this. But it was the first participatory system where you could not only look at things with a camera, but you could also interact with something remote, so it attracted a lot of attention. It was an art installation on the web for a year in our lab. And then it got invited to be in an art museum in Austria, in Europe, and they maintained it for nine years.
Luisa Rodriguez: Wow.
Ken Goldberg: Yeah. It was online 24 hours a day, and that’s how it grew to 100,000 or more people. We don’t know exactly how many people were involved, but they were able to plant and water seeds.
And, as a gardener, you know that if you have a three-metre-by-three-metre plot of land, you can’t sustain that many people. So it was also a study in the tragedy of the commons, because you just couldn’t support that many. People would plant things, they would grow at different rates, and then periodically we’d have to wipe the garden clean and start over. And the people were very passionate about their plants. It was fascinating to us.
Luisa Rodriguez: That is fascinating.
Ken Goldberg: It was really an artwork in the sense that it was something fairly easy to understand, the Telegarden. And this idea, some people said it was the future of gardening — which was not our intention; we were seeing it almost as a little bit of a critique of the expectations around the internet, and saying that gardening would be hopefully the last thing you’d want to do online. But it was really interesting.
And we’ve evolved that. Recently, I wanted to revisit that project, and so just before the pandemic, we started developing a new system that we call “AlphaGarden” — that’s related to the AlphaGo and AlphaZero projects. What we wanted to do was to build a completely autonomous garden controlled by robots and AI — so no human in the loop.
Luisa Rodriguez: Wow.
Ken Goldberg: That was very fun. And it’s ongoing, in fact. We built that system, and it was timed to be in an art exhibit in New York City. I went to the opening in New York, and then the pandemic happened the week after, and we had to close down. The garden was in a greenhouse controlled by the robot, and we could watch it, but we couldn’t go into the greenhouse, so we had to just sit and watch it over a period of four weeks. And it was essentially drying up, because we couldn’t get in to irrigate it.
So it was fascinating to watch this thing, and not what I expected: in its last stages, the garden started to throw up all these flowers and tendrils. And it reminded me of Picasso’s Guernica painting — because it was reaching out desperately for attention, you know, like it was slowly dying while you watched right in front of it, and it was just striving to stay alive.
Luisa Rodriguez: Last efforts.
Ken Goldberg: Yeah. So it was really beautiful to watch that process, although tragic. And it was also interesting in regard to what was happening with this worldwide pandemic at the same time — because you had this sort of aberration in nature that nobody expected and couldn’t solve with all the technology we have. So it was very interesting that all these advances in AI were completely powerless with regard to something Mother Nature had developed, something that we couldn’t control.
So I’m really interested in this aspect of contrast, using art to raise questions about these technologies — to, in a way, critique or challenge the conventional wisdom about them. Because with robots, as we’ve talked about, there are so many mythologies and histories and cultural memories. So they’re interesting as a medium for art.
And in a more recent project, I’ve worked with a dancer, Catie Cuan, who is a professional ballet dancer but also has a PhD in robotics. She’s remarkable. We started working together over the last few years, just exploring ideas and trying things out. We had her dance with a robot arm, and we captured her motions using motion capture and something called OpenPose, which uses AI to track the positions of the joints in a human. Then, as she danced, we took those captured motions, interpreted them as sinusoids, and had the robot move somewhat analogously. Then she came back into the lab and danced with the robot as it moved. And we really got interested in the contrast between the two.
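One simple way to "interpret motions into sinusoids" is to take a tracked joint trajectory, find its dominant frequency, and fit a single sine to it by least squares. Here is a toy Python sketch with a synthetic trajectory standing in for the OpenPose output; it is an illustration, not the actual pipeline used in the performance.

```python
import numpy as np

# One tracked joint angle over time (synthetic here), sampled at 30 fps.
fps = 30.0
t = np.arange(0, 10, 1 / fps)
rng = np.random.default_rng(3)
joint_angle = 0.6 * np.sin(2 * np.pi * 0.5 * t + 0.8) + 0.05 * rng.normal(size=t.size)

# Dominant frequency from the FFT (ignoring the DC term).
spectrum = np.fft.rfft(joint_angle - joint_angle.mean())
freqs = np.fft.rfftfreq(t.size, d=1 / fps)
f0 = freqs[np.argmax(np.abs(spectrum[1:])) + 1]

# Least-squares fit of a*sin(2*pi*f0*t) + b*cos(2*pi*f0*t) + c.
A = np.column_stack([np.sin(2 * np.pi * f0 * t), np.cos(2 * np.pi * f0 * t), np.ones_like(t)])
a, b, c = np.linalg.lstsq(A, joint_angle, rcond=None)[0]

# The smooth sinusoid the robot would track instead of the raw, noisy capture.
robot_command = a * np.sin(2 * np.pi * f0 * t) + b * np.cos(2 * np.pi * f0 * t) + c
print(f"dominant frequency: {f0:.2f} Hz, amplitude: {np.hypot(a, b):.2f} rad")
```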
Then we were invited to perform that in a space in Brooklyn, New York, called National Sawdust. And Catie said she wanted to do it as an eight-hour performance. I was really impressed with her ambition. Just imagine being on stage for eight hours solid, dancing, but she’s terrific.
And we did it. We brought the robot to New York. It was so much fun. We had this industrial arm moving on stage, and we had choreographed a number of motions. We wanted to contrast a lot of the fears around robots stealing jobs with the incredible nuance of the human body. So she was doing motions suggestive of different aspects of work — like stirring, and painting, and basically manual labour — and her motions were in contrast to the motions of the arm. What was interesting is how much more nuanced and complex, and ultimately interesting, her motion was compared to what the robot could do.
Luisa Rodriguez: I’ve seen, obviously, just a snippet of this, and I found it really beautiful. And now, after our conversation, I do feel like I see it in a slightly new light. It feels like it is a great example of a lot of the things we’ve talked about already. The human body is incredibly complex. It’s doing much more complex motion sensing, et cetera, than we perceive — and robots can imitate bits of that, but there’s a bunch they can’t. And juxtaposing those is a great way to kind of see that really viscerally.
Ken Goldberg: Thank you. I have to give credit to Catie, because she’s just so expressive in her emotion. And you see that the human body is so complex, and, as you said, capable. I’ve learned more about robots and worked with robots over the last 40-plus years, and you know, I constantly am reminded of how complex and magnificent the human body is.
Luisa Rodriguez: That’s a perfect place to leave it. My guest today has been Ken Goldberg. Thank you so much for coming on. It’s been really fascinating.
Ken Goldberg: Thank you, Luisa. Really fun talking to you.
Luisa’s outro [02:00:55]
Luisa Rodriguez: All right, The 80,000 Hours Podcast is produced by Keiran Harris.
Content editing by me, Katy Moore, and Keiran Harris.
Audio engineering by Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong.
Full transcripts and an extensive collection of links to learn more are available on our site, and put together as always by Katy Moore.
Thanks for joining, talk to you again soon.