Robert Wiblin: Hi listeners, this is the 80,000 Hours Podcast, where each week we have an unusually in-depth conversation about the world’s most pressing problems and how you can use your career to solve them. I’m Rob Wiblin, Director of Research at 80,000 Hours.
Today’s episode was especially exciting for us – we love the book Algorithms To Live By, and really wanted Brian’s thoughts on how some of his ideas apply to big picture career choice questions. It’s likely to be both entertaining and useful to almost all of you listeners out there.
Because we really wanted to get answers we could recommend on the explore vs exploit trade-off, the conversation keeps digging deeper and deeper into that problem between 1h 40m and 2h 20m. If you ever feel like you’ve had enough of it, skip on to the final section, which is a buffet of the most interesting bits from the book, starting around 2h 19m.
I also have a really important favour to ask. Once a year, 80,000 Hours tries to figure out whether all the things we’ve been working on have actually been useful to you or not.
This year we’ve published 61 hours of the podcast, dozens of articles and hundreds of high impact jobs we’d like to fill.
Our donors need to know we’ve been changing people’s careers, so they know it’s a good idea to keep funding us, and we need to know that we shouldn’t just give up and get different jobs.
If no one had told us their stories in the past, 80,000 Hours just wouldn’t exist today.
So if anything 80,000 Hours has done, including this podcast, our website or our coaching has changed your career plans or otherwise helped you, please go to 80000hours.org/survey and take a few minutes to let us know how.
You can also let us know about any ways we’ve led you astray, or could be doing better.
Alright, here’s Brian.
Robert Wiblin: Today I’m speaking with Brian Christian. Brian is a non-fiction author, best known for The Most Human Human and Algorithms To Live By, which he co-authored with Tom Griffiths and which became a number one US bestseller in the non-fiction category. He studied computer science and philosophy at Brown University, and since 2012 has been a visiting scholar at UC-Berkeley. Since 2013, he’s been the Director of Technology at McSweeney’s Publishing and an open source contributor to projects like Ruby on Rails. He’s appeared in the New Yorker, the Atlantic, the Paris Review, and The Daily Show with Jon Stewart. Thanks for coming on the podcast, Brian.
Brian Christian: Thanks for having me.
Robert Wiblin: Today I expect to mostly talk about Algorithms To Live By, which is a really outstanding book that a lot of listeners should go out and read after they listen to this episode. The primary reason is that we have a lot of questions about how to apply some of the algorithms to real-life decisions, especially career decisions. And for that purpose, I’m joined today by Ben Todd, founder and CEO of 80,000 Hours, who was especially keen to have this interview, because he has some very pressing questions about the explore/exploit tradeoff. So welcome, Ben.
Benjamin Todd: Yeah, I’m really excited about this interview. Over the last year, this was one of my favorite books, and it really felt like it introduced me to a lot of new mental models that I feel like I should have been taught in school but wasn’t. In particular, I’m very excited to talk about how it can apply to specific career decisions. Yeah, the book covered a lot of things I’d vaguely heard about before, such as the secretary problem, but it went into way more depth than they’re normally covered in, while still remaining really clear and really easy to understand. So I found it really interesting.
Robert Wiblin: Great. We’ll get to that in a minute. But first to Brian, what are you working on now?
Brian Christian: I’m working on a new book right now, which is about normative issues in computer science, so the question of how do we try to capture human values in, for example, machine learning? This covers things like, for example, the fairness, accountability, and transparency movement within the ML community and also things like the value alignment problem that people are thinking about in AI safety. So I think it touches on a number of things probably of interest to 80,000 Hours listeners, and I’ll be excited to talk more about that next year.
Robert Wiblin: What’s that book called, and when is it coming out?
Brian Christian: The title is currently wobbling between a few different possibilities, so I don’t want to say until we finally determine it.
Robert Wiblin: Not until we’re sure, yeah.
Brian Christian: But this will be out, my guess would be sometime in the fall of next year.
Robert Wiblin: What made you choose to write about that topic?
Brian Christian: I think partly we are seeing kind of the confluence of, I think, two major trends. One is just this kind of explosive progress in machine learning as a discipline, particularly with the rise of deep learning, starting from 2012 to the present. And that has in turn created this reaction of both jubilation and also concern, that has really launched this subfield unto itself of technical AI safety. And you have things like Nick Bostrom’s book, obviously, turning into what I think is this really remarkable technical research agenda. I’m really interested in how some of these big ideas are actually getting cashed out, in terms of PhD theses and so forth.
Brian Christian: At the same time, you’ve got this societal adoption of machine learning systems, increasingly into kind of morally and ethically relevant domains, driving being one obvious example, but also things like arraignments and sentencing. Increasingly, we are thinking about how to translate the social contract into explicitly algorithmic terms. That is very intriguing to me as being an area where philosophy and computer science are on this collision course, and I think that’s only going to become a more pressing issue in the next few years. So that has really captured my attention.
Robert Wiblin: Your first book, The Most Human Human, was also about artificial intelligence, and that was, I think, back in 2011. What did you write about then?
Brian Christian: Yeah, The Most Human Human is about the Turing test, and in particular, my experience as what’s called a human confederate in the Turing test competition. I was one of these people hidden behind a curtain, trying to convince a panel of scientists that I was in fact a human being and not a chatbot claiming to be a human being. This was kind of a fascinating and bizarre experience for me, and led me on an investigation, both into the history of the Turing test, the history of chatbot technology, and also into just this broader linguistic question of what should you do if you are in this competitive scenario where your objective is to convince someone else that you are a human being? How does that manifest into actual linguistic strategies? I had a lot of fun researching that and learning about both the technology and also kind of the nature of human conversation.
Robert Wiblin: Yeah, what did you do, broadly speaking, to try to seem incredibly human?
Brian Christian: Well, I looked into the way that most chatbots of the time were being made. It’s funny, the book was published in 2011, which was before Siri, so it feels like another time, another age. One of the main strategies that chatbot developers were using, and this is still true in some ways today, is that they would essentially be sampling shards of conversation out of some huge corpus of previous human conversations. So for example, you could just download the entire chat transcripts of some message board and then map some user input to find the place-
Robert Wiblin: Nearest match.
Brian Christian: … in that archive that’s the nearest match, and then say the next thing that a real person said in that situation. Systems built this way are at times uncannily impressive. In the book I detail my interactions with one program called Cleverbot, where I said, “What’s 2 plus 2?” It says, “4.” I say, “What’s the capital of France?” It says, “Paris.” I say, “What’s the capital of Romania?” It says, “Uh, Budapest? I don’t know,” which in some ways is even more impressive, because the correct answer is Bucharest, and so this is like an example of graceful degradation and sort of a meta-level analysis of its own uncertainty, which is extremely impressive in a machine learning context.
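The nearest-match approach described above can be sketched in a few lines of Python. This is a minimal illustration, not how Cleverbot or any real system works: the toy `CORPUS`, the hypothetical `reply` and `similarity` helpers, and the use of the standard library’s `difflib.SequenceMatcher` as a crude similarity measure are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical toy corpus: consecutive turns taken from past human conversations.
CORPUS = [
    "What's 2 plus 2?",
    "4.",
    "What's the capital of France?",
    "Paris.",
    "Are you married?",
    "Yes, I'm happily married.",
]


def similarity(a: str, b: str) -> float:
    """Crude case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def reply(user_input: str, corpus: list[str]) -> str:
    """Return what a real person said right after the archived line closest to the input."""
    # Only lines that have a successor can serve as a match.
    best = max(range(len(corpus) - 1), key=lambda i: similarity(user_input, corpus[i]))
    return corpus[best + 1]


# Each call is independent: no conversational history is kept between replies.
print(reply("what is the capital of france", CORPUS))
```

Note that `reply` keeps no state between calls, so nothing ties one answer to the next.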
Brian Christian: But the problem is that in a real human conversation, you are not only getting locally appropriate answers to each particular question, but you are building a model of the other person; you’re building a conversational history. That’s then going to influence the things that happen later. And so, for example, if you interact with these programs and you say, “Are you married?” it would say, “Yes, I’m happily married.” If you said, “Do you want to go on a date?” it would say, “Sure, I’m free on Friday.” They’ll sometimes spell the word “color” with the British “u”, sometimes with the American spelling. The real tell isn’t the sense that you aren’t interacting with a human; it’s the sense that you aren’t interacting with a single human.
Robert Wiblin: It has no memory.
Brian Christian: Exactly, yeah, and no long-term coherence. One element of my strategy, for example, was to go out of my way to very self-consciously tie all of my answers together into this broader narrative of who I was. So if they said, “Nice weather, huh?” Well, the Turing test took place in Britain, so it was actually really bad weather at the time, and I remember saying, “Yeah, this is pretty crappy, but I’m from Seattle, so that’s par for the course.” Later when we were talking about music, I would reference the grunge scene or something like that, just very explicitly flagging, I’m the same guy who answered that other question. Things like that, that I could do to kind of increase the long-term complexity of the interaction, I felt like put the truth on my side.
Robert Wiblin: You mentioned that it’s kind of a different era in chatbots now. What’s changed since 2010 or 2011?
Brian Christian: Yeah, I mean for me, I think one of the most fascinating things about Turing tests from a contemporary perspective is, it has essentially become woven into the fabric of everyday life. If you get an email from a friend that says, “Here are some exciting discounts on Viagra,” you’re not going to reply to your friend and say, “You might want to check with your doctor before using that.” You might rather say, “You better reset your password,” or something like this.
Brian Christian: In a way, communication in the 21st century is effectively a Turing test, and when you send a link to your friend, you now have to sort of performatively put in two or three sentences so they know it’s really you and not just some automated message. And I think really the climax of this is the 2016 election, where now in hindsight, we’re looking back and saying there was a huge amount of kind of automated, insincere activity happening in social media, and people couldn’t tell the difference. In some ways, I now think about that first book of mine from the perspective that the citizenry of a democracy in the 21st century needs training in order to navigate online discourse.
Brian Christian: I think it’s interesting to think about the idea that if social media discourse were at a higher level, or higher bandwidth, or more thoughtful, or more articulate, then the ruse wouldn’t have worked nearly as well. It’s partly an indictment of just the poverty of the actual medium and the way that language is used. Poets have always been interested in using the language as articulately and uniquely and expressively as possible, but I think increasingly this is also a question of national security, which to me is scary but fascinating.
Robert Wiblin: Do you think that’s a losing battle, trying to detect people who aren’t real on Twitter or Facebook? I guess, on the one hand, that the bots will be getting more and more sophisticated, but then also the technology for detecting them will also get more sophisticated. But I suppose at some point, surely they would approach humanity almost exactly, and then you just can’t tell the difference.
Brian Christian: Yeah, as I understand it, it’s something of an open question in the theoretical community, among people who look at GANs and adversarial examples and so forth: will we find that the long-term fixed point favors the attacker or the defender? I’ve heard arguments from people I respect on both sides of that, and my conclusion is we don’t really know. So, yeah, I think long term it’s a bit concerning. Short term, I do think there are relatively simple things that one can do. I mean, even just speaking or tweeting or writing in complete sentences, rather than in broken sentences, makes it easier to find out that someone is not a native English speaker who’s claiming to be. So, there are these little things that we can do to raise the level of discourse. Longer term, I don’t know, it’s a little bit spookier.
Robert Wiblin: Seems like Twitter hasn’t been trying that hard to get rid of these bots, so they could probably make quite a lot of progress if they just put some effort in, and actually were willing to pull some of them. I think, they’ve started doing it now, under a lot of heat.
Brian Christian: Yeah, I wouldn’t mind, I mean in the way that one has verified accounts for celebrities or something. You could imagine some Turing test required to get some badge on your account or something like this.
Robert Wiblin: Yeah, it’s interesting. I think there was some discussion of them requiring verification, where you would have to send in a scan of your passport to get an account, which people hated because that would prevent anonymous whistleblowing via Twitter and things like that. Right, but I guess the Turing test would-
Brian Christian: And then they would then-
Robert Wiblin: Yeah, having a conversation like that proves you are a coherent person would work as well. Although I guess you would have one person just doing that again and again and again for many accounts, strictly.
Brian Christian: That is true, although at least you would limit them to the throughput of one guy working all the time. But I’ve seen this even just in online gaming. I’ve had the experience personally of being on some first-person shooter game server where an admin shows up and literally forces you to make small talk with them, and if you don’t, they’ll kick you out. So, we’re starting to enter this uncanny valley where … yeah, again, the Turing test … I think what would shock Alan Turing perhaps the most if he were in the 21st century is that this had become a sort of banal nuisance. It’s no longer a thought experiment; it’s just this annoying thing that we have to do time and time again in the course of a day.
Robert Wiblin: So, hopefully we’ll get to talk about machine learning again next year when your book comes out. Let’s talk about Algorithms to Live By for now. What is that book about in broad strokes?
Brian Christian: The basic idea is, there is a set of problems that all of us face in everyday life, whether it’s finding a place to live or deciding whether to commit to a partner or deciding where to go out for dinner or how to rearrange your messy office or how to schedule your time. These often emerge as a function of limited time and limited information. We tend to think of them as kind of uniquely and innately human problems. The message of the book is simply, they are not. In fact they correspond, really precisely in some cases, to some of the fundamental problems of computer science. So, having made that identification of the underlying computational structure of human life, I think this gives us an opportunity to really learn something by studying the nature of those problems and their optimal solutions. I think that gives us payoffs at maybe three different scales. At one level, computer science can in some cases give you just very explicit advice: do this, and it will succeed this amount of the time. In other cases, a parallel may hold more loosely, but it still gives you an understanding of the structure of the problem, the structure of what optimal solutions look like, and a vocabulary for understanding the parameters of that space.
Brian Christian: I think most broadly, it’s a way to think about the nature of human rationality itself. The problems that the world poses to us are computational in nature, and this makes computers not only our tools but in some sense our comrades. We are confronting a lot of the same issues. And computer science paints, I think, a very different picture of what rational decision making looks like than you might find in, say, behavioral economics, because one of the first things that any computer scientist takes into account is computational complexity. Once you incorporate the cost of thought itself, I think you end up with a picture of rational decision making, particularly in some of the hardest classes of problems, that looks a lot more familiar and a lot more human.
Brian Christian: So, I think it’s a more approachable and a more recognizable version—or vision, I should say—of what human rationality should be.
Robert Wiblin: How much do you think people can gain from understanding these issues in their day to day life? Do you think it’s really important that people know these different models and try to apply them?
Brian Christian: I think so. I mean, perhaps the average person doesn’t need to go wading into the technical literature personally and looking at specific theorems and so forth, but I think that having a basic vocabulary for “Oh, I’m in an optimal stopping problem. Oh, I’m in an explore/exploit tradeoff” is very useful, because these things come up all the time. We can get into, in the course of our conversation, what some of the psychological studies show us about what people do by default. In many cases people’s defaults are reasonable, but I think understanding a little bit about the types of problems that we face, and being able to recognize and identify when you are in that situation, is a really good first step, and for me having that vocabulary has been really invaluable. There is a set of concepts that map to these things, and just literally having access to those words has, I think, been really useful.
Robert Wiblin: What do you think is most fun about the book?
Brian Christian: Gosh, there are a lot of things for me that are really fun. I mean, one thing that was really fun about writing it and researching it was getting to interview all of these different experts. The book covers such a wide swath of terrain across computer science, operations research, psychology, and cognitive science that there is really no one person who’s an expert in all those things, and so my co-author, Tom, and I went on an expedition to try to find the people in each of these domains who were the most well informed, the most expert in each case.
Brian Christian: It was really fun (a) getting to hear the stories behind how they discovered some of these different breakthroughs, and also getting to put the question to them of whether their research has impacted their own life and the way they think about things day to day. I would say, maybe 50% of people said, “Oh, that’s really interesting. I’ve never thought about that.” And the other 50% said, “Oh, of course. Absolutely, no question.” It was really satisfying just getting to hear those stories.
Robert Wiblin: One interesting thing is a lot of the original contributors are still alive because a lot of this was discovered quite recently.
Brian Christian: That’s right. Yeah, I know. It’s pretty incredible, I guess the original generation of founders of computer science, Von Neumann and Turing, none of them are still around. But I would say the generation after that, a lot of those guys are still alive and it’s really incredible. We interviewed Tony Hoare, who is the inventor or discoverer of Quicksort. We asked him, how did you come up with Quicksort, this incredible algorithm? And he was like, “Well. I just thought, how would I sort something? And that was the first idea that came to my mind.” I think it’s really incredible to look back on a time when the discipline was so young, that you could make this career defining discovery just by being like, “How should I sort something? Let’s try this. Look, hey, it works.”
Benjamin Todd: Wasn’t that how Gauss came up with how to sum an arithmetic series or something when he was a school kid?
Brian Christian: In some ways, yeah, I think one feels envious of just the low-hanging fruit that was around at that period of time.
Robert Wiblin: Perhaps not envious of the lifestyle they had given those discoveries they had made, but yeah.
Brian Christian: Yeah.
Benjamin Todd: And even now, this is partly why this is so interesting: many of these ideas were discovered so recently that they haven’t made it into our general consciousness, whereas, say, the heuristics and biases literature has become pretty well known, I think through books like Thinking, Fast and Slow, in the last decade or so. This is another line of research on human decision making that I think is way less widely known than that, and maybe even more important in some cases.
Brian Christian: Yeah, I think that’s right. Part of our mission, I think, in the book to some degree was to try to speed that process up, or help it along. One case was, in the context of the explore/exploit tradeoff, there are a set of ideas that emerged in computer science that have become really interesting to people who think about medical ethics. We can get deeper into that question later, but watching the FDA start to come to an understanding of like, “Wow, computer scientists have had this best practice for like 40 years, it seems relevant to this domain in which human lives are on the line. Maybe we ought to think about evaluating some of those ideas and importing them into, say, clinical trials.”
Brian Christian: So, that was an area where I hadn’t expected to in a way put on an activist hat and really feel like, oh, I can use this book to try to actually nudge that adoption process forward and say like, “Yeah, you really should look into this.”
Robert Wiblin: One of my favorite blog posts ever looks into the question of why so many of the intellectual greats seem to be from hundreds or thousands of years ago rather than today, despite the fact that there are so many more people around today, and so many more academics and researchers. There are lots of potential reasons for that, but probably the key one is that there was much more low-hanging fruit 2,500 years ago. You could make enormous philosophical breakthroughs just by actually sitting down and clarifying the most ordinary concepts. Today, you have to spend 30 years training to get to the frontier, and then you find something like a slightly new idea that someone else hasn’t had.
Robert Wiblin: All right, so, just to signpost where we are going, I think mostly we are going to talk about three different models, each of which has its own chapter in the book. One is explore versus exploit, the next is optimal stopping, and the third is introducing randomness, or simulated annealing. They are related in different ways: they are all about the tradeoff between trying out different things to get information versus choosing the best option you’ve found so far. They can seem to blur into one another, but later we’ll try to give clear criteria for which one you would want to use in different cases.
Robert Wiblin: I think the cases we should keep in mind as we are going through would be things like choosing which profession to go into as you advance in your career from undergrad to your early jobs to mid-career; thinking about which specific jobs to accept when you are on a job search at a specific moment; perhaps deciding what city you are eventually going to spend the rest of your life in; or who to date and whether to get married, and things like that. Are there any other … yeah, archetypal examples that you think people should have in mind as they are thinking about these models?
Brian Christian: I think some of them we may just bring up in the course of it. I mean, optimal stopping is famously applicable to being in a car, where it’s generally difficult to turn around. Part of what’s interesting is there are literally physical embodiments of some of these concepts. There are also conceptual embodiments. But I think here, it may be easier to draw that out in the context of-
Robert Wiblin: What’s between them.
Brian Christian: Yeah.
Benjamin Todd: Let’s say, in 80,000 Hours’ career guide, we cover a lot of key questions that people face in that [crosstalk 00:21:25] such as which problems to focus on, and whether you should invest in yourself and gain more skills or try to have an impact right away. One of the really big questions in career decisions is basically how much to explore versus just go with your best guess. So, the big decision to have in mind is: should I go down one path and become an academic, or should I try to work in government, or should I work in nonprofits? That’s the key question we are addressing, and I think there’s a lot that many of these models might be able to say about it. We are going to basically try to attack it from a bunch of different angles.
Robert Wiblin: I think it’s fair to say that this question of whether people should explore more or whether they already explored too much is an uncertainty we’ve had since the beginning. We’re confident that people don’t consider enough options. They don’t put enough options down on the page when they’re just considering what could I possibly do with my life? But then whether they do too many internships or too few internships between the ages of 18 and 25 is a bit harder to say.
Benjamin Todd: Or, if you’re working in a job and it doesn’t work out, how quickly should you switch versus marching on? Should you try several jobs, or just find the thing you think is best and go pretty hard into that?
Robert Wiblin: All right, explore/exploit. What’s the classic explore/exploit dilemma? Set the scene here, Brian.
Brian Christian: Right. So, first, I’ll make a linguistic note which is, in the explore/exploit tradeoff, this is the tension between spending your time and energy trying new things, gathering information versus spending your time and energy leveraging the information that you already have to get a pretty safe, good outcome. So, in English we’ve stacked the deck linguistically towards exploration because we think of exploitation as pejorative. But we have to think about these from the perspective of computer science and treat them as value-neutral terms.
Brian Christian: So, the canonical explore/exploit problem in computer science is what is called the multi-armed bandit problem. The basic idea is, you walk into a casino, there are n slot machines, some huge number of slot machines, and you are going to be in the casino for a while, let’s say an afternoon. And this is a bit of a strange casino because some of the machines pay off with different probabilities than others. You don’t know in advance, of course, which are which. So, the problem is quite simply, how do you make as much money as possible over the period of time that you are going to be there?
Brian Christian: Intuitively, we might imagine that there’s some combination of exploring—that is, trying different machines out, seeing which ones appear to be giving you higher payoffs on average than others—and exploiting—which is, biasing yourself towards of course cranking the handle of the machines that do in fact seem the most promising. But exactly what that balance should be, and what our strategy ought to be in that situation, has this wonderful and colorful history in the field, where for most of the 20th century it was considered an unsolvable problem and career suicide. In fact the Allied, the British mathematicians during World War II joked about dropping the multi-armed bandit problem over Germany as the ultimate intellectual sabotage, to waste the brain power of the Germans.
Robert Wiblin: So, when was this question first specified?
Brian Christian: That’s a good question. I think William Thompson was looking at a version of the multi-armed bandit problem in the 1930s. That literature ended up getting kind of buried and wasn’t rediscovered until much later. It came up again in the early 1950s. It had this reputation for, as I said, being this kind of brain teaser, but not an actual thing that you could work on. The first paper on it came in, I think, 1952 by Herbert Robbins, where he was talking about a strategy that he came up with called Win-Stay, Lose-Shift, which just means: if you pull the slot machine handle and it just paid out, pull it again; if it didn’t, try something else. He was able to prove that that strategy is better than pulling the handles at random, which is such a modest result: “Here’s an algorithm that’s better than chance.” That was as much as could be said at the time. But in some ways that was the first handhold on the problem, a sign that maybe we could actually start getting somewhere on this.
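A small simulation can make Robbins’s comparison concrete. This sketch assumes Bernoulli slot machines (each pull pays 1 or 0) with made-up payout probabilities in `PROBS`; the function names are hypothetical, and the simulation only illustrates the comparison rather than reproducing Robbins’s proof.

```python
import random


def pull(p: float, rng: random.Random) -> int:
    """One pull of a Bernoulli slot machine: pays 1 with probability p, else 0."""
    return 1 if rng.random() < p else 0


def random_play(probs: list[float], pulls: int, rng: random.Random) -> int:
    """Baseline: choose a machine uniformly at random on every pull."""
    return sum(pull(rng.choice(probs), rng) for _ in range(pulls))


def win_stay_lose_shift(probs: list[float], pulls: int, rng: random.Random) -> int:
    """Stay on a machine after a payout; switch to a different machine after a miss."""
    arm = rng.randrange(len(probs))
    total = 0
    for _ in range(pulls):
        reward = pull(probs[arm], rng)
        total += reward
        if reward == 0:
            # Lose-shift: move to one of the other machines, chosen at random.
            arm = rng.choice([a for a in range(len(probs)) if a != arm])
    return total


# Hypothetical casino: payout probabilities the player does not know in advance.
PROBS = [0.2, 0.5, 0.8]
rng = random.Random(0)
print("random:", random_play(PROBS, 10_000, rng))
print("win-stay, lose-shift:", win_stay_lose_shift(PROBS, 10_000, rng))
```

Win-Stay, Lose-Shift tends to come out ahead because a machine that pays out more often keeps the player on it for longer runs, so the better machines get more of the pulls.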
Robert Wiblin: And if I remember correctly, it was then Bellman, who came up with the theoretically correct answer to this question, but it was not really computable, it was just too difficult to ever actually figure out what it was even if you had the formula.
Brian Christian: That’s right. So, Bellman, in 1957, comes up with his famous idea of dynamic programming, which involves working backwards from the end and saving or memoizing different solutions of these possible endings and then using them to work your way backwards towards where you are now. Which is quite ingenious and is this incredibly important technique even today. But in the context of the multi-armed bandit problem, it relies on a few assumptions, that make it not really ideal in practice.
Brian Christian: It does require a lot of computing, and it requires that you know in advance exactly how many machines there are and how many times you are going to pull the handle in total, things that may not be realistic or useful in a practical real-world situation.
Brian Christian: So, it’s a funny history, because in some sense you get the definitive solution to the problem in 1957; on the other hand, it leaves a lot open, and it’s sort of unsatisfying for all these different reasons.
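The backward-induction idea can be sketched for a tiny version of the problem. This is a minimal illustration under stated assumptions, not Bellman’s own formulation: machines that pay 1 or 0, a uniform prior over each machine’s payout rate, and a number of remaining pulls fixed in advance. `functools.lru_cache` does the memoizing; the function names are hypothetical.

```python
from functools import lru_cache

# A state records (wins, losses) for each machine, as nested tuples so it can be memoized.
State = tuple[tuple[int, int], ...]


def posterior_mean(wins: int, losses: int) -> float:
    """Estimated payout rate under an (assumed) uniform Beta(1, 1) prior."""
    return (wins + 1) / (wins + losses + 2)


@lru_cache(maxsize=None)
def value(state: State, remaining: int) -> float:
    """Expected total future winnings under optimal play, found by backward induction.

    `remaining` is the fixed number of pulls left, which must be known in advance.
    """
    if remaining == 0:
        return 0.0
    return max(q_value(state, remaining, arm) for arm in range(len(state)))


def q_value(state: State, remaining: int, arm: int) -> float:
    """Expected winnings from pulling `arm` now and playing optimally afterwards."""
    wins, losses = state[arm]
    p = posterior_mean(wins, losses)
    after_win = state[:arm] + ((wins + 1, losses),) + state[arm + 1:]
    after_loss = state[:arm] + ((wins, losses + 1),) + state[arm + 1:]
    return p * (1 + value(after_win, remaining - 1)) + (1 - p) * value(after_loss, remaining - 1)


def best_arm(state: State, remaining: int) -> int:
    """The machine the dynamic program says to pull next."""
    return max(range(len(state)), key=lambda arm: q_value(state, remaining, arm))


start: State = ((0, 0), (0, 0))  # two untried machines
print(value(start, 20), best_arm(((3, 1), (0, 0)), 20))
print("memoized states:", value.cache_info().currsize)
```

The table of memoized states grows quickly with the horizon and the number of machines, which is the computational burden described above.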
Robert Wiblin: So, we got the first really practical solution from Gittins, I think in the 70s or 80s? Is that right?
Brian Christian: That’s right, yeah.
Robert Wiblin: Do you want to describe his approach?
Brian Christian: Yeah, so, there is this lovely story. I mean, one of the things that I just love about the history of mathematics in general is that sometimes people think they’re solving a very specific problem, and what they come up with has a level of generality that they don’t even anticipate. John Gittins, in the 1970s, he’s now a math professor at Oxford. At the time he was doing some consulting for the Unilever corporation. They wanted to know, basically, how they should allocate their money across different projects.
Brian Christian: So, you have pure research and development of new drugs, you also have marketing of profitable drugs, how much of our budget should we spend on one, how much on the other. Gittins immediately recognizes this as being kind of like a multi-armed bandit problem. Where you have these different levers you can pull, you don’t know in advance how well they’ll pay out. There’s a particular twist here, which I think is quite fascinating.
Brian Christian: Gittins is thinking about it from the perspective of the Unilever corporation, which wants to exist, theoretically, forever. They are not interested in maximizing their revenue over any particular time period, but indefinitely. At the same time, it’s better to have that money now than later.
Brian Christian: So, he approached the problem saying, instead of there being some finite sequence of rewards, what if there is an infinite sequence of geometrically discounted rewards? So, if a dollar tomorrow is worth as much as 99 cents today, and that extends all the way into the future, is there a way that we can think about the problem in this context?
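As a quick check on the arithmetic of geometric discounting: if a dollar tomorrow is worth 99 cents today, a reward t days out is weighted by 0.99^t, and an endless $1-a-day stream sums to 1 / (1 - 0.99) = $100 in today’s money. The sketch below (the function name is ours) compares a long truncated sum against that closed form.

```python
def discounted_total(reward_per_step: float, discount: float, steps: int) -> float:
    """Present value of receiving `reward_per_step` at each of `steps` steps,
    where a reward t steps away is weighted by discount**t."""
    return sum(reward_per_step * discount**t for t in range(steps))


closed_form = 1 / (1 - 0.99)                   # value of the infinite stream
truncated = discounted_total(1.0, 0.99, 5000)  # far enough out that the tail is negligible
print(round(closed_form, 6), round(truncated, 6))
```

The infinite stream has a finite value only because the weights shrink geometrically; that is the property Gittins builds on.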
Brian Christian: It’s really fascinating to think about how he approached the problem, because he sighed and thought, well, unfortunately we all know that the multi-armed bandit problem is unsolvable, but let me at least think about what would give me a good approximate answer. And he comes up with this strategy that we now know as the Gittins index, which basically says, for each machine, imagine a guaranteed payout so good that you would never play that machine even one more time. For every machine there is some price at which you would rather just take that reward again and again and again than even try the machine once.
Robert Wiblin: Try another machine once?
Brian Christian: Yeah.
Robert Wiblin: A different machine.
Brian Christian: Yeah. And he called this the dynamic allocation index; we now know it as the Gittins index. His thought was, well, you could just calculate that independently for each machine, and it wouldn’t depend on which other machines existed. You could just play the machine with the highest Gittins index. He thought, “Well, this might be a reasonable approximation to the problem.” And then, to his own surprise, it turned out to be the exact solution. So, again, I just love this: these mathematicians following their instincts and saying, humbly, “Well, here’s an idea, let’s try it.” And it turned out to be the answer.
Brian Christian: So, this is another case where the Gittins index is the gold standard for the multi-armed bandit problem with infinite, geometrically discounted rewards, and yet there are still reasons that we may not want to use it in practice. For one, it relies, like I said, on geometric discounting, and a number of studies suggest that humans don’t discount that way, although perhaps they should. So, if you are doing hyperbolic discounting, then you are in a different paradigm. It also assumes an infinite stream of rewards, which may or may not apply to a particular situation. And lastly, it’s just non-trivial to compute the Gittins index for every given machine. It’s hard to do it in real time.
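To give a feel for why the index takes real computation, here is a rough Python sketch of one standard way to approximate it, the “retirement” calibration. It binary-searches for the per-pull guaranteed payout λ at which you are indifferent between retiring on λ forever and playing the machine, solving that play-or-retire problem by backward induction to a truncated depth. It assumes a machine that pays 1 or 0, a uniform Beta(1, 1) prior, a discount factor of 0.9, and a depth of 100; those are our choices, not prescriptions from the book.

```python
def gittins_index(successes: int, failures: int, gamma: float = 0.9,
                  depth: int = 100, tol: float = 1e-5) -> float:
    """Approximate Gittins index of a 1-or-0 machine with a Beta(1, 1) prior,
    after observing `successes` payouts and `failures` non-payouts."""
    a0, b0 = successes + 1, failures + 1  # Beta posterior parameters
    lo, hi = 0.0, 1.0                     # the index of a 0/1 machine lies in [0, 1]
    while hi - lo > tol:
        lam = (lo + hi) / 2
        retire = lam / (1 - gamma)        # value of taking lam on every pull forever
        # At the truncation depth: retire, or keep playing at the posterior mean.
        values = [max(retire, (a0 + i) / (a0 + b0 + depth) / (1 - gamma))
                  for i in range(depth + 1)]
        # Backward induction; at level d, index i means i more wins and d - i more losses.
        for d in range(depth - 1, -1, -1):
            new_values = []
            for i in range(d + 1):
                a, b = a0 + i, b0 + d - i
                p = a / (a + b)           # posterior probability of a payout
                play = p * (1 + gamma * values[i + 1]) + (1 - p) * gamma * values[i]
                # At the root we force one pull; deeper down we may retire at any time.
                new_values.append(play if d == 0 else max(retire, play))
            values = new_values
        if values[0] > retire:            # playing beats retiring: the index exceeds lam
            lo = lam
        else:
            hi = lam
    return (lo + hi) / 2
```

For the restaurant question that comes up next, you could compare `gittins_index(7, 2)` for the place with seven good meals and two bad ones against `gittins_index(0, 0)` for an untried restaurant, and go wherever the index is higher. The index of an untried option comes out above its plain average of 0.5; that premium is the value of the information you would gain.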
Robert Wiblin: There’s an awesome table in your book.
Brian Christian: Yeah.
Benjamin Todd: So you point out, if you are at a restaurant and you get that out and try to say, “Well, we’ve had seven good meals here so far and two bad. Now, should we switch to another restaurant next time?”
Brian Christian: Right. Exactly.
Benjamin Todd: Your friends will probably stop listening long before that point.
Brian Christian: Yeah, we encourage you to cut out this table and carry it in your wallet, but of course you and your friends also have to agree on the discounting function.
Benjamin Todd: So, yeah, what are some more rule-of-thumb solutions to the multi-armed bandit problem that someone might be able to bear in mind in a real-life situation?
Brian Christian: Sure. I think there are a few big picture ideas here. One of the key ideas as I see it is, if you are dealing with the finite horizon case, then one of the things you see by looking at the exact solutions that dynamic programming offers you is that you should basically front load your exploration, and do the bulk of your exploitation at the end. This makes sense, I think, for three different reasons.
Brian Christian: The first is that the odds that a new machine you try is better than the best one you already know about can only go down as you get more information. An analogy that I like to use is, if you have taken a work transfer to Spain, and you are going to be there for a year, the first restaurant you go out to, the very first night you are in Spain, is guaranteed to be the best restaurant you’ve ever been to in Spain. The second restaurant you try has a 50% chance of being the best restaurant that you’ve ever been to in Spain. This of course goes down as a function of your experience.
Brian Christian: So, the chance that trying a new thing will yield something better than what you already know about can only go down. What’s more, the value of making that discovery can also only go down over time. So, if you find an incredible restaurant in your last week in Spain, that’s strictly worse than finding that restaurant on your first week in Spain. Both the chance of making a discovery, and the value of that discovery, can only go down over time. On the other hand, the value of just doing your favorite thing, or going with the best option, can only increase over time—again, as a function of your experience.
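The pattern behind the Spain example follows from symmetry: if restaurant quality is an independent draw from the same continuous distribution every time (our simplifying assumption), each of the first n restaurants is equally likely to be the best so far, so the n-th one is the best with probability exactly 1/n. A quick simulation sketch confirms it:

```python
import random


def chance_newest_is_best(n: int, trials: int = 100_000, seed: int = 0) -> float:
    """Estimate the probability that the n-th restaurant tried beats all earlier ones."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Hypothetical model: each restaurant's quality is an independent uniform draw.
        qualities = [rng.random() for _ in range(n)]
        if qualities[-1] == max(qualities):
            hits += 1
    return hits / trials
```

With these settings, `chance_newest_is_best(2)` comes out near 0.5 and `chance_newest_is_best(10)` near 0.1.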
Brian Christian: For all of those reasons, it makes sense to think about ourselves as being on this trajectory from exploration to exploitation, as a function of where we perceive ourselves to be within this finite horizon. What I think is really interesting about that idea is that it offers us a way of thinking about the human lifespan at its broadest level. And we are seeing, for example, psychologists like Alison Gopnik at UC Berkeley drawing on the technical literature on the explore/exploit tradeoff to make an argument about infant cognition, saying, you know, there’s this huge body of evidence that suggests that infants are highly random; they have a huge novelty bias. They always want to look at an unfamiliar object. No matter how carefully you’ve chosen their Christmas gift, they’re just relentlessly interested in the next thing and the next thing and the next thing and the next thing … And it can be tempting to view this as just a failure of willpower, or attention span, or that kids are just a defective version of adults. In fact, you can appeal to the explore/exploit tradeoff and make the argument that, no, these kids have just burst through the doors of life’s casino. There are machines everywhere. They’re going to be there for 80 years. They really should begin their process by just flailing around, pulling those levers at random, you know, putting every object in their mouth at least once.
Brian Christian: So, we can think about the stigma of child cognition as actually being the optimal strategy given where they are in that finite horizon.
Benjamin Todd: So, okay, as a general principle we want to explore more early on and then move more towards committing and using what we already know, which we’re calling exploiting. Can we now get a little bit more quantitative about that? How much should you explore early versus switching to exploit? Like, suppose you’re on a two-week holiday: how many days might you explore and then exploit?
Brian Christian: Yeah. I would love to be able to give you a specific threshold. I feel like it probably depends on how many restaurants there are in that town and exactly what the distribution of food quality is that you’re drawing from. So, this is one of the problems with dynamic programming: we might have to actually crunch the numbers. But more broadly, there’s the idea of front-loading your exploration strictly, so that for the first x nights you only try new things, and after some point you only do the best thing. That’s an algorithm called epsilon-first. It turns out that epsilon-first has a particular downside, which is that it incurs what’s called linear regret.
Brian Christian: This kind of takes us from the ’70s to the ’80s, and the next big breakthrough in studying the multi-armed bandit problem came from Herbert Robbins, again, 30 years after his initial discovery. He’s back to advance the plot again with one of his collaborators, and they were able to frame the multi-armed bandit problem in the context of what’s called regret minimization. In every human life we have this idea that we want to minimize the number of regrets we have in the future. In the context of the multi-armed bandit problem, that has this beautifully explicit form, which is, your regret is all of the money that you left on the table: all of the money that you could’ve made if only you knew at the beginning everything that you knew by the end.
Brian Christian: Robbins and Lai looked at this question of, if you’re following the optimal strategy, what’s the best you can do with regards to regret? What they found is that using the optimal strategy, your regret will grow logarithmically. So, this is kind of a good news/bad news thing. You know, the bad news is, even if you’re doing the optimal thing, you will continue to leave more and more money on the table. You’ll still be making mistakes. But, the frequency and intensity of those mistakes will flatten gradually over time. So, this gave theorists another tool in their toolbox for thinking about how to approach the multi-arm bandit problem. Which is to say, we know that the best case scenario is that we can have strategies that offer logarithmic regret.
Brian Christian: What are strategies simpler than, for example, the Gittins index that still offer this really nice property? Earlier, we were talking about epsilon-first, which is the strategy where you explore for a fixed period of time and then exploit forevermore after that. The reason that that strategy has linear regret is that the amount of exploration you did leaves you with some fixed chance of being wrong in identifying the best slot machine or the best restaurant. In the limit, as n goes to infinity, there’s just some fixed chance that you made the wrong decision and stick with it forever after. That’s your linear regret.
Robert Wiblin: So, every round, your regret goes up by the difference in average payout between the one that you chose to pull forever and the optimal one that you could have chosen to pull. So it just keeps on growing, then?
Brian Christian: It just keeps, yeah … every single pull is another small burden to bear, right. There’s been a lot of really exciting work, starting in the mid-’80s and continuing through the 21st century, trying to identify simple, intuitive strategies that offer this guarantee of logarithmic regret.
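Here is a small exact calculation, under hypothetical numbers, of why epsilon-first has linear regret. Take two machines paying 1 with probability 0.6 and 0.4, explore each one a fixed number of times, then commit to whichever paid more (ties broken by a coin flip). Regret is the expected payout left on the table relative to always pulling the 0.6 machine. Once exploration ends, every pull adds the same expected regret, gap × P(wrong choice), so the total grows in a straight line with the horizon. The function names are ours.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_wrong(explore_each: int, good: float, bad: float) -> float:
    """Chance that exploring each machine `explore_each` times crowns the worse one
    (ties broken by a fair coin)."""
    total = 0.0
    for x_good in range(explore_each + 1):
        for x_bad in range(explore_each + 1):
            joint = binom_pmf(x_good, explore_each, good) * binom_pmf(x_bad, explore_each, bad)
            if x_bad > x_good:
                total += joint
            elif x_bad == x_good:
                total += 0.5 * joint
    return total


def epsilon_first_regret(horizon: int, explore_each: int,
                         good: float = 0.6, bad: float = 0.4) -> float:
    """Expected regret of epsilon-first on two 1-or-0 machines (hypothetical rates)."""
    gap = good - bad
    exploration = explore_each * gap  # pulls spent on the worse machine while exploring
    # Every later pull costs `gap` if, and only if, we committed to the wrong machine.
    commitment = (horizon - 2 * explore_each) * gap * prob_wrong(explore_each, good, bad)
    return exploration + commitment
```

Exploring longer shrinks `prob_wrong` but never to zero, so the regret line gets flatter without ever bending over into the logarithmic curve that Robbins and Lai showed is the best achievable.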
Benjamin Todd: And, yeah … what’s the …
Brian Christian: What’s the … yeah, so what are they? One of them is called epsilon-decreasing. So, if you have a certain fixed chance that you are going to try something random and explore, but you slowly decrease that percentage according to some kind of schedule, then you can prove that this strategy achieves logarithmic regret.
Benjamin Todd: The way the strategy would work is… suppose you have a process where 80% of the time I pull the lever that I think is best right now, and 20% of the time, which is the epsilon, I just pull a random lever?
Brian Christian: Yeah.
Benjamin Todd: And then you slowly decrease that percentage as you go on?
Brian Christian: Yeah, that’s right. So, let’s see, there may be specific technical results about the schedule you need in order to achieve that result. We can direct interested readers to the technical literature on that. The basic intuition is, yeah, every day you start with some chance, let’s say 20%, that you’re gonna try something random. But every day that chance goes down; let’s say it’s multiplied by 0.99 or something.
Benjamin Todd: Mm-hmm (affirmative)…
Brian Christian: Then, this is the kind of strategy that avoids the pitfalls of epsilon first because there’s always some chance—now, granted it will dwindle, of course, in the long run—but there’s always some chance that if you’ve made a mistake in identifying which is the best machine, you’re still leaving the door open a crack to getting new information that could change that. But, of course, you are sort of tapering that down, in some ways appropriately, as you’re gathering more and more information which makes it less and less likely that you have made a mistake.
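A minimal simulation sketch of epsilon-decreasing, assuming machines that pay 1 with hypothetical probabilities and an exploration chance of min(1, c/t) on pull t. We use a schedule that shrinks roughly like 1/t because that is the shape the standard logarithmic-regret analyses rely on. A purely multiplicative decay, like the 0.99 per day Brian floats as an illustration, adds up to only a bounded amount of exploration in total, so it can leave a mistaken choice uncorrected forever.

```python
import random


def epsilon_decreasing(probs, pulls, c=10.0, seed=0):
    """Simulate epsilon-decreasing play; returns how often each machine was pulled."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    totals = [0.0] * k
    for t in range(1, pulls + 1):
        epsilon = min(1.0, c / t)  # exploration chance shrinks roughly like 1/t
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: any machine, uniformly at random
        else:
            # Exploit: best average so far; untried machines count as best so each gets a look.
            arm = max(range(k),
                      key=lambda j: totals[j] / counts[j] if counts[j] else float("inf"))
        counts[arm] += 1
        if rng.random() < probs[arm]:
            totals[arm] += 1.0
    return counts
```

With probabilities `[0.8, 0.2]` over 5,000 pulls, nearly all pulls should go to the first machine; the constant `c` (10 here, an arbitrary choice) controls how much exploration happens early on.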
Robert Wiblin: So, how does that compare with upper confidence bound algorithms, which you spend a fair bit of time on in this chapter?
Brian Christian: So, one of the other strategies that’s very simple and intuitive, but also offers this property of logarithmic regret, is what’s called upper confidence bound. The basic idea here is that you compute a … what’s called a one-sided confidence interval for each of these machines. For people with a statistics background, you’re used to seeing the error bars above and below a quantity, you know, on a bar chart. What’s interesting about upper confidence bound is it says, we’re not actually interested in the expected value of the machine. And we’re not interested in the lower bound. We’re only interested in the upper bound on how good it could be. And so you just always play the machine with the highest upper bound. This is an idea which I think elegantly synthesizes exploration and exploitation, because something that you have less information about is going to naturally have wider bounds. As you learn more and more about it, those bounds are going to tighten. I think it’s a really beautiful way of synthesizing both the idea that we want to optimize for quality and the idea that we also want to optimize for information, bringing them together into a single idea.
Brian Christian: I think there’s also just something kind of poetic about the idea that it’s essentially the rational case for optimism: that you are only interested in the reasonable best-case scenario. In some ways, you almost don’t even care about what you expect will happen.
Robert Wiblin: Mm-hmm (affirmative)
Brian Christian: And I think there’s a principle there which I just kind of find encouraging. It’s one of these results that you feel, sort of, happy, knowing that that’s the case.
Benjamin Todd: And wait, so … yeah, if we zoom out a little bit, you can imagine you’re about to pull one of the levers. Your best guess is that every time you pull this lever you get $10.00, say. So, that’s the expected value of the lever. But then, you’re saying, now you want to think about the upper end of a confidence interval: maybe there’s a 10% chance that I’d actually get $15 from this lever, if it turns out to be better than my best guess. Then you want to do that for all the levers and go for the one where that 10% level is highest, rather than the one where your best guess is highest.
Brian Christian: Yeah, that’s right.
Benjamin Todd: And, I mean, I guess there’s a question in the literature about whether you should be using, like, a 10% confidence interval or 50%, or …
Brian Christian: Yeah, the paper where they, kind of, make the proof that this is regret minimizing uses what’s called the Chernoff-Hoeffding bound. So, I can give you the exact prescription, which is, you want to play the machine that maximizes your expected value plus the square root of two times the natural log of the total number of handles that have been pulled divided by the number of times you’ve pulled that handle.
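Written out as a formula, the prescription read out here (the rule usually called UCB1 in the literature) is to play the machine j that maximizes its average payout so far plus an exploration bonus, where x̄_j is machine j’s average payout so far, n is the total number of pulls across all machines, and n_j is the number of pulls of machine j:

```latex
j^{*} = \arg\max_{j} \left( \bar{x}_j + \sqrt{\frac{2 \ln n}{n_j}} \right)
```

The bonus shrinks as n_j grows and creeps up slowly, logarithmically, for machines you neglect, which is how the rule keeps revisiting uncertain options without ever abandoning them completely.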
Benjamin Todd: So, the number of times …
Robert Wiblin: I see why you did not put that in the core of the book.
Brian Christian: Yeah, some of this stuff ends up buried in endnotes for a reason.
Robert Wiblin: Yeah.
Benjamin Todd: This is the number of handles you pulled in the past so far?
Brian Christian: Yes. So, it’s the square root of a fraction: the top of the fraction is two times the natural log of the total number of pulls in the casino, and the bottom is the number of pulls of that specific machine.
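A minimal Python sketch of that rule on simulated machines that pay 1 or 0; the payout probabilities and the function name are hypothetical. Each machine is played once first so that every count in the denominator is nonzero, which is the usual way to start this algorithm.

```python
import math
import random


def ucb1(probs, pulls, seed=0):
    """Simulate the upper-confidence-bound rule on machines paying 1 with the
    given probabilities. Returns how many times each machine was pulled."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k    # pulls of each machine so far
    totals = [0.0] * k  # total payout from each machine so far
    for t in range(1, pulls + 1):
        if t <= k:
            arm = t - 1  # play each machine once so no count is zero
        else:
            n = t - 1    # total pulls made so far, across all machines
            # Average payout plus the square root of 2 ln(n) / (pulls of this machine).
            arm = max(range(k), key=lambda j: totals[j] / counts[j]
                      + math.sqrt(2 * math.log(n) / counts[j]))
        counts[arm] += 1
        if rng.random() < probs[arm]:
            totals[arm] += 1.0
    return counts
```

Over 5,000 pulls with `[0.8, 0.2]`, the weaker machine keeps getting occasional pulls because its bonus grows with the log of the total, but far fewer than the stronger one.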
Benjamin Todd: And that’s not the number of arms, it’s the number of past pulls you’ve done so far.
Brian Christian: That’s right.
Benjamin Todd: Okay. And so generally that means that the confidence interval is, you’re using a narrower one over time?
Brian Christian: Yeah. And so that’s gonna go down …
Benjamin Todd: Which is, again, the same characteristic we just covered.
Brian Christian: Yeah, exactly. I mean, that is the specific bound that they used for their proof, but I think the intuition is …
Robert Wiblin: Is pretty clear.
Brian Christian: Yeah, and it holds up if you use different statistical measures of upper confidence. I mean, I think it’s also just an intuitive idea: when you’re in a situation, what is a reasonable best-case scenario? You know, if I go out to dinner, the reasonable best-case scenario is not that my dinner companion gives me a million dollars, but it might be that they give me an idea for a book that I am going to write. So you know, I think some of these things cash out into more intuitive notions of what the upper confidence interval would be, but as an idea I think it’s pretty robust and suggestive across a much broader swathe.
Benjamin Todd: And like, thinking about careers again a little bit, the idea on that intuitive level would be to consider the career which might plausibly turn out to be best, rather than your best guess about which one is better?
Brian Christian: Yeah and I …
Benjamin Todd: So, if you’ve got two options that you roughly think are similar, but for one you can see this amazing scenario and the other doesn’t have that amazing scenario, then you should probably try out the amazing-scenario one first.
Brian Christian: Yes.
Benjamin Todd: And is the intuition behind this optimism heuristic that, if you do the optimistic thing, the thing that’s plausibly best, and it turns out not to work, then you can just switch to something else?
Brian Christian: That is exactly right.
Benjamin Todd: Then, if it turns out to work then you’ve made this amazing discovery. You’re now on this really good path and you can just carry on with that. So, like, it pays to be optimistic earlier because it might let you play out this amazing thing that you would’ve missed otherwise.
Brian Christian: Yeah, that’s exactly right. Yeah, your point that the costs are limited is, I think, an important subtext here. So, in the classic version of the multi-armed bandit problem, the machines either pay out some fixed amount or they pay out zero. So your losses are bounded. You know, if you’re in a world where the machine might explode and kill you and then you can’t continue gambling on anything, then you probably do want to consider the lower end of the confidence interval.
Brian Christian: So, it’s partly a function of the nature of the canonical multi-armed bandit problem that if you put a dollar in the machine that your losses are bounded at $1.00. It’s also one of the assumptions of the problem that you can just effortlessly walk over to the next machine. So, in some ways, the maximum cost for trying something and concluding that it was a waste of time is $1.00. In reality, of course, it may take you much more than a single metaphorical pull of the lever to determine that a career isn’t for you, or it may be more difficult to switch back to what you were doing before, after you’ve left an organization or something. There are versions of the multi-armed bandit problem that include what are called switching costs that add friction to these things. We can include some links if people want to go into that literature, too. That’s one of the variations that the book considered.
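To illustrate the point about bounded losses, here is a small hypothetical sketch that scores each option by either the top or the bottom of a UCB1-style confidence interval. With bounded losses you take the optimistic (upper) score; if a single bad pull could be catastrophic, looking at the pessimistic (lower) score instead flips the choice toward the option you already know. The lower-bound variant and the function name are our illustration, not a specific algorithm from the book.

```python
import math


def choose(arms, optimistic=True):
    """arms: list of (average payout so far, number of pulls so far).
    Returns the index of the arm with the best upper (or lower) bound."""
    total = sum(n for _, n in arms)

    def score(j):
        mean, n = arms[j]
        radius = math.sqrt(2 * math.log(total) / n)  # UCB1-style interval half-width
        return mean + radius if optimistic else mean - radius

    return max(range(len(arms)), key=score)


# Hypothetical: a well-tested option vs. one tried only twice.
arms = [(0.6, 100), (0.5, 2)]
print(choose(arms, optimistic=True))   # prints 1: the barely tried option
print(choose(arms, optimistic=False))  # prints 0: the well-tested option
```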
Benjamin Todd: Yeah, that would be very interesting. I mean, also, the issue of when the lower confidence bound might be important is maybe a bit separate. Because you say all you’re losing is the money you could’ve gained on that lever, but in a real career decision, you can actually lose more than you put in. So, you go into a job and you turn out to hate it. You get depressed and burned out. Then, you’re actually in a worse position than where you started, rather than just getting zero instead of some positive payoff.
Brian Christian: Yeah.
Robert Wiblin: You’ve gone backwards.
Benjamin Todd: Or maybe, if we’re thinking more about social impact, you can imagine that in some areas it’s easy to make things worse rather than better.
Robert Wiblin: Mm-hmm (affirmative) right.
Benjamin Todd: So, like, when trying to do policy change, it’s very easy to have unintended consequences. You might actually make the problem you’re trying to work on worse rather than better. So, that was one of our questions: how would you factor in the possibility of negative payouts, rather than just 0 or 1 payouts?
Brian Christian: Yeah, I think that’s an important question. So, I tend to make the reverse argument, at least when thinking about an individual employee trying to decide what career is best for them. One anecdote here is, a good friend of mine was an engineer at Google and he was trying to decide whether to leave Google and start a startup. His manager said, well, you know, you’re on this great trajectory, you’re making all this money, do you really wanna try something that will in all likelihood fail? Then where will you be? He said, well, come on, you and I both know that if I fail and I come back to you in 18 months’ time and want to rejoin your team, are you gonna say no out of spite? And the manager was forced to admit that, no, in fact, he would gladly take him back at his, you know … existing salary if not more, and so forth … I think that’s an example where people can get a little bit spooked and perhaps overrate the downside.
Brian Christian: So, if you can step away from a job for a year, crash and burn in the startup game, and then get right back in where you left off, I think that’s really an argument for being willing to take that risk. In fact, I counseled my friend, literally with the explore/exploit literature, and said, I think you should really pull that new lever. I think from an employee’s perspective it makes sense to be fairly optimistic. I think most people, in my experience, if anything, are not optimistic enough. I think from an organizational perspective, especially if you’re doing some kind of major intervention that could have some huge unintended consequence, you know, you go into some country and you give everyone free wheat, but then you destroy the local wheat economy or something like this, that’s certainly a case that probably doesn’t really represent the multi-armed bandit problem at that point. You’ve unintentionally imploded the casino, or something like this … You’re probably closer to something that’s an MDP, and that’s a whole other kettle of fish.
Benjamin Todd: What does that stand for?
Brian Christian: It’s a Markov Decision Process.
Benjamin Todd: Okay.
Brian Christian: So, an environment where the actions you take change the state that you find yourself in. So, one of the nice things about the abstraction of the canonical multi-armed bandit problem is that your actions don’t really do anything to the environment. Like, you’re … you get x money or not, but then you’re right back where you found yourself. In something like a Markov Decision Process, you know, you take some action and … if you think about an Atari game as a Markov Decision Process, you use some power-up and now you don’t have it. So, you’ve changed the set of options that you have or the state that you’re in. That’s just an even more complicated domain. So, I think identifying … this goes back to this question of trying to identify the situation that you find yourself in and asking yourself if this feels like a multi-armed bandit problem, then, let me kind of painlessly and cheerfully explore a bit because the downside is capped.
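A toy version of the power-up example shows what changes in an MDP: the action you take moves you to a different state with different options. The sketch below solves a hypothetical two-state MDP with value iteration, a standard dynamic-programming method for MDPs; the rewards (5 for using the power-up, 1 per turn otherwise) and the 0.9 discount are made up for illustration.

```python
def value_iteration(gamma=0.9, tol=1e-9):
    """Solve a hypothetical two-state MDP; returns the value of each state."""
    # state -> {action: (immediate reward, next state)}
    transitions = {
        "has_powerup": {"use": (5.0, "no_powerup"), "wait": (1.0, "has_powerup")},
        "no_powerup": {"wait": (1.0, "no_powerup")},
    }
    values = {s: 0.0 for s in transitions}
    while True:
        # Bellman update: best immediate reward plus discounted value of where it leads.
        new = {
            s: max(r + gamma * values[nxt] for r, nxt in actions.values())
            for s, actions in transitions.items()
        }
        if max(abs(new[s] - values[s]) for s in values) < tol:
            return new
        values = new
```

Here the power-up state is worth 14: cash it in for 5, then move to the no-power-up state, which is worth 10 (1 per turn, forever, discounted at 0.9) and so contributes 0.9 × 10 = 9. Nothing like that state change exists in the canonical bandit, where every pull leaves the casino exactly as it was.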
Benjamin Todd: Yeah, I mean, I totally agree when you’re thinking about normal career decisions. People maybe don’t appreciate that the downside is relatively capped and it’s okay to explore more than people often do. But I think it’s different when you start to think about some of these social impact issues. You could imagine often dealing with cases where there could be really good upsides or significant downsides, if you’re, like, pushing through a major policy change or something like that.
Brian Christian: Yeah.
Benjamin Todd: It’s more like a multi-armed bandit where the machine could actually force you to pay. You pull it and it’s like, minus 10, and now you owe the machine money.
Brian Christian: Yeah, exactly.
Benjamin Todd: Or it could be like plus 20 or something.
Brian Christian: Yeah, I mean, the other thing that’s worth unpacking here is that the other assumption in the multi-arm bandit problem is that you get the feedback immediately. It’s not like you pull the handle and ten years later, a check comes in the mail, it’s for 10 cents or whatever. This is something that may or may not be true in a lot of situations, right? So, one of the reasons that tech companies really like multi-armed bandit algorithms is, for something like ad optimization, you show an ad, the user clicks on it or not, and you’ve gotten that feedback immediately. So, you really can model that as a multi-arm bandit problem. The feedback is instantaneous. So, you can really adapt and adjust your ad probabilities basically in real time.
Brian Christian: Some of these ideas I mentioned earlier are also making their way into the medical literature. The first clinical trial to use what’s called an adaptive method, which basically means changing the percentage of people receiving the experimental treatment versus the conventional one in real time, rather than waiting until the end of the trial, was for something called ECMO. This was back in the 1980s. Infants were going into pulmonary arrest and their lungs were stopping. The conventional treatment was really bad; it had only worked, I think, something like 60% of the time. So, someone got this idea: we want to try this new, crazy experimental technique called ECMO. We think it could work considerably better. It could also be a total disaster. It has a risk of embolism and all of these things.
Brian Christian: One of the reasons that, just from a formal perspective, it made sense to use some of these multi-arm bandit algorithms, is if someone goes into pulmonary arrest, you either save their life within five minutes or they die within five minutes. You know, obviously it’s a tragic scenario to have to deal with, but in some sense there’s this mathematical silver lining, which is that, it makes it much easier to rapidly identify whether some new technique is better or worse than the status quo. You don’t have to administer it and then track these people longitudinally over the rest of their lives. So, this is another one of these parameters where you can sort of identify, am I getting immediate feedback? If so, then this is more like a multi-arm bandit problem. If I’m not, then I may want to adjust my strategy and not rely so heavily on that framework.
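Adaptive trials need some rule for shifting the allocation as results come in. One well-known rule is the randomized play-the-winner urn, sketched below with hypothetical success rates; it is offered to illustrate the general idea, not as a reconstruction of the actual ECMO study’s protocol. Each patient’s treatment is drawn from an urn; a success adds a ball for the treatment that worked, and a failure adds a ball for the other treatment.

```python
import random


def play_the_winner(success_rates, patients, seed=0):
    """Randomized play-the-winner urn for two treatments.
    Returns the number of patients assigned to each treatment."""
    rng = random.Random(seed)
    urn = [1, 1]       # one ball per treatment to start
    assigned = [0, 0]
    for _ in range(patients):
        # Draw a ball: each treatment is chosen in proportion to its balls in the urn.
        arm = 0 if rng.random() < urn[0] / (urn[0] + urn[1]) else 1
        assigned[arm] += 1
        if rng.random() < success_rates[arm]:
            urn[arm] += 1      # success: add a ball for the same treatment
        else:
            urn[1 - arm] += 1  # failure: add a ball for the other treatment
    return assigned
```

Because the urn tilts toward whichever treatment is succeeding, most patients end up on the better one, and quick, unambiguous feedback is exactly what makes this kind of updating practical.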
Benjamin Todd: Okay, so you’ve covered a couple of different complications. One is that you might sometimes have negative payouts. We’ve just covered another: you might get only imperfect or delayed feedback, and it might take several years to really figure out how a certain path unfolded. And you also mentioned earlier that you’ve got switching costs.
Brian Christian: Right.
Benjamin Todd: Where in practice and real life you can’t just switch between the arms, you have to like, do a whole job application process, which takes you months.
Brian Christian: Yeah.
Benjamin Todd: I mean, do you have any intuitions about how these might affect which strategies are best? I mean, my guess is it’s gonna generally mean you should do a bit less exploration because the costs of exploring are higher. You’re getting less information and you’re less able to use the information because you have to switch.
Brian Christian: Yeah, the intuitive answer is that’s exactly right. The higher the switching cost, the more reluctant you should be to abandon an option even if it seems like it’s not working, or to try something frivolously, because you’re gonna pay that switching cost twice: once to go in and another time to get out. Another assumption that is kind of underneath this whole conversation is that the quality of the options we’re evaluating is static. The restaurant doesn’t fire their chef and get a new guy who’s not as good. Or the company doesn’t lose its way or change its management or whatever.
Brian Christian: So, when the payout probabilities of these different machines can change, then you find yourself in what’s called the “restless bandit problem,” which is known to be intractable: there’s no efficient solution that’s gonna get you there all the time. For me, there’s an interesting footnote here, which is that people actually seem very good at dealing with restless bandit problems in practice. Here’s a case where the computer scientists are in fact turning to the cognitive scientists and saying, how are you guys modeling the human decision-making process? Because it seems that people have a really good heuristic for dealing with this, like, known intractable problem. We’d love to know what it is because that’ll give us some insights that we can use in a purely computational context.
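Here is a deliberately simplified, hypothetical back-of-the-envelope rule for the switching-cost point: trying a new option is worth it only if the expected gain over the remaining time beats the cost of switching in, plus the chance you’ll have to pay again to switch back out. It ignores everything else (learning during the trial, the time the trial itself takes) and is meant only to show how higher costs and shorter horizons tip the balance toward staying put.

```python
def worth_trying(p_better: float, gain_per_period: float,
                 periods_left: int, switch_cost: float) -> bool:
    """Hypothetical one-shot test: is trying a new option worth the switching costs?"""
    expected_gain = p_better * gain_per_period * periods_left
    # Pay once to switch in; if it disappoints (probability 1 - p_better), pay again to get out.
    expected_cost = switch_cost + (1 - p_better) * switch_cost
    return expected_gain > expected_cost
```

With a 30% chance the new option is better by 1 per period and 10 periods left, a switching cost of 1 makes the trial worthwhile (expected gain 3 against expected cost 1.7), while a cost of 2 does not (3 against 3.4). With only 2 periods left, even a cost of 1 is too much.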
Benjamin Todd: So, if I remember from the book, when you give people multi-armed bandit problems in a lab, they actually explore too much?
Brian Christian: Yes.
Robert Wiblin: Mm-hmm (affirmative)
Benjamin Todd: Which I found very surprising, because normally the general theme in this kind of literature seems to be that people don’t really explore enough: they stick with the status quo, they fall for the sunk cost fallacy … but actually here, they should’ve just carried on pulling the best-guess lever, and they kept switching.
Robert Wiblin: I’ve got major objections to that experiment.
Brian Christian: Oh, really? Okay, great.
Robert Wiblin: Yeah, but maybe set it up first.
Brian Christian: Okay, so yeah, let me tee it up.
Brian Christian: So, one of the canonical experiments in this area was done by Amos Tversky in the 1960s. The basic idea is that you have a box with two different lights on it. On each trial you can either press a button to observe which of these two lights comes on, or make a bet on which of the two you think is going to turn on, but then you don’t get to observe it. So, you don’t know until the end of the study whether your bets paid off or not. And I think one of these lights lit up 60% of the time and the other 40% of the time. I believe participants were told that, but I’m not 100% sure about that. So the question is, again, how do you maximize your total take, your total earnings, over, I believe in this case, a thousand trials? It turns out that the optimal strategy is to observe the first 38 times, and then blindly make a series of 962 bets on whichever light happened to come on more in those first 38, and then you’re done.
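Under a simplified model of this setup, you can compute the observe-then-bet trade-off directly. Assume (our assumption) that subjects know one light comes on 60% of the time and the other 40%, but not which is which; that watching earns nothing; and that they watch a fixed number of trials, then bet every remaining trial on whichever light came on more often while they watched. Each bet is right 60% of the time if they picked the right light and 40% otherwise.

```python
from math import comb


def expected_winnings(observe: int, trials: int = 1000, p_hi: float = 0.6) -> float:
    """Expected number of winning bets if you watch the first `observe` trials,
    then bet every remaining trial on the light that came on more often."""
    p_lo = 1 - p_hi
    # Chance that the truly more frequent light also led during the watching phase.
    p_right = 0.0
    for k in range(observe + 1):
        prob = comb(observe, k) * p_hi**k * p_lo ** (observe - k)
        if 2 * k > observe:
            p_right += prob
        elif 2 * k == observe:
            p_right += 0.5 * prob  # a tie: pick either light with equal chance
    per_bet = p_right * p_hi + (1 - p_right) * p_lo
    return (trials - observe) * per_bet


best = max(range(201), key=expected_winnings)
print(best, round(expected_winnings(best), 1))
```

In this model the best fixed number of observations comes out at a few dozen, far below the roughly 500 that subjects actually used; the exact optimum depends on modeling details, such as whether subjects may decide adaptively when to stop watching.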
Brian Christian: Is that what human subjects did? No, not even close. People would observe for a while, bet for a while. Observe a little more again and then bet a little bit more again. I wanna say, on average, people observed 500 times instead …
Robert Wiblin: 505.
Brian Christian: 505?
Robert Wiblin: 505 out of 1,000. Yeah.
Brian Christian: Yeah. And so this is a case … I mean, my read on this is that the participants were told that these probabilities were fixed, but to be a bit more sympathetic or charitable towards the subjects: they knew they were in a psychology experiment, and there’s a long history of experimenters lying to subjects in psychology studies. So, they didn’t necessarily want to take the experimenters’ word for it. So, they were effectively acting as if they were in a restless bandit problem where, let’s say, the payoff probabilities are on a random walk; they can go up and down.
Benjamin Todd: That’s what I was just thinking. Maybe people explore more because they think the probabilities might be changing.
Brian Christian: Yeah, so … that’s one way to model the data: people were establishing a certain level of confidence that enabled them to switch into this betting mode, but as time went by, their uncertainty started to grow. Once it hit a certain threshold, they gathered a bit more information.
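One way to make this "confidence threshold" reading concrete is the sketch below. Everything in it is a hypothetical modelling choice of ours, not Tversky's analysis: the subject tracks a single variance for their belief about a light's payout rate, that variance grows by a fixed amount every trial because they suspect the rates drift, and each observation shrinks it with a Gaussian (Kalman-filter-style) update. The thresholds and parameter values are invented for illustration.

```python
def observe_or_bet(trials, drift_var=0.0005, obs_var=0.25,
                   high=0.05, low=0.02, start_var=0.25):
    """Hypothetical model: observe until belief variance drops below `low`,
    then bet until it rises above `high`, and repeat.
    obs_var = 0.25 is the largest possible variance of a single 0/1 outcome."""
    var, observing, schedule = start_var, True, []
    for _ in range(trials):
        if observing and var < low:
            observing = False  # confident enough: switch to betting
        elif not observing and var > high:
            observing = True   # uncertainty has crept back: look again
        schedule.append("observe" if observing else "bet")
        if observing:
            # Precision-weighted update: one observation adds 1 / obs_var of precision.
            var = 1.0 / (1.0 / var + 1.0 / obs_var)
        var += drift_var  # assumed drift adds uncertainty on every trial
    return schedule

schedule = observe_or_bet(1000)
print(schedule.count("observe"), "observations out of", len(schedule))
```

With these made-up parameters the schedule alternates between short bursts of observing and longer runs of betting, qualitatively like the subjects described; it is not fitted to any data.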
Benjamin Todd: And it makes sense, because real life is a restless bandit problem rather than a standard multi-armed bandit problem. Especially in careers, the landscape is always changing. So maybe our intuitions have evolved to deal with that one rather than the more artificial situation where everything’s stable.
Brian Christian: Yeah.
Robert Wiblin: So, this experiment is terrible. That’s one of the issues, I think. Yeah, they said it’s stable, but even if the subjects believed them, all their intuitions about how much to explore and how much to exploit are based on a life where things are changing all the time. So it’s impossible for that to get through to their intuitions, even if on an explicit level they kind of believe it. But that does explain why they alternate between exploring and exploiting rather than doing all explore and then all exploit.
Robert Wiblin: But the much more severe issue with this experiment is that while you’re exploring you don’t get any benefit. You can’t draw any benefit from the levers that you’re pulling, which seems very artificial, not like a typical case at all. If you were able to get the payoff while exploring, which would be 40% or 60% depending on the lever, that reduces the cost of exploration so much that it wouldn’t surprise me if exploring for something like 500 out of a thousand trials were kind of reasonable. Again, people’s intuitions are all going to be about cases where you derive benefit while you’re exploring.
Brian Christian: Yeah.
Robert Wiblin: Then another issue is that they made the differences between the levers really small, so it’s 40% versus 60%. In real life, things are often more varied than that. So again, people’s intuitions are more in favor of exploration because the difference in the odds is so limited.
Brian Christian: Right.
Robert Wiblin: Oh, and another objection is that …
Brian Christian: Keep going, keep going …
Robert Wiblin: Is that people maybe just enjoyed the novelty of trying the levers and exploring, rather than pulling one and not seeing any response at all. Because in the exploit phase you don’t find out whether you’re benefiting. When does that ever happen? It’s all so artificial. You’re exploiting and you don’t even find out whether you’re winning? That’s very weird. I think the deck was completely stacked to produce this surprising result, and I don’t know that we can learn anything about the real world from this difference between the theoretical optimum and what people do.
Brian Christian: Yeah, I think that’s all right. I mean, you know, imagine a stock market in which as soon as you buy a stock you cease to know the value of that stock. I mean, it’s just very strange.
Robert Wiblin: Right, yeah.
Brian Christian: It’s … I would be hard pressed to even think of an analogy. I think also, the value of exploration, even in this context, carries over beyond the walls of the game itself. If you were expecting that you might be asked to do a different version with a different box, then any understanding that you gained in the first condition might be useful in subsequent conditions. So, I mean, in general, I think there’s a lot of evidence that humans and animals are designed to get pleasure from learning how things work. So, it makes sense that part of what you want to do is be like, okay, I’m in this new environment, this new contraption. I don’t really know how long I’m going to be in this situation or how many other similar situations I’ll be in, so let me just try and figure out what’s the deal. That seems totally reasonable.
Robert Wiblin: So, the ideal strategy is that the person should sit there and press a button without getting any feedback 962 times in a row. That sounds very boring and I think maybe…
Brian Christian: Yeah.
Robert Wiblin: Would you actually do it? You’re literally just pressing a button. Yeah, I dunno. So, this is an objection that people have to the biases literature more broadly: they set up these incredibly artificial scenarios where the deck is stacked so that people’s intuitions come out looking bad, and then they’re like, oh, people don’t do the theoretical optimum in this stupid game that I created to engineer that result. It’s not always like that; you could take that criticism too far.
Robert Wiblin: But as many listeners will know, there’s this whole other school focused on heuristics. So there’s the biases school and the heuristics school. The heuristics people say that actually people are incredibly good at answering these very complex questions in a good-enough way, and that the people who focus on how we’re biased are picking up edge cases, particularly unusual cases where the heuristics people are using don’t work. But those are the odd cases rather than the rule.
Brian Christian: I think that’s right and I would just add, too, that there’s this second argument that I think resonates with the heuristics school. So, there’s one argument you could make which is just yeah, evolution kind of tuned our parameters for a certain type of environment. Surprise, surprise, when we’re in a totally different environment we don’t do the right thing. So, yeah, that makes sense to me. In general, computer science has a lot of what’re called no-free-lunch theorems that basically say, if you optimize for a given environment, you will necessarily be worse on other environments that aren’t like that. There’s often no way to improve uniformly across all environments.
Brian Christian: There’s a separate argument, I think, that also goes in the same direction, which is simply, you are paying a cost to think; you’re paying a cost to deliberate, to hesitate. So, part of what we’re trying to do at the broadest level in this book is paint a more recognizable picture of rationality that takes computational constraints into account and says, once you start to think about information processing itself as a cost, you end up with a notion of optimality that does look a lot closer to some of these ideas that come out in a heuristic context.
Brian Christian: There’s this concept in experiment design called information leakage, which is basically, the subjects gleaned more than we strictly told them. It’s very difficult to actually grapple with that. We interviewed one researcher who studies optimal stopping problems in human subjects and he says, “Yeah, it turns out that our subjects were just getting bored. It’s not irrational to get bored, but it’s hard to model that rigorously.” I think, you know, in general, when there’s a conflict between our models of what decision making should be and what people actually do, we have a choice. We can say, oh, people are stupid or irrational or that they have these heuristics that are tuned for a different environment. Or, we can say we have the wrong model, we have an incorrect formal description of what it is that these people are doing or the problem that they think they’re solving.
Robert Wiblin: So, if I could just recap the explore/exploit section. We talked about the Gittins index, then epsilon-decreasing strategies, and then kind of a variant on that, the upper confidence bound algorithms, which are kind of appealing because they seem easier to apply in everyday life: you think about what would be a very good case here, not the very best imaginable but a very good case, and then always go for the option with the highest very-good case. At least early on in your life; maybe later in life not so much, when you need to be more realistic. Then there are various issues that arise when thinking about whether these models are a good description of real life. So we’ve got the discount rate.
Robert Wiblin: I’m not too bothered by that, because I think at least early in life people should probably just have a geometric discount rate, and then they would have to choose what that discount rate is. So that’s one issue: the discount rate. Then you’ve got the question of not really knowing how many pulls of the lever you’re going to get in your life. How long do you have to spend at a job before it counts as a pull of the lever, and how do you measure the outcome? So there’s a bit of arbitrariness there.
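The geometric discounting mentioned here has a useful reading for the "how many pulls do I get" question. A sketch in our own notation, where r_t is the payoff on pull t and γ is the discount factor:

```latex
V = \sum_{t=0}^{\infty} \gamma^{t} r_{t}, \qquad 0 < \gamma < 1,
\qquad
\mathbb{E}[\text{number of pulls}] = \sum_{t=0}^{\infty} \gamma^{t} = \frac{1}{1-\gamma}
```

If each pull independently has probability 1 − γ of being your last, you reach pull t with probability γ^t, so your expected total payoff is exactly V. A discount factor of γ = 0.9 corresponds to an expected ten pulls, and γ = 0.95 to an expected twenty.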
Robert Wiblin: Then you’ve got some more severe issues: nothing changes in these environments, and we don’t have simple algorithms once things start changing, which is how the world is; and also you’re not changing the environment at all, which in some cases would be quite important in real life. You’ve also potentially got switching costs, since changing jobs is difficult, although it sounds like you can modify the algorithms to account for switching costs, but we’ll have to look those up …
Benjamin Todd: Explore a little bit less. Yes.
Robert Wiblin: Yeah. Explore a little bit less is the rule of thumb there. Then it seems like some of these describe cases where you pull a lever and you either get one or zero, or I guess one or minus one, because you have to pay something to pull a lever.
Brian Christian: Yeah, this is called the Bernoulli bandit, by the way, if people are interested.
Robert Wiblin: Yeah.
Brian Christian: You either get zero or one. Yeah.
Robert Wiblin: But in real life, as we talked about, there are downsides as well as upsides, though that doesn’t seem so severe, because we can just shift the distribution. Instead of the outcomes being one or zero, you could imagine a normal distribution of outcomes, or perhaps a log-normal distribution, which is much more spread out. Or it could be power-law distributed, so massively different outcomes depending on the option you choose. And all of those mean there’s more variance in the outcome, so you have to explore more. There’s also more risk of settling early on an option and missing the one that in fact has the highest expected value, because you didn’t sample enough to pick up the best-case upper tail, or potentially the terrible-case lower tail.
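A back-of-the-envelope version of "more variance means more exploring", assuming (our notation and assumptions) two options whose outcomes share the same standard deviation σ, a true difference in means Δ, n samples of each, and sample means that are roughly normal; z is how many standard errors of separation you want (for example z ≈ 2):

```latex
\mathrm{SE}(\bar{x}_1 - \bar{x}_2) = \sigma\sqrt{\frac{2}{n}},
\qquad
\Delta \ge z\,\sigma\sqrt{\frac{2}{n}}
\;\Longleftrightarrow\;
n \ge \frac{2 z^{2} \sigma^{2}}{\Delta^{2}}
```

Doubling the spread σ quadruples the number of samples needed to tell the options apart. For heavy power laws whose variance is infinite, this normal approximation breaks down entirely, which makes early commitment riskier still.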
Brian Christian: Right.
Robert Wiblin: So … and then another one is that all of these kind of model you as coming in with no information, so perhaps you just have a uniform prior belief about the possible outcomes of the levers. Whereas in real life, almost everyone listening to this is going to be at least sixteen, and they have some model of the world, of the plausible distribution of outcomes of the different actions they can take.
Benjamin Todd: Yeah, but you’re factoring that in: say, with the upper confidence bound one, you’re using everything you know at that point to make your best guess at what the upper confidence bound is.
Robert Wiblin: Yeah. I agree that in principle it’s incorporated here, but we haven’t really talked about what things would look like if you’re, say, seven hundred draws into an eight-thousand-draw game. Most of the tables describe situations like after three pulls, when you’ve got, say, two wins and one loss, whereas in real life it seems like we have much richer information than that; we’re very rarely coming in blind. So it might be better to model it as a Bayesian problem, where you have a prior and you update based on each pull, which I think may well end up resembling the solutions in here anyway.
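For win/lose outcomes, the Bayesian version has a particularly simple form: a Beta prior on an option's payout rate stays a Beta distribution after every pull. A minimal sketch; the informed prior below (worth 20 imaginary pulls, centred on a 30% rate) is a hypothetical example, not a recommendation.

```python
def update_beta(a, b, wins, losses):
    """Conjugate update: a Beta(a, b) prior on a 0/1 payout rate plus
    observed wins and losses gives a Beta(a + wins, b + losses) posterior."""
    return a + wins, b + losses

def beta_mean(a, b):
    """Posterior mean payout rate of a Beta(a, b) belief."""
    return a / (a + b)

# Coming in blind: uniform Beta(1, 1) prior, then two wins and one loss.
blind = beta_mean(*update_beta(1, 1, wins=2, losses=1))  # 3/5 = 0.6

# Hypothetical informed prior: 20 pseudo-pulls centred on a 30% rate.
informed = beta_mean(*update_beta(6, 14, wins=2, losses=1))  # 8/23, about 0.348
```

The same two wins and one loss move a blank-slate estimate all the way to 0.6, but barely shift someone who already has a decent model of the world.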
Brian Christian: Yeah, I mean of course, yeah in real life you’re … You’re making judgements not only about the machine you’re pulling, but also about the nature of the game itself that you’re playing, so if slot machine A pays out much less well than you thought it would, you might start to extrapolate and be like, “oh, maybe slot machines just aren’t as good of an investment as I thought they were.” And you see this in people who are very superstitious about gambling, where they’re sort of promiscuous in what they attach their success and failure to. They get some payout, and then they update their priors on the value of wearing their lucky baseball cap, but also the value of it being 12:04 PM with the sun at this angle, and being at this particular machine. And so, yeah, I think all of these things, needless to say, point towards the enormous complexity of what real-world decision making actually looks like in most cases.
Benjamin Todd: Just going back to the restless bandit problem, which is where the payoffs of the different arms are changing. Could I just recap? You were saying that actually people are almost better at doing that than these simple algorithms we’ve developed? Or … ?
Brian Christian: Yeah. I mean, the restless bandit problem is what’s called intractable, which means that there is no efficient solution to the problem (“efficient” has a technical definition, which we can look into if you want). But people seem, in a way, untroubled by the daunting formal complexity of the problem, and they just do stuff. And the stuff they do seems to work. And this, I think, has created a certain amount of interest in the computer science community in trying to figure out “how can we characterize, you know, a computational model of what people are actually doing, and is there a rigorous way to analyze just how good their instincts actually are? Can that lead us to, ideally, some sort of algorithmic breakthroughs that we can then use in practice?”
Benjamin Todd: But are there any rules of thumb about how we can modify some of the algorithms we’ve seen earlier that would still get you a better-than-random payoff in restless bandit problems? I mean, it sounds like one thing, again, is that you should be a little bit more keen to explore.
Brian Christian: That’s certainly true. If you think about this in the limit of a completely random environment, then you might as well just pull the handles at random …
Benjamin Todd: Yeah
Brian Christian: … if the payouts are just jumping all over the place. And so, yeah, in general it is true that the more volatile the environment is, the more restless you should be yourself: not settling for something, and not continuing to act on stale information. I might have to check the literature on this, but I would imagine that the Win-Stay, Lose-Shift principle is still reasonably better than chance even in the restless condition, because if something paid out, it’s a reasonable assumption that you should pull it at least one more time. So there are, I think, very basic heuristics that hold, but in general it is true that the more restless the environment, the more restless you should be too.
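Win-Stay, Lose-Shift is easy to state in code. Below is a minimal sketch together with a hypothetical restless environment (payout rates doing a clipped Gaussian random walk, with a step size chosen arbitrarily), so the heuristic can be compared against, say, picking arms at random. The sketch only sets up that comparison; it does not by itself establish how much better than chance the heuristic does.

```python
import random

def win_stay_lose_shift(pull, n_arms, n_rounds, rng):
    """After a win, pull the same arm again; after a loss, switch to a
    different arm chosen uniformly at random. Needs n_arms >= 2."""
    arm = rng.randrange(n_arms)
    wins, played = 0, []
    for _ in range(n_rounds):
        played.append(arm)
        reward = pull(arm)
        wins += reward
        if not reward:
            arm = rng.choice([a for a in range(n_arms) if a != arm])
    return wins, played

def make_restless_arms(n_arms, step, rng):
    """Hypothetical restless bandit: after every pull, each arm's payout
    probability takes a Gaussian random-walk step, clipped to [0, 1]."""
    probs = [rng.random() for _ in range(n_arms)]

    def pull(arm):
        reward = 1 if rng.random() < probs[arm] else 0
        for i in range(n_arms):
            probs[i] = min(1.0, max(0.0, probs[i] + rng.gauss(0.0, step)))
        return reward

    return pull

rng = random.Random(0)
wins, _ = win_stay_lose_shift(make_restless_arms(3, 0.02, rng), 3, 1000, rng)
print(wins, "wins out of 1000")
```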
Robert Wiblin: You’re going to end up basically discounting old information. Old pulls get weighted less in measuring the fraction of times an option succeeded, so you get some kind of moving average. But, yeah, I guess for some reason that ends up being computationally intractable.
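The discounted estimate itself is cheap to compute; the intractability result concerns finding an optimal policy for a restless bandit, not keeping a running estimate like this one. A minimal sketch of an exponentially weighted success rate, where the decay factor is a free parameter you would have to choose:

```python
def discounted_success_rate(outcomes, decay):
    """Success rate over a history of 0/1 outcomes (oldest first) in which an
    outcome that is k pulls old gets weight decay**k. decay = 1.0 gives the
    plain average; smaller values forget old information faster."""
    num = den = 0.0
    weight = 1.0
    for outcome in reversed(outcomes):  # newest outcome first
        num += weight * outcome
        den += weight
        weight *= decay
    return num / den if den else 0.0

history = [1, 1, 0, 0]  # early wins, recent losses
print(discounted_success_rate(history, 1.0))  # 0.5
print(discounted_success_rate(history, 0.5))  # 0.2
```

With heavy discounting, the two recent losses dominate the two older wins, which is exactly the "don't act on stale information" behavior described above.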
Brian Christian: Yeah. And I mean you also can consider like, do you know going in how restless the environment is? Or are you building your model of the noise in the environment based on your experience? Which is obviously even more complex.
Benjamin Todd: Maybe as a way of kind of summing up the discussion as well I’d be interested to talk more about trying to get very concrete about specific career decisions.
Brian Christian: Yeah.
Benjamin Todd: And you know, one way of thinking about it is that you have all your different career options open to you, and each career step takes, like, one to three years, which is kind of like a job …
Brian Christian: Yeah.
Benjamin Todd: And then you have a forty-year career. So you’ve got, you know, ten or twenty pulls of the lever. And then the question is, which one should you go for? I was just wondering if you wanted to say how we might attack that based on some of the models we’ve covered?
Brian Christian: Yeah. I think, I mean it’s also, I can’t help thinking about this in the context of my own career. You know, I don’t know how illustrative that is, or how useful that is to listeners, you know from just this anecdotal perspective of how I became a writer, but I was very conscious of the idea that I would take a crack at writing as a profession and find out fairly quickly whether I would succeed or fail. And then just do something else.
Brian Christian: So, in my case, having a computer science background, it was … Well, I can always just roll up to some, you know, big corporation and get some job, and so I didn’t have to worry about becoming destitute if I failed in my writing ambition. And so, speaking personally, I felt very emboldened by that to do something very risky and try to write a book proposal and so forth.
Benjamin Todd: That sounds a little bit like the upper confidence bound approach, where you’re like, being a writer would be a real dream job for me; I’m not really sure if I could make it work, but it’s worth giving it a go, and I can always switch back to the normal job path afterwards.
Brian Christian: Yeah. And I remember having a conversation with my undergraduate writing mentor, and I was talking to him about should I go into graduate school and so forth, and his advice to me was “I highly recommend that you only go to graduate school if you can go to a program that’s funded, because part of what you are trying to do is make a life as a writer, and if you graduate from even a really good program with, you know, fifty-thousand dollars or a hundred-thousand dollars of debt, then that is going to rapidly put pressure on you to either immediately professionalize as a writer or immediately abandon that path because you’ve got to make your loan payments.”
Brian Christian: And I thought that was really astute advice, and it’s not the kind of advice that fits on the axis of, you know, “go for your dreams, or not.” I thought it was very pragmatic, and had this eye to the option value of being in a position to make slightly risky moves as an adult. So that’s something that I think people can think about when making those really early decisions about whether to get, for example, a law degree or a medical degree. Those degrees are so expensive that it is very hard to do anything other than law or medicine, in part because you need to pay off your law or medical training, and those are generally lucrative ways to do that.
Benjamin Todd: That’s a good example of a bandit arm that kind of has a negative payoff, because if you go and try to qualify as a lawyer and then realize you hate it, you’ve actually invested a ton of money, so you’re worse off than when you started.
Brian Christian: Yeah. Yeah. And so, I mean, I think you guys and the 80,000 Hours community have surely thought about this more explicitly than I have, but I think, on the whole, people probably spend less time testing those waters than they should, particularly because these paths come with big switching costs.
Benjamin Todd: Well, yeah, so the advice in our career guide that’s currently up is, kind of, if you are pretty confident that a path seems best, then probably figure out how to go for that (obviously have a backup plan, but go for your main-line option). But if you’re uncertain, which many people are, then we encourage people to make a plan to try out several things over a few years. One way you can do that is that before graduate school you often have a couple of years where you can do something a bit different, and then you can go to graduate school, and …
Brian Christian: Mm-hmm (affirmative).
Benjamin Todd: That’s a way of ordering things that lets you try out some things. But then when I actually read the book and thought about upper confidence bounds, I wondered if the advice of “go and try out a couple of things” is not actually quite the right advice. Instead, you should think “which things seem plausibly best,” and then just do that straight away. [Laughter] And switch later.
Benjamin Todd: I mean, that’s obviously ignoring many of the complications we’ve covered, but it made me pause for thought that maybe the advice should be along the lines of “do the plausibly best thing, rather than plan to try out lots of things.”
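A minimal sketch of the upper confidence bound idea discussed here (go for whichever option has the best plausible "very good case"), using the textbook UCB1 bonus √(2 ln t / n_a) for payoffs between 0 and 1, where t is the number of pulls so far and n_a the pulls of arm a. The three arms and their payout rates in the usage lines are hypothetical.

```python
import math
import random

def ucb1(pull, n_arms, n_rounds):
    """Play each arm once, then always play the arm whose optimistic estimate
    (average reward plus an exploration bonus) is highest."""
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1  # try every arm once first
        else:
            plays_so_far = t - 1
            # The bonus shrinks as an arm is sampled more often, so rarely
            # tried options keep a generous "very good case".
            arm = max(range(n_arms),
                      key=lambda a: totals[a] / counts[a]
                      + math.sqrt(2 * math.log(plays_so_far) / counts[a]))
        reward = pull(arm)
        counts[arm] += 1
        totals[arm] += reward
    return counts

# Usage with hypothetical Bernoulli arms (each pull pays 0 or 1).
rng = random.Random(0)
probs = [0.3, 0.5, 0.7]
counts = ucb1(lambda a: 1.0 if rng.random() < probs[a] else 0.0, len(probs), 5000)
print(counts)
```

Note that nothing is explored at random: every pull goes to the currently most optimistic option, which is close in spirit to "do the plausibly best thing, and switch later".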
Brian Christian: Yeah. Well, in a way, you’re describing the tension between epsilon-decreasing and upper confidence bound …
Benjamin Todd: Yeah.
Brian Christian: … and you can be buoyed by the fact that they both offer you, you know, asymptotically logarithmic regret. [Laughter]
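For reference, regret is usually defined as below (our notation: μ* is the mean payoff of the best arm, r_t the payoff on pull t, and T the number of pulls):

```latex
R(T) = T\mu^{*} - \mathbb{E}\!\left[\sum_{t=1}^{T} r_{t}\right],
\qquad
R(T) = O(\log T)
\;\Longrightarrow\;
\frac{R(T)}{T} \to 0
```

Logarithmic regret means the total shortfall against always playing the best arm grows only like log T, so the average shortfall per pull shrinks toward zero; this order of growth is known to be the best achievable in general. UCB1 attains it, and epsilon-decreasing does too when its exploration schedule is suitably tuned.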
Brian Christian: They’re both part of …
Robert Wiblin: You can sleep soundly in your bed knowing that.
Brian Christian: Yeah. Yeah, actually, I’ll mention a third algorithm in that same family which I think is intuitive, called Thompson sampling, and that is “do something with the percentage of your energy or time or money that is the likelihood of it being the best thing.” So if you’re ninety-nine percent sure that you want to be a doctor, then spend ninety-nine percent of your time being a doctor. If you’re fifty percent sure, then spend half your time. And I think it’s just this wonderfully intuitive idea. And it fits perfectly within a Bayesian framework.
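A minimal sketch of Thompson sampling for win/lose options, assuming a uniform Beta(1, 1) prior on each option's payout rate; the payout rates in the usage lines are hypothetical. Drawing one plausible rate per option and taking the best draw picks each option with exactly its current probability of being the best, which is the "fifty percent sure, half your time" rule applied one pull at a time.

```python
import random

def thompson_choose(wins, losses, rng):
    """Draw one plausible payout rate per arm from its Beta(1 + wins, 1 + losses)
    posterior and play the arm with the highest draw. This picks each arm with
    exactly its posterior probability of being the best arm."""
    draws = [rng.betavariate(1 + w, 1 + l) for w, l in zip(wins, losses)]
    return max(range(len(draws)), key=draws.__getitem__)

def run_thompson(probs, n_rounds, seed=0):
    """Simulate Thompson sampling on hypothetical Bernoulli arms."""
    rng = random.Random(seed)
    wins = [0] * len(probs)
    losses = [0] * len(probs)
    for _ in range(n_rounds):
        arm = thompson_choose(wins, losses, rng)
        if rng.random() < probs[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return wins, losses

wins, losses = run_thompson([0.2, 0.8], 2000)
print([w + l for w, l in zip(wins, losses)])  # pulls per arm
```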
Benjamin Todd: Fifty percent of your time over what time horizon? Like … ?
Brian Christian: Well, again, this is in the multi-armed bandit problem, so it’s just your next pull.
Benjamin Todd: Okay. Yeah. Okay.
Brian Christian: With probability point-five, you pull that, and then you get feedback and then you re-evaluate. So it’s a little bit different in a slower-feedback setting. It does seem worthwhile, though, in the context of advising someone who’s really young, to think about information gathering for its own sake. So say you’re someone who’s twenty-four, you’re in your first job out of college, and you really like it. In some ways there’s this argument, at least from epsilon-decreasing, that says, “I don’t care how good it is; try something else anyway.”
Brian Christian: You’re at that period of time where that’s what you need to do, is just try stuff.
Benjamin Todd: Yes. That’s why I wanted to zoom back: if you’ve got, say, ten or twenty job pulls over your career, the epsilon-decreasing advice is like, for my next career step, I should basically roll a ten-sided die, and say twenty percent of the time I should go and do some random other option, and otherwise I should carry on with the thing that I think is best. You sometimes see people following advice like that, where they’re like, “Well, I’ve been in this thing for a while; I’m not really sure it’s doing something for me, so I’m just going to go and do this pretty unusual, different thing and see where it takes me.” But on the other hand, it feels like very counter-intuitive advice to just do a randomly chosen different job some fraction of the time.
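The rule Ben describes is epsilon-decreasing. Below is a minimal sketch; the schedule ε_t = min(1, c/t), the value of c, and the arm payout rates are illustrative assumptions, not a recommendation for how often to switch jobs.

```python
import random

def epsilon_decreasing(pull, n_arms, n_rounds, c=10.0, seed=0):
    """At step t, explore a uniformly random arm with probability
    epsilon_t = min(1, c / t); otherwise exploit the best average so far.
    The c / t schedule is an illustrative assumption."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    for t in range(1, n_rounds + 1):
        if 0 in counts:
            arm = counts.index(0)  # make sure every arm has been tried once
        elif rng.random() < min(1.0, c / t):
            arm = rng.randrange(n_arms)  # explore: the roll of the die
        else:
            arm = max(range(n_arms), key=lambda a: totals[a] / counts[a])  # exploit
        reward = pull(arm)
        counts[arm] += 1
        totals[arm] += reward
    return counts

env = random.Random(1)
probs = [0.2, 0.8]  # hypothetical arms
counts = epsilon_decreasing(lambda a: 1.0 if env.random() < probs[a] else 0.0, 2, 2000)
print(counts)
```

Unlike Thompson sampling or UCB1, the exploration here is blind: when the die says explore, any option is equally likely, including ones already known to be poor.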
Robert Wiblin: I think a case where that doesn’t work too well are industries where it’s kind of winner-takes-all. So in order to get anything you have to be the best. And I guess writing is actually a little bit like this.
Brian Christian: Yeah.
Robert Wiblin: Music, academia to some extent, if you’re doing your Ph.D. In those cases, exploring too much, trying out lots of things, basically means conceding that you’re not going to become the best musician, because if you only spend a third of your time doing music, the competition’s so harsh …
Brian Christian: Mm-hmm (affirmative).
Robert Wiblin: … And people only want the best, whereas there’s other cases where exploration works okay because you just get kind of linear returns to like being better.
Brian Christian: That’s right. So this is sort of a case where the payout on the machine grows with the number of times you’ve pulled that handle. If you pull the playing-the-violin handle, the first time you just get, you know, nothing. But the ten-thousandth time … you get an angry phone call from your neighbors.
Brian Christian: Yeah, that’s yet another way in which the multi-armed bandit framework is an imperfect lens for thinking about some of these things. And in general, not to preach too much, but I think younger people don’t quite appreciate the degree to which career paths put you by default on a trajectory where doing more of that thing becomes increasingly attractive, and doing other things less so. Most corporate jobs are structured this way, very cannily so, I think.
Robert Wiblin: They put you in “golden handcuffs”, I think is the expression.
Brian Christian: Exactly.
Robert Wiblin: Or you’re always waiting another six months to get the bonus from the previous year.
Brian Christian: Yeah. I mean, even in the way that jobs are structured. This is just an anecdotal example, but being an author, you have this funny kind of life rhythm where, you know, a draft will come back from your editor or your proofreader or something like this and you’ll have to work fourteen hours a day, seven days a week for two weeks, and then you hand it back in and then you have nothing to do for two weeks. And this goes on a few times. And then the book goes to production and you have, you know, six months of relative peace, and then this huge publicity tour.
Brian Christian: It is not the kind of thing that you can do while smoothly segueing into your other job … You know, let’s say you want to switch careers so you take a full-time position, but you say to them “I have this publicity tour where I’m going to need to be away for like six weeks, you know, three months from now.” That’s not going to go over particularly well. And so, I mean that’s just an anecdotal example, but you start to notice that this particular window of time is exactly the right amount of time to start researching a new book proposal, and then do the publicity and then go back to researching the next book.
Brian Christian: It’s not as conducive to getting, you know, a full-time job at the corporation. So I think a lot of careers, they’ll have their own version of this, where you find yourself, you know, you open this door …
Robert Wiblin: It’s path dependency.
Brian Christian: That’s exactly right.
Robert Wiblin: So, yeah, podcasting is a bit like this, because each episode you gain more subscribers, so each episode is more valuable than the last because more people hear it. So you could easily end up thinking, “Well, I never should have started, but now that I’m here I should definitely continue.”
Brian Christian: Yeah. That’s right.
Benjamin Todd: Okay. So we’ve started to talk about careers where you kind of have to commit, and once you get off it’s hard to get back on. So maybe these are better modeled as optimal stopping problems, which is another really fascinating chapter you have in the book. Maybe we can start by quickly saying how it’s different, and then …
Brian Christian: Mm-hmm (affirmative).
Benjamin Todd: … Then some of the approximate solutions to those as well and how they might apply.
Brian Christian: Yeah. Great. So there’s a second genre of problems called optimal stopping problems, and this has to do with being presented with a sequence of opportunities, one after another, and at each point in the sequence you either commit to that particular option, in which case the game’s over, or you decline and continue to progress through the sequence, but, critically, you can’t change your mind and go back. And so the canonical optimal stopping problem is what’s called the secretary problem, and the basic idea here is, you imagine you’re hiring a secretary, you field n different candidates, they show up in a random order, and then you evaluate them, you interview them one after another. And because of whatever constraints, you either have to hire that person on the spot and dismiss everybody else, or you send them away, in which case you lose the ability to change your mind and hire them later.
Brian Christian: And so the problem here is how do you attempt to hire the very best candidate in the pool, given that you are establishing a baseline essentially as you go? And so there’s a risk of course that you stop too soon. There’s a risk that you establish too high of a standard and then no one after that point exceeds it. And this is another one of these math problems with this kind of wonderfully colorful history through the mid-twentieth century, and it also has this wonderfully elegant solution, which is that you should spend exactly one over e, or approximately thirty-seven percent of your search, just establishing a baseline. So interview thirty-seven percent of the candidates without an intention of hiring any of them, no matter how promising they seem, and then, after that point, be willing to immediately hire the next person who’s better than everyone you saw in that first thirty-seven percent.
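A minimal simulation of the look-then-leap rule described above, assuming candidates arrive in uniformly random order, each can be ranked against everyone seen so far, and only hiring the single best candidate counts as success:

```python
import math
import random

def look_then_leap(ranks, cutoff):
    """Reject the first `cutoff` candidates, then hire the first one who beats
    all of them. Returns True if the hire is the single best candidate.
    `ranks` holds distinct scores in interview order (higher is better)."""
    best_seen = max(ranks[:cutoff], default=float("-inf"))
    for score in ranks[cutoff:]:
        if score > best_seen:
            return score == max(ranks)
    return False  # nobody beat the baseline: the best was among those rejected

def success_rate(n, fraction=1 / math.e, trials=10000, seed=0):
    """Monte Carlo estimate of how often the rule picks the best of n."""
    rng = random.Random(seed)
    cutoff = round(n * fraction)
    wins = 0
    for _ in range(trials):
        ranks = list(range(n))
        rng.shuffle(ranks)
        wins += look_then_leap(ranks, cutoff)
    return wins / trials

print(success_rate(100))
```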
Brian Christian: And this is due to a fascinating mathematical symmetry: your odds of success in this scenario are also one over e, or thirty-seven percent. And that in itself is kind of an intriguing detail, which is that following the optimal strategy you still fail sixty-three percent of the time. It just turns out to be a hard problem. But the optimal strategy and the odds of success stay essentially the same regardless of the size of the pool. So as n goes to infinity you still want to follow this thirty-seven percent rule, and incredibly you still have a thirty-seven percent chance of success, even if the pool is, like, a million people, which seems crazy. You know, given random chance you would only have a one-in-a-million shot of identifying the single best candidate out of all one million.
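The pool-size claim can be checked exactly. With the first r of n candidates rejected, you pick the best one if the best candidate sits at some position i > r and the best of the first i − 1 candidates was among the r you rejected, which gives P(r) = (r/n) Σ_{j=r}^{n−1} 1/j. The sketch below evaluates this formula; for finite n the optimum sits slightly above 1/e and approaches it as n grows.

```python
import math

def exact_success(n, cutoff):
    """Probability that look-then-leap picks the best of n candidates when the
    first `cutoff` are rejected: (r/n) * sum_{j=r}^{n-1} 1/j for r >= 1."""
    if cutoff == 0:
        return 1 / n  # you hire the very first candidate
    return cutoff / n * sum(1 / j for j in range(cutoff, n))

for n in (10, 100, 1000):
    best_r = max(range(n), key=lambda r: exact_success(n, r))
    print(n, best_r, round(exact_success(n, best_r), 4), round(1 / math.e, 4))
```

For n = 10, the best cutoff is 3, with a success probability of about 0.399.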
Robert Wiblin: I suppose it’s cancelled out by the fact that you get even more time to collect evidence or something like that? So …
Brian Christian: Yeah.
Benjamin Todd: I mean you have a really nice explanation of how the derivation works in the book, so really encourage people to check that out.
Brian Christian: For people who really want to like go down the calculus wormhole, yeah.
Benjamin Todd: So how this might apply to career decisions is that you can either keep trying out jobs, or you can commit to one and then run with it for a while. That approach makes more sense in careers where you can’t easily jump in and out, but have to just commit, and maybe you want to think about those more as an optimal stopping problem than as a multi-armed bandit, where you can always just switch to a different lever.
Brian Christian: Yeah.
Robert Wiblin: Perhaps the best match might be relationships, because it’s particularly hard to dump someone and then get back together with them after you’ve like tried someone else. People wouldn’t take too kindly to that.
Brian Christian: Yeah.
Robert Wiblin: It’s probably easier to go back to your previous job than it is to do that.
Brian Christian: That’s true. And this is actually something that people who study optimal stopping model explicitly. In the literature it’s called recall, which is the ability to return to a previous candidate or a previous opportunity. And in fact in the book we tell these amusing mini-biographies of mathematicians and scientists applying, in some cases explicitly applying, a thirty-seven percent rule to their dating life, with mixed success.
Brian Christian: But someone who embodies this idea of trying to return to a previous option is Johannes Kepler. So, after the death of Kepler’s first wife, he embarks on this kind of epically arduous series of courtships to try to find the perfect second wife to help him raise his kids and so forth. And he’s very frank about this in his letters, talking about, you know, he really liked the fourth woman that he was courting for her tall build and athletic body–just strange to hear this famous astronomer speaking this way. The fifth woman got along with the children; she was even better than number four, but he still persisted, and ultimately after he spends several years courting a total of eleven different women, he realizes “oh, no, no, no, it really was number five all along,” and he goes back to her and, you know, musters his best apology and says “I’m sorry for the half-dozen other people I’ve been dating in the meantime, but if you’re not spoken for and, you know, you can find it in your heart to forgive me, I’d love to get back together.”
Brian Christian: And, fortunately for Kepler, she agrees, and according to his biographers the rest of their lives is quite happy indeed.
Benjamin Todd: This is actually one of the bits I found more interesting in the book, because I’d heard of the secretary problem and the thirty-seven percent solution before, but then you point out what happens if you add in the complication that you can try to go back to a previous option, which you often can in real life. Say there’s only a fifty percent chance of that working: how does that change the percentage? If I remember correctly, it means you should actually try out more like half the pool.
Brian Christian: Sixty-one percent.
Benjamin Todd: Sixty-one.
Brian Christian: … If you have a fifty percent chance of your apology being accepted.
Benjamin Todd: So it means you should explore more …
Brian Christian: That’s right.
Benjamin Todd: … which is intuitive.
Brian Christian: That’s right. And so it’s interesting, you know, in his diary Kepler bemoans what he calls his “restlessness and doubtfulness”; he’s kind of beating himself up: “why did I keep dating all those different women when number five was so amazing?” And if you look at the math, you know, he was following the optimal strategy, where you shouldn’t be willing to commit until you’re sixty-one percent of the way through the pool. So you could argue that he was in fact doing the optimal thing, although it certainly caused him a lot of stress over the years.
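Kepler’s version of the problem can be simulated. Here is a minimal Python sketch under one simple model of recall, which is our assumption rather than necessarily the exact model behind the book’s figure: look noncommittally at the first fraction of the pool, then commit to the first option that beats everything so far (such an offer is assumed to be accepted); if nobody qualifies, go back to the best option you passed over, who accepts with probability q. The function and parameter names are illustrative.

```python
import random


def simulate(n: int, cutoff: float, q: float, trials: int = 10_000, seed: int = 0) -> float:
    """Estimate the chance of ending up with the single best of n options.

    Assumed model: pass over the first `cutoff` fraction of options, then
    commit to the first one better than everything seen so far (that offer is
    assumed to be accepted). If nobody qualifies, go back to the best option
    you passed over, who says yes with probability q.
    """
    rng = random.Random(seed)
    r = round(cutoff * n)
    wins = 0
    for _ in range(trials):
        values = [rng.random() for _ in range(n)]
        best = max(range(n), key=values.__getitem__)
        benchmark = max(values[:r], default=float("-inf"))
        chosen = next((i for i in range(r, n) if values[i] > benchmark), None)
        if chosen is not None:
            wins += chosen == best
        else:
            # Nobody beat the calibration set, so the overall best is in it;
            # going back succeeds with probability q.
            wins += rng.random() < q
    return wins / trials


for c in (0.37, 0.61):
    print(f"cutoff {c:.0%}: success about {simulate(n=50, cutoff=c, q=0.5):.2f}")
```

Under this model, the large-pool success rate for a cutoff fraction c is q·c + c·ln(1/c), which peaks at c = e^(q−1): that gives 1/e ≈ 37% when going back is impossible (q = 0) and e^(−1/2) ≈ 61% when an apology works half the time, matching the figure Brian quotes.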
Robert Wiblin: Okay. So, in that chapter you go through progressively more versions or models of this problem, so you start out with the case where all you know is whether this applicant is the best you’ve seen so far and you can’t back-track at all, and your goal is simply to maximize the probability of choosing the very best one.
Brian Christian: Yeah, and that’s a very important thing to highlight.
Robert Wiblin: Yes, so I’ve … Yeah, I think that makes it like not terribly realistic or not similar to what people actually try to do in real life. So in the next one, you get information on where people stand as a percentile out of a full pool from which you’re drawing applicants, so you know that they’re like fiftieth percentile or seventieth percentile, and then you end up with a threshold for hiring based on how many more you’re going to see after that one.
Brian Christian: That’s exactly …
Robert Wiblin: Then there’s the back-tracking one, which allows you some probability of back-tracking, and that suggests much more exploration. Then you add a cost to holding out, so every period that you don’t hire someone, you don’t have the secretary, or you’re single and you find that unpleasant, which obviously pushes towards hiring somewhat sooner, depending on how large that cost is. And then there’s another one where you talk about selling a house and looking at different offers on it, where this starts, I think, to get a lot more realistic, because now you’re getting cardinal information about the dollar value that the different people are offering, rather than just their ordinal rank. So you’re not just saying that this is better than the previous one.
Brian Christian: Mm-hmm (affirmative).
Robert Wiblin: You’re saying, uh, I’m going to get five-hundred and fifty thousand dollars for the house, and that compares with, like, five-hundred and twenty, and I think six-hundred thousand is the largest offer that I’m reasonably likely to get. And so that starts to seem more like a real case, and in that example I think the process that you describe is maximizing the expected value that you get for the house, the expected price.
Brian Christian: Yeah. That’s right. So I want to highlight, you know, you mention the difference between cardinal information and ordinal information, which I think is kind of philosophically an extremely deep idea. With ordinal information, for any two things you only know which of them is better than the other, but you don’t know by how much. With cardinal information, you know both which of the things is better and by how much. And for me there’s this fascinating philosophical theme that goes through the math: “what is the value of cardinal information above and beyond the mere ordinal information?”
Brian Christian: And, in the case of the secretary problem, your odds of success following the best strategy if you only have ordinal information are thirty-seven percent; with cardinal information it’s, I believe, fifty-four percent [Fact check from Brian: turns out to be 58%!], and so it allows you to get the very best candidate in the entire pool more than half of the time. And so I think it’s just fascinating to see the additional informational value of cardinal scores being made concrete and very explicit.
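To make the cardinal ("full information") case concrete: if each candidate’s score is a known percentile, uniformly distributed, the optimal rule is to hire a candidate who is the best so far only if their percentile clears a threshold that depends on how many candidates are still to come. Below is a minimal Python sketch, assuming uniform scores and the threshold equation usually attributed to Gilbert and Mosteller (1966); the function names are illustrative. The threshold falls as the remaining pool shrinks, and the simulated success rate should land a little above the roughly 58% figure, which is the large-pool limit.

```python
import random


def full_info_threshold(k: int, tol: float = 1e-12) -> float:
    """Minimum percentile (0 to 1) at which to hire a best-so-far candidate
    when k candidates are still to come.

    Solves sum_{j=1..k} (x**-j - 1) / j = 1 for x by bisection on [0.5, 1).
    The left side shrinks as x grows, is at least 1 at x = 0.5 and 0 at x = 1.
    """
    if k == 0:
        return 0.0  # last candidate: hire them if they are the best so far

    def excess(x: float) -> float:
        return sum((x ** -j - 1) / j for j in range(1, k + 1)) - 1

    lo, hi = 0.5, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            lo = mid  # root lies above mid
        else:
            hi = mid
    return (lo + hi) / 2


def simulate_full_info(n: int, trials: int = 20_000, seed: int = 1) -> float:
    """Estimate how often the threshold rule hires the single best of n
    candidates whose scores are independent uniform(0, 1) percentiles."""
    # thresholds[i] applies at position i, with n - 1 - i candidates still to come
    thresholds = [full_info_threshold(n - 1 - i) for i in range(n)]
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        scores = [rng.random() for _ in range(n)]
        best_so_far = float("-inf")
        for i, s in enumerate(scores):
            if s > best_so_far:
                best_so_far = s
                if s >= thresholds[i]:
                    wins += s == max(scores)  # success only if nobody later is better
                    break
    return wins / trials


print([round(full_info_threshold(k), 4) for k in (1, 2, 3, 10)])
print(simulate_full_info(50))
```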
Brian Christian: So in the house-selling scenario, what you’ve highlighted there is … Again, one of the things that we can tweak in an optimal stopping problem is, what is our objective function? So in the classic secretary problem, from which the thirty-seven percent rule is derived, you only care about maximizing the probability that you get the single best candidate in the candidate pool. Anything else is equally bad.
Brian Christian: So getting the second best is just as bad as getting the very worst. And so the thirty-seven percent rule is optimized accordingly. There will be a thirty-seven percent chance, of course, that you’ve skipped the best candidate in your calibration period, and if that happens then you’ll never hire anyone else, because no one else will be better, and then you’ll end up stuck with the final candidate, which is just randomly drawn from the pool.
Brian Christian: So there’s a thirty-seven percent chance right off the bat that you end up with a completely random candidate. Now, of course you may find that unrealistic, that doesn’t really model what people are seeking in that situation. And there are a number of different objective functions that people have played with. So one of the ones that’s gotten a lot of traction in the technical literature is what’s called “minimizing the expected rank.” So if you rank all of the candidates from 1 to n, what is the strategy that, in expectation, minimizes the rank order of the applicant that you end up with?
Brian Christian: There’s a whole separate literature on that. And then, in the case of selling a house, so this one I think is especially interesting and relates to a lot of real-world situations, let’s say you pay some sort of fixed cost every week that your house is on the market … You’re making mortgage payments or you’re paying utilities or whatever it might be, and your goal is not necessarily to sell the house for the highest possible price, if that means holding out for a really long time; your goal is to make the most money, and so I think that’s interestingly and meaningfully different. And so in this case you might well be willing to take something that you know is pretty good immediately, and get yourself, you know, that gain or whatever, that profit right off the bat.
Brian Christian: And so in cases like this there’s usually a very straightforward rule that you can apply. So if you pay, let’s say, a fixed cost for every additional offer that you entertain, it should be quite easy to determine the probability that a new offer is better than the one you currently have, and then multiply that by how much better you expect that offer to be, if it is better. So what’s the chance that it’s better? And if it’s better, by how much do I expect it to be better? And you just set that equal to the cost, and that tells you at what threshold you should accept an offer.
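Under the simplest assumptions, namely offers that are independent and uniformly distributed between a known low L and high H, and a fixed cost c for each additional offer you wait for, the rule can be worked out in closed form. The symbols here are ours, not the book’s; t is the threshold price:

```latex
\Pr(X > t)\,\mathbb{E}[X - t \mid X > t]
  = \frac{H - t}{H - L}\cdot\frac{H - t}{2}
  = c
\quad\Longrightarrow\quad
t = H - \sqrt{2c\,(H - L)}
```

If c ≥ (H − L)/2, this puts t at or below L, which means any offer should be accepted.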
Brian Christian: And so there’s something just kind of beautifully straightforward about that, and it enables you to, I think, confidently just set a price going in, and then essentially ignore the offers that come in, uh, you don’t have to do anything, you just wait until that threshold is reached. And then you’re good. So, of course, this leads us into the direction of what about situations where you are adjusting your model of the distribution from which the offers are coming in real-time as you’re getting offers, and so, broadly the way that the literature is divided in optimal stopping, there are what are called “no information games.” So these are the ordinal scenarios like the secretary problem, where you don’t know the distribution from which they’re being drawn; you only know the ordinal information.
Brian Christian: There are so-called “full information” games, where, let’s say, you know your secretary’s typing-test percentile or something: you know the distribution, and candidates are being sampled out of that distribution. The last category is what are called “partial information games,” where, let’s say, you have cardinal information, but you don’t know the distribution ahead of time, so you are building a model of the distribution based on the cardinal information that is coming in.
Brian Christian: Needless to say, the solutions are much less clean in the partial information version, but for better or worse it also appears to be the best model for most real-world situations.
Benjamin Todd: Do you have any rough sense of how that changes some of these rules of thumb? So, if we start out with thirty-seven percent, how do any of these complications that we’ve covered change it? You mentioned, suppose you’re trying to minimize the, what were you saying …
Brian Christian: The expected rank.
Benjamin Todd: Yeah. Which seems more realistic, because mostly the second or third option is not way worse than the first.
Brian Christian: Yeah.
Benjamin Todd: How would that change like how much you should explore?
Brian Christian: So the basic intuition with the version of the problem in which you want to minimize the expected rank is that, you basically have a series of thresholds where, initially, you’re in this purely exploratory phase, so you will not accept any candidate no matter what rank they are. After that point you should be willing to hire someone if they are the very best candidate you’ve seen so far. As you start to run out of time, you now switch to a regime in which you’re willing to accept someone if they are the best OR second best candidate you’ve seen so far.
Brian Christian: And then later, if they’re the best, second best, or third best. And, if you want we can post …
Benjamin Todd: So you drop your standards over time.
Brian Christian: Exactly. Yeah. So this is the familiar-if-not-encouraging advice of, you know, “so as you start to run out of options, lower your standards.” But the math tells you exactly when and by how much.
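The "lower your standards as you go" schedule can be computed by backward induction. Below is a minimal Python sketch, assuming the standard no-information setup (random arrival order, only relative ranks observed, no going back, and you must hire someone by the end); the function name is illustrative. For large pools the optimal expected rank is known to approach roughly 3.87.

```python
def expected_rank_policy(n: int) -> tuple[float, list[int]]:
    """Backward induction for hiring with the lowest expected rank (1 = best).

    Assumes random arrival order, only relative ranks observed, no recall, and
    that you must hire the last candidate if nobody earlier was accepted.
    Returns (optimal expected rank, accept_counts), where accept_counts[i] is
    how many relative ranks are acceptable at candidate i + 1: hire them if
    they are within that many places of the top among everyone seen so far.
    """
    # Expected overall rank of a candidate at position i (1-indexed) who is
    # r-th best among the first i candidates is r * (n + 1) / (i + 1).
    cont = (n + 1) / 2  # value of reaching the last candidate, who must be hired
    accept_counts = [n]  # at the last position any relative rank is accepted
    for i in range(n - 1, 0, -1):
        # Here `cont` is the expected rank of passing on candidate i and
        # playing optimally afterwards.
        ranks = [r * (n + 1) / (i + 1) for r in range(1, i + 1)]
        accept_counts.append(sum(1 for x in ranks if x <= cont))
        # Value before seeing candidate i: average over its relative rank.
        cont = sum(min(x, cont) for x in ranks) / i
    accept_counts.reverse()
    return cont, accept_counts


value, counts = expected_rank_policy(100)
print(f"expected rank of the hire: {value:.2f}")
for i in range(1, len(counts)):
    if counts[i] != counts[i - 1]:
        print(f"from candidate {i + 1}: accept anyone in the top {counts[i]} so far")
```

The printed schedule has the shape Brian describes: an initial stretch where nobody is accepted, then a window where only the best so far qualifies, then best or second best, and so on.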
Benjamin Todd: Earlier we were talking about a model where we have, in a sense, twenty draws. One way of thinking about it is that you have a two-year period of trying a job, and at that point you can either carry on with it or switch to a different one. How might you attack that kind of model with these algorithms, or do you think there’s a better way of setting it up?
Brian Christian: Um, that’s a good question. I mean, I think partly it depends on what kind of information we think of you as getting from this job. So, you know, in the ordinal version of the problem, you are basically duty-bound to automatically reject the first offer that you see no matter what, or the first candidate, because you literally have no information about them; all you know is that they are the best candidate out of a pool of one, and so there’s really no realistic scenario in which you would ever stick with the first thing.
Brian Christian: So right off the bat you have a 1 over n chance of, you know, losing the best opportunity forever because it happened to be the first one out of n that you saw. In a cardinal scenario, you might realize that the very first candidate is exceptional, and even though you’re going to get n more draws from that distribution, you’re still confident that you’re in the right tail and you don’t expect anything else to be better than that.
Benjamin Todd: So, yeah. In the cardinal case, just to recap, that’s where, after trying the job, we can roughly say “I think this is like a ninetieth-percentile job for me, or a fiftieth-percentile one, compared to the other options I’m considering,” and then in those approaches the best thing is to have a threshold: if the job is above the threshold you just stick with it, and if it’s below the threshold you try something else. And the threshold declines over time as your time horizon is used up. So what you were saying is, if you try your first job and you’re like, “Whoa, this is actually really good. I think maybe this is among the best options,” then you might just stick with it because it’s above your threshold.
Brian Christian: Mm-hmm (affirmative).
Benjamin Todd: And if it’s below the threshold, then you’d try something else.
Brian Christian: Yeah. That’s right, and so I think part of the message that I take away from the math, at least, is that scenarios in which you have some kind of objective scale on which to place things are just, you know, computationally kinder versions of the problem to find yourself in. Now, there’s of course the question of, is that how the mind works? If you’re in a job, do you have a cardinal sense of how much you like the job? Maybe. Maybe you could cash that out as some function of your brain chemistry over time, or something like this. (laughs)
Robert Wiblin: Well, I think you do.
Benjamin Todd: Your life satisfaction in this job and … Or if we’re thinking about it from an impact point of view, it would be how much impact do you think you could have? Obviously, impact is easy to quantify but … (laughs)
Brian Christian: Exactly.
Robert Wiblin: So, my take on this chapter is that the first few models are basically not useful at all, because they just deviate too much from real life, but finally, by the house one, where we’ve got cardinal information and we’re basically setting our threshold, it’s actually starting to get realistic, something that we could use in real life.
Robert Wiblin: If I think about a real life case, it seems like one thing is that you have a pretty good prior, because you’ve learned a lot from other people’s experiences with careers, you know, like the typical level of satisfaction, and so you can roughly place yourself in the distribution of how happy everyone is with their career, and also as you’re saying you then update on how much we like careers in general as you go. So the Bayesianism is going to complicate this quite a bit, cause you’ve got a pretty decent prior of the distribution, and then you shift it around as you learn.
Robert Wiblin: You’ve got some ability to back track, but that’s limited, you’ve got pretty serious cost-to-delay, in your career.
Brian Christian: Mm-hmm (affirmative).
Robert Wiblin: Also, and I’m not sure that this will change it, but many candidates for a job would actually have negative value, so in fact you always have this reserve option of not hiring anyone, which gets you zero value but at least avoids a negative outcome. And that’s, I think, something that’s not typically included in this model: that you can just reject everything.
Brian Christian: Yeah, so I can mention that there’s an interesting variation of the classic secretary problem where your objective function is you get plus one for choosing the best candidate in the pool, minus one for choosing anyone else, and zero for making no choice at all. In this case, the optimal stopping threshold is one over the square root of e, which is approximately 61% and so this indicates that there is a major difference in how you approach something based on how you rate the downside of making a mistake.
Robert Wiblin: Right.
Brian Christian: Versus the experience of leaving that position unfilled or not buying a house at all or whatever it might be.
Robert Wiblin: Not selling, yeah.
Brian Christian: Not selling, exactly. And so, you know, I guess one analogy then you could make to the dating realm would be if you’re perfectly content to be single and you’re happier being single than married to someone that’s going to make you unhappy, then you should, as one might expect, be much more choosy, and so I think it’s, I mean that’s obviously intuitive, but the math bears that out.
Robert Wiblin: Yeah, okay, we’ll stick up a link to that formula, chase it down in one of the footnotes.
Brian Christian: Yup.
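The 1/√e figure for the plus-one, minus-one, zero variant can be checked the same way as the classic rule. Below is a minimal Python sketch, assuming the classic no-information setup with that modified payoff, where leaving the position unfilled scores zero; the function names are illustrative.

```python
from math import exp


def expected_payoff(n: int, r: int) -> float:
    """Expected score for skipping r of n candidates, then hiring the first
    one who beats them all: +1 if that is the overall best, -1 if it is anyone
    else, 0 if nobody is hired (the position stays unfilled)."""
    if r == 0:
        return 1 / n - (1 - 1 / n)  # always hire the first candidate
    # Someone is hired exactly when the overall best arrives after the first r.
    p_hire = (n - r) / n
    # Classic chance that the person hired is the overall best.
    p_best = (r / n) * sum(1 / j for j in range(r, n))
    return p_best - (p_hire - p_best)


def best_signed_cutoff(n: int) -> int:
    return max(range(n), key=lambda r: expected_payoff(n, r))


n = 1000
r = best_signed_cutoff(n)
print(r / n, exp(-0.5))  # both should be close to 0.61
```

In the large-pool limit the expected payoff for a cutoff fraction c is 2c·ln(1/c) − (1 − c), which peaks where ln(1/c) = 1/2, that is at c = e^(−1/2) ≈ 61%. The extra caution compared with 37% comes entirely from the penalty for a wrong hire.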
Robert Wiblin: Another interesting property is that other people can steal your experimentation, especially when you’re trialing people out on the job. So you get an imperfect signal of how good someone will be from their job application or their interview or their work trial, and then if you hire someone and they turn out to be really good, they now have evidence that they can use to show other people that they’re really good, and other people will potentially poach them at a higher salary. This actually is like a huge issue in economics. (laughs) I know in labor economics it creates a systematic bias against trying new candidates, people who don’t have a job record, because the company that hires them doesn’t get to keep the evidence that they’ve produced; it becomes public information. Maybe set that aside as just an interesting example of, yeah …
Brian Christian: No, I mean there’s a lot of economics that uses optimal stopping as a model for both sides of the hiring process, so employers are vetting different candidates for an opening and they’re deciding, “Do we make this guy or girl an offer? Or do we hold out and keep the position unfilled, knowing that this person will probably get some other job?” It also models the employee’s side of the equation, where you’re fielding offers and you’re deciding, “Do I take this job at Salary X in Location Y or do I keep searching? But if I keep searching then I might not be able to, you know, keep that offer on the table.” And so, for example, there had been a long-standing paradox in economics of how you could have unfilled vacancies and unemployed job seekers in the same market at the same time, and one of the things that unlocked that riddle was the idea that you could, in fact, model both of these sides as simultaneously playing an optimal stopping game with respect to each other.
Robert Wiblin: Yeah, there’s a search model of unemployment [crosstalk 01:43:53] for that.
Benjamin Todd: If we applied all these complications to, again, the kind of choosing-a-career-path decision, how do you think that would shift around the …
Robert Wiblin: Well, it seems to me like in this chapter, once you’ve got this example, it’s basically the same process as we were using previously with the threshold, the upper confidence bound. Because with the house case, you say, well, you’re trying to maximize the value of the sale and you have a cost to delay. You basically figure out a range of plausible bids that someone might make, say between $400,000 and $600,000, so you’ve got the lower and the upper bound, and then you know the cost of delay relative to that range, and that determines what your threshold will be on each turn. And I guess all of the parameters are constant for every turn until you actually sell.
Robert Wiblin: So, again, you basically have this threshold, and it’s going to be very similar to the upper-confidence-bound threshold in the previous example, except that in that one you update more as you get more information, whereas in this one you just assume that it’s kind of a constant range, that you never-
Benjamin Todd: Wait, so once you’ve got the threshold, what determines whether you accept it or not? So if I make a $500,000 bid-
Robert Wiblin: Yeah, so there’s a formula where it depends on how unpleasant it is to turn down a bid and wait for the next one, so what’s the cost of delay? So you’ve got two different things. One is the magnitude of the range, which in this case would be $200,000. Then you figure out how frequently the bids come in, and what the cost of delay is to wait for another bid; say it’s $10,000.
Benjamin Todd: So if it’s like $10,000 a week and you got one bid a week or something like that …
Robert Wiblin: Yeah.
Brian Christian: Yeah, so I can give you some actual numbers if you want?
Robert Wiblin: Yeah.
Brian Christian: So, you know, we walk through this example in the book: imagine that you’re selling a house and you’re expecting offers uniformly distributed from $400,000 to $500,000. So this is probably not realistic in the Bay Area. (laughs) If the cost of waiting is only $1 per offer, then you should set your threshold at $499,552.79.
Benjamin Todd: And that means if anyone offers you over that, you take it?
Brian Christian: Exactly.
Benjamin Todd: And otherwise you keep waiting.
Brian Christian: Exactly. If waiting costs you $2,000 an offer, you should hold out for $480,000. If waiting costs you $10,000 an offer, it goes down to $455,000. And then lastly, if the cost of waiting exceeds 50% of the range over which offers are expected to come, then that means that you should literally take anything, because the cost of fielding an additional offer exceeds the expected value of doing so.
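These dollar figures follow from setting the expected gain from one more offer equal to the cost of waiting for it. Below is a minimal Python sketch, assuming offers are independent and uniformly distributed over a known range, as in the book’s example; the function name is illustrative.

```python
from math import sqrt


def reservation_price(low: float, high: float, cost_per_offer: float) -> float:
    """Lowest offer worth accepting when offers are uniform on [low, high].

    Keep waiting while the expected improvement from one more offer,
    (high - t)**2 / (2 * (high - low)), exceeds the cost of waiting for it.
    """
    t = high - sqrt(2 * cost_per_offer * (high - low))
    return max(t, low)  # once waiting costs half the range or more, take anything


for cost in (1, 2_000, 10_000, 50_000):
    price = reservation_price(400_000, 500_000, cost)
    print(f"cost ${cost:,} per offer: accept anything from ${price:,.2f}")
```

With Rob’s earlier numbers (bids between $400,000 and $600,000, $10,000 per bid), the same formula gives a threshold of roughly $536,750.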
Benjamin Todd: Awesome. So I really like how these two, I guess similar, cases have converged on basically a very similar heuristic, where you set a threshold and go about accepting anything above it and nothing below. It’s also interesting that in both cases you’ve slipped Bayesianism in the back door here, because in reality with the house you’ve got a probability distribution of likely prices, so you could get something below $400,000 if you’re very unlucky or something above $600,000 if you’re very lucky.
Brian Christian: Yeah.
Benjamin Todd: But you can’t go and say the fifth percentile and the ninety-fifth percentile or ninetieth percentile and tenth percentile, it’s kind of just what we were talking about last time, like what threshold should you choose? I guess in this case it’s going to be a bit clearer, there probably is like an optimal threshold, at least for a normally distributed set of bids.
Brian Christian: Yeah.
Benjamin Todd: Whereas in the previous one, it depends on how many turns there are left.
Robert Wiblin: How many levers did you pull, I think it was?
Benjamin Todd: Yeah, how many levers did you pull, yeah.
Brian Christian: Right, so in this case, you basically have the luxury, at least the way that we’re currently modeling this, right? So, you have the luxury of going into the situation with a complete knowledge of the distribution of the offers that are going to get drawn. In reality, you know, you have a partial sense of that, if your first several offers are disappointing you say, “Hmm, maybe the market’s cooling down,” or whatever and I think a Bayesian framework is probably the right way to think about how you should be doing that. There’s some work on how you should incorporate prior experience in other versions of the problem when you then face an additional version and I think the rule of thumb is something like consider all of your prior experience, basically tack that on to the beginning of the candidate pool and then imagine that you’ve already sort of been through that.
Benjamin Todd: Mm-hmm (affirmative).
Brian Christian: So, if you previously hired a position where there were ten candidates, you interviewed four of them and let’s say chose the fourth, then the next time you’re hiring, you can essentially prepend those first four and imagine yourself in a new version of the problem, where you’re starting on candidate five, as it were.
Benjamin Todd: Yeah, that’s nice.
Brian Christian: So there are some nice heuristics for thinking about that.
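The "prepend your past experience" rule of thumb can be written as a tiny calculation. Below is a minimal Python sketch, assuming the thirty-seven percent rule as the cutoff and ignoring the fact that last time’s candidates can’t actually be hired now; the helper is hypothetical, for illustration only.

```python
from math import e


def remaining_to_explore(prior_seen: int, new_pool: int) -> int:
    """Hypothetical helper: how many new candidates to pass over before committing.

    Treats the prior_seen candidates from last time as the start of one
    combined pool, applies the 37% cutoff to that combined pool, and
    subtracts what has already been seen.
    """
    combined = prior_seen + new_pool
    cutoff = round(combined / e)
    return max(cutoff - prior_seen, 0)


print(remaining_to_explore(prior_seen=4, new_pool=10))  # 14 / e rounds to 5, so 1 more
```

With enough prior experience, the exploration phase for a new search can shrink to nothing.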
Benjamin Todd: Yeah, I was just thinking about how we’d set up the model. Does it make sense to try using life satisfaction as a scale? I guess that’s not a fully cardinal scale, but with life satisfaction you’ve kind of got a situation where you know that most jobs are a 7 out of 10, but a few are nines and a few are fives. Now say you don’t know anything about the job before you take it, and then you take it and find that it’s an 8 out of 10. The question is, should you carry on with that or keep exploring?
Brian Christian: Yeah, I mean I think, you know, as with all these things it’s going to depend on exactly what your assumptions are going in. You know, are you assuming that you have a cardinal sense of your life satisfaction? Can you make use of reports of other people who have been in that profession? There’s some interesting psychology work to the effect that people are worse at predicting their future life satisfaction than they think, but other people’s reported life satisfaction is more relevant to you than you would think, yeah.
Benjamin Todd: Yeah, [crosstalk 01:49:30].
Brian Christian: And so that kind of fits into it.
Robert Wiblin: But then you get to just bypass this whole thing, because you can just go into whichever one has the highest average satisfaction, except that there are massive selection effects and things.
Brian Christian: Yeah.
Robert Wiblin: You’re not similar to the people who were already in a job.
Brian Christian: You start to wander into some interesting territory depending on which of these assumptions you tweak. You know, Rob, you were mentioning earlier these game-theoretic effects, where if you interview someone then perhaps other people get to know how good that person is too. There’s some interesting work on an ecological analysis of something like the multi-armed bandit problem, where everyone in a given society is basically playing this game of chicken where everyone wants someone else to do the exploration. It’s rational for any individual to pursue a much more heavily exploitation-based strategy as long as someone somewhere else is creating the information, and part of what I find kind of charming and counterintuitive about this is that you realize people who are very exploratory by nature are performing a public service.
Brian Christian: Like we think about people who have this disposition of being very novelty seeking, always trying new stuff, there’s almost a stigma with that personality type where we think of it as sort of selfish or hedonistic or you know, they’re out for their own thrills or whatever, but in fact, they are kind of taking one for the team. They’re martyring themselves to be the explorer and if they find something that is no good, then they alone have paid the price, but if they find something great, then they’ve created this public externality of now everyone goes to that restaurant or whatever.
Robert Wiblin: As long as there’s not a downside tail. If there’s lots of negative risk, especially negative risk that other people bear, then it’s bad.
Benjamin Todd: So like if it’s finding new technologies or something and there’s a chance you discover … That kind of thing, yeah.
Robert Wiblin: I think that might be part of where the stigma comes from, is that people like that often take risks that other people have to suffer from.
Brian Christian: That’s fair. Yeah, I was I guess implicitly imagining some kind of hunter-gatherer, where it’s just you go eat the weird mushroom (laughs) and the mushroom doesn’t kill everyone else. (laughs) Yeah.
Benjamin Todd: That’s a point we’re actually somewhat making in an upcoming article, which is if you … There’s thinking about doing good as an individual, but often what we’re doing is doing good as part of a community.
Brian Christian: Yeah.
Benjamin Todd: And there you very much have that dynamic where it becomes more valuable to explore, because other people can make use of what you find, and you get this externality within the community.
Brian Christian: Yeah, that’s great. One of my favorite interviews that we did in the book was with the editor at Pitchfork, whose name is Scott Plagenhoef, and he was talking about, you know, what is the effect of being a music critic? Like, what is the lived experience of being a music critic? And he was saying, you know, you imagine a music critic’s life as: you’re someone with very specific musical taste, and you get to listen to music all day long. And he said in reality it’s hellacious, because you’re specifically someone who has very high standards in music, because you’ve heard, you know, all of the music and you know exactly what you like and what you want, and yet you’re forced to wade through this ceaseless stream of, you know, mostly bad stuff. And so he was telling us how he would physically remove his favorite artists from his phone so he couldn’t break his resolve and would keep listening to the new things.
Brian Christian: And so I think there’s, again, that aspect that we don’t fully appreciate: there are people in society, or in any organization or any tribe, who are in a way exploring so that others may exploit. It is a form of martyrdom, and I think we should recognize that.
Robert Wiblin: Yeah, I think that information economics especially information externalities might make my top 10 list of most important concepts.
Benjamin Todd: Yeah.
Robert Wiblin: And I think it’s a very underrated idea: the market for goods, like the drinks that we’re having and the food that we eat and the furniture we have in our houses, is very good, but the market for information is garbage, absolute garbage, because it’s not privately controlled. We try with patents and copyright, but it doesn’t really work, especially not with news and lots of kinds of evidence. And often in the case of politics, the people who you might expect would want the information, or who would benefit from it existing in general, don’t necessarily want to consume it themselves, because the impacts they have through how they vote are also externalities. So both the supply side and the demand side are seriously broken, which I think is actually a fundamental reason that lots of things in society don’t work. Very often, when you see that information economics is involved, there’s basically no reason to expect the free market to produce a good outcome. And so while we’re very rich materially, we can be very poor informationally.
Brian Christian: That’s a succinct description (laughs) of society at the moment, it feels. There are some interesting papers, if people want, we can put links up, that have looked at these sort of evolutionary or ecological models of, “You’re a society in some environment, what percentage of your society should be these people with this very exploratory disposition? What percentage of your society should be these conservative, exploiters who are just doing the safe thing?” And you can find the sort of optimum levels of that, at a population level, rather than thinking at the level of an individual decision maker.
Benjamin Todd: That’s be really, yeah, that’d be really relevant to us. That’s very interesting.
Robert Wiblin: Let’s move on to randomness and simulated annealing, the third case. Maybe this is a good moment to just take a step back and distinguish what are the fundamental differences between these three models and where you’d want to apply each one.
Brian Christian: Yeah.
Robert Wiblin: So I guess the first one you have multiple different discrete options that you’re kind of choosing between and you get to do them again and again.
Brian Christian: And switch between them.
Robert Wiblin: And in the second one, the distinctive thing is that you want to choose the single best option at the end, and you can’t necessarily return to previous options. This third one is where it seems like you want to create a kind of structure or combination of things by the end.
Brian Christian: Yeah.
Robert Wiblin: Rather than just choosing one out of many options, it’s about how they’re interlinked, and especially the issue is that the search space is so vast that you can’t try them all, so instead you have to have some process for sorting through them.
Brian Christian: Yeah, yeah.
Robert Wiblin: Searching through them. Yeah. I guess take it away?
Brian Christian: Yeah. So, I think one of the key differences in moving into the idea of simulated annealing is that in a lot of the cases that we’ve looked at so far, the options that we’re considering are essentially independent. So, you know, if you pull the lever on machine three, that doesn’t really tell you anything about machine four; it doesn’t suggest that you should try something else in that neighborhood, or something at the other side of the room, or whatever. But there are a lot of cases in life where the space of possible options actually has meaningful local information, and options that are similar in this decision space have similar outcomes. And so you can model the space of possibilities as some kind of high-dimensional space. There’s this idea of what’s called an error landscape, and you can move around on this error landscape in local ways.
Brian Christian: So, one example, I mean the classic optimization problem in this area that I have in mind, is what’s called the traveling salesman problem, where you’re trying to visit all these different cities: you want to put together an itinerary that visits them all and doesn’t go to the same city twice, let’s say, and your goal is to minimize the total length of the trip. So you can think about moving through this parameter space as, let’s say, switching the order in which you visit just two of those cities. Well, it’s likely that the new itinerary is going to have a similar trip length, and so you can think about the options as contiguous in this way.
Brian Christian: So the simplest approach is what’s called hill climbing. In hill climbing, again using this metaphor of the fitness landscape, or error landscape, you simply look around in the immediate environment, and if some local modification is better, then you just make it. So in the traveling salesman case, you swap the order in which you visit two of the cities, so now you visit them in reverse order, and if that gives a lower total mileage, then okay, great. You adopt that change, and then you pick two other cities at random and see if swapping them is better. And you just keep doing this, making these incremental improvements, these small local changes that improve things slightly, until you eventually find yourself in a situation where no local permutation is better than what you currently have. This is what’s called a local maximum or a local minimum, depending on whether you think of yourself as trying to go up or down, and the problem that any optimization person faces is figuring out how to get out of these local minima or local maxima.
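A minimal Python sketch of hill climbing on the traveling salesman problem, for illustration only (not code from the book): the cities are hypothetical (x, y) points, the tour is assumed to be closed (it returns to its starting city), and instead of trying random pairs it checks every two-city swap, so it stops exactly at a local minimum where no single swap shortens the trip.

```python
import itertools
import math
import random


def tour_length(tour, cities):
    """Length of the closed tour that visits `cities` in the order given by `tour`."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def swap(tour, i, j):
    """Copy of `tour` with the cities at positions i and j exchanged."""
    new = list(tour)
    new[i], new[j] = new[j], new[i]
    return new


def hill_climb(tour, cities):
    """Adopt any two-city swap that shortens the tour; stop when none does."""
    tour = list(tour)
    best = tour_length(tour, cities)
    improved = True
    while improved:
        improved = False
        for i, j in itertools.combinations(range(len(tour)), 2):
            candidate = swap(tour, i, j)
            length = tour_length(candidate, cities)
            if length < best - 1e-12:  # strictly shorter; tolerance guards float noise
                tour, best = candidate, length
                improved = True
                break  # rescan the neighborhood of the new tour
    return tour, best  # a local minimum: no single swap helps


random.seed(0)
cities = [(random.random(), random.random()) for _ in range(8)]  # hypothetical cities
tour, length = hill_climb(list(range(len(cities))), cities)
print(tour, round(length, 3))
```

The returned tour is only locally optimal: a better itinerary may exist that cannot be reached by any single improving swap, which is exactly the “now what?” problem raised next.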
Brian Christian: So, you’ve hill climbed your way to this particular trip itinerary that is better than any local modification. Now what? And the question of “Now what?” leads to a whole family of different answers for making progress in these landscapes, and many of them, if not most, involve strategic use of randomness. For example, one idea is called shotgun hill climbing. In shotgun hill climbing, every time you get to a local maximum, you start over from a completely random point and start hill climbing from there. If you do this enough times, eventually you’ll find local maxima that are better and better, so that’s one extremely simple technique.
Brian Christian: There’s another idea, what’s called the Metropolis–Hastings algorithm, and that says when you try a local permutation, if it’s better than what you’re currently looking at, do it; if it’s worse, maybe still do it anyway. (laughs) And in particular, the likelihood that you adopt the change even though it’s worse should depend on how much worse it is: the worse the move, the less likely you are to take it. This is one strategy for trying to make progress even if you’re locally stuck.
Brian Christian: There’s an even simpler idea called jitter, which just says, “When you’re stuck, jiggle some things around at random and then go back to hill climbing.”
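Shotgun hill climbing can be sketched in a few lines of Python. To keep it short, this uses a hypothetical one-dimensional landscape (a list of scores where each position’s neighbors are the positions on either side) instead of traveling salesman tours, and the higher score is better; the function names are illustrative, not from any library.

```python
import random


def hill_climb_1d(landscape, i):
    """Step to the better neighbor (i - 1 or i + 1) until neither is higher."""
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]
        if not neighbors:
            return i
        best = max(neighbors, key=lambda j: landscape[j])
        if landscape[best] <= landscape[i]:
            return i  # local maximum
        i = best


def shotgun_hill_climb(landscape, restarts, rng=random):
    """Hill-climb from `restarts` random starting points; keep the highest peak found."""
    best = None
    for _ in range(restarts):
        peak = hill_climb_1d(landscape, rng.randrange(len(landscape)))
        if best is None or landscape[peak] > landscape[best]:
            best = peak
    return best


landscape = [1, 3, 2, 5, 4, 9, 1, 2, 8, 3]  # hypothetical bumpy landscape
print(hill_climb_1d(landscape, 0))  # 1: stuck on the small peak next to the start
print(shotgun_hill_climb(landscape, 50, random.Random(1)))
```

Each restart is independent, so the only memory carried between climbs is the best peak seen so far; with enough restarts, one of them is very likely to land in the basin of the tallest peak (index 5 here).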
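A sketch of the usual Metropolis acceptance rule as it is commonly stated for minimization problems (an assumption about the exact form, since the conversation only describes it loosely): a worse move with cost increase `delta` is accepted with probability exp(-delta / temperature), so bigger setbacks are accepted less often, and the `temperature` parameter controls how much randomness is allowed overall.

```python
import math
import random


def metropolis_accept(delta, temperature, rng=random):
    """Metropolis acceptance rule for minimization.

    `delta` is new_cost - current_cost and `temperature` must be > 0.
    Improvements (delta <= 0) are always taken; a worse move is taken with
    probability exp(-delta / temperature), which shrinks as the move gets worse.
    """
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


def acceptance_rate(delta, temperature, trials=20000, rng=random):
    """Empirical fraction of proposals with cost change `delta` that get accepted."""
    return sum(metropolis_accept(delta, temperature, rng) for _ in range(trials)) / trials


rng = random.Random(0)
for delta in (0.5, 1.0, 2.0):
    # empirical acceptance rate next to the exact probability exp(-delta)
    print(delta, round(acceptance_rate(delta, 1.0, rng=rng), 3), round(math.exp(-delta), 3))
```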
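Jitter can be sketched on the same kind of hypothetical one-dimensional landscape used for the shotgun example. Two details here are assumptions rather than anything stated in the conversation: the random jump lands within a hypothetical `radius` of the current best peak, and a new peak replaces the old one only if it is higher.

```python
import random


def hill_climb_1d(landscape, i):
    """Step to the better neighbor (i - 1 or i + 1) until neither is higher."""
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]
        if not neighbors:
            return i
        best = max(neighbors, key=lambda j: landscape[j])
        if landscape[best] <= landscape[i]:
            return i  # local maximum
        i = best


def jitter_search(landscape, start, jumps, radius, rng=random):
    """Climb to a peak; when stuck, jump to a random point within `radius` of the
    best peak and climb again, keeping the new peak only if it is higher."""
    best = hill_climb_1d(landscape, start)
    for _ in range(jumps):
        lo = max(0, best - radius)
        hi = min(len(landscape) - 1, best + radius)
        peak = hill_climb_1d(landscape, rng.randint(lo, hi))  # jiggle, then climb
        if landscape[peak] > landscape[best]:
            best = peak
    return best


landscape = [1, 3, 2, 5, 4, 9, 1, 2, 8, 3]  # hypothetical bumpy landscape
print(jitter_search(landscape, start=0, jumps=100, radius=3, rng=random.Random(2)))
```

Unlike shotgun restarts, jitter stays near where it already is, which tends to help when good solutions are clustered together.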
Benjamin Todd: Mm-hmm (affirmative).
Brian Christian: So the Metropolis Algorithm says build a little bit of randomness into every move that you make, and the idea that’s built on top of the Metropolis–Hastings Algorithm is what’s called simulated annealing. There’s a cool backstory to simulated annealing that goes back to IBM in the 1980s, I want to say. IBM is doing this really complicated optimization where they’re trying to fit the circuits on their chips in the most compact and efficient way, and it involves all these super complicated dependencies: “If we move this thing over here, then we have to re-route this wire, etc., etc.” And it just so happens that there’s this guy at IBM who’s like the chip layout guru, and for some reason he comes up with better layouts than anyone else can, and he won’t tell anyone else how he’s doing it.
Benjamin Todd: Does he know that’s-
Robert Wiblin: It’s ambiguous in the book as to whether he’s hiding that particular knowledge or whether he just can’t describe it, whether it’s ineffable.
Brian Christian: I was not able to interview the guru, so I didn’t hear his version of that story, but I think that it is kind of a tantalizing question.
Robert Wiblin: Mm-hmm (affirmative).
Brian Christian: So a group of other researchers at IBM were frustrated at the cult of personality that this person was cultivating around himself, and they thought, “Well, let’s try to approach this in a more repeatable and rigorous way.” One of the people involved in this was Scott Kirkpatrick, and at this point he thought of himself not as a computer scientist but as a physicist, and so he was really interested in how materials cooled. If you heat something up and then it cools, sometimes it turns into a crystal and sometimes it turns into a glass with no crystalline structure, and often this has to do with the speed at which you cool it. If you cool it really, really slowly, you’re going to get a crystal; if you cool it really quickly, you’re going to get a glass. And so this got him thinking about an analogy: temperature in physical space is a kind of random Brownian motion, and this is analogous to the random motion in a hill-climbing situation that the Metropolis-Hastings Algorithm embodies.
Brian Christian: So what if we used this annealing metaphor? Annealing is the process of cooling a material slowly to create these ordered structures. What if we started with the randomness dial cranked all the way to eleven and then slowly, slowly decreased it until, by the end, we’re just purely hill climbing? It turns out that this in fact works exceptionally well, and it’s still to this day one of the best practices in optimization. At first, of course, the community was very skeptical of this very analogy-based idea: “Okay, you’ve found this kind of parable, that this is like cooling a piece of glass, that’s cute.” But it was producing better designs than the guru was coming up with, and then of course the scholarship caught up and found that this is in fact a really good way to approach problems that are this complicated.
Benjamin Todd: So to recap the solution: you’re dealing with a problem where you’ve got a bunch of parameters, and you’re trying to find which parameter values maximize the outcome you’re going for, and you do that in a bunch of steps. At each step you take one of the parameters and try jiggling it, adjusting it one way or the other. Either the result is better than before, in which case you always go for it, or it’s worse, in which case you sometimes go for it and sometimes don’t, and that’s governed by a randomness parameter. And then the chance that you go with the worse option anyway, just to explore it, goes down over time, and that’s the annealing part of it.
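The recipe Ben summarizes can be sketched as a small simulated annealing loop for the traveling salesman problem. The specific choices here are assumptions for illustration, not the IBM team’s settings: random two-city swaps as the local move, a starting temperature of 1.0, geometric cooling by a factor of 0.999 per step, and remembering the best tour seen along the way.

```python
import math
import random


def tour_length(tour, cities):
    """Length of the closed tour that visits `cities` in the order given by `tour`."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def simulated_annealing(tour, cities, t_start=1.0, cooling=0.999, steps=5000, rng=random):
    """Random two-city swaps with Metropolis acceptance and a slowly falling temperature."""
    current = list(tour)
    current_len = tour_length(current, cities)
    best, best_len = list(current), current_len
    temperature = t_start
    for _ in range(steps):
        i, j = rng.sample(range(len(current)), 2)  # propose a local change
        candidate = list(current)
        candidate[i], candidate[j] = candidate[j], candidate[i]
        candidate_len = tour_length(candidate, cities)
        delta = candidate_len - current_len
        # Always take improvements; take a worse tour with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_len = candidate, candidate_len
            if current_len < best_len:
                best, best_len = list(current), current_len
        temperature *= cooling  # turn the randomness dial down a little each step
    return best, best_len


# Hypothetical check: six cities on a regular hexagon, visited in a scrambled order.
# The shortest possible tour here is the hexagon's perimeter, 6.0.
cities = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3)) for k in range(6)]
start = [0, 3, 1, 4, 2, 5]
tour, length = simulated_annealing(start, cities, rng=random.Random(0))
print(tour, round(length, 6))
```

Early on, when the temperature is near 1.0, many worse swaps are accepted and the search wanders; by the end the temperature is close to zero and the loop behaves almost like pure hill climbing.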
Brian Christian: Mm-hmm (affirmative).
Benjamin Todd: Which then goes back to the concept we covered right at the start, where generally you want to explore more early, doing more random stuff early on, and focus more and more on what you think is already best over time. And maybe this can apply to certain aspects of career decisions, because sometimes what you’re doing with a career looks a bit like a design problem. Say you’re in a job and there are various ways you can adjust it: you could work at home or in the office, or you could work more in sales or more in product design, and you don’t know which of these things is going to be best for you. What this process might suggest is that early on you should keep experimenting, trying different ways of doing all these things, and when you find a thing that’s better, you move into that, and that’s your hill climbing. And then occasionally you’re going to do something that you think is probably worse, but you’re just going to do it anyway-
Brian Christian: Right.
Benjamin Todd: Because maybe it’s going to turn out to be better.
Brian Christian: Yeah.
Benjamin Todd: And that might unlock a whole new kind of area of things for you that you might have just missed otherwise if you hadn’t had that randomness. But then you’re going to commit more and more to the things you think are best over time.
Brian Christian: Yeah, I think one key idea here that differentiates it from, say, something like upper confidence bound in a bandit context is that it’s not necessarily that you think this new thing will turn out to be better. Usually, the way these problems are constructed, you literally know it’s worse: you know, the chip design uses more silicon or more copper or whatever. It’s that this might be a segue, or like an isthmus, to a different part of the space. So it might be that, for example, taking a chip design that looks really good and moving the power supply to the other corner, leaving everything else the same, is super bad, but once you start optimizing around that, you end up in a different part of the space, with something that’s better overall.
Robert Wiblin: It seems that to apply this to real life, you’d have to think about different properties of your life, many of which can be flipped in different ways. You’ve got the profession that you’re in, the company that you work at, the city that you’re in, the friends that you have, where you live, whether you work from home or not. And it’s not possible in your lifetime to explore all of the possible combinations there. I guess you can imagine if you’re … You could plausibly even be in a very good situation, a very good combination, and say, “No, I’m going to mix it up and change all of them,” and then start from there and climb-
Robert Wiblin: [crosstalk 02:05:55] That’s probably unrealistic in someone’s life, cause-
Benjamin Todd: Well, sometimes you get someone who’s kind of like, “Wow, I’ve kind of optimized what I can in this company and I feel like this is as good as it’s going to get in this organization, so I just need to try a totally different job or company or …”
Robert Wiblin: I suppose it might be more realistic if the combination you’re in seems really bad, so you just want to throw it out, switch all of them to a different combination, and start from there again.
Brian Christian: Yeah.
Robert Wiblin: Or especially if you can foresee that, as you tinker with these things, it’s never going to be great.
Benjamin Todd: Yeah, I mean, I guess maybe the way that process could play out, as a super simple example, is this: suppose you’re working remotely now, doing web engineering, and then you say, “I’m going to try out working in marketing.” And you know that working remotely as a marketer is strictly worse than your current job. But then if you go and become a marketer and you change some other things, such as no longer working remotely-
Brian Christian: Right.
Benjamin Todd: Then everything turns out to be better than where you were before.
Brian Christian: Mm-hmm (affirmative).
Benjamin Todd: And by being willing to just try being a marketer for a while, you uncover that new combination, which you never would have uncovered if you hadn’t been willing to first take a step that was unpleasant.
Brian Christian: Yeah, I mean, certain aspects of life lend themselves to different degrees to this idea of being parametrized, but I think it is true that there are often dependencies between certain aspects of life. So if my job is in San Francisco and I live in San Francisco, those two things are optimized with respect to one another, and if I take a job in New York but I still live in San Francisco-
Benjamin Todd: Yeah.
Brian Christian: That’s going to be some huge disutility there. I’m going to be on an airplane all the time. And so I guess there is a sense in which the different aspects of our life have a kind of continuity or synchrony between the different parts that makes it resemble one of these high-dimensional landscapes. And so, yeah, it’s interesting to think about what the local modifications are that could segue you to a different part of the space. You know, you transfer to a different company that’s just across the street, but then they open a new office in Bangladesh, and suddenly that takes your life in a totally different direction, but you never would have just applied for a job in Bangladesh. So there are senses in which we can move through this space of possibilities in a more or less contiguous way, so you can sort of think of it in these terms.
Robert Wiblin: Yeah, another thing is that this model only applies if the combination has special properties. If the total value is just the sum of the value of each part, then you can optimize each one individually; it’s only if the value is some chaotic function of the combination that you need to do this random hill-climbing thing.
Brian Christian: And my favorite analogy for this is jimmying a lock, right? So jimmying a lock works by optimizing each pin on its own.
Robert Wiblin: Yeah.
Brian Christian: You just figure out where to put that one pin, then you move to the next pin and figure out where to put that. So that’s the difference between the full combinatorial complexity of needing to adjust all of these things at once, versus treating them as totally independent, in which case you can just decompose the problem.
Benjamin Todd: Yeah, there’s just one more issue with these exploration puzzles: how we set the time horizon may not be as intuitive as it first seems, because I think you actually need to take discounting into account. We were considering a case where you’ve got a 40-year career and you’ve got, like, 20 steps or 20 jobs you can try. But actually I think you should discount that time, so you should care less about the steps that are far into the future than the immediate ones. Maybe this is less true in your personal life, but I think it’s quite true for social impact, because many of the world’s problems are urgent, and there’s a question of how urgent; we have a whole giant article on it.
Benjamin Todd: But generally, we think there’s reason to contribute earlier rather than later, and you can sometimes model that with a discount rate. So that might actually mean that your exploration budget is quite a bit lower than it would naively seem if we just went back to a very simple rule like “explore for X of the jobs and exploit after that.” I don’t remember the specific figures, but if you have forty years ahead of you and you discount at 5% a year, then I think the discounted length of the career would only be about 20 years or something.
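The difference between a decomposable problem and a combinatorial one can be made concrete with a toy lock. Everything here is hypothetical: four pins with ten positions each, and a per-pin score that gives feedback on each pin separately. Because the total score is just the sum of the pin scores, optimizing each pin on its own (40 evaluations) finds the same answer as brute force over all 10,000 combinations.

```python
import itertools


def optimize_independently(options_per_part, score):
    """When the total is a sum of per-part scores, pick each part's best option separately."""
    choice = []
    for part, options in enumerate(options_per_part):
        choice.append(max(options, key=lambda option: score(part, option)))
    return choice


def optimize_jointly(options_per_part, total):
    """Brute force over every combination: needed when the parts interact."""
    return list(max(itertools.product(*options_per_part), key=total))


# Hypothetical lock: 4 pins with 10 positions each; each pin gives feedback on its own.
target = [3, 7, 0, 5]
positions = [range(10)] * len(target)


def pin_score(pin, position):
    return -abs(position - target[pin])  # 0 when this pin is set correctly


def total_score(combo):
    return sum(pin_score(pin, position) for pin, position in enumerate(combo))


print(optimize_independently(positions, pin_score))  # 4 pins x 10 positions = 40 evaluations
print(optimize_jointly(positions, total_score))      # 10 ** 4 = 10,000 combinations
```

If the score of one pin depended on the positions of the others, the independent approach could fail, and a search method like hill climbing or simulated annealing would be needed instead.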
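Ben’s back-of-the-envelope figure can be checked with a short calculation. The convention here is an assumption: year t of the career (counting from t = 0) is weighted by 1 / (1 + r)^t. With 40 years and r = 5%, the effective length comes out to about 18 years, roughly 45% of the nominal 40 and consistent with “roughly half”; using a weight of 0.95^t per year instead gives about 17.4.

```python
def discounted_years(years, rate):
    """Effective career length: year t (t = 0, 1, ...) is weighted by 1 / (1 + rate) ** t."""
    return sum(1 / (1 + rate) ** t for t in range(years))


print(round(discounted_years(40, 0.05), 1))  # 18.0 effective years out of 40
print(discounted_years(40, 0.0))             # no discounting: 40.0
```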
Brian Christian: Yeah.
Benjamin Todd: And so that would roughly halve the exploration period.
Brian Christian: I think that’s very interesting. One of the things that I think is true in human life, and that is absent from almost all of these problems, is that your utility function is on some kind of drift: when you’re 50, you’re just going to care about different stuff. Like when you’re 10, you were a different per-
Benjamin Todd: Different person than I am now. (laughs)
Brian Christian: Yeah, exactly, exactly.
Benjamin Todd: That’s a good personal reason to discount a bunch as well, yeah.
Brian Christian: And so it would be interesting to think about how you try to optimize a problem where literally your objective function is changing as you’re moving through the problem.
Benjamin Todd: Yeah.
Robert Wiblin: It’s like the information you’re getting and the value of the combination that you found is degrading over time or what’s the term? It’s like going stale?
Brian Christian: Yeah.
Robert Wiblin: But the world’s changing and you’re changing, so yeah, that could potentially have quite a high discount rate or like spoilage rate.
Benjamin Todd: Yeah, yeah, because our initial thought was that people generally need to explore a bit more than they typically do, but then, thinking about how you might want a reasonably high discount rate from a social impact point of view, that might actually push you back towards just doing whatever is your best guess. And we were also pointing out, now going back to a personal point of view, that if your priorities are changing and the world’s changing a lot, then again information becomes less valuable, and you should again just do the thing that seems best or right.
Brian Christian: Kind of locally.
Benjamin Todd: Yeah. And I’m not sure how that all shakes out in the end.
Brian Christian: I think also-
Robert Wiblin: We’ll spend the next ten years figuring that out. (laughs)
Benjamin Todd: Yeah and the answer will be wrong by then anyway.
Brian Christian: Exactly, but you know, certain things give you option value, regardless of what your objectives are going to be, so putting yourself in a position where you have kind of a … Regardless of what your 40-year-old self is actually going to be motivated by, having more money is probably going to be useful regardless of what those goals are, knowing more people is probably going to be useful regardless of what those goals are.
Benjamin Todd: In general, when you’re dealing with super uncertain decisions, there are a few strategies you can take. One is to just gain more information and then figure it out. The other one is to somehow keep your options open, such that it doesn’t matter what happens, you’re going to be fine anyway?
Brian Christian: Yeah, or aim for the middle of this objective-function space, where you don’t know exactly what kind of 40-year-old you’re going to be, but certain things are predictable, and other things may be useful across the wide range of goals you might end up having.
Benjamin Todd: Exactly, yeah, so, we talk about that being like flexible career capital?
Brian Christian: Yeah.
Benjamin Todd: Where career capital is the stuff that puts you in a better position in the future, and you can either have narrow career capital, which is only useful for one path, or flexible career capital, which is useful for lots of paths. The more uncertain you are about what’s going to happen, the more you should care about flexible rather than narrow.
Brian Christian: Yeah, I think that’s right.
Robert Wiblin: So another thing you talked about in the randomness chapter is that there are a lot of problems which are computationally intractable. You can’t figure out an analytic solution where you just have a formula that spits out a number, but often you can figure out the answer by sampling randomly from the space of possible cases and then seeing what the distribution of answers is across those specific cases.
Brian Christian: Yeah.
Robert Wiblin: And interestingly, I think this applies here, because we’ve got this question of whether people tend to explore too much or too little, which, given all of the complications that we’ve discussed, like the discount rate and the changing situation, I think is actually computationally intractable. What we really need is an experiment where we get people to explore more or less and then see whether their lives get better or worse.
Robert Wiblin: A really nice case of this, which I’ll stick up a link to, is from the two economists from the Freakonomics team. They ran an experiment where they got people who were really on the fence about whether to quit doing something or continue doing it, so persevere or give up. They got them to describe their situation on a website and really think about whether they were on the fence, and then if they were still on the fence, they would flip a coin for them and tell them to either quit or persevere. Then they followed up six months or a year later to see whether they were happy with their decision.
Robert Wiblin: This is a great thing. I think it would really be impossible to find out any other way whether people quit too much or too little, but because they found people who were really uncertain, and we might find ourselves in a similarly uncertain situation, now we can see which way we’re biased, all things considered, with the world as it is.
Robert Wiblin: Turns out, people should quit more. People who were told to quit did quit more often, and they were happier. And that was actually, I think, what they expected. So, perhaps unfortunately, it’s possible they rigged the experiment in some way to get the answer they wanted, but their thinking was that people would just find it very hard to give up on something they’ve tried before, and would continue throwing good money after bad, or good time after bad.
Benjamin Todd: That’s just classic sunk cost fallacy or status quo bias.
Robert Wiblin: Yeah, right.
Benjamin Todd: These kinds of things.
Robert Wiblin: And so it’s saying, yeah, I guess, listeners, if you’re on the fence about quitting something, maybe you should give it up today.
Brian Christian: One thing that, in my mind at least, is connected to this is that there are times when you think you’re on the fence about something but you’re really not; you’re just telling yourself that you are. I remember a time in my life when I was deciding whether or not to move to the Bay Area. I thought that I was on the fence. But I noticed that I only wanted to go to my favorite restaurants and hang out with my close friends; I didn’t want to go to parties and meet new people, and I didn’t want to try the new bar that had opened up. I was able to identify: I am exploiting. I’m acting like someone who is at the end of their horizon in that place in their life. So even though I was telling myself this narrative of being on the fence, I wasn’t acting like someone who’s on the fence; I was acting like someone who is moving to a new chapter in their life. That was part of what helped me get over that hill and make that decision.
Robert Wiblin: That’s like the classic decision-making procedure that says, “Oh, yeah, you’re really on the fence? Okay, well, you should just let a coin decide, then.” So flip the coin and see how you feel while the coin is in the air. Would you really want to follow the decision that it’s going to produce? I think often, no. You know which way you want it to fall.
Brian Christian: Yeah. This idea that there are certain things that are best established by just trying them, I think that’s a very deep idea, and it’s a big part of computer science. There’s a whole set of techniques called Monte Carlo methods, which are essentially that: you try something a bunch of times and you estimate the outcome based on how those samples go.
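A minimal Monte Carlo sketch. Simulating a full game of solitaire would be too long for a short example, so as a stand-in this estimates a different card question by sampling: the chance that a shuffled 52-card deck leaves no card in its original position. That probability has a known closed form (very close to 1/e, about 0.368), which makes it easy to check that the sampled estimate lands nearby.

```python
import random


def no_card_in_place(deck):
    """True if no card sits at its original position (a derangement)."""
    return all(card != position for position, card in enumerate(deck))


def estimate_derangement_probability(trials, deck_size=52, rng=random):
    """Monte Carlo: shuffle many times and count how often no card stays put."""
    hits = 0
    for _ in range(trials):
        deck = list(range(deck_size))
        rng.shuffle(deck)
        hits += no_card_in_place(deck)
    return hits / trials


# The exact answer is about 1/e, roughly 0.368.
print(estimate_derangement_probability(20000, rng=random.Random(0)))
```

The estimate’s error shrinks roughly with the square root of the number of trials, so more samples buy more precision without any of the analysis that an exact derivation would need.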
Robert Wiblin: So, the main reason that I think economists think you can’t really have a centrally planned economy, is that you need much more experimentation than what a central government can provide. You need everyone to be exploring some of their local space to help get the entire thing into a good combination.
Benjamin Todd: And that’s a theme in our advice as well, because basically there aren’t many predictors of job performance or job satisfaction, and the ones there are aren’t very accurate. The most accurate ones often amount to actually trying the work. Work samples, for instance, are much more accurate than an interview.
Brian Christian: Right. And again, I think this is one of these things where we have an idea of rationality that is a little bit of a caricature, where we think, “Oh, well, if I were being more rational, I would think this whole thing out and analytically derive the distributions from which these things are being drawn.” I think there is a reassuring idea that comes out of computer science that says, “No, just trying something is valid.”
Brian Christian: So the idea of Monte Carlo simulation goes back to the scientist Stanislaw Ulam, who was convalescing in a hotel, playing solitaire, and, being a mathematician, started wondering, “Well, I wonder what percentage of the 52 factorial starting positions in solitaire are solvable?” Some of them are known not to be. After thinking about how he would determine this analytically, he settled on the idea that he would just play the game and figure out what percentage of the time he won. I think it’s in a way a real validation that someone as smart as him kind of-
Robert Wiblin: Accepted that some things were incalculable.
Brian Christian: Yeah, exactly. And that there really is some legitimacy to the idea of just sampling. Just trying it.
Robert Wiblin: All right. I think Ben, you’ve got to head out, I guess that kind of wraps up the exploration-exploitation discussion…
Benjamin Todd: Yeah, thanks so much. I found it super fascinating.
Brian Christian: My pleasure, yeah.
Robert Wiblin: So one of the most important cases where this exploration question comes in is deciding what problem to solve in the world. Individuals have to decide what they’re going to specialize in: global catastrophic risks, poverty, or something else. And I guess charitable foundations also have to think about how they’re going to distribute the money in their endowment between the different problems they’re interested in fixing. What do you think this kind of research has to say about those cases?
Brian Christian: Yeah, I think this idea of budgetary allocation connects to some of these ideas from the explore/exploit tradeoff. In fact, John Gittins’s initial motivation was to help the Unilever corporation figure out how to allocate their budget. And I think there is a really simple idea that’s quite powerful in this context, which is called Thompson sampling. This is the idea that you should figure out your subjective probability that something is the best thing to do, presumably through Bayes’ rule, and then you should allocate exactly that percentage of your resources to it.
Brian Christian: So if you think something is 12% likely to be the best option, then you should spend 12% of your time or 12% of your money on it. It’s a very intuitive idea, but it’s got a lot of powerful mathematical support: it’s a regret-minimizing strategy in a multi-armed bandit problem, and it has a lot of nice properties as well as being very intuitive.
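A sketch of Thompson sampling for the simplest textbook case, where each option either succeeds or fails and beliefs about its success rate are Beta distributions starting from a uniform Beta(1, 1) prior (the model, the success rates and the function names are all assumptions for illustration). `prob_best` estimates the “12% likely to be best” shares Brian describes, and `run_thompson` realizes those shares by sampling each posterior once per round and trying whichever option draws highest.

```python
import random


def posterior_draws(successes, failures, rng):
    """One sample per option from its Beta(1 + successes, 1 + failures) posterior."""
    return [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]


def prob_best(successes, failures, samples=10000, rng=random):
    """Monte Carlo estimate of P(option k is the best): the share Thompson sampling gives k."""
    wins = [0] * len(successes)
    for _ in range(samples):
        draws = posterior_draws(successes, failures, rng)
        wins[draws.index(max(draws))] += 1
    return [w / samples for w in wins]


def run_thompson(true_rates, rounds, rng=random):
    """Each round, sample every posterior once and try the option with the highest draw."""
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(rounds):
        draws = posterior_draws(successes, failures, rng)
        k = draws.index(max(draws))
        if rng.random() < true_rates[k]:  # hypothetical payoff
            successes[k] += 1
        else:
            failures[k] += 1
    return successes, failures


rng = random.Random(0)
print(prob_best([3, 1], [1, 3], rng=rng))  # allocation shares after a few trials
s, f = run_thompson([0.2, 0.5, 0.8], rounds=3000, rng=rng)
print([a + b for a, b in zip(s, f)])  # tries per option
```

Over time, the option with the 0.8 success rate should receive most of the tries, while the weaker options still get occasional tries in proportion to the remaining chance that they are actually best.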
Robert Wiblin: Interesting. So it’s optimal under some particular assumptions that aren’t too crazy?
Brian Christian: Yeah, right.
Robert Wiblin: Okay. I guess at a personal level, though, because the gains from specialization are so strong, it seems like that gives you more pressure to just go all in on your top bet. Whereas with a foundation, where you have many different staff and program officers for the different program areas, it seems like you can divide the funding more [crosstalk 02:19:54]
Brian Christian: Yeah, I think that’s right. Yeah, it might be the case for an individual person that spending 5% of your time doing brain surgery is probably far worse than spending none of your time doing it.
Robert Wiblin: Are there any kinds of decisions that you think people should randomize more in life, where we think they’re systematically biased against trying new stuff?
Brian Christian: This is a good question. I do think that there are a lot of reasons to embrace serendipity and I think that’s a version of allowing randomness into your life. I think increasingly, the market is moving society, or society is moving the market, or both, in this direction towards everything being a choice, and there’s something a little bit dangerous about that. I think when things are not a choice to begin with, then you get your randomness for free. A certain song just happens to be on the radio, or you happen to bump into someone in some common area, and they start talking about something.
Robert Wiblin: But with Spotify there’s less randomness; it’s what’s chosen for you once it’s learned your taste.
Brian Christian: Right, which can be quite dangerous, actually.
Robert Wiblin: Because you’ll never find a new genre unless you go out of your way to do it.
Brian Christian: Right, and if you do, it requires some sort of concerted process of choice: actually overriding whatever the default is and then picking something, which you’re not really qualified to do. So I think it’s worth remembering that noise has a role, randomness has a role, serendipity has a role, and I think increasingly we are pushing those things out of our lives. We do that maybe at our peril. I think we are starting to appreciate the value that those things have.
Robert Wiblin: In the book you make the argument that the world in a sense is more static now than it used to be. You say hunter-gatherers had different kinds of food that they would eat, and it would be different day to day or week to week, whereas a Coke is a Coke is a Coke, so industrial standardization tends to make the world less variable. On the other hand, it seems that we have access to way more potential products now than we used to. So homogeneity cuts against exploration, but on the other hand, a greater diversity of options works towards more exploration. Do you have a sense of where those might balance out?
Brian Christian: I don’t have a great instinct about where that nets out. I mean, I can think of certain algorithms in the multi-armed bandit context. There’s one that we haven’t talked about yet called the least failures rule, which says you should always do the thing that has failed the fewest number of times, and this is asymptotically optimal, I think, if your discount factor is nearly 1, meaning that if you care about the far, far distant future, then this algorithm is reasonable. Partly what that means in a city where there are more restaurants than you could probably go to in your lifetime, or where new ones are popping up faster than you could eat at them, is-
Robert Wiblin: Trying each thing once, and if you don’t like it then go back?
Brian Christian: As soon as something lets you down, never go back. And I think that is reasonable.
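The least failures rule is simple enough to state in a few lines. Below is a minimal Python sketch, assuming options that either succeed or fail (Bernoulli payoffs with made-up probabilities). The function names and the tie-breaking order (most successes, then lowest index) are our own assumptions, not something specified in the conversation. When there are more options than you can ever try, every untried option has zero failures, so the rule amounts to sticking with something until it lets you down and then moving on.

```python
import random


def least_failures_choice(successes, failures):
    """Pick the option that has failed the fewest times.

    Ties are broken by most successes, then by lowest index; that
    tie-breaking order is an assumption made for this sketch.
    """
    return min(range(len(failures)),
               key=lambda i: (failures[i], -successes[i], i))


def simulate(success_probs, pulls, seed=0):
    """Run the least failures rule on hypothetical Bernoulli options."""
    rng = random.Random(seed)
    n = len(success_probs)
    successes, failures = [0] * n, [0] * n
    for _ in range(pulls):
        i = least_failures_choice(successes, failures)
        if rng.random() < success_probs[i]:
            successes[i] += 1
        else:
            failures[i] += 1  # this option now ranks behind any untried one
    return successes, failures
```

For example, with one option that never works and one that always does, the rule tries the first, sees it fail, and never goes back.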
Robert Wiblin: It’s kind of the Manhattan lifestyle.
Brian Christian: Indeed.
Robert Wiblin: With dating, people talk about how there are now so many potential options on dating sites that as soon as someone is mediocre in any respect, there’s this temptation to dump them.
Brian Christian: Yeah.
Robert Wiblin: There’s something kind of ill-considered there.
Brian Christian: Yeah, it’s this anxiety-provoking thing where, if you are interacting with someone in an asynchronous medium, literally everything you say could terminate the interaction. So, I think this is part of why it’s so nerve-racking for people using these sites.
Robert Wiblin: There were two things in the exploration-exploitation chapter that I just wanted to mention. One is that people seem to get happier as they grow older, at least after 50 or 60, I think. That could potentially be explained by the exploration-exploitation trade-off. When you are young, you are doing all of this exploration, but that comes at the cost of present satisfaction, because you are not always going to choose the option you like most right now, in the hope of a pay-off in the future. And maybe it really does pay off: as people get older, they get to reap the benefits of the exploration they did when they were young.
Brian Christian: I think that’s exactly right. I think that’s a really powerful idea. One of the people who has done a lot of really great research in this area is Laura Carstensen, who’s at Stanford. I think she is part of a movement to recharacterize what it means to be an older adult in a society where we have a lot of negative preconceptions about getting old. We think of people as set in their ways, resistant to change.
Brian Christian: There is research showing that older adults maintain fewer social connections as they go through life, and it can be tempting to read that as, “Oh, that’s kind of lonely.” In fact, there is this powerful way of reframing this, which is that, older adults are in the exploit phase of their life, they are very deliberately pruning their social interactions to the people that really matter, the people that bring them the greatest satisfaction. They are doing what they know and love and in fact are essentially cashing in on a lifetime’s exploration. Their past selves have eaten all those mediocre meals in order to discover those incredible places that their later selves get to really enjoy.
Brian Christian: So, you should expect from this, naturally, that contrary to stereotypes, older adults would be consistently happier than young people, and that’s exactly what she finds.
Robert Wiblin: Yeah. Interestingly, I think there’s a middle-age slump in the world as it is now: people are quite happy when they are 20, happiness goes down to a minimum at around 40, and then it goes back up. I’m not sure how you explain the dip; perhaps families and children and just, I don’t know, other issues-
Brian Christian: Child rearing may be a big part of that. Yeah, that’s a good question.
Robert Wiblin: Which is kind of another investment in potentially being happier when you are older. I’ll give listeners two observations and see if they can explain the combination of them based on what we’ve talked about so far. One is, Hollywood makes a lot more sequels than it used to; the other is, revenues from films are going down. So, I’ll give people just a minute to think about why those two things would happen. The obvious explanation that a lot of people I’ve heard point to is that people don’t like sequels, and that’s why revenues in the film industry are going down. So, the causation runs from sequels to the movie industry being in decline.
Robert Wiblin: But in fact, what we’ve discussed all through this episode gives us a very good reason to think that the causation goes the other way. What’s happening is that, because the film industry is in decline, perhaps because of competition from television or piracy or whatever else, it has less reason to produce the blockbuster franchises of the future. So, imagine you could produce The Matrix 4, or something like that, a reasonably safe bet, something that’s definitely going to have a market. Or you could do the exploration of trying to create a new franchise, but probably fail, because most new franchises in fact fail.
Robert Wiblin: The latter is an investment in future returns from sequels to this new potential franchise. But if the industry is in decline, you don’t care as much about its long-term future, so why take the risk now? Why pay now for a potential pay-off from what is a shrinking pool of filmgoers? I just thought that was beautiful.
Brian Christian: Yeah, thanks. Yeah, that’s exactly right. I mean, the insight here is that in a finite horizon version of the multi-armed bandit problem, if the horizon determines your strategy, then someone observing your strategy should be able to infer the horizon. And I think that’s exactly what’s going on in the film industry, where they are behaving like they are in exploitation mode, milking those cash cows while they can, and not investing in the franchises that the next generation will enjoy.
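The idea that the horizon determines the strategy can be made concrete with a toy finite-horizon bandit. The sketch below is our own illustration, not a model from the book: a known option pays a fixed `known_mean` per round, an unknown option succeeds with an unknown rate that has a Beta prior (a flat Beta(1, 1) by default), and the best policy is found exactly by backward induction. All names and numbers are hypothetical.

```python
from functools import lru_cache


def should_explore(known_mean, horizon, prior_a=1, prior_b=1):
    """Return True if trying the unknown option now is optimal.

    Toy one-armed Bernoulli bandit: the known option pays `known_mean`
    per round; the unknown option's success rate has a
    Beta(prior_a, prior_b) prior. Solved by backward induction.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")

    @lru_cache(maxsize=None)
    def value(a, b, rounds_left):
        # Expected total reward from here on under optimal play.
        if rounds_left == 0:
            return 0.0
        exploit = known_mean + value(a, b, rounds_left - 1)
        p = a / (a + b)  # posterior mean success rate of the unknown option
        explore = (p * (1 + value(a + 1, b, rounds_left - 1))
                   + (1 - p) * value(a, b + 1, rounds_left - 1))
        return max(exploit, explore)

    p0 = prior_a / (prior_a + prior_b)
    exploit_now = known_mean + value(prior_a, prior_b, horizon - 1)
    explore_now = (p0 * (1 + value(prior_a + 1, prior_b, horizon - 1))
                   + (1 - p0) * value(prior_a, prior_b + 1, horizon - 1))
    return explore_now > exploit_now
```

With `known_mean=0.55` and a flat prior, the sketch sticks with the known option when only one round is left, but tries the unknown one when two rounds are left. That matches the point about the film industry: a shrinking horizon tilts you toward exploiting.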
Brian Christian: So, that is a rational response to being at the end of your …
Robert Wiblin: Lifecycle.
Brian Christian: … time, yeah.
Robert Wiblin: So if you are sick of the 17th Marvel superhero film, then I guess go to the cinema more and maybe they’ll make something different. Or, like, tell them you’ll go to the cinema in the future, and they’ll make something different. Just coming back to the question of whether people explore or exploit too much, and which way they are biased, it seems to me like as a society, we can’t be too far off the optimum. Because if people who explored much more were way more successful, people would notice that and copy them. And on the other hand, if the people who exploited more and explored less were much more successful, then people would just notice their lifestyle and copy it.
Robert Wiblin: I guess that depends on people being able to see how successful other people are, and how much they explored versus exploited over their lives as a whole. And this isn’t saying that we are actually going to be optimal, merely that if what we’re doing is badly wrong, we might notice as a society. What do you think of that argument?
Brian Christian: That’s interesting. I mean, I think, partly there is a two-level optimization happening, where each person is trying to have the best life that they can or get the most jackpots in life’s casino, whatever that might mean for them. But you also have this societal, ecological, evolutionary thing happening, where societies in which the ratio of exploration to exploitation is tuned appropriately for that environment are going to out-compete or out-perform other societies with a different makeup.
Brian Christian: So, it may be the case that an individual person is shackled to whatever genetic inheritance they have of being a very risk-averse person, or a very risk-seeking person. That may limit their ability to adapt in a given environment, but they are also part of this broader framework. I don’t have a clear sense of whether we should conclude from this that we are basically on the Pareto frontier of how good things can be. It doesn’t subjectively feel that way, but I don’t know. I have to think more about that and see where that logic goes.
Robert Wiblin: Okay. I just want to do a slightly different kind of section here. The book had so many amazing little stories in it that really stuck with me, and I wanted to go through some of them just for the enjoyment of the audience. This isn’t particularly going anywhere, unfortunately, but you can exploit this section as you like.
Robert Wiblin: So, what’s thrashing? I found this to be fascinating. I’ve just never heard of this concept.
Brian Christian: Oh, yeah. Right. Thrashing is an idea that comes up in the context of computer scheduling, and in particular it emerged during, I want to say, the 1960s, as people were developing multi-user machines. So, you have different processes or different users on a given piece of hardware, essentially competing for the resources of that hardware. As you switch from one user or process to another, you have to reallocate the machine’s resources, effectively refilling the RAM with data relevant to the next task.
Brian Christian: There was this problem that people began noticing on these machines, which is that as you add more and more users or more and more tasks to the workload of a machine, initially it seems like everything is fine: you can accommodate two or three processes in parallel and everyone’s basically getting what they need. But beyond a certain point, it wasn’t as if the performance degraded linearly; you would hit some threshold and the performance of the system would just spectacularly collapse.
Brian Christian: So, this was really interesting to people at the time. No one really knew what was going on. It was a researcher named Peter Denning who was able to diagnose this problem, and he coined the term thrashing. The basic idea is this: imagine your system is dividing its time in a round-robin fashion between, let’s say, five different tasks. At the beginning of each slice of time, it begins filling its RAM with data relevant to the new task. It’s essentially setting up its workspace, if you want to think about it that way.
Brian Christian: At the end of that slice, it does a context switch out of that workspace: it saves its work, bookmarks its progress, and moves on. Well, if you get to a point where the time slice you’ve allocated to each process is just long enough to do that bookkeeping, then the computer can end up doing basically 0% actual work. It is just context switching into the next task, and then immediately context switching out. And so, this is one of the most frequent culprits for the spinning beach ball of doom that Mac users will be familiar with, or the Windows equivalent.
Brian Christian: Suddenly, it’s like: you have 67 browser tabs open and everything is fine, you open a 68th browser tab, and your whole system appears to lock up, and it takes 60 seconds to even close them back down again. That is usually a tell-tale sign of thrashing. Part of why I got so interested in thrashing was that I felt I could recognize some of my own worst moments in it. I thought it was a reasonable diagnosis of certain periods of psychic paralysis that I think most of us, certainly myself, encounter.
Brian Christian: If you have so much to do that you’re spending all of your time reminding yourself what to do, or prioritizing what to do, or beginning something but then being reminded of something else, and then starting that, you can find yourself effectively spending your entire time budget on this kind of meta-work.
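The threshold behaviour described above can be caricatured in a few lines. This is a deliberately crude toy model of our own, not Denning’s analysis: we assume every task needs the same number of pages (`working_set`), RAM holds `ram` pages, and once the working sets no longer all fit, every switch into a task must reload its pages at a cost of `reload_cost` out of each `time_slice`. All parameter names and values are hypothetical.

```python
def useful_fraction(num_tasks, working_set, ram, time_slice, reload_cost):
    """Fraction of CPU time spent on real work in a toy round-robin model.

    Assumptions (illustration only): every task needs `working_set` pages,
    RAM holds `ram` pages, and if all working sets don't fit at once,
    each switch into a task must first reload its pages, which costs
    `reload_cost` out of each `time_slice`.
    """
    if num_tasks * working_set <= ram:
        return 1.0  # everything stays resident; switching is treated as free
    return max(0.0, time_slice - reload_cost) / time_slice
```

With `working_set=10`, `ram=50`, `time_slice=5` and `reload_cost=4`, the useful fraction stays at 1.0 for up to five tasks and drops to 0.2 at six: not a gradual decline but a cliff. If `reload_cost` equals `time_slice`, it drops to zero, the "0% actual work" case.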
Robert Wiblin: Just switching attention, and trying to figure out what you should be paying attention to?
Brian Christian: Yeah.
Robert Wiblin: And then never actually doing anything having done that?
Brian Christian: Yeah, exactly.
Robert Wiblin: I think, one solution to this is just, do something from the list and let the other things hang, and don’t think any more about prioritization. Just do it, and then move on to the second thing. Stop doing this meta work, it’s not actually helping you. Then, once you’ve cleared some things off the list, then maybe you’ll be able to return to doing some prioritization.
Brian Christian: Yeah, exactly. I mean, so, the catchphrase that we use in the book is, “work dumber.” But this is an idea that has a lot of grounding in actual computer science practice. There are two ideas here that I think support this. One is that, if you have a series of tasks, each of which is going to take a certain amount of time, you only have one machine to do them on, and you want to optimize for what’s called the makespan, which is the total amount of time it will take you to do everything, well, it just so happens that the order doesn’t matter at all.
Brian Christian: You simply have a certain finite amount of work and a certain amount of time. And so, if you find yourself in a position where you’re optimizing for the makespan, so your goal is to minimize the total amount of time you spend working, and you can’t delegate, it’s just you doing the work, and it’s all more or less equally important, then the worst thing you can do is spend any time thinking about prioritization. You should just begin at random.
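The claim that order doesn’t matter for the makespan is easy to check by brute force. A minimal sketch, assuming tasks run back to back on a single machine with no switching costs; the durations are made up for illustration.

```python
from itertools import permutations


def makespan(durations):
    """Time at which the last task finishes when tasks run back to back on one machine."""
    finish = 0
    for d in durations:
        finish += d
    return finish


tasks = [3, 1, 4, 1, 5]  # hypothetical task durations, in hours
spans = {makespan(order) for order in permutations(tasks)}
print(spans)  # {14}
```

Every one of the 120 orderings finishes at the same time, because the makespan is just the sum of the durations. Any time spent choosing an order only adds to that total.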
Robert Wiblin: Total waste.
Brian Christian: Yeah, it’s a total waste of your energy. There’s an anecdote in the book that we tell about the Linux operating system. Every operating system has what’s called a scheduler, which performs exactly this function for the CPU: how many microseconds to work on a particular thread, when to switch, what to switch to, how to stack-rank the different priorities that the system has, and how much time to give each of them. In a sense, this meta process of doing the sorting and the prioritization is directly competing against-
Robert Wiblin: Doing it.
Brian Christian: … doing the work. And so, this is one of these cases where it turns out that the best solution might be to be more imprecise. We follow the evolution of the Linux kernel through the 2000s. I want to say it was 2003, they replaced the scheduler with one that was less accurate about prioritizing the different tasks on the system, but more than made up for it by just spending all of that time doing more stuff. I found that a very consoling message.
Robert Wiblin: Okay, let’s move on. The Copernican principle and Laplace’s Law. I’d heard of both of these before, but I think you did a really good exposition of them. So, just briefly describe them?
Brian Christian: Yeah. I mean, the Copernican principle is this idea that if you are trying to estimate the duration of something … So, the way this actually came about was, there was a guy who found himself at the Berlin Wall, musing, “I wonder how long the Berlin Wall is going to be standing.” I don’t remember exactly how long it had been up at that point; let’s say it was 11 years, or something. He starts thinking about it, and he says, “Well, on average, I should expect that I’ve shown up smack in the middle of the duration of this phenomenon, and so I should just double the amount of time, and assume it’s going to last for 11 more years.”
Brian Christian: It’s this very intuitive idea, but it turns out to have full mathematical legitimacy: this is the appropriate Bayesian prediction to make if you have what is called an uninformative prior, where not only do you not know how long something is going to last, but you don’t even know the scale. If it could be equally likely to last for milliseconds as millennia, then you have one of what are called scale-free priors. If you crunch the numbers, you get exactly this prediction: if you just happen upon something, you should assume that it’s going to last exactly as long as it’s lasted already. Which I think is a really, really nice rule of thumb, and it’s nicely validated by the math.
Robert Wiblin: I think it does actually work; it gives sensible predictions. Take the question of trying to predict when the bus is going to arrive. If you know how long it’s been since the last bus on that line left the station, then you say, “Double that,” and the wait still ahead of you is, on average, about as long as the time that has already passed. The United States has been around for 230, 240 years. So, on average, it’s got 240 years to go.
Robert Wiblin: Another case is the Doomsday argument, which says you’re as likely to be in the first half of all the humans who will ever be born as in the second half. So, on average, there should be as many humans yet to be born as have been born so far, which, if you have exponential growth in population, suggests that our doom is coming very soon. There’s debate about whether this argument actually goes through. It’s interesting that there you’re sampling from humans who existed, rather than from time. It’s worth thinking about in which cases you should sample over time, and in which cases over instances of the thing you are describing.
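The core intuition, that if you turn up at a random moment you are equally likely to be in the first or second half of something’s lifetime, can be checked with a quick simulation. This sketch only checks that "random moment" assumption. It does not reproduce the full Bayesian derivation with a scale-free prior, and the function names are our own.

```python
import random


def copernican_estimate(elapsed):
    """Predict total lifetime as twice the time observed so far."""
    return 2 * elapsed


def fraction_outlasting(samples=100_000, seed=0):
    """Monte Carlo check of the 'random moment' assumption.

    If you arrive at a uniformly random point in something's lifetime,
    how often does it last at least as long again as it already has?
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        total = 1.0                   # lifetime length; the scale cancels out
        elapsed = rng.random() * total
        remaining = total - elapsed
        if remaining >= elapsed:
            hits += 1
    return hits / samples
```

The fraction comes out close to one half, which is why doubling the elapsed time works as a median guess. With the placeholder figure of 11 years for the Berlin Wall, `copernican_estimate(11)` gives 22 years in total.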
Brian Christian: Yeah, yeah, that’s an interesting point.
Robert Wiblin: And then, Laplace’s Law?
Brian Christian: Yeah, so, it was originally considered in the context of one of these baroque lotteries, where you’re drawing lots out of a hat. You’re trying to estimate what percentage of these colored slips of paper, or whatever, are one color versus another, based on the samples you’ve drawn so far. It turns out that the best rule of thumb, if, let’s say, you’re estimating what percentage of the slips are green, is just the number of green ones that you’ve drawn, plus one, over the total number of samples, plus two.
Brian Christian: This is just another one of these wonderfully elegant results that comes out of thinking about this from a Bayesian perspective. I think it requires you to have a uniform prior, so it may not apply in every single situation. But it gives you a really elegant rule of thumb that I think is applicable in a lot of cases.
Robert Wiblin: Imagine that you’re going skydiving, and you’re the first person ever to go skydiving. You go skydiving once, and you survive. What should your estimated likelihood of dying on the second jump be? By Laplace’s Law, it’s one out of three: zero deaths plus one, over one jump plus two. Then, if you go again and survive, it’s one out of four, then one out of five, and on, and on, and on. There’s a medical paper, which I’ll find and put up a link to, asking whether, if nothing has ever gone wrong, everything will be okay. It basically uses Laplace’s rule to figure out how safe something is when you haven’t seen a catastrophe so far. So, yeah, it’s quite neat.
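Laplace’s rule is a one-line formula. A minimal sketch using exact fractions; the draw counts are hypothetical.

```python
from fractions import Fraction


def laplace_estimate(successes, trials):
    """Laplace's rule of succession: (successes + 1) / (trials + 2)."""
    return Fraction(successes + 1, trials + 2)


# Hypothetical draws: 3 green slips out of 10 drawn.
print(laplace_estimate(3, 10))   # 1/3
# Nothing drawn yet: the rule falls back to 1/2.
print(laplace_estimate(0, 0))    # 1/2
# Zero "bad" outcomes in one trial: estimated chance of a bad outcome next time.
print(laplace_estimate(0, 1))    # 1/3
```

The "+1 / +2" acts like starting with one imaginary success and one imaginary failure, which is what the uniform prior amounts to. It keeps the estimate away from exactly 0 or 1 after only a handful of observations.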
Brian Christian: There’s an optimal stopping problem called the burglar problem that asks you, basically, how many heists should you go on if you have a certain fixed probability of getting arrested and having all of your assets seized? It depends on your probability of succeeding. You can do this really lovely proof using Laplace’s Law, that says that, if you are willing to burgle at all and you succeed, then, if you use Laplace’s Law-
Robert Wiblin: You’ll never stop.
Brian Christian: … you’ll be more inclined to burgle in the future, et cetera. Yeah, exactly. Every time you succeed, you become more confident in your ability to pull it off the next time, and so you never stop. You’re guaranteed to get arrested eventually.
Robert Wiblin: I guess each extra dollar is worth less and less to you as you have more and more money, and that might be an offsetting factor that would actually cause people to stop in the real world. So, you’re stealing more and more, you’re [crosstalk 02:40:49] your returns by how much money you have.
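Here is a sketch of why Laplace’s Law makes the burglar keep going. It uses a deliberately simplified decision rule of our own, "keep going while the estimated chance of success is at least some threshold", rather than the full optimal-stopping solution of the burglar problem. The function names and the `max_heists` cap are hypothetical, and the sketch assumes every heist so far has succeeded.

```python
from fractions import Fraction


def estimated_success(streak):
    """Laplace estimate of success after `streak` successes and no failures."""
    return Fraction(streak + 1, streak + 2)


def heists_before_quitting(threshold, max_heists=1000):
    """Count heists under a 'keep going while estimated success >= threshold' rule.

    Assumes every heist so far has succeeded, and caps the loop at
    `max_heists` because the rule itself never says stop.
    """
    streak = 0
    while streak < max_heists and estimated_success(streak) >= threshold:
        streak += 1
    return streak
```

Because the estimate only rises with each success (1/2, 2/3, 3/4, ...), the rule either never starts (a threshold above 1/2) or runs until the cap (a threshold of 1/2 or below). The only thing that ends the streak in practice is getting caught.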
Brian Christian: Yeah, exactly.
Robert Wiblin: So, maybe that would eventually cause you to stop, you know?
Brian Christian: That may be part of the answer to why not literally all criminals are in jail, I don’t know.
Robert Wiblin: Like still burgling houses at the age of 80.
Brian Christian: Yeah, yeah, exactly.
Robert Wiblin: I think Laplace’s Law is also called the Rule of Succession; if you google that, the Wikipedia article comes up. Okay, another one: you’ve got the pro and con list, and canceling factors out, and Darwin doing that while trying to decide whether to marry. I think this is something that Benjamin Franklin did all the time, drawing up very extensive pro and con lists. Do you want to describe what you’re talking about here?
Brian Christian: Yeah. There’s this really beautiful, charming passage from Charles Darwin’s diary, where he’s trying to decide whether to propose to this woman. He draws up a list of all of the things that might be good or bad. He has “children, if it pleases God,” and on the other hand, “Oh, I have less money for books.” I mean, it’s really, really amusing, all of the things that he thinks about. By the end, smooshed into the bottom margin of the page, he decides, “Marry, marry, marry, QED.” (It is proven.)
Brian Christian: He then goes on, of course, to overthink the decision of when to marry, “Now, or soon?” Then he lists all the pros and cons: he wants to go up in a hot air balloon, take a trip to Wales, all of these things. What I think is really interesting is that this particular example of Darwin is often raised in the context of someone who is a chronic over-thinker. It seems, in a way, too calculating a way to think about getting married. But I think you could make the opposite argument, which is that he forced himself to decide by the bottom of a single piece of paper. It was as if he listed things until he hit the bottom of the page, and then forced himself to make a decision based on an evaluation of the factors that happened to fit on the page.
Brian Christian: This is connected to an idea from machine learning called regularization. The basic idea in regularization is that, oftentimes, it’s possible to make an arbitrarily complex model of some data that you want to try to predict, let’s say. But there might be a lot of reasons why you shouldn’t do that. For example, there’s this very deep idea in machine learning called the bias-variance tradeoff, which says, basically, that the more complex your model is, the more it will vary based on the exact data you’re fitting it to, and this will make it less robust to new data it hasn’t seen. And the extrapolations it makes may be more and more bizarre and random.
Brian Christian: So, there are a lot of reasons to want to inhibit the complexity of a model, even if it’s the case that a more complicated model appears to offer a better fit for the data that you have.
Robert Wiblin: It fits what you’ve already seen, but it won’t fit the future.
Brian Christian: Exactly.
Robert Wiblin: Because it’s fitting itself to idiosyncratic factors about what you’ve seen so far, not the fundamental underlying process that generated what you saw?
Brian Christian: Yeah, that’s exactly right. And so, there has been this entire development of methods for regularization, L1, L2, et cetera, that some of your listeners will be familiar with, that basically act as a downward pressure on the number of variables, or the number of parameters that your model has.
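To see what "downward pressure" means concretely, here is a minimal sketch of L2 (ridge) regularization in its simplest possible setting: one input, no intercept, and a closed-form solution. It is only meant to show the shrinking effect. It does not cover L1, which works differently and can push parameters all the way to zero. The data and the `lam` values are made up.

```python
def ridge_slope(xs, ys, lam):
    """Slope of a one-feature, no-intercept linear fit with an L2 penalty.

    Minimises sum((y - w*x)**2) + lam * w**2, whose closed-form solution
    is w = sum(x*y) / (sum(x*x) + lam). With lam = 0 this is ordinary
    least squares; larger lam pulls w toward zero.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


xs, ys = [1, 2, 3], [2, 4, 6]
print(ridge_slope(xs, ys, 0.0))   # 2.0
print(ridge_slope(xs, ys, 14.0))  # 1.0
```

The penalty deliberately gives up some fit on the data you have in exchange for a simpler, more conservative model, which is the trade Brian describes.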
Brian Christian: I think it’s interesting, on the one hand, to just make this argument for simplicity. That, I think, intuitively, we have this idea that, making a better decision almost necessarily means taking more information into account, thinking longer, gathering more data, considering more factors.
Brian Christian: That’s not necessarily the story that you get by thinking about this from the perspective of machine learning. There are, I think, powerful arguments for simpler models being more robust, in a lot of different ways.
Robert Wiblin: Yeah, I quite like pro and con lists, in a way: just trying to think of a good number of positive things and negative things about a choice. You were saying you need to cut that off at some point because, otherwise, you’ll just end up thinking about unimportant things that you care about right now but that won’t seem that important in the future, something like that?
Robert Wiblin: I think maybe an even bigger problem with really extensive pro and con lists is that there’s this strong temptation to decide based on which side of the ledger has more factors on it, which is obviously incredibly stupid.
Brian Christian: Yeah.
Robert Wiblin: No one would think that they’re deciding on that basis, but they could kind of do it by accident. The other thing is that if you have very long lists of pros and cons, it’s hard to divide your attention among them in accordance with their importance. If you have a long list of factors where the top one should be given 100 times the weight of the bottom one, you’re not actually going to spend 100x as much time thinking about it, and it’s very hard to give it that much significance. And so a pretty good procedure is to spend quite a bit of time thinking of pros and cons, and then pick the top three from each side.
Brian Christian: Yeah.
Robert Wiblin: Really think about which are the top three, and then just weigh those against one another, rather than allow yourself to become distracted by insignificant issues further down the page.
Brian Christian: Mm-hmm (affirmative). Yeah, I think this is exactly the counter-narrative that you can apply to the Darwin case. You can say that he was regularizing himself to the page. He would only consider as many factors as could fit onto a page of loose-leaf paper.
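The "top three from each side" procedure, and the counting trap it avoids, can be written out side by side. A minimal sketch: the factor names and importance weights are hypothetical, and putting a number on each factor is itself an assumption of this illustration.

```python
def decide(pros, cons, top_k=3):
    """Compare the summed weights of the top_k pros against the top_k cons.

    `pros` and `cons` map a factor to a hypothetical importance weight.
    Returns True if the pros win.
    """
    top_pros = sorted(pros.values(), reverse=True)[:top_k]
    top_cons = sorted(cons.values(), reverse=True)[:top_k]
    return sum(top_pros) > sum(top_cons)


def decide_by_count(pros, cons):
    """The tempting but flawed rule: whichever list is longer wins."""
    return len(pros) > len(cons)


pros = {"perk A": 1, "perk B": 1, "perk C": 1, "perk D": 1}
cons = {"major drawback": 100}
```

Here counting says "go ahead" because there are four pros and one con, while weighing the top factors says "don’t", because one con outweighs all the small perks combined.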
Robert Wiblin: Yeah.
Brian Christian: Or, his diary notebook paper. And I occasionally use this in my own life where I will think about something until I’ve completely run dry of things that I can say about one side or the other, but then I will force myself to articulate the gist of the decision as, “On the one hand x, on the other hand y.”
Robert Wiblin: Yeah.
Brian Christian: Now, granted, that’s a fairly artificial way to go about it. If y is half as important as x, but y prime also exists and is equally important as y, then you could get a decision that’s sort of faulty. But with this method of aggressively constricting the complexity of your decision-making process, there’s a story you can tell that really makes the argument that robust decision making, especially in the face of great uncertainty, should be as simple as possible.
Robert Wiblin: There’s this interesting ambiguity: it’s not clear whether you should spend more time contemplating a decision in an environment where evidence is very clear and crisp, or one where it’s vague and hard to say how much weight to give it. As an economist, I’d say there’s a question of elasticity there, because there are two different considerations. On the one hand, where the evidence is very clear, you can tell how strong an argument is, which means you get more value from finding an argument, because you know how good it is. On the other hand, you might actually just end up solving the problem fairly quickly; you come up with a good answer fairly fast.
Robert Wiblin: Then there are questions where it’s really hard to tell how strong the arguments are, where it’s hard to predict the future, those kinds of long-term things where you don’t understand the lay of the land very well. On the one hand, the arguments you’re coming up with aren’t terribly persuasive. On the other hand, you really don’t know what to do
Brian Christian: Yeah.
Robert Wiblin: because the question is so much harder and you’re never going to be super confident. So, I wish I knew the answer to this: whether, when you’re making a very difficult decision, you should spend more time thinking about it or less.
Brian Christian: Yeah, the question of time is quite interesting. I certainly think, if you’re framing it in terms of the complexity of the logic, or the complexity of the model that you’re building. If you’re trying to write a business plan for the next 18 months, it should probably be pretty long and include, this is exactly what’s going on in the market, this is exactly what we’re going to do and how we’re going to do it and x, y, and z. These are our projections. If you’re trying to make a business or create a nonprofit organization that you want to have an impact a century from now, the business plan is probably gonna be like two sentences, if that, and that is appropriate. Right? Because everything else is going to sort of wash out in the uncertainty of what might happen. In a way, it’s sort of counterintuitive, that the grandest ambitions in a way should be the most succinctly stated. But I think that makes sense from this perspective.
Robert Wiblin: Yeah. There’s this thing where, if you try to figure out how to influence the world in a couple of years’ time, you can come up with very specific plans: I’m gonna work with this organization and meet this person. Whereas if you’re thinking about how you’ll be able to have a lot of influence in 50 years’ time, it becomes very vague, like, I need to have a lot of money or I just need to be a well-known person, because you can’t figure out who you need to talk to or what you need to spend the money on yet.
Robert Wiblin: Okay, let’s push on. Computational kindness.
Brian Christian: Yeah.
Robert Wiblin: What is that concept? I loved it.
Brian Christian: Yeah. So, with everything that we’ve been kind of establishing about the computational nature of the problems that people face in everyday life, it gives us an opportunity to think, not just technically and strategically but also ethically, about the problems that we pose to one another. Both in our kind of interpersonal interactions but also in the types of policy that we enact and the way that we design physical environments and things like this. So one example would be, imagine you’re driving towards a destination. So if we frame this as a math problem it would be something like, you’re on an infinitely long, infinitely straight road and you start infinitely far away from your destination and you’re approaching your destination and you’re looking for a parking spot. You see a space appear and then you are faced with this kind of optimal stopping problem of do I just take this space or do I push on in the hopes that there’s a better one out there.
Robert Wiblin: And it’s kind of symmetric. You can continue going [crosstalk 02:50:20]
Brian Christian: Yeah, and then you can overshoot it and end up farther away [crosstalk 02:50:22] on that and so forth. Exactly. So there’s actually a heuristic that says you should always overshoot it, because then you are at most twice as far away as you would have been.
Robert Wiblin: More than 50% of the time, you should overshoot.
Brian Christian: Right, and so part of what’s interesting about this, is that the strategy of approaching the destination requires this analysis of what is the percentage of the spots that are occupied. As soon as you pass the destination and you start pointing away from it, all of the math drops away and you should just take the first spot that appears. And I think this is sort of a toy example.
Brian Christian: But if you imagine building a parking garage, if you start the parking garage at the best spots and slowly spiral out to worse and worse spots, that’s a computationally kind architecture. Because the optimal stopping rule is dead simple: if you see a spot, it’s by definition the best spot that you’ve encountered so far. Take it, you’re done. If you build the parking garage in the opposite direction, where you enter, let’s say, at the back and you’re slowly spiraling towards the place where you want to go, then you find yourself in this dilemma where you have to crunch the numbers and figure it out. So it’s just a small example, but it shows that not all of the problems we face are posed to us intrinsically by nature, by the world. Many of them, an increasing number of them, are designed by somebody else.
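To make the parking example concrete, here is a minimal Python sketch under the idealized assumptions described above: an endless one-way road, every space independently occupied with the same probability, no turning back, and cost measured as the distance from your spot to the destination. Working through that model gives a threshold rule: ignore free spaces until you are within r spaces of the destination, where r is the largest distance with occupancy**r >= 1/2, and once you are past the destination take the first free space. The function names `parking_threshold`, `drive_and_park` and `park_in_kind_garage` are hypothetical, chosen only for illustration.

```python
def parking_threshold(p_occupied: float) -> int:
    """How many spaces before the destination to start taking free spots.

    Idealized model (an assumption): an endless one-way street, each space
    independently occupied with probability p_occupied, no turning back, and
    cost = distance from the spot to the destination. Comparing "take this
    free spot d spaces early" with "take the next free one" gives the rule:
    take a free spot once p_occupied ** d >= 1/2.
    """
    if not 0.0 <= p_occupied < 1.0:
        raise ValueError("occupancy must be in [0, 1)")
    r = 0
    while p_occupied ** (r + 1) >= 0.5:
        r += 1
    return r


def drive_and_park(free, destination, threshold):
    """free[i] says whether space i is open; we drive from space 0 upward.

    Before the destination we skip open spaces more than `threshold` away.
    Past the destination all the math drops away: take the first open space.
    Returns the walking distance, or None if the street runs out.
    """
    for i, is_open in enumerate(free):
        if is_open and i >= destination - threshold:
            return abs(destination - i)
    return None


def park_in_kind_garage(free_best_first):
    """A garage that spirals from the best spots to the worst: the first open
    spot is by construction the best one left, so no occupancy estimate is
    needed. Returns the rank of the chosen spot (0 = best)."""
    for rank, is_open in enumerate(free_best_first):
        if is_open:
            return rank
    return None


print(parking_threshold(0.90))  # 6
print(parking_threshold(0.99))  # 68
```

The asymmetry is the point: `parking_threshold` needs an estimate of how full the street is, while `park_in_kind_garage` needs nothing at all.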
Brian Christian: So I’ll give two examples here: another toy example and then a real-world example. The toy example is this lovely paper by the computer scientist Jeffrey Shallit, looking at what would happen if you were to add a coin to the money supply. Let’s say you wanted to minimize the number of coins required to make change across all possible values of change that you might need to make. What would be the denomination of the coin? It turns out it would be an 18¢ coin. Unfortunately, this would dramatically alter the computational kindness of making change. Currently, you can use what’s called a greedy algorithm, where if you’re making change you just give as many quarters as possible, then as many dimes as possible, then as many nickels, then as many pennies. If there were an 18¢ piece and you needed to give someone 36 cents, you couldn’t just hand over a quarter, a dime, and a penny. You would have to realize, oh, this is two 18¢ coins. So change-making would no longer have this greedy algorithm.
Robert Wiblin: Can you still follow the process of giving the biggest coin that doesn’t take you over, then the next largest coin, and so on?
Brian Christian: No, you can’t.
Robert Wiblin: No [crosstalk 02:52:55]
Brian Christian: No and that’s [crosstalk 02:52:55]
Robert Wiblin: So it becomes the knapsack problem [crosstalk 02:52:57].
Brian Christian: Exactly right. Exactly right.
Robert Wiblin: Interesting. And why doesn’t that happen when it’s all like divisible by 10, or 5, or 1?
Brian Christian: As long as everything is kind of mutually divisible, then
Robert Wiblin: Oh, I see.
Brian Christian: There’s never a situation where
Robert Wiblin: Gotcha.
Brian Christian: Where some combination of smaller things is better. So then you can ask the question: what denomination of coin, if I added it to the money supply, would minimize the expected number of coins per transaction, subject to the condition that change-making still has this nice greedy algorithm? And it turns out the answer is a 2¢ coin, which makes a lot of sense. So more broadly, I think there are a lot of situations where we can adjust the way that problems are framed in what I would consider this computationally kind, more ethical framework that tries to minimize the cognitive burden on the other person.
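A minimal Python sketch of the change-making point, with hypothetical helper names: `greedy_change` is the cashier's largest-coin-first rule, and `min_coins_table` is a standard dynamic-programming solution that always finds the fewest coins. Comparing them shows greedy failing at 36¢ once an 18¢ coin exists, while staying optimal for the current US set and for the set with a 2¢ coin added. One aside: 10 does not divide 25, so US coins are not strictly mutually divisible; divisibility is enough for greedy to work, but not required.

```python
def greedy_change(amount, coins):
    """Cashier's rule: repeatedly hand over the largest coin that still fits."""
    handed = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            amount -= coin
            handed.append(coin)
    if amount:
        raise ValueError("these coins cannot make that amount")
    return handed


def min_coins_table(limit, coins):
    """best[v] = fewest coins summing to v, for every v in 0..limit.

    Dynamic programming: the standard fix when greedy is not safe."""
    best = [0] + [float("inf")] * limit
    for value in range(1, limit + 1):
        for coin in coins:
            if coin <= value:
                best[value] = min(best[value], best[value - coin] + 1)
    return best


def greedy_is_optimal(coins, limit=200):
    """Look for any amount in 1..limit where greedy uses extra coins."""
    best = min_coins_table(limit, coins)
    return all(len(greedy_change(a, coins)) == best[a] for a in range(1, limit + 1))


def average_coins(coins, max_amount=99):
    """Average of the fewest coins needed, over amounts 0..max_amount cents."""
    best = min_coins_table(max_amount, coins)
    return sum(best) / len(best)


US = [1, 5, 10, 25]
WITH_18 = [1, 5, 10, 18, 25]
WITH_2 = [1, 2, 5, 10, 25]

print(greedy_change(36, WITH_18))        # [25, 10, 1]: three coins
print(min_coins_table(36, WITH_18)[36])  # 2 (18 + 18)
print(greedy_is_optimal(US), greedy_is_optimal(WITH_18), greedy_is_optimal(WITH_2))
# True False True
print(average_coins(US))                 # 4.7
```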
Brian Christian: So a real-world example of this is buying a house. Typically, buying a house works as what’s called a sealed-bid first-price auction. You try to estimate how many other bidders there might be, and perhaps you don’t even know, so you try to gather information that suggests how many other bidders there might be. Then you make a single bid, and your objective is to win, but at the lowest possible price. And we can give you the math: there’s all this game theory about the best bid to make relative to your private valuation of the asset, which also factors in how many other competitors are trying to get it. You can run the numbers and come up with an optimal strategy.
Brian Christian: Well, it turns out that you can instead set it up as what’s called a second-price auction, where the person who writes the biggest number still wins but pays the amount of the second-highest bid. This is also known as a Vickrey auction. There are all these wonderful theorems that say that a Vickrey auction, once you take into account people’s strategic adaptation to the new rules, will end up with the same good going to the same person for the same amount of money on average. It will generate the same expected revenue for the seller. But all of the strategic thinking that buyers have to do goes out the window. The optimal strategy in a Vickrey auction is to just literally put down exactly what you think the asset is worth.
Robert Wiblin: What they’d be willing to pay.
Brian Christian: Exactly.
Robert Wiblin: Yeah.
Brian Christian: Exactly. In some ways the rules of the game are optimizing for you.
Robert Wiblin: Yeah.
Brian Christian: There’s no further optimization to be done by thinking strategically.
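A small Python sketch of the two auction formats, under textbook assumptions that are not stated in the conversation: risk-neutral bidders whose values are independent and uniform between 0 and 1. It checks that bidding your true value is never beaten in a second-price auction, and uses a Monte Carlo estimate to illustrate revenue equivalence: with first-price bidders shading their bids to (n - 1)/n of their value (the standard equilibrium for this setup), both formats bring in about (n - 1)/(n + 1) on average. The function names are hypothetical.

```python
import random


def run_sealed_bid_auction(bids, rule):
    """bids maps bidder -> sealed bid; the highest bid wins.

    rule="first":  the winner pays their own bid (as in a typical house sale).
    rule="second": the winner pays the runner-up's bid (a Vickrey auction).
    Ties go to whichever bidder was entered first (sorted() is stable).
    """
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner, top_bid = ranked[0]
    if rule == "first":
        price = top_bid
    elif rule == "second":
        price = ranked[1][1] if len(ranked) > 1 else 0.0
    else:
        raise ValueError(f"unknown rule: {rule}")
    return winner, price


def my_payoff(value, my_bid, other_bids, rule):
    """Surplus for a bidder called "me" who values the item at `value`."""
    winner, price = run_sealed_bid_auction({**other_bids, "me": my_bid}, rule)
    return value - price if winner == "me" else 0.0


def average_revenue(n_bidders, rule, trials=20_000, seed=0):
    """Monte Carlo seller revenue with i.i.d. Uniform(0, 1) private values.

    First price: bidders shade to (n - 1) / n of their value, the textbook
    equilibrium for this setup. Second price: bidders simply bid their value.
    """
    rng = random.Random(seed)
    shade = (n_bidders - 1) / n_bidders if rule == "first" else 1.0
    total = 0.0
    for _ in range(trials):
        bids = {i: shade * rng.random() for i in range(n_bidders)}
        total += run_sealed_bid_auction(bids, rule)[1]
    return total / trials


# Truthful bidding is never beaten in the second-price auction:
others = {"a": 80.0, "b": 60.0}
grid = [float(b) for b in range(0, 151, 5)]
best_possible = max(my_payoff(100.0, b, others, "second") for b in grid)
print(best_possible == my_payoff(100.0, 100.0, others, "second"))  # True

# Revenue equivalence: both should land near (n - 1) / (n + 1) = 0.5 for n = 3.
print(average_revenue(3, "first"), average_revenue(3, "second"))
```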
Robert Wiblin: I assume houses usually aren’t auctioned this way because sellers think they can manipulate people, get them to behave irrationally and bid more than they should. If buyers were being fully strategic in this way, they’d be marking their bid down from what they think the house is worth towards what other people think it’s worth.
Brian Christian: Yeah, so there are a number of theorems that show what’s called revenue equivalence. There may be specific reasons that the housing market violates the assumptions those theorems are based on. But you do see, for example, that Google’s ad auction was famously set up as a Vickrey-style, second-price auction, in part because they figured the computational kindness to the people placing those ads would, in the long run, make it easier to place ads and easier to maintain a longstanding relationship with Google, and that they would come out ahead.
Robert Wiblin: People are running so many ads, and they don’t know the market, so it’d be so confusing as an advertiser to work out for every ad: what is my maximum bid for an impression? [crosstalk 02:56:18]
Brian Christian: Right exactly.
Robert Wiblin: Yeah, so that one’s really nice. Auction theory in general, we should check that out. Another example of computational kindness that really resonates with me is if you’re trying to schedule with someone. Rather than saying, “Oh, I’m free anytime in the next month, what time would you like to meet?” We’ve all received an email like that, right? I don’t want to look at my calendar and try to find the optimal time to meet in the next month, so I’m just gonna answer this email later; it’s a bunch of work.
Brian Christian: Yeah.
Robert Wiblin: Whereas if someone says “Are you free 2 P.M. on Tuesday?” You’re like well I can check that very easily.
Brian Christian: Yes.
Robert Wiblin: You’re not demanding that they figure out when they want to meet. So oddly enough, being very specific and limiting their options requires less computation in their head, and they’re more likely to respond to you. So I never email anyone suggesting “let’s talk” or “let’s meet” without giving a very specific time.
Brian Christian: Yeah. I think that’s right. Tom and I experienced this firsthand in the process of researching the book. We did something like 100 different interviews with various computer scientists and so forth, and we found empirically that if we said “Are you free at 10:30 next Tuesday?” we got a higher response rate than if we said “Are you free in the next three weeks?” Right?
Robert Wiblin: “Will you ever be free?” is the easier question to answer, but it’s not useful.
Brian Christian: Exactly. So it just so happens that framing the problem in a way that’s easier to understand is better than giving the person open-ended options.
Robert Wiblin: Okay so this one’s not actually in the book. You were talking about it earlier. Corrupted back channeling.
Brian Christian: Right, so this is an idea that comes out of networking. One of the issues in TCP/IP is: how do you create a robust communication channel over an unreliable medium? One of the ways that you do this is by having what are called acknowledgements, or acks for short. Whenever you send a TCP packet, you get what’s called an ack packet back. And this turns out to be a critical aspect of the functioning of the internet that I think people don’t quite appreciate. For example, your maximum download speed depends on your upload speed and on your upload latency, because the person who’s sending you that file requires you to keep going “uh huh, uh huh, uh huh” in order for them to send at the maximum bandwidth.
Brian Christian: And so you sometimes get these paradoxical situations where the user experiences what they perceive to be a download problem but the symptom is actually, something’s clogged in the upload, upstream direction. So in the book we talk about this issue that’s called bufferbloat which has exactly this property.
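A back-of-the-envelope Python sketch of why the upload direction can cap a download. It assumes a deliberately crude model (no packet loss, no slow start, no competing traffic) in which throughput is limited both by window size divided by round-trip time and by how fast the uplink can carry the acks. The default sizes (1500-byte segments, 40-byte acks, one ack per two segments as with delayed acks) are illustrative assumptions, and `max_download_bps` is a hypothetical helper, not a real API.

```python
def max_download_bps(window_bytes, rtt_s, upload_bps,
                     segment_bytes=1500, ack_bytes=40, segments_per_ack=2):
    """Rough ceiling on one TCP download's speed, in bits per second.

    Crude model (ignores loss, slow start, competing traffic):
      * Window limit: at most one window of unacknowledged data per round trip,
        so speed <= window / RTT. The RTT includes the acks' trip upstream.
      * Ack limit: each ack of about ack_bytes releases segments_per_ack full
        segments, so the upload link has to keep up with the acks.
    """
    window_limit = window_bytes * 8 / rtt_s
    ack_limit = upload_bps * (segments_per_ack * segment_bytes) / ack_bytes
    return min(window_limit, ack_limit)


# A 64 KiB window over a 100 ms round trip caps you near 5.2 Mbit/s,
# however fast the downlink is.
print(max_download_bps(65_535, 0.100, upload_bps=10_000_000))
# With a huge window, a 128 kbit/s uplink that can barely carry the acks
# becomes the bottleneck: 9.6 Mbit/s.
print(max_download_bps(10_000_000, 0.010, upload_bps=128_000))
```

In this model, bufferbloat on the uplink shows up as a larger `rtt_s`, which shrinks the window limit even though nothing is wrong with the downlink.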
Brian Christian: It also connects to the linguistics of human communication, because back channels are a huge and underappreciated aspect of human communication. Basically, the listener has an active role in shaping even what appears to be a soliloquy or a one-directional communication. There’s been a lot of really interesting linguistics research on the role of these back channels, and one of my favorites is a study where you are telling a story to someone, let’s say over the phone, but unbeknownst to you, their back channels are being corrupted and replaced with random back channels. So either they’re coming at random times, or they say “uh huh” but you hear “oh,” or vice versa. What they find is that this completely destroys your ability to tell a convincing story. The message here is that there is this unsung role of the upstream direction: these back channels are present not only in our computer networks but also interpersonally.
Robert Wiblin: I guess the problem is you’re using this back channeling to figure out whether to give more detail or less detail, whether to speed up or slow down, and to build up a model of how much the other person knows and what they know about. But if it’s all random, you end up with this incredibly confusing person who suddenly wants you to go faster or slower when it should be the other way around, or who seems to know about something and then not to understand it.
Brian Christian: Exactly. So you find that a storyteller, for example, will painfully repeat a section of the story as if you didn’t get it, but then this makes the story much worse, and so forth. They’re trying to build a model of what you appear to know and not know, but that model doesn’t make sense, because the evidence they’re getting just doesn’t paint a consistent picture. So then they’re working overtime trying to understand you. And I think it’s an interesting parallel that from the late 1970s through the last decade, you have this parallel discovery of the importance of the back channel, both in networking and interpersonally, and those stories complement and inform one another.
Robert Wiblin: Okay let’s talk now a little bit about your career as a writer and whether listeners should potentially become writers as well. Are you glad you became a writer? What would you advise people in general? It’s known to be a fairly difficult thing to break into, fairly competitive.
Brian Christian: Yes. I guess I should bracket everything that I say with, you know, survivorship bias. The fact that you’re asking me indicates
Robert Wiblin: I shouldn’t listen.
Brian Christian: Indicates that I’m not from the median of the distribution or whatever. So with that having been said, for me it’s a really satisfying profession, and it’s something that I certainly would, on the one hand, recommend to anyone who wanted to get involved. I’m reminded of Robin Williams, I believe it was, who used to say that if anyone ever asked him whether they should become a comedian, he would say no. The reason being, it was a really hard life with a low probability of success, and if the person had any doubt then he should turn them away, and if they had no doubt then his telling them not to enter the profession would have no effect.
Brian Christian: I’m tempted to just give the Robin Williams answer and leave it at that. But I also think that for me, following this kind of upper-confidence strategy of taking a swing at it is something that I certainly don’t regret. In particular, the one thing that I often tell writers is that there’s a sense in which realistic aspirations are almost as difficult as unrealistic aspirations. This is an idea that, for example, Tim Ferriss has talked about: often people self-select and don’t try the outrageously ambitious things, and because no one’s trying them, it’s not as hard as you think. [crosstalk 03:02:50]
Robert Wiblin: Competitive.
Brian Christian: Exactly. Everyone is clamoring for the realistic thing. I experienced this as an undergraduate in a very explicit way. At Brown University there was this intro-level writing workshop, and it was so over-enrolled that they just had a random lottery, and that’s who got in. So people would spend their entire undergraduate years trying to get into this intro class. The intermediate-level workshops required a portfolio: you had to submit a selection of your work, and I, as an incoming freshman, like pretty much everybody, assumed that my work was nowhere near good enough to qualify. But I had a peer advisor who was an upperclassman, and she said, look, I’m gonna let you in on a little secret: everybody thinks they’re underqualified, so fewer people apply than there are spots, and everyone gets in. So I really felt like I was able to just jump the turnstiles: I went straight into the intermediate workshop, and it went from there. More importantly, I feel like that gave me a principle that I find useful in a lot of situations. I mean, the same thing happened to me later: I went to graduate school for creative writing, and I graduated with my MFA at the bottom of the recession, just not an ideal time to be minted with a terminal degree in creative writing.
Robert Wiblin: (Laughs)
Brian Christian: And I was trying to figure out what to do, and at the same time as I was having work turned down by obscure online publications that were going to pay me $10 for an essay or something like that, I was able to get a book contract with Doubleday. It felt to me very analogous to what I’d experienced as an undergrad, where the competition for what were perceived to be the next logical steps was so fierce, but no one was just going for it, in a way. That’s hyperbole.
Robert Wiblin: Yeah.
Brian Christian: But it was harder than I thought to climb the ladder and easier than I thought to just jump the queue. And so that’s, for me, I think that’s advice that actually applies to almost any situation. Certainly, I’ve found it a few times in my writing career. I encourage people to try to think about that.
Robert Wiblin: So if you apply further up the ladder, there’s always a lot of randomness with these applications. Even if they shouldn’t give you the job, there’s a chance that they will.
Brian Christian: Yeah. (Laughs)
Robert Wiblin: Right.
Brian Christian: Yeah that’s right.
Robert Wiblin: Do you think you’re in a good position to make a large social impact? Or do you have to make a lot of compromises in what you write about?
Brian Christian: That’s an interesting question. For me as a book author, it’s a very different position from being a day-to-day journalist, where you’re expected to produce some quota of stories every day, every week, every month, that kind of thing. So I certainly experience great latitude in the types of projects that I can take on and how long I can spend working on them. It’s also the case that with book writing, relative to magazine writing, you have a lot more latitude at the sentence level, because the draft that you’re turning in is just too big for someone to micromanage your syntax on every sentence. I’ve done only sparing amounts of magazine journalism over the years. I haven’t done any of that in a long time, but when I have, I’ve bristled at the amount that my editors have massaged the points that I was making across the individual piece. At book length, it’s too much for them to successfully do that.
Robert Wiblin: To some extent you can say what you like.
Brian Christian: Yeah, and I think that has also affected the way that I read magazine articles versus books. When you’re reading a magazine piece, there’s a single person on the byline, but what has appeared in that piece has gone through a chain of editors, each of whom has put some kind of stamp on it. When you’re reading a book, it is coming much more unfiltered from the author themselves. So for that reason I’m biased towards books as a medium, both as a writer and as a reader.
Robert Wiblin: You also don’t have the experience of having some idiot come in last minute and give an inaccurate title. It’s a very common problem for journalists.
Brian Christian: I have experienced that. I have many horror stories about exactly that. And then later, people will interview you and ask you why you said x and out of loyalty to the publication you can’t just say well, so-and-so up the editorial chain said x.
Robert Wiblin: It’s a devil’s bargain. Sometimes they’re like, oh, we’ll publish your work, but we’ll lie about what you said in the title to get more clicks. It’s kind of sickening in a way. Yeah, I guess it’s a fierce media environment.
Brian Christian: Yeah.
Robert Wiblin: Anyway you don’t have to deal with that with books. That’s fortunate.
Brian Christian: You have a considerably greater degree of control, although probably a lot less than people imagine over certain things like the subtitle and the cover art, so it’s not without that element. But I certainly think it’s a pretty unique position, even within the nonfiction ecosystem, to be able to write books, and so I try to take that liberty very seriously.
Robert Wiblin: What books would you like to see written on really important topics that you don’t expect to get to yourself? And maybe that you do expect to get to yourself?
Brian Christian: Well, I can’t reveal my R&D pipeline! But no, I think that’s a great question. One of the things I’ve been thinking a lot about is that there just seems to be a near-total breakdown in people’s ability to constructively disagree. By any measure, it seems like polarization is sharper than ever, even if you just look at the left and the right politically in America right now. There’s just a total inability to speak to one another in a constructive way, and I think it’s also reflective of a society that is more self-segregating. It really seems to me that some sort of intervention is needed to restore people’s ability to articulate what they think and why, and to have that conversation in a way where both people are gaining information and it’s not framed as zero-sum mortal combat. I think that’s important for the politics of having a healthy democracy. I think it’s also important for people interpersonally to be able to say vulnerable, difficult things and manage conflict or disagreement in a relationship. It feels to me like we’re just getting systematically worse at that. I don’t know if this is just me in my 30s becoming curmudgeonly and saying “kids these days,” but [crosstalk 03:09:22] it does seem. Yeah, I agree with that.
Robert Wiblin: Also, I don’t think people’s real-life relationships have necessarily gotten worse: the relationships you have with your friends and family, and how rancorous those discussions are. It wouldn’t surprise me if that’s pretty similar. I think it’s just that so much interaction now happens through other media that don’t encourage treating other people humanely.
Brian Christian: Yeah. Yeah, I think that’s true. I mean, it also feels to me, and it’s hard to measure this, that there are fewer constructive disagreements over the Thanksgiving table. I could be wrong about that. So I think there’s something there, at the intersection of humility and actual verbal skill, that feels to me like it needs an intervention.
Robert Wiblin: Yeah. I’ve quite a lot of thoughts on this.
Brian Christian: (Laughs)
Robert Wiblin: There’s a common argument people make that it’s harmful to think that other people have stupid ideas or that they’re evil. Most people in society think this about other people in society.
Brian Christian: Yeah.
Robert Wiblin: So like don’t do it. I’m like but you can’t just change your beliefs based on what’s like most convenient for people to believe. Right? You can change your behavior but you can’t necessarily just say oh well it would be bad if I believe that they’re bad people so I’m not going to believe that.
Brian Christian: Yeah.
Robert Wiblin: I think realistically what you can push on and say is: yeah, lots of people are actually pretty bad, and most people have very stupid ideas, but it doesn’t really help to treat them in a very angry, acrimonious way, so you should find some way to make peace with being friendly, potentially with people you think are quite bad. Maybe have some kind of fatalistic sense of humor. I think people are often inclined to say, well, they’re a bad person, so I ought to hate them or treat them terribly. I’m like, well, not necessarily, not if it doesn’t help. (Laughs)
Brian Christian: (Laughs) Right.
Robert Wiblin: What’s the point, in fact, if what you’re doing is causing harm and not changing their minds?
Brian Christian: Yeah. I find anecdotally that if I interact with people with whom I disagree, their claws come out automatically, and when I don’t engage the conversation at the level of zero-sum, one-of-us-is-going-to-win, but genuinely attempt to understand what they’re thinking and where they’re coming from, I get this reaction of almost confusion. People don’t know how to navigate the weird rhetorical context they now find themselves in. They’re prepared to fight, and when I don’t fight back, it’s this weird thing where they’re trying to figure out what it means to have a different kind of conversation.
Robert Wiblin: It’s very different when there’s an external audience versus if there’s not.
Brian Christian: I agree with that.
Robert Wiblin: It’s hugely different. If I’m just speaking to you, no one else is here, and I’m trying to persuade you, then it’s pretty clear I should be nice to you. If there are 100 people watching, and basically I don’t care what you end up thinking, I just care about what they end up thinking, then there’s potentially reason to savage you in the conversation, even if I’m totally giving up on ever convincing you.
Brian Christian: Yeah.
Robert Wiblin: I think we probably need more conversations without audiences frankly.
Brian Christian: I agree and it’s not clear how that’s gonna happen because it feels like there sort of is no private sphere anymore.
Robert Wiblin: Mm-hmm (affirmative)
Brian Christian: Right like anything [crosstalk 03:12:20]
Robert Wiblin: There is a bit
Brian Christian: Anything you text to someone can be screenshotted onto the front page of Reddit 30 seconds later. So I feel like people increasingly behave in private as if they’re in public.
Robert Wiblin: Yeah. I think it’s very anxiety inducing as well.
Brian Christian: Yeah.
Robert Wiblin: There’s a lot to say here maybe we’ll have a different episode on online discourse.
Brian Christian: Yeah, well it should be multiple books (Laughs)
Robert Wiblin: Yeah. There is a stigma: people who screenshot private text messages and share them publicly are, I think, treated as dishonorable, and thank god, because otherwise you’d constantly have people revealing things, and I guess it would make it impossible to have discourse by text message.
Brian Christian: Right.
Robert Wiblin: It would be too dangerous. Because, I mean, all of us have private views we want to share with one person that we don’t want published. There’s nothing bad about that.
Brian Christian: Right. Yeah I’ve been in situations where I’m asked to address a group of 10 people and then someone says, as if off-handedly, oh can we record this and put it on YouTube? That’s a room of a million people.
Robert Wiblin: Yeah.
Brian Christian: That’s a completely different room and.
Robert Wiblin: Forever!
Brian Christian: Yeah, yeah, yeah, exactly.
Robert Wiblin: I’ve been asked that before and I’m like, I say yes you can but it will be a totally different talk.
Brian Christian: (Laughs)
Robert Wiblin: I’ll have to change what I say pretty substantially potentially.
Brian Christian: Yeah.
Robert Wiblin: In that case you have to think: if someone took 10 seconds of what you said totally out of context, how bad could they make you seem? Then you have to be constantly on the defensive about any individual sentence that you might say. It’s a good reason not to record things (Laughs).
Brian Christian: No, I totally agree with that. Yeah, it’s
Robert Wiblin: We say with a microphone in front of both of us.
Brian Christian: Yeah indeed yeah. But I do think yeah we need to somehow try to reclaim that so it seems to me worth doing.
Robert Wiblin: All right so to finish you mentioned earlier that you’re getting married in just a couple of weeks and marriage is something that shows up in the “Algorithms” book very regularly.
Brian Christian: (Laughs)
Robert Wiblin: Did you end up using any of the algorithms you described to decide whether to get married? And when?
Brian Christian: Yeah, my fiancée tells this story, which I don’t remember, but I believe her, that at some point after we started dating I was researching the book. I was writing the optimal stopping chapter, and I mentioned to her, oh, you know, 37% of the average American male lifespan is something like 29, and I’m 29, so, you know, if this works out, I’m all in.
Robert Wiblin: (Laughs)
Brian Christian: And I’ve no memory of this conversation but that sounds like the kind of thing I would say and it did work out, and I am all in, so.
Robert Wiblin: Yeah.
Brian Christian: You know I’ve followed that.
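The 37% here is the look-then-leap rule from the optimal stopping (secretary) problem: spend the first 1/e, about 37%, of your options just looking, then commit to the first one better than everything seen so far. Applied to a lifespan, with an assumed figure of about 78 years, 78/e comes to roughly 28.7, in line with the "something like 29" above. A minimal Python simulation, with hypothetical function names, suggests the rule lands on the single best option about 37% of the time.

```python
import math
import random


def look_then_leap(scores, look_fraction=1 / math.e):
    """Optimal-stopping "37% rule" (secretary problem).

    Look at the first ~37% of candidates without committing, then leap at the
    first one better than everyone seen so far. If nobody beats that
    benchmark, you end up with the last candidate.
    """
    cutoff = int(len(scores) * look_fraction)
    benchmark = max(scores[:cutoff], default=float("-inf"))
    for score in scores[cutoff:]:
        if score > benchmark:
            return score
    return scores[-1]


def chance_of_best(n=100, trials=10_000, seed=1):
    """Fraction of random orderings in which the rule picks the very best."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        scores = rng.sample(range(n), n)  # distinct scores in random order
        hits += look_then_leap(scores) == n - 1
    return hits / trials


print(chance_of_best())  # close to 1/e, about 0.37
```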
Robert Wiblin: This has been an absolute marathon conversation. We’ve covered quite a lot of stuff in the book, but really only about 3 chapters out of 11 in any detail. So if you enjoyed this, there are tons more algorithms to live by. There’s also The Most Human Human, and the book you’re writing about AI, how to control it, and where technology’s going, which comes out next year, so go out and buy these books. My guest has been Brian Christian. Thanks for coming on the 80,000 Hours podcast, Brian.
Brian Christian: Thank you so much, it’s been a pleasure.
Robert Wiblin: Just a reminder about our annual impact survey.
If this show, our coaching, or any of the articles on our website have helped you have more social impact, please head to 80000hours.org/survey and spend a few minutes letting us know.
We really couldn’t exist without your stories.
Also, did you know we have a newsletter you can use to stay on top of all the new research we release and the new jobs we recommend people apply to? You can sign up at 80000hours.org/newsletter.
The 80,000 Hours Podcast is produced by Keiran Harris.
Thanks for joining, talk to you in a week or two.