#147 – Spencer Greenberg on stopping valueless papers from getting into top journals


Can you trust the things you read in published scientific research? Not really. About 40% of experiments in top social science journals don’t produce the same result when repeated.

Two key reasons are ‘p-hacking’ and ‘publication bias’. P-hacking is when researchers run many slightly different statistical tests until they find a way to make findings appear statistically significant when they’re actually not — a problem first discussed over 50 years ago. And because journals are more likely to publish positive than negative results, you might be reading about the one time an experiment worked, while the 10 times it was run and got a ‘null result’ never saw the light of day. This phenomenon, publication bias, is one we’ve understood for 60 years.

Today’s repeat guest, social scientist and entrepreneur Spencer Greenberg, has followed these issues closely for years.

He recently checked whether p-values, which indicate roughly how likely a result that strong would be to arise by pure chance alone, could tell us how likely an outcome would be to recur if an experiment were repeated. From his sample of 325 replications of psychology studies, the answer seemed to be yes. According to Spencer, “when the original study’s p-value was less than 0.01, about 72% replicated — not bad. On the other hand, when the p-value is greater than 0.01, only about 48% replicated. A pretty big difference.”

To do his bit to help get these numbers up, Spencer has launched an effort to repeat almost every social science experiment published in the journals Nature and Science, to see whether the results hold up. (So far they’re two for three.)

According to Spencer, things are gradually improving. For example, he sees more raw data and experimental materials being shared, which makes it much easier to check the work of other researchers.

But while progress is being made on some fronts, Spencer thinks there are other serious problems with published research that aren’t yet fully appreciated. One of these Spencer calls ‘importance hacking’: passing off obvious or unimportant results as surprising and meaningful.

For instance, do you remember the sensational paper that claimed government policy was driven by the opinions of lobby groups and ‘elites,’ but hardly affected by the opinions of ordinary people? Huge if true! It got wall-to-wall coverage in the press and on social media. But unfortunately, the paper’s statistical model could explain only 7% of the variation in which policies were adopted. Basically, the researchers just didn’t know what made some campaigns succeed while others didn’t — a point one wouldn’t learn without reading the paper and diving into confusing tables of numbers. Clever writing made the result seem more important and meaningful than it really was.

Another paper Spencer describes claimed to find that people with a history of trauma explore less. That experiment actually featured an “incredibly boring apple-picking game: you had an apple tree in front of you, and you either could pick another apple or go to the next tree. Those were your only options. And they found that people with histories of trauma were more likely to stay on the same tree. Does that actually prove anything about real-world behaviour?” It’s at best unclear.

Spencer suspects that importance hacking of this kind causes damage on a similar scale to the issues mentioned above, like p-hacking and publication bias, but is much less discussed. His replication project tries to identify importance hacking by comparing how a paper’s findings are described in the abstract to what the experiment actually showed. But the cat-and-mouse game between academics and journal reviewers is fierce, and it’s far from easy to stop people exaggerating the importance of their work.

In this wide-ranging conversation, Rob and Spencer discuss the above as well as:

  • When you should and shouldn’t use intuition to make decisions.
  • How to properly model why some people succeed more than others.
  • The difference between what Spencer calls “Soldier Altruists” and “Scout Altruists.”
  • A paper that tested dozens of methods for forming the habit of going to the gym, why Spencer thinks it was presented in a very misleading way, and what it really found.
  • Spencer’s experiment to see whether a 15-minute intervention could make people more likely to sustain a new habit two months later.
  • The most common way for groups with good intentions to turn bad and cause harm.
  • And Spencer’s low-guilt approach to a fulfilling life and doing good, which he calls “Valuism.”

Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript below.

Producer: Keiran Harris
Audio mastering: Ben Cordell and Milo McGuire
Transcriptions: Katy Moore

Highlights

Experimental evidence on how to *actually* go to the gym more

Spencer Greenberg: One study that I think really makes it clear how hard behaviour change is, is this really huge study that was run fairly recently. I think it was on tens of thousands of people who were gym members, and they tried to get them to go to the gym more often.

Spencer Greenberg: The basic idea is they got tonnes of researchers — I think it was like 30 different scientists working in small teams — to develop behaviour change interventions. And then they took these tens of thousands of people — like 61,000 participants who already had gym memberships — and used these text message interventions, 54 different interventions that the scientists developed, to try to get them to go to the gym more often.

Spencer Greenberg: Here are the ones I thought were more promising, more interesting. One is giving people bonuses after they mess up. So the basic idea is if you fail to go to the gym when you wanted to, you’ll be told you’re going to get a special bonus if you recover from this mistake. So the next day, if you go at the time you planned, you’ll get extra points. And I think this one probably is not a false positive, because actually in the top five, this occurred twice: there were two slight variations on it, and they both were in the top five. So that seems really promising to me.

Rob Wiblin: How can people apply that in their normal life? You have this issue of falling off the wagon that a lot of people have when they’re trying to change their habits. I suppose you need to have an extra reward for yourself if you miss a day and then you manage to get back on the next day. That’s maybe like an intervention point where you’re particularly able to make a difference by geeing yourself up.

Spencer Greenberg: Right. And I think the key is to think about a failure as not, “Now I’m screwed up and now it’s not even worth it.” It’s like, “No, no, no, wait. Now I can recover and I should feel really happy if I am able to recover.” Because think about doing a habit: you’re going to have failure days, inevitably. If you can’t recover, then you’re pretty screwed. I think that’s just a reminder that the recovery piece is as important as doing the habit in the first place.
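To make the recovery-bonus rule concrete, here’s a toy sketch in Python. The names, point values, and logic are hypothetical illustrations of the idea described above, not the study’s actual implementation (which delivered interventions by text message):

```python
# Toy sketch of the "bonus after a missed day" rule (hypothetical values).
from dataclasses import dataclass

BASE_POINTS = 10      # points for attending at the planned time
RECOVERY_BONUS = 5    # extra points for bouncing back after a miss

@dataclass
class Member:
    points: int = 0
    missed_yesterday: bool = False

def record_day(member: Member, attended: bool) -> None:
    """Award points for attendance, with a bonus for recovering from a miss."""
    if attended:
        member.points += BASE_POINTS
        if member.missed_yesterday:
            member.points += RECOVERY_BONUS  # reward getting back on the wagon
    member.missed_yesterday = not attended

m = Member()
for went in [True, False, True]:  # attend, miss, recover
    record_day(m, went)
print(m.points)  # 10 + (10 + 5) = 25
```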

Spencer Greenberg: So the other one is really quite interesting. They gave people a choice of a gain frame and a loss frame for the points they earned. The idea is when you go to the gym successfully at the time you planned, you earn points, right? So that’s a gain frame. But you can equivalently think of it as you start with all the points, and every time you don’t go to the gym, you lose points.

And with this intervention, they actually let people choose. They said, “Do you want to have this many points at the beginning and every time you don’t go, you lose them? Or do you want to have zero points at the beginning and every time you go, you get them?” And of course it’s the same number of points either way. But by letting people choose, they found people actually seem to go to the gym more often. And we don’t know for sure that it’s not a false positive, but I think it’s kind of cute. And if it actually works, that’s pretty cool.
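As a quick arithmetic check of the equivalence Spencer describes (the numbers below are made up, not from the study):

```python
# The two framings award identical totals; only the description differs.
planned, attended, pts = 20, 14, 10

gain_frame = attended * pts                              # start at 0, earn per session
loss_frame = planned * pts - (planned - attended) * pts  # start full, lose per miss

assert gain_frame == loss_frame == 140
```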

The factors that predict success

Rob Wiblin: I suppose there are all of these folk theories about what determines success, which usually highlight one particular thing. You know, it’s just grit, it’s about your ability to stick to it, it’s just luck, or it’s just intelligence, or something. People reach, I think, for these simpler folk theories, but what they really want to know is which of these factors empirically is the most important in determining success. Do you know of any evidence that can help people narrow down what’s most important?

Spencer Greenberg: So what’s really tough, if you’re looking at really high levels of success, is it’s hard to get good sample sizes, right? Because you can kind of make a collection of some of the most successful people in history, but then you’re just like anecdotally investigating each of them. On the other hand, if we’re thinking about more ordinary levels of success — like being good at your job and marrying someone you like and so on — it’s a lot easier to get datasets, so I think more is known about that kind of success.

And if we think about that literature, one thing that comes up a lot in a work context is conscientiousness — for a wide range of work, but not all types of work. For many different types of jobs, you don’t want to be too low in conscientiousness. There could be diminishing returns. I think that’s an open question: do you really want to be at the 99.9999th percentile of conscientiousness, or is that too much? I suspect that at some point it actually becomes dysfunctional, but at least up to a point, it does seem to predict job performance.

IQ generally predicts job performance across a very wide range of jobs. So that’s a helpful one.

Then spending time training at a particular skill that’s relevant for that job clearly is really important — but the type of training matters tremendously. Just the number of hours someone has spent doing a thing is a much weaker predictor than the number of hours they’ve spent with high-quality feedback. If you think about someone who just plays chess every day: yes, they’re going to get better at chess. But compare that to someone who plays chess every day, and at the end of the day breaks down what they did badly, looks at what the best chess engine said they should have done instead, compares it to what they did, and tries to figure out why. Even if you control for the amount of time they spend, the second type of player is going to become vastly better at chess.

Four improvements in social science research

Rob Wiblin: What do you think the state of play is these days regarding this social science reform movement?

Spencer Greenberg: Yeah, it is kind of depressing reading papers from the 70s, and being like, “Well, yep, they’re pointing out all the things that need to be fixed.” I think we still have a long way to go to make science better, a very long way to go. However, I do think there are glimmers of hope.

Spencer Greenberg: So one thing I’m seeing, and this is just anecdotal, is more data being shared. I see more researchers being like, “Here’s my data, go check it out.” And also more material sharing as well, like, “Here are the materials I used for this study.” The Open Science Framework has been a really positive force, where they really are encouraging people to share their materials in an open way. It’s a nice platform for helping you do that. So that’s really cool.

Spencer Greenberg: I think there are way more replication projects, where people are trying to replicate results, so that’s really great.

Spencer Greenberg: There’s also this idea of Registered Reports. Chris Chambers has really been an advocate for this. They’re quite an amazing idea, where basically they get journals to agree to accept a paper before the study has been run. So basically the journal knows exactly what the study is going to be, but they don’t yet know the results, nor do the researchers. The journal agrees to accept it, and then the research team goes and runs the study, and it gets published regardless of whether it’s a positive or negative result. And this is really nice, because it reduces the incentive to p-hack your results just to show some cool finding, because your paper is going to be published either way.

Spencer Greenberg: And I would say just generally more awareness, like increased scepticism, is probably helpful, because it means people know they’re going to be scrutinised a bit more for their research methods.

Biggest barriers to better social science research

Spencer Greenberg: I think currently something like 40% of papers in top social science journals don’t replicate, but it’s pretty dependent on what field it is. And I think ideally we should get that down to something like 15% not replicating, or something like that. You’re never going to get to zero, because there’s always things that could happen — it could be just bad luck or weird chance or stuff like that — but I think it’s just significantly too high a replication failure rate.

Spencer Greenberg: And the basic answer is that it’s an incentive problem, fundamentally. That is the super high-level answer. There’s interesting things to unpack there about what would it mean to make better incentives — but at the end of the day, if you’re a social scientist in academia, you need to get publications in top journals in order to stay in the field, and to get those tenure-track roles, and eventually to get tenure. And if you can’t get published in the top journals, you basically will get squeezed out.

Spencer Greenberg: So there’s kind of a double incentive whammy here.

Spencer Greenberg: One is that if you’re kind of doing fishy methods, you might have a competitive advantage over people who are really playing fairly, right? Because maybe the fishy methods let you publish more often. So that’s really, really bad.

Spencer Greenberg: And the second thing is that eventually you’re going to end up with a field that gets filled with the people that are doing the fishy methods, and then that becomes a norm. If you see other people doing fishy things and you’re like, “I guess that’s how research is done,” then that’s obviously going to have a negative effect on the whole field. And so one thing that’s really great about the open science movement is that by pushing back against these norms, it’s trying to create a new set of norms, a better set of norms.

Importance hacking

Rob Wiblin: Are there any issues with science practice that are a big deal that you think listeners to the show might not have heard so much about, or might not appreciate how important they are?

Spencer Greenberg: Yeah, absolutely. One really big one — that I think actually might be on the same order of magnitude as p-hacking in terms of how important it is, but is really not well known and doesn’t really have a standardised name — is something we call “importance hacking.”

Spencer Greenberg: You can get your result in a top journal by tricking the reviewers into thinking that it was a valuable or interesting finding when in fact it was essentially a valueless or completely uninteresting finding.

Spencer Greenberg: One example that comes to mind is this paper, “Testing theories of American politics: Elites, interest groups, and average citizens.” The basic idea of the paper was they were trying to see what actually predicts what ends up happening in society, what policies get passed. Is it the view of the elites? Is it the view of interest groups? Or is it the view of what average citizens want?

Spencer Greenberg: And they have a kind of shocking conclusion. Here are the coefficients that they report: Preference of average citizens, how much they matter, is 0.03. Preference of economic elites, 0.76. Oh, my gosh, that’s so much bigger, right? Alignment of interest groups, like what the interest groups think, 0.56. So almost as strong as the economic elites. So it’s kind of a shocking result. It’s like, “Oh my gosh, society is just determined by what economic elites and interest groups think, and not at all by average citizens,” right?

Spencer Greenberg: So this often happens to me when I’m reading papers. I’m like, “Oh, wow, that’s fascinating.” And then I come to like a table in Appendix 7 or whatever, and I’m like, “What the hell?” And so in this case, the particular line that really throws me for a loop is the R² number. The R² measures the percentage of variance that’s explained by the model. So this is a model where they’re trying to predict what policies get passed using the preferences of average citizens, economic elites, and interest groups. Take it all together into one model. Drum roll: what’s the R²? 0.07. They’re able to explain 7% of the variance of what happens using this information.

Rob Wiblin: OK, so they were trying to explain what policies got passed and they had opinion polls for elites, for interest groups, and for ordinary people. And they could only explain 7% of the variation in what policies got up? Which is negligible.

Spencer Greenberg: So my takeaway is that they failed to explain why policies get passed. That’s the result. We have no idea why policies are getting passed.
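For readers who want the definition: R² is one minus the ratio of the model’s squared prediction errors to the total variance of the outcome. Here’s a minimal sketch, with made-up data rather than the paper’s, of what an R² around 0.07 looks like:

```python
# Minimal R-squared computation (illustrative data only, not the paper's).
import numpy as np

def r_squared(y: np.ndarray, y_hat: np.ndarray) -> float:
    """R^2 = 1 - SS_residual / SS_total: the share of variance explained."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(1)
x = rng.normal(size=500)                # predictor, e.g. elite preferences
y = 0.27 * x + rng.normal(size=500)     # outcome only weakly driven by it
slope, intercept = np.polyfit(x, y, 1)  # fit a simple linear model
y_hat = slope * x + intercept
print(round(r_squared(y, y_hat), 2))    # roughly 0.07: ~7% of variance explained
```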

Identifying p-hackers

Rob Wiblin: Are there any techniques for tackling p-hacking that I might not have heard about?

Spencer Greenberg: Well, one that I just think is really cool is this technique developed by Simonsohn, Nelson, and Simmons called a p-curve analysis. The basic idea is you take a bunch of p-values that a researcher has published, either from one paper or from a bunch of their papers. And think about it as you’re making a histogram of all the p-values they found that were less than 0.05. So you’re kind of looking at the distribution of how often did they get different values: How often did they get ones between 0.04 and 0.05? How often did they get ones between 0.03 and 0.04?

Spencer Greenberg: Then we can think about what we should expect to see. So in a world where all of their results are false, but they don’t do any sketchy methods (it just happens that nothing they study actually exists), we should expect a uniform distribution: every p-value is equally likely. That just follows from the definition of a p-value: if there’s no real effect, p-values are uniformly distributed, flat. And p-curves just look at the values less than 0.05. So in that case, the histogram we’re talking about here would just be uniformly distributed between 0 and 0.05.

Spencer Greenberg: What if they lived in a world where they were doing everything cleanly — they weren’t using p-hacking — but some of their effects were real? How would that change the distribution? Well, real effects tend to produce low p-values — lower than chance alone would give — so you’re going to get a bump on the left side of the histogram. At around 0.01 or 0.02, you’re going to see a bulge: more results than that flat, uniform distribution would predict. So if you have a shape like that, it indicates that they are finding real effects that are not p-hacked.

Spencer Greenberg: So then what if they’re in a situation where they’re just p-hacking the heck out of the results, and mostly they’re false positives that they’re publishing? Well, if you think about what p-hacking is doing, it’s getting some results, many of which are greater than p=0.05, so you can’t really publish them in most journals. And then you’re either doing fishy things to get the p-value down, or you’re just throwing away the ones that happen to be just above 0.05 and you’re keeping the ones that happen to fall below it. And so what you get is a bulge of too many results on the right of the distribution, right around 0.05. So that means if you have a bulge on the left, you’re probably finding a bunch of real effects. If you have a bulge on the right, you’re probably finding a bunch of false effects.

Rob Wiblin: Wow. OK, interesting. So people, I guess they’ll find a particular researcher, or find some particular research topic, or a whole bunch of papers on some general theme, and then grab all the p-values and see what distribution they have: whether the bulge is towards 0 or the bulge is towards 0.05. And then they can say, does this literature as a whole have this problem, or does it not?

Spencer Greenberg: Exactly. You could do it on all the primary results of one researcher. You could do it on major results from a whole field. Potentially, if there’s one paper that had like seven studies in it, and they had a bunch of p-values per study, you could even try to do that. But there are some caveats. I mean, this is far from a perfect technique, but I think it’s a really innovative idea.

Rob Wiblin: If people were doing this on the regular, and you applied it to a specific researcher over the course of their entire career, then they’d know they can’t end up with this shape that has a bulge towards the 0.05 side. That would be really chastening for them, because it limits what they can do. Even though you can’t tell which specific papers have real results and which ones don’t, it places far more constraints on what they can publish within their entire body of work.

Spencer Greenberg: Yeah, I think if there was some kind of really strong norm where everyone had these curves published and everyone checked each other’s curves regularly, it could create an incentive like that.
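To see why those bulge shapes emerge, here’s a minimal simulation sketch in Python. It’s illustrative only, assuming simple two-sample t-tests, with optional stopping standing in for p-hacking; it is not Simonsohn, Nelson, and Simmons’s actual procedure:

```python
# P-curve intuition: the shape of significant p-values (p < 0.05)
# under three regimes (illustrative simulation, not the authors' method).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def sim_pvalues(effect: float, n_sims: int = 5000, n: int = 30) -> np.ndarray:
    """Two-sample t-tests with a given true effect size (in SD units)."""
    a = rng.normal(0.0, 1.0, (n_sims, n))
    b = rng.normal(effect, 1.0, (n_sims, n))
    return stats.ttest_ind(a, b, axis=1).pvalue

def sim_phacked(n_sims: int = 2000) -> np.ndarray:
    """Optional stopping under the null: keep adding subjects and re-testing,
    stopping as soon as p dips below 0.05 (one classic p-hacking move)."""
    ps = []
    for _ in range(n_sims):
        a = list(rng.normal(0, 1, 10))
        b = list(rng.normal(0, 1, 10))
        p = stats.ttest_ind(a, b).pvalue
        while p >= 0.05 and len(a) < 100:
            a.append(rng.normal())
            b.append(rng.normal())
            p = stats.ttest_ind(a, b).pvalue
        ps.append(p)
    return np.array(ps)

def p_curve(p: np.ndarray) -> np.ndarray:
    """Share of significant p-values in five bins: 0-0.01, ..., 0.04-0.05."""
    sig = p[p < 0.05]
    counts, _ = np.histogram(sig, bins=np.linspace(0, 0.05, 6))
    return (counts / counts.sum()).round(2)

print("no real effects (flat):      ", p_curve(sim_pvalues(0.0)))
print("real effects (left bulge):   ", p_curve(sim_pvalues(0.5)))
print("p-hacked nulls (right bulge):", p_curve(sim_phacked()))
```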

When to go with your gut

Spencer Greenberg: The FIRE Framework is all about when you should go with your gut. And basically, what I claim is that in these four cases, usually you should just go with your gut. It’s not worth going into deep reflection.

The first case is for fast decisions. The F in FIRE stands for “fast.” So imagine that you’re driving down the highway and suddenly a car going the wrong way in your lane is careening towards you. You don’t have time for reflection. You’ve just got to decide: you’re going to the left, you’re going to the right. What are you doing? And just act, right? So for a fast decision, you’ve got to go with your gut.

Spencer Greenberg: The second is the I in FIRE. It stands for “irrelevant” decisions. These are just very low-stakes decisions that really don’t matter, where using your reflection is probably just not worth the investment of your conscious mind. You could just be spending your time thinking about something more important. This would be like, you’re at the salad place, and you’re trying to decide, “Do I get carrots in my salad?” It’s like, does it really matter? If your gut tells you to get carrots, get carrots.

Spencer Greenberg: The R stands for “repetitious” decisions. So think about someone who’s a chess master, who’s played thousands of games of chess with feedback on how they did. That person is going to develop an incredible intuition for what is a good chess move. It doesn’t mean that they can’t spend the time thinking about it, but it means that they don’t necessarily have to.

There’s this amazing example of Magnus Carlsen playing chess against three people who are pretty good — they’re not like top, top people in the world, but three people who are pretty good. And he only gets a few seconds per move, so he really doesn’t even have time to reflect. And he beats them all. But the craziest part is he’s blindfolded, so he has to keep all three boards in his mind simultaneously and make each move within a few seconds. Think about how ridiculous that is.

So at some point, you’ve done something enough that your intuition is just built up. I think really the key here is intuition is not magic. Sometimes people treat intuition as magic: “Your gut knows all these things.” No, your intuition has to learn, right? So if you’re in an environment where you’ve done a type of decision many times — and, really key thing, you’ve gotten feedback, so your intuition was able to update — then you can often trust your gut.

Spencer Greenberg: The E is for “evolutionary” decisions. There are certain things that are hardcoded in animals — and, people being animals, they’re hardcoded in us. They’re not always right, but they’re pretty good heuristics. Like if you are looking at a piece of food and it smells horrible, just don’t eat it. It’s just not worth it. It might make you really sick.


About the show

The 80,000 Hours Podcast features unusually in-depth conversations about the world's most pressing problems and how you can use your career to solve them. We invite guests pursuing a wide range of career paths — from academics and activists to entrepreneurs and policymakers — to analyse the case for and against working on different issues and which approaches are best for solving them.

The 80,000 Hours Podcast is produced and edited by Keiran Harris. Get in touch with feedback or guest suggestions by emailing [email protected].
