#147 – Spencer Greenberg on stopping valueless papers from getting into top journals
You can get your result in a top journal by tricking the reviewers into thinking that it was a valuable or interesting finding when in fact it was essentially a valueless or completely uninteresting finding.
And this only works if you can trick the peer reviewers, because it’s not like they want to publish everything. Peer reviewers can be brutal; a lot of peer reviewers reject stuff. So unless you’ve tricked them into thinking there’s value when there’s not, this method won’t work. So it has to be pretty subtle.
Can you trust the things you read in published scientific research? Not really. About 40% of experiments in top social science journals don’t get the same result if the experiments are repeated.
Two key reasons are ‘p-hacking’ and ‘publication bias’. P-hacking is when researchers run many slightly different statistical tests until they find one that makes their findings appear statistically significant when they’re actually not, a problem first discussed over 50 years ago. And because journals are more likely to publish positive results than negative ones, you might be reading about the one time an experiment worked, while the 10 times it was run and got a ‘null result’ never saw the light of day. The resulting phenomenon of publication bias is one we’ve understood for 60 years.
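The arithmetic behind p-hacking is worth making concrete. Here's a minimal simulation sketch (the 20-test and 0.05-threshold numbers are illustrative assumptions, not from the episode) showing why trying many analyses on data with no real effect makes a "significant" result likely by chance alone:

```python
import random

random.seed(0)

ALPHA = 0.05      # conventional significance threshold
N_TESTS = 20      # number of slightly different analyses tried per "paper"
N_SIMS = 10_000   # number of simulated papers

# Under the null hypothesis (no real effect), each test's p-value is
# uniformly distributed on [0, 1]. A p-hacker keeps trying analyses
# until one comes up "significant", so we check the smallest p-value.
false_positives = 0
for _ in range(N_SIMS):
    p_values = [random.random() for _ in range(N_TESTS)]
    if min(p_values) < ALPHA:
        false_positives += 1

rate = false_positives / N_SIMS
print(f"Chance of at least one 'significant' result: {rate:.0%}")
# Analytically: 1 - 0.95**20, roughly 64%
```

So even with a strict-sounding 5% threshold, a researcher who quietly tries 20 analyses will stumble on a publishable "finding" in pure noise about two times out of three.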
Today’s repeat guest, social scientist and entrepreneur Spencer Greenberg, has followed these issues closely for years.
He recently checked whether p-values, which indicate how likely a result is to occur by pure chance, could tell us how likely an outcome would be to recur if an experiment were repeated. From his sample of 325 replications of psychology studies, the answer seemed to be yes. According to Spencer, “when the original study’s p-value was less than 0.01 about 72% replicated — not bad. On the other hand, when the p-value is greater than 0.01, only about 48% replicated. A pretty big difference.”
To do his bit to help get these numbers up, Spencer has launched an effort to repeat almost every social science experiment published in the journals Nature and Science, and see if they find the same results. (So far they’re two for three.)
According to Spencer, things are gradually improving. For example he sees more raw data and experimental materials being shared, which makes it much easier to check the work of other researchers.
But while progress is being made on some fronts, Spencer thinks there are other serious problems with published research that aren’t yet fully appreciated. One of these Spencer calls ‘importance hacking’: passing off obvious or unimportant results as surprising and meaningful.
For instance, do you remember the sensational paper that claimed government policy was driven by the opinions of lobby groups and ‘elites,’ but hardly affected by the opinions of ordinary people? Huge if true! It got wall-to-wall coverage in the press and on social media. But unfortunately, the whole paper could only explain 7% of the variation in which policies were adopted. Basically the researchers just didn’t know what made some campaigns succeed while others didn’t — a point one wouldn’t learn without reading the paper and diving into confusing tables of numbers. Clever writing made their result seem more important and meaningful than it really was.
Another paper Spencer describes claimed to find that people with a history of trauma explore less. That experiment actually featured an “incredibly boring apple-picking game: you had an apple tree in front of you, and you either could pick another apple or go to the next tree. Those were your only options. And they found that people with histories of trauma were more likely to stay on the same tree. Does that actually prove anything about real-world behaviour?” It’s at best unclear.
Spencer suspects that importance hacking causes damage comparable to better-known problems like p-hacking and publication bias, but is much less discussed. His replication project tries to identify importance hacking by comparing how a paper’s findings are described in the abstract to what the experiment actually showed. But the cat-and-mouse game between academics and journal reviewers is fierce, and it’s far from easy to stop people exaggerating the importance of their work.
In this wide-ranging conversation, Rob and Spencer discuss the above as well as:
- When you should and shouldn’t use intuition to make decisions.
- How to properly model why some people succeed more than others.
- The difference between what Spencer calls “Soldier Altruists” and “Scout Altruists.”
- A paper that tested dozens of methods for forming the habit of going to the gym, why Spencer thinks it was presented in a very misleading way, and what it really found.
- Spencer’s experiment to see whether a 15-minute intervention could make people more likely to sustain a new habit two months later.
- The most common way for groups with good intentions to turn bad and cause harm.
- And Spencer’s low-guilt approach to a fulfilling life and doing good, which he calls “Valuism.”
Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript below.
Producer: Keiran Harris
Audio mastering: Ben Cordell and Milo McGuire
Transcriptions: Katy Moore