Transcript
The AI myth that moved global markets [00:00:00]
Rob Wiblin: If you were following technology news in August last year, you almost certainly heard about this MIT study showing that 95% of generative AI pilots at companies were failing. This result was big enough to contribute to a Nasdaq selloff. People worked hard to come up with sophisticated explanations for how something this crazy could be true. And it was repeated by Forbes, Axios, The Hill, Harvard Business Review, and dozens of others, becoming a staple of elite opinion and one of the most enduring and widely cited statistics in the “AI is overhyped” backlash.
The problem? The study behind these headlines is incredibly weak, worse than you could imagine. And that headline is also a completely incorrect description of what it found — even taking the study entirely on its own terms.
The story behind this study will demonstrate that whenever you see a juicy headline, even one with an attractive conclusion, and even one purporting to come from MIT, it might just be complete nonsense.
The math was totally wrong [00:00:52]
The most important thing to know is that this report did not show that 95% of generative AI pilots at companies are failing, as almost all journalists claimed. Rather, the report found that, of all the organisations surveyed:
- 60% had investigated custom enterprise AI tools.
- 20% had gotten to the point of actually doing some pilot project with them.
- And 5% — of the total — had gone on to successfully deploy those tools in production.
So 80% of companies simply never piloted any custom, task-specific generative AI. Saying that 95% of them were failing is like saying 95% of Tinder users have failing marriages, when 80% of the people you’re talking about have never even gone on a date in the first place!
Moreover, according to their own survey, the primary reason why pilots didn’t progress to deployment wasn’t that they were going badly, but just the very familiar and generic “organisational unwillingness to adopt new tools.”
The bar for success was insanely high [00:01:46]
Now, the media is definitely at fault for putting a completely incorrect number in their headlines. But the report itself also makes this mistake about its own graph, referring to a 95% failure rate for enterprise AI solutions. Wrong: these results actually show that among the 20% who did in fact pilot a custom AI tool, about 25%, a quarter, were successful. That’s not really a low strike rate for a pilot project, and it’s a success rate five times higher than what everyone was told about.
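To make the arithmetic concrete, here’s a minimal sketch of the funnel using the report’s own rounded percentages (just arithmetic on the numbers already quoted above; the point is that the “95%” figure is calculated against all companies surveyed, not against the companies that actually ran a pilot):

```python
# Funnel reported by the study (rounded shares of ALL organisations surveyed).
investigated = 0.60   # looked into custom enterprise AI tools
piloted      = 0.20   # got as far as actually running a pilot
deployed     = 0.05   # reached a "successful" production deployment

# The headline treated "failure" as a share of all organisations:
headline_failure = 1 - deployed       # 0.95 -> the "95% fail" claim

# But the meaningful denominator is organisations that actually piloted:
pilot_success = deployed / piloted    # 0.25 -> a quarter of pilots succeeded
pilot_failure = 1 - pilot_success     # 0.75

print(f"Orgs without a successful deployment (all orgs):   {headline_failure:.0%}")
print(f"Success rate among orgs that actually piloted:     {pilot_success:.0%}")
```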
And in reality, a 25% success rate is actually very impressive, once you appreciate the bar a project had to clear to count as a success in this study. To qualify as a success, an AI application had to show a “marked and sustained” productivity or profit-and-loss impact within six months. They don’t define “marked” or “sustained,” but a marked and sustained improvement in profitability or productivity within six months is obviously a high bar for any new project to clear. It’s widely understood that enterprise tech deployments often take years to show bottom-line impacts, even if they’re going quite well.
And notice that, by this standard:
- An AI project that merely breaks even: that’s a failure.
- A project that has benefited the company in some way that hasn’t yet markedly affected productivity or profits: a failure.
- And a project that’s on track to be profitable next year, but isn’t yet: equally a failure.
Do you get the sense that maybe these authors would prefer to find that these projects aren’t working? Well, we’ll come back to that.
But another key thing to keep in mind is that these projects were running on 2024 AI models — the ones that couldn’t figure out that a marble in a cup would fall out if you turned the cup upside down. AI was just hot garbage back then compared to what we have access to today. That a quarter of projects could easily turn a profit with models like that is actually kind of remarkable, if it is true.
The study ignores its own best finding [00:03:28]
But we haven’t even gotten to the weirdest thing about how this study’s results were described, because all the numbers we’ve been talking about so far refer exclusively to custom, task-specific AIs that companies develop or procure for some specific narrow use case.
But that’s not the most common, or indeed the best, way to use artificial intelligence. Most of us just use ChatGPT or Claude or Gemini to get our work done faster, or do it to a higher level of quality. And indeed, the report found that staff at over 90% of companies surveyed regularly use generative AI for their work tasks, in many cases multiple times a day.
So the headline result should actually be that a quarter of custom applications of AI rapidly turn a profit, and that almost all workers at the companies surveyed are using personal AI tools somewhere between regularly and constantly.
But for some reason, the report is decidedly unimpressed by these uses of AI, and it makes the comment that “these tools primarily enhance individual productivity, not profit and loss performance.” I may not be a tenured professor of business, but if your staff are each individually more productive, doesn’t that mean you can sell more products while hiring fewer staff, at least if you’re competently managing your organisation? And wouldn’t that provide some sort of opportunity to improve your profitability? I mean, giving a delivery driver a faster van only enhances their individual delivery speed, but does that mean that faster deliveries wouldn’t impact a company’s bottom line?
The sample was tiny [00:04:49]
A final weakness of this report is that while the Fortune article that launched this study into the mainstream claimed that the research was based on 150 interviews with leaders and a survey of 350 employees, the paper itself reveals that it’s really based on 52 interviews and a paper survey of another 153 people.
Why did Fortune inflate the sample size two- to threefold, prompt dozens of other news outlets to do the same, and never correct the record? I’ll discuss my theory for that next.
But the small sample size really matters, because it means the uncertainty intervals on their results are actually huge. The finding that 5% of companies have successful custom AI projects appears to be based just on those 52 interviews — which means that that 5% number is probably based on something like two or three actual companies out of 52.
Flip one or two interviews the other way and you’d get a completely different headline. The true rate could easily be three times higher or three times lower. You just can’t tell from a sample this tiny. Not that you’ll learn that by reading the report — or anything journalists wrote when covering it, for that matter.
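As a rough sketch of how wide that uncertainty is: if the 5% figure really does come from roughly three successful companies out of 52 interviews (my assumption for illustration; the report doesn’t publish the raw counts), a standard Wilson 95% confidence interval runs from about 2% to about 16%:

```python
from math import sqrt

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion (z=1.96 -> ~95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return centre - half_width, centre + half_width

# Assumption for illustration: ~3 "successful" companies out of 52 interviews,
# which is roughly what a 5% rate implies. The report doesn't give raw counts.
low, high = wilson_interval(3, 52)
print(f"Point estimate: {3/52:.1%}, 95% CI roughly {low:.1%} to {high:.1%}")
```

In other words, on that assumption the data are consistent with a true rate anywhere from roughly a third of the reported figure to nearly three times it.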
The report wasn’t even available when it went viral [00:05:55]
But things get worse. Not only was this paper not peer-reviewed or scrutinised by external experts, it wasn’t even available to the journalists around the world trying to cover the story, so they couldn’t describe its claims precisely or think through whether they actually made sense. In fact, when that Fortune story went mega viral, the report describing the methods and results wasn’t publicly available anywhere.
The link to the underlying report, I kid you not, took you to a Google form where you could specifically request the PDF by giving them your personal details and explaining why you were interested in their organisation and how you planned to use the agentic web. It still takes you to that form even now.
The bottom line is that this survey went viral around the world, moved markets, and was already on track to hugely shape AI discourse, telling a lot of people something that they desperately wanted to believe — before the media or the public or policymakers could read it and see that its headline result might be based on not even a handful of custom AI deployments. That’s media incentives for you.
The hidden conflicts of interest [00:06:58]
So if it’s not a great or even a good piece of research, why did it become one of the most widely cited things written about AI in 2025? Surely a big part of the explanation has to be that it has the MIT label on it.
Like the Fortune article that made it famous, everyone, including me, came to refer to it just as “the MIT study.” We immediately took the results super seriously, because in our heads we’re picturing a serious, peer-reviewed paper coming out of MIT’s business or management school, or at least a project conducted by people who specialise in doing this sort of social science research.
But no. There’s no indication that it was ever intended to be an academic paper that would go through peer review. And the authors are an MIT professor and a postdoctoral fellow, both now working on AI agent frameworks; a product manager at Microsoft who works on AI agents; and a startup founder also working on developing and commercialising agentic AI systems.
And that raises another really problematic aspect of this. One of the report’s main conclusions is that AI tools aren’t flourishing in business because they lack learning, memory, and contextual adaptation. And it says that the solution to that is agentic AI frameworks — coincidentally, exactly the kind of thing that they’re all either currently developing or trying to sell.
The report then specifically names NANDA, the project that all four of them are involved with one way or another, as one of the best paths forward to solving the problem that they’ve just identified.
There are many other places in this report where the authors’ interpretation of their survey results made extremely little sense to me, but consistently seemed to lead to the conclusion that their AI agent frameworks are really essential for businesses. Oh, and keep in mind too that the evidence shows that they should definitely be bought from an external organisation, not developed internally.
We’ll list some of the most striking examples in a document linked in the video description, rather than go through them all here.
But the bottom line is that this group published a research report with what I think is at best a strained interpretation of their data, concluding that current AI is failing and that the solution is exactly the technology they themselves are building and selling. And this was marketed under the MIT brand with no conflict-of-interest disclosure, just a note that it reflects their views and not those of their employers.
Now, to be fair, I’m sure these people genuinely believe that what they’re building is useful and will help businesses adopt AI. And they might well be right about that. They’re even probably right about that. But what we’ve got here is a very different beast from what journalists, the public, and investors had in their minds when they were told an MIT study had demonstrated that AI is completely failing to help businesses.
The real lesson [00:09:27]
As you can see, the real lesson here isn’t one about artificial intelligence. This study isn’t good enough to teach us anything new about that one way or the other.
It’s a story about how a confusing report — based on 52 interviews and 153 survey results, opaque and very questionable data analysis, undisclosed conflicts of interest, and a remarkably convenient conclusion — could get the MIT stamp of approval, go viral through Fortune, move the Nasdaq, and become conventional wisdom remembered by literally tens of millions of people, before anyone could even read the study. That should worry us, whatever you think of AI.