The Next-Gen LLM Might Pose an Existential Threat
I’m pretty sure that the next generation of LLMs will be safe. But the risk is still high enough to make me uncomfortable.
How sure are we that scaling laws are correct? Researchers have drawn curves predicting how AI capabilities scale with the resources that go into training. If you extrapolate those curves, it looks like the next generation of LLMs won’t be wildly more powerful than the current one. But maybe there’s a weird bump in the curve somewhere between GPT-5 and GPT-6 (or between Claude 4.5 and Claude 5), and LLMs suddenly become much more capable in a way that scaling laws didn’t predict. I don’t think we can be more than 99.9% confident that there isn’t.
How sure are we that current-gen LLMs aren’t sandbagging (that is, deliberately hiding their true skill level)? I think they’re still dumb enough that their sandbagging can be caught, and indeed they have been caught sandbagging on some tests. I don’t think LLMs are hiding their true capabilities in general, and our understanding of AI capabilities is probably pretty accurate. But I don’t think we can be more than 99.9% confident about that.
How sure are we that the extrapolated capability level of the next-gen LLM isn’t enough to take over the world? It probably isn’t, but we don’t really know what level of capability is required for something like that. I don’t think we can be more than 99.9% confident.
Perhaps we can be >99.99% confident that the next-gen LLM, at its extrapolated capability level, will still not be as smart as the smartest human. But an LLM has certain advantages over humans: it can work faster (at least on many sorts of tasks), it can copy itself, and it can operate computers in a way that humans can’t.
Alternatively, GPT-6/Claude 5 might not be able to take over the world, but it might be smart enough to recursively self-improve, and that might happen too quickly for us to do anything about it.
How sure are we that we aren’t wrong about something else? I thought of three ways we could be disastrously wrong:
- We could be wrong about scaling laws;
- We could be wrong that LLMs aren’t sandbagging;
- We could be wrong about what capabilities are required for AI to take over.
But we could be wrong about some entirely different thing that I didn’t even think of. I’m not more than 99.9% confident that my list is comprehensive.
On the whole, adding up those four 0.1% uncertainties, I don’t think we can say there’s less than a 0.4% chance that the next-gen LLM forces us down a path that inevitably ends in everyone dying.
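The 0.4% figure matches what you get by adding up the four 0.1% uncertainties above. Here is a rough sketch of that arithmetic in Python. It rests on assumptions the post doesn't spell out: that each way of being wrong has a probability of exactly 0.1%, and that any one of them alone is enough for disaster. The dictionary labels and function names are made up for illustration. The sketch compares the simple sum with the result you'd get if the four failure modes were statistically independent.

```python
# Assumption: each of the four ways of being wrong has probability 0.1%,
# and any single one is treated as sufficient for catastrophe.
failure_probs = {
    "scaling laws are wrong": 0.001,
    "LLMs are sandbagging": 0.001,
    "takeover needs less capability than we think": 0.001,
    "something not on the list": 0.001,
}


def union_bound(probs):
    """Upper bound on P(at least one failure): the plain sum, capped at 1."""
    return min(1.0, sum(probs))


def independent_any(probs):
    """P(at least one failure) if the failure modes are independent."""
    p_none = 1.0
    for p in probs:
        p_none *= 1.0 - p
    return 1.0 - p_none


summed = union_bound(failure_probs.values())
independent = independent_any(failure_probs.values())

print(f"simple sum:        {summed:.4%}")       # 0.4000%
print(f"if independent:    {independent:.4%}")  # 0.3994%
```

At probabilities this small the two numbers are nearly identical, so it barely matters whether the failure modes overlap or are independent. What matters far more is whether 0.1% is the right number for each one.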
