Alignment Bootstrapping Is Dangerous
AI companies want to bootstrap weakly-superhuman AI to align superintelligent AI. I don’t expect them to succeed. I could give various arguments for why alignment bootstrapping is hard and why AI companies are ignoring the hard parts of the problem; but you don’t need to understand any details to know that it’s a bad plan.
When AI companies say they will bootstrap alignment, they are admitting defeat on solving the alignment problem, and saying that instead they will rely on AI to solve it for them. So they're facing a problem of unknown difficulty, but where the difficulty is high enough that they don't think they can solve it. And to remedy this, they will use a technique that has never before been used in history: counting on weakly-superhuman AI to do the bulk of the work.
If they mess up and this plan doesn’t work, then superintelligent AI kills everyone.
And they think this is an acceptable plan, and that it is acceptable for them to build up to human-level AI or beyond on the basis of it.
What?
It takes remarkable hubris to believe that a problem is this hard, to believe that humanity's survival depends on getting the right solution, and yet to be this confident that it will be solved.
If you don’t know how hard a problem is, then it’s harder than you think.
If you plan on using a technique that’s never been used before, then that technique is less effective than you think.
If you have a problem of unknown difficulty that you want to solve using unknown methods, and you don’t know how you will develop those methods, and failure would be catastrophic, then you shouldn’t do that.
Imagine if NASA wanted to land on the moon and they were trying to figure out how to make rocket fuel, but metalworking hadn't been invented yet, so all their rockets were made of wood. And they said, we are working on figuring out how to make some material that won't get incinerated by rocket fuel; no, we don't know what that material is, and we have no theory of how to make it; but don't worry, in 2020 we only had maple and today we are using oak, so we're making good progress.
This would not be an acceptable plan for solving a medium-stakes problem. It is certainly not an acceptable approach when a failure would destroy everything that matters in the world.
A steelman of this position would be something like:
Yes, our plan has a good chance of failing and killing everyone. But if we don’t build ASI using alignment bootstrapping, some other company will build ASI using even worse techniques, and we’re even more likely to die. So building ASI this way is our best option, even though it’s extremely risky.
Some people believe this, and a small number of individuals have said things to this effect. I think they're still wrong, but at least I get it.
But to my knowledge, no one has ever said this in their capacity as a person directly working on ASI development or alignment.
I don't respect it when AI companies publish their roadmaps for handling alignment and at no point do those roadmaps say anything like:
This is a bad plan that has an unacceptably high risk of killing everyone. We’d much prefer to coordinate to slow down and take our time. We would support a global halt on developing ASI until it can be proven safe; but until such time as that happens, we will continue building ASI using our least-bad plan.
The fact that they don't say this makes coordination more difficult; the assumption that coordination is impossible becomes a self-fulfilling prophecy. Concealing the difficulty of the alignment problem actively contributes to a situation in which the wider world does not take AI risk seriously, and safety-minded developers feel forced to follow a dangerous plan as the least-bad option.
Every major safety proposal by an AI company should start with a disclaimer like “This is a frighteningly risky plan that we are not at all confident in, but it’s our best option due to the lack of widespread agreement about the importance of AI risk.” And they should be simultaneously pushing for global regulations so that they no longer have to take this dangerous route.