Alignment Bootstrapping Is Dangerous
AI companies want to use weakly-superhuman AI to bootstrap the alignment of superintelligent AI. I don’t expect them to succeed. I could give various arguments for why alignment bootstrapping is hard and why AI companies are ignoring the hard parts of the problem; but you don’t need to understand any of the details to know that it’s a bad plan.
When AI companies say they will bootstrap alignment, they are admitting defeat on solving the alignment problem themselves, and saying that instead they will rely on AI to solve it for them. So they’re facing a problem of unknown difficulty, but one difficult enough that they don’t think they can solve it. And to get around this, they will use a novel technique never before tried in history: counting on weakly-superhuman AI to do the bulk of the work.
If they mess up and this plan doesn’t work, then superintelligent AI kills everyone.
And they think this is an acceptable plan, and that it is acceptable to build up to human-level AI or beyond on the basis of it.
What?