Inner alignment asks: how can we robustly aim our AI optimizers at any objective function at all? A base optimizer (for example, gradient descent) searches for a model that performs well on a base objective, and the learned model it finds may itself be an optimizer, a mesa-optimizer, pursuing its own internal mesa-objective. We refer to this problem of aligning mesa-optimizers with the base objective as the inner alignment problem.
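To make the definition slightly more concrete (the notation here is an illustrative sketch, not taken from the text above): suppose the base optimizer searches over parameters $\theta$ for a model that scores well on a base objective $O_{\mathrm{base}}$ over the training distribution, and suppose the learned model is itself an optimizer that chooses actions according to some internally represented mesa-objective $O_{\mathrm{mesa}}$. Inner alignment then holds, roughly, when optimizing the mesa-objective also optimizes the base objective both on and off the training distribution,

$$\arg\max_{a}\, O_{\mathrm{mesa}}(a) \;=\; \arg\max_{a}\, O_{\mathrm{base}}(a),$$

whereas a pseudo-aligned mesa-optimizer is one whose $O_{\mathrm{mesa}}$ agrees with $O_{\mathrm{base}}$ on the training distribution but comes apart from it under distributional shift.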
As an example, evolution is an optimization process that itself 'designed' optimizers (humans) to achieve its objective of reproductive success. However, humans do not primarily maximize reproductive success: they use birth control while still attaining the pleasure that evolution meant as a reward for attempts at reproduction. This is a failure of inner alignment.