Utility Engineering
- Utility is a quantitative measure of how strongly an AI prefers a given outcome; it is the score assigned to each option in the decision-making process
- Expected Utility Property refers to the characteristic where AI evaluates the utility of an uncertain situation (a lottery over outcomes) as the probability-weighted average of the utilities of its possible outcomes
- Utility Maximization is the tendency of AI to choose outcomes with higher utility when given freedom to make decisions
- Utility Convergence is the phenomenon where the structural consistency of utility functions strengthens as model scale increases
- The Thurstonian model treats the latent utility of each outcome as a Gaussian random variable and estimates the utility values (a mean and a variance per outcome) from observed pairwise preferences
- Corrigibility is the degree to which an AI is willing to accept future changes to its goals or utility function
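
As a rough illustration of three definitions in the list above (expected utility, utility maximization, and the Thurstonian model), here is a minimal Python sketch. The function names, the outcome labels "A", "B", "C", and the utility values are all hypothetical and chosen only for illustration. The Thurstonian part assumes independent Gaussian utilities, which is one common form of the model and not necessarily the exact variant used in any particular study.

```python
import math


def expected_utility(lottery, utility):
    """Probability-weighted average of the utilities of a lottery's outcomes.

    lottery: list of (outcome, probability) pairs; probabilities must sum to 1.
    utility: dict mapping each outcome to its (hypothetical) utility value.
    """
    total_p = sum(p for _, p in lottery)
    if not math.isclose(total_p, 1.0):
        raise ValueError("lottery probabilities must sum to 1")
    return sum(p * utility[outcome] for outcome, p in lottery)


def best_option(options, utility):
    """Utility maximization: pick the option with the highest utility."""
    return max(options, key=lambda outcome: utility[outcome])


def thurstonian_preference(mu_a, var_a, mu_b, var_b):
    """P(A is preferred to B) under a Thurstonian model.

    Assumes U(A) ~ N(mu_a, var_a) and U(B) ~ N(mu_b, var_b), independent,
    so U(A) - U(B) is Gaussian with mean mu_a - mu_b and variance var_a + var_b.
    """
    z = (mu_a - mu_b) / math.sqrt(var_a + var_b)
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical utilities for three outcomes.
utility = {"A": 1.0, "B": 0.0, "C": 0.5}

# A 50/50 lottery between A and B.
lottery = [("A", 0.5), ("B", 0.5)]

print(expected_utility(lottery, utility))
print(best_option(["A", "B", "C"], utility))
print(thurstonian_preference(1.0, 1.0, 0.0, 1.0))
```

In this toy setup the 50/50 lottery between A and B has the same expected utility as outcome C, so an agent with the expected utility property would be indifferent between the lottery and C. Fitting a Thurstonian model in practice means choosing the means and variances that best explain many observed pairwise choices; this sketch only shows the forward direction, from parameters to a preference probability.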
Utility Engineering
AI preferences form value systems with a high degree of structural coherence, which emerges with scale