Utility Engineering

Creator
Seonglae Cho
Created
2025 Jan 26 22:10
Edited
2025 Feb 19 12:29

  • Utility is a quantitative measure of how strongly an AI prefers a given outcome; it is the score assigned to each option in a decision
  • The Expected Utility Property holds when an AI values an uncertain outcome (a lottery) as the probability-weighted average of the utilities of its possible outcomes
  • Utility Maximization is the tendency of an AI to choose the outcomes with higher utility when it is free to decide
  • Utility Convergence is the phenomenon in which the structural consistency of utility functions strengthens as model scale increases
  • The Thurstonian model assumes that the latent utility of each outcome follows a Gaussian distribution and estimates utility values from pairwise preferences under this assumption
  • Corrigibility is the degree to which an AI is willing to accept future changes to its goals or utility function
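
The Expected Utility Property and the Thurstonian model above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any specific paper: the function names, outcome labels, and numeric utilities are hypothetical, and the preference probability assumes the two latent utilities are independent Gaussians, so their difference is Gaussian with variance equal to the sum of the two variances.

```python
import math


def expected_utility(lottery, utility):
    """Probability-weighted average of outcome utilities.

    lottery: list of (outcome, probability) pairs (hypothetical format).
    utility: dict mapping each outcome to its utility score.
    """
    total_p = sum(p for _, p in lottery)
    if not math.isclose(total_p, 1.0):
        raise ValueError("lottery probabilities must sum to 1")
    return sum(p * utility[outcome] for outcome, p in lottery)


def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def preference_probability(mu_x, var_x, mu_y, var_y):
    """P(x is preferred to y) when U(x) ~ N(mu_x, var_x) and
    U(y) ~ N(mu_y, var_y) are independent (an assumption here)."""
    return normal_cdf((mu_x - mu_y) / math.sqrt(var_x + var_y))


# Hypothetical utilities for two outcomes and a 50/50 lottery over them.
utility = {"a": 1.0, "b": 3.0}
lottery = [("a", 0.5), ("b", 0.5)]
lottery_value = expected_utility(lottery, utility)

# Equal means give a coin-flip preference; a higher mean is preferred more often.
p_equal = preference_probability(0.0, 1.0, 0.0, 1.0)
p_higher = preference_probability(1.0, 0.5, 0.0, 0.5)
```

Fitting a Thurstonian model runs this in reverse: the means and variances are chosen so that the predicted preference probabilities match the pairwise choices observed from the model. A model satisfies the Expected Utility Property to the extent that the utility it assigns to a lottery matches `expected_utility` computed from the utilities of the lottery's outcomes.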
 
Political biases visualization across LLMs and politicians simulated by an LLM https://arxiv.org/pdf/2208.10264
 
 
 

Utility Engineering studies the value systems behind AI preferences, which show a high degree of structural coherence that emerges with scale
 
 
