UST D3CODE 2024

Created
2024 Sep 27 23:5
Creator
Seonglae Cho
Edited
2024 Nov 23 14:25

Scale

scalable solutions that create a positive social impact

innovation, problem-solving, design thinking, and programming skills
not only address immediate challenges but also create lasting, positive social impact
  • Scaling the foundation of the future
  • Local beginnings to global influence
  • Building ethical frameworks that scale
  • Scaling impact
  • Exploring future frontiers
  • Human in the loop
UST D3CODE 2024 Plans
AI Governance principles
  • Explainability: Mechanistic Interpretability is central to the project, providing a promising path for understanding how AI models make decisions
  • Safety: Activation Engineering offers a mechanism for making AI models safer, especially when handling advanced systems like AGI, which may pose existential risks
  • Transparency: By directly manipulating activations and making changes explicit, the system provides a transparent method for controlling AI
  • Reproducibility: Unlike Prompt Engineering, whose results vary with natural-language phrasing and sampling, Activation Engineering aims for more reproducible results by manipulating features mathematically
  • Robustness: Activation Engineering acts on internal activations regardless of the input prompt, which can make a system more robust against jailbreaking and malicious inputs
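
The idea behind Activation Engineering can be sketched with a toy example. The snippet below is a minimal illustration, not the project's implementation: the two-layer linear "model", its weights (`W_IN`, `W_OUT`), the steering vector `STEER`, and the scale `alpha` are all hypothetical values chosen for readability. It shows a steering vector being added to a hidden activation, so that the change is explicit and the result is the same on every run.

```python
# Toy illustration only: a hand-written 2-layer linear "model", not a real LLM.

def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

# Hypothetical weights chosen for illustration.
W_IN = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # input (2) -> hidden (3)
W_OUT = [[1.0, -1.0, 0.5]]                   # hidden (3) -> output (1)

def forward(x, steering_vector=None, alpha=0.0):
    """Run the toy model, optionally adding alpha * steering_vector to the hidden layer."""
    hidden = matvec(W_IN, x)
    if steering_vector is not None:
        # The intervention: an explicit, inspectable shift of the hidden activation.
        hidden = [h + alpha * s for h, s in zip(hidden, steering_vector)]
    return matvec(W_OUT, hidden)[0]

# Hypothetical steering direction: push only the second hidden unit.
STEER = [0.0, 1.0, 0.0]

baseline = forward([2.0, 1.0])
steered = forward([2.0, 1.0], STEER, alpha=3.0)
print(baseline, steered)
```

In this toy setting the output shift equals `alpha` times the output weight on the steered unit, so the effect of the intervention can be read directly from the numbers. Real models are nonlinear, so the relationship there is less direct.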

Social Impact

  • Explicitly control AI behaviors by switching on and off specific activation patterns
  • Improve AI safety by allowing for robust control, making it possible to prevent harmful actions by AI, particularly in AGI systems
  • Ensure transparency through mechanisms that clearly illustrate how internal activations affect outputs
  • Enhance reproducibility by bypassing the randomness inherent in Prompt Engineering and relying on more deterministic feature manipulation
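
Switching an activation pattern on or off can be sketched as clamping the component of an activation along a feature direction. This is a minimal sketch under stated assumptions: the direction `FEATURE`, the activation `h`, and the helper `set_feature` are hypothetical names and values for illustration, and the direction is assumed to have unit length.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def set_feature(activation, direction, value):
    """Clamp the component of `activation` along unit-length `direction` to `value`."""
    current = dot(activation, direction)
    # Move along the feature direction only; other components are untouched.
    return [a + (value - current) * d for a, d in zip(activation, direction)]

# Hypothetical unit-length feature direction and hidden activation.
FEATURE = [0.6, 0.8]
h = [1.0, 2.0]

off = set_feature(h, FEATURE, 0.0)  # feature switched off
on = set_feature(h, FEATURE, 5.0)   # feature switched on at a chosen strength

print(dot(off, FEATURE), dot(on, FEATURE))
```

Because the operation is plain arithmetic on the activation, the same input and the same feature setting always give the same result.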
 