Model Diffing

Creator
Seonglae Cho
Created
2025 Jan 15 15:37
Edited
2025 Apr 21 10:45
Model diffing is a method for precisely comparing two neural networks, or two versions of the same model, in terms of their internal representations or their functional behavior.
  • Diffing models as a way to make safety auditing easier
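
As a rough illustration of comparing internal representations, the sketch below runs two versions of a tiny ReLU MLP on the same probe inputs and reports the mean cosine similarity of their activations at each layer. Everything here is an assumption made for illustration: the toy models `base` and `tuned`, the helper names `forward`, `cosine`, and `diff_models`, and the choice of cosine similarity as the comparison metric. It is not a method taken from this note.

```python
import math

def forward(layers, x):
    """Run a tiny MLP (list of weight matrices) and return each layer's activations."""
    acts = []
    for i, w in enumerate(layers):
        x = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
        if i < len(layers) - 1:  # ReLU on hidden layers only
            x = [max(0.0, v) for v in x]
        acts.append(x)
    return acts

def cosine(a, b):
    """Cosine similarity; two all-zero vectors count as identical."""
    na = math.sqrt(sum(v * v for v in a))
    nb = math.sqrt(sum(v * v for v in b))
    if na == 0.0 or nb == 0.0:
        return 1.0 if na == nb else 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def diff_models(model_a, model_b, probes):
    """Mean per-layer activation similarity of two models on shared probe inputs."""
    sims = [0.0] * len(model_a)
    for x in probes:
        pairs = zip(forward(model_a, x), forward(model_b, x))
        for i, (ha, hb) in enumerate(pairs):
            sims[i] += cosine(ha, hb) / len(probes)
    return sims

# Hypothetical "base" and "tuned" versions that differ only in the last layer.
base = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]
tuned = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, -1.0]]]
probes = [[1.0, 2.0], [3.0, 1.0], [0.5, 0.5]]

layer_similarity = diff_models(base, tuned, probes)
print(layer_similarity)
```

A layer whose similarity stays near 1 behaves the same in both versions on these probes, while a lower value points to where the two models diverge. For a safety audit, that is where a reviewer would look first.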

General methods

  • Swapping weights
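
Weight swapping can be sketched as follows: copy one layer at a time from a donor model into a target model and measure how much the output changes, which attributes the functional difference to specific layers. The toy models and the helper names `predict`, `swap_layer`, and `swap_effects` are hypothetical and exist only for this example. A real setup would swap tensors in the framework's own parameter container.

```python
import copy

def predict(layers, x):
    """Forward pass of a tiny ReLU MLP given as a list of weight matrices."""
    for i, w in enumerate(layers):
        x = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
        if i < len(layers) - 1:  # ReLU on hidden layers only
            x = [max(0.0, v) for v in x]
    return x

def swap_layer(target, donor, index):
    """Return a copy of `target` whose layer `index` is taken from `donor`."""
    hybrid = copy.deepcopy(target)
    hybrid[index] = copy.deepcopy(donor[index])
    return hybrid

def swap_effects(target, donor, probes):
    """Mean absolute output change caused by swapping in each donor layer."""
    effects = []
    for index in range(len(target)):
        hybrid = swap_layer(target, donor, index)
        total = 0.0
        for x in probes:
            before, after = predict(target, x), predict(hybrid, x)
            total += sum(abs(a - b) for a, b in zip(before, after))
        effects.append(total / len(probes))
    return effects

# Hypothetical models that differ only in the last layer.
base = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]
tuned = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, -1.0]]]
probes = [[1.0, 2.0], [3.0, 1.0], [0.5, 0.5]]

effects = swap_effects(base, tuned, probes)
print(effects)
```

Swapping a layer that is identical in both models leaves the output unchanged, so only layers that actually differ show a nonzero effect.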
Model Diffing Methods
