Adversarial Training

Creator
Seonglae Cho
Created
2024 Jan 14 9:44
Edited
2025 Aug 28 10:10

Defense method that makes models robust against adversarial examples

Adversarial Trainings

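Adversarial Examples work because of the Superposition Hypothesis: a model represents more features than it has dimensions, so feature directions overlap and interfere.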

This interference means that even a small stimulation of one feature simultaneously disturbs other features, allowing attackers to achieve large effects with minimal perturbations.
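
The interference described above can be made concrete with a toy sketch. It assumes three hypothetical unit-norm feature directions packed into a 2-dimensional activation space, spaced 120 degrees apart, and a plain dot-product readout; none of these choices come from the page itself, they are only an illustration of how stimulating one feature leaks into the others.

```python
import math

# Assumption: 3 features share a 2-D space (more features than dimensions).
features = [
    (math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3))
    for k in range(3)
]

def readout(x):
    """Linear readout: project activation x onto every feature direction."""
    return [fx * x[0] + fy * x[1] for fx, fy in features]

def perturb(x, direction, eps):
    """Move activation x by eps along a single feature direction."""
    return (x[0] + eps * direction[0], x[1] + eps * direction[1])

x = (0.0, 0.0)                         # no feature active
eps = 0.1                              # small perturbation budget
x_adv = perturb(x, features[0], eps)   # stimulate only feature 0

# Change in each feature's readout caused by the perturbation.
delta = [after - before for after, before in zip(readout(x_adv), readout(x))]
print([round(d, 3) for d in delta])    # [0.1, -0.05, -0.05]
```

Although only feature 0 was targeted, the readouts of features 1 and 2 also move, because their directions are not orthogonal to it. With orthogonal features (no superposition) those two entries would be zero.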
To address this, the paper proposes a chain: Adversarial Training → increased robustness → reduced superposition → increased interpretability, which connects robustness and interpretability.
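
The first link in this chain, adversarial training itself, can be sketched as follows. This is a minimal illustration, not the paper's setup: it assumes a logistic-regression model, the FGSM attack (chosen here only because it is simple; the page does not name an attack), hypothetical two-blob toy data, and arbitrary hyperparameters. Each training step first crafts a perturbed input against the current model, then updates the weights on that perturbed input.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """Shift each coordinate by eps in the direction that raises the loss.

    For the logistic loss, dL/dx_i = (p - y) * w_i.
    """
    err = predict(w, b, x) - y
    return [xi + eps * sign(err * wi) for xi, wi in zip(x, w)]

def make_data(n=100, seed=1):
    """Hypothetical toy data: Gaussian blobs centred at (-1, -1) and (1, 1)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        c = 1.0 if y == 1 else -1.0
        data.append(([c + rng.gauss(0, 0.3), c + rng.gauss(0, 0.3)], y))
    return data

def train(data, eps, epochs=50, lr=0.1):
    """SGD where every example is replaced by its FGSM-perturbed version."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            x_adv = fgsm(w, b, x, y, eps)        # inner step: attack current model
            err = predict(w, b, x_adv) - y       # outer step: fit the attacked input
            w = [wi - lr * err * xi for wi, xi in zip(w, x_adv)]
            b -= lr * err
    return w, b

def accuracy(w, b, data, eps=0.0):
    """Accuracy on inputs attacked with budget eps (eps=0 means clean inputs)."""
    hits = 0
    for x, y in data:
        x_eval = fgsm(w, b, x, y, eps)
        hits += int((predict(w, b, x_eval) > 0.5) == (y == 1))
    return hits / len(data)

data = make_data()
w, b = train(data, eps=0.3)
clean_acc = accuracy(w, b, data)
robust_acc = accuracy(w, b, data, eps=0.3)
print(f"clean accuracy: {clean_acc:.2f}, accuracy under FGSM: {robust_acc:.2f}")
```

Setting `eps=0` in `train` recovers ordinary training, which makes it easy to compare the two regimes on the same data. Whether the resulting robustness actually reduces superposition is the paper's claim and is not something this toy model demonstrates.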
 
 
