
Adversarial Examples works due to the Superposition Hypothesis
This interference occurs because even a small stimulation of a specific feature can simultaneously disturb other features, allowing attackers to achieve significant effects with minimal perturbations.
For this, the paper proposes that Adversarial Training → increased robustness → reduced superposition → increased interpretability, thus connecting robustness and interpretability
influence humans too
Images altered to trick machine vision can influence humans too
In a series of experiments published in Nature Communications, we found evidence that human judgments are indeed systematically influenced by adversarial perturbations.
https://deepmind.google/discover/blog/images-altered-to-trick-machine-vision-can-influence-humans-too/

Seonglae Cho