Adversarial Example

Adversarial examples work because of the Superposition Hypothesis.

When a network represents more features than it has dimensions, the features overlap and interfere: even a small stimulation of one feature can simultaneously disturb others, allowing attackers to achieve significant effects with minimal perturbations.
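The interference described above can be illustrated with a toy model. The sketch below is not taken from the paper: it assumes features are read out by projecting an activation vector onto one unit direction per feature, and the names `make_feature_directions` and `feature_readouts`, the sizes (50 features in 10 dimensions), and `eps` are all hypothetical choices for illustration.

```python
import numpy as np

def make_feature_directions(n_features, n_dims, seed=0):
    """Return one random unit vector per feature (rows of the matrix)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_features, n_dims))
    return W / np.linalg.norm(W, axis=1, keepdims=True)

def feature_readouts(W, x):
    """Read each feature's activation by projecting x onto its direction."""
    return W @ x

# Superposition (assumed toy setup): 50 features share 10 dimensions,
# so their directions cannot all be orthogonal.
W_super = make_feature_directions(n_features=50, n_dims=10)

# No superposition: 10 features, each with its own orthogonal direction.
W_ortho = np.eye(10)

eps = 0.1  # size of the perturbation, aimed only at feature 0

# Change in every feature's readout when we nudge along feature 0's direction.
delta_super = feature_readouts(W_super, eps * W_super[0])
delta_ortho = feature_readouts(W_ortho, eps * W_ortho[0])

# "Spill": total change in all features other than the targeted one.
spill_super = np.abs(delta_super[1:]).sum()
spill_ortho = np.abs(delta_ortho[1:]).sum()

print(f"change in targeted feature:   {delta_super[0]:.3f}")
print(f"spill with superposition:     {spill_super:.3f}")
print(f"spill with orthogonal basis:  {spill_ortho:.3f}")
```

With orthogonal directions the nudge changes only the targeted feature. With overlapping directions the same nudge also shifts the other readouts, which is the kind of interference the hypothesis points to.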

Building on this, the paper proposes the following chain, which connects robustness and interpretability:

Adversarial training → increased robustness → reduced superposition → increased interpretability

https://arxiv.org/pdf/2508.17456
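For readers unfamiliar with the first link in that chain, here is a minimal sketch of adversarial training. It makes several assumptions that are not from the paper: the model is a logistic regression, the attack is FGSM (a single signed-gradient step), and the data, `eps`, learning rate, and function names are all made up for illustration. It shows only the training loop itself and does not measure superposition or interpretability.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(X, y, w, b):
    """Mean binary cross-entropy of the logistic model on (X, y)."""
    p = np.clip(sigmoid(X @ w + b), 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def fgsm(X, y, w, b, eps):
    """FGSM: move each input by eps along the sign of the loss gradient w.r.t. the input."""
    p = sigmoid(X @ w + b)
    grad_x = (p - y)[:, None] * w[None, :]  # d(cross-entropy)/dx for logistic regression
    return X + eps * np.sign(grad_x)

def train(X, y, eps=0.0, lr=0.1, steps=300):
    """Gradient descent; when eps > 0, every step fits FGSM-perturbed inputs instead."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        X_in = fgsm(X, y, w, b, eps) if eps > 0 else X
        p = sigmoid(X_in @ w + b)
        w -= lr * X_in.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

# Hypothetical toy data: two Gaussian blobs in 5 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(100, 5)),
               rng.normal(1.0, 1.0, size=(100, 5))])
y = np.concatenate([np.zeros(100), np.ones(100)])

w_std, b_std = train(X, y, eps=0.0)  # standard training
w_adv, b_adv = train(X, y, eps=0.3)  # adversarial training

# Attack the standard model and compare its loss before and after.
X_attack = fgsm(X, y, w_std, b_std, eps=0.3)
clean_loss = bce_loss(X, y, w_std, b_std)
attacked_loss = bce_loss(X_attack, y, w_std, b_std)
print(f"standard model loss, clean inputs:    {clean_loss:.3f}")
print(f"standard model loss, attacked inputs: {attacked_loss:.3f}")
```

The only difference between the two training runs is that the adversarial one recomputes worst-case inputs (within an `eps` box) at every step and fits those. Stronger attacks such as multi-step PGD are commonly substituted for FGSM in practice.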

Images altered to trick machine vision can influence humans too

In a series of experiments published in Nature Communications, we found evidence that human judgments are indeed systematically influenced by adversarial perturbations.

https://deepmind.google/discover/blog/images-altered-to-trick-machine-vision-can-influence-humans-too/
