AnyAttack

Creator: Seonglae Cho
Created: 2025 Nov 26 12:5
Edited: 2025 Nov 26 12:18
Existing targeted attacks on VLMs require target labels → they don't scale, and it is hard to force a desired output across diverse images.
Self-supervised attacks use the image itself as the label, so no text or class supervision is needed: for any clean image, the method generates a perturbation δ that makes it look like a chosen target image. Key loss: pull the embedding of the perturbed image toward the embedding of the target image (sketched below).
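A minimal PyTorch sketch of this embedding-matching loss, assuming a frozen open_clip ViT-B/32 encoder as the surrogate; the model choice and the plain cosine objective are illustrative simplifications of the paper's formulation:

```python
import torch
import torch.nn.functional as F
import open_clip  # assumption: any frozen CLIP image encoder works as surrogate

# Frozen CLIP vision encoder (model/pretrained tags are illustrative choices).
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai")
model.eval()
for p in model.parameters():
    p.requires_grad_(False)

def self_supervised_loss(clean: torch.Tensor,
                         target: torch.Tensor,
                         delta: torch.Tensor) -> torch.Tensor:
    """Pull the embedding of (clean + delta) toward the target image's
    embedding; the target image itself is the label, so no text or class
    supervision is needed."""
    adv_emb = F.normalize(model.encode_image((clean + delta).clamp(0, 1)), dim=-1)
    tgt_emb = F.normalize(model.encode_image(target), dim=-1)
    return 1.0 - (adv_emb * tgt_emb).sum(dim=-1).mean()  # minimise → cos sim → 1
```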
The Universal Targeted Adversarial Noise Generator is trained against a frozen CLIP encoder so that "any image + δ" lands near the target image x's embedding. The resulting perturbations also work on VLMs that don't use CLIP, but the attack success rate is stronger on CLIP-based models.
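A hedged sketch of pre-training such a generator against the frozen encoder above; the tiny conv net, dummy data, and ε = 8/255 L∞ budget are placeholder assumptions, not the paper's actual architecture or setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Reuses the frozen `model` and `self_supervised_loss` from the sketch above.
generator = nn.Sequential(      # toy stand-in for the paper's generator network
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)
opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
eps = 8 / 255                   # assumed L-infinity perturbation budget

# Dummy unlabeled batch: any (clean, target) image pair serves as training data.
loader = [(torch.rand(2, 3, 224, 224), torch.rand(2, 3, 224, 224))]

for clean, target in loader:
    delta = eps * torch.tanh(generator(target))  # bound ||delta||_inf <= eps
    loss = self_supervised_loss(clean, target, delta)
    opt.zero_grad()
    loss.backward()   # gradients flow through the frozen encoder into the generator
    opt.step()
```

The tanh rescaling is one common way to keep δ inside an L∞ budget; once trained, the generator produces a perturbation for any new target image in a single forward pass, which is what lets the attack scale without per-image optimization.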