AT, adversarial perturbations are applied to the model’s latent state instead of its inputs
https://arxiv.org/pdf/2403.05030
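To make the idea of perturbing the latent state concrete, here is a minimal toy sketch in plain Python. It is not the method from the linked paper: the two-layer linear model, the weights `W` and `v`, the budget `eps`, and all function names are hypothetical choices for illustration. Because the readout is linear, the worst-case perturbation inside an L2 ball has a closed form, so the sketch uses that in place of the gradient-based inner loop a real setup would presumably need.

```python
import math

def forward_hidden(W, x):
    # Latent state h = W x; a single linear layer stands in for the
    # early part of a model (assumption for illustration).
    return [sum(w * x_j for w, x_j in zip(row, x)) for row in W]

def loss_from_hidden(v, h, y):
    # Logistic loss on the readout v.h for a label y in {-1, +1}.
    margin = y * sum(v_i * h_i for v_i, h_i in zip(v, h))
    return math.log1p(math.exp(-margin))

def worst_case_latent_delta(v, y, eps):
    # For a linear readout, the loss-maximizing delta within an L2 ball
    # of radius eps points against the label along v:
    #   delta = -y * eps * v / ||v||
    norm = math.sqrt(sum(v_i * v_i for v_i in v))
    return [-y * eps * v_i / norm for v_i in v]

# Hypothetical toy parameters and a single example.
W = [[0.5, -0.2], [0.1, 0.3]]
v = [1.0, -1.0]
x, y = [1.0, 2.0], 1
eps = 0.5

h = forward_hidden(W, x)                      # clean latent state
delta = worst_case_latent_delta(v, y, eps)    # perturbation on h, not on x
h_adv = [h_i + d_i for h_i, d_i in zip(h, delta)]

clean_loss = loss_from_hidden(v, h, y)
adv_loss = loss_from_hidden(v, h_adv, y)
print(f"clean loss: {clean_loss:.4f}, latent-adversarial loss: {adv_loss:.4f}")
```

The input `x` is never modified; only the hidden vector `h` is shifted. In a training loop, the parameters would then be updated to reduce `adv_loss` rather than `clean_loss`.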
targeted LAT
https://arxiv.org/pdf/2407.15549
LLM-LAT (LLM Latent Adversarial Training)
Org profile for LLM Latent Adversarial Training on Hugging Face.
https://huggingface.co/LLM-LAT

Seonglae Cho