Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats

As large language models (LLMs) become increasingly capable, it is prudent to assess whether safety measures remain effective even if LLMs intentionally try to bypass them. Previous work...

https://arxiv.org/abs/2411.17693