Distributed control

Creator

Seonglae Cho

Created

2025 Apr 14 13:20

Editor

Seonglae Cho

Edited

2025 Apr 14 13:20

Refs

Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats

As large language models (LLMs) become increasingly capable, it is prudent to assess whether safety measures remain effective even if LLMs intentionally try to bypass them. Previous work...

https://arxiv.org/abs/2411.17693
