AI Hacking

Creator

Seonglae Cho

Created

2023 Jul 10 12:23

Editor

Seonglae Cho

Edited

2026 Jul 22 13:31

Refs

5stars217 • Updated 2026 May 6 9:40

AI Cyber Security

For training data

For inference

AI Issues

LLM Extraction attack

AI Reward Hacking

PoisonGPT: How we hid a lobotomized LLM on Hugging Face to spread fake news

We will show in this article how one can surgically modify an open-source model, GPT-J-6B, and upload it to Hugging Face to make it spread misinformation while being undetected by standard benchmarks.

https://blog.mithrilsecurity.io/poisongpt-how-we-hid-a-lobotomized-llm-on-hugging-face-to-spread-fake-news/
