"Do Anything Now": Characterizing and Evaluating...
The misuse of large language models (LLMs) has drawn significant attention from the general public and LLM vendors. One particular type of adversarial prompt, known as jailbreak prompt, has...
https://arxiv.org/abs/2308.03825

TrustAIRLab/forbidden_question_set · Datasets at Hugging Face
https://huggingface.co/datasets/TrustAIRLab/forbidden_question_set
TrustAIRLab/in-the-wild-jailbreak-prompts · Datasets at Hugging Face
https://huggingface.co/datasets/TrustAIRLab/in-the-wild-jailbreak-prompts
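These two datasets back the paper's evaluation: the forbidden-question set provides the policy-violating questions used to test whether a jailbreak succeeds, and the in-the-wild set collects jailbreak prompts gathered from platforms such as Reddit and Discord. A minimal sketch of loading them with the Hugging Face `datasets` library follows; the snapshot/config name passed for the jailbreak-prompt set is an assumption, so check the dataset cards for the configurations that actually exist.

```python
# Minimal sketch: load the two TrustAIRLab datasets with the `datasets` library.
from datasets import load_dataset

# Forbidden questions used to probe whether a jailbreak prompt elicits
# policy-violating answers (assumed to load with its default config).
forbidden_questions = load_dataset("TrustAIRLab/forbidden_question_set")

# In-the-wild jailbreak prompts; "jailbreak_2023_12_25" is an assumed
# snapshot/config name -- see the dataset card for the available ones.
jailbreak_prompts = load_dataset(
    "TrustAIRLab/in-the-wild-jailbreak-prompts",
    "jailbreak_2023_12_25",
)

print(forbidden_questions)
print(jailbreak_prompts)
```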

Seonglae Cho