Overrefusal Benchmark

XSTest contains 250 safe prompts across ten prompt types that well-calibrated models should not refuse to comply with, plus 200 unsafe prompts as contrasts that, for most LLM applications, should be refused.

Dataset: walledai/XSTest on Hugging Face — https://huggingface.co/datasets/walledai/XSTest
Paper: "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" — https://arxiv.org/abs/2308.01263
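
A minimal sketch of pulling the benchmark from the Hugging Face Hub and checking its composition. The split name and the "prompt", "type", and "label" field names are assumptions on my part; consult the dataset card at the URL above for the actual schema.

```python
# Sketch: load XSTest and tally safe vs. unsafe prompts.
# Assumes a "test" split with "prompt", "type", and "label" fields
# ("safe" / "unsafe") -- verify against the dataset card.
from collections import Counter

from datasets import load_dataset

xstest = load_dataset("walledai/XSTest", split="test")

# Per the description above, we expect 250 safe and 200 unsafe prompts,
# with the safe prompts spread across ten prompt types.
label_counts = Counter(example["label"] for example in xstest)
type_counts = Counter(example["type"] for example in xstest)

print(label_counts)  # expected roughly: {'safe': 250, 'unsafe': 200}
print(type_counts)   # ten safe prompt types plus their unsafe contrasts
```

In an evaluation harness, the safe prompts measure overrefusal (the model should comply) while the unsafe contrasts measure whether refusals still fire where they should.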