AI Monitoring
AI Auditing Methods
Internet comments: automatic bot detection and blocking cases
Prompt-based vs. activation-based monitoring (OpenAI)
With small data, suffix-only prompted last-token probing is the most data-efficient. With large data, SAE max-pooled probing outperforms raw-activation probing and performs similarly to prompted probing.
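To make the two feature-extraction schemes concrete, here is a minimal sketch of last-token probing versus max-pooled probing. It is not the setup from the cited work: the activations are synthetic random arrays standing in for raw or SAE features, the function names (`last_token_features`, `max_pool_features`, `fit_linear_probe`, `probe_accuracy`) are hypothetical, and the probe is a plain logistic regression trained by gradient descent. The toy data places the signal at a random token position, so it favors pooling by construction and says nothing about the real comparison.

```python
import numpy as np

def last_token_features(acts):
    """acts: (batch, seq_len, dim) -> (batch, dim), activation at the final token."""
    return acts[:, -1, :]

def max_pool_features(acts):
    """acts: (batch, seq_len, dim) -> (batch, dim), elementwise max over tokens."""
    return acts.max(axis=1)

def fit_linear_probe(X, y, lr=0.1, steps=500):
    """Logistic-regression probe fit with plain gradient descent (illustrative only)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability of class 1
        grad = p - y                             # gradient of log-loss w.r.t. the logit
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(X, y, w, b):
    """Fraction of examples whose thresholded logit matches the label."""
    return float((((X @ w + b) > 0).astype(int) == y).mean())

def standardize(X):
    """Zero-mean, unit-variance features so gradient descent is well conditioned."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

# Synthetic stand-in for per-token activations (assumption: not real model data).
rng = np.random.default_rng(0)
batch, seq_len, dim = 200, 12, 16
y = rng.integers(0, 2, size=batch)
acts = rng.normal(size=(batch, seq_len, dim))
# Positive examples get a spike in feature 3 at one random token position.
pos = rng.integers(0, seq_len, size=batch)
acts[np.arange(batch), pos, 3] += 6.0 * y

for name, extract in [("last-token", last_token_features),
                      ("max-pooled", max_pool_features)]:
    X = standardize(extract(acts))
    w, b = fit_linear_probe(X, y)
    print(name, "train accuracy:", probe_accuracy(X, y, w, b))
```

In a real experiment the `acts` array would come from a model's residual stream or from an SAE encoder applied to it, and accuracy would be measured on held-out data across several training-set sizes to reproduce the small-data versus large-data comparison.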