Not SAE, just logistic regression performs well.

Try training token-level probes (LessWrong)
TL;DR: I train a probe to detect falsehoods on a token-level, i.e. to highlight the specific tokens that make a statement false. It worked surprising…
https://www.lesswrong.com/posts/kxiizuSa3sSi4TJsN/try-training-token-level-probes
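
For anyone who wants to see the shape of the idea, here is a minimal sketch of a token-level logistic regression probe. It is not the linked post's code: the activations are synthetic stand-ins (in a real setup they would presumably be per-token hidden states from a language model), and `fake_activation`, `train_probe`, `token_scores`, and `highlight` are hypothetical names made up for this illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(acts, labels, epochs=200, lr=0.1):
    """Fit logistic regression by full-batch gradient descent.

    acts: one activation vector per token; labels: 1 if the token
    makes the statement false, else 0.
    """
    dim = len(acts[0])
    n = len(acts)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(acts, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def token_scores(w, b, acts):
    # Probe probability that each token is a "falsehood" token.
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in acts]


def highlight(tokens, scores, threshold=0.5):
    # Wrap flagged tokens in [[...]] so they stand out in plain text.
    return " ".join(f"[[{t}]]" if s > threshold else t
                    for t, s in zip(tokens, scores))


# ASSUMPTION: synthetic activations. Falsehood tokens are shifted by +1
# per dimension and all other tokens by -1, plus unit Gaussian noise.
rng = random.Random(0)
DIM = 8


def fake_activation(is_false):
    shift = 1.0 if is_false else -1.0
    return [rng.gauss(0.0, 1.0) + shift for _ in range(DIM)]


labels = [rng.randint(0, 1) for _ in range(200)]
acts = [fake_activation(y) for y in labels]

w, b = train_probe(acts, labels)
scores = token_scores(w, b, acts)
accuracy = sum((s > 0.5) == bool(y) for s, y in zip(scores, labels)) / len(labels)

# Hypothetical example sentence; only the last token is labeled false.
tokens = ["The", "capital", "of", "France", "is", "Berlin"]
token_labels = [0, 0, 0, 0, 0, 1]
sentence_acts = [fake_activation(y) for y in token_labels]
marked = highlight(tokens, token_scores(w, b, sentence_acts))
print(f"train accuracy: {accuracy:.2f}")
print(marked)
```

The only part that is specifically token-level is the data layout: one activation vector and one label per token rather than one per statement, so the same probe can be run over a sentence to mark individual tokens.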