NSFW Features
CLIP Vision Transformer
When feature counts are normalized by the number of patches, ViTs were observed to have a similar number of features to language models. There are also claims that reinserting the SAE reconstruction into the model reduces loss and removes noise.
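As a rough illustration of the per-patch normalization, the sketch below counts how many SAE features are active in one image and divides by the number of patches. It assumes that "number of features" means the count of features with nonzero activation (often called L0), which the note does not state explicitly. The function name `active_features_per_patch`, the threshold, and the toy activations are all hypothetical and are not taken from ViT-Prisma.

```python
from typing import List


def active_features_per_patch(acts: List[List[float]], threshold: float = 0.0) -> float:
    """Mean number of SAE features above `threshold`, per patch.

    `acts` is a hypothetical [num_patches][num_features] grid of SAE
    feature activations for a single image.
    """
    if not acts:
        raise ValueError("need at least one patch")
    total_active = sum(sum(1 for a in patch if a > threshold) for patch in acts)
    return total_active / len(acts)


# Toy data (made up for illustration): 4 patches, 6 SAE features each.
toy_acts = [
    [0.0, 1.2, 0.0, 0.0, 0.3, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 2.1],
    [0.5, 0.0, 0.0, 0.7, 0.0, 0.4],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]

# Raw count over the whole image grows with the number of patches ...
image_total = sum(1 for patch in toy_acts for a in patch if a > 0.0)
# ... while the per-patch figure is comparable to a per-token count in a language model.
per_patch = active_features_per_patch(toy_acts)
```

The raw per-image count scales with how many patches an image is split into, so under this reading it is the per-patch figure that would be compared against a language model's per-token count.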
ViT-Prisma
Prisma-Multimodal • Updated 2025 Jun 5 23:51