CNN-based CAMs (class activation maps) are accurate at class discrimination but cover only part of the object, missing much of its full extent. ViT-based CAMs capture the semantic parts of objects well but are weak at class discrimination. CoBra therefore combines a CNN branch (Class-Aware Knowledge, CAK) and a ViT branch (Semantic-Aware Knowledge, SAK) as complementary dual branches. The two branches exchange knowledge mutually through a contrastive loss, so that each compensates for the other's weaknesses.
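The note does not spell out CoBra's exact contrastive loss, so the following is only an illustrative sketch of the general idea of cross-branch contrastive knowledge exchange. It assumes a generic symmetric InfoNCE formulation in which each sample yields one CNN (CAK) embedding and one ViT (SAK) embedding, the pair with the same index is the positive, and every other pairing is a negative. The function names, the temperature of 0.1, and this pairing scheme are hypothetical assumptions, not CoBra's actual formulation.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    # Scale to unit length; a zero vector is left unchanged.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine_matrix(a: List[Vector], b: List[Vector]) -> List[List[float]]:
    # sim[i][j] = cosine similarity between a[i] and b[j].
    a_n = [l2_normalize(v) for v in a]
    b_n = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b_n] for u in a_n]


def info_nce(logits: List[List[float]]) -> float:
    # Mean cross-entropy where the positive for row i is column i.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def cross_branch_contrastive_loss(
    cnn_feats: List[Vector], vit_feats: List[Vector], temperature: float = 0.1
) -> float:
    # Hypothetical symmetric loss: CNN->ViT and ViT->CNN directions averaged,
    # so both branches are pulled toward each other's representation.
    sim = cosine_matrix(cnn_feats, vit_feats)
    logits = [[s / temperature for s in row] for row in sim]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (info_nce(logits) + info_nce(logits_t))


if __name__ == "__main__":
    cnn = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    vit_aligned = [row[:] for row in cnn]
    vit_shuffled = [cnn[1], cnn[2], cnn[0]]
    print(cross_branch_contrastive_loss(cnn, vit_aligned))   # small
    print(cross_branch_contrastive_loss(cnn, vit_shuffled))  # large
```

When the two branches agree on which sample is which, the loss is near zero; when their embeddings are mismatched, the loss is large, which is the signal that pushes each branch toward the other's knowledge. Averaging both directions keeps the exchange mutual rather than distilling one way only.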
CoBra
Creator: Seonglae Cho
Created: 2025 Oct 21 0:2
Editor: Seonglae Cho
Edited: 2025 Oct 21 0:3
