YOLOE unifies object detection and segmentation in a single model that accepts text prompts, visual prompts, or no prompt at all. It uses Re-parameterizable Region-Text Alignment (RepRTA) for text prompts, a Semantic-Activated Visual Prompt Encoder (SAVPE) for visual prompts, and Lazy Region-Prompt Contrast (LRPC) for prompt-free detection.
YOLOE: Real-Time Seeing Anything
Object detection and segmentation are widely employed in computer vision applications, yet conventional models like YOLO series, while efficient and accurate, are limited by predefined categories,...
https://arxiv.org/abs/2503.07465
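
The names RepRTA and LRPC both point at the same underlying idea: comparing region embeddings against prompt embeddings. The snippet below is a toy sketch of that idea only, not YOLOE's implementation. The function names, the 3-dimensional vectors, and the 0.5 threshold are all hypothetical; in the real model the embeddings would come from learned encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def classify_regions(region_embeddings, prompt_embeddings, labels, threshold=0.5):
    """Assign each region the label of its most similar prompt embedding.

    Hypothetical helper for illustration: a region whose best cosine
    similarity falls below `threshold` is left unlabeled (None).
    """
    prompts = [l2_normalize(p) for p in prompt_embeddings]
    results = []
    for region in region_embeddings:
        r = l2_normalize(region)
        # Cosine similarity between this region and every prompt.
        scores = [sum(a * b for a, b in zip(r, p)) for p in prompts]
        best = max(range(len(scores)), key=scores.__getitem__)
        results.append(labels[best] if scores[best] >= threshold else None)
    return results


# Toy data: two prompts and three candidate regions (made-up vectors).
labels = ["cat", "dog"]
prompt_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
region_embeddings = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.0, 1.0]]

predictions = classify_regions(region_embeddings, prompt_embeddings, labels)
print(predictions)  # ['cat', 'dog', None]
```

The third region matches neither prompt, so it stays unlabeled. Swapping in a different set of prompt embeddings changes the recognizable categories without touching the region side, which is the appeal of a prompt-driven detector.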


Seonglae Cho