CLIP

Creator

Seonglae Cho

Created

2022 Apr 20 2:46

Editor

Seonglae Cho

Edited

2026 Jul 17 15:15

Refs

Text embedding

Contrastive Language-Image Pre-training

CLIP connects text and images. It is trained with contrastive learning on text and images at the same time, so both are mapped into one shared embedding space. CLIP turns input text into an embedding vector that can be compared directly with image embeddings. This zero-shot capability greatly reduces the need for image labeling.
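The two ideas above, contrastive training and zero-shot use, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not CLIP's actual implementation. The function names, the toy 3-dimensional embeddings, and the fixed temperature of 0.07 are all chosen here for illustration. Real CLIP embeddings come from trained image and text encoders.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def similarity_matrix(image_embs, text_embs, temperature=0.07):
    """Cosine similarity of every image against every text, divided by a temperature."""
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts] for i in imgs]

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric cross-entropy: the i-th image should match the i-th text and vice versa."""
    logits = similarity_matrix(image_embs, text_embs, temperature)
    n = len(logits)
    # Image -> text direction: softmax over each row, target is the diagonal entry.
    loss_i2t = -sum(math.log(softmax(logits[i])[i]) for i in range(n)) / n
    # Text -> image direction: softmax over each column, target is the diagonal entry.
    columns = [list(col) for col in zip(*logits)]
    loss_t2i = -sum(math.log(softmax(columns[i])[i]) for i in range(n)) / n
    return (loss_i2t + loss_t2i) / 2

def zero_shot_classify(image_emb, label_embs, temperature=0.07):
    """Return one probability per candidate label for a single image embedding."""
    logits = similarity_matrix([image_emb], label_embs, temperature)[0]
    return softmax(logits)

# Toy embeddings, made up for illustration (not real CLIP outputs).
image_embs = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.1]]
text_embs = [[0.9, 0.0, 0.1], [0.1, 0.8, 0.0]]  # stand-ins for two label prompts

aligned = contrastive_loss(image_embs, text_embs)        # matching pairs on the diagonal
shuffled = contrastive_loss(image_embs, text_embs[::-1])  # pairs deliberately mismatched
probs = zero_shot_classify(image_embs[0], text_embs)      # label probabilities for image 0
```

The loss is lower when each image is paired with its own text than when the pairs are shuffled, which is the signal that pulls matching pairs together during training. Zero-shot classification then needs no labeled images: the candidate labels are written as text, embedded, and the most similar one is chosen.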

CLIP Usages

Enhanced usages for image segmentation and object detection

Minderer et al., Simple Open-Vocabulary Object Detection with Vision Transformers, 2022

Lüddecke and Ecker, Image Segmentation Using Text and Image Prompts, 2022

https://cdn.openai.com/papers/dall-e-2.pdf

CLIP · Hugging Face

https://huggingface.co/docs/transformers/model_doc/clip

microsoft/LLM2CLIP-Openai-L-14-336 · Hugging Face

https://huggingface.co/microsoft/LLM2CLIP-Openai-L-14-336
