InfoNCE

Creator: Seonglae Cho
Created: 2026 Jan 16 0:51
Edited: 2026 Jan 26 15:16

Core Principle: Pull Positive Pairs Together, Push Others Apart

Step 1: Embed Both Text Batches

emb1 = model(anchor_texts)    # [B, D] - anchor embeddings
emb2 = model(positive_texts)  # [B, D] - positive embeddings

Step 2: Compute All Pairwise Similarities

logits = emb1 @ emb2.T / temperature # [B, B] similarity matrix
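As a minimal NumPy sketch of this step: the helper name `similarity_logits`, the L2 normalization (so that the dot product becomes cosine similarity), the temperature of 0.05, and the random stand-in embeddings are all illustrative assumptions, not something the steps above prescribe.

```python
import numpy as np

def similarity_logits(emb1, emb2, temperature=0.05):
    """Return the [B, B] matrix of temperature-scaled cosine similarities."""
    # Assumption: normalize rows to unit length so the dot product is cosine similarity.
    emb1 = emb1 / np.linalg.norm(emb1, axis=1, keepdims=True)
    emb2 = emb2 / np.linalg.norm(emb2, axis=1, keepdims=True)
    return emb1 @ emb2.T / temperature

rng = np.random.default_rng(0)
emb1 = rng.normal(size=(4, 8))                # stand-in anchor embeddings (B=4, D=8)
emb2 = emb1 + 0.1 * rng.normal(size=(4, 8))   # stand-in positives: noisy copies of the anchors
logits = similarity_logits(emb1, emb2)
print(logits.shape)  # (4, 4)
```

Entry `logits[i, j]` compares anchor `i` with positive `j`, so a smaller temperature stretches the gap between the matching pair and the rest of the row.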

Step 3: Define Positive and Negative Pairs

Diagonal entries represent positive pairs (emb1[i] matches emb2[i]), while off-diagonal entries serve as in-batch negatives.
labels = [0, 1, 2, ..., B-1] # diagonal indices

Step 4: Optimize with Cross Entropy Loss

Cross-entropy treats each row of logits as a B-way classification whose correct class is the diagonal entry. Minimizing it raises each positive pair's similarity relative to the in-batch negatives in the same row.
loss = CrossEntropy(logits, labels)
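The cross-entropy step can be sketched in plain NumPy as a row-wise log-softmax followed by picking out the diagonal. The function name `info_nce` and the two toy logit matrices are assumptions chosen for illustration only.

```python
import numpy as np

def info_nce(logits):
    """Mean cross-entropy of a [B, B] logit matrix against diagonal labels."""
    labels = np.arange(logits.shape[0])                    # positive for row i is column i
    shifted = logits - logits.max(axis=1, keepdims=True)   # subtract row max for stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[labels, labels].mean()

aligned = np.eye(3) * 10.0   # large diagonal logits: positives clearly win
uniform = np.zeros((3, 3))   # no preference: every candidate equally likely
print(info_nce(aligned))     # close to 0
print(info_nce(uniform))     # log(3), about 1.0986
```

With uninformative logits the loss equals log(B), which is why a larger batch (more in-batch negatives) makes the task harder.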

Key Characteristics

  • In-batch negatives: Other samples in the batch automatically serve as negatives
  • Symmetric loss: Computed bidirectionally; anchor→positive uses the rows of logits, positive→anchor uses the rows of logits.T, and the two losses are typically averaged
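  • Learnable threshold: There is no hand-set margin; the softmax scores each positive relative to its in-batch negatives, so the model learns how far apart positive and negative pairs should be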
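The symmetric variant can be sketched as the average of the row-wise and column-wise losses. The helper names `cross_entropy_diag` and `symmetric_info_nce`, the equal weighting of the two directions, and the toy logits are assumptions made for illustration.

```python
import numpy as np

def cross_entropy_diag(logits):
    """Mean cross-entropy of each row against its diagonal entry."""
    labels = np.arange(logits.shape[0])
    shifted = logits - logits.max(axis=1, keepdims=True)   # subtract row max for stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[labels, labels].mean()

def symmetric_info_nce(logits):
    """Average of the anchor->positive (rows) and positive->anchor (columns) losses."""
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))

# Toy logits: anchor 0 is confusable with every positive, so the directions disagree.
logits = np.array([[4.0, 3.0, 3.0],
                   [0.0, 4.0, 0.0],
                   [0.0, 0.0, 4.0]])
a2p = cross_entropy_diag(logits)     # anchor -> positive
p2a = cross_entropy_diag(logits.T)   # positive -> anchor
print(a2p, p2a, symmetric_info_nce(logits))
```

The two directions generally differ because a row and a column of the same matrix contain different negatives, which is the reason for averaging both.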

Example Similarity Matrix

For batch [(a1,p1), (a2,p2), (a3,p3)]:
      p1     p2     p3
a1  [0.9]   0.2    0.1    ← a1-p1 is positive (maximize)
a2   0.1   [0.8]   0.2    ← a2-p2 is positive (maximize)
a3   0.2    0.1   [0.9]   ← a3-p3 is positive (maximize)

Goal: Maximize diagonal, minimize off-diagonal
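To make the example concrete, the per-row loss for this matrix can be worked out with the standard library alone. The temperature of 0.1 is an assumed value (the example does not state one), and `row_loss` is a hypothetical helper name.

```python
import math

# Similarities from the example matrix (rows: a1..a3, columns: p1..p3).
sims = [[0.9, 0.2, 0.1],
        [0.1, 0.8, 0.2],
        [0.2, 0.1, 0.9]]
temperature = 0.1  # assumed for illustration; not given in the example

def row_loss(row, positive_index, temperature):
    """Cross-entropy of one row: log-sum-exp of the logits minus the positive logit."""
    logits = [s / temperature for s in row]
    m = max(logits)  # subtract the max before exponentiating for stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[positive_index]

losses = [row_loss(row, i, temperature) for i, row in enumerate(sims)]
loss = sum(losses) / len(losses)
print([round(l, 4) for l in losses], round(loss, 4))
```

Row a2 contributes the largest loss here because its positive (0.8) is the weakest and its hardest negative (0.2) is the closest competitor.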

CachedInfoNCE

