Core Principle: Pull Positive Pairs Together, Push Others Apart
Step 1: Embed Both Text Batches
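A minimal sketch of this step, assuming the sentence-transformers library and the all-MiniLM-L6-v2 checkpoint (neither is specified in this note); any encoder that maps a batch of texts to fixed-size vectors works the same way.

```python
from sentence_transformers import SentenceTransformer

# Assumed encoder; any model producing fixed-size text embeddings can be substituted.
model = SentenceTransformer("all-MiniLM-L6-v2")

anchors = ["how do I reset my password", "best pizza in town"]
positives = ["steps to recover a forgotten account password", "top rated pizza restaurants nearby"]

# Each batch becomes a (batch_size, dim) matrix of embeddings.
emb1 = model.encode(anchors, convert_to_tensor=True)    # anchor embeddings
emb2 = model.encode(positives, convert_to_tensor=True)  # positive embeddings
```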
Step 2: Compute All Pairwise Similarities
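A PyTorch sketch of the pairwise similarity matrix; random placeholder tensors stand in for the Step 1 embeddings, and cosine similarity is an assumption (a plain dot product works too).

```python
import torch
import torch.nn.functional as F

batch_size, dim = 3, 384
emb1 = torch.randn(batch_size, dim)  # anchor embeddings (from Step 1)
emb2 = torch.randn(batch_size, dim)  # positive embeddings (from Step 1)

# scores[i, j] = cosine similarity between anchor i and positive j.
scores = F.normalize(emb1, dim=-1) @ F.normalize(emb2, dim=-1).T  # (batch_size, batch_size)
```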
Step 3: Define Positive and Negative Pairs
Diagonal entries represent positive pairs (emb1[i] matches emb2[i]), while off-diagonal entries serve as in-batch negatives.
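Since the positive for anchor i sits in column i of the matrix, the targets for a softmax classifier are simply the row indices; a short sketch:

```python
import torch

batch_size = 3
# Anchor i's correct "class" is column i of the similarity matrix,
# so the targets are just 0, 1, ..., batch_size - 1.
labels = torch.arange(batch_size)  # tensor([0, 1, 2])
```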
Step 4: Optimize with Cross Entropy Loss
The loss maximizes diagonal similarities (positive pairs) while minimizing off-diagonal similarities (negative pairs).
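A sketch of this step, treating each row of the similarity matrix as logits over the batch; the scale (temperature) of 20 is a common choice for cosine similarities and an assumption here, not something the note specifies.

```python
import torch
import torch.nn.functional as F

batch_size = 3
scores = torch.randn(batch_size, batch_size)  # similarity matrix from Step 2
labels = torch.arange(batch_size)             # diagonal targets from Step 3

scale = 20.0  # assumed temperature: sharpens the softmax over cosine similarities
# Cross entropy over each row pushes scores[i, i] up and scores[i, j != i] down.
loss = F.cross_entropy(scores * scale, labels)
```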
Key Characteristics
- In-batch negatives: Other samples in the batch automatically serve as negatives
- Symmetric loss: Computed bidirectionally (anchor→positive and positive→anchor); see the sketch after this list
- Learned decision boundary: Rather than relying on a fixed similarity threshold, the model learns to score each positive pair above the in-batch negatives
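A sketch of the bidirectional version: the same cross entropy is applied to the transposed similarity matrix and the two directions are averaged (the averaging and the scale of 20 are assumptions).

```python
import torch
import torch.nn.functional as F

batch_size = 3
scores = torch.randn(batch_size, batch_size)  # anchor-by-positive similarity matrix
labels = torch.arange(batch_size)
scale = 20.0  # assumed temperature

loss_a2p = F.cross_entropy(scores * scale, labels)    # anchor -> positive direction
loss_p2a = F.cross_entropy(scores.T * scale, labels)  # positive -> anchor direction
loss = (loss_a2p + loss_p2a) / 2                      # average the two directions
```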
Example Similarity Matrix
For batch [(a1,p1), (a2,p2), (a3,p3)]:
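The 3×3 similarity matrix pairs each anchor with each positive; the diagonal cells are the positive pairs and every other cell is an in-batch negative:

|    | p1       | p2       | p3       |
|----|----------|----------|----------|
| a1 | positive | negative | negative |
| a2 | negative | positive | negative |
| a3 | negative | negative | positive |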
