Text embedding

Creator
Seonglae Cho
Created
2022 Apr 3 14:39
Edited
2025 Mar 16 17:31

Embeddings are arrays of floating-point numbers that represent the semantic meaning of a piece of content.

Text encoding is a broader concept: an encoding does not have to preserve the original semantics, as is the case with one-hot encoding.
The advantage of vectorization is that it enables operations such as addition, subtraction, and multiplication.
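
To make the contrast concrete, here is a minimal sketch in plain Python. The four-word vocabulary and the 2-dimensional "dense" vectors are hand-made toy values chosen for illustration (one axis loosely standing for royalty, the other for gender); they are not the output of any real embedding model. The sketch shows that one-hot vectors of different words are always orthogonal, while dense vectors support meaningful arithmetic and similarity.

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

vocab = ["king", "queen", "man", "woman"]

# One-hot encoding: each word gets its own axis, so no semantics are kept.
one_hot = {
    word: [1.0 if i == j else 0.0 for j in range(len(vocab))]
    for i, word in enumerate(vocab)
}

# Hypothetical toy embeddings (assumption): axis 0 ~ royalty, axis 1 ~ gender.
dense = {
    "king": [0.9, 0.8],
    "queen": [0.9, -0.8],
    "man": [0.1, 0.8],
    "woman": [0.1, -0.8],
}

# Any two different one-hot vectors are orthogonal.
one_hot_sim = cosine(one_hot["king"], one_hot["queen"])

# Vector arithmetic on the dense vectors: king - man + woman.
analogy = add(sub(dense["king"], dense["man"]), dense["woman"])
nearest = max(dense, key=lambda word: cosine(analogy, dense[word]))

print("one-hot similarity(king, queen):", one_hot_sim)
print("nearest to king - man + woman:", nearest)
```

With these toy values the one-hot similarity is zero, and the word nearest to the arithmetic result is "queen". Real embedding models produce far higher-dimensional vectors, and such analogies hold only approximately.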
Text embedding models can be compared along the following parameters:
  • Multilingual support
  • Context window size - maximum passage size
  • Model size - computing resources
  • Embedding vector length - storage resources
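
The link between vector length and storage can be estimated with simple arithmetic. The sketch below assumes raw float32 storage (4 bytes per value) and ignores any index or metadata overhead; the function name and the example dimensions are illustrative choices, not tied to a specific model.

```python
def index_size_bytes(num_vectors, dim, bytes_per_value=4):
    """Raw storage for num_vectors embeddings of length dim.

    Assumes float32 values (4 bytes each) and no index overhead.
    """
    return num_vectors * dim * bytes_per_value

# Illustrative vector lengths (assumption), for one million passages.
for dim in (384, 768, 1536):
    size = index_size_bytes(1_000_000, dim)
    print(f"{dim:>5} dims -> {size / 1e9:.2f} GB")
```

Storage grows linearly with vector length, so doubling the dimension doubles the raw size of the index.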
Text embedding Notion
 
 
Text Embedding Models
 
 
Sentence embedding Methods

Leaderboard

Commercial embedding models, such as those from OpenAI and AWS, have much larger context windows, for example 8192 tokens.
Performance varies significantly between tasks, so decisions shouldn't be made solely based on overall leaderboard performance.
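
Because each model accepts only a limited passage size, longer documents are usually split before embedding. The following is a minimal sketch of such splitting. It assumes whitespace-separated words as a stand-in for real tokens (an actual model counts tokens with its own tokenizer, so real limits will differ), and the function name `chunk_by_tokens` is hypothetical.

```python
def chunk_by_tokens(text, max_tokens, overlap=0):
    """Split text into chunks of at most max_tokens whitespace tokens.

    overlap is the number of tokens repeated between neighboring chunks.
    Whitespace splitting is only an approximation of a model tokenizer.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")

    tokens = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        # Stop once the current chunk already reaches the end of the text.
        if start + max_tokens >= len(tokens):
            break
    return chunks

sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_by_tokens(sample, max_tokens=4, overlap=1):
    print(chunk)
```

A small overlap keeps some shared context between neighboring chunks, at the cost of embedding slightly more text.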

What is embedding

 
 

Recommendations