FIM Token

Creator: Seonglae Cho
Created: 2024 Feb 27 8:47
Editor: Seonglae Cho
Edited: 2024 Feb 27 8:50

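Fill in the middle (FIM): special tokens that mark the prefix, suffix, and middle of a document, so that a left-to-right language model can be trained to infill the missing middle given the text on both sides.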
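As a rough illustration of how fill-in-the-middle sentinel tokens can be used, the sketch below cuts a document at two random points and moves the middle span to the end, after the prefix and suffix. The sentinel strings `<PRE>`, `<SUF>`, `<MID>` and the helper names `to_fim` and `from_fim` are hypothetical placeholders, not any particular tokenizer's API, and the sketch assumes the document itself does not contain the sentinel strings.

```python
import random
from typing import Tuple

# Hypothetical sentinel strings; real tokenizers define their own FIM special tokens.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def to_fim(doc: str, rng: random.Random) -> Tuple[str, str]:
    """Split doc into prefix/middle/suffix and return (prompt, target middle)."""
    # Two cut points drawn uniformly at random, sorted so that i <= j.
    i, j = sorted(rng.randint(0, len(doc)) for _ in range(2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    # Prefix-suffix-middle order: the model sees both sides, then writes the middle.
    prompt = f"{PRE}{prefix}{SUF}{suffix}{MID}"
    return prompt, middle


def from_fim(prompt: str, middle: str) -> str:
    """Rebuild the original document from a FIM prompt and its middle span."""
    body = prompt[len(PRE):-len(MID)]
    prefix, suffix = body.split(SUF, 1)
    return prefix + middle + suffix


doc = "def add(a, b):\n    return a + b\n"
prompt, middle = to_fim(doc, random.Random(0))
restored = from_fim(prompt, middle)
```

In training, the prompt and middle would be concatenated into one sequence; at inference time the model is given only the prompt and generates the middle.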

Tiktoken: OpenAI’s Tokenizer | Hacker News
* the cl100k_base tokenizer has ~100k tokens -- previous tokenizers had ~50k. (enc.n_vocab gives 100277 but some numbers in that range don't work, starting at 100256)
https://news.ycombinator.com/item?id=34008839
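The gap mentioned in the comment above can be checked with a little arithmetic. The sketch below hard-codes what is, to the best of my recollection, the cl100k_base layout in tiktoken: 100256 ordinary tokens followed by five special tokens, three of which are FIM tokens. Treat these ids as assumptions and verify them against the library before relying on them.

```python
# Assumed values, recalled from tiktoken's cl100k_base definition; verify before use.
N_ORDINARY = 100256  # ordinary (mergeable) tokens occupy ids 0..100255
SPECIAL_TOKENS = {
    "<|endoftext|>": 100257,
    "<|fim_prefix|>": 100258,
    "<|fim_middle|>": 100259,
    "<|fim_suffix|>": 100260,
    "<|endofprompt|>": 100276,
}

# The vocabulary size is reported as the largest token id plus one.
n_vocab = max(SPECIAL_TOKENS.values()) + 1

# Ids below n_vocab that map to neither an ordinary nor a special token.
used = set(range(N_ORDINARY)) | set(SPECIAL_TOKENS.values())
unused = [i for i in range(n_vocab) if i not in used]

print(n_vocab, unused[0], len(unused))
```

Under these assumptions the reported vocabulary size is larger than the number of usable ids, which would explain why some ids in the range do not decode.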
Efficient Training of Language Models to Fill in the Middle (arXiv:2207.14255)
https://arxiv.org/pdf/2207.14255.pdf

Copyright Seonglae Cho