Tokenizer Substitute

Creator
Seonglae Cho
Created
2024 Dec 18 15:54
Edited
2025 Sep 29 9:21
Tokenizer Substitutes
 
 
 

Latent Thought

Currently, no truly tokenizer-free model exists. The term essentially refers to byte-based or character-based models, or to dynamic tokenization (DT). As long as a model still depends on an encoding such as UTF-8, it is not truly tokenizer-free. BPE remains the most widely used tokenizer because it has no out-of-vocabulary (OOV) problem and is efficient.
However, although the author of this article claims that tokenizers are not the problem, a truly tokenizer-free approach could still be built on current DT methods, extended to cover even the UTF encoding itself. In that sense, the article slows down progress and undervalues this line of research.
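
The point about UTF-8 can be illustrated with a minimal sketch. It assumes a "byte-level" model consumes raw UTF-8 bytes as IDs in the range 0-255. The function names `byte_tokenize` and `byte_detokenize` are hypothetical and do not come from any particular library. The sketch shows that the encoding step itself acts as a fixed, hand-designed tokenizer: one character can map to several IDs.

```python
def byte_tokenize(text: str) -> list[int]:
    """Map text to byte IDs (0-255) by UTF-8 encoding it.

    Hypothetical helper: the UTF-8 codec plays the role of the tokenizer.
    """
    return list(text.encode("utf-8"))


def byte_detokenize(ids: list[int]) -> str:
    """Inverse mapping: rebuild the string from byte IDs."""
    return bytes(ids).decode("utf-8")


text = "héllo"
ids = byte_tokenize(text)

# 5 characters become 6 IDs, because "é" is encoded as two bytes (195, 169).
print(len(text), len(ids))   # 5 6
print(ids)                   # [104, 195, 169, 108, 108, 111]
print(byte_detokenize(ids))  # héllo
```

The vocabulary here is fixed at 256 symbols, so there is no OOV, but how characters are split into IDs is still decided by the UTF-8 standard rather than learned by the model.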
 
 
