Tokenizer Substitutes
Latent Thought
Reasoning to Learn from Latent Thoughts | alphaXiv
https://www.alphaxiv.org/abs/2503.18866v1
Currently, truly tokenizer-free models do not exist. What is called "tokenizer-free" essentially refers to byte/character-level or dynamic tokenization (DT). As long as these approaches still rely on an encoding such as UTF-8, they cannot be considered fully tokenizer-free. BPE is the most widely used tokenizer because it avoids out-of-vocabulary (OOV) tokens and is efficient.
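The points above can be made concrete with a minimal byte-level BPE sketch. It is illustrative only and is not the method of either linked article. The function names `byte_tokens`, `train_bpe`, `merge`, `encode`, and `decode` are hypothetical, and the toy corpus is made up. Starting from the 256 possible UTF-8 byte values means any string can be encoded (no OOV), while the byte-level baseline itself still depends on the UTF-8 encoding.

```python
from collections import Counter


def byte_tokens(text: str) -> list[int]:
    # A "tokenizer-free" byte model still consumes a fixed encoding:
    # UTF-8 maps every string to integers in range(256).
    return list(text.encode("utf-8"))


def merge(ids: list[int], pair: tuple[int, int], new_id: int) -> list[int]:
    # Replace non-overlapping occurrences of `pair`, scanning left to right.
    out: list[int] = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(corpus: str, num_merges: int) -> dict[tuple[int, int], int]:
    # Greedily learn merges: repeatedly fuse the most frequent adjacent pair.
    ids = byte_tokens(corpus)
    merges: dict[tuple[int, int], int] = {}
    next_id = 256  # ids 0..255 are reserved for raw bytes
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges[best] = next_id
        ids = merge(ids, best, next_id)
        next_id += 1
    return merges


def encode(text: str, merges: dict[tuple[int, int], int]) -> list[int]:
    # Start from bytes, then apply merges in the order they were learned
    # (dicts preserve insertion order).
    ids = byte_tokens(text)
    for pair, new_id in merges.items():
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids: list[int], merges: dict[tuple[int, int], int]) -> str:
    # Expand every merged id back into its byte sequence, then decode UTF-8.
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8")


if __name__ == "__main__":
    merges = train_bpe("low lower lowest newer wider", num_merges=10)
    text = "héllo 世界 🙂"  # characters never seen during training
    ids = encode(text, merges)
    print(ids, decode(ids, merges) == text)
```

Because the base vocabulary covers every possible byte, text containing characters never seen during training (such as the emoji above) still encodes without OOV tokens; it simply falls back to more, shorter tokens.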
However, while the author of the article below argues that tokenizers are not the problem, a truly tokenizer-free approach could still grow out of current DT methods, even those that still include UTF encodings. The article's framing slows down progress and undervalues this line of research.
There is no such thing as a tokenizer-free lunch
A Blog post by Catherine Arnett on Hugging Face
https://huggingface.co/blog/catherinearnett/in-defense-of-tokenizers

Seonglae Cho
