Tokenizer Substitutes
Latent Thought
Currently, there are not tokenizer-free doesn't exist. It essentially refers to byte/character-based or dynamic tokenization (DT). if they still include encodings like utf8, they could not become tokenizer-free. BPE are most widely used, no OOV, efficient.
However, while the author claims in this article that tokenizers are not the problem, there can be a true tokenizer-free approach based on current DT methods including even UTF encodings. but this article slows down progress and undervalues research.