Fill in the middle

Tiktoken: OpenAI’s Tokenizer | Hacker News
* The cl100k_base tokenizer has ~100k tokens; previous tokenizers had ~50k. (enc.n_vocab gives 100277, but some token ids in that range don't work, starting at 100256.)
https://news.ycombinator.com/item?id=34008839

arxiv.org
https://arxiv.org/pdf/2207.14255.pdf
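
The remark that enc.n_vocab is larger than the number of usable ids can be checked by probing every id and collecting the ones the decoder rejects. Below is a minimal sketch of that probe. The helper `unused_ids` and the toy vocabulary are hypothetical, made up for illustration. It assumes the decoder signals an unknown id by raising `KeyError`, and that the reported vocabulary size is "largest id + 1" rather than a count of defined tokens, which would explain holes in the range.

```python
from typing import Callable, List


def unused_ids(decode_one: Callable[[int], bytes], n_vocab: int) -> List[int]:
    """Return every id in range(n_vocab) that decode_one rejects with KeyError."""
    missing = []
    for token_id in range(n_vocab):
        try:
            decode_one(token_id)
        except KeyError:
            # Assumption: an unknown id surfaces as KeyError.
            missing.append(token_id)
    return missing


# Toy vocabulary (hypothetical, not real tokenizer data): ids 0-3 are
# ordinary tokens, 5 and 7 are special tokens, and 4 and 6 are undefined.
toy_vocab = {
    0: b"a",
    1: b"b",
    2: b"c",
    3: b"d",
    5: b"<|special_a|>",
    7: b"<|special_b|>",
}

# "Vocabulary size" computed as largest id + 1, so it counts the holes too.
toy_n_vocab = max(toy_vocab) + 1

holes = unused_ids(toy_vocab.__getitem__, toy_n_vocab)
print(toy_n_vocab, len(toy_vocab), holes)  # 8 6 [4, 6]
```

With tiktoken installed, the same helper could presumably be called as `unused_ids(enc.decode_single_token_bytes, enc.n_vocab)` for `enc = tiktoken.get_encoding("cl100k_base")`, assuming `decode_single_token_bytes` raises `KeyError` for ids outside the vocabulary. That should list the ids that "don't work", starting at 100256 according to the comment above.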