Tiktoken: OpenAI’s Tokenizer | Hacker News
* The cl100k_base tokenizer has ~100k tokens; previous tokenizers had ~50k. (enc.n_vocab returns 100277, but some IDs in that range are not valid tokens, starting at 100256.)
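
One way to see which IDs are missing is to try decoding every ID below `enc.n_vocab` and collect the ones that fail. A rough sketch, assuming the encoder object exposes `n_vocab` and `decode_single_token_bytes`, and raises `KeyError` for an ID with no token (which is what tiktoken's `Encoding` docstring describes). `find_unassigned_ids` is a made-up helper name, not part of tiktoken:

```python
def find_unassigned_ids(enc, start=0):
    """Return the token IDs in [start, enc.n_vocab) that the encoder cannot decode.

    `enc` is assumed to behave like tiktoken's Encoding: it has an integer
    `n_vocab` attribute and a `decode_single_token_bytes(token_id)` method
    that raises KeyError when no token is assigned to that ID.
    """
    missing = []
    for token_id in range(start, enc.n_vocab):
        try:
            enc.decode_single_token_bytes(token_id)
        except KeyError:
            # No token is assigned to this ID, so it is a gap in the ID range.
            missing.append(token_id)
    return missing


# Hypothetical usage (requires tiktoken; the first call may download the vocabulary):
#   import tiktoken
#   enc = tiktoken.get_encoding("cl100k_base")
#   print(enc.n_vocab, find_unassigned_ids(enc, start=100_000))
```

This would also explain why `n_vocab` can be larger than the number of usable tokens: it appears to be the largest assigned ID plus one, so any unassigned IDs below it still count toward the total.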
https://news.ycombinator.com/item?id=34008839