Hugging Face Tokenizer input_ids

input_ids are the numerical representations of the tokens in the input sequence.

They are token indices: each integer identifies one token in the tokenizer's vocabulary, and together the integers form the sequence that the model takes as input. They are often the only required argument when calling the model. Tokenizers differ in how they split text into tokens, but the underlying mechanism is the same: the text is split into tokens, and each token is looked up in a fixed vocabulary to get its index. The BERT tokenizer, for example, converts text into a sequence of integers, where each integer corresponds to a specific token in its vocabulary.
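The token-to-index lookup described above can be sketched with a toy tokenizer. This is a minimal illustration, not the Hugging Face implementation: the vocabulary TOY_VOCAB and the helpers wordpiece and encode are hypothetical names made up for this example, and the greedy longest-match splitting only approximates what a BERT-style WordPiece tokenizer does.

```python
# Hypothetical toy vocabulary; a real BERT vocabulary has roughly 30,000 entries.
TOY_VOCAB = {
    "[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
    "hello": 4, "world": 5, "token": 6, "##izer": 7, "##s": 8,
}


def wordpiece(word, vocab):
    """Greedy longest-match-first split of one word into sub-word tokens."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                # Continuation pieces are marked with a "##" prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matches: the whole word becomes the unknown token.
            return ["[UNK]"]
        pieces.append(piece)
        start = end
    return pieces


def encode(text, vocab=TOY_VOCAB):
    """Return (tokens, input_ids) for a lowercased, whitespace-split text."""
    tokens = ["[CLS]"]
    for word in text.lower().split():
        tokens.extend(wordpiece(word, vocab))
    tokens.append("[SEP]")
    # input_ids are simply the vocabulary indices of the tokens.
    input_ids = [vocab[token] for token in tokens]
    return tokens, input_ids


tokens, input_ids = encode("Hello tokenizers")
print(tokens)     # ['[CLS]', 'hello', 'token', '##izer', '##s', '[SEP]']
print(input_ids)  # [2, 4, 6, 7, 8, 3]
```

With the transformers library installed, the real equivalent would be along the lines of AutoTokenizer.from_pretrained("bert-base-uncased")("Hello tokenizers")["input_ids"], which returns indices into BERT's actual vocabulary rather than this toy one.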

Maxime Labonne - Decoding Strategies in Large Language Models

A Guide to Text Generation From Beam Search to Nucleus Sampling

https://mlabonne.github.io/blog/posts/2022-06-07-Decoding_strategies.html

