input_ids are token indices: numerical representations of the tokens that build the sequences used as input by the model. They are often the only required parameter to pass to the model. Each tokenizer works differently, but the underlying mechanism remains the same. For example, the BERT tokenizer converts text into a sequence of integers, where each integer corresponds to a specific token in the vocabulary.
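
A minimal sketch of this, assuming the Hugging Face transformers library and the "bert-base-uncased" checkpoint purely for illustration:

```python
from transformers import AutoTokenizer

# Load the BERT tokenizer (checkpoint name assumed for illustration).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Tokenizing text returns input_ids (along with other fields such as
# attention_mask, omitted here for brevity).
encoded = tokenizer("Hello world")
print(encoded["input_ids"])
# A list of vocabulary indices; the exact values depend on the vocabulary,
# and special tokens like [CLS] and [SEP] are added automatically.

# Each integer maps back to a token string in the vocabulary.
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
```

These input_ids are what the model actually consumes; the token strings are only an intermediate, human-readable view.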
