input_ids
These are token indices: numerical representations of the tokens that make up the sequences used as input by the model. They are often the only required parameters to be passed to the model as input. Each tokenizer works differently, but the underlying mechanism remains the same. For example, the BERT tokenizer converts text into a sequence of integers, where each integer corresponds to a specific token in the vocabulary.
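The token-to-index mapping described above can be sketched with a toy example. This is a minimal illustration, not the real BERT tokenizer: the vocabulary `TOY_VOCAB`, the whitespace-based `tokenize` helper, and the `encode` function are all hypothetical names invented here, and a real tokenizer would use subword splitting over a vocabulary of tens of thousands of entries.

```python
# Toy vocabulary (hypothetical): maps each known token to an integer index.
TOY_VOCAB = {
    "[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
    "a": 4, "titan": 5, "rtx": 6, "has": 7,
    "24gb": 8, "of": 9, "vram": 10,
}


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real subword tokenization)."""
    return text.lower().split()


def encode(text, vocab=TOY_VOCAB):
    """Return the list of vocabulary indices for `text`.

    The sequence is wrapped in [CLS] ... [SEP] markers, and tokens that are
    missing from the vocabulary are mapped to the [UNK] index.
    """
    tokens = ["[CLS]"] + tokenize(text) + ["[SEP]"]
    unk_id = vocab["[UNK]"]
    return [vocab.get(token, unk_id) for token in tokens]


input_ids = encode("A Titan RTX has 24GB of VRAM")
print(input_ids)  # [2, 4, 5, 6, 7, 8, 9, 10, 3]
```

Each integer in the result is the position of one token in the vocabulary, so a different vocabulary would yield different ids for the same text. That is why a model should be paired with the tokenizer it was trained with.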