create token embedding table
- number of embeddings (num_embeddings): the vocabulary size, i.e. the number of rows in the table
- embedding dimension (embedding_dim): the size of each embedding vector
padding_idx
- index of the padding token in the input indices; its main purpose is to let the embedding lookup ignore certain tokens, which is particularly useful when batching sequences of varying lengths
- the embedding vector for the padding index will not be updated during training.
- this is the padding token's id (a row index in the embedding table), not an index into an embedding vector
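
A minimal sketch of constructing such a table with PyTorch's nn.Embedding; the values of vocab_size, n_embd, and pad_id are assumptions for illustration:

```python
import torch
import torch.nn as nn

vocab_size = 50304  # number of embeddings (rows in the table); assumed value
n_embd = 768        # embedding dimension (columns); assumed value
pad_id = 0          # token id used for padding; assumed value

# one learnable n_embd-dimensional vector per token id
token_embedding_table = nn.Embedding(vocab_size, n_embd, padding_idx=pad_id)

# the row at pad_id is initialized to zeros and receives no gradient updates
print(token_embedding_table.weight[pad_id].abs().sum())  # tensor(0.)
```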
Increasing the vocabulary size to a multiple of 64 means the data can be divided into equally sized chunks that align with how memory is managed and how computations are performed on GPUs.
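
As a quick illustration, a hypothetical helper (round_up_to_multiple is not from the original) that rounds a vocabulary size up to the next multiple of 64:

```python
def round_up_to_multiple(n: int, k: int = 64) -> int:
    # smallest multiple of k that is >= n
    return ((n + k - 1) // k) * k

print(round_up_to_multiple(50257))  # 50304, e.g. for GPT-2's 50257-token vocab
```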
return the embedding tensor for an input tensor of token indices (one row lookup per index)
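
Continuing the sketch above, looking up an assumed (2, 4) batch of token indices returns a (2, 4, 768) embedding tensor:

```python
import torch
import torch.nn as nn

token_embedding_table = nn.Embedding(50304, 768, padding_idx=0)  # as above

# batch of 2 sequences of length 4, right-padded with token id 0
idx = torch.tensor([[15, 7, 42, 0],
                    [ 3, 9,  0, 0]])

emb = token_embedding_table(idx)  # each id selects one row of the table
print(emb.shape)                  # torch.Size([2, 4, 768])
```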