Abstractive Non-Deterministic Turing Machine with probability
Autoregression is a statistical technique that uses past values to predict future values in a time series. It's a regression of a variable against itself.
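A minimal sketch of "a regression of a variable against itself", assuming the simplest first-order case x[t] = c + phi * x[t-1]. The helper names `fit_ar1` and `predict_next` are hypothetical and only for illustration. The toy series is noise-free, so ordinary least squares should recover the coefficients up to floating-point error.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1] (illustrative AR(1) only)."""
    xs = series[:-1]  # past values (regressors)
    ys = series[1:]   # next values (targets)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    phi = cov / var
    c = mean_y - phi * mean_x
    return c, phi


def predict_next(series, c, phi):
    """One-step-ahead forecast from the most recent observation."""
    return c + phi * series[-1]


# Toy data generated by x[t] = 1.0 + 0.5 * x[t-1], starting from 0.0.
series = [0.0]
for _ in range(10):
    series.append(1.0 + 0.5 * series[-1])

c, phi = fit_ar1(series)
forecast = predict_next(series, c, phi)
print(c, phi, forecast)
```

An autoregressive language model follows the same pattern, except that the "past values" are previous tokens and the prediction is a probability distribution over the next token.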
A confusing point of terminology: masked (causal) attention is treated as the opposite of a bidirectional LM, because tokens after the current position are not attended to. The key distinction is unidirectional versus bidirectional context.
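The difference can be sketched as two attention masks. This is a toy illustration, not any particular library's API: the function `attention_mask` is hypothetical, and `mask[i][j]` being true is assumed to mean that position i may attend to position j.

```python
def attention_mask(seq_len, causal):
    """Return a seq_len x seq_len boolean mask.

    mask[i][j] is True when position i may attend to position j.
    causal=True hides every position after i (unidirectional);
    causal=False exposes the whole sequence (bidirectional).
    """
    return [
        [(j <= i) if causal else True for j in range(seq_len)]
        for i in range(seq_len)
    ]


causal = attention_mask(4, causal=True)
bidirectional = attention_mask(4, causal=False)

for row in causal:
    print("".join("1" if allowed else "0" for allowed in row))
# 1000
# 1100
# 1110
# 1111
```

The lower-triangular shape is what makes the model autoregressive: each position is predicted only from the positions before it, whereas the all-ones mask lets every position see both directions.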
Autoregressive Model Notion
Decoder-only LLM to Encoder-Decoder Transformer
Injective function to final hidden state

Seonglae Cho