Training speed techniques
- torch.distributed
- torch.nn.parallel
- torch.amp
- torch.compile()

model_args
- n_layer
- n_head
- n_embd
- block_size
- bias
- dropout

functions
- estimate_loss() - get loss
- get_batch() - get batch
- get_lr() - learning rate

Training Loop
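
The notes list get_lr() only by name, so the following is a minimal sketch of how such a helper might feed the training loop, not the actual implementation. It assumes a linear warmup followed by cosine decay, which is a common choice but is not stated in the notes. The hyperparameter names and values (learning_rate, min_lr, warmup_iters, lr_decay_iters) and the lr_schedule() helper are hypothetical.

```python
import math

# Hypothetical hyperparameters; the notes do not give actual values.
learning_rate = 6e-4   # peak learning rate
min_lr = 6e-5          # floor reached once decay has finished
warmup_iters = 100     # iterations of linear warmup
lr_decay_iters = 1000  # iteration at which decay ends


def get_lr(it):
    """Learning rate for iteration `it` (assumed: linear warmup, then cosine decay)."""
    # 1) Linear warmup from near zero up to the peak rate.
    if it < warmup_iters:
        return learning_rate * (it + 1) / warmup_iters
    # 2) Past the decay window, hold at the minimum rate.
    if it >= lr_decay_iters:
        return min_lr
    # 3) In between, cosine-decay from the peak down to the minimum.
    decay_ratio = (it - warmup_iters) / (lr_decay_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio))  # falls from 1 to 0
    return min_lr + coeff * (learning_rate - min_lr)


def lr_schedule(max_iters):
    """Skeleton of the loop's scheduling step: one learning rate per iteration."""
    return [get_lr(it) for it in range(max_iters)]
```

In a real PyTorch loop, the value returned for each iteration would typically be written into every `param_group["lr"]` of the optimizer before the forward and backward pass.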