Config-based training (declarative programming)
pip install torchtune
tune run --nproc_per_node 2 full_finetune_distributed --config llama2/7B_full
tune run lora_finetune_single_device \
  --config llama2/7B_lora_single_device \
  batch_size=8 \
  enable_activation_checkpointing=True \
  max_steps_per_epoch=128
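The key=value pairs above override fields in the recipe's YAML config, so a run can be tweaked without editing any code. To edit a config directly, copy it locally with `tune cp llama2/7B_lora_single_device my_config.yaml` and pass it via `--config my_config.yaml`. The fields below are a minimal illustrative sketch of what such a config might contain; the exact keys and values are assumptions, not the shipped torchtune config:

# my_config.yaml (hypothetical fragment -- field names are illustrative)
batch_size: 8
enable_activation_checkpointing: True
max_steps_per_epoch: 128
optimizer:
  lr: 3e-4

Any field in the file can still be overridden on the command line, with CLI values taking precedence over the YAML.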