The main goal of llama.cpp is to run the LLaMA model using 4-bit integer quantization on a MacBook.
- chat.py
- test_inference.py
- convert.py
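The 4-bit quantization mentioned above can be illustrated with a minimal sketch of symmetric block quantization: each block of 32 weights shares one float scale and stores small integer codes. This is a simplification for intuition only; GGML/GGUF's real Q4 formats differ in detail (e.g. they pack two 4-bit codes per byte and store fp16 scales).

```python
import numpy as np

def quantize_q4(block: np.ndarray):
    """Symmetric 4-bit quantization of one weight block.

    One shared scale per block; integer codes in [-8, 7].
    (Illustrative sketch, not GGML's exact on-disk layout.)
    """
    scale = float(np.max(np.abs(block))) / 7.0
    if scale == 0.0:
        return np.zeros_like(block, dtype=np.int8), 0.0
    q = np.clip(np.round(block / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_q4(q: np.ndarray, scale: float) -> np.ndarray:
    # Reconstruct approximate float weights from codes and scale.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
weights = rng.standard_normal(32).astype(np.float32)  # one 32-weight block
q, s = quantize_q4(weights)
recon = dequantize_q4(q, s)
max_err = float(np.max(np.abs(weights - recon)))
print(q.dtype, s, max_err)
```

The rounding error per weight is bounded by half the block scale, which is why per-block (rather than per-tensor) scales keep 4-bit models usable.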
- [ML Blog: Quantize Llama models with GGUF and llama.cpp (GGML vs. GPTQ vs. NF4)](https://mlabonne.github.io/blog/posts/Quantize_Llama_2_models_using_ggml.html)

Seonglae Cho