The main goal of llama.cpp is to run the LLaMA model using 4-bit integer quantization on a MacBook.
- chat.py
- test_inference.py
- convert.py
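The 4-bit quantization mentioned above can be illustrated with a minimal sketch of symmetric block quantization: each block of 32 weights shares one float scale and stores small integer codes. This is a simplification for intuition only; GGML/GGUF's real Q4 formats differ in detail (e.g. they pack two 4-bit codes per byte and store fp16 scales).

```python
import numpy as np

def quantize_q4(block: np.ndarray):
    """Symmetric 4-bit quantization of one weight block.

    One shared scale per block; integer codes in [-8, 7].
    (Illustrative sketch, not GGML's exact on-disk layout.)
    """
    scale = float(np.max(np.abs(block))) / 7.0
    if scale == 0.0:
        return np.zeros_like(block, dtype=np.int8), 0.0
    q = np.clip(np.round(block / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_q4(q: np.ndarray, scale: float) -> np.ndarray:
    # Reconstruct approximate float weights from codes and scale.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
weights = rng.standard_normal(32).astype(np.float32)  # one 32-weight block
q, s = quantize_q4(weights)
recon = dequantize_q4(q, s)
max_err = float(np.max(np.abs(weights - recon)))
print(q.dtype, s, max_err)
```

The rounding error per weight is bounded by half the block scale, which is why per-block (rather than per-tensor) scales keep 4-bit models usable.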
- [ML Blog: Quantize Llama models with GGUF and llama.cpp (GGML vs. GPTQ vs. NF4)](https://mlabonne.github.io/blog/posts/Quantize_Llama_2_models_using_ggml.html)

Seonglae Cho