The main goal of llama.cpp is to run the LLaMA model using 4-bit integer quantization on a MacBook.
- chat.py
- test_inference.py
- convert.py
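The idea behind 4-bit integer quantization can be sketched as follows. This is a minimal illustrative example of blockwise quantization with a shared per-block scale; the actual ggml/llama.cpp formats (e.g. Q4_0) differ in block size, storage layout, and rounding details.

```python
def quantize_4bit(block):
    """Map floats to integer codes in [-8, 7] using one shared scale per block.
    Illustrative only; not the real llama.cpp/ggml quantization code."""
    amax = max(abs(x) for x in block)
    scale = amax / 7.0 if amax > 0 else 1.0
    codes = [max(-8, min(7, round(x / scale))) for x in block]
    return codes, scale


def dequantize_4bit(codes, scale):
    """Recover approximate floats from the 4-bit codes and the block scale."""
    return [c * scale for c in codes]


# Quantize a small block of weights and measure the round-trip error.
weights = [0.1, -0.52, 0.33, 0.9, -0.77, 0.05, 0.0, -0.25]
codes, scale = quantize_4bit(weights)
approx = dequantize_4bit(codes, scale)
err = max(abs(a - b) for a, b in zip(weights, approx))
print(codes)
print(scale, err)
```

Each weight is stored as a 4-bit code plus a small per-block scale, which is what makes it feasible to fit and run a large model within a MacBook's memory; the round-trip error is bounded by half a quantization step (`scale / 2`).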