Introducing quantized Llama models with increased speed and a reduced memory footprint
As our first quantized models in this Llama category, these instruction-tuned models retain the quality and safety of the original 1B and 3B models while achieving a 2-4x speedup.
https://ai.meta.com/blog/meta-llama-quantized-lightweight-models/