Gemma 3 QAT launch with MLX, llama.cpp, Ollama, LM Studio, and Hugging Face
Posted by hackerllama@reddit | LocalLLaMA | 50 comments
Hi!
Some weeks ago we released GGUFs corresponding to the QAT (Quantization-Aware Training) checkpoints of Gemma 3. Thanks to QAT, the model preserves quality similar to `bfloat16` while significantly reducing the memory required to load it. That is, QAT is an additional fine-tuning step that makes the model more robust to quantization.
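To put the memory savings in rough numbers, here's a back-of-the-envelope sketch. It assumes a nominal 12 billion parameters (taken from the "12b" in the model name) and counts weights only, so it ignores the KV cache, activations, and the per-block scales that real quantized formats store. The helper name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return num_params * bits_per_weight / 8 / 1e9


NOMINAL_PARAMS = 12e9  # assumption: nominal size read off the "12b" model name

bf16_gb = weight_memory_gb(NOMINAL_PARAMS, 16)  # bfloat16 uses 16 bits per weight
int4_gb = weight_memory_gb(NOMINAL_PARAMS, 4)   # idealized 4-bit quantization

print(f"bfloat16: ~{bf16_gb:.0f} GB, 4-bit: ~{int4_gb:.0f} GB")
```

Real file sizes will come out somewhat different, since quantized formats carry extra metadata and some tensors may be kept at higher precision.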
As we only released the GGUFs, we got feedback that it would be great to have the unquantized QAT-based checkpoints so people can quantize them for their own tools. So... we did it! Today we're releasing the unquantized QAT-based checkpoints. Models quantized from these checkpoints preserve quality better than naively quantized ones.
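For anyone wondering what "naive quantization" looks like in practice, here's a minimal sketch of post-training round-to-nearest quantization to 4 bits with a single scale for the whole tensor. This is only an illustration of the general idea, not the scheme used for the Gemma 3 checkpoints or by any particular tool; the function names and the sample weights are made up.

```python
def quantize_rtn(weights, bits=4):
    """Symmetric round-to-nearest quantization with one scale per tensor."""
    qmax = 2 ** (bits - 1) - 1           # 7 for 4-bit signed integers
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return [0] * len(weights), 1.0   # all-zero tensor: any scale works
    scale = max_abs / qmax
    # Round each weight to the nearest integer level and clamp to the valid range.
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integer levels back to approximate float weights."""
    return [v * scale for v in q]


weights = [0.12, -0.53, 0.07, 0.91, -0.33]  # hypothetical sample values
q, scale = quantize_rtn(weights)
restored = dequantize(q, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]
print("levels:", q)
print("max rounding error:", max(errors))
```

The rounding error per weight is at most half a quantization step (`scale / 2`). A model that never saw this rounding during training has to absorb it all at once, whereas QAT simulates it during fine-tuning so the weights can adapt.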
**We also collaborated with Prince (from MLX), llama.cpp, Ollama, LM Studio, and Hugging Face to make sure you can use the models in all your favorite tools!**
* Blog post: [https://developers.googleblog.com/en/gemma-3-quantized-aware-trained-state-of-the-art-ai-to-consumer-gpus/](https://developers.googleblog.com/en/gemma-3-quantized-aware-trained-state-of-the-art-ai-to-consumer-gpus/)
* Unquantized checkpoints: [https://huggingface.co/collections/google/gemma-3-qat-67ee61ccacbf2be4195c265b](https://huggingface.co/collections/google/gemma-3-qat-67ee61ccacbf2be4195c265b)
* Ollama: [https://ollama.com/library/gemma3](https://ollama.com/library/gemma3) (try `ollama run gemma3:12b-it-qat`)
* LM Studio: [https://lmstudio.ai/model/gemma-3-12b-it-qat](https://lmstudio.ai/model/gemma-3-12b-it-qat)
* MLX: [https://huggingface.co/collections/mlx-community/gemma-3-qat-68002674cd5afc6f9022a0ae](https://huggingface.co/collections/mlx-community/gemma-3-qat-68002674cd5afc6f9022a0ae)
* llama.cpp: [https://huggingface.co/collections/google/gemma-3-qat-67ee61ccacbf2be4195c265b](https://huggingface.co/collections/google/gemma-3-qat-67ee61ccacbf2be4195c265b)
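If you'd rather call the Ollama model from a script than from the CLI, something like the following should work. It assumes a local Ollama server on its default port (11434) with the `gemma3:12b-it-qat` tag already pulled; the function names are just for illustration.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumption: default local server
MODEL_TAG = "gemma3:12b-it-qat"                     # tag mentioned in the post


def build_generate_request(prompt, model=MODEL_TAG):
    """Build a non-streaming POST request for Ollama's generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def generate(prompt, model=MODEL_TAG):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `generate("Why is the sky blue?")` should return the model's reply as a string.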
Enjoy!