ggml: add Q1_0 1-bit quantization support (CPU) - 1-bit Bonsai models

Posted by pmttyji@reddit | LocalLLaMA

Bonsai's 8B model is only 1.15 GB, so a CPU alone should be more than enough to run it.
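
For a rough sense of what that file size means, here is a back-of-the-envelope sketch. It assumes "8B" means exactly 8e9 parameters and that 1.15 GB means 1.15e9 bytes; both are simplifications, and the real parameter count and on-disk layout of Q1_0 may differ. The helper names are made up for illustration.

```python
# Back-of-the-envelope size arithmetic for a 1-bit quantized model.
# Assumptions (not from the post): exactly 8e9 parameters, 1 GB = 1e9 bytes.

def bits_per_weight(file_bytes: float, n_params: float) -> float:
    """Average number of bits stored per parameter for a given file size."""
    return file_bytes * 8 / n_params

def size_gb(n_params: float, bpw: float) -> float:
    """File size in GB (1e9 bytes) for a given bits-per-weight budget."""
    return n_params * bpw / 8 / 1e9

N_PARAMS = 8e9       # assumed parameter count for the "8B" model
FILE_BYTES = 1.15e9  # 1.15 GB, as quoted in the post

bpw = bits_per_weight(FILE_BYTES, N_PARAMS)
fp16_gb = size_gb(N_PARAMS, 16)

print(f"{bpw:.2f} bits per weight on average")
print(f"{fp16_gb:.1f} GB for the same parameter count at FP16")
```

Under those assumptions the file works out to a little over 1 bit per weight on average, with the remainder presumably going to scales and metadata, versus about 16 GB for the same parameter count at FP16.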

https://huggingface.co/collections/prism-ml/bonsai