Qwen2.5 7B chat GGUF quantization Evaluation results

Posted by AaronFeng47@reddit | LocalLLaMA

**This is the Qwen2.5 7B Chat model, NOT coder**

|Model|Size|Computer science (MMLU PRO)|
|:-|:-|:-|
|qwen2.5:7b-instruct-q8\_0|8.1 GB|56.59|
|iMat-Qwen2.5-7B-Instruct-Q6\_K|6.3 GB|58.54|
|qwen2.5:7b-instruct-q6\_K|6.3 GB|57.80|
|iMat-Qwen2.5-7B-Instruct-Q5\_K\_L|5.8 GB|56.59|
|iMat-Qwen2.5-7B-Instruct-Q5\_K\_M|5.4 GB|55.37|
|qwen2.5:7b-instruct-q5\_K\_M|5.4 GB|57.80|
|iMat-Qwen2.5-7B-Instruct-Q5\_K\_S|5.3 GB|57.32|
|qwen2.5:7b-instruct-q5\_K\_S|5.3 GB|58.78|
|iMat-Qwen2.5-7B-Instruct-Q4\_K\_L|5.1 GB|56.10|
|iMat-Qwen2.5-7B-Instruct-Q4\_K\_M|4.7 GB|58.54|
|qwen2.5:7b-instruct-q4\_K\_M|4.7 GB|54.63|
|iMat-Qwen2.5-7B-Instruct-Q3\_K\_XL|4.6 GB|56.59|
|iMat-Qwen2.5-7B-Instruct-Q4\_K\_S|4.5 GB|53.41|
|qwen2.5:7b-instruct-q4\_K\_S|4.5 GB|55.12|
|iMat-Qwen2.5-7B-Instruct-IQ4\_XS|4.2 GB|56.59|
|iMat-Qwen2.5-7B-Instruct-Q3\_K\_L|4.1 GB|56.34|
|qwen2.5:7b-instruct-q3\_K\_L|4.1 GB|51.46|
|iMat-Qwen2.5-7B-Instruct-Q3\_K\_M|3.8 GB|54.39|
|qwen2.5:7b-instruct-q3\_K\_M|3.8 GB|53.66|
|iMat-Qwen2.5-7B-Instruct-Q3\_K\_S|3.5 GB|51.46|
|qwen2.5:7b-instruct-q3\_K\_S|3.5 GB|51.95|
|iMat-Qwen2.5-7B-Instruct-IQ3\_XS|3.3 GB|52.20|
|iMat-Qwen2.5-7B-Instruct-Q2\_K|3.0 GB|49.51|
|qwen2.5:7b-instruct-q2\_K|3.0 GB|44.63|
|---|---|---|
|llama3.1-8b-Q8\_0|8.5 GB|46.34|

Static GGUF: [https://www.ollama.com/](https://www.ollama.com/)

iMatrix calibrated GGUF using English dataset (iMat-): [https://huggingface.co/bartowski](https://huggingface.co/bartowski)

Backend: [https://www.ollama.com/](https://www.ollama.com/)

Evaluation tool: [https://github.com/chigkim/Ollama-MMLU-Pro](https://github.com/chigkim/Ollama-MMLU-Pro)

Evaluation config: [https://pastebin.com/YGfsRpyf](https://pastebin.com/YGfsRpyf)
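
For anyone who wants to sanity-check how much of the spread in the table is signal versus noise, here is a minimal Python sketch. It pairs the static and iMat scores at each quant level that has both, and estimates a rough binomial standard error per score. The question count `N_QUESTIONS = 410` is my assumption: the post does not state it, but every score in the table is consistent with a multiple of 1/410. The function names are just illustrative, not part of Ollama-MMLU-Pro.

```python
import math

# Scores copied from the table above: quant -> (static score, iMat score).
# Only quant levels that appear in both the static and iMat variants are listed.
PAIRED = {
    "Q6_K":   (57.80, 58.54),
    "Q5_K_M": (57.80, 55.37),
    "Q5_K_S": (58.78, 57.32),
    "Q4_K_M": (54.63, 58.54),
    "Q4_K_S": (55.12, 53.41),
    "Q3_K_L": (51.46, 56.34),
    "Q3_K_M": (53.66, 54.39),
    "Q3_K_S": (51.95, 51.46),
    "Q2_K":   (44.63, 49.51),
}

# ASSUMPTION: number of questions in the computer science subset.
# Not stated in the post; inferred from the score granularity.
N_QUESTIONS = 410


def standard_error(score_pct, n=N_QUESTIONS):
    """Binomial standard error of an accuracy score, in percentage points."""
    p = score_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


def imat_deltas(paired=PAIRED):
    """iMat score minus static score for each paired quant level."""
    return {quant: round(imat - static, 2) for quant, (static, imat) in paired.items()}


def mean_delta(paired=PAIRED):
    """Average iMat-minus-static difference across all paired quant levels."""
    deltas = imat_deltas(paired)
    return sum(deltas.values()) / len(deltas)


if __name__ == "__main__":
    for quant, delta in imat_deltas().items():
        static, _ = PAIRED[quant]
        print(f"{quant:7s} iMat - static = {delta:+.2f}  (SE ~ {standard_error(static):.2f})")
    print(f"mean delta: {mean_delta():+.2f}")
```

Under that assumption a single score near 56% carries a standard error of roughly 2.4 points, so many of the gaps between neighbouring rows are within one run's noise. Treat this as a rough guide only: all models answer the same questions, so a proper paired comparison would need per-question results, which the table does not include.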