mistral-small-3.2 OCR accuracy way too bad with llama.cpp compared to ollama?

Posted by caetydid@reddit | LocalLLaMA | View on Reddit | 27 comments

Hi, I have evaluated mistral small 3.2 for OCR tasks using ollama. The accuracy has been very satisfying while some bugs cause it to run on CPU solely with a rtx 4090 (about 5t/s). So I switched to llama.cpp and obtain between 20-40t/s using the model + mmproj from unsloth. Both models are Q4\_K\_M. The accuracy is way worse than what I get when using ollama. How can that be? Is it using another vision projector, or am I doing sth wrong? I use 32k context, temp=0, all other settings are defaults. I do not explicitely use quantized kvcache or flash attention. Any idea how to get on par with ollamas excellent OCR accuracy? thanks & greets