mistral-small-3.2 OCR accuracy way too bad with llama.cpp compared to ollama?
Posted by caetydid@reddit | LocalLLaMA | View on Reddit | 27 comments
Hi,
I have evaluated mistral small 3.2 for OCR tasks using ollama. The accuracy has been very satisfying while some bugs cause it to run on CPU solely with a rtx 4090 (about 5t/s).
So I switched to llama.cpp and obtain between 20-40t/s using the model + mmproj from unsloth. Both models are Q4\_K\_M. The accuracy is way worse than what I get when using ollama. How can that be?
Is it using another vision projector, or am I doing sth wrong? I use 32k context, temp=0, all other settings are defaults. I do not explicitely use quantized kvcache or flash attention.
Any idea how to get on par with ollamas excellent OCR accuracy?
thanks & greets
27 Comments
Fireblade_5555@reddit
Awwtifishal@reddit
HumanAppointment5@reddit
Awwtifishal@reddit
HumanAppointment5@reddit
caetydid@reddit (OP)
Awwtifishal@reddit
HumanAppointment5@reddit
caetydid@reddit (OP)
triynizzles1@reddit
HumanAppointment5@reddit
Gregory-Wolf@reddit
HumanAppointment5@reddit
Gregory-Wolf@reddit
caetydid@reddit (OP)
HumanAppointment5@reddit
HumanAppointment5@reddit
HumanAppointment5@reddit
HumanAppointment5@reddit
pseudonerv@reddit
fp4guru@reddit
Gregory-Wolf@reddit
caetydid@reddit (OP)
Gregory-Wolf@reddit
caetydid@reddit (OP)
Cergorach@reddit
caetydid@reddit (OP)