Handwriting OCR in mass

Posted by batty_1@reddit | LocalLLaMA | View on Reddit | 18 comments

I have about 50 million pages of handwritten/machine print mix documents. I want to convert all of these to markdown, preserving structure. I need as close to perfect accuracy as possible on the handwritten elements: these are boilerplate forms with handwritten elements, so those handwritten elements are really the critical "piece".

I've been trying some variation of this for about six months and could never quite get it right: decimal points would be removed, leading negative signs, sloppy handwriting completely misunderstood, etc.

recently, I revisited the problem and tried Qwen3.5:9b loaded up on my 4070 super and I was astonished by the results. Damn near 100% accuracy for even very complicated scenarios (faded handwriting, "one-line" markout corrections, etc.). I am still able to achieve 30-40 tokens per second and a page takes about 10-15 seconds - this is spun up and being called using Ollama's GGUF, thinking disabled.

The issue I'm having is that, in about 20% of the pages, Qwen hits a repetition loop and starts flood filling the markdown with empty rows ("| | | ...") until it exceeds the token allowance. This is a double whammy: it both truncates the page results and runs for 3-5x as long (average page is 400-600 tokens vs. filling 2048 tokens with nonsense).

Repetition penalties don't seem to work, nor does any amount of prompt manipulation. I've tried various other versions of the same model in vLLM and llama.cpp, but I can't achieve the same accuracy. The quantization they have on the Ollama side is magic.

I tried Gemma4 last night and had about 95% the accuracy and no repetition loops and about a 30% speed increase - which was great, but not good enough for this use case.

Has anyone else encountered this, or had a similar use case they worked through, and can provide some guidance? I appreciate it.

Fine tuning isn't off the table, and that might be what it takes, but I wanted to ask you guys, first.

(the elephant in the room: I don't intend on running all 50 million pages through my one 4070 ultra. just trying to get the pipeline solid first)