Looking to build a local OCR solution to extract data from pdf scans.
Posted by oilman99999@reddit | LocalLLaMA | View on Reddit | 14 comments
Attached is an example of what it can be but the percentage of handwritten docs is not large. Most of them are typed and not too hard to extract. I am wondering if there is an open source solution that would be good at reading and extracting values from either. I am testing it on my home pc (RTX 4090) but at work we have a DGX Spark that I can use for this
scottgal2@reddit
Use docling. docling.ai I've built several products for customers for doing almost the same thing and really...docling solves many of the issues around 'normal' ocr.
Top_Fisherman9619@reddit
wow this is a solid project, thanks for mentioning this. Will keep this in mind
LocalLLaMA-ModTeam@reddit
Rule 1 - Search before asking. The content is frequently covered in this sub. Please search to see if your question has been answered before creating a new post.
VonDenBerg@reddit
Like how many? GLM or Gemini, no setup required. Pray and spray
Top_Fisherman9619@reddit
glm ocr is insane
exaknight21@reddit
Seconded.
youcloudsofdoom@reddit
I was actually using qwen 3.5 35b MoE for this today, did a brilliant job at OCR on handwritten notes photographed on paper, running locally on my 4060.
realtag2025@reddit
Also just tried it on the QWEN 3.5 9B . it got everything write except for dates that are handwritten.
SomeOrdinaryKangaroo@reddit
qwen 3.5 provides next generation ocr capabilities, highly recommend it
fernandolv3@reddit
Link to tesseract: https://github.com/tesseract-ocr/tesseract
total_amateur@reddit
Tesseract may be the way to go if you want open source.
I just built a quick JavaScript tool for OCR to Google Sheets using Google APIs if OP wants to reuse the logic. It was built to allow plugging in different APIs.
My use case was a bit different, though - reading receipts to split shared expenses.
FoxiPanda@reddit
I’ve found Gemma-4-26B to be my favorite for this.
Tesseract, Qwen3.5, Gemma-4, and various small OCR models could do this (Nvidia-ocr2, GLM-OCR, etc).
However I like Gemma-4 because it is willing to put in (?) marks where it isn’t sure about a name and won’t get stuck for 100s trying to decipher it. It straddles the right balance of speed, accuracy, and willingness to mark uncertainty in the analysis that I really find appealing.
turtleisinnocent@reddit
A dude managed to run IBM Granite to do OCR using WebGL. I think the notebook is on HF. Zero install, just run. Granite is pretty good too, it can handle TeX and maths, if that's something you need.
NoFaithlessness951@reddit
Even tesseract can do this