Looking to build a local OCR solution to extract data from PDF scans.

Posted by oilman99999@reddit | LocalLLaMA | 14 comments

Attached is an example of what the scans can look like, though only a small share of the documents are handwritten. Most are typed and not too hard to extract. Is there an open-source solution that is good at reading and extracting values from both typed and handwritten pages? I am testing on my home PC (RTX 4090), but at work we have a DGX Spark that I can use for this.