OCR + LLM for Invoice Extraction

Posted by JumpyHouse@reddit | LocalLLaMA | View on Reddit | 67 comments

I’m starting to get a bit frustrated. I’m trying to develop a mobile application for an academic project involving invoice information extraction. Since this is a non-commercial project, I’m not allowed to use paid solutions like Google Vision or Azure AI Vision. So far, I’ve studied several possibilities, with the best being SuryaOCR/Marker for data extraction and Qwen 2.5 14B for data interpretation, along with some minor validation through RegEx.

I’m also limited in terms of options because I have an RX 6700 XT with 12GB of VRAM and can’t run Hugging Face models due to the lack of support for my GPU. I’ve also tried a few Vision models like Llama 3.2 Vision and various OCR solutions like PaddleOCR , PyTesseract and EasyOCR and they all came short due to the lack of layout detection.

I wanted to ask if any of you have faced a similar situation and if you have any ideas or tips because I’m running out of options for data extraction. The invoices are predominantly Portuguese, so many OCR models end up lacking support for the layout detection.

Thank you in advance.🫡