Vision (for bank account statements): is it better to OCR the statement and have an LLM analyze the resulting Markdown/JSON, or to have a vision model extract the info you need directly?

Posted by dirtyring@reddit | LocalLLaMA

My use case: extract certain information from a bank account statement (I can't use Plaid for this app).

For example: the highest transaction in the month of March.

I have a PDF full of bank transactions. Should I use a library to OCR it and then have the LLM interpret the results, or does it work better to have the vision model find that information directly from the page?
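To make the OCR-first option concrete, here is a rough sketch of what the downstream step could look like once the statement is already text. It assumes the OCR step yields a markdown table with Date / Description / Amount columns; the sample data, the column layout, the ISO date format, and the function names `parse_transactions` and `highest_transaction` are all made up for illustration, and "highest" is taken to mean largest absolute amount. A real statement will likely need a different parser.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Hypothetical OCR output: invented sample data, not from a real statement.
SAMPLE_MARKDOWN = """\
| Date       | Description | Amount   |
|------------|-------------|----------|
| 2024-02-27 | Rent        | -1200.00 |
| 2024-03-03 | Groceries   | -84.15   |
| 2024-03-15 | Laptop      | -1499.99 |
| 2024-03-28 | Salary      | 3200.00  |
| 2024-04-01 | Coffee      | -4.50    |
"""


def parse_transactions(markdown):
    """Turn markdown table rows into dicts, skipping header and separator rows."""
    rows = []
    for line in markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 3:
            continue
        try:
            date = datetime.strptime(cells[0], "%Y-%m-%d").date()
            amount = Decimal(cells[2].replace(",", ""))
        except (ValueError, InvalidOperation):
            # Header row, separator row, or a line the OCR garbled.
            continue
        rows.append({"date": date, "description": cells[1], "amount": amount})
    return rows


def highest_transaction(rows, year, month):
    """Return the row with the largest absolute amount in the given month, or None."""
    in_month = [
        r for r in rows
        if r["date"].year == year and r["date"].month == month
    ]
    if not in_month:
        return None
    return max(in_month, key=lambda r: abs(r["amount"]))


if __name__ == "__main__":
    transactions = parse_transactions(SAMPLE_MARKDOWN)
    print(highest_transaction(transactions, 2024, 3))
```

The idea behind this split is that the model (or OCR library) only has to get the rows right, and the comparison itself happens in ordinary code instead of relying on the LLM to compare numbers.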

I'm currently exploring Docling and Llama 3.2 Vision.