Vision (for bank account statements): is it better to OCR an account statement and have an LLM analyze the resulting markdown/JSON, or to have the vision model extract the info you need directly?
Posted by dirtyring@reddit | LocalLLaMA | 6 comments
My use case: extract certain information from a bank account statement (I can't use Plaid for this app).
e.g., the highest transaction in the month of March.
I have a PDF full of bank transactions. Should I use a library to OCR it and then have the LLM interpret the results, or does it work better to just have the vision model find that information right away?
I'm currently exploring Docling and Llama 3.2 Vision.
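For context, the OCR/markdown half of that pipeline can be sketched roughly like this, assuming Docling's `DocumentConverter` API; the file name and prompt are placeholders, and the actual LLM call is omitted:

```python
# Rough sketch: convert the statement PDF to markdown with Docling,
# then build a prompt for whatever LLM is doing the analysis.
# "statement.pdf" is a placeholder path.
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("statement.pdf")
markdown = result.document.export_to_markdown()

prompt = (
    "Below is a bank statement converted to markdown. "
    "What is the highest transaction in March?\n\n" + markdown
)
# `prompt` would then go to Llama 3.2 (or any other model) as plain text.
```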
divyeshk95@reddit
How can I extract specific information from bank statements? I'm working on a bank statement analyser that handles PDFs from N different banks. A combination of NLP and regex isn't working.
brotie@reddit
Preprocessing the PDFs with something that does PDF text extraction as its primary function will yield much better results than hoping the vision model nails it
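A minimal sketch of that preprocessing step, using pdfplumber as one example of a dedicated text-extraction library (pypdf or pdfminer.six would work similarly; the file path is a placeholder):

```python
# Extract the embedded text layer from the statement PDF, page by page,
# instead of asking a vision model to read page images.
import pdfplumber

with pdfplumber.open("statement.pdf") as pdf:
    text = "\n".join(page.extract_text() or "" for page in pdf.pages)

# `text` is what gets passed to the LLM for interpretation.
print(text[:500])
```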
sshan@reddit
Agreed, provided the PDFs follow a standard structure and there are no graphs or infographics, as in this case (straightforward charts are fine).
GHOST--1@reddit
Extract the data using an OCR tool, e.g. Surya OCR or docTR (Document Text Recognition), then send it to an LLM.
Giving the document directly to an LLM/VLM for OCR can sometimes result in hallucinated text.
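A rough sketch of that two-step pipeline with docTR (Surya would slot into the same place; the file path is a placeholder and the LLM call is omitted):

```python
# Step 1: OCR the PDF with docTR's pretrained end-to-end predictor.
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True)  # downloads weights on first run
doc = DocumentFile.from_pdf("statement.pdf")
result = model(doc)

# Step 2: render the OCR output as plain text and hand it to the LLM.
extracted_text = result.render()
```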
jonahbenton@reddit
All PDF bank statements encode the actual data as text, extractable with non-AI PDF processing tools. That data is sufficient for a lot of tasks. LLM vision can be a good assist if there is some specific question that can be posed as an either/or based on the data.
Old-school (lol) OCR in general tends not to be reliable enough on its own, and you can't query it.
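As a toy illustration of querying the extracted text without any model, here is the OP's "highest transaction in March" question answered with a plain regex; the line format is invented, and a real statement would need a bank-specific pattern:

```python
# Parse transaction lines of the form "MM/DD/YYYY  DESCRIPTION  AMOUNT"
# out of already-extracted statement text, then take the largest March
# amount by magnitude. The sample text and its format are made up.
import re

extracted = """\
03/02/2024  GROCERY STORE        -54.20
03/15/2024  SALARY             2,400.00
03/28/2024  RENT              -1,150.00
"""

pattern = re.compile(r"^03/\d{2}/\d{4}\s+.+?\s+(-?[\d,]+\.\d{2})$", re.MULTILINE)
amounts = [float(m.group(1).replace(",", "")) for m in pattern.finditer(extracted)]
print(max(amounts, key=abs))  # 2400.0
```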
ismaaiil933@reddit
Llama 3.2 is not super good with vision tasks.