Best LLM for OCR on invoices, producing JSON, and calculating totals?

Posted by Difficult-Bluejay-52@reddit | LocalLLaMA

Hi. I have been using GPT-4 and GPT-4o for a while and recently switched to Sonnet 3.5. I want to know which other LLMs you have tried for OCR.

Here is our current setup and our requirements.
We send a batch of 1-10 images containing pages from one invoice or from multiple invoices.

The LLM has to go through each image, extract the information below, and produce this JSON (summing up values where needed):

{
  "Currency": "",
  "Vendor": "",
  "CourierName": "",
  "CourierNumber": "",
  "Consignee": "",
  "ACC number": "",
  "Items": [{"Description": "", "QTY": "", "Unit Price": "", "FileID": ""}],
  "Subtotal": "",
  "Tax": "",
  "Shipping&Handling": "",
  "Shipping&HandlingDiscount": "",
  "Discount": "",
  "Refund": "",
  "Coupon": "",
  "GiftCard": "",
  "Credit": "",
  "Total": ""
}
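To make the template concrete, here is a minimal Python sketch that checks whether a model reply has this shape before anything downstream uses it. The helper name `check_shape` is made up for illustration. It also assumes the reply is a bare JSON object with exactly these keys, which may not match how the replies really come back.

```python
import json

# Keys copied from the template above. Everything else here is hypothetical.
EXPECTED_KEYS = {
    "Currency", "Vendor", "CourierName", "CourierNumber", "Consignee",
    "ACC number", "Items", "Subtotal", "Tax", "Shipping&Handling",
    "Shipping&HandlingDiscount", "Discount", "Refund", "Coupon",
    "GiftCard", "Credit", "Total",
}
ITEM_KEYS = {"Description", "QTY", "Unit Price", "FileID"}


def check_shape(raw: str) -> list[str]:
    """Return a list of problems in the model's reply (empty if none)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top level is not a JSON object"]

    problems = []
    for key in sorted(EXPECTED_KEYS - set(data)):
        problems.append(f"missing key: {key}")
    for key in sorted(set(data) - EXPECTED_KEYS):
        problems.append(f"unexpected key: {key}")

    items = data.get("Items")
    if not isinstance(items, list):
        problems.append("Items is not a list")
        return problems
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"Items[{index}] is not an object")
            continue
        for key in sorted(ITEM_KEYS - set(item)):
            problems.append(f"Items[{index}] missing key: {key}")
    return problems
```

A reply that fails this check could be retried or flagged for manual review instead of being parsed further.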

This works 70-80% of the time, but sometimes the summed values are wrong (subtotal, tax, shipping, total, etc.), and I would like to try other LLMs to see if they can do better!
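One way to catch the bad sums, whichever model is used, is to recompute them in code from the extracted line items and compare. Below is a rough Python sketch of that idea; the function names are hypothetical. It rests on three assumptions that may not hold for every invoice: amounts are plain numeric strings like "1,234.50", the subtotal is the sum of QTY times Unit Price, and discounts, refunds, coupons, gift cards and credits are positive amounts subtracted from the total.

```python
from decimal import Decimal

# ASSUMPTION: which fields add to the total and which subtract from it.
ADDED = ("Tax", "Shipping&Handling")
SUBTRACTED = ("Shipping&HandlingDiscount", "Discount", "Refund",
              "Coupon", "GiftCard", "Credit")


def to_decimal(value) -> Decimal:
    """Parse strings like "1,234.50"; an empty string counts as 0."""
    text = str(value).replace(",", "").strip()
    return Decimal(text) if text else Decimal("0")


def recompute(invoice: dict) -> dict:
    """Recompute Subtotal and Total from the extracted fields."""
    subtotal = sum(
        (to_decimal(item.get("QTY", "")) * to_decimal(item.get("Unit Price", ""))
         for item in invoice.get("Items", [])),
        Decimal("0"),
    )
    total = subtotal
    total += sum((to_decimal(invoice.get(k, "")) for k in ADDED), Decimal("0"))
    total -= sum((to_decimal(invoice.get(k, "")) for k in SUBTRACTED), Decimal("0"))
    return {"Subtotal": subtotal, "Total": total}


def mismatches(invoice: dict, tolerance: Decimal = Decimal("0.01")) -> dict:
    """Map each disagreeing field to (value from the LLM, recomputed value)."""
    result = {}
    for key, computed in recompute(invoice).items():
        reported = to_decimal(invoice.get(key, ""))
        if abs(reported - computed) > tolerance:
            result[key] = (reported, computed)
    return result
```

With a check like this the LLM only has to read the numbers correctly. A non-empty result from `mismatches` points to either a misread value or an invoice that does not follow the assumed formula.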

Thanks.