LLM for finance
Posted by rtk85@reddit | LocalLLaMA | 12 comments
Is there a specific LLM that's best for financial and/or accounting-related tasks? Specifically: dealing with large data sets, PDF extraction (bank statements), tracing transactions from bank statement to ledger, identifying unusual trends, and producing clean Excel outputs!
FederalAnalysis420@reddit
I honestly think you could just use something like Claude. For this exact workflow you probably don't need local infra or a custom pipeline. It's cloud-hosted, so processing is fast, and they allow you to have way more docs in a chat compared to GPT.
Teslaaforever@reddit
pipeshub
ExosFantome@reddit
Think carefully before ingesting data directly into an LLM. Given the current rate of hallucinations, LLMs, especially local ones, aren't yet 100% accurate. A better approach would be to develop a custom application that uses an LLM, and then use that app to manage the data securely.
Enough_Big4191@reddit
i wouldn't think "one model solves this," it's more pipeline than model. PDF extraction, matching to ledger, and anomaly detection all have different failure modes. for finance stuff, smaller strong instruct models can work, but the real work is structuring the data before and after. parsing PDFs cleanly and enforcing consistent schemas matters more than swapping models. otherwise it looks fine until you try to trace one transaction and it breaks.
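A minimal sketch of the "consistent schema, then match to ledger" idea, in Python. Everything here is an assumption for illustration: the `Txn` fields, the `parse_row` and `match_to_ledger` helpers, and the rule of matching on exact amount within a few days are hypothetical, not something from a specific tool.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Txn:
    """One normalized transaction row (hypothetical schema)."""
    txn_date: date
    amount: Decimal
    description: str


def parse_row(raw: dict) -> Txn:
    """Validate and normalize one extracted row; raises on malformed data."""
    return Txn(
        txn_date=date.fromisoformat(raw["date"]),
        amount=Decimal(raw["amount"].replace(",", "")),
        description=raw["description"].strip(),
    )


def match_to_ledger(bank, ledger, max_days=3):
    """Pair each bank row with an unused ledger row of equal amount dated
    within max_days. Returns (matches, unmatched_bank_rows)."""
    used = set()
    matches, unmatched = [], []
    for b in bank:
        hit = None
        for i, entry in enumerate(ledger):
            if i in used:
                continue
            same_amount = entry.amount == b.amount
            close_date = abs((entry.txn_date - b.txn_date).days) <= max_days
            if same_amount and close_date:
                hit = i
                break
        if hit is None:
            unmatched.append(b)
        else:
            used.add(hit)
            matches.append((b, ledger[hit]))
    return matches, unmatched
```

The point of the strict `parse_row` step is that a badly extracted row fails loudly at parse time instead of silently breaking the trace later. The unmatched list is what a human (or a second pass) would review.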
Extra-Perception2408@reddit
If you can afford it, Claude is one of the best for finance-related tasks.
rtk85@reddit (OP)
That's what I use today. However, where it fails is that it often times out on long-running tasks. Right now, we use a lot of India outsourcing for overnight work, where we provide detailed instructions and source files and they process it as we instruct. This typically means: 1) parsing PDFs accurately; 2) taking various files (30+) and putting them into Excel flag files; 3) creating tables in a specific format from source data.
I'm wondering if there's a local model that can handle those appropriately and is worth making the hardware investment for. I'm currently tinkering with a bot Claude helped make that should add resiliency and uses the API vs. a subscription, but how that works out is still to be determined. I also built it to try out local models hosted on OpenRouter so I can compare results, hence the ask here as to which may be best!
Puzzleheaded_Base302@reddit
Timeouts can be fixed by tweaking the LLM settings. The most basic way is to simply increase the timeout value.
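A rough sketch of what "raise the timeout, and retry if it still trips" could look like in Python. The `call_with_retry` helper and its parameters are hypothetical, and it assumes the wrapped call raises the built-in `TimeoutError`; real client libraries usually have their own timeout exception and their own timeout setting, so check the docs for whichever one you use.

```python
import time


def call_with_retry(fn, *, timeout_s=600.0, max_attempts=4,
                    base_delay_s=2.0, sleep=time.sleep):
    """Call fn(timeout_s); on TimeoutError, wait and retry with
    exponential backoff. Re-raises after the final attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(timeout_s)
        except TimeoutError:
            if attempt == max_attempts:
                raise
            # Delays grow as base, 2*base, 4*base, ...
            sleep(base_delay_s * 2 ** (attempt - 1))
```

Here `fn` would be a small function that sends one request and passes `timeout_s` through to the client. For overnight batch jobs, splitting the work into smaller requests tends to help more than a very large timeout alone.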
gamesta2@reddit
I use Docling (with Tesseract OCR). Then a different LLM to process the extracted data.
segmond@reddit
All LLMs can do this, with proper steering.
xiaobaibaii@reddit
Currently I am using a non-LLM tool for credit-related report extraction, and it has extracted pretty accurately. I try to reduce the amount of input to LLMs as much as possible, and rely on public LLMs (e.g. Claude or Codex) only if necessary, especially when dealing with client-sensitive data.
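One way to cut down what reaches the model is to mask obvious identifiers first. This is only a sketch in Python under a stated assumption: that any standalone run of 8 or more digits is an account number. The `mask_accounts` helper is hypothetical, and a real redaction step would need rules for names, addresses, and other identifiers too.

```python
import re

# Assumption: a standalone run of 8+ digits is an account number.
_ACCOUNT_RE = re.compile(r"\b\d{8,}\b")


def mask_accounts(text: str) -> str:
    """Replace all but the last 4 digits of each long digit run with '*'."""
    def _mask(match: re.Match) -> str:
        digits = match.group()
        return "*" * (len(digits) - 4) + digits[-4:]

    return _ACCOUNT_RE.sub(_mask, text)


print(mask_accounts("Acct 123456789012 paid 4500"))
# prints: Acct ********9012 paid 4500
```

Keeping the last four digits leaves enough to trace a row back to the source statement without sending the full number.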
TheRealMasonMac@reddit
I'd say Gemma4 since the series is SOTA for NLP, but you could try Qwen3.5 too.
mxmumtuna@reddit
That’s going to be a multi-step pipeline.