Building a Production-Grade RAG Chatbot for a Complex Banking Site, Tech Stack Advice Needed?
Posted by codexahsan@reddit | LocalLLaMA | 5 comments
Hey everyone,
I’m currently working on turning a fairly large and structured financial website into an AI-powered knowledge assistant (RAG-based). The site itself isn’t trivial: it has multiple product categories (cards, loans, accounts), nested pages, FAQs, and a mix of static and dynamic content.
My goal is to move beyond basic keyword search and build something that can:
- understand user intent
- retrieve relevant information across pages
- return structured, clear answers (not just summaries)
Planned stack so far:
- Backend: FastAPI
- RAG orchestration: LangChain
- Database: PostgreSQL
- Vector DB: Pinecone
Before I go too deep, I’d like some guidance from people who’ve built similar systems.
Main things I’m thinking about:
- For crawling: should I rely on existing tools (like Playwright/Scrapy pipelines), or build a more custom structured extractor from the start?
- For retrieval: is Pinecone a solid long-term choice here, or would something like a self-hosted vector DB be better?
- How would you structure the ingestion pipeline for a site with mixed content (product pages vs FAQs vs general info)?
- My plan is: Scrape -> Markdown Conversion -> Chunking -> Pinecone Upsert -> FastAPI/LangChain RAG. Does this order make sense, or am I missing a crucial step like a Reranker or PII masking (since it's banking)?
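For the PII masking step, something regex-based run before chunking/upsert is a common starting point. A minimal sketch (the patterns below are illustrative only, not a complete PII taxonomy, and a real banking deployment would want a proper PII detection service on top):

```python
import re

# Rough illustrative patterns -- NOT exhaustive; real systems need more coverage
PII_PATTERNS = {
    "CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),       # card-like digit runs
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email addresses
}

def mask_pii(text: str) -> str:
    """Replace matched PII spans with typed placeholders before indexing."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Running this before the Pinecone upsert means PII never lands in the index at all, which is usually easier to defend in a banking audit than masking at query time.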
Current rough flow in my head:
- Crawl and extract structured content
- Clean + chunk with metadata
- Store embeddings
- Build retrieval + re-ranking layer
- Generate answers with grounding
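To make the “clean + chunk with metadata” step concrete, here’s roughly what I have in mind: a naive paragraph splitter just to illustrate the metadata I’d attach per chunk (in practice a real splitter like LangChain’s `RecursiveCharacterTextSplitter` would replace the splitting logic):

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)

def chunk_page(markdown: str, url: str, page_type: str,
               max_chars: int = 800) -> list[Chunk]:
    """Naive paragraph-based chunker: groups paragraphs up to max_chars
    and tags each chunk with its source URL and page type for filtering."""
    chunks, buf = [], ""
    for para in markdown.split("\n\n"):
        if buf and len(buf) + len(para) > max_chars:
            chunks.append(Chunk(buf.strip(), {"source": url, "type": page_type}))
            buf = ""
        buf += para + "\n\n"
    if buf.strip():
        chunks.append(Chunk(buf.strip(), {"source": url, "type": page_type}))
    return chunks
```

The `type` field (product page vs FAQ vs general info) is what would let the retrieval layer apply metadata filters later, e.g. restricting a card-fees question to `type="product"` chunks.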
I’m trying to build this properly (not just a basic “chat over docs”), so any advice on architecture decisions or common mistakes would really help.
Thanks in advance.
Strong_Worker4090@reddit
Nice stack, sweet project! Existing tools like Scrapy or Playwright can usually save you time. In general I try to use tooling that already exists rather than reinvent the wheel, but it depends on how complex the site’s structure is. If you’re dealing with dynamic content (e.g., account-specific FAQs), Playwright’s ability to handle JS-rendered pages can help.
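Something like this is the shape I mean: Playwright for the fetch, plus even a tiny stdlib extractor to strip boilerplate tags. Rough sketch only (the Playwright part assumes `pip install playwright` and `playwright install` have been run):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Minimal extractor: keeps visible text, skips script/style/nav/footer."""
    SKIP = {"script", "style", "nav", "footer"}

    def __init__(self):
        super().__init__()
        self.parts, self._skip_depth = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.parts)

def fetch_rendered(url: str) -> str:
    """Fetch a JS-rendered page with Playwright's sync API."""
    from playwright.sync_api import sync_playwright
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
        return html
```

You’d only pay the headless-browser cost on pages that actually need JS; for the static parts of the site a plain HTTP fetch into the same extractor is much cheaper.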
For RAG specifically, I’d focus early on building a good eval workflow. Too many teams skip this and end up tuning blindly. Write 20-30 "gold standard" queries with expected outputs and test retrieval + generation against those. It’s not perfect but helps you measure progress as you iterate.
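Even a dumb hit-rate metric over that gold set gets you surprisingly far. A sketch, where `retrieve` stands in for whatever your retriever exposes and each gold item maps a query to the doc/chunk id you expect in the results:

```python
def retrieval_hit_rate(gold_set, retrieve, k=5):
    """Fraction of gold queries whose expected source id shows up in the
    top-k retrieved ids. gold_set is a list of (query, expected_id) pairs;
    retrieve(query, k) returns a list of ids."""
    hits = 0
    for query, expected_id in gold_set:
        if expected_id in retrieve(query, k):
            hits += 1
    return hits / len(gold_set)
```

Track that number as you change chunking/embedding settings and you stop tuning blindly.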
codexahsan@reddit (OP)
Appreciate it, yeah I might use Playwright for now since there’s some dynamic content involved, and it gives more control over extraction.
Also agreed on evals; a couple of others mentioned that too, so I’ll prioritize building a small gold query set early. Maybe 20–30 queries will work.
Should I evaluate retrieval and generation separately, or end-to-end from the start?
Strong_Worker4090@reddit
Yeah, I’d do both, but separate them first. If retrieval is weak, I've seen generation evals get pretty noisy b/c the model never had the right context to begin with. I’d start by testing whether the system pulls the right chunks/pages for a small set, then evaluate answer quality on top of that, then run end-to-end once both are decent. Makes it way easier to see where things are actually breaking instead of calling it a generic 'RAG problem.'
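One way to wire that up is a staged eval that only scores generation on queries where retrieval already passed, so generation failures aren’t blamed on missing context. Hypothetical sketch (`retrieve`, `answer`, and `judge` are placeholders for your retriever, your RAG chain, and whatever answer-comparison you use):

```python
def staged_eval(gold_set, retrieve, answer, judge, k=5):
    """Score retrieval first; only evaluate generation where retrieval passed.
    gold_set: list of (query, expected_chunk_id, expected_answer) triples.
    retrieve(query, k) -> list of {"id": ..., "text": ...} chunks.
    answer(query, chunks) -> generated answer string.
    judge(got, want) -> bool, e.g. an LLM-as-judge or string match."""
    report = {"retrieval_pass": 0, "generation_pass": 0, "total": len(gold_set)}
    for query, expected_id, expected_answer in gold_set:
        chunks = retrieve(query, k)
        if expected_id not in [c["id"] for c in chunks]:
            continue  # retrieval failure: don't let it pollute generation scores
        report["retrieval_pass"] += 1
        if judge(answer(query, chunks), expected_answer):
            report["generation_pass"] += 1
    return report
```

The gap between `retrieval_pass` and `generation_pass` tells you whether to spend your next iteration on the index or on the prompt.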
Exact_Guarantee4695@reddit
nice project, this is exactly where teams burn time. biggest win for us on a similar finance corpus was building eval queries first, then tuning chunking and metadata filters before touching prompts. did you already make a small must-answer set from real support questions?
codexahsan@reddit (OP)
Not yet, but that makes a lot of sense. I’ve been focusing more on the ingestion + retrieval side so far, but defining a small eval set would probably make tuning much more objective.