Building a Production-Grade RAG Chatbot for a Complex Banking Site, Tech Stack Advice Needed?
Posted by codexahsan@reddit | LocalLLaMA | 5 comments
Hey everyone,
I’m currently working on turning a fairly large and structured financial website into an AI-powered knowledge assistant (RAG-based). The site itself isn’t trivial: it has multiple product categories (cards, loans, accounts), nested pages, FAQs, and a mix of static and dynamic content.
My goal is to move beyond basic keyword search and build something that can:
- understand user intent
- retrieve relevant information across pages
- return structured, clear answers (not just summaries)
Planned stack so far:
- Backend: FastAPI
- RAG orchestration: LangChain
- Database: PostgreSQL
- Vector DB: Pinecone
Before I go too deep, I’d like some guidance from people who’ve built similar systems.
Main things I’m thinking about:
- For crawling: should I rely on existing tools (like Playwright/Scrapy pipelines), or build a more custom structured extractor from the start?
- For retrieval: is Pinecone a solid long-term choice here, or would something like a self-hosted vector DB be better?
- How would you structure the ingestion pipeline for a site with mixed content (product pages vs FAQs vs general info)?
- My plan is: Scrape -> Markdown Conversion -> Chunking -> Pinecone Upsert -> FastAPI/LangChain RAG. Does this order make sense, or am I missing a crucial step like a Reranker or PII masking (since it's banking)?
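For the PII masking step, something regex-based run before chunking/upsert is a common starting point. A minimal sketch (the patterns below are illustrative only, not a complete PII taxonomy, and a real banking deployment would want a proper PII detection service on top):

```python
import re

# Rough illustrative patterns -- NOT exhaustive; real systems need more coverage
PII_PATTERNS = {
    "CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),       # card-like digit runs
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email addresses
}

def mask_pii(text: str) -> str:
    """Replace matched PII spans with typed placeholders before indexing."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Running this before the Pinecone upsert means PII never lands in the index at all, which is usually easier to defend in a banking audit than masking at query time.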
Current rough flow in my head:
- Crawl and extract structured content
- Clean + chunk with metadata
- Store embeddings
- Build retrieval + re-ranking layer
- Generate answers with grounding
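To make the “clean + chunk with metadata” step concrete, here’s roughly what I have in mind: a naive paragraph splitter just to illustrate the metadata I’d attach per chunk (in practice a real splitter like LangChain’s `RecursiveCharacterTextSplitter` would replace the splitting logic):

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)

def chunk_page(markdown: str, url: str, page_type: str,
               max_chars: int = 800) -> list[Chunk]:
    """Naive paragraph-based chunker: groups paragraphs up to max_chars
    and tags each chunk with its source URL and page type for filtering."""
    chunks, buf = [], ""
    for para in markdown.split("\n\n"):
        if buf and len(buf) + len(para) > max_chars:
            chunks.append(Chunk(buf.strip(), {"source": url, "type": page_type}))
            buf = ""
        buf += para + "\n\n"
    if buf.strip():
        chunks.append(Chunk(buf.strip(), {"source": url, "type": page_type}))
    return chunks
```

The `type` field (product page vs FAQ vs general info) is what would let the retrieval layer apply metadata filters later, e.g. restricting a card-fees question to `type="product"` chunks.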
I’m trying to build this properly (not just a basic “chat over docs”), so any advice on architecture decisions or common mistakes would really help.
Thanks in advance.
Strong_Worker4090@reddit
Nice stack, sweet project! Existing tools like Scrapy or Playwright can usually save you time. In general I try to use tooling that already exists rather than reinvent the wheel, but it depends on how complex the site’s structure is. If you’re dealing with dynamic content (e.g., account-specific FAQs), Playwright’s ability to handle JS-rendered pages can help.
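Something like this is the shape I mean: Playwright for the fetch, plus even a tiny stdlib extractor to strip boilerplate tags. Rough sketch only (the Playwright part assumes `pip install playwright` and `playwright install` have been run):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Minimal extractor: keeps visible text, skips script/style/nav/footer."""
    SKIP = {"script", "style", "nav", "footer"}

    def __init__(self):
        super().__init__()
        self.parts, self._skip_depth = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.parts)

def fetch_rendered(url: str) -> str:
    """Fetch a JS-rendered page with Playwright's sync API."""
    from playwright.sync_api import sync_playwright
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
        return html
```

You’d only pay the headless-browser cost on pages that actually need JS; for the static parts of the site a plain HTTP fetch into the same extractor is much cheaper.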
For RAG specifically, I’d focus early on building a good eval workflow. Too many teams skip this and end up tuning blindly. Write 20-30 "gold standard" queries with expected outputs and test retrieval + generation against those. It’s not perfect but helps you measure progress as you iterate.
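Even a dumb hit-rate metric over that gold set gets you surprisingly far. A sketch, where `retrieve` stands in for whatever your retriever exposes and each gold item maps a query to the doc/chunk id you expect in the results:

```python
def retrieval_hit_rate(gold_set, retrieve, k=5):
    """Fraction of gold queries whose expected source id shows up in the
    top-k retrieved ids. gold_set is a list of (query, expected_id) pairs;
    retrieve(query, k) returns a list of ids."""
    hits = 0
    for query, expected_id in gold_set:
        if expected_id in retrieve(query, k):
            hits += 1
    return hits / len(gold_set)
```

Track that number as you change chunking/embedding settings and you stop tuning blindly.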
codexahsan@reddit (OP)
Appreciate it, yeah I might use Playwright for now since there’s some dynamic content involved, and it gives more control over extraction.
Also agreed on evals; a couple of others mentioned that too, so I’ll prioritize building a small gold query set early. Maybe 20–30 queries will work.
Should I evaluate retrieval and generation separately, or end-to-end from the start?
Strong_Worker4090@reddit
Yeah, I’d do both, but separate them first. If retrieval is weak, I've seen generation evals get pretty noisy b/c the model never had the right context to begin with. I’d start by testing whether the system pulls the right chunks/pages for a small set, then evaluate answer quality on top of that, then run end-to-end once both are decent. Makes it way easier to see where things are actually breaking instead of calling it a generic 'RAG problem.'
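One way to wire that up is a staged eval that only scores generation on queries where retrieval already passed, so generation failures aren’t blamed on missing context. Hypothetical sketch (`retrieve`, `answer`, and `judge` are placeholders for your retriever, your RAG chain, and whatever answer-comparison you use):

```python
def staged_eval(gold_set, retrieve, answer, judge, k=5):
    """Score retrieval first; only evaluate generation where retrieval passed.
    gold_set: list of (query, expected_chunk_id, expected_answer) triples.
    retrieve(query, k) -> list of {"id": ..., "text": ...} chunks.
    answer(query, chunks) -> generated answer string.
    judge(got, want) -> bool, e.g. an LLM-as-judge or string match."""
    report = {"retrieval_pass": 0, "generation_pass": 0, "total": len(gold_set)}
    for query, expected_id, expected_answer in gold_set:
        chunks = retrieve(query, k)
        if expected_id not in [c["id"] for c in chunks]:
            continue  # retrieval failure: don't let it pollute generation scores
        report["retrieval_pass"] += 1
        if judge(answer(query, chunks), expected_answer):
            report["generation_pass"] += 1
    return report
```

The gap between `retrieval_pass` and `generation_pass` tells you whether to spend your next iteration on the index or on the prompt.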
Exact_Guarantee4695@reddit
nice project, this is exactly where teams burn time. biggest win for us on a similar finance corpus was building eval queries first, then tuning chunking and metadata filters before touching prompts. did you already make a small must-answer set from real support questions?
codexahsan@reddit (OP)
Not yet, but that makes a lot of sense. I’ve been focusing more on the ingestion + retrieval side so far, but defining a small eval set would probably make tuning much more objective.