Hybrid search (BM25 + vectors + RRF) barely improved over pure semantic on 600 technical docs. What am I missing?
Posted by Fuzzy-Layer9967@reddit | LocalLLaMA | 8 comments
My setup: 600+ technical docs (50 pages avg, lots of schemas/diagrams), chunked and embedded with BGE-M3, OpenSearch as the vector DB. Semantic retrieval was OK but not great on our technical docs.
Read everywhere that hybrid search with RRF was supposed to be the next level.
Implemented it: BM25 + vector + RRF fusion. Result: almost no improvement. Like, negligible.
Am I missing something obvious?
Is hybrid overhyped on technical docs with lots of schemas/tables or is my setup just broken?
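For reference, the fusion step OP describes is usually just rank-based scoring over the two result lists. A minimal sketch (doc IDs and the `k=60` constant are illustrative; OpenSearch has its own built-in hybrid query support, this is only the algorithm):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion over multiple ranked lists of doc IDs.

    Each doc scores sum(1 / (k + rank)) across the lists it appears in,
    using 1-based ranks. Larger k flattens the contribution of top ranks.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical result lists from the two retrievers.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]
fused = rrf_fuse([bm25_hits, vector_hits])
```

A doc ranked high by both retrievers ("doc_a" here) rises to the top; docs seen by only one list get a single, smaller contribution.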
DistanceAlert5706@reddit
Missing proper benchmarks. Hit rate at 1, 5, and 10 will actually show you what's wrong. Also missing a reranker step. Play around with the RRF constant and the candidate pool sizes.
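The hit@k metric mentioned above is cheap to compute once you have a labeled query set. A minimal sketch (the doc IDs and gold labels are made up for illustration):

```python
def hit_at_k(results, relevant, k):
    """Fraction of queries whose top-k retrieved docs contain a relevant doc.

    results:  list of ranked doc-ID lists, one per query
    relevant: list of sets of relevant doc IDs, one per query
    """
    hits = sum(
        1
        for ranked, gold in zip(results, relevant)
        if any(doc in gold for doc in ranked[:k])
    )
    return hits / len(results)


# Hypothetical retrieval output and gold labels for two queries.
results = [["d1", "d2", "d3"], ["d9", "d4", "d7"]]
relevant = [{"d1"}, {"d7"}]
```

Comparing hit@1 vs hit@10 tells you whether the problem is recall (the right chunk never shows up) or ranking (it shows up but too low, which a reranker can fix).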
Fuzzy-Layer9967@reddit (OP)
Didn't mention it, but we do have a reranker, feature-flagged, so I tried with and without...
I'll do that, thanks
CommonPurpose1969@reddit
Once the corpus surpasses about 10,000 chunks, vector embeddings alone stop working well. There was a paper on that problem. That is why you need BM25 too.
Fuzzy-Layer9967@reddit (OP)
Ohh, very interesting. I'll try it on a restricted version of our data. Thanks!
TacGibs@reddit
What models are you using? How many vectors per embedding?
ATM the best embedding and reranker are the Qwen3 8B VL embedding and reranker.
Yes they're big, but for a reason :)
I'll never understand people using very small models and expecting wonderful results on large documents.
llm_practitioner@reddit
You're definitely not crazy, hybrid search with RRF gets treated like a magic bullet, but it often falls flat on highly structured data.
The biggest culprit here is likely your parsing and chunking strategy. If those schemas, diagrams, and tables were processed with a standard text splitter, they likely turned into a wall of garbled text. BM25 can't effectively keyword-match against a broken table layout, so the sparse retrieval isn't adding any real value to the dense vectors.
Also, BGE-M3 is already a powerhouse that natively produces its own sparse (lexical) representations alongside the dense ones. Stacking a separate BM25 pipeline on top of it and mashing them together with naive RRF might actually be adding noise rather than signal. I'd highly recommend looking into layout-aware chunking (to keep your technical schemas intact) before worrying about tweaking the retrieval algorithm.
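If you do use the model's own sparse scores instead of a separate BM25 index, the usual alternative to RRF is a weighted score sum. A minimal sketch (the 0.6/0.4 weights and all scores are illustrative assumptions, and it presumes both score sets are already normalized to comparable ranges):

```python
def weighted_fuse(dense_scores, sparse_scores, w_dense=0.6, w_sparse=0.4):
    """Rank docs by a weighted sum of dense and sparse relevance scores.

    Docs missing from one score dict contribute 0 for that component.
    Assumes both dicts hold scores on comparable (normalized) scales.
    """
    docs = set(dense_scores) | set(sparse_scores)
    combined = {
        d: w_dense * dense_scores.get(d, 0.0)
           + w_sparse * sparse_scores.get(d, 0.0)
        for d in docs
    }
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical normalized scores from the dense and sparse heads.
dense = {"doc_a": 0.9, "doc_b": 0.5}
sparse = {"doc_b": 0.8, "doc_c": 0.6}
ranking = weighted_fuse(dense, sparse)
```

Unlike RRF, this keeps score magnitudes, so a document that both heads score highly wins even if neither list ranks it first; the trade-off is that the weights need tuning on a labeled query set.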
Fuzzy-Layer9967@reddit (OP)
Thanks for the answer!
For the chunking strategy I rely heavily on Docling; they include a chunking strategy in their parsing stack, so the data is well structured.
But you've got a point about the separate BM25, I'll check the implementation with the team tomorrow, thanks.
llm_practitioner@reddit
Yeah, good luck! 🤞