Hybrid search (BM25 + vectors + RRF) barely improved over pure semantic on 600 technical docs. What am I missing?

Posted by Fuzzy-Layer9967@reddit | LocalLLaMA

My setup: 600+ technical docs (50 pages on average, lots of schemas/diagrams), chunked and embedded with BGE-M3, with OpenSearch as the vector DB. Semantic retrieval was OK but not great on our technical docs.

I read everywhere that hybrid search with RRF was supposed to be the next level.
So I implemented it: BM25 + vector + RRF fusion. Result: almost no improvement. Like, negligible.
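
For reference, the fusion step can be sketched roughly like this. It is a minimal, library-free sketch of Reciprocal Rank Fusion, not the OpenSearch implementation; the function name `rrf_fuse`, the doc IDs, and the constant `k=60` (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. `k` is a smoothing constant (assumed 60 here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical top-3 results from each retriever.
bm25_hits = ["doc_c", "doc_a", "doc_b"]
vector_hits = ["doc_a", "doc_b", "doc_d"]

fused = rrf_fuse([bm25_hits, vector_hits])
print(fused)
```

RRF only looks at ranks, so when both retrievers return nearly the same ordering, the fused list is nearly the same as well. Any gain has to come from queries where BM25 and the vector search disagree.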

Am I missing something obvious?
Is hybrid overhyped on technical docs with lots of schemas/tables or is my setup just broken?