Every SOTA on its own data

Posted by Cheryl_Apple@reddit | LocalLLaMA

Feels like every new RAG paper shows huge gains… but always on their own curated dataset.
Once you swap in messy PDFs, private notes, or latency-sensitive use cases, the story changes fast.

Anyone here actually compared different RAG flavors side by side? (multi-hop vs. rerankers, retrieval-augmented agents vs. lightweight hybrids, etc.)
What did you find in practice, in terms of stability, speed, or truthfulness?

Would love to hear war stories from real deployments, not just benchmark tables.