Every SOTA on its own data
Posted by Cheryl_Apple@reddit | LocalLLaMA | 5 comments
Feels like every new RAG paper shows huge gains… but always on their own curated dataset.
Once you swap in messy PDFs, private notes, or latency-sensitive use cases, the story changes fast.
Has anyone here actually compared different RAG flavors side by side (multi-hop vs. rerankers, retrieval-augmented agents vs. lightweight hybrids, etc.)?
What did you find in practice — stability, speed, or truthfulness?
Would love to hear war stories from real deployments, not just benchmark tables.
ArtisticKey4324@reddit
Em dash detected, slop rejected
Cheryl_Apple@reddit (OP)
And how do I choose a RAG framework that really suits my own dataset?
ArtisticKey4324@reddit
I've just been sitting in a puddle of my own shit and urine for days, how long till things start growing down there?
Hoblywobblesworth@reddit
Optimise for your own dataset to get to SOTA on your own dataset. Yep, sounds about right.
Cheryl_Apple@reddit (OP)
But which framework is SOTA for my own dataset? How do I choose? Do you have any ideas, for example a specific RAG framework?