Have long context models solved attention dilution yet?
Posted by yuch85@reddit | LocalLLaMA | View on Reddit | 10 comments
I recently came across a claim that because the Gemini models have 1M context, there is no longer any need for RAG or chunking of long documents.
Just wondering whether this is actually true of Gemini or any long-context model out there now. The last time I tried and read up on this, I was under the impression that performance drops dramatically around the 100K-200K token range, which is consistent with my real-life experience.
The claim I read was made in relation to a use case where accuracy is absolutely critical (legal docs), so I'm wondering whether it's really true, and whether chunking and text splitting are dead for long docs where you need 100% attention to every part of the document.
If it's true, or coming true soon, do we just hold out for those models then?
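For reference, the "chunking and text splitting" being debated is usually something like the sketch below: split a long document into overlapping windows so each piece fits well inside the range where attention is reliable. This is a minimal illustrative version (function name, character-based sizing, and defaults are my own assumptions; real pipelines typically split on token counts and sentence boundaries):

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into overlapping fixed-size chunks.

    chunk_size and overlap are in characters here for simplicity;
    production splitters usually count tokens and respect sentence
    or section boundaries instead.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks
```

The overlap exists so that a fact straddling a chunk boundary still appears whole in at least one chunk; the trade-off is redundant tokens and the risk that cross-chunk reasoning is lost, which is exactly why people hope long-context models make this unnecessary.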
Mountain_Station3682@reddit
Performance at 1 million tokens isn't going to be perfect; here is a benchmark showing how it drops off.
The first 128K tokens are almost perfect, though.
https://contextarena.ai
Everlier@reddit
*Perfect for lookup-based tasks. That benchmark measures how well models distinguish multiple similar but non-identical sources of information in a long context.
Overall quality of reasoning and nuanced understanding still decreases significantly beyond a model's native training window. Gemini 3 Pro is probably the most impressive one in this respect.
National_Meeting_749@reddit
I would also take the first 128K with a grain of salt. "Almost" is doing a lot of work in that sentence.
yuch85@reddit (OP)
Yeah, my thoughts exactly. It might be OK depending on what you're doing, but for some use cases "almost" isn't good enough.
National_Meeting_749@reddit
Personally, until someone sets up a robust verification system, I'm not sure LLMs are up for legal work yet.
MrPecunius@reddit
Can't be worse than some of the utter crap I've seen come out of family law practices (plural) in the days before LLM assistance was available. One firm stated in a petition that my grandfather modified his trust two years after he died ... and that was far from the worst.
National_Meeting_749@reddit
Did they just whole cloth make up case law? Lmao
MrPecunius@reddit
Even better: no citations whatsoever in that petition that I can recall!
The standards have been insanely low in the 8-10 firms I've dealt with over the past 15 years or so. I've been directly and indirectly involved in various family trust controversies, some manufactured by the law firms to generate billable hours. No one cares about the work product.
National_Meeting_749@reddit
Lmao, well. I'll have to take your word for it.
All I'm saying is that I wouldn't risk my license, if I had it, on unverified LLM output.
Chromix_@reddit
Ah, that benchmark made me take another look at Kimi-Linear-48B-A3B. Its long-context performance looks fantastic compared to other open models. A llama.cpp PR for it was opened just yesterday; hopefully we'll get support there soon.
The benchmark itself is missing newer models like Qwen3-Next. Interestingly, the gpt-oss models deteriorate even faster there than on fiction.liveBench (which that site also links to).