Have long context models solved attention dilution yet?

Posted by yuch85@reddit | LocalLLaMA

I recently came across a claim that because the Gemini models have a 1M-token context window, there is no longer any need to use RAG or to chunk long documents.

Just wondering whether this is actually true of Gemini, or of any current long-context model. The last time I tried this and read up on it, I was under the impression that performance drops off dramatically somewhere around the 100K-200K token range, which matches my real-world experience.

The claim I read was made in the context of a use case where accuracy is absolutely critical (legal documents), so I'm wondering whether it really holds, and whether chunking and text splitting are dead for long docs where the model needs to attend to every part of the document.
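For anyone unfamiliar with what "chunking and text splitting" means in practice, here's a minimal sketch of the common approach: fixed-size windows with overlap, fed into a retriever instead of stuffing the whole document into context. This is a generic illustration, not any particular library's implementation; real pipelines would count tokens with the model's tokenizer rather than splitting on whitespace as done here.

```python
# Naive fixed-size chunking with overlap, the kind of splitting the
# "1M context makes this obsolete" claim would eliminate.
# Word-based splitting stands in for tokenization (an approximation).

def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows of roughly chunk_size words."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    words = text.split()
    step = chunk_size - overlap  # advance by this many words per chunk
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last window already covers the end of the document
    return chunks
```

The overlap exists precisely because of the attention/accuracy problem the post is asking about: facts that straddle a chunk boundary would otherwise be split across two retrievals.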

If it's true, or about to become true, do we just hold out for those models?