Yi-34B-200K model update: Needle-in-a-Haystack improved from 89.3% to 99.8%

Posted by rerri@reddit | LocalLLaMA | 10 comments

Copy-pasta from model card:

> The long text capability of the Yi-34B-200K has been enhanced.
>
> In the "Needle-in-a-Haystack" test, the Yi-34B-200K's performance is improved by 10.5%, rising from 89.3% to an impressive 99.8%. We continue to pretrain the model on 5B tokens long-context data mixture and demonstrate a near-all-green performance.

[https://huggingface.co/01-ai/Yi-34B-200K](https://huggingface.co/01-ai/Yi-34B-200K)
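The model card excerpt does not describe the test setup. As the test is commonly run, a short fact (the "needle") is placed at some depth inside long filler text and the model is asked to retrieve it; the score is the share of correct retrievals. A rough sketch of that idea follows, where the function names, the filler text, and the substring-match scoring are all assumptions made for illustration, not 01-ai's actual harness:

```python
def build_haystack_prompt(filler_sentences, needle, depth, question):
    """Insert `needle` at a relative `depth` (0.0 = start, 1.0 = end) of the filler."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    position = round(depth * len(filler_sentences))
    haystack = filler_sentences[:position] + [needle] + filler_sentences[position:]
    # End with an open "Answer:" slot so the model completes it.
    return " ".join(haystack) + f"\n\nQuestion: {question}\nAnswer:"


def retrieval_score(answers, expected):
    """Percentage of model answers that contain the expected string (assumed metric)."""
    hits = sum(expected.lower() in answer.lower() for answer in answers)
    return 100.0 * hits / len(answers)


# Hypothetical filler and needle, only to show the shape of one test case.
filler = [f"Filler sentence number {i}." for i in range(10)]
needle = "The secret code is 4721."
prompt = build_haystack_prompt(filler, needle, 0.5, "What is the secret code?")
```

A full run would sweep both the context length and the depth, which is presumably where the "near-all-green" heatmap wording comes from.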

bassgojoe@reddit

How are you folks using these base models? My use case for long-context models, summarizing documents, seems like it wouldn't work with a base model (versus chat/instruct). How are y'all doing it?

Beginning_Category64@reddit

Try giving it some in-context examples. It'll probably work then.
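A minimal sketch of what that could look like for summarization: lay out a few document/summary pairs, then end the prompt with an open "Summary:" slot so the base model continues the pattern. The function name, the labels, the separator, and the example texts are all made up for illustration:

```python
SEPARATOR = "\n---\n"


def build_few_shot_prompt(examples, document):
    """Format (document, summary) pairs, then the new document with an empty summary slot."""
    blocks = [
        f"Document:\n{doc}\n\nSummary:\n{summary}"
        for doc, summary in examples
    ]
    # The final block stops right after "Summary:" for the model to complete.
    blocks.append(f"Document:\n{document}\n\nSummary:\n")
    return SEPARATOR.join(blocks)


# Hypothetical examples; real ones should resemble the documents being summarized.
examples = [
    ("The meeting covered Q3 budget cuts and a hiring freeze.", "Q3 budget cuts; hiring freeze."),
    ("The patch fixes a crash when the config file is missing.", "Fixes crash on missing config."),
]
prompt = build_few_shot_prompt(examples, "The report describes a new long-context benchmark.")
```

The resulting string goes to whatever completion backend is in use, ideally with the separator set as a stop sequence so generation ends after one summary.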

VicboyV@reddit

Glad to see context length being taken seriously. You can’t call it intelligent if it’s got short term memory.

Mescallan@reddit

This is still equivalent to short-term memory. The model is not altering its weights, nor is it able to recall information permanently. 200k at 99% is superhuman short-term memory, but context length is not going to give us long-term memory in the way humans have it.

Vehnum@reddit

If that is the case, when/how will their long-term memory be improved?

Mescallan@reddit

It's very, very likely that future models will be able to alter their weights in response to novel stimuli. That is what we do to build an evolving internal world model. Currently, weights are static unless someone intervenes manually through training or fine-tuning, but for future models to exist in the world and adapt to whatever niche they are used in, they will need a way to update their weights, or a new layer architecture that can be updated automatically. We have three types of memory: short-term, episodic, and long-term. Current models only have short-term (context) and long-term (weights), but there is no communication between the two.

aggracc@reddit

It's extremely unlikely that would ever be the case, since inference is thousands of times cheaper than fine-tuning.

SeymourBits@reddit

Agree. I think clever RAG is pretty close to how longer-term memory actually works. A human brain with 100% perfect memory access would probably malfunction.
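For anyone unfamiliar with the idea, here is a toy sketch of retrieval acting as long-term memory: notes are stored outside the model, and only the few most relevant ones are pulled back into the context for a given query. The bag-of-words cosine score is a stand-in for a real embedding model, and `MemoryStore` and its methods are hypothetical names:

```python
import math
import re
from collections import Counter


def _bag_of_words(text):
    # Lowercased word counts; a real system would use learned embeddings instead.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MemoryStore:
    """Keeps notes outside the model and retrieves the most similar ones on demand."""

    def __init__(self):
        self._notes = []

    def remember(self, note):
        self._notes.append((note, _bag_of_words(note)))

    def recall(self, query, k=2):
        query_vec = _bag_of_words(query)
        ranked = sorted(self._notes, key=lambda item: _cosine(query_vec, item[1]), reverse=True)
        return [note for note, _ in ranked[:k]]


store = MemoryStore()
store.remember("The user's dog is named Biscuit.")
store.remember("The user prefers summaries under 100 words.")
store.remember("The deployment server runs Ubuntu 22.04.")

top = store.recall("What is the name of the dog?", k=1)
```

The recalled notes would then be prepended to the prompt, so the weights stay frozen and only the context changes between queries.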

aggracc@reddit

A human brain can learn, but its inference cost is ridiculously high. A better comparison is insects, whose brains are essentially hardwired and basically free when it comes to inference, just like computer models. Until we start using some sort of neuromorphic hardware, it is highly unlikely that computer models will be anything at all like the human brain. Now the question is whether we want models to be anything like the human brain. Do we want street-cleaning robots to learn that if they murder all humans they will have to clean much less often? I think that for the majority of tasks, dumb models that can't learn in the real world are what we want.

Paradigmind@reddit

They would need to learn to murder without leaving a mess.