Any idea how Meta did this?

Posted by QuackerEnte@reddit | LocalLLaMA

Hey, any idea what is meant by compression here and how they did it?

Is it intelligent summarization? Actual Test-Time-Training on the reasoning traces using special layers? Something else? And what do they mean by "[...] the length penalty CAUSES thought compression [...]"? I can't imagine how this could be caused by an RL training penalty rather than by a fundamentally different architecture from normal LLMs.
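To make the question concrete, here's my rough guess at what a length-penalized RL reward could look like. This is purely a hypothetical sketch, not Meta's actual method; the setup (a GRPO/PPO-style reward on rollouts) and all names and hyperparameters (`shaped_reward`, `alpha`, `max_tokens`) are my own assumptions:

```python
# Hypothetical sketch, NOT Meta's actual implementation: how a length
# penalty folded into an RL reward could push a policy toward shorter
# ("compressed") reasoning traces without any architecture change.

def shaped_reward(correct: bool, num_thought_tokens: int,
                  alpha: float = 1e-4, max_tokens: int = 8192) -> float:
    """Task reward minus a penalty proportional to reasoning-trace length.

    `alpha` and `max_tokens` are made-up hyperparameters for illustration.
    """
    task_reward = 1.0 if correct else 0.0
    # Longer chains of thought earn strictly less reward, so the policy
    # gradient favors the shortest trace that still solves the task.
    length_penalty = alpha * min(num_thought_tokens, max_tokens)
    return task_reward - length_penalty

# Two rollouts that both solve the task: the shorter trace scores higher,
# so across many RL updates the model learns to "compress" its thoughts.
print(shaped_reward(correct=True, num_thought_tokens=4000))  # 0.6
print(shaped_reward(correct=True, num_thought_tokens=500))   # 0.95
```

If something like this is all that's going on, "compression" would be an emergent behavior of the reward shaping, which is what I'm trying to confirm.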

I could not find any research papers that reveal the exact inner workings of this. It seems like a genuinely useful feature for local use. Any ideas?

I'd love it if someone could link a paper or two to help figure this out.

Thank you.

[source](https://ai.meta.com/blog/introducing-muse-spark-msl/)