Improving Language Models through Latent Reasoning?
Posted by ISeeThings404@reddit | LocalLLaMA | View on Reddit | 9 comments
Found this tweet online and wanted to see if anyone here had any opinions on it.
I'm an AI researcher and have been exploring Latent Space Reasoning for a bit (since mid-2024; I really got into it when Meta published Coconut). This would check out in a few ways--
- The performance mentioned here.
- The order-of-magnitude reduction when comparing Mythos and Opus 4.6 for BrowseComp.
- General discussions from researchers in the space.
I've personally done some research into it, and I think it will be the future of AI and reasoning models. There are too many reasons for it not to be (especially if we create a unified reasoning plane that models can plug in and out of). Wanted to get your thoughts on it, especially if anyone else has tried it.
Did a bunch of experiments on it here, in case anyone is interested (would love to hear your experiences with it as well)-- https://github.com/dl1683/Latent-Space-Reasoning/tree/main

DinoAmino@reddit
I heard Coconut wasn't able to generalize on OOD and can't scale. What has changed in this space since then? If nothing, then I fail to see a future for it.
ISeeThings404@reddit (OP)
There's been a lot of work since then; also, Coconut wasn't the best approach, more of a proof of concept.
If you look at the experiments linked, we were able to sample from a much larger region of the reasoning space, producing much richer outputs.
We also did a legal-specific showcase over here-- https://github.com/dl1683/Latent-Space-Reasoning/blob/main/experiments/legal_showcase.json. Some very interesting outputs.
crantob@reddit
Thanks for sharing this.
Plenty_Coconut_1717@reddit
Latent space reasoning > traditional CoT. Models thinking in continuous space instead of tokens = faster + better. Coconut showed promise. Future is heading there
ISeeThings404@reddit (OP)
Yessir, agreed.
DeepOrangeSky@reddit
Can you explain more about what Latent Space Reasoning is/how it works, and why it might be the next big thing in AI, etc, in layman's terms (I am new to AI). I know I can just look it up, but if you spent the past 2 years on it, I would rather hear your version.
ISeeThings404@reddit (OP)
Instead of forcing the model to pick one fragile reasoning path and commit to it immediately, what if we surfaced a few different internal states, gave them room to breathe, and then found a way to score/combine them?
Essentially, current decoding commits to the single most likely path. Latent Space Reasoning pursues several paths together and then finds ways to reason through them all before combining them. By skipping the encode/decode phase multiple times, you get the benefits of "critic"-based agentic systems (having one LLM critique another) but with far better efficiency.
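To make that concrete, here's a toy sketch of the idea (not the OP's actual implementation): surface several candidate latent states, score each one without decoding back to tokens, and combine them with softmax weights. The scorer here is just cosine similarity to a made-up "goal" direction; a real system would use a learned critic or value head.

```python
import numpy as np

rng = np.random.default_rng(0)

def score(latent, goal):
    # Toy scorer: cosine similarity to a goal direction. A real system
    # would use a learned critic/value head instead.
    return float(latent @ goal / (np.linalg.norm(latent) * np.linalg.norm(goal)))

def latent_reasoning_step(hidden, goal, k=4, noise=0.1):
    # 1. Surface several candidate internal states instead of committing
    #    to a single path.
    candidates = [hidden + noise * rng.standard_normal(hidden.shape) for _ in range(k)]
    # 2. Score each candidate while staying in latent space (no
    #    encode/decode round-trip).
    scores = np.array([score(c, goal) for c in candidates])
    # 3. Combine the candidates with softmax weights (a soft "best-of-k").
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return sum(w * c for w, c in zip(weights, candidates))

d = 16
hidden = rng.standard_normal(d)
goal = rng.standard_normal(d)
state = hidden
for _ in range(3):  # several latent steps, never decoding in between
    state = latent_reasoning_step(state, goal)
print(state.shape)  # (16,)
```

The key property is step 2: because the candidates never leave latent space, the scoring/combining loop costs a few vector operations rather than a full generate-then-critique round-trip between two LLMs.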
DeepOrangeSky@reddit
That sounds awesome. Ever since I watched the think-blocks of Qwen3.5 for the first time (they're very long and elaborate, and kind of broken into what looks like phases as you watch them unfold, even though it's not actually doing what we're talking about here), I've been wondering what might be possible if LLMs could do some kind of exotic multi-phase reasoning process.
I am still too big of a noob to come up with any actual formal, concrete ideas, or know what makes sense vs what would be idiotic/implausible, though, so, I mostly haven't shared the random thoughts or ideas I wondered about, about it.
One idea I did ask about a few days ago, maybe somewhat related to this since it would need a model's reasoning to be broken into different phases, is whether they could make a model with variable temperature. Here was my question about it in this thread: if you had a model's reasoning broken into 4 phases, could you set a different temperature for each phase? Like 0.2 for phase 1, then 1.0 for phase 2, then 0.6 for phase 3, and then 0.2 for phase 4, rather than a flat 0.7 for the entire reasoning process from start to finish.
So, if this idea enables actual phases to exist in reasoning, I wonder if it might enable something like that, too.
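The per-phase temperature idea above is straightforward to sketch, since temperature is just a divisor on the logits before softmax. Here's a minimal toy version (random logits stand in for a real model; the 0.2/1.0/0.6/0.2 schedule is the hypothetical one from the comment):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_token(logits, temperature):
    # Standard temperature sampling: scale logits by 1/T, then softmax.
    scaled = logits / max(temperature, 1e-6)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Hypothetical schedule: cautious -> exploratory -> converging -> cautious.
phase_temps = [0.2, 1.0, 0.6, 0.2]
tokens_per_phase = 5
vocab_size = 50

generated = []
for temp in phase_temps:
    for _ in range(tokens_per_phase):
        logits = rng.standard_normal(vocab_size)  # stand-in for model logits
        generated.append(sample_token(logits, temp))

print(len(generated))  # 20
```

Most inference servers already let you change sampling parameters between requests, so the hard part isn't the mechanism; it's getting the model to emit detectable phase boundaries you can key the schedule off.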
ISeeThings404@reddit (OP)
That's an interesting approach. Temperature scheduling for more diversity would be an interesting experiment.
You might like the overview of the idea we did here.