Could the 360M model learn reasoning?
Posted by absurd-dream-studio@reddit | LocalLLaMA | View on Reddit | 9 comments
Share your perspective :)
LagOps91@reddit
If models with billions of parameters can't learn reasoning, neither can small models. I wouldn't call what models can do right now "reasoning".
absurd-dream-studio@reddit (OP)
Just want to know the limits of the transformer architecture. If we train it on extremely high-quality reasoning data, will it learn to do real inference? And will it then know how to reach the correct answer and use that data to improve itself?
LagOps91@reddit
I don't think the current architecture can ever actually do reasoning the way a human does. We humans think in concepts before we say or type anything, but an LLM is entirely focused on predicting the next token with no higher-level thinking involved. A new architecture is needed, maybe something along the lines of the "large concept models" that have been proposed, which work on concepts, not tokens.
absurd-dream-studio@reddit (OP)
But aren't tokens just some kind of compression of concepts? And reasoning in this architecture means using those compressed concepts to find the truth rather than guessing at random. I think this may be the same as human thinking?
LagOps91@reddit
No, it's not the same as human thinking at all. LLMs put out token after token, but have no plan for what the sentence they're writing is supposed to look like or what it's even supposed to mean. They just chain together whatever tokens they deem probable.
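To make that concrete, here's a minimal greedy-decoding sketch (assuming the Hugging Face transformers library; the SmolLM2-360M checkpoint name is an assumption). At each step the model only scores the single next token given the prefix; nothing in the loop represents a plan for the rest of the sentence.

```python
# Minimal greedy decoding loop: the model only ever scores the next token
# given the tokens so far -- there is no explicit plan for the whole sentence.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM2-360M"  # assumed checkpoint; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

input_ids = tokenizer("2 + 2 =", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):  # emit up to 20 tokens, one at a time
        logits = model(input_ids).logits           # scores for every position
        next_id = logits[:, -1, :].argmax(dim=-1)  # pick only the most probable next token
        input_ids = torch.cat([input_ids, next_id.unsqueeze(-1)], dim=-1)

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```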
absurd-dream-studio@reddit (OP)
When I'm solving a math problem, I try to roll out the first step, observe the result, and then decide what to do next. That whole process is just like what an LLM does. I think a plan can also be constructed by predicting the next token.
The_GSingh@reddit
I’d say the absolute minimum is 32B to get anywhere good. Look around at all the reasoning models like Qwen's. Most are 32B. It's a good mix of actually working and actually being runnable locally.
Anything below that parameter count, the performance degrades a lot. Right now Qwen's 0.5B models don't even seem to fully understand English/language. I'd be surprised if a model smaller than that could actually do CoT/reasoning.
lavilao@reddit
360M? Bro, I'm waiting for the smollm2-135m-reasoning variant 🤣
absurd-dream-studio@reddit (OP)
Waiting for someone to use "O1-OPEN/OpenO1-SFT-Ultra" to fine-tune the model. The result would be interesting.
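For anyone who wants to try, here's a rough SFT sketch using the Hugging Face datasets and trl libraries. The dataset id comes from the comment above; the base checkpoint, dataset field handling, and hyperparameters are assumptions and may need adjusting to your trl version and the dataset's actual columns.

```python
# Rough SFT sketch: fine-tune a ~360M causal LM on a reasoning SFT dataset.
# Dataset id is from the thread; model id, field names, and hyperparameters
# are assumptions -- check the dataset card and your trl version before running.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("O1-OPEN/OpenO1-SFT-Ultra", split="train")

# SFTTrainer expects either a text/messages-style column or a formatting function;
# which one applies depends on how this dataset is laid out (assumption).
training_args = SFTConfig(
    output_dir="smollm2-360m-openo1-sft",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    learning_rate=2e-5,
    logging_steps=50,
)

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM2-360M",  # assumed 360M base checkpoint
    train_dataset=dataset,
    args=training_args,
)
trainer.train()
```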