DeepSeek announces R1-Lite, beats o1-preview on some benchmarks
Posted by TheLogiqueViper@reddit | LocalLLaMA | View on Reddit | 62 comments
SadWolverine24@reddit
I need Qwen 2.5 with reasoning.
TheLogiqueViper@reddit (OP)
Test-time training is also cool. I can't imagine what happens if these models get both test-time compute and test-time training.
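To make the combination concrete, here's a minimal sketch of test-time training, assuming a HuggingFace causal LM and a few gradient steps on the prompt's own next-token loss before generating. The model name, learning rate, and step count are placeholders, not anything DeepSeek has described:

```python
# Toy sketch of test-time training (TTT): take a few gradient steps on
# the prompt's own next-token loss before answering. Model, lr, and
# step count are placeholders, not anything DeepSeek has published.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Q: Which is heavier, a kilo of steel or a kilo of feathers?\nA:"
inputs = tok(prompt, return_tensors="pt")

# Test-time *training*: briefly adapt the weights to this specific input.
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for _ in range(3):                                   # just a few steps
    loss = model(**inputs, labels=inputs["input_ids"]).loss
    loss.backward()
    opt.step()
    opt.zero_grad()

# Test-time *compute*: generate as usual from the adapted model.
model.eval()
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```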
junior600@reddit
I really hope they are going to release a version that also fits in my RTX 3060 12GB lol
__Maximum__@reddit
This one probably wouldn't fit even if you quantize it to the max, but I can see smaller models getting this extended inference ability in the near future. I don't see any reason why this can't apply to 3B, 7B, or 32B models. With those, even long context wouldn't be a big problem.
Outrageous-Ad5641@reddit
R1-Lite is based on the V2-Lite MoE with 16B total params and 2.4B active params
Fusseldieb@reddit
If we get 4o-level models on 8GB I'll already be jumping for joy. But this will certainly take a few more years at least, I guess.
saraba2weeds@reddit
I've tried DeepSeek 33B once, then went back to Llama 70B. It seemed that DeepSeek had little idea what I wanted it to do.
jeffwadsworth@reddit
Haha. Deepseek is amazingly good.
opi098514@reddit
Good thing this isn’t deepseek 33B
Redoer_7@reddit
Also, they promise to release the full version of the model
fiery_prometheus@reddit
lemme just dig out my 8x4090 rig from my shed
In all seriousness, I really hope the Q2 and Q3 quants will be runnable without too much loss, with whatever state-of-the-art quantization method is available at the time.
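For what it's worth, running a low-bit quant is mostly a one-liner these days; a minimal llama-cpp-python sketch (the GGUF file name is hypothetical, since no R1 weights are out yet):

```python
# Minimal sketch: running a low-bit GGUF quant with llama-cpp-python.
# The model path is hypothetical -- R1 weights haven't been released yet.
from llama_cpp import Llama

llm = Llama(
    model_path="deepseek-r1-lite.Q3_K_M.gguf",  # hypothetical Q3 quant
    n_ctx=8192,        # reasoning traces are long; budget context for them
    n_gpu_layers=-1,   # offload everything that fits onto the GPU
)

out = llm("Solve step by step: what is 17 * 23?", max_tokens=512)
print(out["choices"][0]["text"])
```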
acc_agg@reddit
Just wait for the 5090 version. You can save a whole two cards.
mycall@reddit
I would be fine with slow inference if it means it works on anything.
Inspireyd@reddit
I didn't know that. That's amazing. This is just the Lite version.. there will be another full version. This is awesome
DarkArtsMastery@reddit
Impressive. Makes me even more curious about the upcoming Qwen 3, which will hopefully have the same thinking feature; given how awesome Qwen 2.5 was, I think we will be very happy with what we have in just a few short weeks.
Compare this to the situation just a year ago, when we did not have anything remotely competing with GPT-4 locally; now these models appear so fast that it is ridiculous. I would never pay for AI these days, as open source just keeps proving to be the way forward.
XForceForbidden@reddit
I pay DeepSeek to use their API, but my total spend doesn't even reach $3.
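For anyone curious, DeepSeek's API is OpenAI-compatible, so calling it looks roughly like this (base URL and model name are taken from their docs at the time; treat them as assumptions):

```python
# Sketch of calling the DeepSeek API. It is OpenAI-compatible, so the
# standard openai client works; base URL and model name are assumptions
# based on DeepSeek's docs at the time.
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",                      # your DeepSeek key
    base_url="https://api.deepseek.com",
)

resp = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Explain MoE routing briefly."}],
)
print(resp.choices[0].message.content)
```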
Inspireyd@reddit
you don't pay for any?
DarkArtsMastery@reddit
Zero. Not even Sonnet.
Inspireyd@reddit
That's amazing. I used to pay, but I won't anymore. Open-source models are getting more and more capable. And as for Qwen 3, it seems to me they are already developing a reasoning version as well.
jeffwadsworth@reddit
Wow. It solves the Aunt Agatha riddle. Only o1 could do this before. Let's analyze the situation step by step:

Let's denote:
- A for Agatha
- B for the Butler
- C for Charles

Step-by-Step Analysis:

1. Agatha hates everyone except the butler: A hates A and C, but A does not hate B.
2. The butler hates everyone not richer than Aunt Agatha: if C is not richer than A, then B hates C.
3. The butler hates everyone whom Agatha hates: since A hates C, B hates C.
4. Charles hates no one that Agatha hates: since A hates A, C does not hate A.
5. No one hates everyone.

Conclusion:

Given the conditions:
- The killer must hate the victim and be no richer than the victim.
- A hates C and A, but not B.
- B hates C, and possibly A if B is not richer than A.
- C does not hate A.

Since C does not hate A, C cannot be the killer.
If B is not richer than A, then B hates A, and B could be the killer.
However, if B is richer than A, then B does not hate A, and B cannot be the killer.
Since A hates A and is no richer than herself, A could be the killer of herself.

Final Answer:
Agatha killed herself.
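The conclusion also checks out mechanically: a small brute force over every hate relation and wealth ordering under the puzzle's constraints leaves Agatha as the only possible killer. The constraint encoding below is mine, and wealth is simplified to a strict total order:

```python
# Brute-force check of the Aunt Agatha puzzle. Axiom encoding is mine;
# wealth is simplified to a strict total order over the three people.
from itertools import product, permutations

people = ["agatha", "butler", "charles"]
pairs = list(product(people, repeat=2))
possible_killers = set()

for bits in product([False, True], repeat=len(pairs)):
    hates = dict(zip(pairs, bits))
    for order in permutations(people):
        wealth = {p: rank for rank, p in enumerate(order)}  # higher = richer

        # Agatha hates everyone except the butler.
        if any(hates[("agatha", p)] != (p != "butler") for p in people):
            continue
        # The butler hates everyone not richer than Agatha.
        if any(not hates[("butler", p)]
               for p in people if wealth[p] <= wealth["agatha"]):
            continue
        # The butler hates everyone whom Agatha hates.
        if any(hates[("agatha", p)] and not hates[("butler", p)]
               for p in people):
            continue
        # Charles hates no one that Agatha hates.
        if any(hates[("agatha", p)] and hates[("charles", p)]
               for p in people):
            continue
        # No one hates everyone.
        if any(all(hates[(h, p)] for p in people) for h in people):
            continue

        # The killer hates the victim and is no richer than the victim.
        for k in people:
            if hates[(k, "agatha")] and wealth[k] <= wealth["agatha"]:
                possible_killers.add(k)

print(possible_killers)  # {'agatha'}
```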
lolwutdo@reddit
How many parameters is this model?
zjuwyz@reddit
If they follow the DeepSeek-V2 and DeepSeek-V2-Lite pattern, that's a 2.4B/16B MoE.
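For anyone puzzled by the 2.4B/16B notation: in an MoE, the router activates only a few experts per token, so the "active" parameter count is a fraction of the total. A toy top-k routing sketch (expert count, k, and sizes are illustrative, not DeepSeek's actual config):

```python
# Toy sketch of top-k MoE routing: total parameters span all experts,
# but each token only runs k of them (the "active params").
# Expert count, k, and dimensions are illustrative, not DeepSeek's config.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                        # x: (n_tokens, d_model)
        scores = self.router(x)                  # (n_tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e         # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = ToyMoE()
tokens = torch.randn(10, 64)
print(moe(tokens).shape)  # torch.Size([10, 64])
```

That's why a 16B MoE can run with roughly the per-token FLOPs of a ~2.4B dense model while still needing the full 16B in memory.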
RevoDS@reddit
Looks like there is still no moat in LLMs after all
No_Step3864@reddit
there will only be a chip-moat.
Healthy-Nebula-3603@reddit
models are getting better and better and smaller
yuicebox@reddit
Yes but there are still a lot of rumors of a scaling wall at the major LLM companies
Healthy-Nebula-3603@reddit
So far we don't see it
I think such rumors are kind of a coping mechanism.
yuicebox@reddit
Small models are the area where the most improvement happens. Why is Opus 3.5 not out? OpenAI has made statements about not being happy with the performance of their latest models considering the data volume and training time.
When was the last major leap in large SoTA model performance? Not small model performance or open source model performance, top tier model performance. Was it o1 preview?
Healthy-Nebula-3603@reddit
"about not being happy with performance of their latest models considering the data volume and training time. "
Can you link me that official statement?
Opus is/was just too heavy and expensive..
Do you remember how expensive 1M tokens from it was?
I think they are currently preparing something similar to o1, like a Sonnet 4.
Opus is too heavy and too expensive currently.
Maybe in a few years with better hardware...
Zephandrypus@reddit
And we'll beat that wall through new architectures and techniques. Mixture of experts and reasoning LLMs are still hot off the press with drastic improvements, and we'll just keep exploring.
yuicebox@reddit
I completely agree that new architectures can fix the problem. That said, more of the same is not new.
Nyghtbynger@reddit
When your cutting edge flagship model gets beaten by "lite"
RabbitEater2@reddit
Isn't o1 preview a 'lite' version of o1 though?
HMikeeU@reddit
Isn't that o1-mini?
eposnix@reddit
From what I understand, o1-preview is their first test of the o1 architecture. Then they created o1, and o1-mini is a distilled version of that.
Single_Ring4886@reddit
I think you are right, and the reason they did not release the full o1 is, well, that it "thinks", and when you start thinking you sometimes stray off without completing the task...
opi098514@reddit
This statements makes me feel heard, and attacked at the same time.
Healthy-Nebula-3603@reddit
They will release o1 soon .... they are literally forced to now ... lol
my_name_isnt_clever@reddit
Why are these companies so bad at names?? I hate that we have to compare R1 with o1.
estebansaa@reddit
o1 is already feeling outdated! Gemini is kicking their ass now, without even using CoT.
Caladan23@reddit
Try it with 3000 lines of code as input and multiple iterations in a real-world scenario, instead of a one-message test riddle, before you judge.
CheatCodesOfLife@reddit
+1
I fine-tuned some models to do this, which gave outputs similar to the screenshot and solved some riddles. But for anything serious they were useless.
If I've been stuck on a problem for a while with Sonnet/Qwen, I often toggle over to o1-mini (in open-webui) and tell it: "Claude and I have been stuck on this for a while. What are we doing wrong?"
A lot of the time, it finds the issue with 1 reply.
Inspireyd@reddit
This is something I noticed... the current Gemini Exp 1114 is able to do reasoning exercises that previously only o1 could do. Now, as if that weren't enough, DeepSeek is launching R1 to further reduce the importance of o1.
estebansaa@reddit
yes, that last Gemini model is great. One issue: the context window is much smaller than their usual 2M context.
LienniTa@reddit
Managed to send it into an infinite loop with the very first question, just asking it to come up with a nickname idea.
fairydreaming@reddit
I checked it on 25 quizzes from the farel-bench benchmark (the aunt-or-uncle relationship) and it got 92% right (two answers were wrong). That's better than all LLMs I've benchmarked so far except o1-mini and o1-preview (they got 100% on the full set of 50 quizzes for this relationship). Definitely a step in the right direction. This is going to put some serious pressure on OpenAI. Can't wait to see it applied to the large DeepSeek model!
Aymanfhad@reddit
It's astonishing that free, open-source AI may soon surpass o1-preview, which is closed and paid. And then they claim that training AI is expensive. They just want your damn money.
Healthy-Nebula-3603@reddit
It is expensive ....
SquashFront1303@reddit
Finally, an open-source o1. We'll see more high-quality reasoning models trained on its CoT. What a time to be alive 🎉
Enough-Meringue4745@reddit
Once it's open source, we can make that distinction.
Dorkits@reddit
Where is the link to test it?
TheLogiqueViper@reddit (OP)
chat.deepseek.com
Dorkits@reddit
Thanks
Rei1003@reddit
I hate that people can't name their products independently.
No_Afternoon_4260@reddit
Didn't Nous release a similar API? Was it in beta?
hapliniste@reddit
I think it wasn't actually the same thing. It was a standard LLM with test-time compute using known techniques.
No_Afternoon_4260@reddit
What do you call test-time compute?
hapliniste@reddit
Doing more computation at inference time. In the case of Nous Research it's things like mixture-of-agents, while for o1 and DeepSeek it's chain of thought (but with a special training method).
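The cheapest version of that idea is self-consistency: sample several chain-of-thought completions and majority-vote the final answers. A minimal sketch, assuming a `generate` placeholder for whatever model endpoint you use and an "Answer:" output format (both are illustrative):

```python
# Sketch of the simplest test-time-compute recipe: self-consistency.
# Sample several chain-of-thought completions, majority-vote the answers.
# `generate` is a placeholder for whatever model call you actually use.
from collections import Counter

def generate(prompt: str, temperature: float = 0.8) -> str:
    """Placeholder: call your LLM here and return the full CoT output."""
    raise NotImplementedError

def extract_answer(completion: str) -> str:
    # Assumes the model ends with "Answer: <x>"; adapt to your format.
    return completion.rsplit("Answer:", 1)[-1].strip()

def self_consistency(prompt: str, n_samples: int = 8) -> str:
    cot_prompt = prompt + "\nLet's think step by step."
    answers = [extract_answer(generate(cot_prompt)) for _ in range(n_samples)]
    # More samples = more inference-time compute = (usually) better accuracy.
    return Counter(answers).most_common(1)[0][0]
```

o1 and R1 go further by training the chain of thought itself, but the lever is the same: spend more tokens at inference to buy accuracy.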
tucnak@reddit
Think: there's a reason why not a single lab in the West has released an o1 of their own. It's because they're not convinced this particular approach is worthwhile. Since the o1-preview release, Sonnet has outperformed it where it matters, using traditional autoregression. In the measures where it didn't, that could be attributed to the datasets. In the mind of /r/LocalLLaMA stans, RL is fucking pixie dust. The industry is not convinced. Everybody experiments with RL; it's just that OpenAI are the only ones for whom it made financial sense to release an "RL wonder-model."
Just the other day, the so-called llava-o1 was discussed. If you had actually read the paper, you would know that the o1 connection is made through "Evaluation of OpenAI o1: Opportunities and Challenges of AGI", yet another paper-mill product with 50 or so authors. They created that 280-page monstrosity less than two weeks after the o1 release. We don't know what o1 is doing, but apparently the Chinese figured it out in a matter of days... They say their model performs well on visual benchmarks, but that's probably owing to the fact that they're overfitting those benchmarks in the first place.
Educational_Gap5867@reddit
The best/worst part is that 4o has started beating it too on the less gameable benchmarks. It does perform consistently better on strictly reasoning tasks, though, and that part IS still true.
tucnak@reddit
I suspect that 4o and o1 fine-tuning is handled by two non-overlapping teams, with a lot of compartmentalization between them, and they're probably not getting the same data. It seems like they initially starved 4o of the most recent data for months before release, and only trickled it in fully weeks after the release, so that o1 would appear stronger momentarily. I stand by my initial assessment that o1 is a revolution in accounting, not capability; the industry agrees, it seems.