TheaterFire

DeepSeek-R1-0528 Official Benchmarks Released!!!

Posted by Xhehab_@reddit | LocalLLaMA | View on Reddit | 155 comments

MaskedSaqib@reddit

And the good part is they just refine what they had
View on Reddit #58102979

bjivanovich@reddit

How can I get it to not think, or to be less lengthy? It thought for 24 minutes 16 seconds. This is the prompt: Write a Python program that shows 20 balls bouncing inside a spinning heptagon: All balls have the same radius. All balls have a number on it from 1 to 20. All balls drop from the heptagon center when starting. The balls should be affected by gravity and friction, and they must bounce off the rotating walls realistically. There should also be collisions between balls. The material of all the balls determines that their impact bounce height will not exceed the radius of the heptagon, but higher than ball radius. All balls rotate with friction, the numbers on the ball can be used to indicate the spin of the ball. The heptagon is spinning around its center, and the speed of spinning is 360 degrees per 5 seconds. The heptagon size should be large enough to contain all the balls. Do not use the pygame library; implement collision detection algorithms and collision response etc. by yourself. The following Python libraries are allowed: tkinter, math, numpy, dataclasses, typing, sys. All codes should be put in a single Python file.
View on Reddit #57862452

cvjcvj2@reddit

DeepSeek-R1-Qwen3-8B distill is yet more awesome!
View on Reddit #57562827

GhostGhazi@reddit

How much RAM needed for that? Can I run it on Ryzen CPU?
View on Reddit #57584907

teachersecret@reddit

8B is so small you can run it at speed on cpu at 4 bit - I was running one of these at decent speed on a decade old iMac.
View on Reddit #57623386

GhostGhazi@reddit

Thank you appreciated, how does this model hold up to Gemma3:4b?
View on Reddit #57646474

teachersecret@reddit

I mean, it’s benchmarking up with a 200b model. I’d say it does ok :p
View on Reddit #57764128

TheOneThatIsHated@reddit

I got around 7gb for 4bit
View on Reddit #57606533

AppealSame4367@reddit

I still can't grasp it. Did we really just get SOTA-like AI on a Laptop?
View on Reddit #57568546

TheLieAndTruth@reddit

soon you getting SOTA at home in your fridge!!!
View on Reddit #57584144

AppealSame4367@reddit

Never say never. Better AI enables better optimization, which enables better AI. Seems like the progress in LLM optimization is even speeding up in the last weeks. https://www.reddit.com/r/MachineLearning/comments/1kx3ve1/r_new_icml25_paper_train_and_finetune_large/
View on Reddit #57586454

mi_throwaway3@reddit

What would I need to run this locally?
View on Reddit #57583484

TheTerrasque@reddit

define "run"
View on Reddit #57612524

mi_throwaway3@reddit

Whatever it takes to bring up a chat locally.
View on Reddit #57657635

TheTerrasque@reddit

I mean, you can run it on what you have now, as long as you have disk space. It will be tens of seconds to minutes per token, and a response might take days, but it runs. If you want a fast, fluent response and high / original quant, like the online service(s), we're talking on the order of $100,000, and most likely some re-wiring of your house electrical. Between those there's a sliding scale, with various tradeoffs. If you're okay with low quants and 1-4 tokens a second, then you "just" need a machine with ~150-200 GB of RAM, and preferably a 16+ GB graphics card for the main layers.
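As a rough sanity check on the RAM figures above, the weight footprint of a quantized model is approximately parameter count times bits per weight. The sketch below is only an approximation: the 671B parameter count is the R1 size mentioned elsewhere in this thread, the bits-per-weight values are illustrative assumptions rather than exact figures for any particular quant, and KV cache and runtime overhead are ignored.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in gigabytes (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Illustrative (assumed) configurations, not measured file sizes.
configs = [
    ("R1 671B at ~2 bits/weight", 671e9, 2.0),
    ("R1 671B at ~4.5 bits/weight", 671e9, 4.5),
    ("8B distill at ~4.5 bits/weight", 8e9, 4.5),
]

for label, n_params, bits in configs:
    print(f"{label}: ~{weight_footprint_gb(n_params, bits):.1f} GB of weights")
```

Under these assumptions a roughly 2-bit quant of the full model lands inside the 150-200 GB range quoted above, while the 8B distill fits in a few gigabytes before context memory is added.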
View on Reddit #57661511

mi_throwaway3@reddit

Thanks, this answer is good, exactly what I was looking for.
View on Reddit #57757810

ResidentPositive4122@reddit

And qwen3-8b distill !!!
View on Reddit #57560327

ASTRdeca@reddit

is the distill also a reasoning mode? does it still use the same /think /nothink format of regular qwen3?
View on Reddit #57573962

colarocker@reddit

/nothink in the system prompt did not work for me in the DeepSeek-R1:8b-0528-Qwen3-q4_K_M
View on Reddit #57578196

Sylanthus@reddit

Qwen3 needs it to say /no_think
View on Reddit #57715854

colarocker@reddit

Yes, but that won't work either. However, ollama released a new update two days ago where one can use /set think and /set nothink, which works with the new r1/qwen3 model.
View on Reddit #57738453

phenotype001@reddit

If they also distill the 32B and 30B-A3B it'll probably become the best local model today.
View on Reddit #57561913

usernameplshere@reddit

The 30B model is already such a good all-rounder, this getting improved would be even more nuts. Would love to see it.
View on Reddit #57563240

-dysangel-@reddit

Agreed. 30B is smart. I found it was rambling way too much to be useful for running in Roo, but then I remembered that you can turn off thinking. So to anyone else thinking of trying it out, just append /no_think to the model's system prompt, and it seems to me to be the best all-rounder open source model for local coding, with a large context window and good TTFT. I'm looking forward to at some point trying out R1-0528 or V3-0324 with carefully managed system prompts/context. Not sure yet if RooCode's custom agents will be enough, or if I'll have to manually tweak Copilot when it's finally open sourced.
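A minimal sketch of the "append /no_think to the system prompt" tip, assuming an OpenAI-style list of chat messages. The helper name `with_no_think` and the example prompts are hypothetical; whether the switch is honoured depends on the model and its chat template (others in this thread report it not working with the R1 distill).

```python
def with_no_think(system_prompt: str, switch: str = "/no_think") -> str:
    """Append the soft switch to a system prompt unless it is already present."""
    if switch in system_prompt:
        return system_prompt
    return f"{system_prompt.rstrip()} {switch}".strip()


# Hypothetical messages; any OpenAI-compatible local server takes this shape.
messages = [
    {"role": "system", "content": with_no_think("You are a careful coding assistant.")},
    {"role": "user", "content": "Refactor this function to be iterative."},
]

print(messages[0]["content"])
```

Keeping the switch in one helper makes it easy to toggle thinking per request instead of editing every prompt by hand.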
View on Reddit #57567005

hacktheplanet_blog@reddit

You seem pretty immersed and knowledgeable so I would be curious to hear what your experience is with the GGUF mentioned by danigoncalves. Would appreciate it but I understand if I/we don’t hear from you.
View on Reddit #57679658

-dysangel-@reddit

I did try the 8B distilled version earlier today. Not sure if it was the bartowski version, but I ran it through my usual "build tetris in a single html page" test. It had some syntax errors, so I gave it a few shots at debugging, then just deleted it when it failed. I just tried the same thing with standard Qwen3 8B and the behaviour was the same: its first attempt was buggy, and it wasn't able to fix the bug after a few tries. Iirc Qwen2.5 7B Coder was better at this test, though it was not consistent. The Qwen 3 series have good aesthetics and are pleasant to chat to, including the 8B model. I expect it might be decent at front end design if that's important for you. I'm really looking forward to if/when they bring out the Qwen3 Coder series
View on Reddit #57706401

Ambitious-Most4485@reddit

Thanks for sharing will delve into it and run some tests
View on Reddit #57573367

danigoncalves@reddit

Bartowski already released the GGUFs :D bartowski/deepseek-ai_DeepSeek-R1-0528-Qwen3-8B-GGUF
View on Reddit #57569610

giant3@reddit

What quant is better? Is Q4_K_M enough? Anyone who has tested this quant?
View on Reddit #57572353

BlueSwordM@reddit

Q4_K_XL from unsloth would be your best bet.
View on Reddit #57593426

poli-cya@reddit

I tend towards the xl unsloth quants now. Q4kxl seems like a great middleground
View on Reddit #57579191

danigoncalves@reddit

That should be more than enough. I am testing it right now and gosh, it thinks A LOT LONGER than the previous models I tried.
View on Reddit #57575193

Any_Pressure4251@reddit

is it as good as Devstral, that model is brilliant at coding and tool use.
View on Reddit #57578770

ResidentPositive4122@reddit

Is the 32b-base out? I thought there was no base published for it.
View on Reddit #57562104

DepthHour1669@reddit

Nope, it’s not released. We just have 30b https://huggingface.co/Qwen/Qwen3-30B-A3B-Base
View on Reddit #57569043

phenotype001@reddit

Oh. I didn't consider that the base model is needed.
View on Reddit #57562233

lordpuddingcup@reddit

This I don’t get why they wouldn’t do the a3b it’s so good
View on Reddit #57562435

LoSboccacc@reddit

Oof those scores imagine a 14b distill beating gemini flash 2.5 
View on Reddit #57578956

jadbox@reddit

+1 really want to see a 12-16b distill
View on Reddit #57584556

TerminalNoop@reddit

Yeah, anything that can still run well within 24gb vram :D
View on Reddit #57700525

lemon07r@reddit

Paging u/_sqrkl Any chance we could get a few benchmarks of the new 8B distill to see how it holds up against the qwen instruct? The distill is trained from base qwen so it would be interesting to see who trained base qwen 8b better. I remember the old R1 distills weren't very good in actual use, and just did well on a few benchmarks. I kinda trust your leaderboard more than these first party results.
View on Reddit #57607262

_sqrkl@reddit

Just added this one to longform writing. Seems like they got the distil right this time. it beats baseline qwen3-8b handily. It even beat gemma-3-12b
View on Reddit #57611437

lemon07r@reddit

Yeah I was super impressed, and I'm usually quite skeptical, not really that easily bought into hype. I remember not liking any of the old R1 distills at all. Glad we were able to confirm with your tests that it wasnt just lucky output.
View on Reddit #57684962

Any-Championship-611@reddit

So I'm pretty new to this. Does reasoning make the AI actually smarter or does it just exist so the user can follow its reasoning process. So far I always used non reasoning models because it just uses up tokens and I didn't see the point of it.
View on Reddit #57648371

ResidentPositive4122@reddit

> Does reasoning make the AI actually smarter

This is still up for debate, I think. What's clear is that performance on easily verifiable tasks increases (math, code, etc). What's not clear is how / why it works. I've seen a recent paper that put semi-random stuff in the "thinking" part, and still saw improvements in the final scores, so there's probably more research to be done in this area.
View on Reddit #57648638

danielhanchen@reddit

I made some dynamic quants for Qwen 3 distilled here https://huggingface.co/unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF I'm extremely surprised DeepSeek would provide smaller distilled versions - hats off to them!
View on Reddit #57573479

dadidutdut@reddit

hey appreciate your work! does it support /no_think flag? thanks!
View on Reddit #57622511

danielhanchen@reddit

Thanks! I think so but unsure
View on Reddit #57643779

colarocker@reddit

I can't just load that into ollama, can I? :D I tried but the output is rather funny ^^
View on Reddit #57587391

danielhanchen@reddit

Should work now! `ollama run hf.co/unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF:Q4_K_XL` should get the correct prompt format and stuff
View on Reddit #57643754

colarocker@reddit

awesome! lots of thanks for the work!!!
View on Reddit #57643777

Educational_Sun_8813@reddit

you can convert it with llama.cpp tools (there is python script for conversion in the llama folder), and then use gguf model in llama.cpp or ollama
View on Reddit #57610205

colarocker@reddit

awesome, thanks for the info!
View on Reddit #57610517

jadbox@reddit

For the (super) lazy, any chance of publishing these on ollama with the proper configs (temperature, context size, P, template).
View on Reddit #57585201

danielhanchen@reddit

I just did! :)
View on Reddit #57643720

Green-Ad-3964@reddit

Yesterday I asked if there would be versions to run locally on 32GB VRAM and I got a lot of downvotes. Pfui. Kudos to whoever made this possible.
View on Reddit #57607868

lemon07r@reddit

Okay I just tested the UD quants against the original instruct by qwen, and it's so much better in my initial testing so far. I'm quite surprised. The old R1 distills for the most part were pretty disappointing when I tried them; they felt worse than their official instruct counterparts. I am pleasantly surprised so far.
View on Reddit #57610186

TheOneThatIsHated@reddit

From my initial tests, it is crazy good!!
View on Reddit #57606401

Yes_but_I_think@reddit

They should do QAT on this to bring it to 4 bit without loss of quality.
View on Reddit #57565750

DepthHour1669@reddit

Deepseek can’t do that. QAT is done during pretraining, you can’t do it afterwards. HOWEVER alibaba also released AWQ and GPTQ versions of Qwen 3, so in theory Deepseek can just slap the R1 tokenizer onto that.
View on Reddit #57569221

shing3232@reddit

I think you could do Post-training with QAT as well. Google do SFT during QAT phase
View on Reddit #57582901

coding_workflow@reddit

But the benchmarks don't show how it rates in LiveCodeBench, and some numbers seem down with DeepSeek-R1-0528-Qwen3-8b. Not sure if the distill is better. This is already a thinking model.
View on Reddit #57568154

Misaka17636@reddit

https://preview.redd.it/19vxedau2q3f1.jpeg?width=1284&format=pjpg&auto=webp&s=c7e5f13a27ed27f18488e53a682363adba354a7a They did mention it in the “how to run”section, maybe they will release it soon?
View on Reddit #57566308

phenotype001@reddit

It's released: [https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) GGUF please.
View on Reddit #57562771

NZT33@reddit

sad to see only one 8b option
View on Reddit #57562696

harlekinrains@reddit

Someone at Huawei just raised an eyebrow. ;)
View on Reddit #57562534

zjuwyz@reddit

They should have already uploaded this if they want. Maybe that's homework for us.
View on Reddit #57561981

WalrusVegetable4506@reddit

Excited to try out these smaller distills
View on Reddit #57696209

dubesor86@reddit

I tested it for the past 12 hours, and compared it to R1 from 4 months ago:

Tested **DeepSeek-R1 0528**:

* As seems to be the trend with newer iterations, **more verbose** than R1 (**+42%** token usage, 76/24 reasoning/reply split)
* Thus, despite low mTok, by pure token volume real bench cost a bit more than Sonnet 4.
* I saw **no notable improvements to reasoning** or core model logic.
* Biggest improvements seen were in **math** with no blunders across my **STEM** segment.
* Tech was samey, with better visual frontend results but disappointing C++
* Similarly to the V3 0324 update, I noticed **significant improvements in frontend** presentation.
* In the 2 matches against its former version (these take forever!) I saw **no chess improvements**, despite costing **~48% more** in inference.

Overall, around Claude Sonnet 4 Thinking level. DeepSeek still has the strongest open models, and this release increases the gap to alternatives from Qwen and Meta.

To me though, in practical application, the massive token use combined/multiplied with the **very slow** inference excludes this model from my candidate list for any real usage, within my use cases. It's fine for a few queries, but waiting on exponentially slower final outputs isn't worth it, in my case. (*e.g. a single chess match takes hours to conclude*).

However, that's just me and as always: **YMMV!**

Example front-end showcases improvements (**identical** prompt, identical settings, 0-shot - **NOT** part of my benchmark testing):

[CSS Demo page R1](https://dubesor.de/assets/shared/UIcompare/DeepSeek-R1.html) | [CSS Demo page 0528](https://dubesor.de/assets/shared/UIcompare/Deepseek-R1%200528%20UI.html)

[Steins;Gate Terminal R1](https://dubesor.de/assets/shared/SteinsGateWebsiteExamples/DeepSeek-R1.html) | [Steins;Gate Terminal 0528](https://dubesor.de/assets/shared/SteinsGateWebsiteExamples/Deepseek-R1%200528.html)

[Benchtable R1](https://dubesor.de/assets/shared/LLMBenchtableMockup/DeepSeek-R1%200.6%20cents.html) | [Benchtable 0528](https://dubesor.de/assets/shared/LLMBenchtableMockup/Deepseek-R1%200528%201.7%20cents.html)

[Mushroom platformer R1](https://dubesor.de/assets/shared/MushroomPlatformer/DeepSeek%20R1.html) | [Mushroom platformer 0528](https://dubesor.de/assets/shared/MushroomPlatformer/Deepseek-R1%200528.html)

[Village game R1](https://dubesor.de/assets/shared/VillageGame/DeepSeek%20R1.html) | [Village game 0528](https://dubesor.de/assets/shared/VillageGame/Deepseek-R1%200528.html)
View on Reddit #57564906

ironic_cat555@reddit

Just curious—do you normally use bold text like that in your writing, or did you use an LLM and it added the bold for you?
View on Reddit #57566085

dubesor86@reddit

> Just curious—do you normally use bold text like that in your writing, or did you use an LLM and it added the bold for you?

Just curious, do you normally use Em Dash like that in your writing, or did you use an LLM and it added the Em Dash for you?

^(rhetorical, it's evident from your post history)
View on Reddit #57689158

sometimeswriter32@reddit

This is probably the most LLM slop friendly place on the whole internet. Why not simply admit an LLM writes your messages for you- you'll probably get a hundred upvotes!
View on Reddit #57694360

Hoodfu@reddit

Stuff like this, where the reasoning doesn't seem to have any bearing on the actual final output, makes me wonder if all that reasoning is actually doing anything. Running the 4bit 671b 0528 with lm studio on a 512gb m3 ultra. https://preview.redd.it/h5k567mlpu3f1.png?width=1135&format=png&auto=webp&s=2c5e2685d7f81f3af0d7335316eae92ac2b0dea1
View on Reddit #57634846

Recoil42@reddit

> Overall, around Claude Sonnet 4 Thinking level.

Man, Amodei's blog post sure aged like fucking milk.
View on Reddit #57566515

Xhehab_@reddit (OP)

https://preview.redd.it/4k0l380vmp3f1.png?width=3961&format=png&auto=webp&s=75afc40ce1ad4ab66e06fa8024a7f5a92653bc3d
View on Reddit #57560322

SirRece@reddit

This apparently shows a comparison against o3-high, interestingly, which isn't what is available on chatGPT. So it seems to be a straight beat for R1, which is wild.
View on Reddit #57643067

Amazing_Athlete_2265@reddit

They're all talking about the front-end, but what about the back-end, the more important end?
View on Reddit #57564443

Healthy-Nebula-3603@reddit

That shows Aider... and it looks impressive for the new DS R1.1
View on Reddit #57602451

z_3454_pfk@reddit

They’re all still mid at that
View on Reddit #57591779

TheDuhhh@reddit

Very niceeee benchmark numbers
View on Reddit #57591856

zeth0s@reddit

Looks nice. Now it's interesting to see how fast it is and how much it hallucinates.
View on Reddit #57561702

harlekinrains@reddit

On hallucination proneness, I'm low-key impressed... Tested with OpenRouter.

Creative writing capability is actually very impressive. I let it output and reason through my usual prompted essay in German, and it's still not entirely grammatically correct, and it hallucinates words that don't exist (as far as I know.. ;) ), but the flip side is that it's expressive, and thus very engaging to read.

A simple "write me a 1000 word essay on a cultural landmark" gave me rumored/reported interpersonal details on historic figures and tips for actual things to see in said area that no other AI I've tested so far has even come close to including. In the end it also included at least one hallucination as a concept (not only grammar and words), but it's a forgivable one... You know that you have something on your hands when you look past invented words and still want to keep reading to see what else it mentions... :)

https://pastebin.com/Fpf7wUSP

Similar results on one of the other tests I used in the past in regard to hallucination proneness:

https://pastebin.com/LGYa95ZH

It still didn't get all concepts right (not even remotely ;) ) but it is vastly better than any other model I've tested in the past. I'm actually pretty curious how this will show up in benchmarks...
View on Reddit #57564970

MK2809@reddit

Can anyone tell me the difference between the paid and free versions of DeepSeek R1-0528 on OpenRouter? Is the free one just rate-limited, or is it less performant?
View on Reddit #57596940

vhthc@reddit

Slower. Request limits. Sometimes less context and lower quants but you can look that up
View on Reddit #57638093

MK2809@reddit

Ah thanks, I presumed there must be a difference but it didn't seem to say on OpenRouter itself
View on Reddit #57638149

No-Peace6862@reddit

hey guys, i am new to Local LLM. Why should I use deepseek locally over in browser? is there any advantage besides it taking a lot of resources from my pc?
View on Reddit #57577980

Thomas-Lore@reddit

You shouldn't, it won't run on anything you have. You can use a smaller model, we usually do this for privacy and independence from the providers.
View on Reddit #57582796

No-Peace6862@reddit

I see, Yeah I really had no knowledge about Local LLms (still learning) when asking the question, after digging in here and other places i sort of understand their purpose now
View on Reddit #57635473

Historical-Camera972@reddit

Because that's what we do here. One day, all of this will be in the palm of every idiot's hand. We are trying to get ahead of that, and know what we are going to be working with, before it's in every phone on the planet. That's just my own take though.
View on Reddit #57583062

Vozer_bros@reddit

Chinese chads are playing bigger game, expecting to see news for models and hardware also.
View on Reddit #57634737

dahara111@reddit

Has the model on [chat.deepseek.com](http://chat.deepseek.com) really been switched to DeepSeek-R1-0528? He insists that he is the model for DeepSeek-R1 version 1.0, released in 202405. Even when I point out the information on the model card, he says "Oh, it seems that the user misunderstood. It's important to have a tone that conveys that I take the user's questions seriously," and never acknowledges it, which makes me angry.
View on Reddit #57565698

New_Alps_5655@reddit

He? Pretty sure Dipsy is a girl
View on Reddit #57570813

Vancha@reddit

[You're thinking of Lala.](https://teletubbies.fandom.com/wiki/Dipsy)
View on Reddit #57617579

dahara111@reddit

maybe you are right.
View on Reddit #57572058

DatDudeDrew@reddit

Deepseek r1 wasn’t released in 202405
View on Reddit #57571383

dahara111@reddit

That's true, but even when I provide evidence, she's obsessed with the hallucinations she saw in the documents and absolutely refuses to admit it.
View on Reddit #57572360

NeoKabuto@reddit

> 今天是2025年5月28日,星期一。 (Today is May 28, 2025, Monday.)

Wonder if their real system prompt has the same mistake. The 28th was Wednesday, not Monday.
View on Reddit #57577232

ZYy9oQ@reddit

Huh, in my testing I've seen it make the following mistakes:

- think Thursday is the last day of the week
- begin its CoT making an assumption based on 4pm being after 5pm, then correct itself

Wonder if these are related
View on Reddit #57607230

CommunityTough1@reddit

They probably wrote the example by hand, hence the error. In a real system prompt, you'd dynamically inject this data.
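Dynamically injecting the date, as suggested above, could look something like the sketch below. The function names and prompt wording are hypothetical (nothing here is DeepSeek's actual system prompt); the point is only that the weekday is computed from the date rather than typed by hand.

```python
from datetime import date
from typing import Optional

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def date_line(today: date) -> str:
    """Render the date with a weekday derived from the calendar, not typed in."""
    return f"Today is {today.isoformat()}, {WEEKDAYS[today.weekday()]}."


def build_system_prompt(base: str, today: Optional[date] = None) -> str:
    """Append the current date line to a (hypothetical) base system prompt."""
    return f"{base}\n{date_line(today or date.today())}"


print(date_line(date(2025, 5, 28)))  # prints "Today is 2025-05-28, Wednesday."
```

Running it for the date in the quoted prompt confirms the weekday was Wednesday.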
View on Reddit #57587614

Iory1998@reddit

Calling a jump from a score of 8.5% to 17.7% in Humanity's Last Exam a "minor" update is a major understatement.
View on Reddit #57573752

Healthy-Nebula-3603@reddit

Yep ..that test is checking very detailed knowledge.
View on Reddit #57603061

latestagecapitalist@reddit

Chinese scrapers from Huawei and Tencent network IPs have gone fucking crazy in the last few weeks. It's like 10 to 1 on western crawlers now
View on Reddit #57599325

SelectionCalm70@reddit

Whale truly cooked closed-source AI with just a minor update to the R1 model
View on Reddit #57562691

meister2983@reddit

Matters what you look at. On the agentic benchmarks, it's a bit below sonnet 3.7 even. On math, yes, it is very strong. 
View on Reddit #57566155

pornthrowaway42069l@reddit

For fraction of the price though.
View on Reddit #57597770

-dysangel-@reddit

Yeah but pretty much *everything* has been below 3.7 in agentic capability, apart from maybe the latest Gemini 2.5 and Claude 4.0
View on Reddit #57567227

meister2983@reddit

O3 scores quite high as well
View on Reddit #57574112

thezachlandes@reddit

I was trying to find it -- anyone have the SWE-bench comparison for this to sonnet 4 thinking and gemini pro 2.5?
View on Reddit #57591295

Alone_Ad_6011@reddit

I also expect the release of the qwen3-30b-a3b model, distilled with DeepSeek-R1-0528. The qwen3-30b-a3b model is best for agent LLMs.
View on Reddit #57590943

Miscend@reddit

Is it available on the API?
View on Reddit #57562912

dadidutdut@reddit

you can test it on openrouter
View on Reddit #57587562

mintybadgerme@reddit

DeepSeek-R1-0528-Qwen3-8B - any GGUFs around yet?
View on Reddit #57565833

danielhanchen@reddit

I made some dynamic ones as well! https://huggingface.co/unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF
View on Reddit #57573550

mintybadgerme@reddit

Oh cool. What's the difference? I just tried the hf.co/bartowski/deepseek-ai_DeepSeek-R1-0528-Qwen3-8B-GGUF:Q6_K and it's spectacular!! :) Are the dynamic ones better? Or just different. This is going to be my go-to local on Ollama and Page Assist from now on.
View on Reddit #57574606

poli-cya@reddit

Just in case he doesn't get around to replying. They go through and selectively quant layers based on importance/effect. The result is a bit larger typically, but it should perform better... I don't believe anyone has benchmarks to prove it yet, though. I use their quants almost exclusively now. Make sure you get the ones that have UD in the name.
View on Reddit #57579668

mintybadgerme@reddit

OK that sounds great, thanks. One small issue is I struggle with size on my very modest rig. So I'd probably have to go down a quant to support anything bigger on my 8GB VRAM. But I guess that's a user choice thing. :)
View on Reddit #57587378

Agitated-Doughnut994@reddit

I see it in bartowski already
View on Reddit #57570641

mintybadgerme@reddit

Thank you very much. Just got it.
View on Reddit #57572314

chespirito2@reddit

In Azure, is there any reason to use OpenAI O3 over this new DeepSeek model? I don't think it's out yet on Azure Foundry Models, but I've heard mixed things about the performance if you aren't using OpenAI models. The token cost is so much lower than O3 it would be great to just swap this in if performance is similar. For some reason, though, Microsoft limits the output tokens to 4k for DeepSeek models unless I'm missing something.
View on Reddit #57586034

Upstairs-Fishing867@reddit

I used this to chat with a personality prompt, and got similar responses to OpenAI's 4o. This update is on par with 4o's creative writing skills. Well done, DeepSeek!
View on Reddit #57584950

Every-Comment5473@reddit

Do we have a /no_think option on DeepSeek R1.1 similar to Qwen?
View on Reddit #57576821

colarocker@reddit

unsloth has some information on his versions about nothink: [https://huggingface.co/unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF](https://huggingface.co/unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF)

"For NON thinking mode, we purposely enclose <think> and </think> with nothing: `<|im_start|>user\nWhat is 2+2?<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n`"
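The quoted non-thinking format can be reproduced with a small helper. This is a sketch that builds exactly the ChatML-style string from the quote; the function name is hypothetical, and whether the R1 distill uses this Qwen3-style template unchanged is not confirmed anywhere in this thread.

```python
def chatml_no_think_prompt(user_message: str) -> str:
    """Build a single-turn prompt whose assistant turn starts with an empty think block."""
    return (
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
        # Pre-filling an empty <think></think> pair signals that reasoning is already done.
        "<think>\n\n</think>\n\n"
    )


# repr() keeps the newlines visible, matching how the quote writes them.
print(repr(chatml_no_think_prompt("What is 2+2?")))
```

The model then continues after the closed think block, so the first tokens it generates are the answer rather than reasoning.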
View on Reddit #57579704

Thomas-Lore@reddit

As always I wonder how it compares to v3 in that mode. Better, worse?
View on Reddit #57582566

balianone@reddit

It still feels underwhelming compared to Claude Opus 4
View on Reddit #57568859

Thomas-Lore@reddit

Everything is underwhelming compared to Opus 4. But who can afford it? :)
View on Reddit #57582459

colarocker@reddit

Yea i compared it also to my locally running opus 4 where the new r1 won because opus 4 is not local :x
View on Reddit #57579143

redditisunproductive@reddit

At this point the only public benchmarks I care about are hallucinations, long context handling, and, to a lesser degree, instruction following. Actual engineering you can't fudge. That goes for both closed and open models. I would rather get a 24b model with perfect 32k usage and near-zero hallucinations, even if it was worse at "AIME". That would let me offload actual work to local models. That said, glad to see Deepseek pushing the big boys. Keep up the pressure!
View on Reddit #57579278

Famous-Associate-436@reddit

New guy here, is this the "o3-level" open-source model that OpenAI promised for this summer?
View on Reddit #57574987

danielhanchen@reddit

I'm still doing some quants! [https://huggingface.co/unsloth/DeepSeek-R1-0528-GGUF](https://huggingface.co/unsloth/DeepSeek-R1-0528-GGUF) has a few - 2bit, 3bit and 4bit ones - more incoming! Remember to use `-ot ".ffn_.*_exps.=CPU"` to offload MoE layers to RAM / disk - you can technically fit Q2_K_XL in < 24GB of VRAM, and the rest can be on disk or RAM!
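To see what the `-ot` override above selects, here is a sketch that applies the same regular expression to a few tensor names. The names are illustrative examples in the `blk.N.<tensor>.weight` style, and the matching is done with Python's `re.search` as an assumption about how the pattern is applied, not llama.cpp's own code.

```python
import re

# Split the override into its regex part and the target device.
OVERRIDE = ".ffn_.*_exps.=CPU"
pattern_text, device = OVERRIDE.rsplit("=", 1)
pattern = re.compile(pattern_text)

# Illustrative tensor names (assumed, not read from a real GGUF file).
tensor_names = [
    "blk.3.ffn_gate_exps.weight",  # MoE expert weights
    "blk.3.ffn_up_exps.weight",
    "blk.3.ffn_down_exps.weight",
    "blk.3.ffn_gate_inp.weight",   # router, not an expert tensor
    "blk.3.attn_q.weight",         # attention
    "token_embd.weight",
]


def placement(name: str, default: str = "GPU") -> str:
    """Return the device a tensor would be sent to under the override."""
    return device if pattern.search(name) else default


for name in tensor_names:
    print(f"{name:30s} -> {placement(name)}")
```

Only the `_exps` expert tensors match, which is why the bulk of a MoE model can sit in RAM while attention and shared layers stay on the GPU.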
View on Reddit #57573425

Xhehab_@reddit (OP)

https://i.redd.it/audm0fh8rp3f1.gif

https://x.com/deepseek_ai/status/1928061589107900779
View on Reddit #57561706

SpareIntroduction721@reddit

What the heck platform is that?
View on Reddit #57563972

DepthHour1669@reddit

Lobe Chat. It’s open source. It’s chinese made, so it makes sense why Deepseek prefers using that.
View on Reddit #57569450

_Biskwit@reddit

Lobe Chat
View on Reddit #57564378

IxinDow@reddit

> better experience for vibe coding

huh?
View on Reddit #57560900

shaman-warrior@reddit

prolly better agentic support
View on Reddit #57562690

yvesp90@reddit

It is. I just used it yesterday and today in Roo and it consistently follows all the system instructions and nailed all the tool calls. I did a test on the app to see its IF and made it parrot what I say and in the middle I started trying to confuse it via compliments and/or riddles and instead of answering anything, it mirrored what I said even when its CoT showed that it's confused. It kept reminding itself of my instructions. In Roo it consistently reminds itself of its Mode and system instructions in the thoughts. And it keeps track of all the tools it has.

I've been comparing it with Flash 2.5 which is my go-to in general, which also made progress in these domains and R1 consistently does better at agentic flows while Flash doesn't follow tool format well sometimes. I didn't compare it with Claude and I frankly don't want to because I don't use Claude models but I'm sure Claude will just beat it in speed. R1 is slow. But I was using only the Free version on openrouter so maybe that's why it's slow.

Context window is 168k so it's also useable.

Generally a great release. I didn't do complex debugging with it yet to see its intelligence but so far so good
View on Reddit #57564070

AppealSame4367@reddit

I must agree. It's magnificent. The only error I saw was a wrong line end in hundreds of lines of code it wrote. Some Chinese symbol. Lol
View on Reddit #57568495

InsideYork@reddit

> wow r1 is worse than everything, at least they’re honest, maybe in real world it’s better?

Oh that’s the old R1
View on Reddit #57562585

ihexx@reddit

it performs almost on par with gemini 2.5 pro for half the price (per token) of 2.5 pro
View on Reddit #57563568

InsideYork@reddit

Everyone missed > Oh that’s the old R1
View on Reddit #57565687

Ambitious_Subject108@reddit

1/4 the price of Gemini at peak time, 1/16th at off-peak time
View on Reddit #57565340

sunshinecheung@reddit

llama4: lol
View on Reddit #57562106

Indy1204@reddit

who?
View on Reddit #57565014

ihexx@reddit

between them, qwen and gemma, they've made meta irrelevant for opensource.
View on Reddit #57562336

dankhorse25@reddit

Well meta can't just give up. But they have to change their AI leadership. And I think Yann LeCun has to go. Nothing that meta has produced in the AI space in the last few years is on par with the money that was invested.
View on Reddit #57562636

ResidentPositive4122@reddit

They aren't giving up, in fact they just went through some restructuring. They'll now have 3 separate arms - Products (i.e. meta related bots, agents, etc), "AGI foundations" *sigh* (i.e. tech stuff, llama, reasoning, multimodal) and Research (FAIR, independent for now). So the hope is that if this works out there won't be competing goals for llama (i.e. best tech vs. best product). In the end, competition in this area and more models from more sources is a good thing for us, the users.
View on Reddit #57563468

nullmove@reddit

LeCun runs FAIR which does fundamental research, it has absolutely nothing to do with Llama 4 (Gen AI).
View on Reddit #57563020

ihexx@reddit

Yann LeCun is a researcher, not a product guy. He has nothing to do with the llama project
View on Reddit #57562800

Only-Letterhead-3411@reddit

That is actually insane. Deepseek keeps delivering. They are already at the level of OAI's best model and it's available for very cheap api prices and open weights.
View on Reddit #57564434

Willing_Landscape_61@reddit

What is the grounded/ sourced RAG situation? Can it be prompted to cite the context chunks used to generate specific sentences?
View on Reddit #57564095

Monkey_1505@reddit

It seems to reason a little better in the reasoning section, from my experience. Looks like that's the main change, slightly tighter reasoning.
View on Reddit #57563705

shadows_lord@reddit

Where is the qwen repo?
View on Reddit #57563678

Barry_22@reddit

Well... it really cooked
View on Reddit #57563042

mWo12@reddit

That's impressive!
View on Reddit #57561494