How can I get it to not think, or to think less?
Thought for 24 minutes 16 seconds
This is the prompt:
Write a Python program that shows 20 balls bouncing inside a spinning heptagon: All balls have the same radius. All balls have a number on it from 1 to 20. All balls drop from the heptagon center when starting. The balls should be affected by gravity and friction, and they must bounce off the rotating walls realistically. There should also be collisions between balls. The material of all the balls determines that their impact bounce height will not exceed the radius of the heptagon, but higher than ball radius. All balls rotate with friction, the numbers on the ball can be used to indicate the spin of the ball. The heptagon is spinning around its center, and the speed of spinning is 360 degrees per 5 seconds. The heptagon size should be large enough to contain all the balls. Do not use the pygame library; implement collision detection algorithms and collision response etc. by yourself. The following Python libraries are allowed: tkinter, math, numpy, dataclasses, typing, sys. All codes should be put in a single Python file.
Never say never. Better AI enables better optimization, which enables better AI. Progress in LLM optimization even seems to have sped up in the last few weeks.
[https://www.reddit.com/r/MachineLearning/comments/1kx3ve1/r\_new\_icml25\_paper\_train\_and\_finetune\_large/](https://www.reddit.com/r/MachineLearning/comments/1kx3ve1/r_new_icml25_paper_train_and_finetune_large/)
I mean, you can run it on what you have now, as long as you have disk space. It will be tens of seconds to minutes per token, and a response might take days, but it runs.
If you want a fast, fluent response and a high / original quant, like the online service(s), we're talking on the order of $100,000, and most likely some rewiring of your house's electrical system.
Between those there's a sliding scale, with various tradeoffs. If you're okay with low quants and 1-4 tokens a second, then you "just" need a machine with ~150-200 GB of RAM, and preferably a graphics card with 16+ GB for the main layers.
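For anyone who wants to sanity-check that RAM figure, here is a rough back-of-the-envelope sketch. It assumes the 671B parameter count mentioned elsewhere in this thread and a uniform bit width per weight; real quants mix bit widths and need extra room for context, so treat the numbers as ballpark only.

```python
def quant_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations and file-format overhead.
    """
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 671e9  # assumption: the "671b" figure quoted in this thread

for bits in (2, 4, 8):
    print(f"{bits}-bit: ~{quant_size_gb(N_PARAMS, bits):.0f} GB")
```

At 2 bits per weight this comes out to roughly 168 GB, which lands inside the ~150-200 GB range above.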
Yes, but it won't work. However, Ollama released an update two days ago that lets you use /set think and /set nothink, which work with the new R1/Qwen3 model.
Agreed. 30B is smart.
I found it was rambling way too much to be useful for running in Roo, but then I remembered that you can turn off thinking. So to anyone else thinking of trying it out, just append /no\_think to the model's system prompt and it seems to me to be the best all rounder open source model for local coding, with a large context window and good TTFT.
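A minimal sketch of that trick, assuming your client lets you edit the system prompt as plain text. The helper name `with_no_think` and the example messages are made up for illustration; only the `/no_think` switch itself comes from the comment above.

```python
def with_no_think(system_prompt: str) -> str:
    """Append the /no_think switch unless the prompt already contains it."""
    if "/no_think" in system_prompt:
        return system_prompt
    return f"{system_prompt.rstrip()} /no_think".lstrip()


# Hypothetical chat payload in the common role/content message shape.
messages = [
    {"role": "system", "content": with_no_think("You are a coding assistant.")},
    {"role": "user", "content": "Refactor this function."},
]
print(messages[0]["content"])
```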
I'm looking forward to trying out R1-0528 or V3-0324 at some point with carefully managed system prompts/context. Not sure yet if RooCode's custom agents will be enough, or if I'll have to manually tweak Copilot when it's finally open sourced.
You seem pretty immersed and knowledgeable so I would be curious to hear what your experience is with the GGUF mentioned by danigoncalves. Would appreciate it but I understand if I/we don’t hear from you.
I did try the 8B distilled version earlier today. Not sure if it was the bartowski version, but I ran it through my usual "build tetris in a single html page" test. It had some syntax errors, so I gave it a few shots at debugging, then just deleted it when it failed.
I just tried the same thing with standard Qwen3 8B and the behaviour was the same - its first attempt was buggy, and it wasn't able to fix the bug after a few tries. Iirc Qwen2.5 7B Coder was better at this test, though it was not consistent.
The Qwen 3 series have good aesthetics and are pleasant to chat to, including the 8B model. I expect it might be decent at front end design if that's important for you. I'm really looking forward to if/when they bring out the Qwen3 Coder series
Paging u/_sqrkl
Any chance we could get a few benchmarks of the new 8B distill to see how it holds up against the Qwen instruct? The distill is trained from base Qwen, so it would be interesting to see who trained base Qwen 8B better. I remember the old R1 distills weren't actually very good in practice, and just scored well on a few benchmarks. I kinda trust your leaderboard more than these first-party results.
Yeah I was super impressed, and I'm usually quite skeptical, not really that easily bought into hype. I remember not liking any of the old R1 distills at all. Glad we were able to confirm with your tests that it wasn't just lucky output.
So I'm pretty new to this. Does reasoning make the AI actually smarter or does it just exist so the user can follow its reasoning process.
So far I always used non reasoning models because it just uses up tokens and I didn't see the point of it.
> Does reasoning make the AI actually smarter
This is still up for debate, I think. What's clear is that performance on easily verifiable tasks increases (math, code, etc). What's not clear is how / why it works. I've seen a recent paper that put semi-random stuff in the "thinking" part, and still saw improvements in the final scores, so there's probably more research to be done in this area.
I made some dynamic quants for Qwen 3 distilled here https://huggingface.co/unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF
I'm extremely surprised DeepSeek would provide smaller distilled versions - hats off to them!
Okay I just tested the UD quants against the original instruct by Qwen, and it's so much better in my initial testing so far. I'm quite surprised. The old R1 distills for the most part were pretty disappointing when I tried them; they felt worse than their official instruct counterparts. I am pleasantly surprised so far.
Deepseek can’t do that. QAT is done during pretraining, you can’t do it afterwards.
HOWEVER alibaba also released AWQ and GPTQ versions of Qwen 3, so in theory Deepseek can just slap the R1 tokenizer onto that.
But the benchmarks don't show how it rates on LiveCodeBench, and some numbers seem to be down with DeepSeek-R1-0528-Qwen3-8B. Not sure if the distill is better. This is already a thinking model.
https://preview.redd.it/19vxedau2q3f1.jpeg?width=1284&format=pjpg&auto=webp&s=c7e5f13a27ed27f18488e53a682363adba354a7a
They did mention it in the “how to run”section, maybe they will release it soon?
I tested it for the past 12 hours, and compared it to R1 from 4 months ago:
Tested **DeepSeek-R1 0528**:
* As seems to be the trend with newer iterations, **more verbose** than R1 (**+42%** token usage, 76/24 reasoning/reply split)
* Thus, despite low mTok, by pure token volume real bench cost a bit more than Sonnet 4.
* I saw **no notable improvements to reasoning** or core model logic.
* Biggest improvements seen were in **math** with no blunders across my **STEM** segment.
* Tech was samey, with better visual frontend results but disappointing C++
* Similarly to the V3 0324 update, I noticed **significant improvements in frontend** presentation.
* In the 2 matches against its former version (these take forever!) I saw **no chess improvements**, despite costing **~48% more** in inference.
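To make the token-volume point concrete, here is a small sketch of the arithmetic. The +42% and 76/24 figures come from the list above; the baseline token count and the per-million-token price are placeholder assumptions, not real measurements or real rates.

```python
def split_tokens(total_tokens: int, reasoning_share: float = 0.76) -> tuple[int, int]:
    """Split an output token count into (reasoning, reply) tokens."""
    reasoning = round(total_tokens * reasoning_share)
    return reasoning, total_tokens - reasoning


def cost_usd(total_tokens: int, usd_per_mtok: float) -> float:
    """Cost of total_tokens at a given price per million tokens."""
    return total_tokens / 1_000_000 * usd_per_mtok


old_total = 10_000                    # hypothetical baseline output size
new_total = round(old_total * 1.42)   # +42% token usage
reasoning, reply = split_tokens(new_total)

PRICE = 2.0  # placeholder USD per million output tokens, NOT a real rate
print(f"new total: {new_total} tokens ({reasoning} reasoning, {reply} reply)")
print(f"cost at placeholder price: ${cost_usd(new_total, PRICE):.4f}")
```

The takeaway is that a low per-token price only helps if the total token count stays comparable; most of the extra volume here is reasoning tokens you still pay for.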
Overall, around Claude Sonnet 4 Thinking level.
DeepSeek still has the strongest open models, and this release increases the gap to alternatives from Qwen and Meta.
To me though, in practical application, the massive token use combined/multiplied with the **very slow** inference excludes this model from my candidate list for any real usage, within my use cases. It's fine for a few queries, but waiting on exponentially slower final outputs isn't worth it, in my case (*e.g. a single chess match takes hours to conclude*).
However, that's just me and as always: **YMMV!**
Example front-end showcases improvements (**identical** prompt, identical settings, 0-shot - **NOT** part of my benchmark testing):
[CSS Demo page R1](https://dubesor.de/assets/shared/UIcompare/DeepSeek-R1.html) | [CSS Demo page 0528](https://dubesor.de/assets/shared/UIcompare/Deepseek-R1%200528%20UI.html)
[Steins;Gate Terminal R1](https://dubesor.de/assets/shared/SteinsGateWebsiteExamples/DeepSeek-R1.html) | [Steins;Gate Terminal 0528](https://dubesor.de/assets/shared/SteinsGateWebsiteExamples/Deepseek-R1%200528.html)
[Benchtable R1](https://dubesor.de/assets/shared/LLMBenchtableMockup/DeepSeek-R1%200.6%20cents.html) | [Benchtable 0528](https://dubesor.de/assets/shared/LLMBenchtableMockup/Deepseek-R1%200528%201.7%20cents.html)
[Mushroom platformer R1](https://dubesor.de/assets/shared/MushroomPlatformer/DeepSeek%20R1.html) | [Mushroom platformer 0528](https://dubesor.de/assets/shared/MushroomPlatformer/Deepseek-R1%200528.html)
[Village game R1](https://dubesor.de/assets/shared/VillageGame/DeepSeek%20R1.html) | [Village game 0528](https://dubesor.de/assets/shared/VillageGame/Deepseek-R1%200528.html)
>Just curious—do you normally use bold text like that in your writing, or did you use an LLM and it added the bold for you?
Just curious, do you normally use Em Dash like that in your writing, or did you use an LLM and it added the Em Dash for you?
^(rhetorical, it's evident from your post history)
This is probably the most LLM slop friendly place on the whole internet.
Why not simply admit an LLM writes your messages for you- you'll probably get a hundred upvotes!
Stuff like this, where the reasoning doesn't seem to have any bearing on the actual final output, makes me wonder if all that reasoning is actually doing anything. Running the 4bit 671b 0528 with lm studio on a 512gb m3 ultra.
https://preview.redd.it/h5k567mlpu3f1.png?width=1135&format=png&auto=webp&s=2c5e2685d7f81f3af0d7335316eae92ac2b0dea1
This apparently shows a comparison against o3-high, interestingly, which isn't what is available on chatGPT. So it seems to be a straight beat for R1, which is wild.
On hallucination proneness, I'm low key impressed...
Tested with openrouter.
Creative writing capability is actually very impressive - I let it output and reason through my usual prompted essay in German, and it's still not entirely grammatically correct, and hallucinates words that don't exist (as far as I know.. ;) ), but the flipside is that it's expressive, and thus very engaging to read.
A simple "write me a 1000 word essay on a cultural landmark" gave me rumored/reported interpersonal details on historic figures and tips for actual things to see in said area, that no other AI I've tested so far has even come close to including. In the end it also included at least one hallucination as a concept (not only grammar and words), but it's a forgivable one...
You know that you have something on your hands, when you look past invented words, and still want to keep reading to see what else it mentions... :)
https://pastebin.com/Fpf7wUSP
Similar results on one of the other tests I used in the past in regard to hallucination proneness:
https://pastebin.com/LGYa95ZH
It still didnt get all concepts right (not even remotely ;) ) but it is vastly better than any other models I've tested in the past.
I'm actually pretty curious, how this will show up in benchmarks...
Can anyone tell me the difference between the paid and free version of DeepSeek R1-0528 on OpenRouter, is the free one just limited or is less performant?
hey guys, i am new to Local LLM. Why should I use deepseek locally over in browser? is there any advantage besides it taking a lot of resources from my pc?
I see,
Yeah I really had no knowledge about Local LLms (still learning) when asking the question,
after digging in here and other places i sort of understand their purpose now
Because that's what we do here.
One day, all of this will be in the palm of every idiot's hand.
We are trying to get ahead of that, and know what we are going to be working with, before it's in every phone on the planet.
That's just my own take though.
Has the model on [chat.deepseek.com](http://chat.deepseek.com) really been switched to DeepSeek-R1-0528?
He insists that he is the model for DeepSeek-R1 version 1.0, released in 202405
Even when I point out the information on the model card, he says "Oh, it seems that the user misunderstood. It's important to have a tone that conveys that I take the user's questions seriously," and never acknowledges it, which makes me angry.
Huh in my testing I've seen it make the following mistakes
- think Thursday is the last day of the week
- begin its CoT by making an assumption based on 4pm being after 5pm, then correct itself
Wonder if these are related
Oh cool. What's the difference? I just tried the hf.co/bartowski/deepseek-ai_DeepSeek-R1-0528-Qwen3-8B-GGUF:Q6_K and it's spectacular!! :) Are the dynamic ones better? Or just different. This is going to be my go-to local on Ollama and Page Assist from now on.
Just in case he doesn't get around to replying: they go through and selectively quantize layers based on importance/effect. The result is typically a bit larger, but it should perform better... I don't believe anyone has benchmarks to prove it yet, though. I use their quants almost exclusively now. Make sure you get the ones that have UD in the name.
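As a toy illustration of why "selective" quants end up a bit larger, here is a sketch that gives the most important layers more bits and compares the size with a uniform quant. This is not Unsloth's actual method: the importance scores, layer names, bit widths, and the 25% cutoff are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    n_params: int
    importance: float  # hypothetical score; higher = more sensitive


def assign_bits(layers, base_bits=4, high_bits=6, keep_fraction=0.25):
    """Give the top keep_fraction of layers (by importance) high_bits."""
    n_high = max(1, round(len(layers) * keep_fraction))
    ranked = sorted(layers, key=lambda layer: layer.importance, reverse=True)
    important = {layer.name for layer in ranked[:n_high]}
    return {
        layer.name: (high_bits if layer.name in important else base_bits)
        for layer in layers
    }


def size_bytes(layers, bits_by_name):
    """Total weight storage in bytes for a per-layer bit assignment."""
    return sum(layer.n_params * bits_by_name[layer.name] / 8 for layer in layers)


# Made-up layers, purely for illustration.
layers = [
    Layer("attn", 100, 0.9),
    Layer("ffn_up", 300, 0.2),
    Layer("ffn_down", 300, 0.4),
    Layer("embed", 100, 0.1),
]
mixed = assign_bits(layers)
uniform = {layer.name: 4 for layer in layers}
print(mixed)
print(size_bytes(layers, mixed), "bytes vs", size_bytes(layers, uniform), "uniform")
```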
OK that sounds great, thanks. One small issue is I struggle with size on my very modest rig. So I'd probably have to go down a quant to support anything bigger on my 8GB VRAM. But I guess that's a user choice thing. :)
In Azure, is there any reason to use OpenAI o3 over this new DeepSeek model? I don't think it's out yet on Azure Foundry Models, but I've heard mixed things about the performance if you aren't using OpenAI models. The token cost is so much lower than o3's that it would be great to just swap this in if performance is similar.
For some reason, though, Microsoft limits the output tokens to 4k for DeepSeek models unless I'm missing something.
I used this to chat with a personality prompt, and got similar responses to OpenAI's 4o. This update is on par with 4o's creative writing skills. Well done, DeepSeek!
unsloth has some information on his versions about nothink: [https://huggingface.co/unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF](https://huggingface.co/unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF) "For NON thinking mode, we purposely enclose `<think>` and `</think>` with nothing:
`<|im_start|>user\nWhat is 2+2?<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n`"
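If you are building the raw prompt yourself, the quoted template can be sketched like this. The non-thinking branch reproduces the string quoted above; the function name and the thinking branch (simply leaving the assistant turn open) are my own assumptions, so check the model card before relying on them.

```python
def build_prompt(user_message: str, thinking: bool = True) -> str:
    """Build a single-turn prompt string in the quoted chat format.

    With thinking=False, an empty <think></think> block is pre-filled
    so the model skips straight to the answer.
    """
    prompt = (
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    if not thinking:
        prompt += "<think>\n\n</think>\n\n"
    return prompt


# repr() shows the newlines as \n, matching the quoted form.
print(repr(build_prompt("What is 2+2?", thinking=False)))
```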
At this point the only public benchmarks I care about are hallucinations, long context handling, and, to a lesser degree, instruction following. Actual engineering you can't fudge. That goes for both closed and open models.
I would rather get a 24b model with perfect 32k usage and near-zero hallucinations, even if it was worse at "AIME". That would let me offload actual work to local models.
That said, glad to see Deepseek pushing the big boys. Keep up the pressure!
I'm still doing some quants! [https://huggingface.co/unsloth/DeepSeek-R1-0528-GGUF](https://huggingface.co/unsloth/DeepSeek-R1-0528-GGUF) has a few - 2bit, 3bit and 4bit ones - more incoming!
Remember to use `-ot ".ffn_.*_exps.=CPU"` to offload MoE layers to RAM / disk - you can technically fit Q2_K_XL in < 24GB of VRAM, and the rest can be on disk or RAM!
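To get a feel for what that `-ot` pattern selects, here is a quick sketch that runs the same regex (written without markdown escapes) over a few tensor names. The names are assumed examples in the usual GGUF naming style, and using a "search anywhere in the name" match is my assumption about how the pattern is applied.

```python
import re

# The regex part of the -ot argument (everything before "=CPU").
OFFLOAD_PATTERN = re.compile(r".ffn_.*_exps.")

# Assumed example tensor names, for illustration only.
tensor_names = [
    "blk.3.attn_q.weight",
    "blk.3.ffn_gate_exps.weight",
    "blk.3.ffn_up_exps.weight",
    "blk.3.ffn_down_exps.weight",
    "blk.3.ffn_gate_shexp.weight",
]

# Tensors whose name matches would be kept on CPU (RAM / disk).
to_cpu = [name for name in tensor_names if OFFLOAD_PATTERN.search(name)]
on_gpu = [name for name in tensor_names if name not in to_cpu]

print("CPU:", to_cpu)
print("GPU:", on_gpu)
```

Under these assumptions only the routed expert tensors match, so attention and shared-expert weights stay on the GPU, which is the point of the flag.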
It is. I just used it yesterday and today in Roo and it consistently follows all the system instructions and nailed all the tool calls. I did a test on the app to see its IF and made it parrot what I say and in the middle I started trying to confuse it via compliments and/or riddles and instead of answering anything, it mirrored what I said even when its CoT showed that it's confused. It kept reminding itself of my instructions. In Roo it consistently reminds itself of its Mode and system instructions in the thoughts. And it keeps track of all the tools it has
I've been comparing it with Flash 2.5 which is my go-to in general, which also made progress in these domains and R1 consistently does better at agentic flows while Flash doesn't follow tool format well sometimes. I didn't compare it with Claude and I frankly don't want to because I don't use Claude models but I'm sure Claude will just beat it in speed. R1 is slow. But I was using only the Free version on openrouter so maybe that's why it's slow
Context window is 168k so it's also useable
Generally a great release. I didn't do complex debugging with it yet to see its intelligence but so far so good
Well meta can't just give up. But they have to change their AI leadership. And I think Yann LeCun has to go. Nothing that meta has produced in the AI space in the last few years is on par with the money that was invested.
They aren't giving up; in fact they just went through some restructuring. They'll now have 3 separate arms - Products (i.e. Meta-related bots, agents, etc), "AGI foundations" *sigh* (i.e. tech stuff, Llama, reasoning, multimodal) and Research (FAIR, independent for now). So the hope is that if this works out there won't be competing goals for Llama (i.e. best tech vs. best product).
In the end, competition in this area and more models from more sources is a good thing for us, the users.
That is actually insane. Deepseek keeps delivering. They are already at the level of OAI's best model and it's available for very cheap api prices and open weights.