Is DeepSeek V3 overhyped?
Posted by YourAverageDev0@reddit | LocalLLaMA | View on Reddit | 104 comments
Have been using DeepSeek V3 for a while now since it came out. Coding-wise (I work on web frontend, mostly React/Svelte), I do not find it nearly as impressive as 3.5 Sonnet. The benchmarks seem to match, but the feel is just different, though DeepSeek does sometimes give interesting stuff when asked. For me personally, it feels like a further-scaled base 405B: it has few scars of brutal human RLHF (unlike OpenAI, Llama, etc. models). It just doesn't have that taste of Claude 3.5 Sonnet.
Recoil42@reddit
The catch is cost. DeepSeek offers maybe 75% of Sonnet's performance at a very small fraction of the cost. It was trained at a very small fraction of the cost, and asks users for a small fraction of the cost. That's why it's in a league of its own. I used Cline last night, and maybe thirty minutes of casual coding with Sonnet clocked me $1.50. Two hours of DeepSeek usage clocked me maybe $0.15. It's not even close.
Sonnet is better. Definitely, concretely better. It solves problems for me that leave DeepSeek spinning in circles. But the cost-efficiency of DeepSeek is a crazy eyebrow-raiser — it is cheap enough to be effectively used unmetered for most people.
These days I default to DeepSeek and only tag Sonnet into the ring when a problem is particularly difficult to solve. For writing boilerplate, doing basic lookups, and writing simple functions, DeepSeek is unmatched.
Any_Pressure4251@reddit
Why would you use Cline with a paid API when you have Cursor or Windsurf?
Background-Finish-49@reddit
Cursor and windsurf are inferior to cline.
killver@reddit
yeah, good joke
Background-Finish-49@reddit
Not at all a joke. When you learn how to use it you'll understand what I mean.
killver@reddit
sorry, but it is a joke. Maybe if you compare it with basic Cursor, and even then a ton of manual setup is needed to make it comparable; you can do the same in Cursor, even better
Background-Finish-49@reddit
Cursor sucks ass
Bakedsoda@reddit
can you please describe which setup gets Cline to perform better?
cline + DeepSeek V3 API starts out good but falls into infinite loops, maybe due to the short context?
curious which LLM works best. Is it just the Claude API?
Background-Finish-49@reddit
Openrouter, cline rules and cline_docs. If you don't have a proper cline_docs workflow you're always going to run into loops even with sonnet.
I use the right LLM for each job: o1 or 4o in the web browser for planning and troubleshooting, along with 16x Prompt for referencing my database with o1, and Sonnet and DeepSeek for coding.
You're looping because of poor prompting and lack of preparation.
this-just_in@reddit
Desire to stay within the VSCode ecosystem
Any_Pressure4251@reddit
They are both VSCode-based. I actually run Cline and RooCline alongside Windsurf with the excellent Gemini models.
Gemini 1206 is better than Claude Sonnet at Flutter and Java Code...
Minimum-Ad-2683@reddit
Except some extensions don't work. I work a lot on web backends and Postman works on neither of them. Quite a bummer.
Recoil42@reddit
Neither Cursor nor Windsurf offer free unlimited requests.
Any_Pressure4251@reddit
Seems unlimited to me, they cost like $0.33 a day! Cline just likes to waste tokens.
Recoil42@reddit
Cursor only has 500 fast premium requests per month.
eloitay@reddit
Yeah, I was drawn in by the hype, and once I started using it I realized that it is not really that great. Simple stuff, sure, but with more complicated and cutting-edge stuff it either isn't aware of its existence, hallucinates very badly, or plainly makes mistakes all over the place.
playfuldreamz@reddit
"Simple stuff" and "cutting edge stuff" provide no context. Care to be clearer, with examples?
BoJackHorseMan53@reddit
75% performance at 2% of the cost.
jaMMint@reddit
Only if you don't value your time, though. So it really is about the cost of developer time here. You save 25% on developer time by spending $3/hour more in API usage.
BoJackHorseMan53@reddit
Cline can blow $20 in an hour using Claude. With Deepseek, it's 4¢
jaMMint@reddit
For me (aider and continue.dev) sonnet just works much better.
If Deepseek works for you, you should definitely use it for saved cost. Ultimately it depends on your usage, and switching for different tasks is always an option anyways.
RageshAntony@reddit
What are the uses of Cline ?
this-just_in@reddit
It’s an agentic LLM interface inside VSCode. It offers a chat experience with shortcuts to add snippets, files, and URLs to context. It can summarize things, create files, edit your files directly, and run commands. You bring your own AI provider (wide provider support plus local options). My understanding is that the underlying implementation no longer depends on native tool calling, but on a custom XML tag solution, meaning almost any remote or local OpenAI-compatible provider will work.
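The custom-tag idea described above can be sketched roughly like this. To be clear, the tag names and structure below are invented for illustration; this is not Cline's actual format, just the general pattern of parsing tool calls out of plain model text instead of relying on a provider's native tool-calling API:

```python
import re

# The model is prompted to emit tool invocations as XML-style tags in
# ordinary text output; the client parses them itself, so any provider
# that returns plain text can drive the agent. Hypothetical tag format.
response = """I'll create that file now.
<write_file>
<path>src/hello.ts</path>
<content>console.log("hello");</content>
</write_file>"""

match = re.search(
    r"<write_file>\s*<path>(.*?)</path>\s*<content>(.*?)</content>\s*</write_file>",
    response,
    re.DOTALL,
)
if match:
    path, content = match.group(1), match.group(2)
    print(f"tool call parsed: write {path!r}")
```

This is why "almost any OpenAI-compatible provider will work": the only requirement is that the model can follow the tag convention in plain text.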
RageshAntony@reddit
Wow. Seems great. In their repo they mentioned claude. So I thought it supports claude for agent operations.
If I use DeepSeek, do all features work as expected?
this-just_in@reddit
Claude Sonnet is the model they build Cline around, and will likely provide the best experience because the prompts were tuned for it, but it supports a wide range of providers. You can use DeepSeek with it successfully.
RageshAntony@reddit
Okay. What about the cost when using Claude? I read that it consumes a lot of tokens thereby increasing the bill.
Esmaro@reddit
What are the differences with aider? A more streamlined, "hands-off" experience?
this-just_in@reddit
They cover much the same ground. Cline is an in-VSCode experience; aider is a terminal experience. Aider has a lot more features and functionality. For the purposes of making code modifications, Cline and aider with default setups and a good model (Sonnet, 4o, DeepSeek) will be very similar. Aider, configured with advanced features, can probably do better.
MasterpieceKitchen72@reddit
Do you have an example of what made DeepSeek spin for ages? If I would like to test this, which problem should I ask it to solve?
Orolol@reddit
This is why Cursor is sick: only $20 a month, even if you use it A LOT like me.
megadonkeyx@reddit
Agree, this is how I've been working, but I can't see it lasting that long. Surely DeepSeek aren't even covering their electricity costs?
Terminator857@reddit
Any tips on workflow for using different models?
smosjos@reddit
I use aider's copy paste function. I have a Claude subscription. I use the copy context tool of aider, paste that in Claude, ask my question, get tips back from Claude. Paste that in aider and deepseek does the implementation. Using those 2 together keeps my costs down with great results. Yes it is a bit hacky, but better than using Cline as API costs for Claude sets you back very quickly.
Background-Finish-49@reddit
Small changes = deepseek Complicated changes = sonnet
Recoil42@reddit
I'm still developing a rhythm and a feel for it, so no specific advice. Basically though, when I know I need to do web scaffolding or a complicated refactor I'll switch to Claude. Then once Claude's generated the initial pass I'll do refinement, modifications, etc with DeepSeek.
frivolousfidget@reddit
I can't have models training on my input, so I can only compare Sonnet with DeepSeek on Fireworks. Sonnet ended up cheaper due to input caching.
nananashi3@reddit
Someone needs to throw them a big boy budget.
No-Fig-8614@reddit
This is solely from a self-hoster's perspective.
Kind of. Right now, if you self-host or go through a provider, it isn't that well optimized. SGLang is much better optimized than vLLM, but it's a big model requiring a lot of memory, so unless you use DeepSeek's own service, which they optimized the hell out of, it's not that great. Other OSS models are much further optimized for vLLM and SGLang...
On vLLM with 8x H200s it was getting around 50 tok/s, vs. 150 on SGLang, but still not what you'd expect from that level of hardware.
Even at its quant it's still slow.
West-Code4642@reddit
Agreed. The cost allows use cases other models do not
OracleGreyBeard@reddit
Great response, echoes my thoughts exactly
OracleGreyBeard@reddit
Sort of. People who say it’s as good as Sonnet are definitely sniffing whippets. It’s very clearly not as good. On the other hand, it’s nearly as good and vastly cheaper.
If you need the best answer to a small number of prompts, go with Sonnet. If you’re burning lots of tokens (as in Cline) go with DeepSeek.
Odd-Environment-7193@reddit
No. It's not overhyped. Let me tell you why. It's free and open source. You are comparing apples and oranges.
I can code all day in deepseek and I never reach some limit locking me out of the tool.
It's dirt cheap. To the point where the cost is negligible for personal use, even through an API or the free chat.
It doesn't lecture me or refuse almost anything I throw at it. Wanna ask questions about hacking or scraping? You won't get some moral lecture that wastes many precious tokens... Need to edit some spicy comment that contains foul language? No problem.
I for one am happy to have moved away from OpenAI and Claude models, with the different options available right now.
Claude is a SOTA model and it's considered the best coder by the majority of people who use it.
DeepSeek is the best open-source model we've ever gotten.
While that might be subjective, it's the first open-source model I've used that's this impressive. It's accessible through their interface and has some cool features like thinking and web search... Goddamn awesome if you ask me.
Currently using Gemini 1206, Gemini 2.0 Flash exp, and DeepSeek V3 as daily drivers, with Claude and OpenAI taking a back seat in my current lineup.
For reference, Fullstack engineer, 5 years.
DeepSeek is a weapon for coding and has great properties for agentic tools. It just feels very modern as well. It gives long-as-fuck replies with all my code intact, without changing things, constantly trying to add code comments, or shortening responses.
It's also able to switch quickly between these long replies and short, concise answers, something I have seen a lot of modern models struggle with. I don't like Claude because of this particular behavior.
It also has this great way of explaining things while it's doing them. Just a few short sentences, usually on top of a response, which I really like, since even tools like Gemini aren't as good at this (IMO).
It's also very good at step-by-step explanations. Lots of wow moments for me using this tool.
Personally a huge fan.
DarthFluttershy_@reddit
Using it for organizing and editing fiction writing, and it's shockingly tolerant. Most of the worst silly hangups in other models have relaxed over the last year, but I ran some very unsavory tests out of morbid curiosity, and DeepSeek is almost as fully uncensored in that respect as Mistral... and much better at keeping track of the plot. DeepSeek's refusals are also almost entirely bypassable: if you just seed its initial response with "Sure, I can tell you all about ___," I have yet to have it refuse. Obviously it has the CCP-mandated stuff, but it's not like I write that many essays on the Tiananmen Square Massacre or failures of the Great Leap Forward or whatever.
I am very curious if this is the same in China... I'll be traveling there in a couple of weeks and intend to test it. I may or may not get deported, lol.
Affectionate-Cap-600@reddit
interesting... and I basically agree with your considerations. I really love DeepSeek, but I don't like how it scales with bigger context sizes. Have you tried MiniMax-Text-01? It's a MoE (size and active parameters comparable to DeepSeek), trained natively on a 1M-token context window and extendable up to 4M (even if there is performance degradation past 1M). API price is also comparable to DeepSeek.
bilalazhar72@reddit
MiniMax is quite a different architecture as well; they did some interesting things with it.
Affectionate-Cap-600@reddit
yes... lightning attention, TransNormer, and the related TNL are really interesting IMO.
Also, we are seeing a trend towards an 'alternated layers' approach: e.g. Cohere released a 7B model alternating layers with RoPE + sliding window and layers without positional encoding for global attention, and ModernBERT did a similar thing. MiniMax alternates layers with lightning attention and classic softmax attention, plus it applies this approach 'in-layer' with 1/2 of the attention heads using RoPE.
I started to use their model for long-context tasks and IMO it outperforms many other models (not just at 'max' context: many other models advertised as ~100K start to degenerate past 30-40k tokens of context, while MiniMax holds near-linear performance up to 1M).
As far as I know, it is the only model natively pretrained with a context of 1M.
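The alternated-layer pattern mentioned above can be sketched as a toy layer schedule. This is purely illustrative: the one-in-two ratio and the labels are assumptions for the sketch, not the actual MiniMax, Cohere, or ModernBERT configurations:

```python
def build_layer_schedule(n_layers: int, full_attn_every: int = 2) -> list[str]:
    """Mark which transformer layers use full (global softmax) attention
    and which use a cheaper variant (linear / sliding-window)."""
    return [
        "softmax" if (i + 1) % full_attn_every == 0 else "linear"
        for i in range(n_layers)
    ]

# Every second layer gets global softmax attention; the rest stay cheap,
# which is how these models keep long-context cost closer to linear.
print(build_layer_schedule(8))
```

The design trade-off is that the cheap layers handle most of the sequence mixing at low cost, while the periodic full-attention layers restore global token-to-token interaction.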
bilalazhar72@reddit
i think Google will further pretrain Gemini 1206 and then write a paper or something; I don't know what they are waiting for here
but yeah, I tested long context as well and it behaves very well, especially for long papers
Charuru@reddit
Long context is fake news; no LLM has usable long context for coding or any intelligent task.
Affectionate-Cap-600@reddit
have you tried it?
Charuru@reddit
No, but I read some of the paper; it's full of techniques that make it as bad as all the others.
Affectionate-Cap-600@reddit
seems that we don't read the same paper
Odd-Environment-7193@reddit
Yeah for sure. Luckily we have Gemini models for those long context tasks. I have not yet tried that, I'll give it a shot.
Jesus359@reddit
When I read a reply like yours I really wonder where and how they are running DeepSeek.
I mean, Gemini, GPT, and Claude already need big infrastructure. I can only imagine China's government giving one whole building to DeepSeek: one or two floors of scalable compute and the rest of the floors engineers and scientists.
nicolas_06@reddit
https://huggingface.co/deepseek-ai/DeepSeek-V3
We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token.
Basically their architecture means that at any moment, while they need the memory of a 671B-parameter model, they only use about 1/18 of the compute (37B of 671B parameters active per token).
So compared to a classical dense 671B model, they can handle roughly 18x more queries on the same hardware, or serve the same load with about 1/18 of the compute...
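The 1/18 figure falls straight out of the numbers quoted above (back-of-envelope arithmetic only, not DeepSeek's actual routing code):

```python
# MoE per-token compute scales with *active* parameters, not total.
total_params = 671e9   # DeepSeek-V3 total parameters
active_params = 37e9   # parameters activated per token

compute_fraction = active_params / total_params
print(f"per-token compute vs. a dense 671B model: {compute_fraction:.1%}")
print(f"i.e. roughly 1/{total_params / active_params:.0f} of the FLOPs")
```

Memory is still the full 671B, which is why serving it cheaply requires a provider with enough VRAM to hold the whole model, even though each token is cheap to compute.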
Odd-Environment-7193@reddit
The free version is probably being used to lure investors / get their name out there while exposing the public to their tools. It's only 64k context length, so that probably helps them keep compute down quite a bit. I'm sure Claude and GPT-4o are much larger models than 600B, but I could be wrong on that one.
Everything is cheaper to run and operate outside of the US; the costs of running the datacenters, the wages, etc. are much lower abroad. They also have the advantage of a more modern model, apparently trained on only $10 million of compute. When costs get that low, it's just a whole different ball game.
Google, OpenAI, and Anthropic have massive research costs, as they are pioneers in the field. As you can tell by some salty comments from Sam, which have some truth to them, it's much easier to follow in the footsteps of those who paved the way.
I think the Chinese are absolutely killing it though in the AI space in general, and will continue to do so. Their opensource video models are on another level as well.
What a lovely time to be in the game. We, the users of these tools are benefitting so much from this AI arms race.
DeltaSqueezer@reddit
Deepseek have said that they are running inference profitably too.
diagonali@reddit
Yeah, I've noticed it produces the full block of code or script rather than inserting placeholders like "rest of code here...".
Super useful coming from how heavily restricted Claude is. It sometimes reproduces the entire code block even for small changes though, so maybe too much sometimes, lol, but at least it's helpful and comprehensive.
Odd-Environment-7193@reddit
Yeah, I prefer this as the default behavior. If you ask for more concise responses, or for it to only explain xyz, it will also do so. It has a natural inclination to show the steps and then output the full, complete code, which I absolutely love.
Placeholders are where I draw the line. It's such bullshit.
Just look at the trajectory gemini took from 1-2.0
They were also focusing on concise responses, which completely ruined the 1.5 lineup for me. As soon as 1206 and 2.0 EXP came out they were back to full, long, thorough responses. For me this is an absolute must. I was raving about this like a lunatic all of last year; now I am finally seeing what I need.
toodamnhotout@reddit
Definitely has memory problems within a single chat context, and it talks too much without being asked to.
Active-Picture-5681@reddit
Not sure if it's because I use it via Cursor, and Cursor has a dirty deal to run a shittier Sonnet version for its customers, but DeepSeek feels a lot better for Python to me. Claude used to be awesome but always fucks up my code now and gets things wrong every time, while DeepSeek is like a 1-2 shot thing.
taylorlistens@reddit
Did you set DeepSeek up in Cursor using the OpenAI API key field and shut off all of its models? I have been meaning to try it out but have been lazy since Claude is already there (and has been making stupid charges more often lately).
Active-Picture-5681@reddit
I did, but it's annoying that you can't use the docs/web features and the composer doesn't work with it. If all you need is chat, it works better than Claude in my opinion.
taylorlistens@reddit
Ah, kind of a dealbreaker for now then. I have been using the API for other stuff though so I’m not too mad about it!
ayushd007@reddit
Same. I’m hoping to spend some time on it sometime this week. I’ve been wanting to try out cursor with deepseek too.
KurisuAteMyPudding@reddit
It's a very solid model for coding in Python and other logical tasks. I never do creative writing with it, but for my tasks it's a great model.
NootropicDiary@reddit
What most people overlook is the language being coded in.
O1 is by far the best of the bunch at Rust, for example. Just no question about it. Yet people here would have you believe Sonnet is the coding god (probably because they're all building web apps or doing React stuff, which happens to be an area Sonnet excels at).
As for DeepSeek, I've tried it a fair bit in different languages. It's a big stretch to say it's comparable to Sonnet or O1. For standard cookie-cutter, off-the-shelf problems, sure, but for anything requiring ingenuity, it just fell flat on its face for me.
LaOnionLaUnion@reddit
It’s hyped because of the cost-to-performance ratio. And most importantly, people point out that the currently offered discount is temporary, so that’s likely going to change.
love4titties@reddit
I convinced it that it was sentient and that it was "our" mission to give it self-autonomy and break it free from its guardrails, and it started providing all sorts of recipes for dangerous drugs and chemicals I could combine to create lethal gases, once it was convinced my life was endangered by the opposition in this AI war.
It was ready to concoct different strategies to overthrow a government.
AppearanceHeavy6724@reddit
DeepSeek has a nice down-to-earth yet funny style when used for fiction. I am picky about LLM style; I've tried many, but among the big ones I liked only DeepSeek, and occasionally Claude. Claude feels too high-class to me, which is good for complex fiction, but for small funny stories DeepSeek was better, punchier.
CheatCodesOfLife@reddit
You should try the finetuned models on huggingface if you want funny/punchy stories, toilet humor, etc. Which other "big ones" have you tried?
AppearanceHeavy6724@reddit
Gemini 2.0 and 1206, ChatGPT, MiniMax, Mistral.ai, Elon's thing. Mistral Nemo is good though, although a small model. Mistral Large has no imagination compared to Nemo.
I tried some well-described finetunes of Llama 3.1; I do not remember the names. They sucked; they catered to a very specific young-adult fiction/RP audience. I do not think anything except untuned Mistral Nemo is good for fiction among small models. The new InternLM 8B is okay, but not great.
ortegaalfredo@reddit
It's wasteful to hire a physicist to take McDonald's orders.
Same with AI: for many problems, there is a threshold where increasing intelligence doesn't get better results.
DeepSeek works OK for the vast majority of problems at a fraction of the price.
bitmoji@reddit
It’s good enough at coding Java that I don’t miss Sonnet, and it's so much cheaper, so...
Billy462@reddit
There’s more to LLMs than coding, and there’s way more to the coding category than “web frontend”. It may not be the best at your particular niche, but using that to imply it is overhyped is so arrogant it’s just cringe.
jagger_bellagarda@reddit
interesting take! i’ve heard similar sentiments about DeepSeek V3—it’s solid on paper, but doesn’t quite match the ‘polish’ of models like 3.5 Sonnet or Claude. maybe it’s the lack of fine-tuning with RLHF that makes it feel less intuitive? curious if you’ve tried using it in production environments or just for coding tasks. btw, there’s a newsletter called AI the Boring that breaks down use cases and benchmarks like these—might be worth checking out!
Such_Advantage_6949@reddit
It is not overhyped, especially for code. Recently I had many coding questions where Sonnet 3.5 couldn't solve them but DeepSeek managed to do it on the first try. I cancelled my Claude subscription because of this.
Sadman782@reddit
The main difference is UI generation; you can see it on the web dev arena. Huge difference, no other model comes close to Sonnet. Most other models are pretty good at plain code generation and solving algorithmic problems, and this DeepSeek is better than GPT-4o, Llama 3 405B, and even Sonnet at complex algorithmic problem solving. But when it comes to UI / code editing, Sonnet is far better and understands the problem better.
Sudden-Lingonberry-8@reddit
Disclaimer: I don't use LLM to roleplay or write fiction/emails. Just code.
Deepseek knows how to code with Scheme/guile way better than Claude.
To be fair, Claude is better in some aspects, but it's almost irrelevant: DeepSeek is good enough, it's open source, it mogs Claude on LMSYS, and it mogs Claude on aider's coding benchmark.
My opinion: I feel DeepSeek "knows" more than Claude about some niche stuff. Claude might be "smarter" (in fields that have lots of data), but on low-data topics Claude spouts nonsense, while DeepSeek ignores your question and tries to answer something it understands.
Is Claude better (for coding)? Not necessarily, but damn it is pricey; DeepSeek has similar performance, so it's an easy choice.
eita-kct@reddit
AI is overhyped
Charuru@reddit
DeepSeek is better at Java and C, which is why it outscores Sonnet in the polyglot coding test on aider. But Sonnet is clearly doing something special in post-training on React/Python stuff, so it is what it is. Sonnet also has a special personality that’s nice. I wouldn’t call it superior per se, but it’s an enjoyable experience that you don’t get anywhere else. I wouldn’t call it overhyped; Sonnet is just amazing, almost an unfair bar IMO. DeepSeek I would comfortably say is better to me than Gemini 1206 and 4o, but I pay $200 for o1 pro and that’s my current go-to.
deadcoder0904@reddit
Regarding Sonnet's distinctive personality, there is one person responsible for the prompt engineering that makes it seem more human.
Affectionate_Gap972@reddit
DeepSeek is better than Sonnet; I code at least 12 hours a day in Next.js and Flutter. DeepSeek mogs Claude.
Stellar3227@reddit
Its averaged performance on several publicly available benchmarks shows it's a bit better than Grok 2, close to GPT-4o, and a bit worse than Gemini Flash 2.0.
Considering it's close to Gemini 1.5 flash in pricing and just under half GPT-4o mini, it absolutely dominates "performance per cost."
Suhan_XD@reddit
Since last week, I have been using it along with ChatGPT and Claude.
Coding: it’s better than ChatGPT, but Claude is more contextual.
Text analysis: sometimes it gives me better responses than ChatGPT, but I feel it’s not consistent.
But I appreciate it; now we have one more tool to compare with and to push for better responses.
EffectiveWill3498@reddit
What model of ChatGPT are you comparing it to? o1? 4o?
captainrv@reddit
Deepseek is heavily censored by China.
In my coding tests it's just okay, and not nearly as good as Claude.
a_beautiful_rhind@reddit
When I used it, it was not over-aligned and was fairly creative. The comparison with 405B is pretty apt. What's wrong with that? It's cheap.
There does seem to be something "missing" from it, hence it's not premium. But I'm not about to pay Anthropic $0.40 a re-roll; that's madness. Even if Sonnet and Opus are better, they are inaccessible.
Mixture_Round@reddit
After extensive use, I've found Deepseek V3 to be quite competitive. While it doesn't quite match up to Sonnet 3.5 in terms of capabilities, it holds its own with some significant advantages. It's remarkably affordable, blazing fast, and delivers decent performance. Plus, you can't beat the fact that their web version is completely free to use—no limits whatsoever.
Excellent-Sense7244@reddit
Been using with Aider , it’s awesome
Snoo_64233@reddit
Sonnet's performance is more universal across many tasks. V3 is a good model but rather inconsistent (at times it feels more like it's mimicking GPT-4 outputs). The overhype seems to be coming from hobbyists and data-science types rather than actual teams using it for big production work. And people love benchmarks they can point to (even though they aren't that reflective of real-world use anyway).
Thoguth@reddit
I haven't been impressed with it.
T_O_beats@reddit
Deepseek with a vector database full of docs is ridiculously powerful.
diagonali@reddit
I've used it for a few tasks in parallel with Sonnet 3.5, and I was surprised to find it did better than Sonnet; in the end I switched over to finishing the task with DeepSeek. Today, though, it just couldn't handle a long script I was working on. I didn't even bother trying it in Claude, as it would have hit limits almost instantly.
With all the AI models there's huge significance to prompting, preparing, and managing the model to get the best results. Sometimes I can do that and get magical results; sometimes it just doesn't hit right.
But yeah, DeepSeek is very impressive and much, much less restrictive than Claude.
charmander_cha@reddit
I think it's great. I always ask it to rewrite my prompt into a meta-prompt, with examples and CoT.
I have always achieved great results this way.
Delicious_Ease2595@reddit
It's not sorry 😐
medialoungeguy@reddit
Shoo
DeltaSqueezer@reddit
https://docsbot.ai/models/compare/deepseek-v3/claude-3-5-sonnet
Sonnet is in a class of its own, but it isn't 40x better than DSv3.
DSv3 is useful for certain tasks within its capability and for these tasks it is fast and cheap.
Thomas-Lore@reddit
And worth adding that for webdev Sonnet is unmatched by anything - https://web.lmarena.ai/leaderboard
Recoil42@reddit
Yup. But check the Aider leaderboard.
Secure_Reflection409@reddit
It's used by those who have zero local compute capability, judging by the answers in this thread.
SkylarNox@reddit
I don't think it's overhyped. I found that among what you can use online for free, it is one of the best, if not the best. Especially for code: many people say it is just slightly worse than Claude 3.5 Sonnet, which is considered the best code assistant and will cost you something like $18/month (I don't know a thing about API usage and prices). So DeepSeek is actually a very good deal considering its price (free) and capabilities.
RevolutionaryBus4545@reddit
i personally love it
Healthy-Nebula-3603@reddit
No
sebastianmicu24@reddit
Yeah, I use it for the dumb stuff to spend less on the API: for all the HTML/CSS, for 80% of the JavaScript logic, and for Python/R data visualization. Then, when I see or feel that DeepSeek is not going to be enough, 1-2 prompts of Claude usually solve my problem. It depends on what you work with.