anyone actually tried deepseek v4 pro for coding?
Posted by Plenty_Extent_9047@reddit | LocalLLaMA | 32 comments
so v4 pro dropped and barely anyone is talking about it. feels weird since when kimi k2.6 came out i saw posts about it everywhere
anyone here daily driving v4 pro for actual code work? hows it compare to k2.6 or glm 5.1 in real use?
VeterinarianOk3948@reddit
I used it all day yesterday. It works great! It’s no Opus or GPT 5.5, but it does a great job.
FutureSailor1994@reddit
I don’t know what’s going on with the benchmarks, but Deepseek V4 Pro has honestly blown me away compared to the other Chinese models like GLM. For many topics, it’s even more helpful than ChatGPT or Claude.
I saw some places Kimi K2.6 has a higher benchmark than Deepseek, but this did not align with my real world experience. Kimi was pretty good for offloading light work for cheap tokens, but for harder tasks — Deepseek consistently made more sense when I compared the responses and logic. lol, I wanted to try out the competition, so I’ve been playing around with a few Chinese models lately and this one has impressed me the most.
US is still in the lead, but it’s tight next year imo.
GCoderDCoder@reddit
Ok this is interesting to hear. I saw an xcreates video today where flash seemed to be performing well. I get that benchmarks aren't everything, but my experiences tend to align with Artificial Analysis, which as I understand it is a 3rd party. So compared to the options, especially comparing token usage, I lost interest in deepseek v4. I'm willing to accept that benchmarks may not tell the whole story if people are having good experiences.
2Norn@reddit
i use v4 pro and flash via openrouter as worker subagents. they don't design, plan, discuss or research, they don't do anything other than implementation. you get gpt 5.4 level performance for almost 5.4 nano pricing, in fact it's even cheaper than nano. flash is even crazier. i don't think there is any model out there which can compete with this in cost effectiveness. it's by far the cheapest sota model. like by far. the reason i don't fully use them is because i still prefer claude/gpt models for design etc, not that i've tested chinese ones yet on that front. i probably should.
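For anyone curious what this subagent setup looks like in practice, here is a minimal sketch: an orchestrator model plans, and implementation-only specs are sent to a cheap worker through OpenRouter's OpenAI-compatible chat completions endpoint. The model slug and the system prompt wording are illustrative assumptions, not confirmed OpenRouter identifiers.

```python
# Sketch of the "worker subagent" pattern: the planner (Claude/GPT) hands a
# finished spec to a cheap DeepSeek worker that is confined, via the system
# prompt, to implementation only. Model slug below is a hypothetical placeholder.
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

# System prompt that restricts the subagent to implementation work.
WORKER_SYSTEM_PROMPT = (
    "You are an implementation-only worker. Do not design, plan, discuss, "
    "or research. Implement exactly the spec you are given and return code."
)

def build_worker_request(spec: str, model: str = "deepseek/deepseek-v4-flash") -> dict:
    """Build an OpenAI-style chat completion payload for a worker subagent."""
    return {
        "model": model,  # placeholder slug; check OpenRouter's model list
        "messages": [
            {"role": "system", "content": WORKER_SYSTEM_PROMPT},
            {"role": "user", "content": spec},
        ],
    }

def dispatch(payload: dict, api_key: str) -> urllib.request.Request:
    """Wrap the payload in an HTTP request object (not actually sent here)."""
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

The point of the split is that the expensive model only pays planning tokens, while the bulk of the generated code comes from the cheap worker.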
BriefImplement9843@reddit
pro is not cheap at all. it's under a temporary 75% price cut until the 5th.
2Norn@reddit
well it's cheap right now.
Gregory-Wolf@reddit
what about others saying that DS4 are too "thinking"?
666666thats6sixes@reddit
Not when used properly. If used in a harness with a real system prompt and tools, it will be straight to the point and efficient. If literally all input it gets is a "hi", then yes, it will spend a lot of time covering all possibilities before converging on an answer. Same as Qwen.
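To make "a harness with a real system prompt and tools" concrete, here is a rough sketch of the kind of request an agent harness sends instead of a bare "hi": a focused system message plus at least one tool schema in OpenAI-style tool-calling format. The model name and the `read_file` tool are made-up examples, not actual opencode or Claude Code internals.

```python
# Minimal sketch of a harness-style request. With context and tools supplied
# up front, the model has less reason to enumerate possibilities in its
# reasoning before acting. All names here are illustrative placeholders.

def harness_request(task: str) -> dict:
    """Build a chat request that gives the model a role and tools up front."""
    return {
        "model": "deepseek-chat",  # placeholder model name
        "messages": [
            {
                "role": "system",
                "content": (
                    "You are a coding agent inside an editor. Be concise: "
                    "act via tools instead of speculating in prose."
                ),
            },
            {"role": "user", "content": task},
        ],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "read_file",
                    "description": "Read a file from the workspace.",
                    "parameters": {
                        "type": "object",
                        "properties": {"path": {"type": "string"}},
                        "required": ["path"],
                    },
                },
            }
        ],
    }
```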
_IAlwaysLie@reddit
you can reduce qwen think with sys prompt?
look@reddit
V4 Pro has some nice positives but also some serious negatives. The primary issue for most, I imagine, is that it's relatively expensive compared to other options (a combination of its per token price and reasoning token use that is off the charts).
The high verbosity also makes it slower than others as well as costing more to actually use.
Additionally, it has an extremely high hallucination rate. It knows a lot of things, but when it doesn’t know the answer, it makes something up. The rate is even worse than Gemini 3 Flash, I believe.
However, the V4 Flash model is interesting, and people are talking about it. It has almost all the same positives and negatives as Pro, but it’s a sixth the price. That makes it very useful anywhere hallucination isn’t a dealbreaker.
Final-Rush759@reddit
They are not expensive, running 75% discount now.
BriefImplement9843@reddit
until the 5th. it's expensive.
Odd-Environment-7193@reddit
This is true. Pretty crazy. I would say almost cheaper than v3, going off my last batch. Or a bit more, but super cheap nonetheless.
LittleYouth4954@reddit
I have been testing it extensively over the last 2 days and for my use cases (scientific coding) it is performing superbly. It is consistently finding bugs, flaws and gaps not detected by glm 5.1 and kimi k2.6.
breadfruitcore@reddit
Out of curiosity what domain of scientific coding are you working on? I'm implementing numerical algorithms too and would love a Claude replacement.
LittleYouth4954@reddit
Bayesian ecological modeling
breadfruitcore@reddit
Very cool!
CalligrapherFar7833@reddit
How are you running it locally ?
snmnky9490@reddit
I would assume they are not running it locally unless specifically mentioned
FyreKZ@reddit
They're not.
Diligent-Builder7762@reddit
I tested it a bit. It was a bit expensive but a joy to see it roll.
guiopen@reddit
Using pro for planning and flash for building in opencode, the thing that surprises me the most is how it keeps coherence in long context. They are also very obedient, in the sense that if you tell it to forget path x and focus on path y, it will do that; it will not fight you insisting path x is the correct one.
It does incredibly well at code review and catching bugs, and its output is easy to read, in contrast with other models like gpt that respond only in bullet points.
Overall, incredible models; for coding they have replaced every other model for me.
Lissanro@reddit
V4 is not supported in llama.cpp yet, so I did not yet get to try it on my rig.
As for Kimi-K2.6 and GLM-5.1, GLM-5.1 seems better at solving complicated tasks, like resolving complex git rebase conflicts, where K2.6 can get stuck. But K2.6 is faster and overall still smart enough for most tasks, so I will probably use it most often.
flobernd@reddit
Interesting. Most people seem to have the exact opposite experience with GLM51 vs. K26 (but they are pretty comparable; I personally like that K26 has vision capabilities).
DS4P seems to be the weakest of these three if it comes to pure coding / agentic tasks. It however shines when it comes to world knowledge.
Plenty_Extent_9047@reddit (OP)
i wish i could run this locally xd, yeah so far i've been using glm 5.1 almost exclusively
nmfisher@reddit
I tried v4 Pro over the weekend (via DeepSeek API & Claude Code) and it was flawless for me. These weren’t the hardest tasks but they did require knowledge of the entire codebase, and its recommendations & implementations were on par with Opus. If the price was right, I’d switch to this as a daily driver in an instant.
Flash does seem to require a bit more hand-holding, though that’s expected.
Dizzy_Humor4220@reddit
Using the deepseek api and it’s performing very well for me (until they get flooded with users I guess). Even flash seems to follow instructions very well for very large contexts. Flash already seems smart enough but I’ll use Pro for complex planning and then switch for execution. Pro is too slow for me for simple execution
008bits@reddit
I tried it via the API and OpenCode. Honestly, I was very surprised by the quality of its code and its design. I'm working on a large codebase, and it follows the defined rules very well. The flash version is also very good. A sort of light sonnet. And the price is amazing. A morning's work with Pro cost me less than $0.30.
AnomalyNexus@reddit
Yeah, on api, though I reckon it's better suited for reviewing code than necessarily writing it imo.
It writes these huge walls of text about what it's doing. Like just solid blocks of essays. Other models are more a sentence or two, then a tool call. The result seems good, but I think using it as a main coding model would irritate in the long run.
-dysangel-@reddit
I've been trying to run flash locally on mlx, but the implementation is still squirrelly. And to run V4 Pro I'd need a Q2 quant, which I haven't seen available for download yet. I'd consider quanting myself if the models weren't "preview" versions.
Evening_Ad6637@reddit
I think Deepseek is increasingly becoming one of the most important labs, if not the most important, for foundational research (in ML/DL) in the OSS world, while labs like Moonshot and Z-AI are more specialized in training these foundation models.
Ofc another one of the most important labs is Qwen. But qwen builds smaller models, and my impression is that they are still capable of handling both jobs.
I think we will see more about Deepseek v4 when other labs build their models on top of it.
qwen_next_gguf_when@reddit
Can't run it locally or else I would be all over it.