anyone actually tried deepseek v4 pro for coding?
Posted by Plenty_Extent_9047@reddit | LocalLLaMA | View on Reddit | 60 comments
so v4 pro dropped and barely anyone is talking about it. feels weird since when kimi k2.6 came out i seen post about it everywhere
anyone here daily driving v4 pro for actual code work? hows it compare to k2.6 or glm 5.1 in real use?
nmfisher@reddit
I tried v4 Pro over the weekend (via DeepSeek API & Claude Code) and it was flawless for me. These weren’t the hardest tasks but they did require knowledge of the entire codebase, and its recommendations & implementations were on par with Opus. If the price was right, I’d switch to this as a daily driver in an instant.
Flash does seem to require a bit more hand-holding, though that’s expected.
argumentnull@reddit
Is it expensive than Opus? Why do you think the price isn't right?
nmfisher@reddit
When I made that comment, V4 Pro was still being charged at full price so it wasn't clear if it was cheaper than Opus (via a $100 Max subscription). They've since discounted the price by 75% which makes it much more attractive.
EntertainmentCold604@reddit
Been using Deepseek v4 Pro and local Qwen3.6-35B-A3B for code audits on a number of local projects. Local Qwen caught bugs Deepseek missed and vice-versa but working together I feel I have very good coverage and burned les than $1 in about 4 hours usage running them both across 6 different interlinked code bases.
Impressive
look@reddit
V4 Pro has some nice positives but also some serious negatives. Primary issue for most, I imagine, is it’s relatively expensive compared to others options (a combination of both per token price and its reasoning token use is off the charts).
The high verbosity also makes it slower than others as well as costing more to actually use.
Additionally, it has an extremely high hallucination rate. It knows a lot of things, but when it doesn’t know the answer, it makes something up. The rate is even worse than Gemini 3 Flash, I believe.
However, the V4 Flash model is interesting, and people are talking about it. It has almost all the same positives and negatives as Pro, but it’s a sixth the price. That makes it very useful anywhere hallucination isn’t a dealbreaker.
Final-Rush759@reddit
They are not expensive, running 75% discount now.
BriefImplement9843@reddit
until the 5th. it's expensive.
Bitter-Magazine2081@reddit
The discount has been changed to be permanent.
Active-Play7630@reddit
They also slashed input cache hits to 1/10th the cost and that's permanent as far as I understand it.
Odd-Environment-7193@reddit
Thisa is true. Pretty crazy. I would say almost cheaper than v3 going off my last batch. Or a bit more but super cheap nonetheless.
happyilyrednow@reddit
How good is it at Rust, I wonder. hmmm.
guiopen@reddit
Using pro for planning and flash for building in opencode, the thing that surprises me the most is how it keeps coherence in long context, they are also very obedient in the sense that, if you tell it to forget path x and focus on path y, it will do that, it will not fight you saying path x is the correct
It does incredibly well at code review and catching bugs, and it's output is easy to read in contrad twitch other models like gpt that respond only in bullet points
Overall, incredible models, for coding they replaced every other models for me.
bleakj@reddit
How long do you find it takes pro to do the actual planning / how large is the project?
I'm just trying Deepseek v4 pro for the first time this morning and while /init was reasonably quick (Even if it was like 20k tokens?) - but then asking it to create a plan to fix what I thought should be a fairly simple bug that I told it what file / lines it was to look at, and it's been .. 45 minutes or more now? (It's not a huge project)
2Norn@reddit
i use v4 pro and flash via openrouter as worker subagents. so they dont design, plan, discuss or research don't do anything other than implementations. u get gpt 5.4 level performance almost for 5.4 nano pricing, in fact its even cheaper than nano. flash is even crazier. i dont think there is any model out there which can compete with this in cost effectiveness. its by far the cheapest sota model. like by far. the reason I dont fully use them is becuz i still prefer claude/gpt models for design etc, not that i tested chinese ones yet on that front. i probably should.
BriefImplement9843@reddit
pro is not cheap at all. it's under a temporary 75% price cut until the 5th.
AKCanon_@reddit
Is there a source to read regarding the temporary 75% price cut until the 5th? Also, they did mention for the v4:
"Due to constraints in high-end compute capacity, the current service capacity for Pro is very limited. After the 950 supernodes are launched at scale in the second half of this year, the price of Pro is expected to be reduced significantly."
But for now, it's more then enough for any task, and enjoyable for coding mostly.
seunosewa@reddit
the 75% discount is until the 31st
2Norn@reddit
well its cheap right now.
Gregory-Wolf@reddit
what about others saying that DS4 are too "thinking"?
666666thats6sixes@reddit
Not when used properly. If used in a harness with a real system prompt and tools, it will be straight to the point and efficient. If literally all input it gets is a "hi", then yes, it will spend a lot of time covering all possibilities before converging on an answer. Same as Qwen.
_IAlwaysLie@reddit
you can reduce qwen think with sys prompt?
666666thats6sixes@reddit
Yes, the amount of thinking is directly proportional to how constrained the solution space already is. If you're 50k tokens into an agentic task, the thinking is just a few sentences per turn. If you're at the beginning and don't have a long system prompt that constrains the possible ways the conversation could take, the model will think a lot longer. But after that long initial thought, it will continue with short thinking (assuming you don't strip them from messages).
_IAlwaysLie@reddit
very interesting, thank you, that makes sense. I was kind of avoiding qwen because of how long it thought when I typed "test" or "hello"
666666thats6sixes@reddit
That's it, a hello with no system prompt (or just a generic "you are a helpful assistant") will result in a wall of text, but if you do the same with a system prompt that establishes the model's role and goals, e.g. a tech support for a company with an outline of typical tasks and desired outcomes, it will think for just a little moment before greeting you professionally and asking what you need help with.
FutureSailor1994@reddit
I don’t know what’s going on with the benchmarks, but Deepseek V4 Pro has honestly blown me away compared to the other Chinese models like GLM. For many topics, it’s even more helpful than ChatGPT or Claude.
I saw some places Kimi K2.6 has a higher benchmark than Deepseek, but this did not align with my real world experience. Kimi was pretty good for offloading light work for cheap tokens, but for harder tasks — Deepseek consistently made more sense when I compared the responses and logic. lol, I wanted to try out the competition, so I’ve been playing around with a few Chinese models lately and this one has impressed me the most.
US is still in the lead, but it’s tight next year imo.
nyonor@reddit
Almost all heavy refactoring tasks in a big code base assigned to deepseek4pro has significant amount of flaws which I check by gpt5.5 or opus 4.7... So IMO it's overthinking and loosing some details...
GCoderDCoder@reddit
Ok this is interesting to hear. I saw an xcreates video today where flash was performing well it seemed. I get benchmarks arent everything but my experiences do tend to align with artificial analysis which my understanding is a 3rd party so compared to the options especially comparing token usage I lost interest in deepseek v4. Im willing to accept benchmarks may not tell the whole story of people are having good experiences.
Evening_Ad6637@reddit
I think Deepseek is increasingly becoming one of the most or the most important lab for foundational research (in ML/DL) in the OSS world, while labs like Moonshot and Z-AI have are more specialized in training these foundation models.
Ofc another one of the most important labs is Qwen. But qwen builds smaller models and my impression is that they still are capable of handling both jobs.
I think we will see more about Deepseek v4 when other labs build their models on top of it.
The-Singular@reddit
Qwen also builds models as big as GLM 4.7-5 (roughly), but it stopped open-sourcing their bigger models with qwen3.6's release.
SeyAssociation38@reddit
They have always done that, Never open sourcing their largest model
The-Singular@reddit
The plus variant was just the 1M context window plus AliBaba provided tools available, until 3.6.
Quote from Qwen 3.5 announcement:
>Qwen3.5-Plus is the hosted model available via Alibaba Cloud Model Studio, featuring:
> *a 1M context window by default
> *official built-in tools and adaptive tool use
LeTanLoc98@reddit
Almost no one uses it
tombdweller@reddit
The providers on openrouter don't give the 75% discount they have on their platform, anyone using it is wasting money
LeTanLoc98@reddit
Please show ignored
tombdweller@reddit
ah, I wasn't aware of that... thanks! I can now use my credits from OR too lol
Longjumping_Elk6089@reddit
Am testing it now and as others have mentioned, seems pretty solid all around.
Reasonable-Climate66@reddit
too long to explain, worst model for coding. - period
codegolf-guru@reddit
i've been running some tests on v4 pro for a refactor project over the last 2 days. I also felt it being a bit quiet, but i think people are just still figuring out the new reasoning passes. And still, compared to kimi k2.6, v4 pro feels a lot more stable on long-horizon planning tbh. The direct deepseek api tho has been hitting some speed inconcistency in peak-hours. One of the first who were providing APIs for other models were DeepInfra so im sticking to them.
It also depends on the reasoning modes you choose for each, because if I use think_max mode, i catch edge cases in concurrent code, more than when using glm 5.1 for that
in general Its nice that you ask for a real use because what we see in benchmarks sometimes is not exactly the real case scenario for some of the cases
julianfromstagewise@reddit
just started using it to create launch videos with code (remotion) and i'm absolutely impressed
i probably wouldn't have noticed the difference between sonnet 4.6 and deepseek if i hadn't seen the label on the model switch
Flaky_Pay_2367@reddit
I've tried with OpenCode + Fireworks
damn it's much slower than kimi k2.6 and kimi k2.5 turbo
but the reasoning is better (solves problem Gemini-Side-By-Side Chat that kimi and glm couldn't)
and it doesn't "forever loop" like MiniMax 2.7
tortangtalong88@reddit
Im using deepseek v4 flash for my bot agents works really well!
Tate-s-ExitLiquidity@reddit
I wonder how it performs compared to Kimi 2.6
Django_McFly@reddit
The discounted price is good and it'll be my daily driver until it ends. When it ends though, I'll probably just go back to the the usual suspects for big OSS like Kimi, MiniMax, GLM, etc.
It feels better for my use case, which is pretty casual coding, computer use/setup and making interfaces for things I use, but not like 3X-5X the price of MiniMax better. If it was only like 20-30 % more, I'd pivot. For what I do, if Kimi or MiniMax can't do it, handing it to DeepSeek probably won't help much either and I need like an Opus 4.6 level model to actually stop spending hours and burning through tokens.
I haven't tried flash yet though. If it feels pretty much as good as the other stuff I mentioned for more casual use, I'd switch to it as my daily one.
VeterinarianOk3948@reddit
I used it all day yesterday. It works great! It’s no Opus or GPT 5.5, but it does a great job.
LittleYouth4954@reddit
I am testing it extensively in the last 2 days and for my use cases (scientific coding) it is performing superbly. It is consistently finding bugs, flaws and gaps not detected by glm 5.1 and kimi k2.6.
breadfruitcore@reddit
Out of curiosity what domain of scientific coding are you working on? I'm implementing numerical algorithms too and would love a Claude replacement.
LittleYouth4954@reddit
Bayesian ecological modeling
breadfruitcore@reddit
Very cool!
CalligrapherFar7833@reddit
How are you running it locally ?
snmnky9490@reddit
I would assume they are not running it locally unless specifically mentioned
FyreKZ@reddit
They're not.
Diligent-Builder7762@reddit
I tested it a bit. It was a bit expensive but a joy to see it roll.
Lissanro@reddit
V4 is not supported in llama.cpp yet, so I did not yet get to try it on my rig.
As of Kimi-K2.6 and GLM-5.1, GLM-5.1 seems to solve better complicated tasks, like resolving complex git rebase conflicts, while K2.6 can get stuck. But K2.6 is faster and overall still smart enough for most tasks, so I probably will use it most often.
flobernd@reddit
Interesting. Most people seem to have the exact opposite experience with GLM51 vs. K26 (but they are pretty comparable; I personally like that K26 has vision capabilities).
DS4P seems to be the weakest of these three if it comes to pure coding / agentic tasks. It however shines when it comes to world knowledge.
Plenty_Extent_9047@reddit (OP)
i wish i could run this locally xd, yeah so far i been using glm 5.1 mostly exclusively
Dizzy_Humor4220@reddit
Using the deepseek api and it’s performing very well for me (until they get flooded with users I guess). Even flash seems to follow instructions very well for very large contexts. Flash already seems smart enough but I’ll use Pro for complex planning and then switch for execution. Pro is too slow for me for simple execution
008bits@reddit
I tried it via the API and OpenCode. Honestly, I was very surprised by the quality of its code and its design. I'm working on a large codebase. It follows the defined rules very well.The flash version is also very good. A sort of light sonnet. And the price is amazing. A morning's work with professionals cost me less than $0.30.
AnomalyNexus@reddit
Yeah on api though reckon it’s better suited for reviewing code than writing necessarily imo.
It writes these huge walls of text about what it’s doing. Like just solid blocks of just essays. Other models are more sentence or two then a tool call. The result seems good but think using it as main coding model would irritate in long run.
-dysangel-@reddit
I've been trying to run flash locally on mlx, but the implementation is still squirrelly. And to run V4 Pro I'd need a Q2 quant, which I haven't seen available for download yet. I'd consider quanting myself if the models weren't "preview" versions.
qwen_next_gguf_when@reddit
Can't run it locally or else I would be all over it.