anyone actually tried deepseek v4 pro for coding?

[-]

nmfisher@reddit

I tried v4 Pro over the weekend (via DeepSeek API & Claude Code) and it was flawless for me. These weren’t the hardest tasks but they did require knowledge of the entire codebase, and its recommendations & implementations were on par with Opus. If the price was right, I’d switch to this as a daily driver in an instant.

Flash does seem to require a bit more hand-holding, though that’s expected.

[-]

argumentnull@reddit

Is it expensive than Opus? Why do you think the price isn't right?

[-]

nmfisher@reddit

When I made that comment, V4 Pro was still being charged at full price so it wasn't clear if it was cheaper than Opus (via a $100 Max subscription). They've since discounted the price by 75% which makes it much more attractive.

[-]

EntertainmentCold604@reddit

Been using Deepseek v4 Pro and local Qwen3.6-35B-A3B for code audits on a number of local projects. Local Qwen caught bugs Deepseek missed and vice-versa but working together I feel I have very good coverage and burned les than $1 in about 4 hours usage running them both across 6 different interlinked code bases.

Impressive

[-]

look@reddit

V4 Pro has some nice positives but also some serious negatives. Primary issue for most, I imagine, is it’s relatively expensive compared to others options (a combination of both per token price and its reasoning token use is off the charts).

The high verbosity also makes it slower than others as well as costing more to actually use.

Additionally, it has an extremely high hallucination rate. It knows a lot of things, but when it doesn’t know the answer, it makes something up. The rate is even worse than Gemini 3 Flash, I believe.

However, the V4 Flash model is interesting, and people are talking about it. It has almost all the same positives and negatives as Pro, but it’s a sixth the price. That makes it very useful anywhere hallucination isn’t a dealbreaker.

[-]

Final-Rush759@reddit

They are not expensive, running 75% discount now.

[-]

BriefImplement9843@reddit

until the 5th. it's expensive.

[-]

Bitter-Magazine2081@reddit

The discount has been changed to be permanent.

[-]

Active-Play7630@reddit

They also slashed input cache hits to 1/10th the cost and that's permanent as far as I understand it.

[-]

Odd-Environment-7193@reddit

Thisa is true. Pretty crazy. I would say almost cheaper than v3 going off my last batch. Or a bit more but super cheap nonetheless.

[-]

happyilyrednow@reddit

How good is it at Rust, I wonder. hmmm.

[-]

guiopen@reddit

Using pro for planning and flash for building in opencode, the thing that surprises me the most is how it keeps coherence in long context, they are also very obedient in the sense that, if you tell it to forget path x and focus on path y, it will do that, it will not fight you saying path x is the correct

It does incredibly well at code review and catching bugs, and it's output is easy to read in contrad twitch other models like gpt that respond only in bullet points

Overall, incredible models, for coding they replaced every other models for me.

[-]

bleakj@reddit

How long do you find it takes pro to do the actual planning / how large is the project?

I'm just trying Deepseek v4 pro for the first time this morning and while /init was reasonably quick (Even if it was like 20k tokens?) - but then asking it to create a plan to fix what I thought should be a fairly simple bug that I told it what file / lines it was to look at, and it's been .. 45 minutes or more now? (It's not a huge project)

[-]

2Norn@reddit

i use v4 pro and flash via openrouter as worker subagents. so they dont design, plan, discuss or research don't do anything other than implementations. u get gpt 5.4 level performance almost for 5.4 nano pricing, in fact its even cheaper than nano. flash is even crazier. i dont think there is any model out there which can compete with this in cost effectiveness. its by far the cheapest sota model. like by far. the reason I dont fully use them is becuz i still prefer claude/gpt models for design etc, not that i tested chinese ones yet on that front. i probably should.

[-]

BriefImplement9843@reddit

pro is not cheap at all. it's under a temporary 75% price cut until the 5th.

[-]

AKCanon_@reddit

Is there a source to read regarding the temporary 75% price cut until the 5th? Also, they did mention for the v4:

"Due to constraints in high-end compute capacity, the current service capacity for Pro is very limited. After the 950 supernodes are launched at scale in the second half of this year, the price of Pro is expected to be reduced significantly."

But for now, it's more then enough for any task, and enjoyable for coding mostly.

[-]

seunosewa@reddit

the 75% discount is until the 31st

[-]

2Norn@reddit

well its cheap right now.

[-]

Gregory-Wolf@reddit

what about others saying that DS4 are too "thinking"?

[-]

666666thats6sixes@reddit

Not when used properly. If used in a harness with a real system prompt and tools, it will be straight to the point and efficient. If literally all input it gets is a "hi", then yes, it will spend a lot of time covering all possibilities before converging on an answer. Same as Qwen.

[-]

_IAlwaysLie@reddit

you can reduce qwen think with sys prompt?

[-]

666666thats6sixes@reddit

Yes, the amount of thinking is directly proportional to how constrained the solution space already is. If you're 50k tokens into an agentic task, the thinking is just a few sentences per turn. If you're at the beginning and don't have a long system prompt that constrains the possible ways the conversation could take, the model will think a lot longer. But after that long initial thought, it will continue with short thinking (assuming you don't strip them from messages).

[-]

_IAlwaysLie@reddit

very interesting, thank you, that makes sense. I was kind of avoiding qwen because of how long it thought when I typed "test" or "hello"

[-]

666666thats6sixes@reddit

That's it, a hello with no system prompt (or just a generic "you are a helpful assistant") will result in a wall of text, but if you do the same with a system prompt that establishes the model's role and goals, e.g. a tech support for a company with an outline of typical tasks and desired outcomes, it will think for just a little moment before greeting you professionally and asking what you need help with.

[-]

FutureSailor1994@reddit

I don’t know what’s going on with the benchmarks, but Deepseek V4 Pro has honestly blown me away compared to the other Chinese models like GLM. For many topics, it’s even more helpful than ChatGPT or Claude.

I saw some places Kimi K2.6 has a higher benchmark than Deepseek, but this did not align with my real world experience. Kimi was pretty good for offloading light work for cheap tokens, but for harder tasks — Deepseek consistently made more sense when I compared the responses and logic. lol, I wanted to try out the competition, so I’ve been playing around with a few Chinese models lately and this one has impressed me the most.

US is still in the lead, but it’s tight next year imo.

[-]

nyonor@reddit

Almost all heavy refactoring tasks in a big code base assigned to deepseek4pro has significant amount of flaws which I check by gpt5.5 or opus 4.7... So IMO it's overthinking and loosing some details...

[-]

GCoderDCoder@reddit

Ok this is interesting to hear. I saw an xcreates video today where flash was performing well it seemed. I get benchmarks arent everything but my experiences do tend to align with artificial analysis which my understanding is a 3rd party so compared to the options especially comparing token usage I lost interest in deepseek v4. Im willing to accept benchmarks may not tell the whole story of people are having good experiences.

[-]

Evening_Ad6637@reddit

I think Deepseek is increasingly becoming one of the most or the most important lab for foundational research (in ML/DL) in the OSS world, while labs like Moonshot and Z-AI have are more specialized in training these foundation models.

Ofc another one of the most important labs is Qwen. But qwen builds smaller models and my impression is that they still are capable of handling both jobs.

I think we will see more about Deepseek v4 when other labs build their models on top of it.

[-]

The-Singular@reddit

Qwen also builds models as big as GLM 4.7-5 (roughly), but it stopped open-sourcing their bigger models with qwen3.6's release.

[-]

SeyAssociation38@reddit

They have always done that, Never open sourcing their largest model

[-]

The-Singular@reddit

The plus variant was just the 1M context window plus AliBaba provided tools available, until 3.6.

Quote from Qwen 3.5 announcement:

>Qwen3.5-Plus is the hosted model available via Alibaba Cloud Model Studio, featuring:

> *a 1M context window by default

> *official built-in tools and adaptive tool use

[-]

LeTanLoc98@reddit

Almost no one uses it

[-]

tombdweller@reddit

The providers on openrouter don't give the 75% discount they have on their platform, anyone using it is wasting money

[-]

LeTanLoc98@reddit

Please show ignored

[-]

tombdweller@reddit

ah, I wasn't aware of that... thanks! I can now use my credits from OR too lol

[-]

Longjumping_Elk6089@reddit

Am testing it now and as others have mentioned, seems pretty solid all around.

[-]

Reasonable-Climate66@reddit

too long to explain, worst model for coding. - period

[-]

codegolf-guru@reddit

i've been running some tests on v4 pro for a refactor project over the last 2 days. I also felt it being a bit quiet, but i think people are just still figuring out the new reasoning passes. And still, compared to kimi k2.6, v4 pro feels a lot more stable on long-horizon planning tbh. The direct deepseek api tho has been hitting some speed inconcistency in peak-hours. One of the first who were providing APIs for other models were DeepInfra so im sticking to them.

It also depends on the reasoning modes you choose for each, because if I use think_max mode, i catch edge cases in concurrent code, more than when using glm 5.1 for that

in general Its nice that you ask for a real use because what we see in benchmarks sometimes is not exactly the real case scenario for some of the cases

[-]

julianfromstagewise@reddit

just started using it to create launch videos with code (remotion) and i'm absolutely impressed

i probably wouldn't have noticed the difference between sonnet 4.6 and deepseek if i hadn't seen the label on the model switch

[-]

Flaky_Pay_2367@reddit

I've tried with OpenCode + Fireworks
damn it's much slower than kimi k2.6 and kimi k2.5 turbo
but the reasoning is better (solves problem Gemini-Side-By-Side Chat that kimi and glm couldn't)
and it doesn't "forever loop" like MiniMax 2.7

[-]

tortangtalong88@reddit

Im using deepseek v4 flash for my bot agents works really well!

[-]

Tate-s-ExitLiquidity@reddit

I wonder how it performs compared to Kimi 2.6

[-]

Django_McFly@reddit

The discounted price is good and it'll be my daily driver until it ends. When it ends though, I'll probably just go back to the the usual suspects for big OSS like Kimi, MiniMax, GLM, etc.

It feels better for my use case, which is pretty casual coding, computer use/setup and making interfaces for things I use, but not like 3X-5X the price of MiniMax better. If it was only like 20-30 % more, I'd pivot. For what I do, if Kimi or MiniMax can't do it, handing it to DeepSeek probably won't help much either and I need like an Opus 4.6 level model to actually stop spending hours and burning through tokens.

I haven't tried flash yet though. If it feels pretty much as good as the other stuff I mentioned for more casual use, I'd switch to it as my daily one.

[-]

VeterinarianOk3948@reddit

I used it all day yesterday. It works great! It’s no Opus or GPT 5.5, but it does a great job.

[-]

LittleYouth4954@reddit

I am testing it extensively in the last 2 days and for my use cases (scientific coding) it is performing superbly. It is consistently finding bugs, flaws and gaps not detected by glm 5.1 and kimi k2.6.

[-]

breadfruitcore@reddit

Out of curiosity what domain of scientific coding are you working on? I'm implementing numerical algorithms too and would love a Claude replacement.

[-]

LittleYouth4954@reddit

Bayesian ecological modeling

[-]

breadfruitcore@reddit

Very cool!

[-]

CalligrapherFar7833@reddit

How are you running it locally ?

[-]

snmnky9490@reddit

I would assume they are not running it locally unless specifically mentioned

[-]

FyreKZ@reddit

They're not.

[-]

Diligent-Builder7762@reddit

I tested it a bit. It was a bit expensive but a joy to see it roll.

[-]

Lissanro@reddit

V4 is not supported in llama.cpp yet, so I did not yet get to try it on my rig.

As of Kimi-K2.6 and GLM-5.1, GLM-5.1 seems to solve better complicated tasks, like resolving complex git rebase conflicts, while K2.6 can get stuck. But K2.6 is faster and overall still smart enough for most tasks, so I probably will use it most often.

[-]

flobernd@reddit

Interesting. Most people seem to have the exact opposite experience with GLM51 vs. K26 (but they are pretty comparable; I personally like that K26 has vision capabilities).

DS4P seems to be the weakest of these three if it comes to pure coding / agentic tasks. It however shines when it comes to world knowledge.

[-]

Plenty_Extent_9047@reddit (OP)

i wish i could run this locally xd, yeah so far i been using glm 5.1 mostly exclusively

[-]

Dizzy_Humor4220@reddit

Using the deepseek api and it’s performing very well for me (until they get flooded with users I guess). Even flash seems to follow instructions very well for very large contexts. Flash already seems smart enough but I’ll use Pro for complex planning and then switch for execution. Pro is too slow for me for simple execution

[-]

008bits@reddit

I tried it via the API and OpenCode. Honestly, I was very surprised by the quality of its code and its design. I'm working on a large codebase. It follows the defined rules very well.The flash version is also very good. A sort of light sonnet. And the price is amazing. A morning's work with professionals cost me less than $0.30.

[-]

AnomalyNexus@reddit

Yeah on api though reckon it’s better suited for reviewing code than writing necessarily imo.

It writes these huge walls of text about what it’s doing. Like just solid blocks of just essays. Other models are more sentence or two then a tool call. The result seems good but think using it as main coding model would irritate in long run.

[-]

-dysangel-@reddit

I've been trying to run flash locally on mlx, but the implementation is still squirrelly. And to run V4 Pro I'd need a Q2 quant, which I haven't seen available for download yet. I'd consider quanting myself if the models weren't "preview" versions.

[-]

qwen_next_gguf_when@reddit

Can't run it locally or else I would be all over it.