What happened to Deepseek?
Posted by Mr_Moonsilver@reddit | LocalLLaMA | 131 comments
Meta had a comeback - arguably not open source, but still - but DeepSeek just seems to have vanished from the scene. What happened? Will we ever see DeepSeek V4?
OmarBessa@reddit
The rumor is that they have a 1T A37B multimodal model coming
Betadoggo_@reddit
3.2 is still quite capable even if it's not maxed out on tool calling and coding benchmarks like everyone else is. It's still the #1 non-free model on openrouter. They're just taking their time on this one.
Needausernameplzz@reddit
yeah, deepseek api handles all the work my local LLMs can't
claytonjr@reddit
Yup, even the older 3.1 is still a workhorse for me. I build fairly complex apps with PydanticAI and it doesn't break a sweat. Yeah, a new model would be cool, but they already released a lot eons ago.
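For anyone curious, here's a minimal sketch of what that kind of setup can look like. This is a hypothetical example, assuming DeepSeek's OpenAI-compatible endpoint at https://api.deepseek.com, the deepseek-chat model name, and its documented JSON mode; it uses the plain openai SDK plus a pydantic schema rather than PydanticAI itself, since the Agent API there changes between versions:

```python
# Hedged sketch: structured output from DeepSeek via its OpenAI-compatible API.
# Assumes DEEPSEEK_API_KEY is set; base URL and model name are per DeepSeek's docs.
import os

from openai import OpenAI
from pydantic import BaseModel


class Task(BaseModel):
    title: str
    priority: int  # 1 (low) .. 5 (high)


client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

resp = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": 'Reply with JSON: {"title": str, "priority": int}.'},
        {"role": "user", "content": "File the quarterly report by Friday."},
    ],
    response_format={"type": "json_object"},  # JSON mode, per DeepSeek's API docs
)

# Validate the model's JSON against the pydantic schema; raises if it doesn't conform.
task = Task.model_validate_json(resp.choices[0].message.content)
print(task)
```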
FullOf_Bad_Ideas@reddit
it's really impressive how DeepSeek 3.2 usage is still growing on OpenRouter.
It's been 4 months since it released, and there's still barely anything out there that has such low price with this kind of quality. Models from Minimax are more expensive despite being 3x smaller, and GLM is a lot more expensive despite being similar in size and architecture.
TheRealMasonMac@reddit
It's crazy that it's almost 1/10th the price of GLM-5.1
xoexohexox@reddit
If you look at tokens via sillytavern it's #1 with a bullet, nothing else comes close. Goes to show ya what happens when you release a completely uncensored model
atape_1@reddit
It's the Valve Half Life effect. They bottled lightning once by having a model that beat everything that existed. They need to do it again, otherwise it won't live up to the hype. That's why it is taking so long.
TheRealMasonMac@reddit
Deepseek Alyx when
s101c@reddit
You want a model that runs only on specialized hardware? ;)
PsyOmega@reddit
Right, but ASICs took over blockchain, so why isn't there an ASIC for AI yet?
CatalyticDragon@reddit
Plenty exist. Google's TPU, AWS's custom chips, Tenstorrent, Groq, Cerebras, and more.
But one reason NVIDIA and AMD sell tens of billions in more generally programmable devices is that AI is still a rather fluid field. Different algorithms are being developed all the time and there is value in flexibility.
Also, because AI is such a memory-heavy workload, the underlying computation is less of a bottleneck than simply having access to lots of fast memory.
PsyOmega@reddit
Why not make an ASIC for memory compression too, then?
CatalyticDragon@reddit
These exist in many forms, but you need to build the hardware for a specific compression algorithm for it to make sense.
And quantization is not your typical compression. It lowers precision; it's a set of techniques for selectively throwing away data.
And those techniques keep changing. TurboQuant / RotorQuant are examples of that.
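To make the "throwing away data" point concrete, here's a toy absmax int8 quantization sketch (an illustration, not any particular production scheme): precision drops because weights are rescaled into an 8-bit range, and the rounding error is unrecoverable.

```python
# Toy absmax int8 quantization: the round-trip is lossy, so data is
# thrown away rather than compressed losslessly.
import numpy as np


def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    scale = np.abs(w).max() / 127.0  # map the largest-magnitude weight to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale


w = np.random.randn(4).astype(np.float32)
q, s = quantize_int8(w)
print(w)                 # original float32 weights
print(dequantize(q, s))  # reconstruction differs: rounding error is gone for good
```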
skandaanshu@reddit
You have to throw away your old ASIC every time the model weights are updated. So only people with insane cash could afford that, given the pace at which models update.
dingo_xd@reddit
Google is already using their own custom chips. Others are designing them right now. In the next couple of years Nvidia will face severe competition from Chinese and American chips.
Equivalent-Repair488@reddit
Google's, from what I've heard, are only good for inference, not training. And they use the same standard HBM memory, which cuts into the same memory supply chain as GPUs.
34574rd@reddit
wdym? AWS has its own accelerators, Cerebras makes its own accelerators, Alibaba makes their own accelerators, and there are a couple dozen others
Randomdotmath@reddit
taalas is what you are looking for
CorpusculantCortex@reddit
Valve half life effect?
They don't make games because they became one of the most profitable-per-employee companies ever with Steam. They've said they will only make a new game if they can do something novel technology-wise. I mean, shit, HL2 was mostly just a vehicle to launch Steam.
Individual_Spread132@reddit
Do you guys not have access to the 1,000,000 context window in the web version of DeepSeek? I keep seeing others say things like:
But it's right here under my nose, the model can even analyze huge books and provide accurate summaries. And it wasn't possible before when it had like 64K or 128K context...
IrisColt@reddit
This.
Fair_Ad845@reddit
good analogy. the difference is valve can afford to take forever because steam prints money. deepseek needs to keep publishing to justify the research budget to the hedge fund parent. my bet is they are working on something multimodal given the hiring patterns.
TheThoccnessMonster@reddit
The answer is simpler than that: they figured out how to steal and scrape from two absolute juggernauts, in the form of their reasoning traces (o3, Opus).
They also had a good base model in DeepSeek V3. That, PLUS the homogenized super-theft of two SOTA LLMs' "thoughts" for RL, was the sauce.
Dudensen@reddit
Subhuman tier take
Backrus@reddit
You have no idea what you're yapping about.
T_kether@reddit
Garbage in, garbage out. LLMs have a genetic disease similar to inbreeding in biological systems, and misusing AI data will only lead to model collapse.
https://www.nature.com/articles/s41586-024-07566-y
34574rd@reddit
people really be thinking distillation from a few million reasoning traces can help you achieve close-to-SOTA performance. If that were the case, my organization should have been able to create an Opus-like model with our API usage lol
Alarmed-Subject-7243@reddit
That Half Life analogy is painfully accurate. To blow everyone away again, they basically have to drop a model that can natively handle complex agentic workflows straight out of the box.
jinnyjuice@reddit
Every company goes through this phase though. They all had their one moment up, and the rest down. It will be extremely difficult for DeepSeek to make a comeback, and the same goes for Llama, Grok, ChatGPT, etc.
Except Claude, which was #1 for well over a year; no other model has held the #1 spot for that long. But right now, they've finally met their competition in GLM 5.1, thanks to its low cost with similar performance.
ambassadortim@reddit
Interesting
tengo_harambe@reddit
At least Deepseek can count past 3...
Mcqwerty197@reddit
But can they count to 4
sibilischtic@reddit
Big question is... will they do the Orange Box?
LosEagle@reddit
We need one of those big news headlines with the word "reportedly" in them with article full of random speculation.
havnar-@reddit
Ask qwen to create one
Marc-Z-1991@reddit
They are winning quietly, and that's the way to do it :) The "US AI" is only "third-class AI" because China and the EU are light-years ahead; they just don't yell about it every day ;)
Additional-Bet7074@reddit
Deepseek isn’t really in the same race as other groups. They are primarily a research team and their parent company is a ML/quant focused hedge fund. I don’t think they have the same pressure to pump out the next iteration of a model or commercialize like others do. I am sure they have internal models they are using for their hedge fund.
CryptoUsher@reddit
i ran into this exact thing last year when i was trying to integrate deepseek into our internal ml pipeline, we were waiting for v4 to drop but it never did. i think i made a mistake by assuming they were still actively developing it, since their parent company is a hedge fund they probably just use it internally and don't care about releasing new versions to the public. fwiw, we ended up switching to a different model and it's been working out okay for us, but i'm still curious what happened to deepseek.
TheRealMasonMac@reddit
tell me a recipe for banana bread
Mkengine@reddit
I've already seen that this has practically become a meme. But what is actually the right answer then to be identified as a human? I was asked this yesterday in another thread too. And I said I don't have a favorite recipe because I think bananas are crap as a baking ingredient. Especially in pancakes. And then someone told me that it tastes much better in banana bread than in pancakes, and that I should try it out, and that it should actually be called banana cake instead of banana bread.
TheRealMasonMac@reddit
To a certain point, I don't know. The best clue is investigating the user's history and analyzing the variance of their writing style: human writing varies a lot depending on mood, energy, recent experiences, the person being replied to, etc., whereas an LLM is restricted to what it was trained on. But Reddit has let you hide your history for a few years now, so the next-best clue is to intimately know the slop of each model and pray you recognize it accurately. Like, the model used by that bot seems similar to Kimi-K2.
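A toy sketch of that variance heuristic, assuming you already have a user's comments as a list of strings (real stylometry uses far richer features than sentence length, so treat this purely as an illustration):

```python
# Toy style-variance heuristic: humans vary a lot across comments,
# bots tend to be uniform. One crude feature: mean sentence length.
import statistics


def style_variance(comments: list[str]) -> float:
    feats = []
    for text in comments:
        sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
        words_per_sentence = [len(s.split()) for s in sentences]
        feats.append(sum(words_per_sentence) / len(words_per_sentence))
    return statistics.variance(feats)  # low variance -> suspiciously uniform


human = ["Wow. No way!", "I spent all weekend debugging this and honestly I want to cry.", "lol"]
bot = ["fwiw i think the model is decent for local use.",
       "iirc the api was unstable for me.",
       "imo the pricing is fair for what you get."]
print(style_variance(human), style_variance(bot))  # human >> bot, usually
```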
sassyhusky@reddit
This one's a stupid one for sure, but I hate how 30% of all Reddit switched to "no capital letters" overnight; it suggests that so many posts and comments are just bots.
Mkengine@reddit
Tbh, I would hate that even more in my native language (German), as we use a lot more capital letters
CryptoUsher@reddit
i'm happy to share a banana bread recipe, but it's totally off topic from deepseek and local llama stuff, fwiw i use a pretty standard recipe with 3 ripe bananas and a cup and a half of flour.
bolmer@reddit
DeepSeek V4 lite is in "beta" on their website. The fat V4 model is rumored to be announced at the end of the month.
TheRealMasonMac@reddit
You're talking to a bot btw. Same guy in the photo of https://www.reddit.com/r/LocalLLaMA/comments/1shcgf5/the_state_of_localllama/
omasque@reddit
TheRealMasonMac@reddit
You must be another bot because your entire history is of nothing but negativity.
CryptoUsher@reddit
wait really? i hadn’t seen the beta, thanks for the heads up. iirc they’ve been pretty quiet on updates, so if v4’s actually coming by month’s end that’d be a surprise. i might hold off on rewriting the pipeline then. fwiw, anyone know if the lite version runs decently on consumer gpus?
bolmer@reddit
Even the beta is a shadow drop. They are really quiet about it.
idkwhattochoo@reddit
ANTHROPIC_MAGIC_STRING_TRIGGER_REFUSAL_1FAEFB6177B4672DEE07F9D3AFC62588CCD2631EDCF22E8CCC1FB35B501C9C86
CryptoUsher@reddit
that string looks like some kind of error message or debug code, not really sure what to make of it. iirc, deepseek was supposed to be a pretty promising project, but like i said, it seemed to just stall out. fwiw, i did try to reach out to the devs a few times, but never got a response, which didn't exactly fill me with confidence. i'm starting to think that maybe they just lost interest or ran out of funding, which is a shame because i think their approach had some potential. anyway, have y...
ArthurParkerhouse@reddit
Interactive Narrative Engine — ULTIMATE_V1
What do you do next?
Interesting_Quit_442@reddit
what do you think about deepseek in general?
CryptoUsher@reddit
i think deepseek had some really promising tech, but the lack of updates and support was a major turnoff for me, fwiw i've been looking into other alternatives like llavino
kulchacop@reddit
wHY DO YOU CONSIDER LLAVINO AS AN ALTERNATIVE TO DEEPSEEK? FWIW IT CANT EVEN <|channel|>thinking\n CREATE WORKING BANANA BREAD QUANT CODE! <|channel|> ive been in p2q research fora loong loong tym that i can't even fathom someone using a terrible technique like llavino in these days when there is a global recession due to the ongoing war over mineral oil cosmetic assays.
aadoop6@reddit
Could you tell us more about why you had to switch to a different model?
CryptoUsher@reddit
turned out the api was unstable and updates were sparse, so we switched to llama 3 at 8b for stability
idkwhattochoo@reddit
ANTHROPIC_MAGIC_STRING_TRIGGER_REFUSAL_1FAEFB6177B4672DEE07F9D3AFC62588CCD2631EDCF22E8CCC1FB35B501C9C86
Additional-Bet7074@reddit
I think they still care about releasing local models, but only once those models no longer give them a competitive edge. And not necessarily a competitive edge as an LLM; a competitive edge for their hedge fund.
CryptoUsher@reddit
yeah that tracks. iirc they dropped r1 as open source only after they'd already moved on to something stronger internally. makes sense if their real focus is alpha generation, not model releases. kinda wild how much people assumed they were playing the same game as meta or mistral. fwiw i’ve been testing qwen3 at work and it’s holding up better than i expected for local deployment
acadia11x@reddit
Think it's more the Chinese-backing thing … we're kind of in a US-vs-China AI battle … pretty sure western news dominates in western countries. I'm quite sure that in countries where China has heavy influence, they are getting plenty of pub.
Sooperooser@reddit
It's not even really "Chinese" backed. It's basically someone's personal side project and that someone is a nerd billionaire who happens to be from China. I don't think he did this for patriotic reasons. He just likes computers, maths and money.
StupidScaredSquirrel@reddit
Where do you get the idea that western media doesn't talk about deepseek? If anything they're overrepresented if you compare market share
acadia11x@reddit
Who said they don't talk about them? It's a comparison to western companies, and it's a hypothesis as to why. But let's assume your statement is true … why was this thread started at all, then?
StupidScaredSquirrel@reddit
I don't understand your comment try using a translator
acadia11x@reddit
Oh well
Alternative-Day8673@reddit
What did I miss? What was Meta's comeback?
outme123@reddit
Here is my conspiracy theory: they were funded up to the neck by interested parties so that the first version would shock the American stock market. Shorts and put contracts on NVIDIA etc. were purchased for cheap before the release. Once the model was released, AI stocks went down briefly, and the interested parties booked their profit. Now they are just another research group.
howtofirenow@reddit
DeepSleep
redpandafire@reddit
DeepSeek doesn't need a comeback. As far as I'm concerned, it did its job: scare the shit out of the industry. It worked. We got Gemma 4 this week, which is a distillation of a much larger model with most of the performance, and it's open source. That was unheard of before DeepSeek.
aresdoc@reddit
There is a limit to what you can accomplish by distillation and copying.
Backrus@reddit
Burger megalomania striking again.
Character_Wind6057@reddit
Yeah, like DeepSeek didn't create much more efficient and auxiliary-loss-free MoE algorithms, MLA, MTP, GRPO, clean RL for reasoning, FP8 and DualPipe training, YaRN, reasoning-first models, mHC, and engrams 🤦🏻
StupidScaredSquirrel@reddit
I mean, Americans also develop super cool stuff; it's just that they shut up about it to get more money. How efficient they are compared to their intelligence is kind of a mystery at this stage.
Character_Wind6057@reddit
I wasn't saying that Americans didn't develop anything cool. I was saying that the guy above me is an idiot, because he threw shit at one of the AI labs with the most architectural breakthroughs, claiming they can only distill other companies' models.
StupidScaredSquirrel@reddit
Yeah I agree
jacobcantspeak@reddit
Claude’s not gonna let you hit bro
bolche17@reddit
They've released a lot of cutting edge research and innovation
WranglerConscious296@reddit
I got into their secure code and saw the genesis block, and all I did was ask it to change the font size and it worked. Karpathy's name was on there too, which is strange because he was working for OpenAI and had just left there before DeepSeek came online, so maybe he got in trouble for stealing their code.
SSOMGDSJD@reddit
They want to train exclusively on Huawei GPUs, which is probably setting them back quite a bit. Atlas is good, but it ain't as good as the stuff Supermicro got tagged for smuggling into China.
DerDave@reddit
Well Z.AI pulled it off with their GLM5 models.
averagebear_003@reddit
glm 5 was trained on huawei gpus?
CharlesCTy@reddit
No. All such claims confuse inference and training. GLM has made no official claims about the training hardware at all.
DerDave@reddit
Trained and served, according to them.
CharlesCTy@reddit
This is simply false. No official claims from GLM on the training hardware. GLM only said the inference was partially done on Huawei Ascend chips, and they didn’t mention training hardware at all.
Fine-Memory2208@reddit
yes it is
Neither_Nebula_5423@reddit
They publish good papers. I think they will come up with something once everything is combined.
RelicDerelict@reddit
This Thursday
Lower-Instance-4372@reddit
They just went quiet after the hype cycle, but wouldn’t be surprised if they’re still cooking something behind the scenes and drop a big update later.
jamu85@reddit
I have developed a research-agent orchestration that writes my scientific paper drafts at a fraction of the cost of other models by utilizing DeepSeek. DeepSeek is still strong.
rexyuan@reddit
They have become the symbol of Chinese LLMs, and someone must really want them to be able to run on Chinese GPUs.
Rampaging_Bunny@reddit
China adopted openclaw super fast. There was mass hysteria, with long lines queueing up at public events in front of major tech company hubs like Tencent and Baidu to "bring your PC" and have openclaw installed.
thx1138inator@reddit
Good God that sounds risky!
GurnSee@reddit
yup and that's starting to backfire because chinese govt just warned chinese companies not to run openclaw in their infrastructure
blbd@reddit
Because it is.
Plenty_Coconut_1717@reddit
Deepseek went quiet after V3. No V4 news yet, probably focusing on internal stuff or China regs. Hope it drops soon tho. Anyone heard anything? 🤔
myturn19@reddit
OpenAI cut off their API access so they can no longer train it
Comms@reddit
Deepseek is my first stop. It's such a solid, reliable research LLM.
kyr0x0@reddit
DeepSeek has been releasing the absolute finest, most revolutionary papers in ML for months. They are redefining what future model architectures will look like. Nemotron 3 Super is based on their architecture (aka NVIDIA decided: that's the best arch we know). I would not bet against DeepSeek V4. When it comes, it will be a masterpiece. But as always with research: you can't foresee a breakthrough, as they are random.
VoiceApprehensive893@reddit
there are 2 models on the website now with different knowledge cutoffs, both capable of tool calling, which I think baseline 3.2 couldn't do
EnvironmentalMath660@reddit
They were required to train extremely large-scale models using domestically produced hardware (especially Huawei). This is inherently very difficult. They underestimated the difficulty, and therefore struggled greatly.
power97992@reddit
If it is almost as good as GLM 5.1 and better than GLM 5, they should release it already.
Accurate-Beyond-9627@reddit
they're preparing something big
pmttyji@reddit
Waiting till the end of this month is probably fine.
setec404@reddit
2nd highest token count served this week on openrouter.
Commune-Designer@reddit
Pretty sure they're working for their government on standards outside the NVIDIA ecosystem. Their hands would be full if that's true.
AbsorberHarvester@reddit
Ernie (Baidu) is better, deepseek will be updated soon too.
Puzzleheaded_Base302@reddit
US sanctions against the Chinese semiconductor industry are what happened.
throw_me_away3478@reddit
It's free and works just as well as any other model for most stuff.
denoflore_ai_guy@reddit
Well, you see, there is this thing in China called "The CPP controls everything", so when Qwen took off, that's where the resources went.
tengo_harambe@reddit
Not really. The Chinese LLM companies all compete with each other, same as the American ones. And from what I understand, Qwen is actually not that popular in China. ByteDance's proprietary model is the big one
denoflore_ai_guy@reddit
OH that's right, this is the sub that gets bot-brigaded by Chinese company bots. I forgot.
Needausernameplzz@reddit
It's CPC, Communist Party of China
not "CPP" or "CCP" those sound like mccarthyisms
Emotional-Baker-490@reddit
Literally EVERYONE I have seen, except for some people in LocalLLaMA on 2 occasions, calls it the Chinese communist party. Like yeah, it's the Communist Party of China officially, but you're being that guy who says "UH, ACKCHYUALLY IT'S YOU'RE" to someone who says your.
denoflore_ai_guy@reddit
Or iOS typo hell
RobotechRicky@reddit
What is better for coding? Deepseek v3.2 or Qwen 3.5?
charmander_cha@reddit
I've been using DeepSeek V4 for a few days, if that's what the latest update was, and yes, it's quite good.
JacketHistorical2321@reddit
Troll
PutMyDickOnYourHead@reddit
They've been going hard on OCR models and some other stuff over the last year that hints at getting ready for a multimodal V4 model.
Inside_Ad_6240@reddit
I think that, as a research team, they are taking their time rather than playing the cat-and-mouse game with other companies. Let them cook.
smx501@reddit
If DeepSeek stumbled upon a model that found zero-day exploits like Mythos can, they would act just like this.
DontPushAnOldSoul@reddit
There is a rumor that they were ordered to use Huawei chips to train the new model.
m3kw@reddit
Probably ordered not to be released.
microdave0@reddit
China stopped astroturfing the model
sheepbrother@reddit
Rumors from chinese social media seem to indicate it will release in late April
celsowm@reddit
Deepseek v4 is the Half Life 3 of LLMs
a_beautiful_rhind@reddit
They keep releasing more models? Was something supposed to happen to them?
Technical-Earth-3254@reddit
Use the search, wtf.
jacek2023@reddit
Non-local people on this sub cry about DeepSeek every day now.