I'm glad we have deepseek
Posted by guiopen@reddit | LocalLLaMA | 72 comments
Other companies are slowly moving away from open weights: not releasing base models, delaying open-weight distribution, not releasing their top models (this one I think is fair, but still). I also noticed they stopped publishing research (the old Gemma and Qwen releases had detailed papers about the models' training and characteristics; now it's replaced by blog posts and model cards).
Kimi (no base model for Kimi K2.5), GLM (no base models for GLM 5 and 5.1), MiniMax (delayed open weights and a problematic license for M2.7), and Qwen (Qwen 3.5 397B was open weight; 3.6 is not)
Meanwhile, DeepSeek keeps publishing mind-blowing research every month, releases their base models, releases the open weights as soon as the model is officially launched, and explains model training and architecture in detail in a launch paper.
They are extremely important in the field and are the ones pushing the technology and efficiency forward
Unfortunately they don't release small models, but we can't have everything can we?
Negative_Attorney448@reddit
These kinds of posts come off as propaganda.
cutebluedragongirl@reddit
Yeah...
Strange_Assignment87@reddit
If it's facts, vibes don't matter. You have to appreciate what they contribute; positive reinforcement is the least we can do.
AnOnlineHandle@reddit
I may be an outlier, but I find blog posts and model cards infinitely easier to learn the important info from than a PDF that's padding out its text to fit a format.
Sometimes a simple block diagram of the model is worth 1000 awkward attempts at scrolling through a PDF with split-column text.
Strange_Assignment87@reddit
The reason research papers are more important is that once the papers are out, someone will produce good blog posts and model cards from them. But you can't reverse-engineer a research paper from the level of detail in most blog posts.
AnOnlineHandle@reddit
I've read probably hundreds of ML research papers at this point and honestly think most of the words are pointless. Usually a paper can be condensed down to a snippet of Python code, maybe a block diagram, maybe a few bullet points about the data, and a few graphs/previews.
a9udn9u@reddit
OpenAI and Anthropic are the companies that should be releasing good small models; it would help them lure users and developers away from Chinese models and leave no market to the Chinese labs. But they are too short-sighted to do so.
minglu10@reddit
what is wrong with Chinese labs that you don't want any market left to them?
Equivalent-Costumes@reddit
Censorship and risk of political influence? IMHO, I would not blame OAI and Anthropic for this. It's the job of the USA government to fund the development of open models.
a9udn9u@reddit
There's nothing wrong with them, I'm just saying the two American companies are shortsighted for giving up the opportunity presented to them.
AdamEgrate@reddit
OpenAI is definitely better than Anthropic in that regard; at least they have something open source. Anthropic believes open source is the root of all evil.
a9udn9u@reddit
I'd bet all my money that Anthropic built their entire business and operations on open-source software.
Both_Opportunity5327@reddit
Hmm I did not know that CUDA was open source...
a9udn9u@reddit
Their business depends on software infrastructure, not just CUDA, and even CUDA itself is built using open-source software. NVCC, for example, is based on LLVM, which is open source.
Both_Opportunity5327@reddit
You said their entire business is built on open source; I gave you an example of software we all know they use that is not open source.
CUDA is the most important software when it comes to training LLMs, and we don't know what other software they pay licenses for.
Your whole premise is false.
AdamEgrate@reddit
The CC leak proved they took things from Open Code
NinduTheWise@reddit
But that's always the downside of open source: even if you release a product that's technically more advanced than the closed versions, the closed versions can just take stuff from the open-source one.
xamboozi@reddit
Well that just reinforces my decision to avoid paying insane money for repackaged open source
EvilGuy@reddit
I believe it. They are shady. When they need something like data or a prototype for their CLI, open source is great, but when it comes to giving literally anything back they're like... LOL.
drwebb@reddit
We need real open-source labs beating China at this; OpenAI and Anthropic aren't going to be it.
ttkciar@reddit
We have AllenAI and LLM360, but of the two I only have faith in AllenAI to stick around in the long term.
Definitely we need more open source labs.
nullmove@reddit
I think I recently saw that AllenAI just lost a few key people to Microsoft, including their former CEO and a co-lead of OLMo. We'll need to see how they cope.
ttkciar@reddit
Yep, that did indeed happen, though the guy who left was mainly interested in vision technology.
My impression was that he felt he was spending too much time on administrative tasks and grant-chasing, and not enough on actual vision R&D, and left for Microsoft so he could focus entirely on R&D.
If he was involved directly in R&D for Olmo or FlexOlmo or Olmix (the most interesting technologies coming out of AllenAI recently, IMO) I haven't been able to find mention of it. I think his contribution was administrative, but could be wrong.
ANTIVNTIANTI@reddit
It would've helped them so much to keep that going and build a community around it instead of fucking Sora lololol. They'd have the best resource there for sure. I'm almost positive that those major astroturfed apps and other massive updates to tooling and memory structures etc. come from here; we've brought those things up almost to the damn T when discussing our own kit. And since we know from the leak that Anthropic takes from open source without a damned second thought, it's not unlikely that every LLM sub, especially the necessarily innovative ones like ours, is crawled nonstop for good ideas and feedback etc. Like, I'll remember having read something 4-6 months before OAI or Claude comes out with it, or some random boosted rando who "already had a successful product" creates something like OpenClaw. I may be projecting my annoyance at the lack of model releases from the major AI labs, especially when WE PAID FOR IT ALL AND GAVE THEM ALL THE DATA lol. It's such a selfish shitshow really.
brother_spirit@reddit
As a counterpoint: you're currently buying inference off OpenAI and Anthropic for pennies on the dollar. The user is winning enormously right now. The labs get the data and the models, so they're winning. Who loses? All the losses will accrue to the IPO vehicle that floats to market and gets absorbed into everyone's retirement funds; i.e., OpenAI and Anthropic are being financed in real time by loans from people who don't even know they're underwriting them.
StillVeterinarian578@reddit
Anthropic and OpenAI don't sell to Chinese customers (I know that doesn't matter much in the context of open models), and there are 1 billion people in China with internet access; there would still be plenty of market in China for homegrown models from their labs even if OpenAI and Anthropic flooded the market with good small models.
a9udn9u@reddit
There's a need for the best models available, in China too. There are plenty of Chinese companies and individuals using OpenAI and Anthropic's services.
kevinlch@reddit
They don't need to. They already have a compute shortage, so they don't need more customers 😂 What they really need is to "distill" out the free users and keep the "premium" ones lol.
Pleasant-Shallot-707@reddit
Don't distill the free users; train up a 1.58-bit model for them and use the free users as a test bed for how to lower costs.
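(For anyone wondering, "1.58-bit" means ternary weights, since log2(3) ≈ 1.58 bits per weight. A minimal Python sketch of the absmean quantizer described in the BitNet b1.58 paper; the eps value and per-tensor scaling granularity here are my own simplifications:)

```python
import numpy as np

def absmean_ternary(w: np.ndarray, eps: float = 1e-5):
    """Quantize a weight tensor to {-1, 0, +1} using the absmean scheme
    from the BitNet b1.58 paper. Per-tensor scale is a simplification;
    real implementations typically scale per group or per channel."""
    scale = np.mean(np.abs(w)) + eps           # absmean scale (gamma)
    q = np.clip(np.round(w / scale), -1, 1)    # ternary codes
    return q.astype(np.int8), scale            # dequantize as q * scale

w = np.random.randn(4096, 4096).astype(np.float32)
q, s = absmean_ternary(w)
print(np.unique(q))  # [-1  0  1] -> ~1.58 bits of information per weight
```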
Sal7_one@reddit
Why would we want to lure people away from Chinese models? If it's open source it's all the same to me. You people are ungrateful and always have an agenda.
Durian881@reddit
They are chasing profits and want developers to use their cloud models. That said, the gpt-oss models released by OpenAI are actually pretty decent.
random-tomato@reddit
I hope OpenAI does release a v2 of GPT-OSS; the 120B still feels to me like one of the strongest/fastest 100B-range models currently available, even though it's been 9 months.
JinPing89@reddit
We actually have a very good base model with an Apache license: Trinity Large base. The size is SOTA level, 399B-A12B.
Wonder if any groups can utilize this. I mean, there are a bunch of high-quality datasets available on HF too. Theoretically, if you put these open-sourced ingredients into a cauldron, powered by good GPUs, you could cook a SOTA model? Just guessing.
NNN_Throwaway2@reddit
We're well past the point where good datasets can cut it. All the frontier labs have been training on 30T+ tokens for the past couple of years. What matters is the pipeline and the architecture. Truly and fully open source models comparable with the frontier we have right now are probably 5 years out at a minimum, if not 10 years plus.
teachersecret@reddit
Qwen 3.7 27B is already sitting near frontier performance.
I don't think 5-10 years is going to be needed. It looks to me like we're less than a year behind mainstream, and the gap has been tightening.
Go back one year and you'll find models like GPT-4.1 (released April 14, 2025). Do you think GPT-4.1 is better than Qwen 3.7 27B?
Having used both, I'd say it's closer to Sonnet 4.5, a model released in September 2025.
NNN_Throwaway2@reddit
Qwen isn't open source. It's open weight. Once Alibaba stops releasing open-weight models (which they will), local users will have no way to reproduce that level of performance.
teachersecret@reddit
Fair, but for now, we've got the weights and they should continue to exist. The future is weird, but at least we'll have decent at-home intelligence floating around I guess?
Pleasant-Shallot-707@reddit
lol
UpAndDownArrows@reddit
To be fair, when model sizes are in the 300B-1.6T range, datasets of 30T+ tokens don't look nearly big enough.
power97992@reddit
Trinity Large's benchmarks are quite low
guiopen@reddit (OP)
Arcee AI (Trinity's creators) are goated too and seem very dedicated to open weights.
silenceimpaired@reddit
Who? Everyone here is talking like they're popular. :/ Have a Hugging Face link?
korino11@reddit
https://huggingface.co/arcee-ai/Trinity-Large-Thinking
LagOps91@reddit
Yeah, I really like their work. Trinity still seems a bit undercooked, but with some further training it could be a top model in the size category. Love how they release different checkpoints for base models too.
ortegaalfredo@reddit
DeepSeek listened to users by releasing a model that can run on relatively small systems (DeepSeek-Flash); Qwen also listened to users by releasing a model so good that it competes with their own offering. Honestly, they're all great, and that's why I pay for their APIs.
Due-Memory-6957@reddit
I don't remember that. I remember it from day one not being very useful.
dampflokfreund@reddit
What do you mean, "can be run on small systems"? Not even prosumers have more than 128 GB of RAM. Most people have 32 to 64 GB. A model that consumer hardware can actually run is something like a 35B-A3B.
power97992@reddit
Most people have 4-24 GB of RAM
ortegaalfredo@reddit
It should be able to run (barely) on an NVIDIA Spark at Q3 once support arrives in llama.cpp.
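(For context, "Q3" is llama.cpp shorthand for roughly 3-bit blockwise quantization. A toy Python sketch of the general idea; this is not the actual Q3_K format, whose nested super-blocks and quantized scales are more involved:)

```python
import numpy as np

def toy_q3(w: np.ndarray, block: int = 32):
    """Toy 3-bit blockwise quantizer: one fp scale per block plus signed
    3-bit codes in [-4, 3]. Illustrative only, not llama.cpp's Q3_K."""
    w = w.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True) / 4.0 + 1e-8
    q = np.clip(np.round(w / scale), -4, 3).astype(np.int8)
    return q, scale  # dequantize as q * scale

q, s = toy_q3(np.random.randn(4096).astype(np.float32))
print(q.min(), q.max())  # codes fit in 3 bits per weight (plus the scales)
```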
Pleasant-Shallot-707@reddit
That sounds like a terrible experience. A 256 GB M-series Max system is probably the most affordable setup that can run that.
Pleasant-Shallot-707@reddit
Game over for a constantly moving target? lol. Also… OAI is not going to release open weights again (IMO); they're under way too much financial burden to spend resources on something that can't bring in revenue but still costs resources to create.
InformationSweet808@reddit
True, but let's not romanticize it: DeepSeek is great for openness, but they're also playing a different game (research-first, not product-first). The real issue is everyone else locking down without compensating with better transparency. If you're closed, at least give proper papers and evals, not just polished blog posts.
Glad-Programmer-5505@reddit
Yeah true
_derpiii_@reddit
> deepseek keeps publishing mind-blowing research every month,
What's your favorite way to keep up with the research and consume it in a palatable way?
paul_tu@reddit
I'd add that they benefit from this, since their integration into the mobile sector by Chinese vendors requires some verification before being pushed to the masses, and openness is a necessary thing for that.
That's why I guess they keep pushing the progress, in addition to being just an outstanding team.
Daemontatox@reddit
DeepSeek's contribution isn't just the models; a lot of people forget the kernels and repos they open-source, which are insanely helpful.
KeikakuAccelerator@reddit
They straight up open-sourced a new file system to squeeze more out of training. They are efficiency GOATs.
zdy132@reddit
And they used PTX to write more efficient libraries than the NVIDIA-provided ones. That's a level of grit that only quant-trading engineers would possess.
CheatCodesOfLife@reddit
Daemontatox@reddit
Not all of us need GPT to reply and comment for us; maybe you've been using clawdslop for too long to know the difference.
CheatCodesOfLife@reddit
You're absolutely right
jacobcantspeak@reddit
Out here nitpicking spacing 😭 get a life fr
Plenty_Coconut_1717@reddit
Real ones recognize DeepSeek is the last big open-weight hero left. Everyone else is slowly closing the door.
Aaaaaaaaaeeeee@reddit
What they choose to do with Engram is impactful in the long term.
This research would accelerate (volatile-)memory-free inference.
If I believe an overtrained 8B will never match a 300B due to the difference in parameter count, I don't actually want to compromise. I want that critical parameter size that takes me the farthest.
Trade computed parameters for lookup processes, and you might actually run the 300B from disk at lightning speed (toy sketch below).
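(Just to make the compute-for-lookup intuition concrete; the shapes, the hashing, and the table file here are all made up and have nothing to do with Engram's actual design:)

```python
import numpy as np

D, ROWS, K = 4096, 1_000_000, 8

# Conventional path: a dense FFN touches every weight for every token,
# so all of the weights have to sit in (V)RAM.
def ffn_compute(x, w_in, w_out):
    return np.maximum(x @ w_in, 0.0) @ w_out

# Lookup path: map the input to a handful of rows in a table that lives
# on disk. Only K rows are read per token, so the table can be memory-
# mapped and streamed instead of loaded. (Assumes the table was built
# offline and written to table.bin.)
table = np.memmap("table.bin", dtype=np.float16, mode="r", shape=(ROWS, D))

def ffn_lookup(x):
    idx = hash(x.tobytes()) % (ROWS - K)  # stand-in for a learned index
    return table[idx:idx + K].astype(np.float32).mean(axis=0)
```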
I hope researchers can push this forward to the absolute limit, in all the interesting directions too, like video generation.
ttkciar@reddit
I think it's fine, because we have some excellent smaller models from other labs (most recently Qwen3.6 from Alibaba and Gemma4 from Google), some of which do have base models (Gemma, Olmo, K2-V2).
What we need are good large teacher models to help train those smaller models, and we have a wealth of those -- GLM-5.1, Kimi-K2.5, Minimax-M2.7, and Deepseek4, most recently.
Our options for community builds of large models are limited, but not nonexistent. We're going to be blocked on hardware resources for a while (years!), and that gives us time to construct next-generation synthetic datasets via self-improvement/curation pipelines. It will also give us time to get practiced at federated training of medium-large models (120B class).
There's a lot of work to do before the community can tackle Opus-successor class models, but I think we're well equipped with the foundational models we will need to do that work.
dinerburgeryum@reddit
My interest in local hosting is literally in the intelligence/efficiency frontier: when folks do the most with the least, that's where all the best stuff happens. DeepSeek 4 is open weight, which means inference will be relatively inexpensive, so generating task-focused reasoning traces will be relatively easy, and we can distill those into models that fit on video cards made for playing games. It's a pipeline, and DeepSeek is a critical part of it.
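(A minimal sketch of that pipeline, assuming an OpenAI-compatible server in front of the big open-weight teacher; the URL and model name are placeholders:)

```python
import json
from openai import OpenAI  # any OpenAI-compatible endpoint works

# Hypothetical local/cheap endpoint serving the open-weight teacher.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

def collect_traces(tasks, out_path="traces.jsonl"):
    """Cheap inference on the teacher -> reasoning traces -> SFT data
    for a small student model that fits on a gaming GPU."""
    with open(out_path, "w") as f:
        for task in tasks:
            resp = client.chat.completions.create(
                model="deepseek-v4",  # placeholder model name
                messages=[{"role": "user",
                           "content": f"{task}\n\nThink step by step."}],
            )
            f.write(json.dumps({
                "prompt": task,
                "completion": resp.choices[0].message.content,
            }) + "\n")

collect_traces(["Refactor this loop to use itertools.groupby: ..."])
```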
nuclearbananana@reddit
The base model is the same one they released for K2; there was nothing to release.
guiopen@reddit (OP)
Actually, it's a different model; K2 was just a checkpoint. In their launch paper they mention that they continued the pretraining with an additional 15 trillion tokens of mixed text and images.
nuclearbananana@reddit
Yeah, continued pre-training on the base model, and the result was K2.5. I'm not sure what you want; some intermediate checkpoint between the released base and K2.5?
Lissanro@reddit
I think they are referring to the base model as it was before the final instruct fine-tuning. Given 15T tokens of further pretraining, it's unlikely that it was all instruct fine-tuning on the previous base model.
ReasonableBenefit47@reddit
Kimi is better