Why doesn't DeepSeek release a smaller Air-style model? Because they're focused on research?
Posted by power97992@reddit | LocalLLaMA | 13 comments
Why doesn't DeepSeek release a smaller Air-style model, like a 120B-A10B MoE or a 32B dense model? It seems like they are mainly focused on research and don't frequently release small models, unlike GLM and Qwen.
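(To spell out the naming: by "120B A10B" I mean roughly 120B total parameters with only ~10B active per token, since an MoE router only fires a few experts per layer. A back-of-the-envelope sketch of how those two numbers relate; the layer and expert sizes are made up for illustration, not a real config:)

```python
# Back-of-the-envelope MoE sizing: how "120B total" and "A10B active" relate.
# All layer/expert numbers below are made up for illustration, not a real
# DeepSeek config. Embeddings, norms, and the router are ignored for simplicity.

def moe_params(layers, d_model, n_experts, top_k, d_ff):
    attn = 4 * d_model * d_model             # Q, K, V, O projections
    expert = 2 * d_model * d_ff              # up + down projection per expert
    total_layer = attn + n_experts * expert  # every expert's weights exist on disk
    active_layer = attn + top_k * expert     # but only top_k experts fire per token
    return layers * total_layer, layers * active_layer

total, active = moe_params(layers=40, d_model=4096,
                           n_experts=64, top_k=4, d_ff=5632)
print(f"total:  {total / 1e9:.0f}B params")   # ~121B
print(f"active: {active / 1e9:.0f}B params")  # ~10B per token
```

So you'd pay for ~120B of weights in RAM but only ~10B of compute per token, which is exactly the shape that's practical to run locally.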
Spacefish008@reddit
V4 Flash is out now :)
power97992@reddit (OP)
Yeah, I tried it already, still too large to run locally.
ps5cfw@reddit
I have absolutely zero sources on this, but as far as I understand it, what's holding DeepSeek back is that they tried (or were possibly forced) to use Huawei chips instead of Nvidia's, and it backfired spectacularly because those chips apparently aren't ready for the task.
So they're probably doing whatever takes the minimum amount of effort right now, and training a new model from scratch is not a minimum-effort project, at all.
Ok_Warning2146@reddit
They are using LLMs to score political points, not to make profits, so using Huawei chips is voluntary for them.
Pvt_Twinkietoes@reddit
Why does DeepSeek even release anything? They're a hedge fund.
Ok_Warning2146@reddit
Their founder was able to meet Chairman Xi after they released DeepSeek. That's a big deal in China and can be translated into $$$.
elbiot@reddit
Small models are more expensive to train relative to their capability: to be competitive they need far more training tokens per parameter, so the compute cost doesn't shrink nearly as fast as the parameter count. Rough numbers below.
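(Using the standard ≈6·N·D rule of thumb for training FLOPs. The V3 figures, 671B total / 37B active on 14.8T tokens, are from DeepSeek's technical report; the 18T-token budget for a competitive 32B dense model is my guess based on recent ~30B releases:)

```python
# Rule of thumb: training FLOPs ≈ 6 * N_active * D_tokens.
# V3 figures (37B active params, 14.8T tokens) are from its technical report;
# the 18T-token budget for the 32B dense model is my assumption.

def train_flops(active_params, tokens):
    return 6 * active_params * tokens

v3 = train_flops(active_params=37e9, tokens=14.8e12)
dense_32b = train_flops(active_params=32e9, tokens=18e12)

print(f"DeepSeek-V3: {v3:.2e} FLOPs")         # ~3.3e+24
print(f"32B dense:   {dense_32b:.2e} FLOPs")  # ~3.5e+24, about the same
```

A well-trained 32B dense model ends up costing roughly as much training compute as V3 itself, for a much weaker result, because MoE training cost tracks active parameters, not total.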
Iory1998@reddit
Why don't you ask them directly on their blog or Twitter account?
xxPoLyGLoTxx@reddit
More importantly, where is GLM-4.6-Air? Just saying.
ttkciar@reddit
Yeah, this!!
I've literally got disk space set aside for it, for like a month now!
createthiscom@reddit
I think they’re actually focused on competing with openai’s frontier models and claude and google’s frontier models. I don’t think they have time to do all of that AND give the world tiny models.
dark-light92@reddit
DeepSeek is a pure research lab. It doesn't seek profit, so it doesn't need to optimize for different market segments or inference costs.
They released many smaller models before V3, but those were generally experiments: proving hypotheses and tuning the architecture before scaling it up to V3. It seems they have managed to retrofit DSA (DeepSeek Sparse Attention) onto the V3 architecture, so they haven't needed to train a smaller model from scratch in a long time (toy sketch of the idea below).
Maybe when they develop a new architecture, we'll get smaller models again.
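(For anyone curious what DSA buys them: very loosely, each query attends only to a small selected subset of the cached tokens instead of all of them, which cuts long-context attention cost. A toy top-k sketch of the general idea; DeepSeek's real DSA uses a learned lightweight indexer to pick the tokens, which this doesn't reproduce:)

```python
import numpy as np

def sparse_attention(q, k, v, top_k=64):
    """Each query attends only to its top_k highest-scoring keys.

    Toy illustration: a real sparse-attention kernel avoids computing
    the full score matrix; here we compute it and then mask, just to
    show the selection semantics.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                      # (n_q, n_kv) full scores
    drop = np.argpartition(scores, -top_k, axis=-1)[:, :-top_k]
    np.put_along_axis(scores, drop, -np.inf, axis=-1)  # mask all but top_k keys
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                 # softmax over survivors
    return w @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 128))       # 4 queries
k = rng.standard_normal((1024, 128))    # 1024 cached keys
v = rng.standard_normal((1024, 128))
print(sparse_attention(q, k, v).shape)  # (4, 128); each query used 64 of 1024 keys
```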
StardockEngineer@reddit
Actually, I would not be surprised to see such a thing in the next 6 months.