Why doesn't DeepSeek release a smaller Air-style model? Because they're focused on research?
Posted by power97992@reddit | LocalLLaMA | 13 comments
Why doesn't DeepSeek release a smaller Air-style model, like a 120B-A10B MoE or a 32B dense model? It seems like they are mainly focused on research and don't frequently release small models, unlike GLM and Qwen.
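(To spell out the naming: by "120B A10B" I mean roughly 120B total parameters with only ~10B active per token, since an MoE router only fires a few experts per layer. A back-of-the-envelope sketch of how those two numbers relate; the layer and expert sizes are made up for illustration, not a real config:)

```python
# Back-of-the-envelope MoE sizing: how "120B total" and "A10B active" relate.
# All layer/expert numbers below are made up for illustration, not a real
# DeepSeek config. Embeddings, norms, and the router are ignored for simplicity.

def moe_params(layers, d_model, n_experts, top_k, d_ff):
    attn = 4 * d_model * d_model             # Q, K, V, O projections
    expert = 2 * d_model * d_ff              # up + down projection per expert
    total_layer = attn + n_experts * expert  # every expert's weights exist on disk
    active_layer = attn + top_k * expert     # but only top_k experts fire per token
    return layers * total_layer, layers * active_layer

total, active = moe_params(layers=40, d_model=4096,
                           n_experts=64, top_k=4, d_ff=5632)
print(f"total:  {total / 1e9:.0f}B params")   # ~121B
print(f"active: {active / 1e9:.0f}B params")  # ~10B per token
```

So you'd pay for ~120B of weights in RAM but only ~10B of compute per token, which is exactly the shape that's practical to run locally.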
Spacefish008@reddit
V4 Flash is out now :)
power97992@reddit (OP)
Yeah, I tried it already, still too large to run locally.
ps5cfw@reddit
I have absolutely zero sources on this, but as far as I understand it, what's holding DeepSeek back is that they tried (or were possibly forced) to use Huawei chips instead of Nvidia's, and it backfired spectacularly because those chips apparently aren't ready for the task.
So they're probably doing whatever takes the minimum amount of effort right now, and training a new model from scratch is not a minimum-effort project, at all.
Ok_Warning2146@reddit
They are using LLMs to score political points, not to make profits, so using Huawei chips is voluntary for them.
Pvt_Twinkietoes@reddit
Why does DeepSeek even release anything? They're a hedge fund.
Ok_Warning2146@reddit
Their founder was able to meet Chairman Xi after they released DeepSeek. That's a big deal in China and can be translated into $$$.
elbiot@reddit
Small models are more expensive to train relative to their capability: to be competitive they need far more training tokens per parameter, so the compute cost doesn't shrink nearly as fast as the parameter count. Rough numbers below.
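(Using the standard ≈6·N·D rule of thumb for training FLOPs. The V3 figures, 671B total / 37B active on 14.8T tokens, are from DeepSeek's technical report; the 18T-token budget for a competitive 32B dense model is my guess based on recent ~30B releases:)

```python
# Rule of thumb: training FLOPs ≈ 6 * N_active * D_tokens.
# V3 figures (37B active params, 14.8T tokens) are from its technical report;
# the 18T-token budget for the 32B dense model is my assumption.

def train_flops(active_params, tokens):
    return 6 * active_params * tokens

v3 = train_flops(active_params=37e9, tokens=14.8e12)
dense_32b = train_flops(active_params=32e9, tokens=18e12)

print(f"DeepSeek-V3: {v3:.2e} FLOPs")         # ~3.3e+24
print(f"32B dense:   {dense_32b:.2e} FLOPs")  # ~3.5e+24, about the same
```

A well-trained 32B dense model ends up costing roughly as much training compute as V3 itself, for a much weaker result, because MoE training cost tracks active parameters, not total.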
Iory1998@reddit
Why don't you ask them directly on their blog or Twitter account?
xxPoLyGLoTxx@reddit
More importantly, where is GLM-4.6-Air? Just saying.
ttkciar@reddit
Yeah, this!!
I've literally got disk space set aside for it, for like a month now!
createthiscom@reddit
I think they’re actually focused on competing with openai’s frontier models and claude and google’s frontier models. I don’t think they have time to do all of that AND give the world tiny models.
dark-light92@reddit
DeepSeek is a pure research lab. It doesn't seek profit, so it doesn't need to optimize for different market segments or inference costs.
They released many smaller models before V3, but those were generally experiments: proving hypotheses and tuning the architecture before scaling it up to V3. It seems they have managed to retrofit DSA (DeepSeek Sparse Attention) onto the V3 architecture, so they haven't needed to train a smaller model from scratch in a long time (toy sketch of the idea below).
Maybe when they develop a new architecture, we'll get smaller models again.
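(For anyone curious what DSA buys them: very loosely, each query attends only to a small selected subset of the cached tokens instead of all of them, which cuts long-context attention cost. A toy top-k sketch of the general idea; DeepSeek's real DSA uses a learned lightweight indexer to pick the tokens, which this doesn't reproduce:)

```python
import numpy as np

def sparse_attention(q, k, v, top_k=64):
    """Each query attends only to its top_k highest-scoring keys.

    Toy illustration: a real sparse-attention kernel avoids computing
    the full score matrix; here we compute it and then mask, just to
    show the selection semantics.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                      # (n_q, n_kv) full scores
    drop = np.argpartition(scores, -top_k, axis=-1)[:, :-top_k]
    np.put_along_axis(scores, drop, -np.inf, axis=-1)  # mask all but top_k keys
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                 # softmax over survivors
    return w @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 128))       # 4 queries
k = rng.standard_normal((1024, 128))    # 1024 cached keys
v = rng.standard_normal((1024, 128))
print(sparse_attention(q, k, v).shape)  # (4, 128); each query used 64 of 1024 keys
```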
StardockEngineer@reddit
Actually, I would not be surprised to see such a thing in the next 6 months.