Why don't Groq (with a q) and Cerebras add new models
Posted by AccomplishedRow937@reddit | LocalLLaMA | View on Reddit | 15 comments
Both Groq and Cerebras haven't updated their available models in a while, long enough that the gap between the old models they serve and the new ones on the market is noticeable.
So why don't they add any new models? Qwen3.5 or Gemma 4, for example.
Daemontatox@reddit
It isn't as easy as you think. They can't just use vLLM or SGLang or whatever implementation of the model; they have to understand the model architecture and dataflow from the ground up, literally how each part is loaded, and port that onto their WSE structure using their own custom language, CSL, which is neither an easy task nor an easy language. And they need to get it right themselves, because if they hit a bug they can't just throw AI at it and poof, problem solved, like with most kernel languages.
Also, it needs to be extremely optimized, because it has to outpace other providers; speed is their whole thing. And finally, it's a really expensive task, so if their current models are selling well, there's no need to upgrade. It really comes down to what their high-paying customers need.
Specter_Origin@reddit
Isn't Groq with Nvidia now? They did use to add models, until they got absorbed.
ArthurOnCode@reddit
The new hybrid Groq+Nvidia hardware looks very promising, but I haven't seen anything about the future of GroqCloud. One can only hope.
dametsumari@reddit
Essentially all the relevant people and tech were acquired. What remains is not much. Because of that, I've lost hope of Groq going anywhere.
tamerlanOne@reddit
It will surely carry on, but it will be cannibalized by the business decisions of whoever owns the rights, and stripped of its authentic, genuine soul.
t_krett@reddit
Exactly. Groq was bought by Nvidia, Cerebras was bought by OpenAI.
I imagine they weren't making huge profits serving fast inference in the first place; now they are probably focused on giving leverage to their parent companies, not on selling a fast API for competitors' models.
dipittydoop@reddit
Not sure on the details, but there might be memory limitations on what they can provide with the current generation of inference chips. It may be that both are holding off until they get the new chips into production.
That said, if it's possible, I'd love to see GLM 5.1 on Cerebras, or literally any newer models from Groq. The TPS is a huge selling point, and right now the best option for that kind of TPS is GLM 4.7 on Cerebras, which is getting old.
sn2006gy@reddit
Do you know if Cerebras does KV caching on their APIs, or do you just have to suck it up and pay $50 to $200/month for their coder access?
gh0stwriter1234@reddit
They already stream the models, so that shouldn't be the issue. The wafer-scale engine itself only has about 18GB of SRAM, which is probably used for some model variables and, most likely, context.
The model itself is streamed to the chip over 400Gbps Ethernet links from a server farm.
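As a rough back-of-the-envelope sketch of why streaming weights can work: the 400Gbps figure comes from the comment above, but the model size and link-efficiency numbers below are illustrative assumptions, not Cerebras specs.

```python
# Back-of-the-envelope: time to move model weights over a network link.
# 400 Gbps is the link speed quoted above; model size and efficiency
# are illustrative assumptions.

def stream_time_seconds(model_bytes: float, link_gbps: float,
                        efficiency: float = 0.8) -> float:
    """Seconds to move model_bytes over a link_gbps link at the given efficiency."""
    usable_bits_per_sec = link_gbps * 1e9 * efficiency
    return model_bytes * 8 / usable_bits_per_sec

# e.g. a 70B-parameter model in FP16 is roughly 140 GB of weights
weights = 70e9 * 2  # bytes
print(f"{stream_time_seconds(weights, 400):.1f} s")  # ~3.5 s at 80% efficiency
```

On these assumptions, refilling the chip with a full set of weights takes seconds, which is consistent with streaming rather than on-chip storage being the bottleneck.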
AccomplishedRow937@reddit (OP)
Hopefully they start manufacturing those chips soon; I'm hoping for Kimi K2.5 or Qwen3.5.
bick_nyers@reddit
In both cases they are hardware-constrained, and likely custom enterprise deals are paying more margin than consumer usage.
I'm concerned that Cerebras will eventually sunset the Cerebras Code subscription that I've been clinging onto.
In a world where my options are lose the speed + subscription entirely vs. pay double so they make similar margins to their enterprise deals, I would take the latter option.
There would probably be more public backlash from a price increase than from a sunsetting, which is a shame.
gh0stwriter1234@reddit
No need to? They pretty much added the models as a demonstration, and any customers they have can work with them to get what they want, including finetunes and/or LoRAs, even custom models.
AccomplishedRow937@reddit (OP)
Interesting. Wouldn't customers ask them to support smarter models?
harpysichordist@reddit
They may already give customers private access to models.
ttkciar@reddit
Cerebras is one of the sponsors of the LLM360 R&D lab. In that sense, K2-V2-Instruct and other LLM360 models are Cerebras models.