Why don't Groq (with a q) and Cerebras add new models
Posted by AccomplishedRow937@reddit | LocalLLaMA | View on Reddit | 15 comments
Both Groq and Cerebras haven't updated their available models in a while, long enough that the gap between the old models they serve and the new ones on the market is noticeable.
So why don't they add any new models? Qwen3.5 or Gemma 4, for example.
Daemontatox@reddit
It isn't as easy as you think. They can't just use vLLM or SGLang or whatever implementation of the model; they have to understand the model architecture and dataflow from the ground up, literally how each part is loaded, and port that onto their WSE structure using their own custom language, CSL, which is neither an easy task nor an easy language. And they need to get it right themselves, because if they hit a bug they can't just throw AI at it and poof, problem solved, like with most kernel languages.
Also, it needs to be extremely optimized, because it has to outpace other providers; speed is their whole thing. And finally, it's a really expensive task, so if their current models are selling well, there's no need to upgrade. It really comes down to what their high-paying customers need.
Specter_Origin@reddit
Isn't Groq with Nvidia now? They did use to add models, until they got absorbed.
ArthurOnCode@reddit
The new hybrid Groq+Nvidia hardware looks very promising, but I haven't seen anything about the future of GroqCloud. One can only hope.
dametsumari@reddit
Essentially all the relevant people and tech were acquired. What remains is not much. Because of that, I've lost hope of Groq going anywhere.
tamerlanOne@reddit
It will surely carry on, but it will be cannibalized by the business decisions of whoever owns the rights, and stripped of its authentic, genuine soul.
t_krett@reddit
Exactly. Groq was bought by Nvidia, Cerebras was bought by OpenAI.
I imagine they weren't making huge profits serving fast inference in the first place; now they are probably focused on giving leverage to their parent companies, not on selling a fast API for competitors' models.
dipittydoop@reddit
Not sure on the details, but there might be memory limitations on what they can provide with the current generation of inference chips. It may be that both are holding off until they get the new chips into production.
That said, if it's possible, I'd love to see GLM 5.1 on Cerebras, or literally any newer models from Groq. The TPS is a huge selling point, and right now the best option for that kind of TPS is GLM 4.7 on Cerebras, which is getting old.
sn2006gy@reddit
Do you know if Cerebras does KV caching on their APIs, or do you just have to suck it up and pay $50 to $200/month for their coder access?
gh0stwriter1234@reddit
They already stream the models, so that shouldn't be the issue. The wafer-scale engine itself only has about 18GB of SRAM, which is probably used for some model variables and, most likely, context.
The model itself is streamed to the chip over 400Gbps Ethernet links from a server farm.
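As a rough back-of-the-envelope sketch of why streaming weights can work: the 400Gbps figure comes from the comment above, but the model size and link-efficiency numbers below are illustrative assumptions, not Cerebras specs.

```python
# Back-of-the-envelope: time to move model weights over a network link.
# 400 Gbps is the link speed quoted above; model size and efficiency
# are illustrative assumptions.

def stream_time_seconds(model_bytes: float, link_gbps: float,
                        efficiency: float = 0.8) -> float:
    """Seconds to move model_bytes over a link_gbps link at the given efficiency."""
    usable_bits_per_sec = link_gbps * 1e9 * efficiency
    return model_bytes * 8 / usable_bits_per_sec

# e.g. a 70B-parameter model in FP16 is roughly 140 GB of weights
weights = 70e9 * 2  # bytes
print(f"{stream_time_seconds(weights, 400):.1f} s")  # ~3.5 s at 80% efficiency
```

On these assumptions, refilling the chip with a full set of weights takes seconds, which is consistent with streaming rather than on-chip storage being the bottleneck.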
AccomplishedRow937@reddit (OP)
Hopefully they start manufacturing those chips soon; I'm hoping for Kimi K2.5 or Qwen3.5.
bick_nyers@reddit
In both cases they are hardware-constrained, and likely custom enterprise deals are paying more margin than consumer usage.
I'm concerned that Cerebras will eventually sunset the Cerebras Code subscription that I've been clinging onto.
In a world where my options are lose the speed + subscription entirely vs. pay double so they make similar margins to their enterprise deals, I would take the latter option.
There would probably be more public backlash from a price increase than from a sunsetting, which is a shame.
gh0stwriter1234@reddit
No need to? They pretty much added the models as a demonstration, and any customers they have can work with them to get what they want, including finetunes and/or LoRAs, even custom models.
AccomplishedRow937@reddit (OP)
Interesting. Wouldn't customers ask them to support smarter models?
harpysichordist@reddit
They may already give customers private access to models.
ttkciar@reddit
Cerebras is one of the sponsors of the LLM360 R&D lab. In that sense, K2-V2-Instruct and other LLM360 models are Cerebras models.