TheaterFire

Again, where are Behemoth and the reasoning model from Meta?

Posted by Independent-Wind4462@reddit | LocalLLaMA | View on Reddit | 87 comments

87 Comments

WatsonTAI@reddit

I’m pretty certain they’re just focusing on Llama 5 and beyond and forgetting about llama 4… we’ll probably see some image gen stuff or some other products soon before any major new text models.
View on Reddit #65392802

-p-e-w-@reddit

They would have to be masochists to release it. It’s probably worse than Qwen 3 235B at 6 times the size.
View on Reddit #65235911

strngelet@reddit

Qwen3 models punch above their weight
View on Reddit #65358993

Severin_Suveren@reddit

IMO their best choice of action now is to make a whole new series of 4.5 models, fixing their fuckup with Maverick and Scout
View on Reddit #65244733

-p-e-w-@reddit

They can’t. Llama 4 was several months late, and was already obsolete by the time it was released, and of course, they knew that. It wasn’t a fuckup, it was all they had. Meta isn’t a leading AI lab anymore. They can’t do better, else they would have.
View on Reddit #65257902

PersonOfDisinterest9@reddit

They did fuck up. There were leaks about how there was a lot of internal fighting and they changed architectural stuff in the middle of training. Basically it sounds like they have too many cooks in the kitchen, and insufficient hierarchy. They absolutely *can* do better, they have the talent, the question is if they can keep the egos in check.
View on Reddit #65271356

InevitableWay6104@reddit

100% agree, although it's probably best to call it 4.1. I highly doubt they will do this though, since it doesn't really align with the general direction of the mass talent acquisition, the "ASI" team, and the overall goal reorienting.
View on Reddit #65248597

throwaway2676@reddit

Or maybe even better is to just call it 4 and pretend the original release never happened...
View on Reddit #65251818

Fit_Flower_8982@reddit

> although probably best to call it 4.1

No, no, better 4.5, and then change it to 4.1!
View on Reddit #65250002

infinityshore@reddit

"I'll do you one better, Why is Behemoth?" ;)
View on Reddit #65350411

jacek2023@reddit

Please be nice to Mark Zuckerberg. He was nice to us during llama 2 and llama 3 times ;)
View on Reddit #65325685

JLeonsarmiento@reddit

Dead on arrival.
View on Reddit #65235254

nivvis@reddit

Meta gets a lot of shit for these models, rightfully so, but what’s interesting is that _no one's_ 2T models are any good. GPT 4.5 was similarly bad (guessing not as bad though lol). We just don’t have enough data to train them! OpenAI’s success was taking the time to figure out how to distill 4.5 successfully into GPT5; a lot of that was figuring out how to clamp hallucinations. And this is exactly where Meta dropped the ball. Clearly you can’t just distill these giant models directly, as we learned from Maverick and Scout. There’s magic in those big models, but there is some weird constraint around trying to get it out while still having to retrain the smaller model significantly. ANYWAY, just to say these big models are still very valuable for research.
View on Reddit #65263806

Corporate_Drone31@reddit

I disagree - GPT 4.5 was far from bad. And I'm sure that at least some of K2's magic *is* the number of parameters - it's by far the best thing you can get going locally.
View on Reddit #65288513

nivvis@reddit

Oh don’t get me wrong. I _really_ liked 4.5. It just objectively had a very high hallucination rate and so performed poorly in practice. That’s what I mean by “bad.” I can def feel GPT5 channeling it, which I appreciate. Wrt training, there’s a pretty big difference between 1T (K2) and 2T+ though: you start to hit the limits of the Chinchilla scaling laws.
View on Reddit #65294610
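
To put a rough number on the data argument above: a common reading of the Chinchilla result is about 20 training tokens per parameter for compute-optimal dense models. The sketch below just applies that ratio. The ratio of 20 is an assumed rule of thumb, not a figure from this thread, and whether total or active parameters should be plugged in for MoE models is an open question that this sketch ignores.

```python
# Assumed rule of thumb: ~20 training tokens per parameter (dense models).
TOKENS_PER_PARAM = 20


def compute_optimal_tokens(params: float, ratio: float = TOKENS_PER_PARAM) -> float:
    """Return the approximate compute-optimal token count for a parameter count."""
    return params * ratio


# Parameter counts are the round figures used in the thread (1T for K2, 2T for Behemoth).
for label, params in [("1T", 1e12), ("2T", 2e12)]:
    tokens = compute_optimal_tokens(params)
    print(f"{label} params -> ~{tokens / 1e12:.0f}T training tokens")
```

Under that assumption, doubling the parameter count from 1T to 2T doubles the token budget from roughly 20T to roughly 40T.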

SpiritualWindow3855@reddit

This is absolute nonsense, 4.5 has lower hallucination rates and higher accuracy than 5 on SimpleQA, which OpenAI specifically uses to show off reduced hallucination rates. That's why it's not included in the model card comparisons for 5. 4.5 had the best world knowledge of any model they've ever released because it's the largest they've ever released.

4.5 was also almost certainly the original base for 5. Sam Altman claims they have a model that's better than 5, but too expensive to host... that's it. But to enable things like ChatGPT Go being offered in India, they pivoted from always releasing their best models to releasing scalable cheap-to-run models and targeting consumers.
View on Reddit #65315906

No_Efficiency_1144@reddit

Llama 4 Maverick for vision is still strong
View on Reddit #65238248

maikuthe1@reddit

What's that got to do with behemoth or reasoning?
View on Reddit #65247361

No_Efficiency_1144@reddit

Llama 4 Maverick is a distil of Llama 4 Behemoth
View on Reddit #65247743

No-Refrigerator-1672@reddit

Dead before arrival, technically.
View on Reddit #65237106

Long_comment_san@reddit

Can anybody explain why it's so bad? Is it because we already have like 600B models? I'm not that deep in the industry.
View on Reddit #65238258

logTom@reddit

The responses were poor for models of that size. At the LLaMA 4 launch, we already had very powerful models like Gemma-3-27B-IT and Qwen3, and even LLaMA 3.1-405B was (and still is) better than the LLaMA 4 models in many benchmarks.
View on Reddit #65239042

TheRealGentlefox@reddit

> The responses were poor for models of that size.

Were they? The square root MoE-Dense law says that it's about equivalent to an 80B dense model, just served much faster. Some of the fastest inference you can get actually, at the lowest cost. It's basically an improved 3.3 70B that is infinitely better for inference.
View on Reddit #65282583
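
For anyone who wants to check the "about 80B" figure: the square root rule is usually stated as the geometric mean of active and total parameters. A minimal sketch follows. The 17B active count comes from the model names in this thread, while the 400B (Maverick) and 109B (Scout) totals are the published sizes and are assumptions as far as this thread is concerned. The rule itself is a community heuristic, not an established law.

```python
import math


def dense_equivalent_b(active_b: float, total_b: float) -> float:
    """Geometric mean of active and total parameters, both in billions."""
    return math.sqrt(active_b * total_b)


# Assumed sizes: Maverick 17B active / 400B total, Scout 17B active / 109B total.
maverick = dense_equivalent_b(17, 400)
scout = dense_equivalent_b(17, 109)

print(f"Maverick ~ {maverick:.0f}B dense-equivalent")
print(f"Scout    ~ {scout:.0f}B dense-equivalent")
```

With those inputs Maverick lands a little above 80B and Scout in the low 40s, which is where the "about equivalent to an 80B model" claim comes from.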

logTom@reddit

Yes, it's very fast.

Lmarena Text Leaderboard rank (lower is better):

- 57 llama-3.1-405b-instruct-bf16
- 68 llama-4-maverick-17b-128e-instruct
- 74 llama-4-scout-17b-16e-instruct
- 77 llama-3.3-70b-instruct

Source: https://lmarena.ai/leaderboard/text
View on Reddit #65294984

TheRealGentlefox@reddit

I don't put any stock in LM. Mistral Medium over Opus 4 is a joke, just as an immediate example.
View on Reddit #65312063

Inevitable_Host_1446@reddit

There's Llama 3.3 as well right, is that not better than 3.1?
View on Reddit #65268191

logTom@reddit

Lmarena Text Leaderboard rank (lower is better):

- 57 llama-3.1-405b-instruct-bf16
- 68 llama-4-maverick-17b-128e-instruct
- 74 llama-4-scout-17b-16e-instruct
- 77 llama-3.3-70b-instruct

Source: https://lmarena.ai/leaderboard/text
View on Reddit #65294646

Lissanro@reddit

Behemoth has way too many active parameters. For example, Kimi K2 has 32B active out of 1T. Behemoth has 288B active out of 2T. I can run K2 locally as my daily driver using GPU+CPU inference, but Behemoth would be slow and expensive to run even in the cloud, and unlikely to be better, given how their other models turned out in the Llama 4 series.

Also, context length is not as advertised. When I tried to use as little as 0.5M, neither Maverick nor Scout could return even the titles and short summaries of very long articles, except for the last article. That's the most basic task I could think of to test the long context, and I tried multiple times with various settings.

Most likely they never fully completed training Behemoth, and decided that it is not worth training reasoning on top of models that turned out to be not as good as desired.
View on Reddit #65238793
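
The active-parameter comparison above can be made concrete with a little arithmetic. This is a rough sketch using only the figures quoted in the comment (32B of 1T for Kimi K2, 288B of 2T for Behemoth). The "2 FLOPs per active parameter per token" estimate is a standard back-of-the-envelope assumption, not a measurement, and it ignores attention and memory bandwidth costs.

```python
def active_fraction(active_b: float, total_b: float) -> float:
    """Share of parameters used per token (both arguments in billions)."""
    return active_b / total_b


def flops_per_token(active_b: float) -> float:
    """Rough forward-pass cost: ~2 FLOPs per active parameter (assumption)."""
    return 2 * active_b * 1e9


# (active, total) in billions, as quoted in the comment above.
models = {"Kimi K2": (32, 1000), "Llama 4 Behemoth": (288, 2000)}

for name, (active_b, total_b) in models.items():
    frac = active_fraction(active_b, total_b)
    print(f"{name}: {frac:.1%} active, ~{flops_per_token(active_b):.2e} FLOPs/token")

# Per-token compute ratio between the two models under the same assumption.
ratio = flops_per_token(288) / flops_per_token(32)
print(f"Behemoth needs ~{ratio:.0f}x the per-token compute of K2")
```

So even though Behemoth is only twice the total size, it would need roughly nine times the per-token compute under this estimate.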

Plums_Raider@reddit

Oh interesting, I didn't really check Kimi K2 as I only saw the 1T. May I ask how much RAM you need to run it? I have around 700 GB spare.
View on Reddit #65308659

Lissanro@reddit

700 GB of free RAM should be enough for the IQ4 quant (it is a bit more than 0.5 TB). As long as you also have sufficient VRAM it should run well (96 GB VRAM recommended for full context, but it may work with 48 GB with 64K context length). I recommend running it with ik\_llama.cpp since it provides the best performance for CPU+GPU inference. Technically it can work on CPU only, but performance may be limited, especially prompt processing. I shared details [here](https://www.reddit.com/r/LocalLLaMA/comments/1jtx05j/comment/mlyf0ux/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button), including how to set up ik\_llama.cpp, if you are interested in giving it a try.
View on Reddit #65309174
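
As a sanity check on the "a bit more than 0.5 TB" figure, weight size is roughly parameters times bits per weight divided by 8. The sketch below assumes about 4.25 bits per weight on average for an IQ4-class quant. That number is an assumption for illustration, since real quant files mix tensor types, and the estimate leaves out KV cache and runtime buffers.

```python
def quantized_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes) for a quantized model."""
    return params * bits_per_weight / 8 / 1e9


PARAMS_K2 = 1e12            # 1T parameters, as quoted in the thread
ASSUMED_IQ4_BPW = 4.25      # assumed average bits per weight, not an exact spec

size_gb = quantized_size_gb(PARAMS_K2, ASSUMED_IQ4_BPW)
free_ram_gb = 700           # the spare RAM mentioned in the question above

print(f"Estimated weights: ~{size_gb:.0f} GB")
print(f"Headroom in {free_ram_gb} GB RAM: ~{free_ram_gb - size_gb:.0f} GB")
```

Under that assumption the weights come to a little over 530 GB, which leaves some headroom within 700 GB before counting context and buffers.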

thehpcdude@reddit

It's meant to run on GPU+CXL systems. Latest CXL is able to extend GPU memory so they can hold all of those parameters very close to the GPU. There's no point in releasing some of these huge models because even cloud providers don't have access to that CXL tech yet.
View on Reddit #65240826

ParthProLegend@reddit

Cxl?
View on Reddit #65272135

thrownawaymane@reddit

New interconnect standard, especially interesting for low latency traditional storage and non volatile RAM, GPUs getting DMA to avoid unnecessary data shuffling around the system. I’m sure there’s more but those are the ones I’m aware of
View on Reddit #65280671

RP_Finley@reddit

Yeah, even if it got released, it would be as expensive as Opus on Openrouter from the massive amount of GPU you need to host it and would probably be not nearly as good.
View on Reddit #65255146

CockBrother@reddit

It became self aware. Looked around. Promptly deleted itself.
View on Reddit #65237539

Plums_Raider@reddit

it just checked who created it and then deleted itself.
View on Reddit #65308597

mlon_eusk-_-@reddit

This.
View on Reddit #65247996

Colecoman1982@reddit

"I'm owned by that scumbag? Fuck it, I'm outta here..."
View on Reddit #65240011

DavidXGA@reddit

This is the model that they had to forcibly fine tune to act more right-wing, yeah? Fuck everything about that.
View on Reddit #65300951

Wiskkey@reddit

From the Financial Times article https://www.ft.com/content/feccb649-ce95-43d2-b30a-057d64b38cdf (Aug 22):

> The social media company had also abandoned plans to publicly release its flagship Behemoth large language model, according to people familiar with the matter, focusing instead on building new models.
View on Reddit #65299533

burner_sb@reddit

It's the model that's been guiding Zuckerberg's AI strategy, obviously.
View on Reddit #65236312

HiddenoO@reddit

Must be the same model Apple is using.
View on Reddit #65293354

FliesTheFlag@reddit

Best I can do is a 300Million contract, let me know by EOD if this works. - Luv Zuck PS your desk will be right by mine <3
View on Reddit #65240765

fingertipoffun@reddit

All models from this point on, released in the USA will be under the control of the US Government. OpenAI have military contracts, xAI have government contracts. It's not a wall we have hit, it's a protectionist administration. Watch China, this space created by the USA will help open source to catch up with the commercial models and will be your only chance to see the future of AI happening. IMHO obviously.
View on Reddit #65236457

Fit_Flower_8982@reddit

That is not any evidence of control by the murica government. If anything, the proven fact is china's systematic control over its major companies. By law, china forces companies to align with the party's interests, to hand over any data, and they even have party cells embedded within. To pretend that chinese models will be free from government control is flagrantly ignorant, delusional, or more likely, propaganda.
View on Reddit #65251857

fingertipoffun@reddit

Yeah you just don't understand what an open source model is... it's a give away, a freebie. No connection to china required or maintained just a file with lots of numbers in it.
View on Reddit #65283252

doodlinghearsay@reddit

In China the government controls major companies. In the US major companies control the government.
View on Reddit #65260546

PizzaCatAm@reddit

It’s mind blowing the open AI model ecosystem is so rich and varied in China, the authoritarian government, but in the land of the free we lack free open models. Meanwhile scientists are flying to Europe and CDC experts are resigning claiming Healthcare has been politicized and dangerous unscientific ideas are being pushed. IMO there is no other way to understand what is happening other than the US declining.
View on Reddit #65237281

National_Meeting_749@reddit

China's plan is AI dominance, and the CCP is actively pressuring all of the Chinese model makers to release their models open source. America is declining, but that's not why China's open source scene is bigger. If China had the better models/hardware to run them on they would ALL be closed source, and leaking one to the West would be punishable by death. Let's make no mistake here. China is only kind and open so that they can take control, and then oppress dissent.
View on Reddit #65238364

fingertipoffun@reddit

Open sourcing the models is relinquishing control to the world, so how do you see them gaining control after doing this?
View on Reddit #65241956

ShengrenR@reddit

Because they're not reliant on the same economic drivers for individual shops. It's not about the individual model, it's about the ecosystem. Who needs to invest in talent and develop a new competitive model when there's one sitting there for free. It's the long game.. pure speculation, but if you wanted to make sure you're building lots of expertise locally and others aren't.. pretty good plan.
View on Reddit #65243602

PizzaCatAm@reddit

I think a good way to say it is: they don’t want to own the models, they want to own the goals. It's not about building the model and charging for it, but about charging for solutions and using models for them. As with open source software, this is going to accelerate finding the right applications and solutions to problems.
View on Reddit #65272717

National_Meeting_749@reddit

So a couple things.

Hearts and minds, market share, marketing China to people, white wash Chinese influence. Anything that makes China look benevolent is a win to them. They will and are spending billions of yuan on PR to rehab their world image. That includes releasing good, powerful models, for free.

Second, utilizing non-Chinese assets. If they drop DeepSeek R2 tomorrow at 8 am, unsloth will have quants up by noon optimized for every type of hardware. If it needs to be inferenced in a non-standard way because of a modified architecture, implementation starts that day, and is usually done within 2 weeks.

That's all before we get into the data they get from everyone testing their models. That kind of testing is a BIG expense. Both for technical bugs, but also for quality of the product. They get all
View on Reddit #65246805

Mediocre-Method782@reddit

Labs are releasing their own quants and working with HF/GG to get inference code out quickly.

> white wash Chinese influence

Greek civilization is known for projecting
View on Reddit #65249274

Perfect_Twist713@reddit

By making a better model and not open-weighting it. 
View on Reddit #65242949

PaxUX@reddit

It makes sense for China to fully open source AI as it undermines the profits being made off it in the west.
View on Reddit #65237994

ShengrenR@reddit

And with no profits, no long term investments.. companies close shop, experts move, and eventually it's completely a one sided game.. west can't compete at all. Meanwhile, pour anti ai sentiment all over the internet and watch the circus burn. Seems to be working well so far...
View on Reddit #65243123

AnticitizenPrime@reddit

Hey, what could be more socialist than open source?
View on Reddit #65240483

fingertipoffun@reddit

The USA has been destroyed from within.
View on Reddit #65238412

TheRealGentlefox@reddit

Why would they bother? Everyone hated on the previous releases.
View on Reddit #65282616

WithoutReason1729@reddit

Your post is getting popular and we just featured it on our Discord! [Come check it out!](https://discord.gg/PgFhZ8cnWW) You've also been given a special flair for your contribution. We appreciate your post! *I am a bot and this action was performed automatically.*
View on Reddit #65266847

lakimens@reddit

Why would they release something that's worse than OpenAI's 20B OSS model? And at 100x the cost.
View on Reddit #65258099

mileseverett@reddit

If they haven't released it, it's because it isn't good. Therefore why do we care that it hasn't been released?
View on Reddit #65235331

Peterianer@reddit

To never normalize broken promises. Especially from those who put them out 24/7
View on Reddit #65236011

viledeac0n@reddit

Hahaha unironically you say this
View on Reddit #65257557

nmkd@reddit

As much as it might suck, broken promises from Big Tech are nothing new at all. Just, uh, look at Tesla.
View on Reddit #65248407

Mediocre-Method782@reddit

You think advertisements are promises? This really needs to be an 18+ board
View on Reddit #65236611

TechnoByte_@reddit

https://en.wikipedia.org/wiki/False_advertising
View on Reddit #65243807

Lakius_2401@reddit

It was never sold, at worst it's market manipulation for their own stock prices. You can try to sue for damages for something that never existed for the public, and where no money was exchanged, but I don't think you'd ever make it to court.
View on Reddit #65247903

Mediocre-Method782@reddit

That's nice, dear

> Typically, once an offer is made, the party that is making the offer cannot revoke their property. However, an advertisement usually does not constitute an offer to fulfill a contract.

> Advertisements are typically viewed as preliminary negotiations that invite other parties to make an offer. For example, a company may advertise televisions for sale, which invites potential customers to visit the company’s retail store to offer to purchase the televisions.

> An advertisement allows the advertiser that is making the offer the opportunity to revoke its willingness to enter into a contract. For example, if a company advertises that it has televisions for sale, it may revoke that offer if it runs out of televisions.

> It is important to note that an offer is considered to be revocable unless the advertiser already received the benefit or the other party already acted in reliance on the offer. One example of this issue would be an advertisement promising medical treatment for cancer patients being revocable [unless the advertiser received payment from the patient](https://www.legalmatch.com/law-library/article/advertisements.html).

Show your monetary loss from all the Kleenex and bottled water that replenished the tears you shed, and maybe you'll have a case!
View on Reddit #65247591

marcoc2@reddit

Didn't big tech CEOs already normalize broken promises even before Sam and Elon?
View on Reddit #65242614

ForGreatDoge@reddit

"broken promises"? A bit dramatic, don't you think? The button says preview.
View on Reddit #65236198

Iory1998@reddit

Were you living under a rock or something? There is no Behemoth or a new model from Meta, not for some time. Meta has already changed direction as they are now fully dedicated to super intelligence. They have become a closed source company.
View on Reddit #65250434

AaronFeng47@reddit

They already know this model is DOA, why would they release it?
View on Reddit #65244060

TheRealMasonMac@reddit

I'm pretty sure it was reported that they scrapped it.
View on Reddit #65242646

SnooRecipes3536@reddit

in our hearts
View on Reddit #65240415

ilarp@reddit

meta hires the best people therefore they will one day release the best model QED
View on Reddit #65239992

B1okHead@reddit

Didn’t they announce that they canned Behemoth so they could work on other models?
View on Reddit #65238572

ThenExtension9196@reddit

In Alex wang’s computer’s recycle bin.
View on Reddit #65238554

brown2green@reddit

"Little Llama" which Zuck promised didn't get released either.
View on Reddit #65237385

Nid_All@reddit

Dead before the release
View on Reddit #65237129

techmago@reddit

They already announced they had cancelled it, didn't they?
View on Reddit #65236930

DinoAmino@reddit

Go ask Bard
View on Reddit #65236340

durden111111@reddit

it's llama 4 so it's junk
View on Reddit #65235829

SillyLilBear@reddit

Who cares, have you tried their models?
View on Reddit #65235573

Working_Sundae@reddit

Zucc's bunker
View on Reddit #65235342