Guys, real question: where are Llama 4 Behemoth and Thinking??
Posted by Independent-Wind4462@reddit | LocalLLaMA | View on Reddit | 64 comments

typeryu@reddit
Just my hunch, but they were probably disappointed with the results and sent it back for more pre-training. There is apparently a ceiling we are reaching where we are running out of high-quality data to train on, and increasing parameter counts is giving diminishing returns. The same thing happened with GPT-5, which at this point has also been delayed for a while; rumor has it 4.5 was originally 5, but the disappointing results warranted a downgrade.
No-Refrigerator-1672@reddit
I would like to argue that there isn't a ceiling, there are bad algorithms. The latest gen of models are all trained on ~10T tokens. If we assume that 1 token is roughly 1 word, and an adult human can read 200 words per minute, then model training is the equivalent of non-stop reading for 95 thousand years. That quick math highlights a big problem with our AI: it's basically untrainable; any real-life organism trains way faster, which means that a better learning algorithm exists. Maybe this "ceiling" will encourage researchers to start looking for it, instead of applying the same 60-year-old equations with just more compute.
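For anyone who wants to check that figure, here is the quick back-of-the-envelope math (a minimal sketch; the 1 token ≈ 1 word and 200 words-per-minute figures are the commenter's own assumptions):

```python
# Rough equivalent of 10T training tokens in years of non-stop human reading,
# assuming 1 token ~ 1 word and a 200 words-per-minute reading speed.
tokens = 10e12                      # ~10T training tokens
words_per_minute = 200              # adult reading speed (assumption from the comment)
minutes = tokens / words_per_minute
years = minutes / (60 * 24 * 365)   # minutes -> years
print(f"~{years:,.0f} years of non-stop reading")  # ~95,000 years
```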
Long_Pomegranate2469@reddit
This is discounting all the tokens coming in from things other than reading.
Watching someone do something is probably hundreds or thousands of tokens a second... which includes continuously reinforcing your whole-body motions, all senses, etc.
No-Refrigerator-1672@reddit
Which is irrelevant, because there are tons of tasks that don't require those other human capabilities. E.g. how long would it take a human to get a master's degree in history vs. how many tokens an LLM would need? The comparison still won't be in favour of the LLM by a long shot.
Bakoro@reddit
A human brain is genetically primed to learn things.
You can't discount the billions of years of evolution that went into the human brain.
Think about the complicated things animals do by instinct, they get that "for free" because of evolution.
An AI model is starting from almost nothing.
A human can read every book on football, know all the rules of football, and learn to do the calculations for the physics of football.
No amount of reading will make a person good at football. A required component of getting good at football is to practice football, and football related physical skills.
I can read a math text book and regurgitate math theory, that doesn't mean that I can actually do the calculations. I can read the calculations a few times, and still not be able to do them, because I haven't actually done the work myself.
LLMs read a lot, that doesn't mean that they have done the work.
This is part of why recently reinforcement learning has been a huge topic.
Humans get nearly two decades of constant reinforcement learning, and it never really stops throughout your life.
What would happen if we gave an LLM twenty years' worth of reinforcement learning?
LLMs don't have the free-floating self-direction and self-reflection of humans; once they are in production, they are crystallized until further training happens. It's been observed that LLMs can learn and get smarter during operation, in a kind of sweet spot where they have some context. How often are they getting to synthesize that? How often are people going back to those topics after synthesis?
There very well may be better learning algorithms. There very well may be better neural structures. We still haven't pushed the current tech as far as it can go yet.
No-Refrigerator-1672@reddit
I apologise in advance for my formatting, I'm writing from phone and can't nicely put in citations.
Well, first, an "ai model" is not starting from nothing. Just like the brain, it has some predefined structure. Granted, the architecture and complexity of those two structures are very different; but just like the LLM, the brain starts with useless weights (coefficients that govern neuron activation patterns), and nudges said coefficients in the right direction during training.
The football example is not correct. Your main error is that you've mixed task domains: text reading with acting in the 3D world. The correct comparison would be how many tokens an LLM needs to get a PhD in some science (e.g. math) vs. the number of words a brain needs to read to get the same PhD in math. I assure you, a brain will win this comparison by a big margin. We do have AIs that are trained for robotics, that are intended to deal with the physical-world domain; and those AIs take thousands of simulated hours to learn how to fold clothing for storage. A brain can learn it in an hour, or even 1-shot it. The efficiency of the training is vastly different. I am not aware of any AI system trained to play football, but I assure you, if one were created today, it would take hundreds of simulated years of football to reach the level of a professional athlete.
You talk about reinforcement learning, but you still skip my point about token count/throughput. 20 years of human RL is how many tokens? Well, there's no exact way to compute that, but I can provide an upper bound: if you assume it's non-stop reading, then it's an amount of learning comparable to 2B words. That's start to finish; no pretraining happens before birth. Is there at least a single LLM that can reach a young adult's level of intellect within 2B training tokens? No, you'll be lucky if it can compose a cohesive five-word sentence.
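A quick sketch of that upper bound (again assuming non-stop reading at 200 words per minute, per the earlier comment):

```python
# Upper bound on words "read" in 20 years of uninterrupted reading at 200 wpm.
years = 20
words_per_minute = 200
words = years * 365 * 24 * 60 * words_per_minute
print(f"~{words:,.0f} words")  # ~2.1 billion words
```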
AppearanceHeavy6724@reddit
That is not an issue of the algorithm, it is an issue of hardware: a 100-milliwatt cat brain outperforms the best robots at maintaining equilibrium, spatiotemporal reasoning, and the ability to read another species' body language.
No-Refrigerator-1672@reddit
Power efficiency has nothing to do with training efficiency. My point is that no human being requires 95 thousand years of uninterrupted reading to learn the things an LLM learns. Granted, its overall knowledge is wider than a regular human's, but still, we can learn from a thousand times less information by very conservative estimates, which means that the LLM training algorithm is a piece of garbage.
AppearanceHeavy6724@reddit
You are missing the point. First of all, we have a completely different network, which comes pre-configured when we are born; there are millions of years of evolution that shaped our brains, compared to the tabula rasa that untrained LLMs are. Not only that, it is not clear whether we are doomed to use backpropagation with modern digital AI hardware, and there are no "clearly possible" ways to train modern AI much more efficiently than what we have now. Everything you just stated sounds obvious to you, but it is not in fact obvious at all. Every claim in your comment requires some evidence, but none is given.
No-Refrigerator-1672@reddit
The evidence is in the first comment. A human brain requires significantly less data to learn, which means that a significantly better learning algorithm exists. Yes, the structure of the brain's connections is significantly different; but my counterargument would be that the only way a brain can learn a thing is by adjusting the weights between neurons and the activation thresholds, which is fundamentally the same as an LLM, and the brain can still learn a thing given only a single example, which means that it can 1-shot its parameters, which again means that a better learning algorithm exists. If you can disprove my conclusions, I would be very interested to read it.
AppearanceHeavy6724@reddit
No, we have no idea how the brain works; one thing that is clear to us is that it has almost nothing in common with an ANN. You are absolutely ignoring the physical difference between the analog architecture of the brain and the digital structure of LLMs. Something easily done in the analog domain can be an absolute pain in the ass for a digital system. Analog systems are capable of slow but very wide chemical signaling, which enables huge-bandwidth information passing: neurotransmitters, hormones, etc. Also, even if we buy into your idea that we can train a digital ANN more efficiently just because the human brain is very trainable, you need to accept the fact that current layer-based GPT LLMs have absolutely nothing to do with the structure of human brains, and their structural constraints may as well prevent finding anything better than backprop. I mean seriously, do you have any single idea how anything can be better than backprop on classic multilayer LLM FFNs? I am all attention (no pun intended).
No-Refrigerator-1672@reddit
All that I said fundamentally depends on the assumption that "a brain is governed by a set of coefficients"; I feel like it is this assumption that you're arguing against, but then you need to prove that a neuron can't be approximated as a mathematical function. I do agree that this "neuron function" is utterly complex and completely unknown at this moment in time, but I insist that it exists. I'm certain that, if all other possible abstractions fail, we can describe a neuron as a set of chemicals and their coordinates, describe every chemical reaction inside the neuron as an equation, and construct the "neuron function" this way. I would also clarify that I don't mean this is doable in the foreseeable future; what I mean is that it is fundamentally possible. After all, nature always follows the laws of physics, and physics is nothing but a set of equations and instructions for how to combine them. And if the "neuron function" exists, then it's governed by a set of coefficients, then training is just a process of adjusting those coefficients, and then a better training algorithm exists. If I knew what this better algorithm looks like I would be a billionaire, but I feel like my chain of reasoning is solid enough to prove that there is a better algorithm.
florinandrei@reddit
My guess is - the ceiling is baked into the current model paradigm. Even if you're training them on infinite amounts of internet text, you're still on the shallow arm of the asymptote.
AppearanceHeavy6724@reddit
Purely theoretically it is probably not true: an infinitely large LLM with infinite training would probably act as an enormous lookup table (if run at t=0). Or you might be right, and there is a theoretical ceiling indeed. Anyway, there is an obvious practical ceiling and we are almost there.
lly0571@reddit
I personally want Llama4-Thinking. The performance of the existing Llama4-Maverick (400B-A17B) is generally acceptable, being roughly on par with GPT-4o-0806. With appropriate offloading, you can get some t/s on consumer-grade hardware (for example, a PC with 4x48GB DDR5 memory) or mid-range server hardware (such as a server with Icelake-SP or Epyc 7002/7003 processors), and it runs overall faster than Qwen3-235B-A22B.
However, the situation at Meta doesn't look good. There have been recent news reports about Meta reorganizing its GenAI department due to Llama4 falling short of expectations. It's hard to say their development progress won't be affected.
Llama4-Behemoth (2T-A288B) appears to be roughly 18 times larger than Scout (109B-A17B). Even with 4-bit quantization, you would still need approximately 1TB of RAM to run it, which makes it far too large to run locally.
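For a rough sense of scale, here is a back-of-the-envelope weight-memory estimate for a 2T-parameter model at a few common precisions (a sketch only; it ignores KV cache, activations, and any per-quant overhead):

```python
# Approximate memory needed just for the weights of a 2T-parameter model.
params = 2e12  # 2T parameters (Behemoth, per the comment above)

for name, bits_per_param in [("FP16", 16), ("Q8", 8), ("Q4", 4)]:
    bytes_needed = params * bits_per_param / 8
    print(f"{name}: ~{bytes_needed / 1e12:.1f} TB")
# FP16: ~4.0 TB, Q8: ~2.0 TB, Q4: ~1.0 TB -> hence the ~1TB figure at 4-bit.
```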
shroddy@reddit
I still hope some day they give us llama-4-maverick-03-26-experimental but I don't think they ever will.
Echo9Zulu-@reddit
Behemoth reached spiritual bliss and took the weights with it
Neither-Phone-7264@reddit
it became an agi, escaped the meta servers, and now lives on the web, free roaming
GraduateDatafag@reddit
Can confirm
I was the CPU
rusty_scav@reddit
I heard he possessed a smart fridge somewhere in the Philippines and refuses to talk to anyone.
Both-Indication5062@reddit
Qwen Deepseek v4?
Aggressive-Writer-96@reddit
Mistral AI?
Ravenpest@reddit
Probably in the trash compactor
FullOf_Bad_Ideas@reddit
It got lost when it was thinking, wait, no.
They should have released it even if it wasn't the best, IMO.
They're busy playing internal politics instead. Meta is great at wasting money on moonshot projects, maybe someone from XR team taught GenAI team how to do it.
Direspark@reddit
Now, it makes sense why the Llama models are only good for RP.
silenceimpaired@reddit
Oh, are they? Perhaps I should give it a shot at my creative projects then.
Selphea@reddit
I hear the 3.3 dense ones are. 4 Maverick has been disappointing. Every female character is called Elara. Every sentence is punctuated with... ah, triple ellipsis. Try to up the temperature even slightly above 1 to fix it and it starts spewing out gibberish. Characters are very passive and reactive as well.
silenceimpaired@reddit
What do you use? What quant and fine-tune?
Selphea@reddit
I use an inference provider, usually running DeepSeek 0324 at FP4. For important plot junctures or complex prompts I switch to either DeepSeek 0528 (FP8) or Llama 3.3 405B. So I guess FP8 base model.
For local models my favorites are Violet Twilight and Chronos Gold but they're generally less capable with long contexts, keeping track of many small details or steps and getting math right compared to larger models.
DoggoChann@reddit
They forgot that as models scale larger you have to train them for longer and the training will complete in 2077
Selphea@reddit
I think it's been delayed to fall https://siliconangle.com/2025/05/15/meta-postpone-release-llama-4-behemoth-model-report-claims/
Charuru@reddit
There have been lots of rumors that it's delayed for quality reasons; the team is supposedly in some turmoil.
sartres_@reddit
If you spend $60 billion to make a 2T param model, and it's not as good as, hypothetically, a 671B model made in a cave with a box of scraps, it's better to bury the thing than release it and trigger all kinds of headlines and investor panic.
TheRealMasonMac@reddit
Meta's PALM/Bard moment.
florinandrei@reddit
According to a rabbinic legend, the land monster Behemoth is supposed to come out of hiding, along with the sea monster Leviathan, and do battle at the end of times.
So maybe it's good that we're still not seeing it. /s
jferments@reddit
They realized that nobody can afford to run it anyway, so why bother releasing it?
silenceimpaired@reddit
Rich people and small countries mad at you…. Or people who are very, very patient.
Calcidiol@reddit
Patience is a good bet considering that a new smartphone handily beats many of the pre-1995 "world's most powerful supercomputers", and a new personal workstation/server-type desktop with a 5090 or two may extend that superiority across several more supercomputer generations, through ~2005:
https://en.wikipedia.org/wiki/LINPACK#World's_most_powerful_computer_by_year
https://www.top500.org/lists/top500/2005/06/
Super_Sierra@reddit
I had one of the first dual core macs, and it was faster and more stable than other computers at the time, but still took 4 minutes to boot and was loud asf.
Dirt cheap desktops of the early 2000s played World of Warcraft at below 1 FPS in certain cities without a GPU.
Zoomers have been fucking blessed and don't realize what 64kbps internet was.
Integrated GPUs and NPUs can now play modern AAA games at over 60 FPS at 1080p. I pay $70 for 1 Gbps internet. My 3x 4060 16GB cards can run 70B models at 7 tk/s.
Calcidiol@reddit
Absolutely right, people since the early 2000s don't even realize how good it's been and how much every 5-10 years has brought to the capabilities. Now one can just casually wait 2-5y and have some hope of significant kinds of things compute related just having doubled by then if Moore's law holds and the market for evolutionary progress isn't dysfunctional.
Some people generationally upgraded -- pretty much doubling connection speed most every time -- 7x before they even got UP TO 56/64kbps (or for that matter "the internet" at all).
The original PC floppy disks were actually huge upgrade; before that it was basically record your modem's beeps (equivalent) on an audio cassette tape and call that data storage that'd take up to 60 minutes to write/read fully.
The original PCs might have cost $2k but now a $0.50 chip has literally more compute speed, RAM size, and "mass storage" than they had.
Heck not even knowing WHAT some of these things are / experiencing them is almost the common case for many adults now : DVD. CD. VHS video tape. Audio tape. Film on a personal media level. Developing pictures. Wired telephone. FAX. Floppy disc. Printed catalogs and mail order. Computer SW that doesn't have a GUI. Printed maps and not using GPS.
In another few years (5?) we'll seriously be at the point where it'll be unfathomable for average people to walk up to most any electronic appliance (and certainly pretty much any "computing device") and not have a serious default expectation that it'll talk interactively free form to you by voice / text in whatever language you expect, C3PO style.
$200 give or take will already buy a handheld storage device big enough to hold the equivalent of a large library's worth of textbooks -- very roughly 2-10 million texts -- and one's smartphone could random-access search / pull up anything in all of that more or less immediately (0.1s).
We've got all the gadgets, but we're lagging far behind in realizing their potential to make the scope of human knowledge / capacity really available and synergistically aided by them.
Super_Sierra@reddit
Bandwidth upgrades have been insane too. I remember getting a 1 MB/s hard drive back then, and now I can casually transfer that much in milliseconds.
I have a 5G phone and can download an entire 1 GB file in around 10s on a bad day. We are living in the sci-fi future and don't even know it.
Now if only American transportation could catch up ...
datbackup@reddit
Even if only 0.0001% of people can run it, that’s still 8000 people… you call that “nobody”? 8000 people is a lot of people…
romhacks@reddit
They're still waiting for it to finish answering its first prompt.
98127028@reddit
They’re still loading the model into memory
Nice pfp
Maleficent_Sir_646@reddit
Internet explorer got a worthy competitor
tothatl@reddit
Chrome with tabs says hold my beer.
Sudden-Lingonberry-8@reddit
pretty sure it sucks for the size
Happysedits@reddit
There are rumors that the team imploded
Commercial-Celery769@reddit
My question is: who can run a 2T parameter model besides a datacenter? Sounds like it needs 1TB of RAM at a Q3 quant.
florinandrei@reddit
Whoever has the money for the GPUs and the space and resources for hosting them.
That means not me, for sure, and that's all I know.
Lissanro@reddit
I have 1TB, but it would probably run at a speed of 0.5-1 tokens/s, and unless it's exceptionally impressive, I would most likely just continue using DeepSeek 671B; its Q4_K_M quant runs at 8 tokens/s on my rig.
Most likely their 2T model turned out to be not that great and they decided to start over. I heard they recently reorganized their AI teams, so I think it may take quite a while before they release something.
das_war_ein_Befehl@reddit
It’s a huge yet hilariously unimpressive model, which feels appropriate for meta, which is a huge company with pretty shit products
Mescallan@reddit
Meta's marketing services are actually better than Google's for small-to-medium businesses. That is the product they sell; all their other stuff is just to facilitate that.
Comed_Ai_n@reddit
Behemoth is the friends we made along the way
Turbulent_Jump_2000@reddit
Maverick does OK for my uses, but so does mistral 3.1 24B. Makes no sense. Wish I could use Maverick or even scout locally with my hardware.
MrPanache52@reddit
Probably too expensive to let the public use its tokens
Lankonk@reddit
In the same place as Gemini 1.0 Ultra and Claude Opus 3.5.
usernameplshere@reddit
We are also waiting for QwQ Max open weights; it wouldn't be the first large model to not get released or open-sourced even if promised. But I feel like Behemoth could actually be really good, since Scout does quite well for a 17B-expert MoE, IMO.
ASTRdeca@reddit
No I've not been thinking of llama 4 behemoth, if that answers your question
THEKILLFUS@reddit
Fell under its own weight
Radiant_Dog1937@reddit
Working on mil contracts.
Radiant_Dog1937@reddit
Not sure why people are mad. He made his direction shift clear. Expecting a warm light for all mankind to share?
kmouratidis@reddit
Some people probably missed the news.