NVIDIA announces Nemotron 3 Ultra

[-]

FatheredPuma81@reddit

I feel like them comparing it to Qwen3.5 was intentional. It is indeed their best open weight model and look how it loses to everyone else.

[-]

Qwen3.5 397B I think actually does quite well in this case. It's a MoE model with 17B active vs. the whopping 55B, and it is only slightly worse in all the benchmarks aside from knowledge. I was able to get it running on a 16 GB VRAM laptop (with 192 GB of DDR5 RAM) at 2 tokens per second, which isn't that bad for my non-coding use case. I would not be able to run this new one even with 24 GB of VRAM; I'd probably need 48 GB.

[-]

tat_tvam_asshole@reddit

What quant? That makes a helluva lot of difference.

[-]

kawaii_karthus@reddit

Q3_K_XL.

[-]

michaelsoft__binbows@reddit

What kinda beefy ass lappy is that?

[-]

kawaii_karthus@reddit

MSI Raider, 18 inches... wouldn't recommend an 18-inch for portability. I also added RAM, luckily a year before RAM prices went up. It has 4x48 GB of RAM. I tried 2x64 GB sticks but they didn't work (maybe the newer MSI models will work, I dunno, because in theory you can get 4x64 GB = 256 GB). Also, some gaming laptops only have 2 RAM slots; luckily mine has 4. Though FYI, if you're using all 4 RAM slots, the memory gets automatically throttled on my laptop, and I am not gonna overclock it.

[-]

PaceZealousideal6091@reddit

Qwen loses to everyone else - Yes. But not by much. Qwen is giving mindblowing pound for pound value as compared to all the others.

[-]

StardockEngineer@reddit

I think they're comparing open weight models only.

[-]

ParthProLegend@reddit

But I'm all for Qwen fixing that situation :D

Me too.

[-]

LatentSpacer@reddit

It’s a MoE 550B-A55

[-]

dtdisapointingresult@reddit

55B active? That's way higher than any of the major modern open MoE. Even people with multiple DGX Sparks aren't gonna be able to run this at any reasonable speed.
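
A rough way to sanity-check the speed worry, as a back-of-envelope sketch rather than a benchmark: during decoding, every active weight has to be read from memory once per token, so memory bandwidth divided by the size of the active weights gives an upper bound on tokens per second. The 273 GB/s bandwidth figure and the 4-bit weight size are assumptions for illustration; the 17B comparison point is the Qwen3.5 active count quoted elsewhere in the thread. Multi-device setups split the work and add interconnect overhead, so treat this as a per-device ceiling only.

```python
def decode_tokens_per_s_bound(active_params_b: float,
                              bits_per_weight: float,
                              bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if every active weight is read once per token.

    Ignores KV-cache reads, interconnect traffic, and compute time, so real
    throughput will be lower.
    """
    active_weight_gb = active_params_b * bits_per_weight / 8
    return bandwidth_gb_s / active_weight_gb


# Assumption: ~273 GB/s memory bandwidth per device; swap in your own figure.
ASSUMED_BANDWIDTH_GB_S = 273.0

for name, active_b in [("55B active", 55.0), ("17B active", 17.0)]:
    bound = decode_tokens_per_s_bound(active_b, 4.0, ASSUMED_BANDWIDTH_GB_S)
    print(f"{name} at 4-bit: at most ~{bound:.1f} tokens/s per device")
```

The bound scales inversely with the active parameter count, which is why the jump from 17B to 55B active matters more for local speed than the total size does.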

[-]

techno156@reddit

At this rate, the MoE will need its own MoE.

[-]

baked_tea@reddit

Don't let the Chinese see this. Pretty sure after this comment they are actively developing the solution

[-]

Far-Low-4705@reddit

ik this is a joke, but if you actually did that, and for the 55b you had a 55b a5b or whatever, then the whole model would just be 550b a5b
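
The arithmetic here can be sketched in a few lines of Python. This is a simplification with one loud assumption: that all 55B active parameters sit in weights that could themselves be replaced by an inner MoE. The "55b a5b" inner model is hypothetical, and real architectures keep attention and other shared layers dense, so they would not shrink this way.

```python
def nested_active_params(outer_active_b: float, inner_active_fraction: float) -> float:
    """Active parameters (in billions) when the outer MoE's active slice is itself an MoE.

    Assumes every active parameter of the outer model lives inside the inner MoE,
    which is a simplification: dense attention layers would not shrink this way.
    """
    return outer_active_b * inner_active_fraction


outer_total_b = 550.0   # total parameters, from the thread
outer_active_b = 55.0   # active parameters per token, from the thread
inner_active_b = 5.0    # hypothetical "55b a5b" inner model

effective_active_b = nested_active_params(outer_active_b, inner_active_b / outer_active_b)
print(f"{outer_total_b:.0f}B total, {effective_active_b:.0f}B active")
```

Under that assumption the nesting collapses: the total stays at 550B and only the inner active count is left, which is the same as a flat 550B-A5B model.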

[-]

ilintar@reddit

Not if the external expert had a dense attention layer and a MoE layer 😉

[-]

Hunigsbase@reddit

I love how architectural breakthrough ideas can sometimes just casually show up in Reddit comments 😂

[-]

TheRealMasonMac@reddit

Didn't the idea for speculative drafting come from an off-hand Reddit comment?

[-]

Ps3Dave@reddit

And now I want a qwen 55B-A5B...

[-]

Caffdy@reddit

It's MoE all the way down

[-]

baked_tea@reddit

Interesting.. im not technical enough in this to understand why but im kinda disappointed now

[-]

secunder73@reddit

We added MoE to your MoE

[-]

ortegaalfredo@reddit

Moeception

[-]

temperature_5@reddit

Could have larger experts trained in various domains (culture+language, math+physics, programming, law, health, etc) and then the smaller experts in each trained the usual way. Then you could eliminate entire large experts when not relevant.

[-]

LatentSpacer@reddit

MoE-MoE-550B-A1-55B-A2-5.5B

[-]

tat_tvam_asshole@reddit

Jeez luis

[-]

RobotRobotWhatDoUSee@reddit

The AA release post shows it being served approximately as fast as gpt-oss 120B, look at the x-axis. This seems strange, is the hybrid Mamba architecture really comparatively that much faster than everything else? Or is this mamba+Nvfp4?

[-]

Clean_Hyena7172@reddit

Never trust Nvidia benchmarks.

[-]

Icy-Degree6161@reddit

What's the niche for the Nemotrons? Qwen is great at coding/math and Gemma4 is better at creative/translation. So, any uses?

[-]

Maximum-Style2848@reddit

Trained using NVFP4 so fast af on Blackwell

[-]

-InformalBanana-@reddit

The niche is to get you to buy more GPUs. Every time I tried a Nemotron for coding it was worse than Qwen. I stopped trying Nemotrons; they all just seem benchmaxed and that is it...

[-]

InevitableMaw@reddit

It's a base model.

[-]

autoencoder@reddit

Of course! Buy some of their chips.

[-]

Middle_Bullfrog_6173@reddit

It's fast for the size. Nano was good until it got surpassed by those models, but currently both it and Super are too weak compared to competition.

[-]

fastheadcrab@reddit

Also their training data is open unlike many others so truly open source

[-]

FoxiPanda@reddit

Commercial use and fine tuning for specific applications by corporations.

[-]

nofuture09@reddit

but what exactly?

[-]

annodomini@reddit

It's not specifically good at any particular thing.

But they release the full training recipe and most of the training data, which is all very useful for doing continued pre-training, supervised finetuning, and reinforcement learning to adapt it to particular needs

So, it's designed to be good for training to customize to your use case.

Which most people are going to be doing on Nvidia hardware. So in particular, what it's designed to be good at is selling Nvidia hardware.

It's also the leading US open weights model. So if you happen to have a need to only run US-produced models, and need to run them yourself, it's the best game in town.

[-]

SkyFeistyLlama8@reddit

If you're a US corporation that needs to use US-centric hardware and software for government certification or whatever, this looks to be the only game in town. Good on Nvidia for releasing a full set of tools to keep people even more locked into CUDA.

[-]

FoxiPanda@reddit

Well, there are like 20+ nemotron models, so let's say you need a chatbot, you can fine tune some small nemotron model (say nemotron cascade 2 or nemotron 3 nano) to use your company's "voice" and understand your product offerings and the general playbook.

Companies are starting to set up agents to act on certain limited requests too - meeting summarizers with action assignments & task board updates, reminder bots, ticket pushers, monitoring and first line root cause analysis... hell, they're even starting to message me on teams/other chat platforms to get my approval to go do things on our internal systems.

[-]

nofuture09@reddit

Thanks, I didn't know that it's possible to "train" it in our company voice... that's actually sick, need to learn more about it.

[-]

Mkengine@reddit

If you want to learn more about finetuning, I can recommend this book, it's really informative and also entertainingly written (though very long).

[-]

sergeialmazov@reddit

Same question

[-]

jreoka1@reddit

Cool I appreciate they do comparisons with other open source models.

[-]

CosmicRiver827@reddit

I find it sketchy that they compared Qwen 3.5 and not 3.6.

[-]

malchi0r@reddit

FWIW there isn't a Qwen3.6 in that space right now. (Hoping that changes sooner than later!)

[-]

CosmicRiver827@reddit

I have 3.6 downloaded locally on LM Studio so I’m not sure what you mean.

[-]

malchi0r@reddit

The smallest model being compared is the Qwen3.5-397B - Qwen3.6-35B-A3B is the largest model in Qwen3.6. It's not in the same class. That's all I mean.

[-]

ChocomelP@reddit

Yet still we'll call it frontier smart without comparing to actual frontier models. There are no open-source frontier models currently.

[-]

JockY@reddit

I call this “looking for the worst”.

An American company drops a 550B open _source_ model and your reaction is “not good enough”.

May I encourage you to look for the best? See the good. There is some.

[-]

ChocomelP@reddit

I'm only talking about the dishonest marketing. These benchmarks are good enough without pretending they are frontier.

[-]

nsdjoe@reddit

i inferred that as "open source frontier" rather than all AI frontier. possibly i missed some context which would make the latter inference more reliable?

[-]

Beamsters@reddit

A score of 48 on Artificial Analysis: one notch below frontier, in the MiniMax 2.7 ballpark, but it promises to be the best US open-weight model.

[-]

TheRealMasonMac@reddit

They also release most of their training data.

[-]

ortegaalfredo@reddit

Yes, this is the real important thing. Nemotron models are truly open, unlike basically all other models. Even llama was not truly open.

[-]

nsdjoe@reddit

yes. NVDA wants as many people as possible needing its picks and shovels, so giving away the map to the gold mine makes business sense

[-]

arcanemachined@reddit

It's nice when the interests of the big corporations and the public good actually align, rare though it may be.

[-]

UnknownLesson@reddit

License?

[-]

TheRealMasonMac@reddit

It depends on the dataset. Their post-training datasets are generally permissive whereas their pretraining datasets are more locked down.

[-]

Middle_Bullfrog_6173@reddit

Their pretraining data is based on web crawls so they can't exactly MIT license it.

[-]

Inevitable-Plantain5@reddit

Yeah web crawls and data they knew was pirated. I have been wondering why basically none of the "open" options not even the free ones tell you how they started and I think it's because they all have questionable foundations. They had the articles where Nvidia and Meta knowingly incorporated huge pirated data sets so...

[-]

Middle_Bullfrog_6173@reddit

The shared Nemotron pretraining datasets are Common Crawl based so nothing pirated there (unless you consider that piracy). Who knows what their private datasets contain, however. (They list them in model cards so we know they exist but not what's in there.)

[-]

ManikSahdev@reddit

We will take it tbh, big step.

I am especially looking forward to agentic tasks; given it's Nvidia, I have high hopes. Even if the intelligence is lower, if the agentic bar is high enough that's fine: most redundant tasks don't need that much intelligence, just agentic freedom.

[-]

DAlmighty@reddit

Didn’t they announce this MONTHS ago?

[-]

CosmicRiver827@reddit

How does it do with creative writing?

[-]

Technical-Earth-3254@reddit

Finally

[-]

Ok_Technology_5962@reddit

Why did we need to compare with that one? It's on Artificial Analysis and is kind of old with low scores.

[-]

deanpreese@reddit

I have yet to have a Nemotron model that does not feel over cooked.

But maybe that’s not the model's fault.

[-]

koloved@reddit

What does 95% mean on his slide, in the Long context line?

[-]

TastesLikeOwlbear@reddit

It means that 95% of the time it works every time.

[-]

JockY@reddit

This is a base model. Fairly useless for the average enthusiast, but amazing for bigger companies who can afford the engineers and compute to fine-tune this monster.

What would it take to fine tune an instruct variant of a 550B model?

[-]

acquire_a_living@reddit

Compare with Qwen 3.6 27B cowards lol

[-]

FullOf_Bad_Ideas@reddit

Alibaba never released the base weights for Qwen 3.5/3.6 27B.

Announced at GTC San Jose 2026 · Best Open Base Model

"Base" is not there by mistake.

[-]

acquire_a_living@reddit

Sure, Alibaba didn’t release the base weights for Qwen 3.6 27B.

But then the table is bogus anyway. IFBench? "Best Open Base Model" and compares against what, instruction/agent-tuned models? Pick a lane lol

If they’re already comparing to instruct models, they could totally have put Qwen 3.6 27B there. They just wouldn’t like how it looks.

[-]

FullOf_Bad_Ideas@reddit

Those benchmarks are on their instruct models. Base model benchmarks are here

[-]

acquire_a_living@reddit

GLM 5.1 is fantastic, and my comment was just a little snarky for fun (I haven't seen an NVIDIA model that's worth it yet though).

[-]

theOliviaRossi@reddit

Good instruction following but poor coding performance in such a huge model - who wants to use this BS???

[-]

-InformalBanana-@reddit

true.

[-]

sfifs@reddit

Strictly cloud or enterprise hardware I guess. In my benchmarking, their previous Nemotron mid sized MOE (30B a3b or something like that?) performed the poorest among mid sized models, though - so would be interesting to see if it's improved. Interestingly, Qwen 3.6 Flash on cloud was better but the mid sized MOE was competitive

[-]

-InformalBanana-@reddit

nvidia benchmaxes 100%, when trying coding I had significantly worse experience with nemotron models than qwen.

[-]

-InformalBanana-@reddit

How about you make a local ai focused gpu with a normal price tag, scale your production and stuff? In my experience nemotrons are just benchmaxed and qwen works better on coding, so I stopped trying nemotrons, always bad results.

[-]

No_Afternoon_4260@reddit

gguf wen ??

[-]

Inevitable-Name-1701@reddit

None of them was useful for my usecase.

[-]

girnyu@reddit

Huh another new ai ?

[-]

WebOsmotic_official@reddit

the 550B-A55 number is cool, but the actually interesting part is NVIDIA releasing enough of the stack that people can inspect and fine-tune it without playing license detective for a week.

open weights are nice; open-ish training data and a usable license are what make the model matter.

[-]

seamonn@reddit

No Vision?

[-]

redditrasberry@reddit

how can it rate 95% on professional work tasks without vision? Being able to screen shot what I want the model to work on and then feeding back the end result that way is half of my "professional" usage.

[-]

yes2matt@reddit

I just got from le express a cheap hdmi capture card to try and straighten that pipe.

[-]

seamonn@reddit

same

[-]

FullOf_Bad_Ideas@reddit

Weights will become available with the full release of Nemotron 3 Ultra, expected to release in 1H 2026.

So, sometime this month?

[-]

Charuru@reddit

For the biggest company in the world... embarrassing garbage ngl. RULER instead of any actually decent long context benchmark... what a joke.

[-]

FullOf_Bad_Ideas@reddit

You can run a better benchmark and share the results with us once the weights are released.

[-]

HavenTerminal_com@reddit

'best US open weight' is doing a lot of work

[-]

FullOf_Bad_Ideas@reddit

yes but those are precisely the models we need more of

[-]

adt@reddit

https://github.com/NVIDIA-NeMo/Nemotron/tree/main/usage-cookbook/Nemotron-3-Ultra-Base

https://lifearchitect.ai/models-table/

[-]

FrostTactics@reddit

Wait, what? The model table here states the total number of parameters for a couple of closed-source models, including Gemini 3.5 Flash. Have these companies actually gone public with these details?

[-]

FullOf_Bad_Ideas@reddit

it even gets model size wrong for models with known size

Ernie 5.0 is 2.4T - https://ernie.baidu.com/blog/posts/ernie5.0/

They list Ernie 5.1, which is probably just a continued pretrain of Ernie 5.0, as 800B A60B.

[-]

kivaougu@reddit

Yea seems like a low effort site. I doubt they have more than guesses.

[-]

Technical-Earth-3254@reddit

Some are for sure estimated. But others are leaked by themselves or by others (but you can argue on how accurate this is).

[-]

Local_Phenomenon@reddit

Thank you

[-]

Erdeem@reddit

You only need to buy 4 Sparks, aka $20,000 worth of their hardware, to use it!

[-]

hay-yo@reddit

To run it at unusable speeds.

[-]

Practical-Collar3063@reddit

I actually think it would be pretty fast on a 4x Spark cluster, all things considered. It is Mamba + NVFP4. Both prefill and generation should be usable.

[-]

hay-yo@reddit

Look forward to seeing.

[-]

ResidentPositive4122@reddit

Interesting that they went 10:1 total:active, in contrast to the more popular 20:1 of other recent models.
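
For reference, the ratio is just total parameters divided by active parameters. A quick sketch using only the two models whose sizes are quoted in this thread (other models' totals are not given here, so they are left out rather than guessed):

```python
# (total params in billions, active params in billions), as quoted in the thread
models = {
    "Nemotron 3 Ultra": (550, 55),
    "Qwen3.5": (397, 17),
}

ratios = {name: total / active for name, (total, active) in models.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}:1 total:active")
```

A lower ratio means more of the model is used per token, which costs more compute and memory bandwidth at inference time for the same total size.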

[-]

Substantial_Step_351@reddit

The benchmark framing is the usual vendor move, comparing only against other open weights models and leading with agent numbers that don't travel outside the eval. I think the part actually worth caring about is the open training data and code which a few people already flagged.

For agents that matters more than the leaderboard slot because reliability is mostly about how well the model fits your task distribution and an open model you can finetune on your own traces beats a higher scoring closed one you can't touch. The catch is the A55. A 550B with 55B active needs real hardware, so this is an institutional base you adapt, not a local model. More useful than another sealed model that benches a point higher, just not for the reason the slide is selling.

[-]

banasraf@reddit

Well, I hope that it gets some cheap API options. The 1M context and perf is pretty promising

[-]

Foxiya@reddit

Looks bad

[-]

Ok-Contest-5856@reddit

Comparable levels to Kimi K2.6 and GLM 5.1 while having 200B+ fewer params is good, even if it’s not strictly better.

[-]

ortegaalfredo@reddit

It's 550B-A55B, so about half as big as those.

[-]

lilunxm12@reddit

but it also has the largest activated parameter count, 55/40/32/17

[-]

ortegaalfredo@reddit

Its a chunky bro

[-]

Winter-Editor-9230@reddit

Agreed, runnable on dual sparks at fp4 maybe.

[-]

hainesk@reddit

550B at fp4 would likely be over 230GB before context allocation, so I think it would be a stretch to run it on dual sparks. Also 55B active parameters means it would be pretty slow. I think a model like this would need better hardware.
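
A quick way to check the fit, as a sketch under stated assumptions: weights at 4 bits take half a byte each, and a block-scaled 4-bit format is assumed here to add roughly another half bit per weight for scale factors. The 128 GB of unified memory per Spark is also an assumption for illustration. KV cache and activations are ignored, so the real requirement is higher.

```python
def weight_memory_gb(total_params_b: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), ignoring KV cache and activations."""
    return total_params_b * bits_per_weight / 8


TOTAL_PARAMS_B = 550.0      # from the thread
SPARK_MEMORY_GB = 128.0     # unified memory per DGX Spark (assumed here)

plain_fp4 = weight_memory_gb(TOTAL_PARAMS_B, 4.0)
# Assumption: block-scaled 4-bit formats add roughly half a bit per weight for scales.
scaled_fp4 = weight_memory_gb(TOTAL_PARAMS_B, 4.5)

print(f"4-bit weights: {plain_fp4:.0f} GB, with scale overhead: ~{scaled_fp4:.0f} GB")

for n_sparks in (2, 3, 4):
    pool = n_sparks * SPARK_MEMORY_GB
    print(f"{n_sparks} Sparks: {pool:.0f} GB pool, fits 4-bit weights: {plain_fp4 < pool}")
```

Under these assumptions the weights alone exceed a two-Spark pool, so three is the minimum before any context is allocated.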

[-]

Winter-Editor-9230@reddit

You're right, maybe 3 of them. The newest benchmarks for Qwen 3.6 27B show 30-40 t/s, so I'm hoping for further optimizations. I'd be happy with 15 t/s for frontier performance.

[-]

JaredsBored@reddit

They're also releasing it as a base model, so it's ready for further training and tuning. Nvidia doesn't want anthropic/openai/Google to be the only shows in town, they want every big company needing to buy their hardware, and releasing big post-train ready models like this is their strategy to build that demand. Makes a ton of sense tbh.

[-]

mxforest@reddit

Their Nano was genuinely good. Then they fumbled with the 2 bigger ones.

[-]

jacek2023@reddit

Too big for my local setup but Nemotron Super is perfect. Nano is also nice.

[-]

smashedshanky@reddit

So basically another DLSS scenario by Nvidia

[-]

annodomini@reddit

What event was this at? Are there sources for new info? Have they actually released anything yet?

They announced Ultra back in December when they released Nano and announced the whole family. But I don't see anything new posted by them yet.

[-]

FoxiPanda@reddit

This is at Computex. They're posting datasets on HF like nothing else right now, but I haven't seen the actual weights drop.

[-]

annodomini@reddit

Link to the video: https://youtu.be/wSp6AiNIrsY?t=4541

And link to the AA announcement, though the full results aren't available yet: https://artificialanalysis.ai/articles/nvidia-nemotron-3-ultra-launch-announced

They are posting datasets right now, but nothing relating to Nemotron 3 Ultra; they're mostly robotics datasets.

Anyhow, given that AA is announcing numbers, it looks like it's probably finalized and will be released soon.

[-]