NVIDIA announces Nemotron 3 Ultra
Posted by themixtergames@reddit | LocalLLaMA | View on Reddit | 124 comments
FatheredPuma81@reddit
I feel like them comparing it to Qwen3.5 was intentional. It is indeed their best open-weight model, and look how it loses to everyone else.
kawaii_karthus@reddit
Qwen3.5 397B actually does quite well in this case, I think. It's a MoE model with 17B active vs. the whopping 55B here, and it is only slightly worse on all the benchmarks aside from knowledge. I was able to get it running on a 16 GB VRAM laptop (with 192 GB of DDR5 RAM) at 2 tokens per second, which isn't that bad for my non-coding use case. I would not be able to run this new one even with 24 GB of VRAM; I'd probably need 48 GB.
tat_tvam_asshole@reddit
What quant? That makes a helluva lot of difference
kawaii_karthus@reddit
Q3_K_XL.
michaelsoft__binbows@reddit
What kinda beefy ass lappy is that?
kawaii_karthus@reddit
MSI Raider 18-inch. I wouldn't recommend an 18-inch for portability. I also added RAM, luckily a year before RAM prices went up. It has 4x48 GB. I tried 2x64 GB sticks but they didn't work (maybe the newer MSI models will work, I dunno, because in theory you can get 4x64 GB = 256 GB). Also, some gaming laptops only have 2 RAM slots; luckily mine has 4. Though FYI, if you're using all 4 RAM slots, my laptop automatically throttles the memory speed, and I am not gonna overclock it.
PaceZealousideal6091@reddit
Qwen loses to everyone else - Yes. But not by much. Qwen is giving mindblowing pound for pound value as compared to all the others.
StardockEngineer@reddit
I think they're comparing open weight models only.
ParthProLegend@reddit
Me too.
LatentSpacer@reddit
It’s a MoE 550B-A55
dtdisapointingresult@reddit
55B active? That's way higher than any of the major modern open MoEs. Even people with multiple DGX Sparks aren't gonna be able to run this at any reasonable speed.
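A rough sanity check on the speed concern, assuming decode is memory-bandwidth-bound (every active weight is read once per generated token) and ignoring KV cache reads, interconnect overhead, batching, and speculative decoding. The 273 GB/s bandwidth figure and the 4-bit weights are assumptions for illustration, not measured numbers:

```python
def decode_tokens_per_s_upper_bound(active_params_b: float,
                                    bits_per_weight: float,
                                    mem_bandwidth_gb_s: float) -> float:
    """Bandwidth-bound ceiling on single-stream decode speed.

    Assumes each generated token reads every active weight exactly once.
    """
    gb_read_per_token = active_params_b * bits_per_weight / 8
    return mem_bandwidth_gb_s / gb_read_per_token


# Assumption: per-device memory bandwidth in GB/s; check the actual spec sheet.
ASSUMED_BANDWIDTH_GB_S = 273.0

# 55B active (this model) vs. 17B active (Qwen3.5 397B), both at 4-bit weights
nemotron_ceiling = decode_tokens_per_s_upper_bound(55, 4, ASSUMED_BANDWIDTH_GB_S)
qwen_ceiling = decode_tokens_per_s_upper_bound(17, 4, ASSUMED_BANDWIDTH_GB_S)
print(f"{nemotron_ceiling:.1f} tok/s vs {qwen_ceiling:.1f} tok/s")
```

Under those assumptions the ceiling scales inversely with active parameters, so a 55B-active model tops out at roughly a third of the speed of a 17B-active one on the same memory system. Real throughput would be lower.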
techno156@reddit
At this rate, the MoE will need its own MoE.
baked_tea@reddit
Don't let the Chinese see this. Pretty sure after this comment they are actively developing the solution
Far-Low-4705@reddit
I know this is a joke, but if you actually did that, and the 55B active part were itself a 55B-A5B MoE or whatever, then the whole model would just be a 550B-A5B.
ilintar@reddit
Not if the external expert had a dense attention layer and a MoE layer 😉
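The arithmetic behind both points can be sketched like this. It is a toy parameter count, not a real architecture: the function name and the 10B "always on" attention share are made up for illustration.

```python
def nested_active_b(outer_active_b: float,
                    inner_total_b: float,
                    inner_active_b: float,
                    always_on_b: float = 0.0) -> float:
    """Active parameters (billions) per token when the outer active slice is itself sparse.

    always_on_b is the part of the outer slice that runs for every token
    (e.g. dense attention); only the remainder is scaled by the inner ratio.
    """
    sparse_part_b = outer_active_b - always_on_b
    return always_on_b + sparse_part_b * inner_active_b / inner_total_b


# Fully sparse inner expert: the 55B active slice is replaced by a 55B-A5B MoE
fully_sparse = nested_active_b(55, 55, 5)
# Hypothetical split: 10B of the slice is dense attention that always runs
with_dense_attention = nested_active_b(55, 55, 5, always_on_b=10)
print(fully_sparse, round(with_dense_attention, 2))
```

With no dense part the nesting collapses to a plain 550B-A5B. Once some of the outer expert always runs, the active count sits somewhere between 5B and 55B, so the two levels no longer fold into one.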
Hunigsbase@reddit
I love how architectural breakthrough ideas can sometimes just casually show up in Reddit comments 😂
TheRealMasonMac@reddit
Didn't the idea for speculative drafting come from an off-hand Reddit comment?
Ps3Dave@reddit
And now I want a qwen 55B-A5B...
Caffdy@reddit
It's MoE all the way down
baked_tea@reddit
Interesting... I'm not technical enough in this to understand why, but I'm kinda disappointed now
secunder73@reddit
We added MoE to your MoE
ortegaalfredo@reddit
Moeception
temperature_5@reddit
Could have larger experts trained in various domains (culture+language, math+physics, programming, law, health, etc) and then the smaller experts in each trained the usual way. Then you could eliminate entire large experts when not relevant.
LatentSpacer@reddit
MoE-MoE-550B-A1-55B-A2-5.5B
tat_tvam_asshole@reddit
Jeez luis
RobotRobotWhatDoUSee@reddit
The AA release post shows it being served approximately as fast as gpt-oss 120B; look at the x-axis. This seems strange. Is the hybrid Mamba architecture really that much faster than everything else? Or is this Mamba + NVFP4?
Clean_Hyena7172@reddit
Never trust Nvidia benchmarks.
Icy-Degree6161@reddit
What's the niche for the Nemotrons? Qwen is great at coding/math and Gemma4 is better at creative/translation. So, any uses?
Maximum-Style2848@reddit
Trained using NVFP4 so fast af on Blackwell
-InformalBanana-@reddit
The niche is to get you to buy more GPUs. Every time I tried a Nemotron for coding it was worse than Qwen. I stopped trying Nemotrons; they all just seem benchmaxed and that is it...
InevitableMaw@reddit
It's a base model.
autoencoder@reddit
Of course! Buy some of their chips.
Middle_Bullfrog_6173@reddit
It's fast for the size. Nano was good until it got surpassed by those models, but currently both it and Super are too weak compared to competition.
fastheadcrab@reddit
Also their training data is open unlike many others so truly open source
FoxiPanda@reddit
Commercial use and fine tuning for specific applications by corporations.
nofuture09@reddit
but what exactly?
annodomini@reddit
It's not specifically good at any particular thing.
But they release the full training recipe and most of the training data, which is all very useful for doing continued pre-training, supervised finetuning, and reinforcement learning to adapt it to particular needs
So, it's designed to be good for training to customize to your use case.
Which most people are going to be doing on Nvidia hardware. So in particular, what it's designed to be good at is selling Nvidia hardware.
It's also the leading US open weights model. So if you happen to have a need to only run US-produced models, and need to run them yourself, it's the best game in town.
SkyFeistyLlama8@reddit
If you're a US corporation that needs to use US-centric hardware and software for government certification or whatever, this looks to be the only game in town. Good on Nvidia for releasing a full set of tools to keep people even more locked into CUDA.
FoxiPanda@reddit
Well, there are like 20+ nemotron models, so let's say you need a chatbot, you can fine tune some small nemotron model (say nemotron cascade 2 or nemotron 3 nano) to use your company's "voice" and understand your product offerings and the general playbook.
Companies are starting to set up agents to act on certain limited requests too - meeting summarizers with action assignments & task board updates, reminder bots, ticket pushers, monitoring and first line root cause analysis... hell, they're even starting to message me on teams/other chat platforms to get my approval to go do things on our internal systems.
nofuture09@reddit
Thanks, I didn't know that it's possible to "train" it in our company voice. That's actually sick, I need to learn more about it
Mkengine@reddit
If you want to learn more about finetuning, I can recommend this book, it's really informative and also entertainingly written (though very long).
sergeialmazov@reddit
Same question
jreoka1@reddit
Cool I appreciate they do comparisons with other open source models.
CosmicRiver827@reddit
I find it sketchy that they compared Qwen 3.5 and not 3.6.
malchi0r@reddit
FWIW there isn't a Qwen3.6 in that space right now. (Hoping that changes sooner than later!)
CosmicRiver827@reddit
I have 3.6 downloaded locally on LM Studio so I’m not sure what you mean.
malchi0r@reddit
The smallest model being compared is the Qwen3.5-397B - Qwen3.6-35B-A3B is the largest model in Qwen3.6. It's not in the same class. That's all I mean.
ChocomelP@reddit
Yet still we'll call it "frontier smart" without comparing it to actual frontier models. There are no open-source frontier models currently.
__JockY__@reddit
I call this “looking for the worst”.
An American company drops a 550B open _source_ model and your reaction is “not good enough”.
May I encourage you to look for the best? See the good. There is some.
ChocomelP@reddit
I'm only talking about the dishonest marketing. These benchmarks are good enough without pretending they are frontier.
nsdjoe@reddit
i inferred that as "open source frontier" rather than all AI frontier. possibly i missed some context which would make the latter inference more reliable?
Beamsters@reddit
An Artificial Analysis score of 48, one notch below frontier and in the MiniMax 2.7 ballpark, but it promises to be the best US open-weight model.
TheRealMasonMac@reddit
They also release most of their training data.
ortegaalfredo@reddit
Yes, this is the real important thing. Nemotron models are truly open, unlike basically all other models. Even llama was not truly open.
nsdjoe@reddit
yes. NVDA wants as many people as possible needing its picks and shovels, so giving away the map to the gold mine makes business sense
arcanemachined@reddit
It's nice when the interests of the big corporations and the public good actually align, rare though it may be.
UnknownLesson@reddit
License?
TheRealMasonMac@reddit
It depends on the dataset. Their post-training datasets are generally permissive whereas their pretraining datasets are more locked down.
Middle_Bullfrog_6173@reddit
Their pretraining data is based on web crawls so they can't exactly MIT license it.
Inevitable-Plantain5@reddit
Yeah web crawls and data they knew was pirated. I have been wondering why basically none of the "open" options not even the free ones tell you how they started and I think it's because they all have questionable foundations. They had the articles where Nvidia and Meta knowingly incorporated huge pirated data sets so...
Middle_Bullfrog_6173@reddit
The shared Nemotron pretraining datasets are Common Crawl based, so nothing pirated there (unless you consider that piracy). Who knows what their private datasets contain, however. (They list them in model cards, so we know they exist but not what's in there.)
ManikSahdev@reddit
We will take it tbh, big step.
I am especially looking forward to agentic tasks; given it's Nvidia, I have high hopes. Even if the intelligence is lower, a high enough agentic bar would do: most redundant tasks don't need that much intelligence, just agentic freedom.
DAlmighty@reddit
Didn’t they announce this MONTHS ago?
CosmicRiver827@reddit
How does it do with creative writing?
Technical-Earth-3254@reddit
Finally
Ok_Technology_5962@reddit
Why did we need to compare with that one? It's on Artificial Analysis, and it's kind of old with low scores.
deanpreese@reddit
I have yet to have a Nemotron model that does not feel over cooked.
But maybe that’s not the model's fault.
koloved@reddit
What does 95% mean on his slide, in the "Long context" line?
TastesLikeOwlbear@reddit
It means that 95% of the time it works every time.
__JockY__@reddit
This is a base model. Fairly useless for the average enthusiast, but amazing for bigger companies who can afford the engineers and compute to fine-tune this monster.
What would it take to fine tune an instruct variant of a 550B model?
acquire_a_living@reddit
Compare with Qwen 3.6 27B cowards lol
FullOf_Bad_Ideas@reddit
Alibaba never released the base weights for Qwen 3.5/3.6 27B.
"Base" is not there by mistake.
acquire_a_living@reddit
Sure, Alibaba didn’t release the base weights for Qwen 3.6 27B.
But then the table is bogus anyway. IFBench? "Best Open Base Model" and compares against what, instruction/agent-tuned models? Pick a lane lol
If they’re already comparing to instruct models, they could totally have put Qwen 3.6 27B there. They just wouldn’t like how it looks.
FullOf_Bad_Ideas@reddit
Those benchmarks are on their instruct models. Base model benchmarks are here
acquire_a_living@reddit
GLM 5.1 is fantastic, and my comment was just a little snarky for fun (I haven't seen an NVIDIA model that's worth it yet though).
theOliviaRossi@reddit
instruction following while poor performance for coding in such a huge model - who wants to use this BS???
-InformalBanana-@reddit
true.
sfifs@reddit
Strictly cloud or enterprise hardware I guess. In my benchmarking, their previous Nemotron mid sized MOE (30B a3b or something like that?) performed the poorest among mid sized models, though - so would be interesting to see if it's improved. Interestingly, Qwen 3.6 Flash on cloud was better but the mid sized MOE was competitive
-InformalBanana-@reddit
nvidia benchmaxes 100%, when trying coding I had significantly worse experience with nemotron models than qwen.
-InformalBanana-@reddit
How about you make a local ai focused gpu with a normal price tag, scale your production and stuff? In my experience nemotrons are just benchmaxed and qwen works better on coding, so I stopped trying nemotrons, always bad results.
No_Afternoon_4260@reddit
gguf wen ??
Inevitable-Name-1701@reddit
None of them was useful for my usecase.
girnyu@reddit
Huh another new ai ?
WebOsmotic_official@reddit
the 550B-A55 number is cool, but the actually interesting part is NVIDIA releasing enough of the stack that people can inspect and fine-tune it without playing license detective for a week.
open weights are nice; open-ish training data and a usable license are what make the model matter.
seamonn@reddit
No Vision?
redditrasberry@reddit
How can it rate 95% on professional work tasks without vision? Being able to screenshot what I want the model to work on, and then feeding back the end result the same way, is half of my "professional" usage.
yes2matt@reddit
I just got from le express a cheap hdmi capture card to try and straighten that pipe.
seamonn@reddit
same
FullOf_Bad_Ideas@reddit
So, sometime this month?
Charuru@reddit
For the biggest company in the world... embarrassing garbage ngl. RULER instead of any actually decent long context benchmark... what a joke.
FullOf_Bad_Ideas@reddit
You can run a better benchmark and share the results with us once the weights are released
HavenTerminal_com@reddit
'best US open weight' is doing a lot of work
FullOf_Bad_Ideas@reddit
yes, but those are precisely the models we need more of
adt@reddit
https://github.com/NVIDIA-NeMo/Nemotron/tree/main/usage-cookbook/Nemotron-3-Ultra-Base
https://lifearchitect.ai/models-table/
FrostTactics@reddit
Wait, what? The model table here states the total number of parameters for a couple of closed-source models, including Gemini 3.5 Flash. Have these companies actually gone public with these details?
FullOf_Bad_Ideas@reddit
it even gets model size wrong for models with known size
Ernie 5.0 is 2.4T - https://ernie.baidu.com/blog/posts/ernie5.0/
They list Ernie 5.1, which is probably just a continued pretrain of Ernie 5.0, as 800B A60B.
kivaougu@reddit
Yea seems like a low effort site. I doubt they have more than guesses.
Technical-Earth-3254@reddit
Some are for sure estimated. But others are leaked by themselves or by others (but you can argue on how accurate this is).
Local_Phenomenon@reddit
Thank you
Erdeem@reddit
You only need to buy 4 Sparks, aka $20,000 worth of their hardware, to use it!
hay-yo@reddit
To run it at unusable speeds.
Practical-Collar3063@reddit
I actually think it would be pretty fast on a 4x Spark cluster, all things considered. It is Mamba + NVFP4. Both prefill and generation should be usable
hay-yo@reddit
Look forward to seeing.
ResidentPositive4122@reddit
Interesting that they went 10:1 total:active, in contrast to the more popular 20:1 of other recent models.
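For reference, here is the same ratio computed for the two models whose total and active counts are both quoted in this thread. A minimal sketch; the numbers are the ones posted above, not taken from model cards:

```python
# (total, active) parameter counts in billions, as quoted in this thread
models = {
    "Nemotron 3 Ultra": (550, 55),
    "Qwen3.5 397B": (397, 17),
}


def total_to_active_ratio(total_b: float, active_b: float) -> float:
    """How many parameters exist for each one used per token."""
    return total_b / active_b


ratios = {name: total_to_active_ratio(t, a) for name, (t, a) in models.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}:1")
```

A lower ratio means more compute and memory traffic per token for the same total size, which is why the 55B active figure keeps coming up in the speed discussion.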
Substantial_Step_351@reddit
The benchmark framing is the usual vendor move, comparing only against other open weights models and leading with agent numbers that don't travel outside the eval. I think the part actually worth caring about is the open training data and code which a few people already flagged.
For agents that matters more than the leaderboard slot because reliability is mostly about how well the model fits your task distribution and an open model you can finetune on your own traces beats a higher scoring closed one you can't touch. The catch is the A55. A 550B with 55B active needs real hardware, so this is an institutional base you adapt, not a local model. More useful than another sealed model that benches a point higher, just not for the reason the slide is selling.
banasraf@reddit
Well, I hope that it gets some cheap API options. The 1M context and perf is pretty promising
Foxiya@reddit
Looks bad
Ok-Contest-5856@reddit
Comparable levels to Kimi K2.6 and GLM 5.1 with 200B+ fewer params is good, even if it’s not strictly better
ortegaalfredo@reddit
It's 550B-A55B, so about half as big as those.
lilunxm12@reddit
but it also has the largest activated parameter count: 55/40/32/17
ortegaalfredo@reddit
It's a chunky bro
Winter-Editor-9230@reddit
Agreed, runnable on dual sparks at fp4 maybe.
hainesk@reddit
550B at fp4 would likely be over 230GB before context allocation, so I think it would be a stretch to run it on dual sparks. Also 55B active parameters means it would be pretty slow. I think a model like this would need better hardware.
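A back-of-envelope version of that estimate, counting weights only. The 128 GB per-device memory and the extra half bit per weight for block scale factors are assumptions for illustration; KV cache, activations, and OS overhead are not included:

```python
import math


def weight_footprint_gb(params_b: float, bits_per_weight: float) -> float:
    """Raw weight storage in GB (10^9 bytes); excludes KV cache and runtime overhead."""
    return params_b * bits_per_weight / 8


def devices_needed(footprint_gb: float, memory_per_device_gb: float) -> int:
    """Minimum device count by capacity alone, with no headroom for context."""
    return math.ceil(footprint_gb / memory_per_device_gb)


ASSUMED_DEVICE_MEMORY_GB = 128  # assumption: unified memory per box

plain_fp4 = weight_footprint_gb(550, 4.0)    # 4 bits per weight, no scale factors
with_scales = weight_footprint_gb(550, 4.5)  # assumption: ~0.5 extra bit per weight for scales
print(plain_fp4, with_scales, devices_needed(plain_fp4, ASSUMED_DEVICE_MEMORY_GB))
```

Even in the most optimistic case the weights alone exceed what two such devices hold, so by capacity this lands on three boxes before any context allocation.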
Winter-Editor-9230@reddit
You're right, maybe 3 of them. The newest benchmarks on Qwen 3.6 27B show 30-40 t/s, so I'm hoping for further optimizations. I'd be happy at 15 t/s for frontier performance
JaredsBored@reddit
They're also releasing it as a base model, so it's ready for further training and tuning. Nvidia doesn't want anthropic/openai/Google to be the only shows in town, they want every big company needing to buy their hardware, and releasing big post-train ready models like this is their strategy to build that demand. Makes a ton of sense tbh.
mxforest@reddit
Their Nano was genuinely good. Then they fumbled with the 2 bigger ones.
jacek2023@reddit
Too big for my local setup but Nemotron Super is perfect. Nano is also nice.
smashedshanky@reddit
So basically another DLSS scenario by Nvidia
annodomini@reddit
What event was this at? Are there sources for new info? Have they actually released anything yet?
They announced Ultra back in December when they released Nano and announced the whole family. But I don't see anything new posted by them yet.
FoxiPanda@reddit
This is at Computex. They're posting datasets on HF like nothing else right now, but I haven't seen the actual weights drop.
annodomini@reddit
Link to the video: https://youtu.be/wSp6AiNIrsY?t=4541
And a link to the AA announcement, though the full results aren't available yet: https://artificialanalysis.ai/articles/nvidia-nemotron-3-ultra-launch-announced
They are posting datasets right now, but nothing relating to Nemotron 3 Ultra; they're mostly robotics datasets.
Anyhow, given that AA is announcing numbers, it looks like it's probably finalized and will be released soon.
darkplaceguy1@reddit
what is the API Price?
TheAzureTech@reddit
only need 40 grand in GPUs to run it!
Specter_Origin@reddit
Damn, why so low on coding : (
Very happy it exists though : )