Nvidia reveals Jetson Thor specs during GTC 2025
Posted by Dakhil@reddit | hardware | View on Reddit | 42 comments
https://www.reddit.com/r/nvidia/comments/1jg6m1e/jetson_thor_specifications_announced/
- 14 Poseidon-AE (Neoverse V3AE) cores
- 1 MB L2 cache per CPU core (14 MB L2 cache in total)
- 3 GPCs, 2560 CUDA cores, 96 Tensor cores
- Multi-Instance GPU (MIG) isolation
- 7.8 FP32 TFLOPS
- 500 FP16, 1000 FP8, 2000 FP4 TOPS
- 32 MB L2 cache
- 16 MB system level cache
- 128 GB 256-bit LPDDR5X at ~273 GB/s (8533 MT/s)
- 120 W TDP
WaterLillith@reddit
That's a ton of AI TOPS, holy moly. Even with sparsity.
2x RTX 5070
poli-cya@reddit
Check how the memory bandwidth compares...
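A quick back-of-the-envelope roofline sketch of that point, using only the spec numbers from the post (the variable names here are just labels for those figures): the ratio of peak compute to memory bandwidth tells you how many operations per byte a workload must perform before the compute ceiling, rather than the 273 GB/s bus, becomes the bottleneck.

```python
# Arithmetic intensity needed to saturate Thor's compute rather than its memory bus.
# Figures taken from the spec list above.
thor_fp8_tops = 1000e12   # peak FP8 ops/s (sparse, per the spec sheet)
bandwidth = 273e9         # LPDDR5X bytes/s

intensity = thor_fp8_tops / bandwidth
print(f"{intensity:.0f} ops/byte")  # ~3663 ops per byte moved
```

Typical memory-bound workloads like LLM token generation sit orders of magnitude below that intensity, which is why the bandwidth figure matters at least as much as the headline TOPS.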
Vb_33@reddit
Don't forget none of this matters because it's not a data center product so Nvidia is going to inevitably abandon this product line just like "they did with gaming".
PetiInco@reddit
Would be interested in costs and parameters. Diamond-grown processors (not sand, and naturally denser) should be a few times faster. And Stargate crystal technology feels like it could be the future...
norcalnatv@reddit
Intended for self driving and robotics segment.
Strazdas1@reddit
Depending on price it might be great for it.
ResponsibleJudge3172@reddit
I believe this product line currently powers Mercedes Benz, BMW, BYD, etc self driving software and hardware so you can already judge it. This is just a faster chip
Strazdas1@reddit
So, not an endorsement...
norcalnatv@reddit
For the automotive guys it will be hundreds of dollars for the chip. I don't know about the subsystem, but likely a 4-digit number for the advanced versions, and the robotaxi guys will pay the most.
Robotics will have cut down and smaller family versions for sure, I imagine a price point in a few years of under $100 for a module.
Slasher1738@reddit
So this is just Digits/Spark
ResponsibleJudge3172@reddit
Some relation there, but this product line is very old. One of the first AI chips Nvidia made
SERIVUBSEV@reddit
Buckle up bros, gaming is going to ARM pretty quick.
Nvidia will release this as mini PC for gaming, they did the same with server chips where they sell the whole bundle of 1U or 2U rack with AI chips.
This maximizes revenue for Nvidia, and they can eat up money from other vendors that sell the mobo, memory, chassis etc. separately.
My guess: they work with MS, who is eager for ARM gaming, with an Nvidia APU, AMD Sound Wave, and Mediatek/Qualcomm in the race for the next-gen 2027-28 Xbox. Expecting compatibility to be 90%+ by then.
Strazdas1@reddit
This isn't a PC. This is for self-driving and robotics. You won't find this used for gaming.
vk6_@reddit
Nvidia Jetson products were never intended for gaming. They run Ubuntu (not Windows on ARM) and are embedded systems mainly intended for robotics.
SherbertExisting3509@reddit
Qualcomm will need to create a fast, bug-free x86-to-ARM translation layer, and they need to cut their teeth building a broadly compatible driver stack for Windows games, like Intel was forced to do.
Tman1677@reddit
The CPU stack is pretty much perfect at this point (took a long time to get here though). Right now the main issue is driver compatibility especially in older games - if there's one company you can expect to do a good job with that it's Nvidia.
caelunshun@reddit
Always remember to divide the TOPS numbers by 2 to account for NVIDIA inflating them using the sparsity feature.
CatalyticDragon@reddit
They went from using dense to sparse figures but also went from 8-bit to 4-bit figures, which is how the RTX 5080 with 450 (INT8) TOPS gets marketed as having "1801 AI TOPS".
The most ludicrous example of their inflated marketing is probably the DGX Spark, a device they call an "AI supercomputer" which "delivers a petaflop of AI performance".
That is only true if your data is sparse, fits into cache, and is at 4-bit precision, which is a totally fictional scenario.
They really muddy the water and you've got to refer to their architecture whitepapers to get real data because the marketing obscures it so much.
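The inflation described above is simple arithmetic, and the thread's own numbers line up with it. A minimal sketch (the function `marketed_tops` is purely illustrative, not anything Nvidia publishes):

```python
# Sketch: how a dense INT8 TOPS figure becomes a marketed "AI TOPS" number.
# Starting figure (450 dense INT8 TOPS for the RTX 5080) comes from the comment above.

def marketed_tops(dense_int8_tops, sparsity=True, fp4=True):
    """Apply the two 2x multipliers spec sheets commonly stack."""
    tops = dense_int8_tops
    if fp4:
        tops *= 2  # FP4 throughput is double that of FP8/INT8
    if sparsity:
        tops *= 2  # 2:4 structured sparsity doubles the peak math rate
    return tops

print(marketed_tops(450))  # 1800 -- matching the "1801 AI TOPS" quoted for the RTX 5080
```

Normalizing the other direction (dividing marketed figures by 4) is how you recover a dense INT8 number comparable across generations.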
noiserr@reddit
True, but that's always been the nature of xOPS numbers. P stands for Peak after all.
CatalyticDragon@reddit
Common sense dictates that when you're comparing performance you use the same data types.
And the "P" in OPS stands for "per".
caelunshun@reddit
fortunately for NVIDIA, this trick will easily fool many of the people (investors) reading their marketing figures!
Tman1677@reddit
The sparsity feature and 4-bit figures are definitely them exaggerating for marketing purposes - but at the same time those features are going to become extremely useful for local-AI and probably a must have feature in ~2 years.
caelunshun@reddit
sparsity is neat (though very few implementations actually take advantage of it), but it's silly to claim you're doing twice as many calculations as you actually are.
doscomputer@reddit
But without sparsity you couldn't have that throughput, so it's not that silly tbh.
Just like how most GPUs don't have 1:1 FP64 support: if the arch isn't designed for it, you aren't going to see the same performance at different bit depths without native support.
Tman1677@reddit
For sure, I just think it's worth mentioning that it's definitely worth some sort of multiplier - and probably a lot. I remember when q8 calculations were deemed trickery - they totally were at the time, but now it's unthinkable to use a GPU which doesn't support them for AI inference.
CommunicationUsed270@reddit
It's highly usage dependent. The extreme case where there's only one entry in a sparse matrix is quite trivial.
EmergencyCucumber905@reddit
In that extreme case you wouldn't even use the matrix instructions.
The sparsity feature lets you encode as zero up to 2 elements out of every 4. So the most speedup you can get is 2x, even in extreme cases.
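That 2x ceiling falls straight out of the 2:4 format: only half the elements in each group of four survive compression, so at most half the math is skipped. A toy sketch (the `compress_2to4` helper is hypothetical, just mimicking the storage layout):

```python
def compress_2to4(row):
    """Keep the 2 nonzero values per group of 4, plus their 2-bit positions.
    Mirrors 2:4 structured sparsity: half the elements remain, hence the 2x bound."""
    assert len(row) % 4 == 0
    values, indices = [], []
    for i in range(0, len(row), 4):
        group = row[i:i + 4]
        nz = [j for j, v in enumerate(group) if v != 0]
        assert len(nz) <= 2, "2:4 sparsity allows at most 2 nonzeros per group of 4"
        nz = (nz + [0, 0])[:2]  # pad groups with fewer than 2 nonzeros
        values.extend(group[j] for j in nz)
        indices.extend(nz)
    return values, indices

vals, idx = compress_2to4([0, 3, 0, 7, 5, 0, 0, 2])
print(vals)  # [3, 7, 5, 2] -- exactly half the elements, never fewer
```

Even an all-zero row still stores 2 (padded) values per group, which is why the speedup never exceeds 2x regardless of how sparse the data actually is.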
ResponsibleJudge3172@reddit
Sparsity is ALWAYS noted when used in the spec sheets
hellotanjent@reddit
_120_ watt TDP? Weren't the previous Jetsons a fraction of that?
3ntrope@reddit
Yeah, this is like 25-50% more performance for 2x the TDP over Orin.
Sopel97@reddit
solid proposition if it's <$1500
JakeTappersCat@reddit
How does a chip with RTX 4050-level core counts have 2000 FP4 or 1000 FP8 TOPS? That's 5070 Ti-5080 level performance. I wonder if Nvidia is taking more "liberties" in reporting specs or if it's just a mistake, because this shouldn't have more than 300 TOPS.
ResponsibleJudge3172@reddit
Because Nvidia has stated since Ampere that consumer GPUs have space-efficient tensor cores.
In other words, gimped to save space. They have the same feature support, but not the same performance.
This is obvious when comparing the RTX 4090 vs Hopper: the H100 has FEWER tensor cores and less cache but spanks the RTX 4090 so hard in all AI scenarios, regardless of VRAM constraints, that it's not even funny.
Similar scenario where the A100 had 20% more cores than the 3090 Ti but more than 2x the performance, long before the ultra-large LLMs came in.
lubits@reddit
Yeah, Jetson has nearly all the APIs of DC Blackwell. DC Blackwell is SM_100, Jetson is SM_101. Consumer Blackwell is SM_120.
Verite_Rendition@reddit
Based on how NVIDIA outlines the specifications for their existing Jetson products, those figures include NVIDIA's deep learning accelerators (DLA). DLA is incredibly fast for the space and power, but it's a fixed function accelerator block that's separate from the GPU and its tensor cores.
BlueGoliath@reddit
It's 3 GPUs.
WaterLillith@reddit
I thought the ~1000 TOPS on 5070 were FP4, meaning this is 2x as much
advester@reddit
Can't judge this without the price.
Vb_33@reddit
Switch 3 hardware for 2032 confirmed.
imaginary_num6er@reddit
Does this require an ASUS Thor PSU as well?
PAcMAcDO99@reddit
https://giphy.com/gifs/disneypixar-disney-pixar-BcsP48Gi2cqLS
atape_1@reddit
So this is the little $3k AI PC? 2560 CUDA cores? Yikes.