Nvidia reveals Jetson Thor specs during GTC 2025
Posted by Dakhil@reddit | hardware | View on Reddit | 42 comments
https://www.reddit.com/r/nvidia/comments/1jg6m1e/jetson_thor_specifications_announced/
- 14 Poseidon-AE (Neoverse V3AE) cores
- 1 MB L2 cache per CPU core (14 MB L2 cache in total)
- 3 GPCs, 2560 CUDA cores, 96 Tensor cores
- Multi-Instance GPU (MIG) isolation
- 7.8 FP32 TFLOPS
- 500 FP16, 1000 FP8, 2000 FP4 TOPS
- 32 MB L2 cache
- 16 MB system level cache
- 128 GB 256-bit LPDDR5X at ~273 GB/s (8533 MT/s)
- 120 W TDP
WaterLillith@reddit
That's a ton of AI TOPS, holy moly. Even with sparsity.
2x RTX 5070
poli-cya@reddit
Check how the memory bandwidth compares...
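A quick back-of-the-envelope roofline sketch of that point, using only the spec numbers from the post (the variable names here are just labels for those figures): the ratio of peak compute to memory bandwidth tells you how many operations per byte a workload must perform before the compute ceiling, rather than the 273 GB/s bus, becomes the bottleneck.

```python
# Arithmetic intensity needed to saturate Thor's compute rather than its memory bus.
# Figures taken from the spec list above.
thor_fp8_tops = 1000e12   # peak FP8 ops/s (sparse, per the spec sheet)
bandwidth = 273e9         # LPDDR5X bytes/s

intensity = thor_fp8_tops / bandwidth
print(f"{intensity:.0f} ops/byte")  # ~3663 ops per byte moved
```

Typical memory-bound workloads like LLM token generation sit orders of magnitude below that intensity, which is why the bandwidth figure matters at least as much as the headline TOPS.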
Vb_33@reddit
Don't forget none of this matters because it's not a data center product so Nvidia is going to inevitably abandon this product line just like "they did with gaming".
PetiInco@reddit
Would be interested in costs and parameters. Diamond-grown processors (not sand, and naturally denser) should be a few times faster. And Stargate crystal technology feels like it could be the future...
norcalnatv@reddit
Intended for self driving and robotics segment.
Strazdas1@reddit
Depending on price it might be great for it.
ResponsibleJudge3172@reddit
I believe this product line currently powers Mercedes Benz, BMW, BYD, etc self driving software and hardware so you can already judge it. This is just a faster chip
Strazdas1@reddit
So, not an endorsement...
norcalnatv@reddit
For the automotive guys it will be hundreds of dollars for the chip. I don't know about the subsystem, but likely a 4-digit number for the advanced versions, and the robotaxi guys will pay the most.
Robotics will have cut down and smaller family versions for sure, I imagine a price point in a few years of under $100 for a module.
Slasher1738@reddit
So this is just Digits/Spark
ResponsibleJudge3172@reddit
Some relation there, but this product line is very old. One of the first AI chips Nvidia made
SERIVUBSEV@reddit
Buckle up bros, gaming is going to ARM pretty quick.
Nvidia will release this as mini PC for gaming, they did the same with server chips where they sell the whole bundle of 1U or 2U rack with AI chips.
This maximizes revenue for Nvidia, and they can eat up money from other vendors that sell the mobo, memory, chassis etc. separately.
My guess: they work with MS, who is eager for ARM gaming, with an Nvidia APU, AMD Sound Wave, and Mediatek/Qualcomm in the race for the next-gen 2027-28 Xbox. Expecting compatibility to be 90%+ by then.
Strazdas1@reddit
This isn't a PC. This is for self-driving and robotics. You won't find this used for gaming.
vk6_@reddit
Nvidia Jetson products were never intended for gaming. They run Ubuntu (not Windows on ARM) and are embedded systems mainly intended for robotics.
SherbertExisting3509@reddit
Qualcomm will need to create a fast, bug-free x86-to-ARM translation layer, and they need to cut their teeth building a broadly compatible driver stack for Windows games, like Intel was forced to do.
Tman1677@reddit
The CPU stack is pretty much perfect at this point (took a long time to get here though). Right now the main issue is driver compatibility especially in older games - if there's one company you can expect to do a good job with that it's Nvidia.
caelunshun@reddit
Always remember to divide the TOPS numbers by 2 to account for NVIDIA inflating them using the sparsity feature.
CatalyticDragon@reddit
They went from using dense to sparse figures but also went from 8-bit to 4-bit figures, which is how the RTX 5080 with 450 (INT8) TOPS gets marketed as having "1801 AI TOPS".
The most ludicrous example of their inflated marketing is probably the DGX Spark, a device they call an "AI supercomputer" which "delivers a petaflop of AI performance".
That is only true if your data is sparse, fits into cache, and is at 4-bit precision, which is a totally fictional scenario.
They really muddy the water and you've got to refer to their architecture whitepapers to get real data because the marketing obscures it so much.
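The inflation described above is simple arithmetic, and the thread's own numbers line up with it. A minimal sketch (the function `marketed_tops` is purely illustrative, not anything Nvidia publishes):

```python
# Sketch: how a dense INT8 TOPS figure becomes a marketed "AI TOPS" number.
# Starting figure (450 dense INT8 TOPS for the RTX 5080) comes from the comment above.

def marketed_tops(dense_int8_tops, sparsity=True, fp4=True):
    """Apply the two 2x multipliers spec sheets commonly stack."""
    tops = dense_int8_tops
    if fp4:
        tops *= 2  # FP4 throughput is double that of FP8/INT8
    if sparsity:
        tops *= 2  # 2:4 structured sparsity doubles the peak math rate
    return tops

print(marketed_tops(450))  # 1800 -- matching the "1801 AI TOPS" quoted for the RTX 5080
```

Normalizing the other direction (dividing marketed figures by 4) is how you recover a dense INT8 number comparable across generations.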
noiserr@reddit
True, but that's always been the nature of xOPS numbers. P stands for Peak after all.
CatalyticDragon@reddit
Common sense dictates that when you're comparing performance you use the same data types.
And the "P" in OPS stands for "per".
caelunshun@reddit
fortunately for NVIDIA, this trick will easily fool many of the people (investors) reading their marketing figures!
Tman1677@reddit
The sparsity feature and 4-bit figures are definitely them exaggerating for marketing purposes - but at the same time those features are going to become extremely useful for local-AI and probably a must have feature in ~2 years.
caelunshun@reddit
sparsity is neat (though very few implementations actually take advantage of it), but it's silly to claim you're doing twice as many calculations as you actually are.
doscomputer@reddit
But without sparsity you couldn't have that throughput, so it's not that silly tbh.
Just like how most GPUs don't have 1:1 FP64 support: if the arch isn't designed for it, you aren't going to see the same performance at different bit depths without native support.
Tman1677@reddit
For sure, I just think it's worth mentioning that it's definitely worth some sort of multiplier - and probably a lot. I remember when q8 calculations were deemed trickery - they totally were at the time, but now it's unthinkable to use a GPU which doesn't support them for AI inference.
CommunicationUsed270@reddit
It's highly usage dependent. The extreme case where there's only one entry in a sparse matrix is quite trivial.
EmergencyCucumber905@reddit
In that extreme case you wouldn't even use the matrix instructions.
The sparsity feature lets you encode as zero up to 2 elements out of every 4. So the most speedup you can get is 2x, even in extreme cases.
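That 2x ceiling falls straight out of the 2:4 format: only half the elements in each group of four survive compression, so at most half the math is skipped. A toy sketch (the `compress_2to4` helper is hypothetical, just mimicking the storage layout):

```python
def compress_2to4(row):
    """Keep the 2 nonzero values per group of 4, plus their 2-bit positions.
    Mirrors 2:4 structured sparsity: half the elements remain, hence the 2x bound."""
    assert len(row) % 4 == 0
    values, indices = [], []
    for i in range(0, len(row), 4):
        group = row[i:i + 4]
        nz = [j for j, v in enumerate(group) if v != 0]
        assert len(nz) <= 2, "2:4 sparsity allows at most 2 nonzeros per group of 4"
        nz = (nz + [0, 0])[:2]  # pad groups with fewer than 2 nonzeros
        values.extend(group[j] for j in nz)
        indices.extend(nz)
    return values, indices

vals, idx = compress_2to4([0, 3, 0, 7, 5, 0, 0, 2])
print(vals)  # [3, 7, 5, 2] -- exactly half the elements, never fewer
```

Even an all-zero row still stores 2 (padded) values per group, which is why the speedup never exceeds 2x regardless of how sparse the data actually is.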
ResponsibleJudge3172@reddit
Sparsity is ALWAYS noted when used in the spec sheets
hellotanjent@reddit
_120_ watt TDP? Weren't the previous Jetsons a fraction of that?
3ntrope@reddit
Yeah, this is like 25-50% more performance for 2x the TDP over Orin.
Sopel97@reddit
solid proposition if it's <$1500
JakeTappersCat@reddit
How does a chip with RTX 4050-level core counts have 2000 FP4 or 1000 FP8 TOPS? That's 5070 Ti-5080 level performance. I wonder if Nvidia is taking more "liberties" in reporting specs or if it's just a mistake, because this shouldn't have more than 300 TOPS.
ResponsibleJudge3172@reddit
Because Nvidia has stated since Ampere that consumer GPUs have space-efficient tensor cores.
In other words, gimped to save space. They have the same feature support, but not the same performance.
This is obvious when comparing the RTX 4090 vs Hopper: the H100 has FEWER tensor cores and less cache but spanks the RTX 4090 so hard in all AI scenarios, regardless of VRAM constraints, that it's not even funny.
Similar scenario where the A100 had 20% more cores than the 3090 Ti but more than 2x the performance, long before the ultra-large LLMs came in.
lubits@reddit
Yeah, Jetson has nearly all the APIs of DC Blackwell. DC Blackwell is SM_100, Jetson is SM_101. Consumer Blackwell is SM_120.
Verite_Rendition@reddit
Based on how NVIDIA outlines the specifications for their existing Jetson products, those figures include NVIDIA's deep learning accelerators (DLA). DLA is incredibly fast for the space and power, but it's a fixed function accelerator block that's separate from the GPU and its tensor cores.
BlueGoliath@reddit
It's 3 GPUs.
WaterLillith@reddit
I thought the ~1000 TOPS on 5070 were FP4, meaning this is 2x as much
advester@reddit
Can't judge this without the price.
Vb_33@reddit
Switch 3 hardware for 2032 confirmed.
imaginary_num6er@reddit
Does this require an ASUS Thor PSU as well?
PAcMAcDO99@reddit
https://giphy.com/gifs/disneypixar-disney-pixar-BcsP48Gi2cqLS
atape_1@reddit
So this is the little $3k AI PC? 2560 CUDA cores? Yikes.