We might have a winner with the upcoming N1X

Posted by Ok_Spirit9482@reddit | LocalLLaMA | 26 comments

https://www.notebookcheck.net/Nvidia-s-N1X-and-N1-processors-leak-in-full-ahead-of-launch.1311497.0.html

16-channel DDR5 memory is going to give us the best of both worlds, like the memory bandwidth is going to be greater than 500GB/s

[-]

LPDDR5X Grade	N1X (256-bit)	N1 (128-bit)
7500 MT/s	240 GB/s	120 GB/s
8000 MT/s	256 GB/s	128 GB/s
8533 MT/s	273.1 GB/s	136.5 GB/s
9600 MT/s	307.2 GB/s	153.6 GB/s
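
The figures in the table above follow from the usual peak-bandwidth arithmetic: bus width in bytes times transfer rate. A minimal sketch, assuming decimal GB and theoretical peak only (sustained bandwidth will be lower); the function name `peak_bandwidth_gbs` is made up for illustration:

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: int) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal units)."""
    bytes_per_transfer = bus_width_bits / 8
    # MT/s times bytes per transfer gives MB/s; divide by 1000 for GB/s.
    return bytes_per_transfer * transfer_rate_mts / 1000


# Reproduce the table rows: N1X on a 256-bit bus, N1 on a 128-bit bus.
for rate in (7500, 8000, 8533, 9600):
    n1x = peak_bandwidth_gbs(256, rate)
    n1 = peak_bandwidth_gbs(128, rate)
    print(f"{rate} MT/s\t{n1x:.1f} GB/s\t{n1:.1f} GB/s")
```

The same function covers the channel question: 16 channels at 16 bits each is a 256-bit bus, which is where the N1X column comes from.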

Charming-Author4877@reddit

Nice speed for copying a file or working on data.
Useless for running an AI model.

[-]

MrPecunius@reddit

My M5 Pro (307GB/s) is quite useful for running AI models, what do you mean?

[-]

My 7900XTX (960GB/s) is quite useful for running AI models that fit into 24GB of VRAM. You should have splurged on the 40-core M5 Max (614GB/s) if you were going for larger models (assuming 128GB of unified RAM).

[-]

MrPecunius@reddit

I considered an M5 Max, but it's not twice as fast as an M5 Pro (see oMLX's extensive performance data); it's maybe 1.5-1.75X. Power consumption, however, is at least twice as high, and you really need the 16" MacBook Pro chassis to have a chance of keeping it from throttling. No thanks.

The only thing 128GB gets you vs. 64GB is the ability to run a couple of midsize MoE models that are already being outperformed by e.g. Qwen3.6 27b. Larger models would need to be lobotomized to fit.

I looked at all of this in detail a couple months ago when I upgraded from my previous M4 Pro/48GB.

[-]

Charming-Author4877@reddit

It's really slow. In a way, Apple silicon is cool for casual use of AI, as the memory is so accessible and the speed is somewhat decent.
But it's still so slow that it's difficult to use productively, and you definitely won't run any professional inference on it (like offering a service).
A 5090 comes in at 1700GB/sec, to put the 300GB/sec of the M5 into perspective.
A 3090 or 4090 is at 1000GB/sec.

You can argue that your Mac has larger (shared) VRAM, but you'll have a hard time running anything that needs that VRAM.
Even a tiny quantized 27B model that fills 17GB of VRAM is not going to run at acceptable performance on an M5.
What do you get there on a 27B model? 1000 tokens/sec? Or less?
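
For context on the bandwidth comparison above, a common rule of thumb is that memory-bound token generation reads the model weights roughly once per token, so bandwidth divided by weight size gives a rough upper bound on decode speed. A minimal sketch under that assumption, using the bandwidth and 17GB figures quoted in this comment; it ignores KV-cache reads, compute limits, and real-world efficiency, and it says nothing about prefill, which tends to be compute-bound. The function name is hypothetical:

```python
def decode_ceiling_tps(bandwidth_gbs: float, weights_gb: float) -> float:
    """Rough upper bound on decode tokens/sec for a memory-bound dense model.

    Assumes every generated token streams all weights from memory once.
    """
    return bandwidth_gbs / weights_gb


# Bandwidth figures as quoted in the thread, not measured values.
devices = [("M5 Pro", 307), ("3090 / 4090", 1000), ("5090", 1700)]
weights_gb = 17  # the quantized 27B model size mentioned above

for name, bandwidth in devices:
    ceiling = decode_ceiling_tps(bandwidth, weights_gb)
    print(f"{name}: at most ~{ceiling:.0f} tokens/sec")
```

Measured numbers will land below these ceilings, but the ratio between devices is what the bandwidth argument is really about.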

[-]

MrPecunius@reddit

I'm not trying to run a data center, and I am quite productive with LLMs.

I'm also not trying to one-shot everything or solve problems by throwing tokens at them. I happily run ~30-35b models @ 8-bit and pull maybe 70 watts while I'm doing it ... at a coffee shop.

You claimed 307GB/s is "useless for running an AI model". I do it all the time.

[-]

Charming-Author4877@reddit

You're talking about MoE models, right? A 35B dense model is theoretical on your Mac - it just makes no sense.
The 27B Qwen 3.6 model, which is the best local model today that runs within 100GB of VRAM, is already so slow that prefilling 120k context would take around 2 minutes on the M5 Pro.
So you can run it, but can you use it productively? I'm doubtful.
For small jobs -> great.
For a productive workflow you'll be stuck in latency.
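
The prefill claim above is simple division, and it lines up with the 1000 tokens/sec figure asked about earlier in the thread. A minimal sketch using only the numbers quoted here (120k tokens, about 2 minutes); both are the commenter's estimates, not measurements, and the function names are hypothetical:

```python
def implied_prefill_tps(context_tokens: int, seconds: float) -> float:
    """Prefill rate implied by processing a prompt in a given time."""
    return context_tokens / seconds


def prefill_seconds(context_tokens: int, prefill_tps: float) -> float:
    """Time to first token for a prompt at a given prefill rate."""
    return context_tokens / prefill_tps


rate = implied_prefill_tps(120_000, 2 * 60)
print(f"Implied prefill rate: {rate:.0f} tokens/sec")

# The same rate applied to a smaller prompt, for comparison.
print(f"8k-token prompt: {prefill_seconds(8_000, rate):.0f} seconds")
```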

[-]

dampflokfreund@reddit

At least N2X should be amazing with LPDDR6.

[-]

Ok_Spirit9482@reddit (OP)

Oops, thanks for correcting me! Forgot LPDDR5 channels are 16 bits wide and got excited for a hot sec, moving on

[-]

Eyelbee@reddit

Those speeds seem way too low for 16 channels

[-]

Charming-Author4877@reddit

You'll probably get 3-4 times the performance by getting a $150 3060 from eBay for any cheap PC. Or an old laptop with a 3060 mobile, they go cheap too.

[-]

MrPecunius@reddit

3060 has 8-12GB of 360GB/s VRAM, so I think you need a better example.

[-]

Charming-Author4877@reddit

The 3060 is 30% faster than the N1X, and a 3060 Ti is double the speed of the N1X.
At less than $200.
It's a proper example.
Those 11-year-old budget GPUs beat the N1X.

[-]

khariV@reddit

Memory bandwidth performance.

I honestly don’t understand the trade-off in having to run low-parameter-count models because GPUs have so little memory. Without stacking multiple GPUs, along with the host computer, the power bill, and the configuration headaches of getting them to work, I just don’t get the usefulness of these small models running on 8-16GB GPUs.

I get that they are super fast, but the output is such low quality that I generally don’t care how fast the answers come out.

I know that the DGX Spark is a machine with a lot of compromises and that it’s expensive, but the ability to run larger models is worth it to me. Everyone is different, I know, but I’d really be interested in what people find these small, fast, GPU-contained models good for.

[-]

uti24@reddit

Why though? What logic do you have behind that?

The 3060 has 300-350GB/s of memory bandwidth, and this thing reportedly has 500GB/s.

The 3060 has 3060-level compute, and the GB10 that is reportedly used in this one has 4070-level compute or thereabouts.

Does not seem to add up.

[-]

Charming-Author4877@reddit

With normal DDR5 that's not possible; it is limited to 70 GB/sec. DDR6 doubles that, DDR7 doubles that, HBM3 doubles that, and HBM3e doubles it again.
A 3060 Ti at $160 comes with almost 500GB/sec of memory bandwidth; the N1X is likely half that, as most leaks show it around 250-300GB/sec.
It will probably use LPDDR5X, which is soldered, non-exchangeable RAM.

If you combine two RTX 3060s you'll have a significantly stronger system at a bargain price point.
I don't recommend those; it's the absolute cheapest GPU you can use today.

I'd recommend a better one.

[-]

uti24@reddit

winner

I mean, if it didn't cost like $7k

[-]