We might have a winner with the upcoming N1X
Posted by Ok_Spirit9482@reddit | LocalLLaMA | 26 comments
16-channel DDR5 memory is going to give us the best of both worlds; the memory bandwidth is going to be greater than 500GB/s
oxygen_addiction@reddit
Charming-Author4877@reddit
Nice speed for copying a file or working on data.
Useless for running an AI model.
MrPecunius@reddit
My M5 Pro (307GB/s) is quite useful for running AI models, what do you mean?
Blakslab@reddit
My 7900XTX (960GB/s) is quite useful for running AI models that fit into 24GB of VRAM. You should have splurged on the 40-core M5 Max (614GB/s) if you were going for larger models (assuming 128GB of unified RAM).
MrPecunius@reddit
I considered an M5 Max, but it's not twice as fast as an M5 Pro (see oMLX's extensive performance data); it's maybe 1.5-1.75x. Power consumption, however, is at least twice as high, and you really need the 16" MacBook Pro chassis to have a chance of keeping it from throttling. No thanks.
The only thing 128GB gets you vs. 64GB is the ability to run a couple of midsize MoE models that are already being outperformed by e.g. Qwen3.6 27b. Larger models would need to be lobotomized to fit.
I looked at all of this in detail a couple months ago when I upgraded from my previous M4 Pro/48GB.
Charming-Author4877@reddit
It's really slow. In a way, Apple silicon is cool for casual use of AI, as its memory is so accessible and its speed is somewhat decent.
But it's still so slow that it's difficult to use productively, and you definitely won't run any professional inference on it (like offering a service).
A 5090 comes at 1700GB/sec - to put those 300 GB/sec of the M5 into perspective.
A 3090 or 4090 is at 1000GB/sec.
You can argue that your Mac has larger (shared) VRAM, but you'll have a hard time running anything that needs that VRAM.
Even a tiny quantized 27B model that fills 17GB of VRAM is not going to run at acceptable performance on an M5.
What do you get there on a 27B model? 1000 tokens/sec? Or less?
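For reference, the bandwidth figures quoted in this thread can be turned into a rough ceiling on generation speed. The sketch below assumes a dense model whose weights are read from memory once per generated token, so decode speed is at most bandwidth divided by model size. It ignores KV cache traffic and compute limits, and it says nothing about prefill speed, so real numbers will be lower. The function name is made up for illustration.

```python
def decode_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough upper bound on decode speed for a dense, memory-bound model.

    Assumption: each generated token requires reading all weights once.
    """
    return bandwidth_gb_s / model_size_gb


MODEL_GB = 17.0  # size of the quantized 27B model mentioned above

# Memory bandwidth figures as quoted in the thread (GB/s)
devices = {
    "M5 Pro": 307.0,
    "RTX 3090/4090": 1000.0,
    "RTX 5090": 1700.0,
}

for name, bandwidth in devices.items():
    ceiling = decode_tokens_per_sec(bandwidth, MODEL_GB)
    print(f"{name}: at most ~{ceiling:.0f} tokens/s")
```

Under these assumptions the gap between the devices is simply the ratio of their bandwidths, which is why the thread keeps coming back to GB/s figures.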
MrPecunius@reddit
I'm not trying to run a data center, and I am quite productive with LLMs.
I'm also not trying to one-shot everything or solve problems by throwing tokens at them. I happily run ~30-35B models @ 8-bit and pull maybe 70 watts while I'm doing it ... at a coffee shop.
You claimed 307GB/s is "useless for running an AI model". I do it all the time.
Charming-Author4877@reddit
You're talking about MoE models, right? A 35B dense model is theoretical on your Mac; it just makes no sense.
The 27B Qwen 3.6 model, which is the best local model today that runs within 100GB of VRAM, is already so slow that prefilling 120k context would take around 2 minutes on the M5 Pro.
So you can run it, but can you use it productively? I'm doubtful.
For small jobs -> great.
For a productive workflow you'll be stuck in latency.
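To put the prefill claim in numbers: 120k tokens in about 2 minutes works out to roughly 1000 tokens/sec of prompt processing. The sketch below just does that arithmetic and applies the same rate to smaller prompts. It assumes the rate stays constant across context lengths, which is a simplification (prefill usually slows down as context grows), and the function names are made up for illustration.

```python
def implied_prefill_rate(context_tokens: int, seconds: float) -> float:
    """Prompt-processing rate (tokens/s) implied by a context size and a wait time."""
    return context_tokens / seconds


def prefill_seconds(context_tokens: int, tokens_per_sec: float) -> float:
    """Wait time before the first generated token, assuming a constant prefill rate."""
    return context_tokens / tokens_per_sec


# Figures from the comment above: 120k context, around 2 minutes
rate = implied_prefill_rate(120_000, 120.0)
print(f"implied prefill rate: {rate:.0f} tokens/s")

# Same rate applied to smaller prompts (constant-rate assumption)
for context in (8_000, 32_000, 120_000):
    print(f"{context} tokens: ~{prefill_seconds(context, rate):.0f} s")
```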
dampflokfreund@reddit
At least N2X should be amazing with LPDDR6.
Ok_Spirit9482@reddit (OP)
Oops, thanks for correcting me! Forgot LPDDR5 is 16-bit wide and got excited for a hot sec, moving on
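The channel-width mix-up is easy to check with the usual peak bandwidth formula: channels times bits per channel divided by 8, times the transfer rate. A minimal sketch follows; the 8533 MT/s data rate is an assumption (a common LPDDR5X speed grade), not a confirmed N1X spec, and the function name is made up for illustration.

```python
def peak_bandwidth_gb_s(channels: int, bits_per_channel: int, mt_per_sec: float) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    bytes per transfer = channels * bits_per_channel / 8
    bandwidth = bytes per transfer * mega-transfers per second / 1000
    """
    bytes_per_transfer = channels * bits_per_channel / 8
    return bytes_per_transfer * mt_per_sec / 1000


DATA_RATE = 8533  # MT/s, assumed for illustration only

# 16 channels at 16 bits each (LPDDR5-style channels)
narrow = peak_bandwidth_gb_s(16, 16, DATA_RATE)

# 16 channels at 32 bits each, the width a >500GB/s figure would need at this rate
wide = peak_bandwidth_gb_s(16, 32, DATA_RATE)

print(f"16 x 16-bit: {narrow:.0f} GB/s")
print(f"16 x 32-bit: {wide:.0f} GB/s")
```

At the assumed data rate, halving the channel width halves the result, which is the difference between the original estimate and the corrected one.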
Eyelbee@reddit
Those speeds seem way too low for 16 channels
Charming-Author4877@reddit
You'll probably get 3-4 times the performance getting a 150$ 3060 from ebay on any cheap PC. Or an old laptop with a 3060 mobile, they go cheap too.
MrPecunius@reddit
3060 has 8-12GB of 360GB/s VRAM, so I think you need a better example.
Charming-Author4877@reddit
The 3060 is 30% faster than the N1X and a 3060TI is double the speed of N1X.
At less than 200$.
It's a proper example.
Those 11 year old budget GPUs beat the N1X
khariV@reddit
Memory bandwidth performance.
I honestly don’t understand the trade-off in having to run low parameter count models because GPUs have so little memory. Without stacking multiple GPUs, along with the host computer, the power bill, and the configuration headaches of getting them to work, I just don’t get the usefulness of these small models running on 8-16GB GPUs.
I get that they are super fast, but the output is such low quality that I generally don’t care how fast the answers come out.
I know that the DGX Spark is a machine with a lot of compromises and that it’s expensive, but the ability to run larger models is worth it to me. Everyone is different, I know, but I’d really be interested in what people find these small, fast, GPU-contained models good for.
uti24@reddit
Why though, what logic do you have behind that?
The 3060 has 300-350GB/s memory bandwidth, and this thing reportedly has 500GB/s.
The 3060 has 3060 compute power, and the GB10 that is reportedly used in this one has 4070 or whatever compute power.
Does not seem to add up.
Charming-Author4877@reddit
With normal DDR5 that's not possible; it's limited to 70 GB/sec. DDR6 doubles that, DDR7 doubles that, HBM3 doubles that, and HBM3e doubles it again.
A 3060 Ti at 160$ comes with almost 500GB/sec memory bandwidth; the N1X is likely half of that, as most leaks show it around 250-300GB/sec.
It will probably use LPDDR5X, which is soldered, non-exchangeable RAM.
If you combine two RTX 3060s you'll have a significantly stronger system, at a bargain price point.
I don't recommend those, it's the absolute cheapest GPU you can use today.
I'd recommend a better one.
uti24@reddit
I mean, if it would not cost like 7k$
riklaunim@reddit
One of Lenovo's laptop listings prices the N1X laptop at a bit over 4000 EUR 😉
Dr_Allcome@reddit
How much ram does that one come with?
Usual-Orange-4180@reddit
That’s not bad, my laptop was 5k dollars
uti24@reddit
That would actually be pretty good, but how is it possible if an even less powerful Nvidia box is already 4k?
riklaunim@reddit
It's low power, so the compute will be rather low compared to DGX or high-power Apple chips.
MexInAbu@reddit
I will wait to see how Linux support is on this. Windows enshittification is intolerable nowadays.
Eyelbee@reddit
That's the idea I've always had, but knowing Nvidia, they will ridiculously overcharge for this.
egomarker@reddit
There's no CPU that Windows on ARM can't make consumers lose interest in.