Intel's Crescent Island PCB Leaks, Showing a Massive Xe3P GPU, 16-Pin Connector, 160GB LPDDR5X as Intel Sidesteps the HBM Shortage

[-]

Puzzleheaded_Base302@reddit

Hopefully this doesn't end up in the same boat as the B70: good hardware spec, terrible software implementation.

The bottom line is that the marginal cost of electricity should be on par with the cloud model. If the electricity cost alone is more than cloud API calls, it feels stupid to run the GPU locally.
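A back-of-the-envelope sketch of that comparison. Every number here (350 W draw, 30 tokens/s, $0.30/kWh) and the function name are made-up assumptions for illustration, not measurements of any card, so plug in your own.

```python
def electricity_cost_per_million_tokens(watts, tokens_per_second, usd_per_kwh):
    """Marginal electricity cost in USD to generate one million tokens locally."""
    seconds = 1_000_000 / tokens_per_second
    kwh = watts * seconds / 3_600_000  # watt-seconds (joules) to kWh
    return kwh * usd_per_kwh


# Hypothetical inputs: card power draw, decode speed, and electricity price.
local_cost = electricity_cost_per_million_tokens(
    watts=350, tokens_per_second=30, usd_per_kwh=0.30
)
print(f"~${local_cost:.2f} of electricity per million tokens")
```

Compare the result against whatever your provider charges per million output tokens. It deliberately ignores hardware cost and idle draw, since the point here is marginal cost only.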

[-]

HettySwollocks@reddit

Which limb do they require? Just make sure there's a spare for the power company.

[-]

That's not a lot of bandwidth for that much memory, I would argue? Not an expert, but I would think that for LLM inference, bandwidth per GB is quite important (i.e. how fast you can process the full memory)?
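A quick way to put a number on "bandwidth per GB" is to ask how many times per second a card could read its entire memory. This sketch uses the figures quoted elsewhere in this thread (700GB/s and 160GB for Crescent Island, 1TB/s and 32GB for the MI60); the RTX 3090 row (936GB/s, 24GB) is a spec-sheet figure added here as an assumption, and the function name is just for illustration.

```python
# (name, peak bandwidth in GB/s, capacity in GB)
cards = [
    ("Crescent Island (thread figures)", 700, 160),
    ("MI60 (thread figures)", 1000, 32),
    ("RTX 3090 (spec-sheet assumption)", 936, 24),
]


def full_sweeps_per_second(bandwidth_gb_s, capacity_gb):
    """How many times per second the card could stream its whole memory."""
    return bandwidth_gb_s / capacity_gb


for name, bandwidth, capacity in cards:
    sweeps = full_sweeps_per_second(bandwidth, capacity)
    print(f"{name}: {sweeps:.1f} full-memory reads per second")
```

These are theoretical peaks, so real numbers will be lower, but the ratio between cards is the interesting part.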

[-]

FullstackSensei@reddit (OP)

My thesis is this is going to be a relatively low cost card. Depending on price, it could be great value.

It has practically the same memory bandwidth as a 4080, and more memory bandwidth than a 5070.

[-]

vasimv@reddit

I don't think it will be a low-cost card. Look at the size of the die and the whole chip. Maybe I'm wrong, but I'm sure the chip alone will cost a few thousand, with the card's price somewhere in the $8,000 to $15,000 range.

[-]

FullstackSensei@reddit (OP)

We don't really know how big the die is. The rendering from Intel doesn't show scale. People just blew that up to the package size, but it's physically impossible to get a single chip that big. That's way larger than reticle size.

The package size needs to be that big to accommodate the 640-bit-wide memory interface. If you think about it, it's not much larger than an SP3 Epyc, which has an eight-channel memory interface (512-bit).

Still, depending on how much compute it has, an $8k part for a 160GB card isn't bad if it has enough TFLOPS to saturate a 700GB/s (~600GB/s in practice) memory interface. Remember that most GPUs barely crack 50% of their theoretical memory bandwidth.
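For what it's worth, the ~700GB/s figure roughly falls out of the bus width. A quick sketch, assuming LPDDR5X at 8533 MT/s (a common speed grade; the actual speed on this card is not confirmed) and DDR4-3200 on the SP3 Epyc. The function name is just for illustration.

```python
def peak_bandwidth_gb_s(bus_width_bits, megatransfers_per_second):
    """Theoretical peak bandwidth: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * megatransfers_per_second / 1000  # MB/s to GB/s


# 640-bit LPDDR5X; the 8533 MT/s data rate is an assumption.
print(f"640-bit @ 8533 MT/s: {peak_bandwidth_gb_s(640, 8533):.1f} GB/s")

# Eight 64-bit DDR4-3200 channels on an SP3 Epyc.
print(f"512-bit @ 3200 MT/s: {peak_bandwidth_gb_s(512, 3200):.1f} GB/s")
```

So the interface is only 25% wider than the Epyc's, and most of the bandwidth difference comes from the faster memory.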

[-]

BoobooSmash31337@reddit

Half their theoretical bandwidth?

[-]

Ok_Mammoth589@reddit

Nvidia and AMD have released high-VRAM GDDR cards. They're both extraordinarily expensive.

[-]

ttkciar@reddit

I was thinking the exact same thing. My lowly (and ancient) MI60 has more memory bandwidth than that, and memory bandwidth is critical for fast inference.

On the other hand, all that memory would support large batches for batched inference, which should translate to high aggregate tokens/second, even though tokens/second for any given instance would be low.

That should make it appealing for hosting mid-sized models with a high subscriber count, or for batched data processing. There will be customers for this product.
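To illustrate why batching helps a bandwidth-limited card, here is a very simplified roofline-style sketch. The assumption is that each decode step streams the weights once regardless of batch size, while the math grows with the batch; KV-cache traffic is ignored. All inputs are hypothetical: 60GB of weights (think a ~120B dense model at 4-bit), 600GB/s effective bandwidth, 100 TFLOPS of usable compute, and 240 GFLOP per token.

```python
def decode_rates(weight_gb, bandwidth_gb_s, batch, tflops, gflop_per_token):
    """Return (per-stream, aggregate) tokens/s for one decode step.

    Step time is whichever is slower: streaming the weights once,
    or doing the math for every sequence in the batch.
    """
    memory_time = weight_gb / bandwidth_gb_s
    compute_time = batch * gflop_per_token / (tflops * 1000.0)
    step_time = max(memory_time, compute_time)
    return 1.0 / step_time, batch / step_time


for batch in (1, 8, 32, 64):
    per_stream, aggregate = decode_rates(60, 600, batch, 100, 240)
    print(f"batch {batch:>2}: {per_stream:5.1f} tok/s per stream, "
          f"{aggregate:6.1f} tok/s total")
```

With these made-up numbers, per-stream speed stays flat while the total climbs with batch size, until the card becomes compute-bound and per-stream speed starts to drop.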

[-]

ziphnor@reddit

Excellent point. Does make it less interesting for hobby llm inference.

[-]

a_beautiful_rhind@reddit

If it's really 700GB/s I'm not gonna shit on it. More than used servers were pulling. As long as the price is right.

[-]

ttkciar@reddit

Yep, this. If I could pick one up for less $$ than two second-hand 64GB MI210, I'd be tempted, especially if its peak power draw is less than the dual MI210's 600W.

One question nobody's addressed yet is, how would it be for training? Its training speed would be constrained by its low memory bw, but if its compute logic is power-efficient and natively supports the BF16 datatype, its cost to power and cool should be disproportionately low. That might be a reasonable trade-off (low training speed vs low joules-per-parameter training cost).

[-]

ziphnor@reddit

My point was mostly that it's easy to get blinded by the bandwidth number and forget that it has to scale with the amount of memory to really make sense. I mean, a 3090 or dual 5060 Ti provides higher bandwidth per GB than this. Dual RTX 6000 Pro has a combined 3.6 TB/s.

I mean, I am sure hoping to see this priced as something hobby friendly, but when I read "datacenter GPU" I kind of doubt it 😄

[-]

a_beautiful_rhind@reddit

Yea, I think that last part is the rub. If they were reasonably priced it would be fine.

Never heard anyone spec out b/w per GB tho. This is going to beat a 5060ti. People were praising all the strix stuff and it's way worse.

[-]

ProfessionalJackals@reddit

My lowly (and ancient) MI60 has more memory bandwidth than that, and memory bandwidth is critical for fast inference.

Depends on how you look at it... Your PG is not going to be faster when you're forced to stack multiple cards into a system with a slow PCIe X bus and all the communication overhead.

And it's going to be way faster to fit a model into 160GB than to have a 32GB card plus whatever spills into your system RAM.

Or think of it like this... Sure, you can run a heavy model with 512GB of system memory, but now you can run 4 cards at much faster speeds to get more benefits. PP is going to be way better, and PG is still going to be 4x to 5x faster because of the bandwidth and it being on card.

Better than what we have now, no? All depends on the price and future driver support.

[-]

ttkciar@reddit

Oh yes, having more "fast enough" memory is definitely nice to have. I am constrained far more by my MI60's 32GB than by its 1TB/s bw.

The point is, though, that the intended customers are datacenter users, and this GPU's bw is a tiny fraction of its competitors'. The MI350P (also a PCIe card GPU) is 4TB/s, for example.

That means it will need to find the correct market niche best suited to its capabilities (and limits), and in my comment I identified two such niches.

Regarding the limitations of PCIe as GPU-GPU interconnect, layer-wise splitting is the unsung hero circumventing that problem. Layer-wise model splitting imposes minimal interconnection requirements, and scales inference speed almost linearly as batch sizes grow large.

Layer-wise splitting is less desirable for unbatched inference, of course, because for a single instance only one GPU is used at a time, resulting in no speedup whatsoever.

Since both of the market niches I identified in my previous comment involve batched inference, this would make it a good match with layer-wise model splitting, which means the lack of a dedicated interconnect fabric wouldn't penalize it in practice.
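A toy model of the layer-wise argument above, under simplifying assumptions: every GPU holds an equal slice of the layers, each slice takes the same time per token, and hand-off time between GPUs is ignored. A single request can only occupy one stage at a time (its next token depends on the previous one), while independent requests can fill the other stages. The function names and the example numbers (4 GPUs, 25 ms per stage, hidden size 8192) are hypothetical.

```python
def pipeline_tokens_per_second(num_gpus, stage_seconds, requests_in_flight):
    """Steady-state tokens/s for a layer-wise split across num_gpus."""
    token_latency = num_gpus * stage_seconds         # one token visits every stage
    busy_stages = min(requests_in_flight, num_gpus)  # stages doing useful work
    return busy_stages / token_latency


def activation_bytes_per_token(hidden_size, bytes_per_value=2):
    """Data crossing each GPU boundary per token (BF16 assumed by default)."""
    return hidden_size * bytes_per_value


for in_flight in (1, 2, 4):
    rate = pipeline_tokens_per_second(4, 0.025, in_flight)
    print(f"{in_flight} request(s) in flight: {rate:.0f} tok/s")

print(f"{activation_bytes_per_token(8192)} bytes per token per hop")
```

One request gets no speedup over a single GPU running all the layers, while enough concurrent requests keep every stage busy. The per-hop traffic is a few kilobytes per token, which is why PCIe is not the bottleneck in this setup.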

[-]

Wolvenmoon@reddit

With Xelink and TP/EP it should be a banger...for folks with deep enough pockets to get several. It'd be funny if they managed to launch it for $2k. They're not going to, but it'd be hilarious.

[-]

This_Maintenance_834@reddit

160GB is just a hair away from running deepseek-v4-flash locally.

[-]

Monkey_1505@reddit

Flash can be selectively quantized to under 100GB if you want.

[-]

corruptbytes@reddit

it could run the q2-imatrix on ds4 which could be decent, not sure if anyone has used it (i use the q4-imatrix and i love it but it's about 200GB of ram)

The 2 bit quantizations provided here are not a joke: they behave well, work under coding agents, call tools in a reliable way. The 2 bit quants use a very asymmetrical quantization: only the routed MoE experts are quantized, up/gate at IQ2_XXS, down at Q2_K. They are the majority of all the model space: the other components (shared experts, projections, routing) are left untouched to guarantee quality.

[-]

This_Maintenance_834@reddit

Does ds4 only work on Mac?

[-]

corruptbytes@reddit

nope, metal, nvidia cuda, and amd rocm supported

intel might need a fork, seems possible

[-]

This_Maintenance_834@reddit

too much quantization at 2bit.

[-]

blastcat4@reddit

I could run a bunch of qwens!

[-]

FullstackSensei@reddit (OP)

Or perfect for a 120B model.

You don't design hardware for a given model size. It takes years to design, verify and manufacture a piece of silicon, while models have the shelf life of a pack of butter.

[-]

This_Maintenance_834@reddit

there are no exciting models on the market at 120B at present (yes, qwen3.5-122b-a10b exists, but qwen3.6-27b performs better).

deepseek-v4-flash is an exciting model. if a single gpu can run deepseek-v4-flash, that should be a good reason for an easy sale.

[-]

kiwibonga@reddit

Intel engineers can't wait to fumble the drivers for that.

[-]

Ok_Mammoth589@reddit

What're you talking about? You don't like needing to update your kernel to the bleeding edge in 6 months just to get bugfixes needed today, to service software that's on an lts kernel?

[-]

Terminator857@reddit

How many can I buy with the nickel in my pocket?

[-]

More-Curious816@reddit

I found lint

[-]

Caffeine_Monster@reddit

Lintel Inside

[-]

Creative-Type9411@reddit

🎶🎵🎶

[-]

Caffdy@reddit

about three fiddy

[-]

cniinc@reddit

Maybe we can get private, crowd funded data centers that are run in some apartment complex, where you buy into access with a tailscale connection or something, and you crowd fund the electricity and housing and cooling costs with a monthly membership.

There has to be an abandoned mall somewhere that can be cleaned out and used if we can figure out the bandwidth part

[-]

brakx@reddit

Isn’t this just a worse version of vast.ai?

[-]

cniinc@reddit

Never heard of it, that's cool. Is it effective? How is it compared to using a cloud LLM?

[-]

brakx@reddit

I mean it’s going to be worse. You’re renting gpus not a service. You get what you pay for in most cases. But it can save money if used correctly.

[-]

xeroskiller@reddit

"Have an extra B200 sitting around?"

Oh yeah, lemme just go grab that lol

[-]

One-Employment3759@reddit

but how does this sidestep the general RAM shortage at all.

[-]

FullstackSensei@reddit (OP)

It absolutely sidesteps the RAM shortage. HBM has terrible yields. Some articles say as low as 30%. But even if the number were closer to 70% (nobody would say yields were terrible if that were the case), you'd get roughly 40% more usable memory when paying the same price per GB to the memory maker. If yields are 50%, the 160GB on that card costs about the same as 80GB of HBM.
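The arithmetic behind that, as a sketch. It takes the simple model above at face value: you pay per manufactured GB, only the yielded fraction of HBM is usable, and LPDDR5X yield is treated as roughly 100% (that last part is an assumption, as are the function names).

```python
def hbm_equivalent_gb(lpddr_gb, hbm_yield):
    """GB of usable HBM that costs about the same as lpddr_gb of LPDDR."""
    return lpddr_gb * hbm_yield


def usable_memory_ratio(hbm_yield, lpddr_yield=1.0):
    """Usable LPDDR GB per usable HBM GB for the same spend."""
    return lpddr_yield / hbm_yield


for hbm_yield in (0.3, 0.5, 0.7):
    equivalent = hbm_equivalent_gb(160, hbm_yield)
    ratio = usable_memory_ratio(hbm_yield)
    print(f"HBM yield {hbm_yield:.0%}: 160GB LPDDR ~ {equivalent:.0f}GB HBM, "
          f"{ratio:.2f}x the usable memory per dollar")
```

At a 70% yield the ratio is 1/0.7, so about 43% more usable memory rather than 30%. Real pricing obviously depends on a lot more than yield.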

[-]

ProfessionalJackals@reddit

It's not only yield; HBM also needs more wafers just for the substrate. So even if you get 70% useful yields, you're eating away at global wafer production. In other words, wafers that might have been turned into memory are turned into the base layer for HBM.

[-]

fallingdowndizzyvr@reddit

Because the big shortage is in HBM, which is what's causing the "general RAM shortage" since companies are cutting DDR5 production to make more HBM. But even with that DDR5 is easier to get than HBM. HBM takes a lot more resources to make than the same amount of DDR5.

[-]

One-Employment3759@reddit

It doesn't matter though, because it has caused a shortage and price spike for LPDDR5X and DDR5 (different chips). So you are actually better off planning for HBM, because HBM is the thing being made by fabs now. No one is making enough DDR5.

[-]

fallingdowndizzyvr@reddit

No one is making enough DDR5.

More DDR5 is coming online than HBM. HBM is resource intensive and harder to make. Fewer companies are capable of making it. A lot more are capable of making DDR5.

https://wccftech.com/another-chinese-dram-maker-breaks-into-ddr5-memory-mass-producing-64gb-rdimms/

[-]

One-Employment3759@reddit

Oh wow, finally some good news, thanks!

China to the rescue again.

[-]

Silver-Champion-4846@reddit

Isn't LPDDR5X for consumer laptops? Will we plebeians get anything?

[-]

ImportancePitiful795@reddit

LPDDR5X is used even by NVIDIA, e.g. on the GH series.

[-]

fallingdowndizzyvr@reddit

Yeah, but it's used on the CPU half of GH. The Grace part. The GPU, Hopper, uses HBM. This Intel GPU uses it for its VRAM.

[-]

j_osb@reddit

Meanwhile, AMD's version of that, the MI300A, is pure HBM.

Crazy card by the way, would love to have one but... money.

[-]

fallingdowndizzyvr@reddit

To be clear, the MI GPUs are, well... just GPUs. Hopper is the GPU for Nvidia. That is pure HBM too. Grace Hopper is a solution that includes an Arm-based CPU, Grace, with a Hopper-based GPU. It's basically an SBC, while the MI300A is just a GPU. You can get the GPU portion of GH separately, like the H100. That's what's comparable to the MI300.

[-]

Kamal965@reddit

No, not the MI300A. The MI300 and MI300X are normal GPUs. The MI300A has 24 Zen 4 CPU cores and CDNA 3 GPU compute units in a single package, sharing 128GB of unified HBM3 memory.

[-]

fallingdowndizzyvr@reddit

Gotcha. I missed that.

[-]

Kamal965@reddit

All good lol. AMD's really bad at naming things, honestly.

[-]

spaceman_@reddit

No, MI300A are APUs, with 24 CPU cores, a 228CU GPU and 128GB of unified HBM3 at 5200GB/s. All for the power budget of 550W.

It was launched in 2023 and is basically what Strix Halo and DGX Spark would want to be when they grow up.

[-]

fallingdowndizzyvr@reddit

Gotcha. I missed that.

[-]

j_osb@reddit

No, the MI300A is an APU featuring 24 Zen 4 (iirc) cores and a pretty big 228CU CDNA3 GPU. They share 128GB of HBM3 for a total bandwidth of 5.3TB/s.

They're actually the main thing powering the El Capitan Supercomputer!
I hope that I can get my hands on one of them in a decade or so lol

[-]

fallingdowndizzyvr@reddit

Gotcha. I missed that.

[-]

Silver-Champion-4846@reddit

GH is the cpu/gpu hybrids right? Basically server-equivalents of the DGX Spark or something stronger than it, i.e Spark doesn't have a good gpu

[-]

fallingdowndizzyvr@reddit

Not really. GH is a CPU + GPU that just happens to be connected by a really fast interconnect. The CPU has its own RAM. The GPU has its own VRAM. DGX Spark is not that. It's an APU using the same shared RAM for both the CPU and GPU.

[-]

Silver-Champion-4846@reddit

Oh nice and how practical is that? What does the cpu do that the gpu isn't good for?

[-]

fallingdowndizzyvr@reddit

Ah... what does the PC do when you have an Nvidia 5090 plugged into it? The computer stuff. That's what Grace does. Grace Hopper is a complete solution. Not just a GPU card.

[-]

Silver-Champion-4846@reddit

Interesting

[-]

FullstackSensei@reddit (OP)

Not really. They're mostly regular DDR5 chips, but driven at much lower power levels because the chips are not socketed and the traces are much, much closer to whatever uses said memory. The proximity and lack of a DIMM help maintain signal integrity, which in turn enables said lower power levels.

[-]

Silver-Champion-4846@reddit

Thanks for the clarification. So these things use DDR5 RAM instead of GDDR VRAM? Also, will we consumers get anything?

[-]

vasimv@reddit

Well, that is a step in the right direction. Next they should do an even bigger chip to accommodate 40-60 DDR3 chips from old memory sticks.

[-]

LocalLLaMa_reader@reddit

From the article... "Intel is currently targeting customer sampling for its Crescent Island GPU for the 2H of 2026, so we'll definitely learn more about the GPU in the coming months."

I would love this for a good price point... if AMD can package 128 GB in the form of Strix Halo (same LP5X if I understand) for ~2k$, then this card may also be in the 4 digits and hence beats out the RTX 6000's record 96 GB VRAM on a single PCIe device under 10 grand (excluding that 141 GB thing...). Of course not comparable in the slightest, but the price and the software stack will show whazzup.

I love the single, lonely USB-C haha

[-]

fallingdowndizzyvr@reddit

I would love this for a good price point... if AMD can package 128 GB in the form of Strix Halo (same LP5X if I understand) for ~2k$

They can't. They could. But that was then and this is now. That's $3K now.

Of course not comparable in the slightest, but the price and the software stack will show whazzup.

There's been no evidence that Intel is capable of that so far. Look at the recent B70 for evidence of that. Even at a lower price it's not as good a value as the higher-priced R9700, since it's so much slower.