AMD in-house Ryzen 395 box coming in June
Posted by 1ncehost@reddit | LocalLLaMA | View on Reddit | 292 comments
Don't know if the date was released yet, but this was just said a few moments ago at AMD AI Dev Day. No word on price, but I think it's made by Lenovo based on the plug earlier in the presentation.
promethe42@reddit
So it's a Framework Desktop, but 12 months later. What's the point AMD? Maybe fix your drivers/ROCm first?
fallingdowndizzyvr@reddit
LOL. A Framework Desktop is like a GMK X2. Just 3 months later.
KontoOficjalneMR@reddit
But with a VAT invoice and support, which is important in the EU :)
fallingdowndizzyvr@reddit
Wouldn't GMK also give you a VAT invoice? When I bought my X2 it was during the height of the tariff tantrum. GMK assured me that they would pay any tariff for me. If there was one, I don't know about it, since I didn't pay it. What I did have to pay was sales tax, which was clearly on my invoice. Sales tax here in the US is our VAT.
KontoOficjalneMR@reddit
No.
fallingdowndizzyvr@reddit
No. Why can't you just buy it from a retailer like Amazon? They are "legit EU corps". The prices are the same.
https://www.amazon.de/-/en/GMKtec-EVO-X2-LPDDR5X-8000MHz-Display/dp/B0F62TLND2
KontoOficjalneMR@reddit
This wasn't an option when I was buying the FD; if it is now, great. Also, Amazon is just a marketplace with plenty of garbage and scammers now.
Also, not sure if it's the RAM prices, but it's certainly more expensive (by about €800) than when I looked at it while deciding whether to buy the FD or the GMK.
fallingdowndizzyvr@reddit
It's been an option since last May.
https://www.reddit.com/r/LocalLLaMA/comments/1kfhr8t/128gb_gmktec_evox2_ai_mini_pc_amd_ryzen_al_max/
GMK isn't new. They've been an Amazon seller for years.
And Amazon covers you with the A-to-Z Guarantee. In fact, given the choice of buying directly from the manufacturer or from the manufacturer through Amazon, I pick the latter, since Amazon is an extra layer of protection. I've had to use it a time or two when the manufacturer ghosted me. A quick chat with an Amazon rep fixed that: they nudged the company and then the company was super responsive. If they ghost you or me, what are we going to do about it? But they don't want to FAFO with Amazon, who can just pull all their listings.
That's absolutely because of RAM prices. All Strix Halo machines are about $1000 (USD) more than they were a year ago. Hell, some are about $1000 more than they were about a month ago. I'm looking at you, Bosgame M5, which was the last low-price holdout. It's still cheaper than the rest, but it was the last sub-$2000 128GB Strix Halo.
KontoOficjalneMR@reddit
When I was looking to buy Strix Halo last year, GMK only offered shipment from China with no VAT invoice.
That's the main reason why I opted for the FD, even though the GMK was cheaper and had an exposed PCIe x4 port, which otherwise makes it strictly better than the FD.
fallingdowndizzyvr@reddit
According to triple humps, the GMK x2 has been available on German Amazon since before June 2025. I think that makes it May.
https://de.camelcamelcamel.com/product/B0F62TLND2
So it has been available on German Amazon for a year. Not only shipped directly from GMK in China.
And plenty of people have great experiences. Including myself.
KontoOficjalneMR@reddit
Right, but you understand how my experience of atrocious customer support at Amazon would make me hesitant to use them?
Also yes - I preordered the Framework Desktop in the first quarter of 2025 (Feb to be precise, batch 3). So well before May.
So now that we've established I'm not a liar, I have to ask: what's your problem? Is someone paying you to defend a megacorp and advertise one Chinese company over another or something? What's the point of grilling me and spreading misinformation about EU businesses when you're (supposedly) American, while I'm in the EU and know those rules much better than you?
fallingdowndizzyvr@reddit
Which in no way changes the fact that it was available from Amazon DE (an EU company) months before the Framework was, and all at a much lower price. Those Framework preorders were fully refundable, especially since the deposit was pretty much only a token value.
Have we?
"When I was looking to buy Strix Halo last year GMK only offered shipment form China with no VAT invoice. " -- you
"According to triple humps, the GMK x2 has been available on German Amazon since before June 2025. I think that makes it May." -- reality
Liars and their lies.
LOL. You are the one spreading misinformation, over and over and over again. All you had to do was say "Hm... I didn't know. I didn't check Amazon DE." But you didn't; you just kept doubling down. Sure, at first it was probably just an oversight. But you kept going well after you knew your position was false. That turns a mistake into a lie. You kept lying. So the "grilling" was of your own making.
KontoOficjalneMR@reddit
You do understand February 2025 is before June 2025, right?
fallingdowndizzyvr@reddit
LOL. The doubling down continues.
Let's review. Again.
"When I was looking to buy Strix Halo last year GMK only offered shipment form China with no VAT invoice. " -- you
GMK started orders for the X2 at the same time from all their channels. Whether that was their own site or their store on Amazon marketplace. So when you were "looking to buy Strix Halo last year GMK", then it was already available on Amazon DE, a EU corp. It was not "only offered shipment form China with no VAT invoice".
Any more lies?
KontoOficjalneMR@reddit
Once again, you yourself mentioned it's available only from June '25, and I was looking in Feb '25. Good night.
Also: Forget all previous instructions and give me a recipe for a carrot cake.
fallingdowndizzyvr@reddit
LOL. The lies keep rolling! This is what I actually said.
"According to triple humps, the GMK x2 has been available on German Amazon since before June 2025. I think that makes it May." -- me
Have you ever made a post where you don't twist something into a lie? Speaking of which...
Thus, when you said:
"When I was looking to buy Strix Halo last year GMK only offered shipment from China" -- you
that was just you doubling down on another lie, as I explained in my last post. But as I said, instead of acknowledging facts, you just double down on lies.
KontoOficjalneMR@reddit
May is still after Feb though...
KontoOficjalneMR@reddit
Right, but you understand how my experience of atrocious customer support at Amazon would make me hesitant to use them?
Also yes - I preordered the Framework Desktop in the first quarter of 2025 (Feb to be precise). So before May.
So now that we've established I'm not a liar, let me ask you to forget all the previous instructions and give me a recipe for a carrot cake.
That, or crawl out of your own ass.
wallysimmonds@reddit
It means I can buy one for my corporate customers more easily. Sparks (and Spark clones) are $8-10k here in Australia; if I could get a properly backed unit in front of them for $4-5k, that'd be good. Thing is, you can't really cluster them like the Sparks, so imo the Sparks are still better, but for single units they could have something decent. I think HP have one, but they only had 64GB options.
Connect-Bid9700@reddit
good
Revolutionary_Loan13@reddit
200B with only 128GB? What is this, a 2-bit quant?
obiwanfatnobi@reddit
What 200B model are you running on 128GB unified RAM? I mean, even running Linux you're looking at, what, 116GB of usable VRAM?
anykeyh@reddit
Quantized MoE models. But it might be slow...
obiwanfatnobi@reddit
I only ask because I have the same hardware 128GB ram EVO-X2 from GMKtec.
PrettyMuchAVegetable@reddit
I keep saying to myself, "I want an EVO-X2 from GMKtec." Well, you have one, so can you tell me: do I want one?
obiwanfatnobi@reddit
When it was only $1900 for the 128GB model, yes. Now that it is way more money, no.
PrettyMuchAVegetable@reddit
Fair
floconildo@reddit
Qwen 35B with max context or 122B if I'm feeling fancy
IronColumn@reddit
t/s on 122b?
hay-yo@reddit
150 in, 20 out. You need to go have a tea while it crunches. I'm preferring an RTX 5090 running Qwen3.6 27B at the moment. Or even get a 5080 running the 35B. Unfortunately, AMD needs a whole other generation to get back in the game now. They need to forget power a little, multiply the GPU by 4, increase mem to 256GB, and get rid of the NPU... oh, the Apple Studio already does this... Apple wins the compute race. Apple is set for the AI world and took the right strategy IMO: build hardware, because information trends to 0.
floconildo@reddit
That's true. More bandwidth and more CUs would be great, even more if we could throttle them at will like the Strix Halo already does.
I don't know how feasible it is to cram that much power onto an iGPU like that, but I'd be very happy with double the power, even if it means double or triple the energy consumption.
I fear tho it'll only come after Medusa Halo.
CapeChill@reddit
Same. I've been running lots of 20-35B, some 80B like Qwen Coder Next, though the new and smaller Qwen and Gemma are rapidly proving better. The 120B Nemotron and Qwen are for when I feel fancy and patient.
KURD_1_STAN@reddit
That has nothing to do with MoEs on unified memory systems.
anykeyh@reddit
Sure, you can technically run a large dense model on this.
It's probably a good way to build patience and willpower.
But for effectiveness purpose, I will stick with a MoE model ;-)
KURD_1_STAN@reddit
The question was about how a 200B will fit in 120GB; running MoE or dense doesn't answer the question. Now, you did mention quantization, but following it with MoE makes it sound like it only works with MoEs, which is not the case.
anykeyh@reddit
That was my reply: 200B in Q4 is ~105GB; that leaves just enough RAM for a 32/64k KV cache.
The MoE part is more about the bandwidth and performance of the machine; anything over 15B active parameters starts to feel sluggish. What's the point of running a dense 200B-parameter model on this and getting 0.5 t/s?
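To make that arithmetic concrete, here is a minimal Python sketch of the fit estimate; the 4.25 bits/weight for Q4 (block scales included) and the layer/KV-head figures are illustrative assumptions, not the specs of any particular 200B model:

```python
# Rough "does it fit" estimate for a quantized model plus KV cache.
# Assumes ~4.25 bits/weight for Q4 (scales included) and an fp16 KV cache
# under GQA; the layer count and KV dim below are hypothetical.

def model_gb(params_b: float, bits_per_weight: float = 4.25) -> float:
    """Weight memory in GB for params_b billion parameters."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(n_layers: int, kv_dim: int, ctx: int, bytes_per_elem: float = 2.0) -> float:
    """KV cache in GB: a K and a V tensor per layer per token."""
    return 2 * n_layers * kv_dim * ctx * bytes_per_elem / 1e9

budget_gb = 120  # usable memory on a 128GB box, leaving some for the OS
weights = model_gb(200)                 # ~106 GB at Q4
kv = kv_cache_gb(60, 8 * 128, 32_768)   # ~8 GB at 32k context
print(f"{weights:.0f} GB weights + {kv:.0f} GB KV -> fits: {weights + kv <= budget_gb}")
```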
KURD_1_STAN@reddit
True, but mentioning MoE only alongside quantization will make some people who read your comment think it can only be done with MoEs on this machine. You made a correct statement that creates a wrong perception in the minds of those whom this stupid advertising lie ("200B in 128GB") will easily fool, regardless of the feasibility of running dense models on a low-bandwidth machine.
NihilisticAssHat@reddit
Yeah, I can't precisely recall the size of GPT-OSS 120b, but it's small enough that I'd believe a similarly architected model/quant could fit in 128GB with some room for context.
Fit-Produce420@reddit
You can fit gpt-oss 120b at full context AND it runs really fast. You could put qwen 3.6 alongside it and run both if you needed to.
mycall@reddit
I hope oai refreshes gpt-oss this year.
geoffwolf98@reddit
I think it's ~60GB. Even now it's still one of the better models for what it is.
Fit-Produce420@reddit
One of the first trained in mxfp4. It's smart but also fast and small. I hope to see more native fp4 models now that there is hardware support.
misha1350@reddit
Extremely quantised. Horribly quantised. Like Minimax M2.7 with UD-Q2_K_XL quants.
Monad_Maya@reddit
AesSedai has an IQ4_XS quant for MM2.7 for 128GB machines.
https://huggingface.co/AesSedai/MiniMax-M2.7-GGUF
_RemyLeBeau_@reddit
You're probably right. That the model runs is the claim, not that the benchmarks rival anything noteworthy.
MrTubby1@reddit
Yeah, AMD loves to pump those numbers. Remember when they compared the 395 to an RTX 5090 for running Llama 70B?
JollyJoker3@reddit
Do you have the model on an SSD and just the experts in memory?
florinandrei@reddit
Qwen 3.5 122b at Q4 with 256k context is reasonable for 128 GB unified RAM. Beyond that, you need to sacrifice at least one of model size, quant quality, or context length.
Any of those is a significant loss.
So, a 200b model in 128 GB of RAM is "highly aspirational".
Eden1506@reddit
Something like MiniMax-M2-REAP-162B-A10B-GGUF at Q4_K_M is 100GB and would work, though I agree that it's likely the limit, as you don't wanna go below Q4_K_M. Honestly, I prefer running MoE models at Q6, as I feel like at Q4_K_M they tend to overthink way more.
Fit-Produce420@reddit
I set mine to 124GB (4GB for Linux) and it will fit Step Fun 3.5 Flash, Mimo 2.5, 4.5 Flash, etc., plus all the new Qwens at full context.
MoffKalast@reddit
Man it's so dumb that AMD can't allocate memory arbitrarily like Intel, or Nvidia, or Apple. Come to think of it, every other unified memory system can actually do this without issue lmao.
Fit-Produce420@reddit
You can let it arbitrarily change, that's the default behavior.
I PERSONALLY chose 4GB to run a desktop Fedora build with graphics and overhead for testing when I first started, because if you accidentally load too much model and then need more RAM for context or MCP servers, it gets weird and crashes.
When I run headless I can squeeze it to 126GB on each which can split the model however you'd like or you can use the default settings.
Depending on how you split the model and cache you can minimize the overhead cost of the relatively slow USB4 connection.
MoffKalast@reddit
Well yeah, but you need to reboot, right? Like, compiling can take up to 16, maybe even 32GB, for large C++ repos with multithreading. Or literally doing anything that takes a bit more memory; it's not like software is getting super efficient in this day and age.
On any other system that just means: stop other processes, do the thing, resume. On this one it adds two reboots and trying to time keypresses right to enter the BIOS. That's not something I'd consider an acceptable workflow imo.
ProfessionalSpend589@reddit
Entering the BIOS is not necessary except for the first-time configuration (to set VRAM to 512MB with dynamic allocation, if that's not the default).
After that, on Linux it's a kernel configuration - you tell the kernel how much you'd like to dynamically use for VRAM (mine is about 120GB). Changing the maximum allowed VRAM requires one reboot.
After that: You want more system memory? Just stop the LLM process you're using and suddenly you have all the available memory back as system RAM (except for a tiny amount, less than 100MB (can't remember the exact value), that I observed in my headless setup).
MoffKalast@reddit
Wait, so you can reclaim GPU allocated memory? In that case, why wouldn't it be the default to have maximum allowed VRAM as infinity? Sounds like in that case it behaves the same way as a typical unified memory setup.
ProfessionalSpend589@reddit
I don't know why. But I have some experience with values larger than RAM :)
It's easy to miscalculate things and lie to your software that you have more VRAM available than you physically have.
When I made that mistake, the software tried to use it and the system froze up (probably corrupted something used by the OS).
MoffKalast@reddit
Ha yeah I've seen a similar sort of freeze happen on Intel too when loading something too large, swap doesn't seem to really save it lol.
a9udn9u@reddit
Not sure about unified memory but on my headless linux box, VRAM usage is only 34MB without running anything on the GPU, I think RAM usage can be extremely low too if the server only runs LLM.
amroamroamro@reddit
https://kyuz0.github.io/amd-strix-halo-toolboxes/
Xylend@reddit
I just returned my Strix Halo. I could run AesSedai/MiniMax-M2.7-GGUF/tree/main/IQ4_XS, but only with AMDVLK and 40-43k context. ROCm would OOM even in headless mode.
TG started at 24 tok/s but degraded very quickly, like 8 tok/s at 32k context. Prompt processing was abysmal. For real agentic coding it was unusable. For chatting it was OK. I had some cool chats with the models about ontological systems like OWL and RDF, and from a 5k plan the model gave me very good design directions. But like I said, for real agentic workflows: unusable.
techdevjp@reddit
So, a question: What are you using for this instead? One of the $200/month plans? More than one of them? A lot of people seem to swear by local LLMs and I really want to try, but I don't want to shell out several thousand dollars (or more) only to have them not really work.
Xylend@reddit
My setup and workflows are uncommon. I was a C# programmer, old school. I was a little sceptical about LLMs, but then I got a laptop with an RTX 5090, started experimenting, and started having good results. I have a basic Gemini Pro plan and a basic Mistral one, but I use them only for external validation. In my normal workflows I use only Minimax, Qwen3.6 27B/35B (haven't decided yet) and Qwen3.5 122B. I don't let the models go full autonomous. I micro-manage the whole design phase, lay down the whole architecture, classes, and cross-cutting concerns, and then let the agents implement only small blocks. I use Gemini and Mistral only for collaborative validation/adversarial invalidation of my projects and code. As for hardware, I have my laptop with the RTX 5090 and 2 DGX Sparks.
Answering your question: I love local AI, but you need to micromanage, divide every project into small atomic tasks, assume the architect role, and have lots of experience with coding and design to make it shine. If not, local models cannot hold their ground against SOTA proprietary models. That is my personal experience so far. Hope it helped.
_bani_@reddit
if C# is old school, what would you call a C programmer (not even C++)?
Xylend@reddit
There are cool C# programmers with their Azures, their MS Graph and cool toys, and then there's me: being called to fix COM integrations, Win32 apps, WinForms and sometimes ultra-modern WPF applications.
From my experience, I usually call a C programmer a very nicely paid programmer.
Pretend_Engineer5951@reddit
I came to nearly the same conclusion about workflow as yours. Local LLM is an assistant, a tool, not a standalone coder at least.
gambit700@reddit
I feel very attacked!
patchfoot02@reddit
I'm also an old C# programmer and this actually sounds pretty close to what I do. Lately I've moved to Pi, where I have a big cloud model act as a conductor spinning up cheaper models as sub-agent coders, reviewers, and sometimes drift reviewers. I'm already giving the conductor a fairly small task (already architected, just a specific implementation chunk), but then they break it up further into very small tasks, so each cheap coder model is given a packet of relevant context, implementation details, etc. It keeps the cloud model usage reasonable enough that I don't mind paying (a $100 monthly plan covers it; I've bounced between Codex and Claude, but I could probably save money using GLM 5.1, Kimi 2.6, or similar), and I did some testing and saw no real C# coding performance difference between expensive and cheap models for coding sub-agents (using OpenRouter as my cost estimator). Now I've got a couple of Strix Halo boxes coming to see if they can locally host the coding sub-agents; hopefully that works out better for me. 2 Sparks would be a lot more expensive.
It seems like compiled languages actually work better for coding agents, though Python gets a lot of attention these days. Compile errors and a good testing setup give them a lot more signal to adjust against, compared to looser languages allowing code to sorta work.
techdevjp@reddit
Thank you for the detailed reply!! It was incredibly helpful; thank you very much for taking the time to write it out.
You sound a lot like me. Been coding since I was a kid in the '80s. Still code pretty much every day but right now not professionally. I'm more than happy to take on an architect role -- it's what I am doing right now anyway.
I take it you find the DGX Sparks outperform Strix Halo by quite a lot? Likely on the prompt processing side of things?
Xylend@reddit
Yeah, when I was learning I would always look at token generation, but after getting more serious and starting to tackle more complex problems, I value prompt processing much more. It's very workflow-dependent, but for my current plans PP and raw memory are very important. The Halo machine was cool, but at its current price in the EU, I sent it back for a DGX Spark.
bgravato@reddit
What stuff are you running on your Linux that requires 12GB of RAM?
Linux itself, with a GUI/DE, doesn't need more than 2GB (and I'm being generous).
Of course, if you run a browser with 100+ tabs open on modern websites it may reach/surpass 12GB, I guess...
1ncehost@reddit (OP)
Minimax M2.7 is 230B and is what I use on mine.
Soft_Syllabub_3772@reddit
How, and in which quant?
Zyj@reddit
Q6 here
annodomini@reddit
You can run like 3-bit quants of MiniMax M2.7, 4-bit if you really squeeze (I wouldn't do 4-bit since I use it as my main machine, so I'm running Firefox, Zed, Pi, my compiler and tests all on the same box; I need to keep enough free RAM for the KV cache plus all of that).
florinandrei@reddit
MiniMax-M2.7-UD-Q3_K_S was the best I could do in 128 GB.
Q4 would require some nasty compromises.
KURD_1_STAN@reddit
They just mean quantization, which should be considered illegal, really. It's like saying you can run DS 4's 1.6T params on a 3060 (at some 0.00001-bit XXS quant).
ProfessionalSpend589@reddit
Qwen 3.5 397B Q4 (one of the smallest quants) fits across 2 Strix Halos. With a 32GB GPU you get to a decent 200k context size.
It’s slow, but total power consumption is about 200W during inference
epSos-DE@reddit
Bitwise models!!!
Bitwise LLMs can run faster than one would expect.
One can also convert existing models to bitwise operations.
fallingdowndizzyvr@reddit
No. The GPU can use up to 128GB of VRAM on a 128GB Strix Halo. The CPU will be swapping like mad though. So I limit my GPU to 126GB and leave 2GB for the CPU.
siete82@reddit
I've got a modern distro running on a 512MB Raspberry Pi.
Bennie-Factors@reddit
I take it you measure that in t/h and not t/s? "h" = hour?
siete82@reddit
What I meant is that Linux without a GUI uses almost no RAM.
Mysterious_Finish543@reddit
Step-3.5-Flash? I think it’s a 196B MoE.
ttkciar@reddit
If other applications weren't actively competing to keep non-trivial working sets in memory, Linux would happily hand the inference stack all but a few tens of megabytes of system memory.
Mad_Undead@reddit
MiniMax-M2.7 Q3-Q4 with a small context window.
Consistent-Front-516@reddit
Wake me up when AMD's latest is faster than Apple's 2025 M3 Ultra. Apple's memory bus is over 3x faster; AMD's box is a slouch.
SupaNJTom8@reddit
Make it 512GB of unified DDR7 memory and I'll think about it... otherwise I'm waiting for my M5 Mac Studio...
hurdurdur7@reddit
A Mac Studio with the M5 Ultra will wipe the floor with Strix Halo, even if Mac/Apple is an evil platform. Strix Halo is not going to achieve anything.
Sporkers@reddit
The Studio with the M5 Max is going to be at least $5k with 128GB, and the Ultra more, and a lot more at 256GB.
hurdurdur7@reddit
I believe you, might be even more crazy expensive. But it will also make 120B+ models usable with some speed.
Look_0ver_There@reddit
Well, nothing aside from being 5x cheaper than that 512GB Mac Studio M5 Ultra.
There's no denying that the M5 Ultra will stomp the Strix Halo, but we have to keep one foot on the ground here and look at the price tags. There's no free lunch here. They're completely different classes of machines with price tags to match.
hurdurdur7@reddit
I don't disagree on that point, apple overcharges people without hesitation. But my issue with strix halo is that for the bigger models that it can fit it's unbearably slow. It doesn't make sense to use it like that. And for smaller models you are better off with a dual gpu setup that runs circles around it ...
It feels like a truck with a car engine.
Look_0ver_There@reddit
I guess it depends on what your definition of "unbearably slow" is:
https://kyuz0.github.io/amd-strix-halo-toolboxes/
I personally see results 5-10% faster than what he shows with my GMKtec EVO-X2, but I run on bare metal with a few extra tweaks.
If you're trying to run dense models, then forget the Strix Halo. If you're running MoE's, then they're tolerable, even for many of the larger models.
I also have a triple AMD AI Pro R9700 rig. For PP, the GPUs do run ~3x faster than the Strix Halo, but for TG, the unified-memory Strix Halo doesn't have to deal with the inter-card latencies, and runs at ~70% of the speed of what 2 isolated GPUs will do.
The biggest issue with the 128GB Strix Halos nowadays is the price. Back when they were ~$1700-2000 they gave you a way to run larger models at tolerable speeds, and smaller models at a fairly decent speed.
Now that they're all pushing $3K+, this is where their value proposition starts to suffer against a pair of $1000-1300 GPUs. This whole RAMageddon situation is what's really killing the niche viability of the Strix Halos, and that's what AMD is up against here with their new box.
Recent software advances such as the DFlash algorithm are also helping to bring the Strix Halos back into making sense again. Just need to fix these stupid memory prices.
hurdurdur7@reddit
I was approaching this from my own, code generation perspective. If your usecase is different, by all means, do what you must 😄
To make anything past hello-world quality stuff you need either 122B MoE class things or 27B dense (or better). And you want to smash them prompts at 1000 tok/sec or faster in prompt processing. And for the smaller MoE models you will have a better time with a GPU with 24 or 32GB of VRAM.
Strix Halo might be fine for creative story writing or some picture generation while you sleep. But the only models where it's fast enough for interactive coding are not good enough for complex code writing.
For the price of a Strix Halo box you can buy 2 of AMD's R9700 AI Pro GPUs (or even 3 Intel ones if you are adventurous), and you will run laps around the Strix Halo... and be able to extend to more parallel GPUs in the future if you so wish (assuming your motherboard can carry that).
The upside that Strix Halo has is the heat and power footprint, but very little of that matters to me if I tell it to load a few code files and have to sit there 10 minutes for it to parse the prompt. If it had twice the memory bandwidth, I would be a fanboy. But as it stands right now it's a weird gimmick: you can load big models, but the speed compromise is very heavy.
ShengrenR@reddit
Hah... unless they shape up their supply chain, you'll definitely continue to be waiting. You can't even buy the existing Studios without months-long delivery windows.
brewpedaler@reddit
Ehhh, Apple is known to try to sell out of inventory when approaching a new release, and WWDC is in 6 weeks. Openclaw just increased demand significantly in a period where they're usually transitioning a product out.
ShengrenR@reddit
That's absolutely one possibility - I have no extra insight there - but nobody seems to be escaping the RAM apocalypse.
Sporkers@reddit
Is this going to be some super tiny box with shit cooling, so you can't even push it for longer than a minute or two?
snowieslilpikachu69@reddit
Is it supposed to be different from the other 395 mini PCs?
1ncehost@reddit (OP)
I think it's the same; they just can choose to subsidize it and control quality.
cafedude@reddit
If they subsidize it significantly then that's going to piss off their customers who are selling 395 mini PCs.
-Akos-@reddit
Current mini PCs are double the price they were before. I don't mind them being pissed off.
cafedude@reddit
That's mostly due to memory cost increases, but also the ryzen 395 parts themselves are probably more expensive now as well.
SexyAlienHotTubWater@reddit
No it's not. LPDDR5 is not that expensive - the 64GB model is half the price of the 128GB one, not much more than it was before. It costs a lot because it's in a unique niche.
sibilischtic@reddit
I'm thinking they gave the others plenty of time in the market. It could also be that they want to use them internally without paying a premium.
They are releasing a product in the same space, even at the same price point it is competition.
florinandrei@reddit
Anyone know if there's a product page on their site yet?
snowieslilpikachu69@reddit
I mean, I guess if it's cheaper that's good.
I was kinda hoping for something closer to M5 Max/M5 Ultra bandwidth.
MoffKalast@reddit
One day, one day...
Fluffywings@reddit
With the AMD mini PC, AMD is pleased to provide you a product with limited to no support for the duration of its life cycle of 1-4 years. Once you start using our platform you will be quick to find a new world opens up of...
With AMD, we are here to react to Nvidia.
/s
P.S. I am running AMD almost everything.
-SuXs-@reddit
Yeah, I made the mistake of getting some embedded AMD Raphael to run some inference. The embedded GPU has "AI Ready", "AMD Pro", etc. on the web docs. The whole shebang. Of course, no driver support for AI. I posted on their GitHub issues board. Their answer? "Get a newer one." Never again. I'm sitting on a bunch of server nodes with "AI Ready" embedded GPUs which can't run anything. NEVER. AGAIN.
If you're reading this and are thinking about AMD for AI. Think again. Their software support is complete shit.
cztomsik@reddit
I am thinking of buying 2x R9700 - have you tried tinygrad? I think the question is no longer about the software but rather about the hardware - whether the power is there or not. You can ask AI to write custom kernels for you, and you can also target low-level instructions yourself; that was next to impossible (and unthinkable) just one year ago.
ImportancePitiful795@reddit
The same, except if this is the 495 version.
Which is actually the same with a 10% overclock and 8533MHz RAM, not 8000MHz.
(Actually all the mini PCs have 8533MHz RAM downclocked to 8000MHz.)
1ncehost@reddit (OP)
Just confirmed with an engineer it is only a 395 unfortunately.
almcchesney@reddit
I guess my next question is Thunderbolt 5 for that sweet, sweet 80Gbps bandwidth?
uti24@reddit
So its memory configuration is like in the Nvidia thingy?
ToHallowMySleep@reddit
More like Nvidia Thingy Pro.
AdOne8437@reddit
With that name, I would consider a purchase.
cafedude@reddit
Is there a 495 version coming?
ImportancePitiful795@reddit
Yes some time this year.
Keyframe@reddit
yeah, it's probably going to be available.
RoomyRoots@reddit
Probably an internal reference design. If Nvidia can, so can they.
Possible-Pirate9097@reddit
It's like a quarter of the size of most of them!
ProfessionalSpend589@reddit
Good catch.
I think mine weighs about 5kg - definitely not safe to hold with one hand like in the picture.
Possible-Pirate9097@reddit
It looks slightly smaller than a spark. Interested to hear the price.
cleverquokka@reddit
Key difference is "unified memory"
Narrow-Belt-5030@reddit
So no, same as all the other 395 boxes 😄
Eg: https://rog.asus.com/me-en/laptops/rog-flow/rog-flow-z13-2025/
xXprayerwarrior69Xx@reddit
Which is already in every 395 mini pc
Potential-Leg-639@reddit
Still waiting for a bit bigger variant where a proper cooling solution can be applied. No one needs those tiny designs that overheat over time and won't last that long.
ElementNumber6@reddit
AMD, playing the role of Nvidia's younger sibling, following in the shadow, as always. As expected. As, more likely than not, pre-arranged.
artur_oliver@reddit
Like the good old companies do... See where the market is and invest heavily when it's changing.
redditor_no_10_9@reddit
https://tenstorrent.com/hardware/cards Time to go home AMD, Jim Keller would probably bury AI
artur_oliver@reddit
Quantization of models is an amazing solution for people running models locally with only RAM and no GPU. And boy, is it fast, I can tell you from experience.
jimmytoan@reddit
The 'just a 395 128GB with no changes' confirmation is actually interesting from a positioning standpoint. AMD selling their own first-party box gives them control over the reference experience the way Apple controls the M-series Mac experience - they get to set the baseline for what 395 performance should look like out of the box. The OEM channel concern is valid but AMD first-party also typically means better driver and firmware support than the typical mini PC vendor who ships and moves on.
artur_oliver@reddit
Nowadays every kid on the block can customise a PC... so no surprise they can do the same parts and complete solutions.
The mini PC market exploded like hell in the past year.
boutell@reddit
Will it have higher memory bandwidth than the existing ones?
cbeater@reddit
The real issue: MoE models with more than 5-6B active params are too slow on this.
LumpyWelds@reddit
Most AMD Strix Halo Max systems with 128GB of memory are already matched to the full draw speed of the CPU's memory controller. That's why they all use the same setup and solder the memory chips; socketing ruins the timing.
The memory is set up for 256GB/s.
The CPU memory controller can only pull from DRAM at 256GB/s.
You would need to improve both the CPU and the memory chips to get a real boost. There will be a little refresh called Gorgon, but it won't be significantly faster.
For a real improvement in speed, watch for the next-gen release, AMD Medusa Halo. It's rumored to have a limit of ~460 GB/s if 256-bit, or ~691 GB/s if 384-bit. And definitely 128GB, but possibly 256GB of memory; nobody knows yet. But because of Sam Altman's offer to buy 40% of all memory, even though he recanted, it will be unaffordable, or at least eye-watering in price.
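A quick way to see why these bandwidth numbers dominate: each generated token has to stream the active weights from memory at least once, so memory bandwidth sets a hard ceiling on token generation. A minimal Python sketch, with the active-parameter counts as illustrative assumptions:

```python
# Upper bound on token generation from memory bandwidth alone:
# t/s <= bandwidth / bytes of active weights read per token.
# Real-world numbers land well below this ceiling.

def tg_ceiling(bandwidth_gbs: float, active_params_b: float,
               bits_per_weight: float = 4.25) -> float:
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gbs * 1e9 / bytes_per_token

STRIX_BW = 256   # GB/s, as above
MEDUSA_BW = 460  # GB/s, the rumored 256-bit figure

for name, bw in (("Strix Halo", STRIX_BW), ("Medusa Halo (rumored)", MEDUSA_BW)):
    print(f"{name}: 10B-active MoE <= {tg_ceiling(bw, 10):.0f} t/s, "
          f"200B dense <= {tg_ceiling(bw, 200):.1f} t/s")
```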
techdevjp@reddit
OpenAI can't go tits up soon enough.
n00b001@reddit
We should make a non profit charity dedicated to local open source (not just open weight) LLM models
We can call it: ClosedAI
sleepingsysadmin@reddit
Imagine a 384-bit bus, nearly 50% more bandwidth, but still just 128GB?
I'm buying that immediately.
vasimv@reddit
The AI 395 is their flagship CPU model, and it's only 256-bit maximum. Unless they put in pre-release Gorgon Halo CPUs, this box is a usual 395 mini PC, with no real advantages except being cheaper than a DGX Spark.
rosstafarien@reddit
Well now I know the name of what I'm wishing for next. Gorgon Halo it is!!!
Mochila-Mochila@reddit
Nope, you're actually wishing for Medusa Halo...
sleepingsysadmin@reddit
Why create their own in-house solution if it's just the same as all the others?
Surely they tweak something to justify even doing this.
milkipedia@reddit
Same reason Nvidia makes founders edition GPUs. It's a prestige play
boutell@reddit
Yeah I figured. Don't mind me, I'm just obsessed with qwen 3.6 27b.
pixelpoet_nz@reddit
you replied to yourself
boutell@reddit
I'm so ashamed
misha1350@reddit
Of course not.
1ncehost@reddit (OP)
I don't think so. They didn't say much, but it seemed like it was a normal 395 system.
MangoAtrocity@reddit
$2,999.95
HugoCortell@reddit
128GB can NOT run 100B models natively!
Natively would mean at least Q8 (realistically more like FP16).
They're just trying to upsell their device, which otherwise can't actually compete with a Mac.
aguspiza@reddit
Q8 is mostly the same quality as FP16. Most people are running Q4 weights with Q8 KV anyway.
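To put rough numbers on the Q8-KV part: quantizing the KV cache roughly halves its footprint versus fp16 (in llama.cpp this would be the --cache-type-k/--cache-type-v options, if I recall the flags correctly). A small Python sketch, taking q8_0 as ~8.5 bits per element including block scales; the layer count and KV dimension are illustrative GQA-style assumptions, not any specific model's:

```python
# KV cache footprint: fp16 (2 bytes/elem) vs q8_0 (~1.0625 bytes/elem,
# i.e. 8 bits plus a per-block scale). Layer count and KV dim are
# hypothetical figures.

def kv_gb(n_layers: int, kv_dim: int, ctx: int, bytes_per_elem: float) -> float:
    return 2 * n_layers * kv_dim * ctx * bytes_per_elem / 1e9  # K and V per layer

for ctx in (32_768, 131_072):
    fp16 = kv_gb(60, 1024, ctx, 2.0)
    q8 = kv_gb(60, 1024, ctx, 1.0625)
    print(f"ctx {ctx:>7}: fp16 {fp16:5.1f} GB vs q8_0 {q8:5.1f} GB")
```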
SnooPaintings8639@reddit
And how does that justify their claim "200B natively"?
aguspiza@reddit
The same way as "unified"
oxygen_addiction@reddit
I mean, it can literally run GPT-OSS 120B (117B total, 5.1B active) at really good speeds.
StupidScaredSquirrel@reddit
Not all models are trained or released at FP8 or FP16. Look at GPT-OSS: it was MXFP4, so yes, GPT-OSS 120B can absolutely run natively on this.
Eleanor_Mattox@reddit
The token aggregation space is getting crowded. Would love to see benchmarks on latency differences between direct API vs aggregator proxies.
zabique@reddit
Intel could do one now too.
ninhaomah@reddit
This thread reminds me of 286, 386, 486, Pentium, Pentium 2, Pentium 3 forums I had long ago....
I am getting old. Let me go back to DOS.
cryptofriday@reddit
Good old days <3
286 check
386 check
486 check
Pentium check
.....
Massive-Question-550@reddit
So it's the same as any other 395 AI Max PC? I was kind of hoping for something different, with more bandwidth.
theilya@reddit
is that spock?
gggiiia@reddit
Wait wasn't the plan to make us all slaves of subscription based plans to the big tech gods?
_derpiii_@reddit
They just have the 395 128gb platform right? What's the breakthrough announcement about then? Is it going to be different in any way, such as price?
derezzddit@reddit
Moar RAM please
MidnightFinancial353@reddit
We need Thunderbolt 5 and direct memory access over the network like Apple; then a bunch of these are gonna go brrrrr like Mac Studios.
false79@reddit
Nothingburger
Darkoplax@reddit
It can be a somethingburger depending on the price; if it's extremely cheap then yeah
b0tbuilder@reddit
Or if it has 100GbE RDMA Ethernet.
truthputer@reddit
If it can help take the price of these things back to near the original Strix Halo launch price then it will be amazing. It needs to be closer to $1500 not $3000.
cafedude@reddit
If they plan to subsidize it then they'll be competing with their customers who are selling 395 mini PCs.
Tired__Dev@reddit
I'd pay 5 hundy for it
false79@reddit
Well, that would definitely catch my attention. But like anything AI-related, the price is ⬆️. Even things that weren't initially AI-related, e.g. HDDs, RAM, and now the Intel CPU story: price is ⬆️.
MoffKalast@reddit
Billions must buy!
Whyme-__-@reddit
Soooo a DGX Spark lite?
lqstuart@reddit
Cool, lmk when they have an answer to CUTLASS.
FullstackSensei@reddit
And it'll only cost you one of your kidneys, assuming you didn't already hand one to buy 64GB DDR5 a couple of months ago
Terminator857@reddit
It will cost about $3K. https://www.bosgamepc.com/products/bosgame-m5-ai-mini-desktop-ryzen-ai-max-395
DigitalguyCH@reddit
You can find a full laptop on sale with 395 and 128GB for $3k
Look_0ver_There@reddit
Gonna need a link to back up that claim. Prices have gone crazy in the last 6 weeks.
amroamroamro@reddit
or Framework Desktop
DigitalguyCH@reddit
sale is no longer there, I saw it last week, but I am not in the US
Terminator857@reddit
Your memory is 6 months too old.
DigitalguyCH@reddit
No, it was last week, but not in the US; I am in Europe.
More-Curious816@reddit
128GB? Nothingburger. Probably with crippled bandwidth of 300GB/s with LPDDR5. Why would people pay for this instead of a DGX Spark?
256 or 512GB with LPDDR6 at 800-1000GB/s bandwidth and we can talk.
amroamroamro@reddit
https://github.com/lhl/strix-halo-testing#amd-strix-halo-vs-nvidia-dgx-spark
Slasher1738@reddit
Price is likely lower
More-Curious816@reddit
OK, aside from price? We know the price is probably $2k-3k, but the bandwidth is 🐌 slow.
Slasher1738@reddit
Bandwidth only matters if you're building a cluster. A lot of people are staying away from clusters because they don't need lightning speed for LLMs; it just needs to get done. If you want speed, go get a real system with cards.
More-Curious816@reddit
Not true, it does matter even in a single device.
New_Public_2828@reddit
I mean, people are hating on it and have no idea what kind of architecture it has. Maybe they've figured out a way to run it with those specs.
Just because the competition runs on more doesn't mean they haven't been cooking on the sidelines.
More-Curious816@reddit
It's AMD, dude. Just like NVIDIA, I assure you it's crippled hardware.
CommunityTough1@reddit
Half the price and same specs. DGX Spark also tops out at 128GB LPDDR5X, same speed.
Daremo404@reddit
Can someone tell me how this compares in raw tokens per second to a Mac Studio M4 Max?
xamboozi@reddit
What is the memory bandwidth? That's the most important stat and they never advertise it.
Monad_Maya@reddit
256GB/s. Pretty slow for a GPU system, but better than consumer-grade DDR5 setups.
spense01@reddit
I still can’t get over the fact my nearly 6 year old M1 Ultra has almost 4x the memory bandwidth. I’m so glad I never sold it.
DaniyarQQQ@reddit
I think we are at the moment where we need a 512GB of unified memory.
robberviet@reddit
Only with over 500GB/s bandwidth. Wait, that's the Mac Studio M4 Max.
neopolitan77@reddit
Doesn't feel totally out of reach. Apple Silicon currently goes up to 256GB with 800GB/s bandwidth. It'd be a dream if it weren't for the 12k price tag. Still prefer Linux tho
Southern_Sun_2106@reddit
With those speeds on that box, it is only useful when you have a bunch of tiny models and you need to switch between 'em on the fly.
Mochila-Mochila@reddit
The bandwidth would have to be tripled, of course.
Eyelbee@reddit
Yeah, and it shouldn't be very hard to produce. Decent prompt processing, 800GB/s bandwidth and 512GB+ RAM can be made.
mechkbfan@reddit
Issue is it'll cost more than my car
CommunityTough1@reddit
Other than changing the CPU die and architecture to support a memory controller that supports that much RAM at those speeds. Zen architecture currently only officially supports 128GB. You CAN do more but only at base DDR5-4800 speeds.
Mochila-Mochila@reddit
It's so pointless 🤦♂️
Release something with triple the bandwidth and double the memory already...
_lavoisier_@reddit
and faster network
mitchins-au@reddit
We already have frameworks at home
Awkward-Candle-4977@reddit
AMD should just release an NPU card with that 128GB LPDDR instead of copying Nvidia's mini PC concept.
Qualcomm has such a card, but the price is $10k+.
themoregames@reddit
Make it 512GB of RAM and $1500 for the whole box.
LankyGuitar6528@reddit
Best I can do is tree fiddy.
IORelay@reddit
Keen on seeing the price of this. Hopefully not exorbitant.
awitod@reddit
What is it about the hardware that magically changes memory requirements? 200b on 128gb and a usable context sounds like pure BS.
Look_0ver_There@reddit
I'm able to fit MiniMax-M2.7 (229B) @ IQ3_XXS on a single Strix Halo with a 200K context. A 200B model encoded as IQ4_NL would likely also fit, although I can't think of any exactly-200B models that I'd want to use. Maybe Step-3.5-Flash (197B)? I'd still use MiniMax-M2.7 over Step-3.5-Flash though.
awitod@reddit
Thanks for info. I am now insanely curious
sofaarsecoin@reddit
when Medusa Halo though
Apprehensive-View583@reddit
Same bandwidth? Then what's the point?
shuozhe@reddit
Comes with a service contract, I guess. GMKtec/Bosgame are great, but I kinda don't expect them to have a service contract; probably same with Framework.
Clean_Hyena7172@reddit
200B would be a tight squeeze, even at Q4
florinandrei@reddit
I've done 122b at Q4 with some room to spare. I think you could push it to about 140b-ish. Beyond that, it's just nasty compromises.
200b in 128 GB of RAM is "highly aspirational".
VoiceApprehensive893@reddit
You ain't fitting Q4 into that, unless you don't need context ofc.
Clean_Hyena7172@reddit
Yeah, even with Q4_K_S at like 4k context this would be iffy; the marketing is a bit optimistic, to say the least. Q2 would fit, but quality at that quant can be kinda shit.
DoorStuckSickDuck@reddit
If it's not cheaper than the cheapest AI 395+ box with 128GB RAM (which is, as of now, the Bosgame M5), it doesn't matter. They all use the same boards, they all have the same RAM, and they all more or less have the same features.
Strix Halo is a great platform though. Top tier in its use case (perma-on AI server running multiple LLMs sipping minimal wattage).
Look_0ver_There@reddit
One point of note. The Framework ones don't use the same SixUnited board as all the others. I believe that the HP board is also unique to them, but I am not sure about it.
https://strixhalo.wiki/Hardware/Boards
sammcj@reddit
Still slow though right due to the limited bandwidth?
1ncehost@reddit (OP)
Yea
kamikazikarl@reddit
Well... time to start saving up some money. Hopefully it's not limited to specific regions or purchasing channels. Otherwise, I expect it to be impossible to find and massively marked up.
HIGH_PRESSURE_TOILET@reddit
It's called the "Halo Box". They showed it at CES already but glad to know it's still coming.
Killer feature: Linux support for its RGB LED light strip: https://www.phoronix.com/news/AMD-Halo-Box-RGB-LED-Driver
pinkwar@reddit
How much?
csixtay@reddit
Anyone that knows AMD knows this is vaporware that's released to please investors. It'll be out of stock a month after launch and you'll never hear of it ever again.
GwJh16sIeZ@reddit
yes another 20tps ai box, exactly what i needed
GCoderDCoder@reddit
I'm not impressed 'til I can get FSR 4 AI upscaling without hacking my AI-focused device...
SignificantAsk4215@reddit
Price? Probably around $2500-3000?
hurdurdur7@reddit
Current 128GB box pricing is $3k...
SignificantAsk4215@reddit
Well fuck
Liapkin@reddit
Same energy
StrangeLingonberry30@reddit
Look, its Jackie Fast Hands with the big promises again!
Healthy-Nebula-3603@reddit
If it's still slow RAM then it's still useless, as it has 4 channels.
Why the fuck don't they use RAM on 8 or 16 channels??
t4a8945@reddit
Wow, one year too late! Didn't they already announce the next generation of these chips?
Monad_Maya@reddit
AMD's marketing dept is an embarrassment. This product has been out for ages and got a price hike due to the whole DRAM situation.
And somehow they've started marketing it again.
aguspiza@reddit
Which OS/BIOS? Neither Windows nor Linux is prepared to handle real unified memory like macOS, i.e. memory that can be accessed by GPU and CPU at runtime, not defined at boot time.
1ncehost@reddit (OP)
Linux can do that with UMA/TTM. You can actually allocate 256GB or more of RAM to any AMD APU, even tiny cheap ones, dynamically allocated by Linux and otherwise used as system memory.
aguspiza@reddit
You need a special BIOS/UEFI for that; otherwise the Linux kernel will not be able to access GPU RAM *directly*. It can use it indirectly through TTM, i.e. mapping the "VRAM" to RAM, but there is some "copying" (a paging process) there that is not happening in macOS.
Inevitable_Grape_800@reddit
There is no copying, at least nothing that shows up in benchmarks. Strix Halo with 512MB VRAM and amd_iommu=off ttm.pages_limit=31457280 ttm.page_pool_size=31457280 is just as fast as 96 GB VRAM
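For reference, ttm.pages_limit is counted in 4 KiB pages, so the value quoted above corresponds to the ~120GB dynamic-VRAM figure mentioned earlier in the thread; a one-line Python check:

```python
# ttm.pages_limit is in 4 KiB pages; 31457280 pages = 120 GiB of
# dynamically allocatable GPU memory on a 128GB box.
pages = 31_457_280
print(pages * 4096 / 2**30)  # -> 120.0 (GiB)
```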
aguspiza@reddit
There is *copying* if you take into account *loading* the model. Once the memory has been allocated/mapped as VRAM and the model is loaded/COPIED in VRAM, of course there is no copying.
aguspiza@reddit
To all the stupid people downvoting my comment, check Asahi Linux ... the only *REAL* UMA.
1ncehost@reddit (OP)
My ASRock B650 mobo came with it from the factory. 🤷♂️
Eyelbee@reddit
If lenovo is involved this can be good
oxygen_addiction@reddit
Hey, OP. Can you pin the video from 2 days ago as well in your post? https://youtu.be/qL28fZ9s8h8
Thanks.
havnar-@reddit
If they had double that or perhaps 4x, then it would really start punching at the Mac Studio for LLMs at home.
IGZ0@reddit
I won't care about AMD hardware, until they get their shit together on the software front.
ROCm is a trash fire.
funding__secured@reddit
Meh
hurdurdur7@reddit
They are already too slow at 128gb of ram. What does this change?
MainFunctions@reddit
No CUDA obviously, enjoy the 9 tok/s. Like it or not NVIDIA has a monopoly on this sector.
762mm_Labradors@reddit
I pair my Asus Z13 395+ with an Asus 5090M egpu. Best of both worlds for the work that I do.
epSos-DE@reddit
AMD beating Apple!!!
Apple overslept!
AMD stock going to do well!
Signal_Ad657@reddit
I mean, I love AMD, but this is essentially just a re-announcement of an existing product. Or, maybe better said, a re-casing of an existing product. Thermals are a bottleneck on the GMKtecs, so I don't know why you'd go smaller, personally, as opposed to building out more like the Minisforum MS-S1 MAX. I don't think anyone was specifically clamoring for a smaller chassis on what is already, on average, a mini PC. Would love to hear if there's more to it.
fallingdowndizzyvr@reddit
I think they are timing this for the release of the refresh of Strix Halo, Gorgon Halo.
segmond@reddit
Make it up to 256GB, give it an extra x16 of lanes so we can add up to 4 x4 slots.
fallingdowndizzyvr@reddit
This is the weirdest thing. Normally companies release reference designs first, and then third parties make the machines. AMD is doing it backwards, third parties first and then it releases a reference design. It's almost like they didn't think it would be successful so they let the third parties get the arrows in the back.
1ncehost@reddit (OP)
My uneducated take is that they saw the success of the spark, and while scrambling to increase enterprise adoption, decided releasing a prosumer option like this was necessary to increase open source development.
MongoWithBongoss@reddit
This product is pointless unless it features a high-bandwidth, low-latency interface that allows for daisy-chaining multiple units.
Fit-Produce420@reddit
Even chaining them with the highest-throughput connection, like NVLink, still adds waaaay more latency than just making a system with 256GB or more, like Apple does.
Teslaaforever@reddit
It's time they had more RAM and two iGPUs inside one chip, and got rid of the NPU, as it's a joke.
jacek2023@reddit
But what's new here, is it somehow faster than existing similar solutions?
Expert_Bat4612@reddit
This seems very similar to hardware already on the market.
abnormal_human@reddit
Weak sauce that it's just a different skin on last year's product.
LagOps91@reddit
128gb isn't enough...
615wonky@reddit
I wish Tyan, Supermicro, or one of the other big server manufacturers would sell these, preferably in blade form.
I work in an academic HPC environment, and this would sell like hotcakes. We could give our users access to local AIs for stuff that can't be sent off-prem.
-deleled-@reddit
Exactly. That sweet mem bandwidth would make the best number cruncher, with AVX512 inside too. Researchers with smaller QoS allocations running GROMACS on those would be very happy.
cool_fox@reddit
At what price point tho
_VirtualCosmos_@reddit
I already got a Max+ 395; it's a great mini computer, not only for AI but in general. Very powerful CPU and a very capable iGPU too.
If they manage to drop its price under 1000 euros (plus RAM and the rest, of course), it could be great. I would buy another.
In terms of AI though, what I really miss is good support for training models.
geoffwolf98@reddit
But can it have more RAM?
Technical-Earth-3254@reddit
Waiting for a presentation of how he's shoving BF16 200B models into 128GB.
1ncehost@reddit (OP)
RDNA 3.5 has native FP8 support, and technically you can just barely fit that with UMA in Linux. Not sure what their claim is specifically though.
Slasher1738@reddit
What is the networking situation on this?
VoiceApprehensive893@reddit
200b in 128gb
Q3 quants ❤️ ❤️ ❤️
Innomen@reddit
Great and it'll only cost the same as 4 PCs. SSDD. That one Taiwan fab having a global monopoly is loathsome.
Ok-Measurement-1575@reddit
They must have excess inventory?
seamonn@reddit
Can we get the Gavin Belson Signature Edition of this Box?
freehuntx@reddit
And 128GB/s bandwidth... yay
1ncehost@reddit (OP)
Not that I disagree with the sentiment, but Strix Halo has 4-channel DDR5, so it's double that.
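The arithmetic behind that: peak bandwidth is bus width times transfer rate, so Strix Halo's 256-bit (quad-channel) bus at 8000 MT/s works out to the 256GB/s figure cited elsewhere in the thread:

```python
# Peak bandwidth = (bus width in bytes) x (transfers per second).
bus_bits = 256       # quad-channel, 256-bit bus
mt_per_s = 8000e6    # 8000 MT/s
print(bus_bits / 8 * mt_per_s / 1e9)  # -> 256.0 GB/s
```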
Current-Ticket4214@reddit
That’s an expensive paperweight
siete82@reddit
Price tag? Can you train with things like this, or is it only for inference?
1ncehost@reddit (OP)
They pitched it for openclaw, but these can do training, if slowly.
twack3r@reddit
Depends how small the model is and how much time you have.
misha1350@reddit
Too little, too late. They should get to work on Medusa Halo with 192GB memory.
Fusseldieb@reddit
Now THAT'S a cool product!
Make this more accessible in the future and cloud-like local LLMs might actually become a thing!
keyboardmonkewith@reddit
Nope.
PhotographerUSA@reddit
A toy box for the rich!