How are you guys finding the GMKtec EVO-X2 128GB? Any regrets?

Posted by Sea-Championship2939@reddit | LocalLLaMA | 26 comments

As the title says, I kind of do have regrets.

My unit runs pretty hot and just isn’t performing as well as I expected.

I’m trying to run some 70B models and I’m not satisfied at all.

I’m seriously considering returning it and going for a Mac Studio M4 Max 128GB instead.

With the recent updates to Exo and MLX, you can now cluster multiple Macs together and run truly massive models, something the EVO-X2 just can’t compete with.

What do you think? How is your EVO-X2 holding up a few months after purchase?

Also should I just wait for the Mac Studio M5 in June? Apple releases their quarterly earnings report on April 30th, so maybe they’ll announce some release dates then…

[-]

Worldly_Feeling_4697@reddit

My machine says 128GB but I only get 64GB of VRAM. Not sure if it will increase automatically if I load larger models. Does anyone know?

MoE is better than dense.

[-]

zrebar@reddit

Absolutely LOVE my Strix Halo (GMKtec EVO-X2 AI Mini PC AMD Ryzen™ AI Max+ 395). Unlike the NVIDIA one, this serves as both my daily machine and also for AI/ML tasks. I upgraded the cooling though.

I have no experience with Apple though, not a fan of anything that doesn't allow me to run my sweet Linux 😄

[-]

SirDVV@reddit

Did you have crazy temps at first? What did you tweak to fix them? What was the improvement?

[-]

StardockEngineer@reddit

These posts have to be AI generated. No one is buying a whole LLM machine and running an old ass model at 70b

[-]

Sea-Championship2939@reddit (OP)

Haha I wish this was AI-generated, would’ve saved me a few hundred bucks and a lot of frustration.

Real user here, real machine, real heat issues, and real disappointment trying to run 70B dense models. I’m not saying the EVO-X2 is trash (clearly a lot of people are happy with it for MoEs), I’m just saying it’s not delivering what I personally expected for my workload.

That’s why I’m considering returning it while I still can.

[-]

peanutbuttergoodness@reddit

Can you say more about why you aren't happy? I'm considering buying one of these today. Was waiting on Apple, but with them removing 128/256GB configs, I see no reason to keep waiting.

What would you buy instead if you return your EVO-X2?

[-]

MelodicRecognition7@reddit

https://old.reddit.com/r/LocalLLaMA/comments/1s0g8wb/gatekeeping_in_ai/

[-]

HopePupal@reddit

100%. always check the post history: this guy's normal posting is the text equivalent of pointing and grunting.

[-]

Look_0ver_There@reddit

I have two of them. If you run 70B dense models then you're going to have a very bad time. Heck, even 27B or 31B dense models are pretty slow. The machines just don't have the memory bandwidth required to run those at speed.

These machines run best with MoE models. MoE models have gotten so good now that I'm sort of puzzled as to why people insist on taking a "70B dense or nothing!" attitude.

Run a Q5_K_M quant of Qwen3.5-122B-A10B, or an IQ3_XXS quant of MiniMax-M2.7 if you want to see about the best that these machines can do in terms of model intelligence and not be horribly slow at it.

In short, your problem is your model selection. Something something "horses for courses" and all that.
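For a rough sense of why dense models crawl here: token generation is mostly memory-bandwidth bound, so a ceiling on decode speed is the bandwidth divided by the bytes of weights read per token. A minimal Python sketch, assuming the theoretical 256 GB/s of a 256-bit LPDDR5X-8000 bus and approximate average bits per weight for each quant (both are assumptions, and real throughput lands below these ceilings):

```python
def decode_ceiling_tps(active_params_b: float, bits_per_weight: float,
                       bandwidth_gbs: float = 256.0) -> float:
    """Upper bound on tokens/s when every active weight is read once per token."""
    gb_read_per_token = active_params_b * bits_per_weight / 8
    return bandwidth_gbs / gb_read_per_token

# Bits-per-weight values are assumed approximate averages, not measured figures.
for name, active_b, bpw in [
    ("70B dense, ~4.8 bpw", 70, 4.8),
    ("27B dense, ~4.8 bpw", 27, 4.8),
    ("122B MoE with 10B active, ~5.7 bpw", 10, 5.7),
]:
    print(f"{name}: <= {decode_ceiling_tps(active_b, bpw):.1f} tok/s")
```

Under those assumptions the 70B dense model tops out around 6 tok/s, while the MoE with 10B active parameters has a ceiling near 36 tok/s, before any real-world overhead.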

[-]

DramaKlng@reddit

Wouldn't deepseek 4 flash (due to MoE) run pretty well if you use two GMKtec units?

[-]

Look_0ver_There@reddit

It may do. It's all encoded as a mix of FP4 and FP8 though, neither of which is natively supported by Strix Halo APUs. It would need to be requantized to an integer format to get any real speed out of it, but it looks like that's not supported by llama.cpp yet, as evidenced by the lack of integer-based GGUFs. There are some FP-based GGUFs on HF, but they require both ik_llama.cpp and native FP4/8 support, so basically I'm just waiting for the llama.cpp team to pull a magic trick out of their hat at this stage.

[-]

DramaKlng@reddit

Oh shit, I didn't notice that lol, but yeah, they're only a couple of days old

[-]

Look_0ver_There@reddit

I suppose someone could always up-quant the weights to BF16, and then quant back down to the integer quants. That would work, but there's no telling how much damage that would do to the model. Maybe I'll just give that a go today just for fun and see what happens.

[-]

Look_0ver_There@reddit

Replying to self. It looks like DeepSeekV4 support was added to Transformers only just yesterday. Still going to be a few days before a GGUF port can be added it seems.

[-]

tmvr@reddit

"I’m trying to run some 70B models and I’m not satisfied at all."

and

"real disappointment trying to run 70B dense models."

OK, let's play - which 70B models are you running in April of 2026 and why?

[-]

dsartori@reddit

I've had mine for about five months now. It's now my sole inference source for work. Midsized MoEs are the way to go to get good performance out of them. Qwen3.5 and now Qwen3.6 have made it a viable platform for just about everything I do with LLMs.

70b dense models and larger will certainly run poorly on these devices.

[-]

Kulqieqi@reddit

There's a GMKtec EVO-X2 with 64GB for half the price; I guess it's a better deal than the 128GB for the horsepower the 395 has.

[-]

dsartori@reddit

I would not go the 64GB route, especially running Windows. 96GB is as low as I would go if you want to target small MoEs only (27-35B). 128GB is worth the purchase if you have the resources as it allows you to access the midsized class of MoE models (100-130B).

Qwen3.5-122B was my daily driver until 3.6-35B came along with comparable output quality and much faster prompt processing.
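A rough way to check which model class fits which RAM tier is to estimate the quantized weight size and leave headroom. This is only a sketch: the ~5.7 bits per weight for a Q5_K_M-style quant and the 16 GB reserved for OS, context, and KV cache are illustrative assumptions, not measurements.

```python
def weights_gb(total_params_b: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return total_params_b * bits_per_weight / 8

def fits(total_params_b: float, bits_per_weight: float,
         ram_gb: float, reserve_gb: float = 16.0) -> bool:
    """True if the weights fit after reserving memory for OS, context, and KV cache."""
    return weights_gb(total_params_b, bits_per_weight) <= ram_gb - reserve_gb

BPW = 5.7  # assumed average bits per weight for a Q5_K_M-style quant
for ram in (64, 96, 128):
    size = weights_gb(122, BPW)
    print(f"122B at ~{BPW} bpw is about {size:.0f} GB; fits in {ram} GB: {fits(122, BPW, ram)}")
```

With these numbers a 122B model needs about 87 GB for weights alone, so only the 128GB configuration leaves room for it.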

[-]

sittingmongoose@reddit

FYI qwen 3.6 27b dense just came out and supposedly smokes 35b. Just released like an hour ago.

[-]

Sea-Championship2939@reddit (OP)

Oh damn, a 27B dense that supposedly beats the 35B? That’s interesting timing. I’ll grab it tonight and benchmark it against what I’ve been running.

If the new smaller dense models are that good I might actually keep the EVO-X2 after all — especially if they run cool and quiet. Thanks for the heads-up!

[-]

dsartori@reddit

I see that. I’ll likely wait for the midsized MoE just because they perform so much better on Strix hardware.

[-]

Sea-Championship2939@reddit (OP)

Yeah I saw the 64GB version is basically half the price right now. I was tempted for a second, but after reading more about the memory bandwidth on Strix Halo I decided against it. 64GB feels way too tight once you start loading a 70B model + context + KV cache.

So the 128GB model gives me a bit more breathing room. Still not 100% convinced it was the right call though.
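To put numbers on "model + context + KV cache", here is a minimal sketch of KV cache size for a grouped-query-attention model. The shape used (80 layers, 8 KV heads, head dimension 128, as in Llama 3 70B) and the FP16 cache are assumptions for illustration; the thread does not say which 70B model is involved.

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in GiB: one K and one V vector per layer for every token."""
    bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return bytes_per_token * context_tokens / 2**30

# Assumed model shape: 80 layers, 8 KV heads, head_dim 128, FP16 cache.
for ctx in (8_192, 32_768, 131_072):
    print(f"{ctx:>7} tokens: {kv_cache_gib(80, 8, 128, ctx):.1f} GiB")
```

At 32k tokens that is about 10 GiB on top of roughly 40 GB of 4-bit weights, which is why 64GB gets tight quickly.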

[-]

Sea-Championship2939@reddit (OP)

Thanks for the detailed experience, man. Five months in and it’s your daily driver; that’s actually really good to hear. I’ve only had mine a couple of weeks and I’m mostly banging my head against dense 70B models, which is clearly the wrong approach on this hardware.

I’ll definitely try the midsized MoEs you mentioned; Qwen 3.6 sounds promising. The heat is still my biggest issue though; even at 40-50% load the fans spin up like crazy and the chassis gets uncomfortably warm. Have you found any good undervolting or power-limit tweaks that help keep temps down without killing performance?

[-]

CryptographerKlutzy7@reddit

No regrets at all.... I've been using it to train small models, and be my daily coding box, AND do gaming. No regrets at all. Just don't try to decode video at the same time as anything else.

[-]

DataGOGO@reddit

Nothing stopping you from running two or more EVOs in a cluster, but just like with the Macs, the interconnect is so slow it defeats the purpose.