Anybody using LMStudio on an AMD Strix 395 AI Max (128GB unified memory)? I keep on getting errors and it always loads to RAM.

Posted by StartupTim@reddit | LocalLLaMA | 12 comments

Hey all, I have a Framework AI Max+ AMD 395 Strix system, the one with 128GB of unified RAM that can have a huge chunk dedicated to its GPU. I'm trying to use LMStudio but I can't get it to work at all, and I feel as if it is user error. My issue is two-fold. First, all models appear to load into RAM. For example, a 70GB Qwen3 model will load into RAM, then try to load to the GPU and fail. If I type something into the chat, it fails. I have the latest LMStudio and the latest llama.cpp main branch that is included with LMStudio. I also set GPU max layers for the model. I have tried setting 96GB of VRAM in the BIOS, and have also tried setting it to auto. Nothing works. Is there something I am missing here, or a tutorial or something you could point me to? Thanks!

ImportancePitiful795@reddit

Use Lemonade server either on Windows or Linux.

HealthyCommunicat@reddit

Did you troubleshoot the ROCm stuff? First try turning off all settings such as KV cache quantization, and also try putting it all on your CPU RAM only and see if that works. Also go into your hardware runtime settings and update and download everything in there; if it's showing up, you most likely need it. Then go out of your way to reinstall drivers if it still doesn't work. If you try these and tell us the results, I can help out more.

digamma6767@reddit

So, you need to make sure to DISABLE MMAP. It causes crashes on the Strix Halo. I like LM Studio for rapid testing of different models. It makes it easy to experiment, especially since it has such an easy-to-use UI. Switching to Fedora 43 instead of Windows is definitely a good idea if you plan on using your Strix Halo as a dedicated LLM machine, but you're fine running Windows and LM Studio; you just won't get as much out of the Strix as you could on Linux.
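
For anyone running llama.cpp directly instead of flipping the toggle in LM Studio, the same idea can be sketched as below. This is only an illustration, not a tested Strix Halo recipe: it assumes the `llama-server` binary from llama.cpp is on your PATH, the model path is a made-up placeholder, and the layer count and context size are arbitrary example values. It builds a command line that offloads layers to the GPU and passes `--no-mmap` so the model file is not memory-mapped.

```python
import shlex


def build_llama_server_cmd(model_path, n_gpu_layers=99, ctx_size=8192, use_mmap=False):
    """Return an argv list for llama.cpp's llama-server.

    model_path, n_gpu_layers and ctx_size are illustrative values only.
    """
    cmd = [
        "llama-server",
        "-m", model_path,            # path to the GGUF model file
        "-ngl", str(n_gpu_layers),   # number of layers to offload to the GPU
        "-c", str(ctx_size),         # context length in tokens
    ]
    if not use_mmap:
        cmd.append("--no-mmap")      # do not memory-map the model file
    return cmd


if __name__ == "__main__":
    # Hypothetical model path; replace with your own file.
    print(shlex.join(build_llama_server_cmd("/models/example.gguf")))
```

The printed line can be pasted into a shell, or the list can be handed to `subprocess.run` as-is.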

cunasmoker69420@reddit

Start reading here, particularly the host setup instructions. You want to be on Linux to make the most of your Strix Halo system: https://strix-halo-toolboxes.com/ Then I would recommend Lemonade Server (it's llama.cpp under the hood with easy model downloading, model switching, parameter setting, etc.) and hook that up to Open WebUI and you'll be set.

KingGeekus@reddit

Another vote for lemonade.

fastheadcrab@reddit

Did you try disabling the "keep model in memory" option in LMStudio when loading the model? You will need to have the "advanced" settings enabled. That should stop your issue. Yes, Linux is better, but this is a very easy problem to solve even when using LMStudio. So many bots here, because they are not reading the post properly.

Fit-Produce420@reddit

I'm not a bot I just don't use Windows or give a shit about getting things to work for it. So much wasted overhead when RAM counts.

dsartori@reddit

I had zero problems with LMStudio on this device on Windows. I switched to Ubuntu and LMStudio is just not working well. I’m able to run models fine with llama.cpp in containers thanks to the great toolboxes linked in another comment here, but LMStudio only shows 80GB of VRAM available though I’ve configured Ubuntu for 120 max. Too bad. Presumably it is fixable but I haven’t got it sorted out yet.
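
For context on the "configured for 120" part: the comment doesn't say which method was used, and whether LM Studio's 80GB figure is related to it is unknown. One common approach on Linux is a kernel boot parameter such as `ttm.pages_limit`, which is counted in memory pages rather than bytes. A minimal sketch of just the arithmetic, assuming 4 KiB pages (the function names are made up for illustration):

```python
PAGE_SIZE_BYTES = 4096  # assumption: 4 KiB pages, typical on x86-64


def gib_to_pages(gib, page_size=PAGE_SIZE_BYTES):
    """Convert a size in GiB to a whole number of memory pages."""
    return gib * 1024**3 // page_size


def pages_to_gib(pages, page_size=PAGE_SIZE_BYTES):
    """Convert a page count back to GiB."""
    return pages * page_size / 1024**3


if __name__ == "__main__":
    print(gib_to_pages(120))       # 31457280 pages for a 120 GiB limit
    print(pages_to_gib(20971520))  # 80.0
```

Going the other way is a quick check on whatever value is currently set: a limit that converts back to 80 GiB would line up with the number LM Studio reports, though that would still need confirming on the actual machine.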

Fit-Produce420@reddit

LMstudio didn't work well for me on any distro I tried.

Drpuffncough@reddit

I just got my Framework this week. I would look into llama.cpp; that is what I've been able to get mine to run with. In the BIOS I originally set 96GB VRAM, but I ran into some issues and changed it to the minimum VRAM, with it set to AUTO so it grabs what it needs. I'm using llama.cpp with the **Vulkan** backend (not ROCm, not LMStudio). MiniMax M2.5 UD-Q3_K_XL (456B MoE) runs at ~31-32 tok/s with 65K context.

Fit-Produce420@reddit

Depending on what Linux kernel you're running, you should be able to get ROCm 7.2 working, if you felt like it. I use ROCm for image and video, and it went from broken to working over the course of a couple of updates.

Fit-Produce420@reddit

First off, switch to Linux when you're doing AI stuff. You can install Steam and play games on Linux as well; I don't even have Windows installed. Second, LMStudio isn't great on Linux. I heard it's better on Windows, but here you are. Lemonade is supposed to be AMD's easy-to-run ecosystem, and it's a great way to get started. If you find other models you want to use, it's easier just getting familiar with llama.cpp and Hugging Face. You can run image or video generation, too.