Starter asking for guidance
Posted by HermanHMS@reddit | LocalLLaMA | 8 comments
Hello everyone!
I’m new here as I have decided to go local. My main goal is to run vulnerability research on open-source software. I have bought a GMKTEC EVO-X2 (Ryzen AI Max+ 395, 128GB RAM, 2TB SSD) and I plan to install Ubuntu on it to run llama.cpp. I’m planning to run OpenClaw and two models at the same time: Llama 4 Scout as the master brain and Qwen 2.5 Coder for the code analysis engines.
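In case it helps anyone with a similar plan: running two models side by side with llama.cpp usually comes down to starting two `llama-server` instances on different ports. A minimal sketch (model filenames and context sizes below are placeholders, not real files):

```shell
# Sketch: serving two GGUF models at once with llama.cpp's llama-server.
# Filenames, ports, and context sizes are illustrative placeholders.
llama-server -m llama-4-scout.Q4_K_M.gguf      --port 8080 -c 32768 -ngl 99 &
llama-server -m qwen2.5-coder-32b.Q4_K_M.gguf  --port 8081 -c 16384 -ngl 99 &
```

An orchestrator can then talk to each model through its OpenAI-compatible endpoint on the respective port.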
Do you have any tips/advice?
Thank you in advance!
Ulterior-Motive_@reddit
Don't bother with either of those models, both of them are ancient by LLM standards. You're better off using the latest Qwen3.6 or Gemma4 models.
HermanHMS@reddit (OP)
Which exactly would you recommend? I chose Llama 4 for its massive context window, and I read that Coder 2.5 would be better at PHP code analysis than 3.
Ulterior-Motive_@reddit
Llama 4's large context is largely theoretical, and in practice it'll rapidly degrade. Most models are like this actually, but newer ones tend to be closer to advertised. Personally, I'd use a BF16 quant of Qwen3.5-27B, maybe HauhauCS's uncensored version if I was getting too many refusals, and Qwen3.6-35B-A3B if 27B was too slow.
HermanHMS@reddit (OP)
Thank you for your input! Does that also mean that if I ran Llama 4 with the context window limited to, for example, 400k, it would be better at reasoning? I mean, if everything for the task fits in that window.
qubridInc@reddit
Nice setup. Skip the dual-model complexity at first: run a strong single model like Qwen 2.5 Coder or Qwen 3.6, get your pipeline stable, then layer agents/tools on once it's reliable.
HermanHMS@reddit (OP)
Thanks!
ai_guy_nerd@reddit
That hardware is absolute overkill in the best way possible. 128GB of RAM gives you a massive amount of breathing room for those models, especially if you're running them via llama.cpp with a good quantization.
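As a rough sanity check on the headroom (a sketch with assumed numbers: ~109B total parameters for Llama 4 Scout, 32B for Qwen2.5-Coder-32B, and ~4.5 bits/weight for a Q4_K_M-style quant):

```python
def quant_weights_gb(params_billion, bits_per_weight=4.5):
    """Rough weight-only footprint of a quantized model in GB.

    Ignores the KV cache and runtime buffers, which grow with context size.
    """
    return params_billion * bits_per_weight / 8

scout = quant_weights_gb(109)  # Llama 4 Scout, ~109B total parameters (assumed)
coder = quant_weights_gb(32)   # Qwen2.5-Coder-32B (assumed)
print(f"Scout ~{scout:.0f} GB, Coder ~{coder:.0f} GB, total ~{scout + coder:.0f} GB")
# → Scout ~61 GB, Coder ~18 GB, total ~79 GB
```

So both models at Q4-class quants should fit in 128GB together, with room left over for KV cache, though long contexts will eat into that margin quickly.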
Running OpenClaw as the orchestrator with a 'brain' and 'engine' split is the right move. Using a specialized coder model for the actual analysis and a more general-purpose scout for the logic usually prevents the brain from getting bogged down in syntax errors.
One tip for the vulnerability research: set up a dedicated sandbox or VM for the code analysis engines. You don't want whatever you're analyzing having a path back to your host, even if the models are local.
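One way to get that isolation (a minimal sketch using Docker; the image name and mount path are placeholders) is to run the code under analysis in a container with no network access:

```shell
# Sketch: a network-less container for poking at untrusted code.
# Image name and mount path are illustrative placeholders.
# --network none : no path from the analyzed code back to the host network
# --read-only    : immutable root filesystem inside the container
# -v ...:ro      : mount the code under analysis read-only
docker run --rm -it --network none --read-only \
  -v "$PWD/target:/src:ro" ubuntu:24.04 bash
```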
BikerBoyRoy123@reddit
Not sure if this will help, but I have two repos.
This one is full of useful info, setup notes, and Python code for a RAG pipeline:
https://github.com/RoyTynan/StoodleyWeather
This one is a full-blown Python app with an AI test-bed included
https://github.com/RoyTynan/HostScheduler