[Update] GHOST v2.1: Full Native Windows Support is Live.
Posted by ChrisGamer5013@reddit | LocalLLaMA | 19 comments
FOR THE UNINITIATED:
GHOST is an open-source environment manager that breaks the NVIDIA monopoly. It allows you to run high-performance AI models on AMD hardware by automatically injecting ZLUDA and ROCm layers into your Windows environment. No Linux, no complex WSL2 setups, and no driver hacking required.
KEY FEATURES
Full Windows Native Support: Runs directly in PowerShell with a hardened virtualization layer.
Auto Hardware Mapping: Scans your system and spoofs the exact RDNA architecture needed for CUDA compatibility.
Multi-GPU Prioritization: Automatically detects and targets your high-performance discrete GPU instead of integrated laptop graphics.
Anti-Nesting Logic: Prevents recursive shell loops and manages process lifecycles for maximum stability.
The Waiting Room: While your AI model loads, play DOOM and listen to music inside the terminal TUI to mask loading latency.
Safe Mode Fallback: If your hardware is unlisted, the script falls back to a stable RDNA2 baseline to ensure execution never fails.
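Rough sketch of how the hardware mapping and safe-mode fallback fit together (illustrative only, not the actual GHOST source; the gfx table values and function name are assumptions):

```python
# Illustrative sketch of "Auto Hardware Mapping" + "Safe Mode Fallback".
# The gfx version table is an assumption, not GHOST's real lookup.
GFX_BY_ARCH = {
    "RDNA2": "10.3.0",
    "RDNA3": "11.0.0",
    "RDNA4": "11.0.0",  # spoofed below 12.0 while gfx12 library support is still thin
}
RDNA2_BASELINE = "10.3.0"

def gfx_override(arch: str) -> str:
    # Unlisted hardware falls back to the stable RDNA2 baseline so execution never fails.
    return GFX_BY_ARCH.get(arch, RDNA2_BASELINE)

print(gfx_override("RDNA4"))    # 11.0.0
print(gfx_override("Unknown"))  # 10.3.0
```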
Link to repo
https://github.com/Void-Compute/AMD-Ghost-Enviroment
Also consider supporting me via the methods provided at the bottom of the README file.
FullstackSensei@reddit
Would be cool to see if this can work with the MI50 and ik_llama.cpp. Have you tested with ik? Let me know if you need any help with MI50 support. I have a rig with 6 and a few spares, and can set up a test bench with one card running Ubuntu and the latest ROCm if it helps with testing.
ChrisGamer5013@reddit (OP)
That would be an incredible test case. GHOST is currently optimized for consumer RDNA/CDNA architectures via ZLUDA on Windows, but expanding to a dedicated MI50 Ubuntu/ROCm bench is exactly the direction I want to take the environment.
I haven't specifically benched with ik_llama.cpp yet, so that would be a valuable compatibility check. If you're willing to run a test on one of those spare cards, I'd love to see the logs
FullstackSensei@reddit
Can you first check with ik? Ik doesn't support ROCm. Do you build from source or use precompiled binaries?
tiffanytrashcan@reddit
Thank your LLM for the reply… and, well, all of this.
I get that we're all excited and gung-ho about this new technology, obviously we all use it here. But there has to be a limit, no?
I'm assuming you're not answering questions yourself because you actually have no idea. That's the case 99% of the time when someone copies and pastes a chatbot answer when discussing technical details about a project "they" supposedly developed.
Please don't waste everyone's time when you have no idea what's going on. This is entirely slop-coded.
I give that 1% the benefit of the doubt for translated projects. Modern LLMs are more likely to restructure the output than an old translation service. Obviously not the case here.
I get that using it for assistance is kind of the whole point, and only using "auto-complete" is a waste of time now, a relic of an archaic era. The problem is you have no idea how to even review what it spits out.
Props to you for actually running local. Smart replies too, great model choice. There's hope, and you learned enough to get there.
Really makes me worry for humanity though, if this is in everyone's hands already, and the people that should be experienced with it are failing a Turing test!
ChrisGamer5013@reddit (OP)
Ok, now this genuinely pisses me off. Yes, I used AI to help write the code, and yes, I used AI to help write the documentation. Everybody does it; it's how you code more efficiently. But if you even looked a bit more at the commit history, you would see that I changed the GFX version on RDNA 4 cards to 11.0 because most libraries simply don't have support for 12.0 yet, fixed the venv environment because it wouldn't start, and fixed the DOOM engine because it kept giving me DG_DrawErrors. If it's so slop-coded, then how come I don't have any issues open or closed on GitHub? And me removing the comments isn't hiding anything, it's just refactoring. I clean up the script once the logic is solid because I want it to look like a finished project, not a draft. An AI can tell you what an error code means, but it can't sit there for hours testing different environment variables on my specific hardware until the engine actually boots. I'm the one who did the actual troubleshooting.
tiffanytrashcan@reddit
Look I'm sorry for being so harsh in how I approached that. I see too much of this type of stuff, and well that's my own fault... Something something grass.
You're excited and I don't want to take away from that.
I'm not the worst you'll run into so please don't let it piss you off. Look up "Linus Torvalds Hate" and imagine those types of replies 🤣 I'm saying "get used to it" in a nice way, don't let it bother you.
"then how come i dont have any issues open or closed on github?" You'll learn this really isn't a good thing.
see_spot_ruminate@reddit
Why do all this work when Linux could be the easiest way to get stuff working? I mean, great, but why?
ChrisGamer5013@reddit (OP)
Because most people sadly don't use Linux. The first fully working iteration was designed for Linux, but since only a handful of people use it, and a lot of people on Windows can't set up WSL2 correctly, I decided to make a one-click solution.
see_spot_ruminate@reddit
I get that it is technically cool, but at some point there are so many hoops to jump through that a dual boot of one of the beginner distros would solve it all.
WindySin@reddit
Edge case, I have a Strix Halo (i.e. unified memory iGPU) and an nVidia dGPU. Would this automagically interface with the AMD iGPU and let me run CUDA llama.cpp across both GPUs?
ChrisGamer5013@reddit (OP)
The script specifically filters for AMD hardware to inject the ZLUDA environment. In a dual-vendor setup, GHOST will ignore the Nvidia dGPU and only "GHOST" the Strix Halo iGPU. This effectively presents the system with two CUDA-compatible backends.
You can then use the --tensor-split or --device flags in llama.cpp to distribute the model layers across both the native Nvidia silicon and the spoofed AMD silicon simultaneously. Since Strix Halo uses a unified memory architecture, it acts as a massive high-speed VRAM buffer for the Nvidia card.
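To make the flag concrete, here is a rough sketch (purely illustrative Python; llama.cpp does this internally in C++, and the rounding rule here is a simplification) of how a 2:1 tensor-split ratio distributes transformer layers across two devices:

```python
# Illustrative: how a --tensor-split ratio maps model layers onto devices.
# Not llama.cpp's exact allocation logic.
def split_layers(total_layers: int, ratios: list[float]) -> list[int]:
    total = sum(ratios)
    counts = [int(total_layers * r / total) for r in ratios]
    counts[0] += total_layers - sum(counts)  # hand any rounding remainder to device 0
    return counts

print(split_layers(80, [2, 1]))  # an 80-layer model with -ts 2,1 -> [54, 26]
```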
WindySin@reddit
I'm currently using those flags for Vulkan backend on my setup. Will give this a try.
ChrisGamer5013@reddit (OP)
In short, with the right config in llama.cpp, yes, it should.
StardockEngineer@reddit
"breaks the NVIDIA monopoly". I'll tell the govt there's no longer anything to worry about.
ChrisGamer5013@reddit (OP)
Yep, all taken care of, and Jensen Huang is currently shaking in his leather jacket because I'm about to bankrupt NVIDIA /s
But for real, obviously I’m not taking down a trillion dollar company. GHOST is just about giving people an actual choice to use the hardware they already own instead of being stuck in one ecosystem.
qado@reddit
If I have 1x 5090 and add, e.g., a 6900 XT, is there any profit and will it be usable anyway? I know about bottlenecking, it's just a general question.
ChrisGamer5013@reddit (OP)
The profit in this setup is VRAM capacity. By combining the RTX 5090 (32 GB) and RX 6900 XT (16 GB), you unlock a 48 GB total pool, allowing you to run massive models like Llama-3-70B or Command R+ entirely on-GPU without offloading to slow system RAM.
Additional setup requirements:
Driver Coexistence: Keep both Nvidia and AMD Adrenalin drivers installed. Windows 11 handles dual-vendor hardware natively.
GHOST Execution: Run the script as usual. It will automatically detect both, but will only apply the ZLUDA/ROCm environment to the 6900 XT, leaving the 5090 on its native CUDA path.
Llama.cpp Flags: You must use the --tensor-split (or -ts) flag.
Recommended: -ts 2,1. This allocates more work to the faster 5090 so it isn't waiting on the 6900 XT.
Hardware Check: Ensure your PSU is at least 1200W. A 5090 and 6900 XT running simultaneously draw massive power.
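For a quick sanity check of the numbers above (the VRAM figures are the cards' published specs; the quantized-model size is a rough illustrative estimate, not a measurement):

```python
# Back-of-the-envelope VRAM math for a 5090 + 6900 XT rig.
vram_gb = {"RTX 5090": 32, "RX 6900 XT": 16}
pool = sum(vram_gb.values())   # combined pool visible to llama.cpp
q4_70b_gb = 40                 # rough size of a ~70B model at 4-bit quant (estimate)
fits = q4_70b_gb <= pool
print(pool, fits)              # 48 True
```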
Final-Frosting7742@reddit
Does it work with unified iGPU?
ChrisGamer5013@reddit (OP)
Yes. GHOST handles unified memory architectures by mapping the hardware as a single pool. Instead of searching for dedicated VRAM, it identifies the system as unified and enables direct access to the RAM pool for inference. GHOST prioritizes discrete cards if present, but it still works great on APU systems.
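A minimal sketch of that pooling decision (the function and the 75% headroom figure are illustrative assumptions, not GHOST's actual numbers):

```python
# Illustrative: pick the memory pool for inference depending on the GPU type.
def usable_pool_gb(dedicated_vram_gb: float, system_ram_gb: float) -> float:
    # On a unified-memory APU there is no dedicated VRAM: expose a slice of system RAM.
    if dedicated_vram_gb == 0:
        return system_ram_gb * 0.75  # leave headroom for the OS; exact split is a guess
    return dedicated_vram_gb         # discrete card: use its VRAM as-is

print(usable_pool_gb(0, 64))   # APU with 64 GB RAM -> 48.0
print(usable_pool_gb(16, 32))  # discrete 16 GB card -> 16
```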