I got tired of compiling llama.cpp on every Linux GPU

Posted by keypa_@reddit | LocalLLaMA

Hello fellow AI users!

It's my first time posting on this sub. I wanted to share a small project I've been working on for a while that’s finally usable.

If you run llama.cpp across different machines and GPUs, you probably know the pain: recompiling for each GPU architecture and losing 10–20 minutes on every setup.

So here's Llamaup (yes, the name is a rustup reference :) ).

It provides pre-built Linux CUDA binaries for llama.cpp, organized by GPU architecture so you can simply pull the right one for your machine.
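
To give a feel for what "organized by GPU architecture" means in practice, here's a rough sketch of the kind of lookup involved. Querying the compute capability via nvidia-smi is a standard driver feature, but the bucket names in the mapping are invented for the example and are not llamaup's actual naming scheme:

```python
# Hypothetical sketch: picking a binary "bucket" for the local GPU.
# The bucket names below are made up for illustration; llamaup's real
# release layout may differ.
import subprocess

def gpu_compute_capability() -> str:
    """Return the first GPU's compute capability (e.g. '8.6') via nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        text=True,
    )
    return out.splitlines()[0].strip()

# Coarse mapping from compute capability major version to an arch family.
ARCH_BY_MAJOR = {
    "7": "volta-turing",  # sm_70 / sm_75
    "8": "ampere-ada",    # sm_80 / sm_86 / sm_89
    "9": "hopper",        # sm_90
}

if __name__ == "__main__":
    cap = gpu_compute_capability()
    bucket = ARCH_BY_MAJOR.get(cap.split(".")[0], "unknown")
    print(f"compute capability {cap} -> binary bucket: {bucket}")
```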

I also added a few helper scripts to make things easier.

Once installed, the usual tools, such as llama-cli and llama-server, are ready to use. No compilation required.
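
As a quick smoke test, once you have a model and llama-server running (by default it listens on 127.0.0.1:8080 and exposes an OpenAI-compatible API), you can hit it from Python. The prompt and port below are just examples:

```python
# Smoke test against a running llama-server instance, e.g. started with:
#   llama-server -m your-model.gguf
# llama-server serves an OpenAI-compatible API; the default port is 8080.
import json
import urllib.request

req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps({
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 64,
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.load(resp)
print(reply["choices"][0]["message"]["content"])
```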

I also added llama-models, a small TUI that lets you browse and download GGUF models from Hugging Face directly from the terminal.

Downloaded models are stored locally and can be used immediately with llama-cli or llama-server.
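
If you prefer scripting over the TUI, the same kind of download can be done with the huggingface_hub library; the repo and filename below are just example values, not anything baked into llamaup:

```python
# Sketch: fetching a GGUF from Hugging Face with huggingface_hub.
# repo_id and filename are example values only.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",  # example repo
    filename="mistral-7b-instruct-v0.2.Q4_K_M.gguf",   # example quant
)
print(f"model cached at: {path}")
# The file can then be passed straight to llama-cli or llama-server,
# e.g. llama-server -m <path>
```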

I'd love feedback from people running multi-GPU setups or GPU fleets.

Ideas, improvements, or PRs are very welcome 🚀

GitHub:
https://github.com/keypaa/llamaup

DeepWiki docs:
https://deepwiki.com/keypaa/llamaup