How to simply run your model at startup in Debian/Ubuntu

Posted by EmilPi@reddit | LocalLLaMA | 8 comments

I see lots of posts asking how to autostart models on startup. One solution is to use llama.cpp and a systemd service to start an API endpoint after boot; you can then connect it to OpenWebUI or another OpenAI-API-compatible UI. I have set up this exact startup scheme:

    # ASSUMING YOU HAVE INSTALLED CUDA ACCORDING TO THE OFFICIAL GUIDE
    # https://docs.nvidia.com/cuda/cuda-installation-guide-linux/
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_F16=ON
    cmake --build build --config Release --parallel 32

Now download some GGUF files. Large models can be split into multiple files ending in ...-00001-of-0000X; in that case, reference only the -00001-of-... part after `--model`.

Then (as root or with sudo) put the file `llm-server.service` into `/etc/systemd/system/`:

    [Unit]
    Description=llama.cpp server
    After=network.target

    [Service]
    User=ai
    WorkingDirectory=/home/ai/3rdparty/llama.cpp
    ExecStart=<FULL_PATH_TO_llama.cpp_cloned_folder>/build/bin/llama-server --host localhost --port 1234 --model <PATH_WITH_YOUR_GGUF_FILE(S)>/model.gguf -ngl 999 -c 4096
    Restart=always

    [Install]
    WantedBy=multi-user.target

*This assumes your model fully fits into your GPU. Otherwise you should play with the -ts (tensor split over GPUs) and -ngl (how many layers to put on the GPU(s)) settings.*

Now enable and start your service (adding sudo before every command, or as root):

    systemctl enable llm-server.service
    systemctl start llm-server.service

If you have any questions or errors, I will try to answer them in the comments.
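Once the service has been started, it can be worth confirming that the endpoint really is up before pointing a UI at it. Below is a minimal sketch, not part of the original setup: it assumes `curl` is installed and that your llama-server build exposes a `/health` endpoint (recent builds do, as far as I know). `wait_for_server` is a hypothetical helper name.

```shell
#!/bin/sh
# Poll a URL until it answers with an HTTP success status, or give up.
# Hypothetical helper, not part of llama.cpp or systemd.
# usage: wait_for_server URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_server() {
    url=$1
    attempts=${2:-30}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -s: quiet, -f: treat HTTP error statuses (server not ready) as failure
        if curl -sf -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

With the port from the unit file above, this could be called as `wait_for_server http://localhost:1234/health && echo ready`. If it never becomes ready, `systemctl status llm-server.service` and `journalctl -u llm-server.service` are the usual places to look for the reason.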

8 Comments

muxxington@reddit

I still prefer gppm. Although it is actually intended for use with the P40, it also works for any other setup. It has a few clear advantages. You can define your instance, or a bundle of, e.g., several instances plus paddler as a load balancer, in YAML, and start, stop or change them very easily via a CLI, even during operation. You can do this not only via the CLI but also via an API, which makes it possible for an LLM to manage the LLM instances via a tool. Furthermore, I provide DEB packages for the daemon and the CLI, which makes installation even easier than creating systemd services by hand.

mrpazdzioch@reddit

If you run your stack using Docker Compose, it will bring it all up automatically on startup. Just make sure to put `restart: always` in your compose file.

AmericanNewt8@reddit

I'd set it up as a cron job that runs at startup rather than messing with systemd, personally.
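For reference, the cron route would look roughly like the sketch below. It reuses the server flags from the post's unit file; `reboot_cron_line` is a hypothetical helper that only prints the crontab entry, and both paths are placeholders you would fill in. Keep in mind that `@reboot` launches the process once per boot and, unlike `Restart=always` in the systemd unit, does not restart it after a crash.

```shell
#!/bin/sh
# Print a crontab entry that launches llama-server once at boot.
# Hypothetical helper; flags mirror the ExecStart line from the post.
# usage: reboot_cron_line SERVER_BINARY MODEL_FILE
reboot_cron_line() {
    server_bin=$1
    model=$2
    printf '@reboot %s --host localhost --port 1234 --model %s -ngl 999 -c 4096\n' \
        "$server_bin" "$model"
}
```

The printed line would then be pasted into the crontab of the user that should own the process, via `crontab -e`.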

kryptkpr@reddit

Don't forget `-fa` (flash attention)!!!

EmilPi@reddit (OP)

I cannot edit my post :D maybe because of the systemd code I put there :D

Succeeded: you have to use the markdown editor with the code indented.

Good point! However, some models (like DeepSeek) don't support it. I copied and then privacy-edited this file from deepseek-lite, which is why I missed this.

If it isn't supported (as with DeepSeek, yes), it will turn itself off with a warning, so afaik it's safe to always try turning it on.
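If you already installed the unit file and want to add the flag afterwards, something like the following sketch would do it. It assumes a build where `-fa` is a plain switch, as discussed in this thread; newer builds may expect a value after it, so check `llama-server --help` first. `add_exec_flag` is a hypothetical helper name.

```shell
#!/bin/sh
# Append a flag to the ExecStart= line of a unit file, unless it is already there.
# Hypothetical helper, not part of systemd or llama.cpp.
# usage: add_exec_flag UNIT_FILE FLAG
add_exec_flag() {
    unit=$1
    flag=$2
    # Nothing to do if ExecStart already contains the flag as a separate word
    if grep -Eq "^ExecStart=.* ${flag}( |\$)" "$unit"; then
        return 0
    fi
    # Write through a temporary file so this works with any POSIX sed
    tmp=$(mktemp) || return 1
    sed "/^ExecStart=/ s|\$| ${flag}|" "$unit" > "$tmp" && cat "$tmp" > "$unit"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Run as root, this could be used as `add_exec_flag /etc/systemd/system/llm-server.service -fa`, followed by `systemctl daemon-reload` and `systemctl restart llm-server.service` so that systemd picks up the change.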
