Using my Mac Mini M4 as an LLM server—Looking for recommendations
Posted by cockpit_dandruff@reddit | LocalLLaMA | View on Reddit | 7 comments
I’m looking to set up my Mac Mini M4 (24 GB RAM) as an LLM server. It’s my main desktop, but I also want to use it to run language models locally. I’ve been playing around with the OpenAI API, and ideally I want something that:
• Uses the OpenAI API endpoint, so it’s compatible with existing OpenAI API calls and can act as a drop-in replacement (see the sketch after this list)
• Supports API key authentication. Even though everything will run on my local network, I want API keys so I can be sure my projects handle authentication correctly.
• Is easy to use or has excellent documentation.
• Can start at boot, so the service is always accessible.
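To make that concrete, here’s a rough sketch of the kind of drop-in call I have in mind; the port, model name, and key below are just placeholders:

```python
# Placeholder sketch: same request path and auth header as the OpenAI API,
# just pointed at a local host instead of api.openai.com.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",           # local server (placeholder port)
    headers={"Authorization": "Bearer sk-local-example"},  # API key auth
    json={
        "model": "local-model",  # placeholder model id
        "messages": [{"role": "user", "content": "Hello from my Mac Mini!"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```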
I have been looking into LocalAI, but the documentation is poor and I simply couldn’t get it to run.
I’d appreciate any pointers, recommendations, or examples of setups people are using on macOS for this.
Thanks in advance!
Steus_au@reddit
Download Ollama, pull gpt-oss:20b, and start chatting. If you need the API in your Python script, use http://localhost:11434
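Ollama also exposes an OpenAI-compatible endpoint at /v1 on that port, so a minimal script is something like this (the api_key is required by the client library but ignored by Ollama):

```python
from openai import OpenAI

# Ollama's OpenAI-compatible endpoint on its default port.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="gpt-oss:20b",  # must match a model you've pulled with `ollama pull`
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)
```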
RealLordMathis@reddit
I'm working on an app that could fit your requirements. It uses llama-server or mlx-lm as a backend, so it requires some additional setup on your end. I use it on my Mac mini as my primary LLM server as well.
It's OpenAI-compatible and supports API key auth. For starting at boot, I'm using launchctl.
GitHub repo
Documentation
Key-Boat-7519@reddit
Easiest drop-in I’ve used on an M-series: LM Studio’s local server, or LiteLLM in front of Ollama; both mimic OpenAI and can require an API key. For LiteLLM: pip install it, then run litellm --model ollama/llama3.1 --api-keys sk-local-123 and point your client at base_url=http://127.0.0.1:4000 with that key.

Launch at boot via a LaunchAgents plist (KeepAlive=true), or toggle “launch at login” in LM Studio. I’ve paired this with Tailscale for remote access, and when I needed per-route RBAC plus some DB endpoints, DreamFactory sat in front cleanly. Want a sample plist and curl tests for your model?
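For reference, a minimal LaunchAgent along those lines might look like this (the label is arbitrary and the litellm path is an assumption; check yours with `which litellm`):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- Arbitrary identifier for this job -->
    <key>Label</key>
    <string>local.litellm</string>
    <!-- Command to run; path is an assumption, args mirror the example above -->
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/litellm</string>
        <string>--model</string>
        <string>ollama/llama3.1</string>
        <string>--api-keys</string>
        <string>sk-local-123</string>
    </array>
    <!-- Start at login and restart the process if it exits -->
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>
```

Save it as ~/Library/LaunchAgents/local.litellm.plist and load it once with launchctl load ~/Library/LaunchAgents/local.litellm.plist; after that it starts on every login, and KeepAlive restarts it if it dies.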
jarec707@reddit
IIRC LM Studio can do much of what you want. Not sure about the key authentication though
eli_pizza@reddit
Yup it can
Fun-Employment-5212@reddit
Hello, I’d love to see some benchmarks on this machine, especially running gpt-oss-20b on Ollama!
SomeOddCodeGuy_v2@reddit
I think that LiteLLM will do a lot of what you want. It's basically a proxy that lets you take any type of LLM API connection and expose it as an OpenAI API connection. It also supports authentication keys.
For starting at boot, you don't have to rely on the app for that; you can write a bash script and [then run it at start](https://stackoverflow.com/questions/6442364/running-script-upon-login-in-mac-os-x).
I use several Macs as headless LLM servers with lots of other tool apps, and on each I have a script called "run_on_start.sh" that I kick off when I boot the server; that would be easy enough to automate. Just be careful about enabling FileVault, as that will make the Mac wait for you to enter your password when it reboots (which is why I just run the script myself, since I'm already there).
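A rough sketch of what such a script can look like (the binaries and paths are placeholders; adjust for your own setup):

```bash
#!/bin/bash
# run_on_start.sh: placeholder sketch, swap in your own apps and paths.
mkdir -p "$HOME/logs"

# Start each server in the background, appending output to a log file.
nohup /opt/homebrew/bin/ollama serve >> "$HOME/logs/ollama.log" 2>&1 &
nohup /usr/local/bin/litellm --model ollama/llama3.1 >> "$HOME/logs/litellm.log" 2>&1 &
```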