24/7 Headless AI Server on Xiaomi 12 Pro (Guide & Benchmarks): Gemma4 vs. Qwen2.5

Posted by Aromatic_Ad_7557@reddit | LocalLLaMA

Here is the build guide for my setup. While it isn't a massive textbook, it provides enough detail to replicate the steps. Please note that this script ecosystem and the specific instructions were tailor-made for the Xiaomi 12 Pro. I cannot guarantee it will work out of the box on other hardware, though the general concepts apply universally.

Here are the key steps to achieve the build:

1. Unlock the Bootloader

Because unlocking the bootloader isn't strictly related to running Local LLMs, I’ve put together a dedicated post for this on my personal profile.

2. Flash LineageOS

Ditch MIUI/HyperOS for a cleaner, leaner Android experience.

3. Termux Setup & Android Survival Guide

By default, Android acts like a serial killer for background apps. You must grant Termux total freedom to prevent your LLM from being killed mid-generation.
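
The post doesn't list the exact commands for this, but the usual way is to exempt Termux from Doze and background restrictions over adb. A hedged sketch (the package name com.termux is the standard one, and the helper that prints the commands is mine; review before running):

```shell
# print_termux_cmds is a hypothetical helper: it prints the adb commands
# commonly used to exempt Termux from Android's battery/background-kill
# policies, so you can review them before piping the output to `sh`.
print_termux_cmds() {
  echo 'adb shell "dumpsys deviceidle whitelist +com.termux"'
  echo 'adb shell "cmd appops set com.termux RUN_IN_BACKGROUND allow"'
}
print_termux_cmds
```

Inside Termux itself, `termux-wake-lock` additionally holds a wakelock so the session survives the screen turning off.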

4. Obtain Root Access

Install Magisk (preferably via F-Droid) and root your device. I won't provide a full tutorial here as there are thousands across the web, or you can simply ask an AI for the latest method for LineageOS.

5. The Headless Setup (Stopping the UI & Automation)

To maximize RAM and CPU for text generation, the Android graphical interface must be completely shut down. You do not need to do this manually: the zeus_cryo.sh master script will automatically execute the stop command and configure the headless environment for you.

If you want to do it yourself, read through zeus_cryo.sh to see exactly what it does.

However, before you execute that script, your device needs the right tools. You must push a series of custom binaries and monitoring scripts to the phone while the UI is still running.

5.1 Wi-Fi Recovery (Post-UI Kill)

When the Android UI is killed by the script, you lose standard Wi-Fi management. We use static binaries to maintain the connection in the background.

Bash

adb push wpa_supplicant_static /data/local/tmp/wpa_supplicant_static
adb push wpa_cli_static /data/local/tmp/wpa_cli_static
adb shell "su -c 'chmod 755 /data/local/tmp/wpa_supplicant_static /data/local/tmp/wpa_cli_static'"

(GitHub Links: wpa_cli_static | wpa_supplicant_static)
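
The bring-up sequence itself isn't shown in the post. A hedged sketch of what running the static supplicant from a root shell typically looks like; the interface name (wlan0), the config path, and the helper that prints the commands are all my assumptions, not the author's values:

```shell
# print_wifi_cmds just prints the typical bring-up sequence for review;
# run the printed lines in a root shell on the phone once the UI is down.
print_wifi_cmds() {
  iface="$1"; conf="$2"
  echo "/data/local/tmp/wpa_supplicant_static -B -i $iface -c $conf"
  echo "/data/local/tmp/wpa_cli_static -i $iface status"
}
print_wifi_cmds wlan0 /data/local/tmp/wpa_supplicant.conf
```

Note that wpa_supplicant only handles association: an IP address still has to come from a DHCP client or a static `ip addr add`, which the zeus scripts would need to handle separately.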

5.2 The "Zeus" Daemon Scripts

Push the automation scripts to your phone:

Bash

adb push zeus_cryo.sh /data/local/tmp/zeus_cryo.sh
adb push zeus_status.sh /data/local/tmp/zeus_status.sh
adb push zeus_battery.sh /data/local/tmp/zeus_battery.sh
adb push zeus_watchdog.sh /data/local/tmp/zeus_watchdog.sh
adb push zeus_watchdog_loop.sh /data/local/tmp/zeus_watchdog_loop.sh
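
The watchdog scripts aren't reproduced in the post, but conceptually a loop like zeus_watchdog_loop.sh boils down to "check, restart on failure, sleep, repeat". A minimal sketch; the function name and the example check/restart commands are hypothetical:

```shell
# watchdog_tick: run a health check; if it fails, run the restart command.
# Both are passed as strings so the pattern is easy to adapt, e.g.
#   watchdog_tick "pgrep ollama" "sh /data/local/tmp/start_ollama.sh"
watchdog_tick() {
  check="$1"; restart="$2"
  if sh -c "$check" >/dev/null 2>&1; then
    echo "ok"
  else
    sh -c "$restart" >/dev/null 2>&1
    echo "restarted"
  fi
}
# A real daemon would wrap this in: while true; do ...; sleep 10; done
```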

Script Breakdown:

5.3 Smart Cooling Automation (Optional)

If you are using a smart plug (e.g., SONOFF S60 EU via eWeLink) and a phone cooler, you can automate thermal throttling.

Bash

adb push sonoff_ctl /data/local/tmp/sonoff_ctl
adb push zeus_cooler.sh /data/local/tmp/zeus_cooler.sh
adb push zeus_cooler.conf /data/local/tmp/zeus_cooler.conf
adb shell "su -c 'chmod 755 /data/local/tmp/sonoff_ctl'"

How it works: zeus_cooler.sh reads CPU temps every 2 seconds. Hit 45°C? The fan kicks on via sonoff_ctl. Drops to 42°C? Fan turns off. If it hits critical (55°C), it kills Ollama and pings you on Telegram.
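
Reading that description literally, the decision logic is a simple hysteresis. A sketch using the thresholds quoted above (the function name is mine, not from the script):

```shell
# fan_action: given the CPU temp (°C) and whether the fan is currently on
# (0 or 1), print the action the control loop should take.
fan_action() {
  temp="$1"; fan_on="$2"
  if [ "$temp" -ge 55 ]; then
    echo "KILL_OLLAMA"           # critical: stop the model, send alert
  elif [ "$temp" -ge 45 ]; then
    echo "FAN_ON"                # hot: switch the smart plug on
  elif [ "$temp" -le 42 ] && [ "$fan_on" -eq 1 ]; then
    echo "FAN_OFF"               # cooled down: switch it off
  else
    echo "NO_CHANGE"             # inside the hysteresis band
  fi
}
```

The gap between 45°C (on) and 42°C (off) is what keeps the plug from toggling rapidly around a single threshold.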

zeus_cooler.conf
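
The file's contents aren't included in the post; a plausible shape, with variable names that are guesses on my part rather than the author's, would be:

```shell
# Hypothetical zeus_cooler.conf layout (names are illustrative only)
POLL_INTERVAL=2     # seconds between temperature reads
TEMP_FAN_ON=45      # °C: turn the smart plug on
TEMP_FAN_OFF=42     # °C: turn it off again
TEMP_CRITICAL=55    # °C: kill Ollama and send a Telegram alert
```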

On AliExpress:

Smart Plug:

SONOFF S60 EU SONOFF Wifi Socket Wifi Smart Socket Overload Protection Timer Smart Scene Remote Control Via EWeLink Home IFTTT
(It will probably work with any SONOFF smart plug.)

Cooler:

Magnetic Semiconductor Phone Cooler - Ice/Frost Cooling Pad for Mobile Gaming & Streaming

5.4 Launching the Server

With files in place, initiate the headless mode and reconnect remotely:

Bash

adb disconnect
adb shell "su -c 'sh /data/local/tmp/zeus_cryo.sh'"

# Reconnect over Wi-Fi (Replace with your phone's IP)
adb connect 192.168.1.31:5555

# Check system status
adb -s 192.168.1.31:5555 shell "su -c 'sh /data/local/tmp/zeus_status.sh'"

(You can unplug the USB cable after the connect command).
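
The phone can take a little while to come back on the network after the UI is stopped, so a retry loop around adb connect helps. A generic sketch (the helper name is mine):

```shell
# wait_for: retry a command up to N times, one second apart.
# Returns 0 as soon as the command succeeds, 1 if it never does.
wait_for() {
  tries="$1"; shift
  i=0
  while [ "$i" -lt "$tries" ]; do
    if "$@"; then return 0; fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
# Example (adb connect can exit 0 even on failure, so grep its output):
#   wait_for 30 sh -c "adb connect 192.168.1.31:5555 | grep -q '^connected'"
```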

6. Real-World Benchmarks

Per community requests, I ran some heavy tests to see what this phone's Snapdragon 8 Gen 1 could handle in a headless state.

Prompt used: "Write a 2000-word IT project essay."

| Metric | Model 1: Gemma4 E2B (Q8) | Model 2: Qwen2.5 7B (Q4) |
|---|---|---|
| Output Generated | 1,312 words (without thinking) | 3,453 words |
| Total Duration | 21m 18s | 43m 34s |
| Load Duration | 400.39 ms | 282.03 ms |
| Prompt Eval Time | 1.01 s (24.67 tokens/s) | 5.29 s (3.59 tokens/s) |
| Eval Rate (Generation) | 2.16 tokens/s | 1.54 tokens/s |
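
As a rough consistency check on those numbers (assuming ~1.3 tokens per English word, which is only an approximation), the Qwen run works out to the same ballpark as the reported 1.54 tokens/s eval rate:

```shell
awk 'BEGIN {
  words  = 3453                 # output reported for Qwen2.5 7B
  tokens = words * 1.3          # crude words-to-tokens estimate
  secs   = 43*60 + 34           # total duration: 43m 34s
  printf "%.2f tokens/s\n", tokens / secs
}'
```

The small difference is expected: the eval rate excludes load and prompt-eval time, and the words-to-tokens factor is a guess.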

I've also attached power measurements, a short real-time video, and the raw model logs to the post.

GEMMA4-E2B-8Q.txt

Qwen2.5-7B-Q4_K_M.txt

https://reddit.com/link/1smedrp/video/tybzuwfkaevg1/player

Note on llama.cpp: I spent half a day trying to natively compile llama.cpp in Termux but kept hitting fatal spawn.h errors. Because of that, this guide focuses on my stable setup.

I will get it compiled eventually, though.

Thank you all for the interest. I hope this guide inspires some of you to dust off your old flagships and build something similar!