24/7 Headless AI Server on Xiaomi 12 Pro (Guide & Benchmarks): Gemma4 vs. Qwen2.5

Posted by Aromatic_Ad_7557@reddit | LocalLLaMA

Here is the build guide for my setup. While it isn't a massive textbook, it provides enough detail to replicate the steps. Please note that this script ecosystem and the specific instructions were tailor-made for the Xiaomi 12 Pro. I cannot guarantee it will work out of the box on other hardware, though the general concepts apply universally.

Here are the key steps to achieve the build:

1. Unlock the Bootloader

Because unlocking the bootloader isn't strictly related to running Local LLMs, I’ve put together a dedicated post for this on my personal profile.

2. Flash LineageOS

Ditch MIUI/HyperOS for a cleaner, leaner Android experience.

3. Termux Setup & Android Survival Guide

By default, Android acts like a serial killer for background apps. You must grant Termux total freedom to prevent your LLM from being killed mid-generation.
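
The post doesn't list the exact commands for this, but the usual way is to exempt Termux from Doze and background restrictions over adb. A hedged sketch (the package name com.termux is the standard one, and the helper that prints the commands is mine; review before running):

```shell
# print_termux_cmds is a hypothetical helper: it prints the adb commands
# commonly used to exempt Termux from Android's battery/background-kill
# policies, so you can review them before piping the output to `sh`.
print_termux_cmds() {
  echo 'adb shell "dumpsys deviceidle whitelist +com.termux"'
  echo 'adb shell "cmd appops set com.termux RUN_IN_BACKGROUND allow"'
}
print_termux_cmds
```

Inside Termux itself, `termux-wake-lock` additionally holds a wakelock so the session survives the screen turning off.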

4. Obtain Root Access

Install Magisk (preferably via F-Droid) and root your device. I won't provide a full tutorial here as there are thousands across the web, or you can simply ask an AI for the latest method for LineageOS.

5. The Headless Setup (Stopping the UI & Automation)

To maximize RAM and CPU for text generation, the Android graphical interface must be completely shut down. You do not need to do this manually: the zeus_cryo.sh master script will automatically execute the stop command and configure the headless environment for you.

If you want to do it yourself, read through zeus_cryo.sh to see exactly what it does.

However, before you execute that script, your device needs the right tools. You must push a series of custom binaries and monitoring scripts to the phone while the UI is still running.

5.1 Wi-Fi Recovery (Post-UI Kill)

When the Android UI is killed by the script, you lose standard Wi-Fi management. We use static binaries to maintain the connection in the background.

Bash

adb push wpa_supplicant_static /data/local/tmp/wpa_supplicant_static
adb push wpa_cli_static /data/local/tmp/wpa_cli_static
adb shell "su -c 'chmod 755 /data/local/tmp/wpa_supplicant_static /data/local/tmp/wpa_cli_static'"

(GitHub Links: wpa_cli_static | wpa_supplicant_static)
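
The bring-up sequence itself isn't shown in the post. A hedged sketch of what running the static supplicant from a root shell typically looks like; the interface name (wlan0), the config path, and the helper that prints the commands are all my assumptions, not the author's values:

```shell
# print_wifi_cmds just prints the typical bring-up sequence for review;
# run the printed lines in a root shell on the phone once the UI is down.
print_wifi_cmds() {
  iface="$1"; conf="$2"
  echo "/data/local/tmp/wpa_supplicant_static -B -i $iface -c $conf"
  echo "/data/local/tmp/wpa_cli_static -i $iface status"
}
print_wifi_cmds wlan0 /data/local/tmp/wpa_supplicant.conf
```

Note that wpa_supplicant only handles association: an IP address still has to come from a DHCP client or a static `ip addr add`, which the zeus scripts would need to handle separately.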

5.2 The "Zeus" Daemon Scripts

Push the automation scripts to your phone:

Bash

adb push zeus_cryo.sh /data/local/tmp/zeus_cryo.sh
adb push zeus_status.sh /data/local/tmp/zeus_status.sh
adb push zeus_battery.sh /data/local/tmp/zeus_battery.sh
adb push zeus_watchdog.sh /data/local/tmp/zeus_watchdog.sh
adb push zeus_watchdog_loop.sh /data/local/tmp/zeus_watchdog_loop.sh
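
The watchdog scripts aren't reproduced in the post, but conceptually a loop like zeus_watchdog_loop.sh boils down to "check, restart on failure, sleep, repeat". A minimal sketch; the function name and the example check/restart commands are hypothetical:

```shell
# watchdog_tick: run a health check; if it fails, run the restart command.
# Both are passed as strings so the pattern is easy to adapt, e.g.
#   watchdog_tick "pgrep ollama" "sh /data/local/tmp/start_ollama.sh"
watchdog_tick() {
  check="$1"; restart="$2"
  if sh -c "$check" >/dev/null 2>&1; then
    echo "ok"
  else
    sh -c "$restart" >/dev/null 2>&1
    echo "restarted"
  fi
}
# A real daemon would wrap this in: while true; do ...; sleep 10; done
```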

Script Breakdown:

5.3 Smart Cooling Automation (Optional)

If you are using a smart plug (e.g., SONOFF S60 EU via eWeLink) and a phone cooler, you can automate thermal throttling.

Bash

adb push sonoff_ctl /data/local/tmp/sonoff_ctl
adb push zeus_cooler.sh /data/local/tmp/zeus_cooler.sh
adb push zeus_cooler.conf /data/local/tmp/zeus_cooler.conf
adb shell "su -c 'chmod 755 /data/local/tmp/sonoff_ctl'"

How it works: zeus_cooler.sh reads CPU temps every 2 seconds. Hit 45°C? The fan kicks on via sonoff_ctl. Drops to 42°C? Fan turns off. If it hits critical (55°C), it kills Ollama and pings you on Telegram.
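
Reading that description literally, the decision logic is a simple hysteresis. A sketch using the thresholds quoted above (the function name is mine, not from the script):

```shell
# fan_action: given the CPU temp (°C) and whether the fan is currently on
# (0 or 1), print the action the control loop should take.
fan_action() {
  temp="$1"; fan_on="$2"
  if [ "$temp" -ge 55 ]; then
    echo "KILL_OLLAMA"           # critical: stop the model, send alert
  elif [ "$temp" -ge 45 ]; then
    echo "FAN_ON"                # hot: switch the smart plug on
  elif [ "$temp" -le 42 ] && [ "$fan_on" -eq 1 ]; then
    echo "FAN_OFF"               # cooled down: switch it off
  else
    echo "NO_CHANGE"             # inside the hysteresis band
  fi
}
```

The gap between 45°C (on) and 42°C (off) is what keeps the plug from toggling rapidly around a single threshold.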

zeus_cooler.conf
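
The file's contents aren't included in the post; a plausible shape, with variable names that are guesses on my part rather than the author's, would be:

```shell
# Hypothetical zeus_cooler.conf layout (names are illustrative only)
POLL_INTERVAL=2     # seconds between temperature reads
TEMP_FAN_ON=45      # °C: turn the smart plug on
TEMP_FAN_OFF=42     # °C: turn it off again
TEMP_CRITICAL=55    # °C: kill Ollama and send a Telegram alert
```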

On AliExpress:

Smart Plug:

SONOFF S60 EU SONOFF Wifi Socket Wifi Smart Socket Overload Protection Timer Smart Scene Remote Control Via EWeLink Home IFTTT
(It will probably work with any SONOFF smart plug.)

Cooler:

Magnetic Semiconductor Phone Cooler - Ice/Frost Cooling Pad for Mobile Gaming & Streaming

5.4 Launching the Server

With files in place, initiate the headless mode and reconnect remotely:

Bash

adb disconnect
adb shell "su -c 'sh /data/local/tmp/zeus_cryo.sh'"

# Reconnect over Wi-Fi (Replace with your phone's IP)
adb connect 192.168.1.31:5555

# Check system status
adb -s 192.168.1.31:5555 shell "su -c 'sh /data/local/tmp/zeus_status.sh'"

(You can unplug the USB cable after the connect command).
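
The phone can take a little while to come back on the network after the UI is stopped, so a retry loop around adb connect helps. A generic sketch (the helper name is mine):

```shell
# wait_for: retry a command up to N times, one second apart.
# Returns 0 as soon as the command succeeds, 1 if it never does.
wait_for() {
  tries="$1"; shift
  i=0
  while [ "$i" -lt "$tries" ]; do
    if "$@"; then return 0; fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
# Example (adb connect can exit 0 even on failure, so grep its output):
#   wait_for 30 sh -c "adb connect 192.168.1.31:5555 | grep -q '^connected'"
```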

6. Real-World Benchmarks

Per community requests, I ran some heavy tests to see what this phone's Snapdragon 8 Gen 1 could handle in a headless state.

Prompt used: "Write a 2000-word IT project essay."

| Metric | Model 1: Gemma4 E2B (Q8) | Model 2: Qwen2.5 7B (Q4) |
|---|---|---|
| Output Generated | 1,312 words (without thinking) | 3,453 words |
| Total Duration | 21m 18s | 43m 34s |
| Load Duration | 400.39 ms | 282.03 ms |
| Prompt Eval Time | 1.01 s (24.67 tokens/s) | 5.29 s (3.59 tokens/s) |
| Eval Rate (Generation) | 2.16 tokens/s | 1.54 tokens/s |
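
As a rough consistency check on those numbers (assuming ~1.3 tokens per English word, which is only an approximation), the Qwen run works out to the same ballpark as the reported 1.54 tokens/s eval rate:

```shell
awk 'BEGIN {
  words  = 3453                 # output reported for Qwen2.5 7B
  tokens = words * 1.3          # crude words-to-tokens estimate
  secs   = 43*60 + 34           # total duration: 43m 34s
  printf "%.2f tokens/s\n", tokens / secs
}'
```

The small difference is expected: the eval rate excludes load and prompt-eval time, and the words-to-tokens factor is a guess.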

I've also attached power measurements, a short real-time video, and the raw model logs to the post.

GEMMA4-E2B-8Q.txt

Qwen2.5-7B-Q4_K_M.txt

https://reddit.com/link/1smedrp/video/tybzuwfkaevg1/player

Note on llama.cpp: I spent half a day trying to natively compile llama.cpp in Termux but kept hitting fatal spawn.h errors. Because of that, this guide focuses on my stable setup.

I will get it compiled eventually, though.

Thank you all for the interest. I hope this guide inspires some of you to dust off your old flagships and build something similar!