24/7 Headless AI Server on Xiaomi 12 Pro (Guide & Benchmarks) Gemma4 VS Qwen2.5
Posted by Aromatic_Ad_7557@reddit | LocalLLaMA

Here is the build guide for my setup. While it isn't a massive textbook, it provides enough detail to replicate the steps. Please note that this script ecosystem and the specific instructions were tailor-made for the Xiaomi 12 Pro. I cannot guarantee it will work out of the box on other hardware, though the general concepts apply universally.
Here are the key steps to achieve the build:
1. Unlock the Bootloader
Because unlocking the bootloader isn't strictly related to running Local LLMs, I’ve put together a dedicated post for this on my personal profile.
2. Flash LineageOS
Ditch MIUI/HyperOS for a cleaner, leaner Android experience.
3. Termux Setup & Android Survival Guide
By default, Android acts like a serial killer for background apps. You must grant Termux total freedom to prevent your LLM from being killed mid-generation.
- 3.1 Disable Battery Optimization (System Level)
- Go to Settings > Apps > Manage Apps > Termux.
- Find Battery Saver (or Activity Control) and select "No Restrictions".
- 3.2 Enable Wake Lock (Termux Level)
- This prevents the CPU from entering deep sleep when the screen is off.
- Open Termux, pull down your notification shade, and tap "Acquire wakelock".
- Alternatively, run this in the terminal:
termux-wake-lock
- 3.3 Disable the Phantom Process Killer (Android 12+)
- Android 12+ has a hidden mechanism that aggressively kills resource-heavy background processes (like Ollama). Connect your phone to your PC via ADB and run this to set the limit to "infinite":
Bash
adb shell "/system/bin/device_config put activity_manager max_phantom_processes 2147483647"
- 3.4 Lock the App in Memory (Xiaomi Specific)
- Open your Recents/Multitasking menu.
- Long-press the Termux window and tap the Padlock icon. Termux will now survive the "Clear All" button.
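Once the settings above are applied, it can be worth confirming that the phantom process limit actually stuck, since `device_config` values may be reset after a reboot or a config sync. The sketch below is not part of the original scripts: `phantom_limit_ok` is a hypothetical helper that only interprets the value printed by `device_config get`, and the 1024 cut-off is an arbitrary assumption.

```shell
#!/bin/sh
# Hypothetical helper: decide whether a max_phantom_processes value is high
# enough to effectively disable the phantom process killer.
# This guide sets the value to 2147483647.
phantom_limit_ok() {
    value="$1"
    case "$value" in
        ''|*[!0-9]*) return 1 ;;   # empty, "null", or otherwise non-numeric
    esac
    [ "$value" -ge 1024 ]           # 1024 is an arbitrary "large enough" cut-off
}

# Usage from the PC (commented out because it needs a connected device):
#   current=$(adb shell /system/bin/device_config get activity_manager max_phantom_processes | tr -d '\r')
#   if phantom_limit_ok "$current"; then
#       echo "phantom killer disabled ($current)"
#   else
#       echo "limit is still: $current"
#   fi
```

If the check fails after a reboot, re-running the `device_config put` command from step 3.3 should restore the value.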
4. Obtain Root Access
Install Magisk (preferably via F-Droid) and root your device. I won't provide a full tutorial here as there are thousands across the web, or you can simply ask an AI for the latest method for LineageOS.
5. The Headless Setup (Stopping the UI & Automation)
To maximize RAM and CPU for text generation, the Android graphical interface must be completely shut down. You do not need to do this manually: the zeus_cryo.sh master script executes the stop command and configures the headless environment for you.
If you would rather do it by hand, read through zeus_cryo.sh to see the exact steps.
However, before you execute that script, your device needs the right tools. You must push a series of custom binaries and monitoring scripts to the phone while the UI is still running.
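For anyone doing this by hand: on a rooted device the Android framework is usually halted with the init `stop` command and brought back with `start`. The sketch below assumes that is the stop command zeus_cryo.sh runs. `ui_command` and the `DRY_RUN` switch are hypothetical names used only for illustration, and the real script also handles Wi-Fi and the daemons.

```shell
#!/bin/sh
# Hypothetical wrapper around Android's init `stop` / `start` commands.
# DRY_RUN=1 (the default here) only prints what would be executed.
DRY_RUN="${DRY_RUN:-1}"

ui_command() {
    case "$1" in
        headless) cmd="stop" ;;    # halts the Android framework and its UI
        restore)  cmd="start" ;;   # brings the Android framework back
        *) echo "usage: ui_command headless|restore" >&2; return 2 ;;
    esac
    if [ "$DRY_RUN" = "1" ]; then
        echo "su -c $cmd"          # dry run: show the command only
    else
        su -c "$cmd"               # needs root (Magisk) on the phone
    fi
}

ui_command headless   # prints: su -c stop
```

Keeping a `restore` path around is handy, because once the UI is down the only way back in is ADB or a reboot.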
5.1 Wi-Fi Recovery (Post-UI Kill)
When the Android UI is killed by the script, you lose standard Wi-Fi management. We use static binaries to maintain the connection in the background.
- Kernel Note: Requires nl80211 support (standard on modern Qualcomm chips).
- Compatibility: Universal aarch64 binary, zero dependencies.
Bash
adb push wpa_supplicant_static /data/local/tmp/wpa_supplicant_static
adb push wpa_cli_static /data/local/tmp/wpa_cli_static
adb shell "su -c 'chmod 755 /data/local/tmp/wpa_supplicant_static /data/local/tmp/wpa_cli_static'"
(GitHub Links: wpa_cli_static | wpa_supplicant_static)
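To make the moving parts concrete, here is a minimal sketch of bringing Wi-Fi up by hand with the pushed binaries. It is an assumption about what zeus_cryo.sh does internally, not a copy of it: the interface name `wlan0`, the config and control-socket paths, and the `write_wpa_conf` helper are all illustrative. Getting an IP address (DHCP or static) is a separate step that is not shown.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal WPA-PSK config for wpa_supplicant.
# Arguments: SSID, passphrase, output file.
# Note: an SSID or passphrase containing a double quote would need escaping.
write_wpa_conf() {
    ssid="$1"; pass="$2"; out="$3"
    cat > "$out" <<EOF
ctrl_interface=/data/local/tmp/wpa_ctrl
network={
    ssid="$ssid"
    psk="$pass"
    key_mgmt=WPA-PSK
}
EOF
    chmod 600 "$out"   # the file holds a plaintext passphrase
}

# On the phone, as root (commented out because it needs the real device):
#   write_wpa_conf "MyNetwork" "MyPassphrase" /data/local/tmp/wpa.conf
#   /data/local/tmp/wpa_supplicant_static -B -D nl80211 -i wlan0 -c /data/local/tmp/wpa.conf
#   /data/local/tmp/wpa_cli_static -p /data/local/tmp/wpa_ctrl -i wlan0 status
```

The `-D nl80211` flag is the reason for the kernel note above: without nl80211 support in the kernel, wpa_supplicant cannot drive the interface.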
5.2 The "Zeus" Daemon Scripts
Push the automation scripts to your phone:
Bash
adb push zeus_cryo.sh /data/local/tmp/zeus_cryo.sh
adb push zeus_status.sh /data/local/tmp/zeus_status.sh
adb push zeus_battery.sh /data/local/tmp/zeus_battery.sh
adb push zeus_watchdog.sh /data/local/tmp/zeus_watchdog.sh
adb push zeus_watchdog_loop.sh /data/local/tmp/zeus_watchdog_loop.sh
Script Breakdown:
- zeus_cryo.sh: The master script that launches everything. (Requires your Wi-Fi SSID/Pass).
- zeus_status.sh: Run this to check current system health.
- zeus_battery.sh: Cycles battery between 40% and 80%. Connects/disconnects wall power to save battery health. (Requires Telegram Bot Token & ID for alerts).
- zeus_watchdog.sh: Revives the battery and cooler daemons if the Android OOM (Out of Memory) killer terminates them during heavy LLM usage.
- zeus_watchdog_loop.sh: Loops the watchdog every 15 seconds.
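The battery and watchdog logic described above boils down to a hysteresis window and a "restart if missing" check. The sketch below shows one way that could look, not the actual contents of the scripts. `battery_action` and `ensure_running` are hypothetical helpers, and how the charger is actually switched is left out.

```shell
#!/bin/sh
# Hypothetical decision helper for a 40-80% charge window.
# Arguments: battery level in percent, current charger state ("on" or "off").
# Prints: "plug_in", "unplug", or "hold".
battery_action() {
    level="$1"; charger="$2"
    if [ "$level" -le 40 ] && [ "$charger" = "off" ]; then
        echo "plug_in"
    elif [ "$level" -ge 80 ] && [ "$charger" = "on" ]; then
        echo "unplug"
    else
        echo "hold"      # no change needed: keep the current charger state
    fi
}

# Hypothetical watchdog check: relaunch a daemon script if no process
# with that script path on its command line is found.
ensure_running() {
    script="$1"
    if ! pgrep -f "$script" >/dev/null 2>&1; then
        nohup sh "$script" >/dev/null 2>&1 &
    fi
}

# Sketch of the 15-second loop (commented out because it never returns):
#   while true; do
#       ensure_running /data/local/tmp/zeus_battery.sh
#       ensure_running /data/local/tmp/zeus_cooler.sh
#       sleep 15
#   done

battery_action 38 off   # prints: plug_in
battery_action 60 on    # prints: hold
battery_action 80 on    # prints: unplug
```

On many Android devices the battery level can be read from `/sys/class/power_supply/battery/capacity`, but that path is an assumption and may differ on your hardware.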
5.3 Smart Cooling Automation (Optional)
If you are using a smart plug (e.g., SONOFF S60 EU via eWeLink) and a phone cooler, you can automate cooling so the phone stays out of thermal throttling.
Bash
adb push sonoff_ctl /data/local/tmp/sonoff_ctl
adb push zeus_cooler.sh /data/local/tmp/zeus_cooler.sh
adb push zeus_cooler.conf /data/local/tmp/zeus_cooler.conf
adb shell "su -c 'chmod 755 /data/local/tmp/sonoff_ctl'"
How it works: zeus_cooler.sh reads CPU temps every 2 seconds. Hit 45°C? The fan kicks on via sonoff_ctl. Drops to 42°C? Fan turns off. If it hits critical (55°C), it kills Ollama and pings you on Telegram.
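The threshold logic is a simple hysteresis loop. Here is a minimal sketch of the decision step, using the temperatures quoted above. `cooler_action` and `millideg_to_c` are hypothetical helpers, the thermal zone path is an assumption that varies by device, and the actual `sonoff_ctl` invocation is deliberately left out because its arguments are not documented here.

```shell
#!/bin/sh
# Thresholds taken from the description above (degrees Celsius).
FAN_ON_C=45
FAN_OFF_C=42
CRITICAL_C=55

# Hypothetical decision helper.
# Arguments: temperature in whole degrees C, current fan state ("on"/"off").
# Prints: "critical", "fan_on", "fan_off", or "hold".
cooler_action() {
    temp="$1"; fan="$2"
    if [ "$temp" -ge "$CRITICAL_C" ]; then
        echo "critical"     # caller would stop Ollama and send the alert
    elif [ "$temp" -ge "$FAN_ON_C" ] && [ "$fan" = "off" ]; then
        echo "fan_on"
    elif [ "$temp" -le "$FAN_OFF_C" ] && [ "$fan" = "on" ]; then
        echo "fan_off"
    else
        echo "hold"         # no change needed, including the 43-44 C band
    fi
}

# Convert a sysfs reading in millidegrees (e.g. 45321) to whole degrees.
millideg_to_c() {
    echo $(( $1 / 1000 ))
}

# Reading the sensor on the phone (commented out; the zone number is a guess):
#   raw=$(cat /sys/class/thermal/thermal_zone0/temp)
#   cooler_action "$(millideg_to_c "$raw")" off

cooler_action 46 off   # prints: fan_on
cooler_action 43 on    # prints: hold
cooler_action 42 on    # prints: fan_off
cooler_action 56 on    # prints: critical
```

The 3°C gap between the on and off thresholds keeps the plug from toggling rapidly when the temperature hovers around a single value.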
Parts I used (both from AliExpress):
- Smart plug: SONOFF S60 EU SONOFF Wifi Socket Wifi Smart Socket Overload Protection Timer Smart Scene Remote Control Via EWeLink Home IFTTT (will probably work with any SONOFF smart plug)
- Cooler: Magnetic Semiconductor Phone Cooler - Ice/Frost Cooling Pad for Mobile Gaming & Streaming
5.4 Launching the Server
With files in place, initiate the headless mode and reconnect remotely:
Bash
adb disconnect
adb shell "su -c 'sh /data/local/tmp/zeus_cryo.sh'"
# Reconnect over Wi-Fi (Replace with your phone's IP)
adb connect 192.168.1.31:5555
# Check system status
adb -s 192.168.1.31:5555 shell "su -c 'sh /data/local/tmp/zeus_status.sh'"
(You can unplug the USB cable after the connect command).
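After the UI is killed, the phone can take a little while to rejoin Wi-Fi, so the first `adb connect` may fail. A small retry loop, sketched below, saves some manual re-typing. `retry` and `phone_online` are hypothetical helpers, and the attempt count and delay are arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly until it succeeds.
# Arguments: max attempts, delay in seconds, then the command to run.
retry() {
    attempts="$1"; delay="$2"; shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Hypothetical check: true only when adb reports the target as "device".
# The exit status of `adb connect` is reportedly not reliable on every
# version, so this inspects `get-state` instead.
phone_online() {
    adb connect "$1" >/dev/null 2>&1
    [ "$(adb -s "$1" get-state 2>/dev/null)" = "device" ]
}

# Usage (commented out because it needs adb and the phone):
#   retry 20 3 phone_online 192.168.1.31:5555 && echo "phone is back on Wi-Fi"
```

Replace the IP with your phone's address, as in the commands above.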
6. Real-World Benchmarks
Per community requests, I ran some heavy tests to see what this Snapdragon chip could handle in a headless state.
Prompt used: "Write a 2000-word IT project essay."
| Metric | Model 1: Gemma4 E2B (Q8) | Model 2: Qwen2.5 7B (Q4) |
|---|---|---|
| Output Generated | 1,312 Words (without thinking) | 3,453 Words |
| Total Duration | 21m 18s | 43m 34s |
| Load Duration | 400.39 ms | 282.03 ms |
| Prompt Eval Time | 1.01s (24.67 tokens/s) | 5.29s (3.59 tokens/s) |
| Eval Rate (Generation) | 2.16 tokens/s | 1.54 tokens/s |
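To compare the two runs on equal footing, the table figures can be turned into visible words per minute and a rough token count. This is a back-of-the-envelope sketch. `words_per_min` and `approx_tokens` are hypothetical helpers, and the token estimate multiplies the eval rate by the total duration, which also includes load and prompt eval time.

```shell
#!/bin/sh
# Visible words per minute from a word count and a duration (minutes, seconds).
words_per_min() {
    awk -v w="$1" -v m="$2" -v s="$3" \
        'BEGIN { printf "%.1f\n", w * 60 / (m * 60 + s) }'
}

# Rough token estimate: generation rate (tokens/s) times total duration.
# Total duration includes load and prompt eval, so this slightly overestimates.
approx_tokens() {
    awk -v r="$1" -v m="$2" -v s="$3" \
        'BEGIN { printf "%.0f\n", r * (m * 60 + s) }'
}

words_per_min 1312 21 18   # Gemma4 E2B: prints 61.6
words_per_min 3453 43 34   # Qwen2.5 7B: prints 79.3
approx_tokens 2.16 21 18   # Gemma4 E2B: prints 2760
approx_tokens 1.54 43 34   # Qwen2.5 7B: prints 4026
```

By this estimate Qwen2.5 delivered more visible words per minute despite the lower token rate, presumably because part of Gemma4's generated tokens went to thinking that the word count excludes.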
I've also attached power measurements, a short real-time video, and the raw model logs to the post.
https://reddit.com/link/1smedrp/video/tybzuwfkaevg1/player


Note on llama.cpp: I spent half a day trying to compile llama.cpp natively in Termux but kept hitting fatal spawn.h errors. Because of that, this guide focuses on my stable Ollama setup.
I still plan to get it compiled eventually.
Thank you all for the interest. I hope this guide inspires some of you to dust off your old flagships and build something similar!
jamaalwakamaal@reddit
pretty elaborate, nice, first try running the pre-built llama.cpp