Google AI Edge Gallery v1.0.13 & v1.0.14 updates: Gemma 4 Multi-Token Prediction, Pixel TPU support, experimental MCP, new skills, now saves chat history
Posted by AnticitizenPrime@reddit | LocalLLaMA | 29 comments
relmny@reddit
can you load qwen or other non-gemma models?
ben_g0@reddit
Qwen 2.5 1.5B is officially supported. You can also add your own models by opening the hamburger menu, selecting models, then clicking the + button (it's a bit hidden).
I haven't tried anything custom yet though so I unfortunately can't tell you how well it works.
AnticitizenPrime@reddit (OP)
1.0.13 was released yesterday, which brought MTP. The other features are part of today's 1.0.14 release.
Today's changelog:
**What's Changed**

1. Experimental Model Context Protocol (MCP) Support
- Introduced experimental support for MCP.
- Added a user permission flow for MCP tool calls, ensuring users are prompted for approval before the agent executes a tool (with an option to "Always allow").
- Added comprehensive documentation for MCP.

- Pixel TPU Support: Enabled execution support for models on Pixel TPUs, including support for sideloaded models.
- Speculative Decoding: Added configuration options and engine initialization for speculative decoding to improve model generation speed.

- Calendar Integration: Added new skills to create-calendar-event and read-calendar-events directly from the chat.
- Scheduled Notifications: Added a schedule-notification skill (for one-time or daily alerts), complete with a Notification Management Screen and deep-link support that opens the agent chat with a pre-filled query when tapped.
- Learn Something New: Introduced a new learn-something-new skill.
- Note: Several older skills (like calculate-hash, text-spinner, send-email) were disabled by default to refine the default agent experience.

- Gemini-like UI: Updated the UI for both Chat and Prompt Lab to better match the official Gemini app experience.
- System Prompt Customization: Added UI integration and core storage for users to edit and retrieve dynamic System Prompts.
- Chat History: Introduced chat history saving that supports text, images, and audio messages.
- Media Handling: Added a new feature allowing users to download and share images directly from the chat.

5. Notable Bug Fixes & Stability
- Fixed session reset issues when turning off skills or deleting chat history.
- Switched from using "exact alarms" to "inexact alarms" to improve Android permission compliance.
- Fixed a NumberFormatException crash in the Benchmark Results Viewer that occurred for non-US locales (e.g., parsing "8186,03").
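For anyone curious about that last locale bug: the changelog doesn't show the actual fix, but the general class of problem is easy to reproduce. NumberFormatException is a Java/Kotlin exception; the sketch below shows the analogous failure in Python (a `ValueError`) and one possible workaround. The helper name `parse_decimal` and its parameters are made up for illustration and are not taken from the app's code.

```python
def parse_decimal(text: str, decimal_sep: str = ".", group_sep: str = "") -> float:
    """Parse a number whose decimal/grouping separators may differ from '.'.

    Hypothetical helper for illustration only; not Edge Gallery's code.
    """
    cleaned = text.strip()
    if group_sep:
        # Drop thousands separators first (e.g. the '.' in "8.186,03").
        cleaned = cleaned.replace(group_sep, "")
    if decimal_sep != ".":
        # Normalise the locale's decimal separator to '.'.
        cleaned = cleaned.replace(decimal_sep, ".")
    return float(cleaned)


# A naive parse fails on a comma decimal separator:
try:
    float("8186,03")
    naive_parse_failed = False
except ValueError:
    naive_parse_failed = True

# Telling the parser which separator to expect avoids the crash:
value = parse_decimal("8186,03", decimal_sep=",")
```

In a real app you would normally use the platform's locale-aware number parser, or always write benchmark results with a fixed, locale-independent format, rather than hand-rolling this.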
Chupa-Skrull@reddit
Wish they would enable it for Pixel 8 TPUs. At least for the Pro. They're pretending it wouldn't work, the same way they're pretending basic AICore stuff wouldn't work on the weaker 8 series models, to force upsells.
dryadofelysium@reddit
you have been able to enable the AI stuff if you acknowledge the out-of-memory possibilities in the developer options on the base Pixel 8 for years.
Chupa-Skrull@reddit
Yes, that's not at all what I'm talking about though. For instance, the NPU option enabled today in Edge Gallery is not available on any Pixel 8 series device, and likely won't be.
zxyzyxz@reddit
What does it show you, screenshot? For NPU you need to opt into their developer beta for AICore, that's what I did for my OnePlus 15 and I can now run models on the NPU.
Silver-Champion-4846@reddit
That's getting better! If I ever get a new pixel I probably won't be disappointed
dryadofelysium@reddit
LiteRT-LM for desktop also got an update. It works on all desktop OSes, supports CPU/GPU/NPU, and they are working on OpenAI API support. Could be a great llama.cpp alternative for Gemma 4 very soon.
u23043@reddit
Hopefully they add support for the larger Gemma4 models, seems to just be E2B and E4B. Would also be nice to get a larger quant, based on file size it seems like 3-4 bit.
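The file-size reasoning can be sketched as a quick calculation: average bits per weight is roughly the file size in bits divided by the parameter count. The numbers in the example are hypothetical placeholders, not the real Gemma 4 file sizes or parameter counts, and the result is only an estimate because a model file also contains metadata and may keep some tensors (embeddings, for example) at a different precision.

```python
def bits_per_weight(file_size_bytes: int, n_params: int) -> float:
    """Rough average bits stored per parameter, ignoring file overhead."""
    if n_params <= 0:
        raise ValueError("n_params must be positive")
    return file_size_bytes * 8 / n_params


# Hypothetical example: a 2.5 GB file holding 5 billion parameters.
estimate = bits_per_weight(2_500_000_000, 5_000_000_000)
```

Plug in the actual download size and the raw (not "effective") parameter count to get a ballpark figure for a given model.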
amelech@reddit
It's definitely a 4-bit quant, and not that bright.
OcelotOk8071@reddit
this is great, but watch out, this is corporate takeover.
tiffanytrashcan@reddit
I haven't even heard about the desktop app before. However, the mobile one is on GitHub. Apache 2 licensed.
I would agree with you if the only way to run MTP or some other feature was on a Google exclusive app. They don't seem to be headed in that direction though. All the code is out there for everybody.
mtmttuan@reddit
In all of the shit show they pulled at Google I/O, at least this one is decent. Seems like they haven't forgotten about edge AI yet.
Quantum_Pigeon@reddit
I love how when you install the app you're immediately forced to agree to Google collecting data from the app. Doesn't this defeat the entire purpose of using local models?
AnticitizenPrime@reddit (OP)
According to the play store they say they collect 'Diagnostics and Other app performance data' which is of course vague.
It is Apache 2.0, so it should be able to be forked and that stuff removed. Or at the very least examined to see what it actually does report.
tiffanytrashcan@reddit
This makes me curious if the Play Store version is still downloading different models from Google instead of Hugging Face.
AnticitizenPrime@reddit (OP)
And after a followup question:
Chupa-Skrull@reddit
The purpose of edge-runnable models, from a business perspective, is to consume the user's hardware, battery cycles, etc. in order to save companies money serving inference for low-intelligence agentic workloads
slavetothesound@reddit
Is there an alternative way to use Gemma 4 on iPhone/iPad without the data collection?
Chupa-Skrull@reddit
I'm not an iOS user but that LM Studio-affiliated option looks like a safe bet
JuJu_McGoo@reddit
These apps are able to run E2B decently well on a 17 pro.
Esper - my preference. Free version only allows one local model download though.
Locally AI - Recently acquired by LM Studio. https://lmstudio.ai/blog/locally-ai-joins-lm-studio
amelech@reddit
I mean, that's at least part of the reason I started building https://github.com/NickMonrad/kernel-ai-assistant
ThePixelHunter@reddit
Pixel 9 here. Speculative decoding on GPU is impressively fast, about twice as fast as with speculative disabled. CPU is slower across the board.
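For anyone wondering how speculative decoding can be faster without changing the output, here is a toy sketch of the greedy draft-and-verify idea. It is not the LiteRT-LM or Edge Gallery implementation: `target`, `draft`, and the token arithmetic are made-up stand-ins for a large model and a small drafter, and a real engine verifies all drafted positions in one batched forward pass instead of a Python loop.

```python
from typing import Callable, List

# A "model" here is just a function from a token context to the next token.
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(model(out))
    return out


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       max_new_tokens: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft proposes, the target verifies."""
    out = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(out) < goal:
        # 1) The cheap draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        ctx = list(out)
        for _ in range(min(k, goal - len(out))):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2) The target checks each proposed token in order.
        for tok in proposal:
            expected = target(out)
            out.append(expected)  # always keep the target's own choice
            if expected != tok:
                break  # first mismatch: discard the rest of the proposal
        else:
            # Every drafted token was accepted: take one extra target token.
            if len(out) < goal:
                out.append(target(out))
    return out


# Toy stand-ins (hypothetical): the draft agrees with the target except after token 3.
def target(ctx: List[int]) -> int:
    return (ctx[-1] * 2 + 1) % 7


def draft(ctx: List[int]) -> int:
    return 5 if ctx[-1] == 3 else target(ctx)


baseline = greedy_decode(target, [1], 10)
speculative = speculative_decode(target, draft, [1], 10)
```

The point of the design is that the output always matches what the target model alone would have produced; the speedup depends entirely on how often the drafter guesses right.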
jdchmiel@reddit
I tried to show a coworker Gemma E4B today in Edge Gallery and had the first complete phone lockup I've ever had. The screen was on but frozen, and no buttons or touch worked. I could not power it down any way other than a 30-second hold on the power and volume-down buttons. I thought I had bricked my Pixel 9!
AnticitizenPrime@reddit (OP)
Might need to switch from GPU to CPU in the settings or vice versa, I also had a lockup before switching (though just the app, the whole device did not freeze).
jdchmiel@reddit
I dunno! I have used both gpu and cpu in the past week.
VoiceApprehensive893@reddit
basically edge gallery is legit usable now
AnticitizenPrime@reddit (OP)
Yeah lol. The lack of chat history before now was a head-scratcher.