Google AI Edge Gallery v1.0.13 & v1.0.14 updates: Gemma 4 Multi-Token Prediction, Pixel TPU support, experimental MCP, new skills, now saves chat history

relmny@reddit

Can you load Qwen or other non-Gemma models?

ben_g0@reddit

Qwen 2.5 1.5B is officially supported. You can also add your own models (open the hamburger menu, select Models, then tap the + button; it's a bit hidden).

I haven't tried anything custom yet though so I unfortunately can't tell you how well it works.

AnticitizenPrime@reddit (OP)

1.0.13 was released yesterday, which brought MTP. The other features are part of today's 1.0.14 release.

Today's changelog:

**What's Changed**

**1. Experimental Model Context Protocol (MCP) Support**

- Introduced experimental support for MCP.
- Added a user permission flow for MCP tool calls, ensuring users are prompted for approval before the agent executes a tool (with an option to "Always allow").
- Added comprehensive documentation for MCP.
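The approval flow described here (prompt before each tool call, with an "Always allow" option) can be pictured as a small gate in front of tool execution. This is a hypothetical Python sketch, not Edge Gallery's code: `ToolPermissionGate`, the `ask_user` callback, its `"allow_once"`/`"always_allow"`/`"deny"` return values, and the `get_weather` tool are all invented for illustration, and "Always allow" is assumed to be remembered per tool name.

```python
from typing import Any, Callable, Dict, Set

# Hypothetical callback: shows the user a prompt and returns their decision,
# one of "allow_once", "always_allow", or "deny".
AskUser = Callable[[str, Dict[str, Any]], str]


class ToolPermissionGate:
    """Asks for approval before each tool call; remembers 'Always allow' per tool."""

    def __init__(self, ask_user: AskUser) -> None:
        self._ask_user = ask_user
        self._always_allowed: Set[str] = set()

    def call(self, tool_name: str, args: Dict[str, Any],
             execute: Callable[..., Any]) -> Any:
        if tool_name not in self._always_allowed:
            decision = self._ask_user(tool_name, args)
            if decision == "always_allow":
                self._always_allowed.add(tool_name)  # skip the prompt next time
            elif decision != "allow_once":
                # Anything else, including "deny", is treated as a refusal (fail closed).
                raise PermissionError(f"user denied tool call: {tool_name}")
        # Only reached once the call has been approved.
        return execute(**args)


# Usage with a scripted "user" who picks "Always allow" on the first prompt.
prompts = []

def scripted_user(tool_name: str, args: Dict[str, Any]) -> str:
    prompts.append(tool_name)
    return "always_allow"

gate = ToolPermissionGate(scripted_user)
first = gate.call("get_weather", {"city": "Oslo"}, lambda city: f"weather for {city}")
second = gate.call("get_weather", {"city": "Lima"}, lambda city: f"weather for {city}")
print(first, second, prompts)
```

The second call runs without a prompt because the tool was marked "Always allow" on the first one; failing closed on unrecognized answers keeps a buggy prompt UI from silently approving calls.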

**2. Hardware & Performance Enhancements**

- Pixel TPU Support: Enabled execution support for models on Pixel TPUs, including support for sideloaded models.
- Speculative Decoding: Added configuration options and engine initialization for speculative decoding to improve model generation speed.

**3. New Agent Skills & Capabilities**

- Calendar Integration: Added new create-calendar-event and read-calendar-events skills, usable directly from the chat.
- Scheduled Notifications: Added a schedule-notification skill (for one-time or daily alerts), complete with a Notification Management Screen and deep-link support that opens the agent chat with a pre-filled query when tapped.
- Learn Something New: Introduced a new learn-something-new skill.

Note: Several older skills (like calculate-hash, text-spinner, send-email) were disabled by default to refine the default agent experience.

**4. UI & Chat Experience Improvements**

- Gemini-like UI: Updated the UI for both Chat and Prompt Lab to better match the official Gemini app experience.
- System Prompt Customization: Added UI integration and core storage for users to edit and retrieve dynamic System Prompts.
- Chat History: Introduced chat history saving that supports text, images, and audio messages.
- Media Handling: Added a new feature allowing users to download and share images directly from the chat.

**5. Notable Bug Fixes & Stability**

- Fixed session reset issues when turning off skills or deleting chat history.
- Switched from using "exact alarms" to "inexact alarms" to improve Android permission compliance.
- Fixed a NumberFormatException crash in the Benchmark Results Viewer that occurred for non-US locales (e.g., parsing "8186,03").
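The locale crash in the last item is a classic: on the JVM, `Double.parseDouble("8186,03")` throws `NumberFormatException` because it only accepts `.` as the decimal separator, while locales such as German write that value with a decimal comma. The changelog doesn't say how Edge Gallery fixed it. Below is a minimal Python sketch of the general idea, where `float()` fails the same way (with `ValueError`); `parse_localized_decimal` is a hypothetical helper that assumes the caller already knows the locale's decimal and grouping separators.

```python
def parse_localized_decimal(text: str, decimal_sep: str = ".", group_sep: str = ",") -> float:
    """Parse a number string using explicit separators instead of assuming US formatting."""
    # Drop grouping separators first, then normalize the decimal separator to ".".
    cleaned = text.strip().replace(group_sep, "")
    if decimal_sep != ".":
        cleaned = cleaned.replace(decimal_sep, ".")
    return float(cleaned)


# Naive parsing rejects a decimal-comma string, mirroring the reported crash.
try:
    float("8186,03")
except ValueError:
    print("naive parse failed")

print(parse_localized_decimal("8186,03", decimal_sep=",", group_sep="."))  # 8186.03
print(parse_localized_decimal("8,186.03"))  # 8186.03
```

Passing the separators explicitly keeps the sketch dependency-free; in a real app, a locale-aware formatter, or keeping benchmark values numeric instead of round-tripping them through display strings, is the sturdier fix.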

Chupa-Skrull@reddit

I wish they would enable it for Pixel 8 TPUs, at least for the Pro. They're pretending it wouldn't work, the same way they pretended basic AICore features wouldn't work on the weaker Pixel 8 models, to force upsells.

dryadofelysium@reddit

On the base Pixel 8, you've been able to enable the AI features in Developer Options for years, as long as you acknowledge the possible out-of-memory issues.

Chupa-Skrull@reddit

Yes, but that's not at all what I'm talking about. For instance, the NPU option enabled today in Edge Gallery is not available on any Pixel 8 series device, and likely won't be.

zxyzyxz@reddit

What does it show you? Can you post a screenshot? For the NPU, you need to opt into the AICore developer beta; that's what I did on my OnePlus 15, and I can now run models on the NPU.

Silver-Champion-4846@reddit

That's getting better! If I ever get a new Pixel, I probably won't be disappointed.

dryadofelysium@reddit

LiteRT-LM for desktop also got an update. It works on all desktop operating systems, supports CPU/GPU/NPU, and they're working on OpenAI-compatible API support. It could be a great llama.cpp alternative for Gemma 4 very soon.

u23043@reddit

Hopefully they add support for the larger Gemma 4 models; for now it seems to be just E2B and E4B. It would also be nice to get a larger quant: based on the file size, it seems to be 3-4 bit.
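The "based on the file size" estimate is simple arithmetic: average bits per weight ≈ file size in bytes × 8 ÷ parameter count. A minimal sketch follows; the numbers are hypothetical placeholders, not measurements of the actual Gemma 4 files. The result tends to read a little high, since the file also holds metadata, tokenizer data, and any tensors kept at higher precision. Which parameter count you divide by also matters: assuming the "E" models follow the Gemma 3n convention, E2B/E4B are effective parameter counts, and the total stored count is larger.

```python
def estimate_bits_per_weight(file_size_bytes: int, num_params: int) -> float:
    """Rough average bits per weight: total bits in the file divided by parameter count."""
    return file_size_bytes * 8 / num_params


# Hypothetical numbers, not measurements of any real model file.
size_bytes = 2_000_000_000   # a 2 GB model file
params = 4_000_000_000       # a 4B-parameter model
print(estimate_bits_per_weight(size_bytes, params))  # 4.0
```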

amelech@reddit

It's definitely a 4-bit quant, and not that bright.

OcelotOk8071@reddit

This is great, but watch out: this is a corporate takeover.

tiffanytrashcan@reddit

I hadn't even heard of the desktop app before. However, the mobile one is on GitHub and Apache 2.0 licensed.

I would agree with you if the only way to run MTP or some other feature were through a Google-exclusive app. They don't seem to be headed in that direction, though. All the code is out there for everybody.

mtmttuan@reddit

Out of all the shit show they pulled at Google I/O, at least this one is decent. It seems like they haven't forgotten about edge AI yet.

Quantum_Pigeon@reddit

I love how when you install the app you're immediately forced to agree to Google collecting data from the app. Doesn't this defeat the entire purpose of using local models?

AnticitizenPrime@reddit (OP)

According to the Play Store listing, they collect "Diagnostics and Other app performance data", which is of course vague.

It is Apache 2.0 licensed, so it can be forked and that stuff removed, or at the very least examined to see what it actually reports.

tiffanytrashcan@reddit

This makes me curious whether the Play Store version still downloads different models from Google instead of from Hugging Face.

AnticitizenPrime@reddit (OP)

And after a follow-up question:

```markdown

Reasons it's relatively privacy-respecting:

On-device storage: Chat conversations and message content stay in local DataStore storage -- they are not sent to Google or any remote server.

On-device inference: The AI inference runs entirely locally, so your prompts and responses never leave your phone.

Optional analytics: Firebase Analytics is optional at build time (the google-services plugin is disabled by default), meaning a self-built version from source could ship without any Google analytics at all.

Open source: The code is open source, so all of this is auditable -- there's no hidden telemetry you can't find.

Local credential storage: OAuth tokens and API keys are stored locally, not transmitted anywhere except to the relevant service (HuggingFace).

Reasons for caution:

Pre-built analytics: If you install a pre-built version (e.g., from an app store), it likely DOES include Firebase Analytics, meaning Google receives events about which features you use, which models you download, and when errors occur.

FCM push notifications: FCM is active, which means Google can send your device messages with deeplinks -- this is a potential attack surface.

Arbitrary MCP servers: The app can connect to arbitrary MCP servers configured by the user, and those servers could exfiltrate data -- the app doesn't appear to sandbox or audit those connections.

Local token vulnerability: HuggingFace OAuth tokens are stored locally but grant access to your HuggingFace account; if the device is compromised, those tokens could be extracted.

Account visibility: The GET_ACCOUNTS permission gives the app visibility into what Google accounts are on the device.

Benchmark data: Benchmark performance data (model speeds, device info) is stored locally and could theoretically be included in analytics.

Bottom Line

For an app made by Google, it's better than you might expect. The core value proposition -- on-device AI -- means your actual conversation content stays private. But if your concern is "does Google get data about me," then yes, a standard build sends them analytics about your usage patterns. If you built it from source without google-services.json, you'd eliminate that entirely.

If you're privacy-sensitive, the open-source nature is the biggest asset -- you can verify exactly what's happening and build a version without the analytics.
```

Chupa-Skrull@reddit

The purpose of edge-runnable models, from a business perspective, is to consume the user's hardware, battery cycles, etc., in order to save companies money on serving inference for low-intelligence agentic workloads.

slavetothesound@reddit

Is there an alternative way to use Gemma 4 on iPhone/iPad without the data collection?

Chupa-Skrull@reddit

I'm not an iOS user, but that LM Studio-affiliated option looks like a safe bet.

JuJu_McGoo@reddit

These apps can run E2B decently on an iPhone 17 Pro.

- Esper: my preference. The free version only allows one local model download, though.
- Locally AI: recently acquired by LM Studio. https://lmstudio.ai/blog/locally-ai-joins-lm-studio

amelech@reddit

I mean, that's at least part of the reason I started building https://github.com/NickMonrad/kernel-ai-assistant

ThePixelHunter@reddit

Pixel 9 here. Speculative decoding on GPU is impressively fast, about twice as fast as with speculative decoding disabled. CPU is slower across the board.
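Why speculative decoding can give that kind of speedup: a cheap draft model proposes several tokens, and the large target model checks them, keeping the longest run it agrees with plus one token of its own. Below is a minimal Python sketch of the greedy variant using toy "models" (plain functions mapping a token list to the next token id). It is not Edge Gallery's or LiteRT-LM's implementation; the toy models and `k=4` are arbitrary assumptions, and sampling-based decoding uses a rejection-sampling acceptance rule instead of exact matching.

```python
from typing import Callable, List

# A "model" here is just a function: token sequence -> next token id (greedy choice).
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))
    return seq


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       n_new: int, k: int = 4) -> List[int]:
    seq = list(prompt)
    while len(seq) - len(prompt) < n_new:
        # 1. The draft model proposes k tokens autoregressively.
        proposal, ctx = [], list(seq)
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target verifies them. It is called once per position here for clarity;
        #    a real engine scores all k positions in one batched forward pass,
        #    which is where the speedup comes from.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if expected != tok:
                accepted.append(expected)  # first mismatch: keep the target's token, stop
                break
            accepted.append(tok)
        else:
            accepted.append(target(seq + accepted))  # all k accepted: one bonus token
        seq.extend(accepted)
    return seq[: len(prompt) + n_new]


def toy_target(seq: List[int]) -> int:
    return (seq[-1] * 3 + 1) % 10


def toy_draft(seq: List[int]) -> int:
    # Agrees with the target except after token 3, where it guesses wrong.
    return 9 if seq[-1] == 3 else (seq[-1] * 3 + 1) % 10


print(speculative_decode(toy_target, toy_draft, [1], 10))  # [1, 4, 3, 0, 1, 4, 3, 0, 1, 4, 3]
```

Every kept token is one the target would have picked anyway, so the output matches plain greedy decoding exactly; the gain depends on how often the draft is right and on verifying the draft tokens in a single pass.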

jdchmiel@reddit

I tried to show a coworker Gemma E4B in Edge Gallery today and had the first complete phone lockup I've ever had. The screen was on but frozen, and no buttons or touch input worked. The only way I could power it down was a 30-second hold on the power and volume-down buttons. I thought I had bricked my Pixel 9!

AnticitizenPrime@reddit (OP)

You might need to switch from GPU to CPU in the settings, or vice versa. I also had a lockup before switching (though only the app froze, not the whole device).

jdchmiel@reddit

I dunno! I've used both GPU and CPU in the past week.

VoiceApprehensive893@reddit

Basically, Edge Gallery is legit usable now.

AnticitizenPrime@reddit (OP)

Yeah lol. The lack of chat history before now was a head-scratcher.