Mac On-Device AI for clipboard tasks
Posted by Upset_Letterhead@reddit | LocalLLaMA | View on Reddit | 9 comments
My honest read on local LLMs today: most models worth running don't fit on normal consumer hardware anymore. I used to run stuff on an RTX 4090, but the gap to frontier models keeps widening, and I don't want to build an expensive rig for a mid-tier local model. So I default to OpenRouter / Anthropic for real work.
But I also don't love sending all my data to a third-party API for throwaway tasks - "summarize this," "rewrite this message," "review this comment." That feels like exactly where a small on-device model should win.
I started poking at Apple Intelligence's Foundation Models framework. Built a clipboard manager around it as a test case.
Early impressions: reasonable as an everyday workhorse. Fine at short summaries and rewrites. Falls over on ambiguous language and detailed tasks, as expected.
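For anyone curious what "built a clipboard manager around it" looks like in practice, the Foundation Models framework can be driven with only a few lines of Swift on a device with Apple Intelligence enabled. A minimal sketch of a clipboard-summarize action is below; the function name, instructions string, and error handling are illustrative assumptions, not details from the post:

```swift
import Foundation
import FoundationModels

// Summarize a piece of clipboard text with the on-device model.
// Requires Apple Intelligence to be available on this device.
func summarizeClipboardText(_ text: String) async throws -> String {
    // Check that the system model is actually usable before prompting.
    guard case .available = SystemLanguageModel.default.availability else {
        throw NSError(domain: "AppleIntelligenceUnavailable", code: 1)
    }
    // Instructions steer the session toward short, literal output,
    // which matches the "throwaway task" use case described above.
    let session = LanguageModelSession(
        instructions: "Summarize the user's text in two sentences or fewer."
    )
    let response = try await session.respond(to: text)
    return response.content
}
```

Availability checking matters here: the model can be unavailable for several reasons (Apple Intelligence disabled, unsupported hardware, model still downloading), so a clipboard tool should degrade gracefully rather than assume the session will succeed.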
- Why does it feel like Apple has stalled on pushing real value from Apple Intelligence? The on device model is already there and somewhat capable.
- Am I missing something, or are very few apps actually leveraging Apple Intelligence in meaningful ways? Most examples I see are Apple's own half-baked features.
- Does anyone have a real pulse on whether Apple Intelligence adoption is picking up, or is it just quietly stagnating?
ContextLengthMatters@reddit
Slop, right?
-dysangel-@reddit
Exactly. First sentence is the complete opposite of what we're seeing happening.
Upset_Letterhead@reddit (OP)
I work pretty extensively on coding applications and architecture. I've been coming back to local models for evaluations and struggle to find any that come anywhere close to what frontier models are providing.
We also have very limited open-model support coming out of US providers, which heavily limits any real-world work done with open models at a lot of US-based companies (mine included). The only recent shift is Gemma's release, but I'm still benchmarking it.
ContextLengthMatters@reddit
Gemma? Qwen3.5 is arguably better for coding tasks.
Gpt-oss was even good for agentic use cases.
What tools do you use? What models have you used? What is your hardware? Everything in your post sounds like extreme marketing fluff and it's inaccurate.
Upset_Letterhead@reddit (OP)
Ryzen 7800X3D, 32GB RAM, RTX 4090. RAM is a huge limitation at the moment. Of course prices went through the roof before I ordered more.
I was running gpt-oss-20B and Qwen3-Coder-30B a while back and got mediocre performance at the time. For the cost, I've just been using OpenRouter to re-evaluate models instead of trying to fit them on my machine, to see if any would be worth redeploying on my PC. I run a suite of evaluations covering coding, architecture, and documentation work against each model and review the results to see whether there's enough improvement to justify going local again.
My work does have a Hyperplane server (8x H100 GPUs with EPYC 9004) that I've been looking to resurrect after some hardware failures. We've been punting on bringing it back online, though, because we haven't seen enough progress to justify it (the restriction against Chinese models doesn't help either).
ContextLengthMatters@reddit
What models have you been re-evaluating? How are you evaluating them? What tasks are they failing on? I run opencode with qwen3.5-122B-A10B as my main driver alongside claude code for design work and use it for most of my agentic edits that don't require huge amounts of architecture decisions. It handles all of my basic command work via tool calling just fine.
If your use case is some kind of clipboard manager using AI, I don't understand what more you could possibly need. It seems like you came in here after a long stint away from AI, posed as some visionary expert, and are now trying to get people to talk about Apple Intelligence of all things, which is irrelevant to what is taking place at the moment.
Local models are only just recently starting to bridge the gap to being "good enough" for most people's autonomous local agent work. That's why I'm kind of blown away by your initial hot take. It makes you seem like a bot that has just scraped a bunch of outdated AI discussions.
Upset_Letterhead@reddit (OP)
Great point. I have a few very different projects I'm working on: some on the side for learning (the clipboard app) and some directly for work.
For the post, I wanted to discuss the side project and smaller models: within the narrow scope of something like a clipboard app (basic management, text transformations), how have models evolved for the everyday computer user? The rigs posted on here are well beyond what any average person runs, so I wanted to try the clipboard app's actions with Apple Intelligence (with somewhat mixed results).
On the work-related side, I have a lot more exposure to LLMs through OpenRouter access and AI subscriptions (Claude, Codex). We also have the currently idled Hyperplane server that I'm bringing up - but like I mentioned, I'm having difficulty justifying that investment versus just using OpenRouter or the subscriptions. The work revolves mostly around major software development on complex apps (UX design, back-end work, custom internal infrastructure/data) and electrical/software architectures (V-model development, requirements, test cases, testing, etc.).
Most of my recent exposure to open-source models has been re-evaluating them for work via OpenRouter. I have a semi-automated set of custom tests covering things like coding, problem solving, and UI design. I've also manually run a large number of evaluations against LLMs (both large frontier models and open source) for managing the left side of the V-model (architecture and requirements generation). That has been one of the most difficult aspects of LLMs to get right, it seems.
I basically test every major LLM release against these areas when it comes out (OpenAI, Anthropic, Google, xAI, z.ai, Moonshot, MiniMax, DeepSeek, Xiaomi, Qwen, Meta). Moonshot (Kimi K2.5) and z.ai (GLM-5) showed some interesting progress, but I can't leverage those for real workloads at work, so I keep an eye on them without being able to deploy them.
-dysangel-@reddit
What does that have to do with your claim in the post? You're talking as if smaller models used to be better than they are today. They are only becoming more capable. Yes, they're not as good as current frontier models (why would they be?), but most small models today are better than frontier models from 3 years ago.
ttkciar@reddit
Violates Rule Four: Self-promotion