Any "mainstream" apps with genuinely useful local AI features?
Posted by intofuture@reddit | LocalLLaMA | View on Reddit | 45 comments
Curious if any of you actually regularly use features in apps with local AI processing?
When I say "mainstream app", I mean more like PyCharm from JetBrains (i.e. making lots of money, large teams behind them, etc.) than an open-source/indie dev app.
And I'm more talking about a feature in an app (which does a bunch of things other than that AI feature), as opposed to an app that's entirely about using AI locally, like Ollama, LMStudio, etc.
I'm also not talking about OS features, e.g. auto-complete on iPhones. More interested in apps that you've downloaded.
Currently, the only thing I can think of in my day-to-day is code completion in PyCharm, but even that is now some kind of hybrid local/cloud thing.
intofuture@reddit (OP)
Great take
xmmr@reddit
VLC
intofuture@reddit (OP)
Oh yeh I saw their subtitle thing. Quite cool
AgnosticAndroid@reddit
PotPlayer also. Recently got whisper integration for generating subtitles on the fly.
intofuture@reddit (OP)
Nice. Have you used it? Is it decent?
xmmr@reddit
I fear none of them use large-v3 but something lighter, and considering even large-v3 performs poorly without audio tweaks, I can't imagine anything below that...
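For context on what "generating subtitles on the fly" involves: once a whisper-family model produces timestamped segments, turning them into an .srt file is trivial. A minimal stdlib-only sketch, using hypothetical segments shaped like whisper's `transcribe()` output (`start`/`end` in seconds plus `text`):

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Render whisper-style segments (dicts with start/end/text) as SRT blocks."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

# Hypothetical segments, just to show the shape of the data:
segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 5.0, "text": " General Kenobi."},
]
print(segments_to_srt(segments))
```

The hard part (and the quality complaint above) is entirely in the transcription model, not in this formatting step.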
Defiant-Mood6717@reddit
The lack of such apps highlights the energy inefficiency of modern LLMs. We need to do a lot more work to get 100T-parameter models running on 30W of power; otherwise we will never see such apps. Instead, they will simply run on cloud services and APIs.
intofuture@reddit (OP)
Smaller models that are locally feasible can be good enough for certain tasks though, no?
Think JetBrains' model is only like 100MB. I find it pretty good in terms of both code quality and system utilization.
Defiant-Mood6717@reddit
So LLMs are just for basic code completion?
Where is your ambition? Do you not want agents, AGI, to eradicate poverty and create massive value?
Do you not want them to run on everyone's local machine on 30W of power? My point is they will never run on your local machine if we can't make hardware more efficient, since they require trillions of parameters. You will NEVER have AGI with 100M parameters; that is like saying an ant-sized brain with the right training can get to human-level intelligence. Physically impossible.
Let's instead focus on making hardware like the brain, hardware that can run 100T-parameter models on sandwich power, 30W. Only then may these apps start to exist.
intofuture@reddit (OP)
Fair point. Agree we need to see some big improvements to HW. There could also be some big innovations on the SW side, though, which would mean we don't necessarily need 100T-param models for insanely useful AI.
Defiant-Mood6717@reddit
I agree 100T parameters is not exactly a minimum requirement; I use it as an example because the human brain has 100T connections. But yeah, I would say that for a future really useful model, we need to be able to run at least 0.5-1T-parameter models on 30W. Things like architecture help a lot in making use of those parameters efficiently, so yeah, software is also important, if we consider architecture software.

For example, MoE architectures allow us to have 1T parameters but only use 100B at a time, which is very efficient and actually what the human brain does. We don't use all 100T connections all the time; this is the classical saying that we only use 10% of our brain or whatever. That part is true: our brain is also MoE.
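The MoE idea above can be sketched concretely: a router scores all experts per token but only *runs* the top-k, so a fraction of the total parameters is touched each step. A toy illustration in plain Python (the "experts" here are just bias vectors for brevity; in a real MoE each would be a full feed-forward block):

```python
import math
import random

random.seed(0)

NUM_EXPERTS, TOP_K, DIM = 8, 2, 4

# Toy "experts": each is a bias vector here; a real MoE uses full FFNs.
experts = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
# Router: one weight vector per expert, scoring the token representation.
router = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(token):
    # Score every expert cheaply, but only execute the top-k of them.
    scores = [sum(w * x for w, x in zip(router[e], token)) for e in range(NUM_EXPERTS)]
    top = sorted(range(NUM_EXPERTS), key=lambda e: scores[e], reverse=True)[:TOP_K]
    gates = softmax([scores[e] for e in top])
    out = [0.0] * DIM
    for g, e in zip(gates, top):
        for i in range(DIM):
            out[i] += g * (token[i] + experts[e][i])  # "running" expert e
    return out, top

output, active = moe_forward([0.5, -1.0, 0.3, 0.2])
print(f"active experts: {sorted(active)} "
      f"({TOP_K}/{NUM_EXPERTS} = {TOP_K / NUM_EXPERTS:.0%} of experts per token)")
```

With 1T total parameters and ~100B active, the same routing trick means each token pays roughly a tenth of the dense compute and memory traffic.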
shokuninstudio@reddit
We use our whole brain all the time. The 10% thing is a silly myth. In the past this meme was 5% and even 3%.
https://en.wikipedia.org/wiki/Ten-percent-of-the-brain_myth
Defiant-Mood6717@reddit
You completely misunderstood what I meant by only using 10%. I meant we use only 10% AT A TIME, but yes, we obviously do use all of it. I later researched a bit and it's even lower than that: your brain has a total of 100T synapses (parameters), but only millions to billions are active at a given time.
shokuninstudio@reddit
That is still nonsense. You might be the only mammal alive who uses 10% of their brain “at a time” but don’t include the rest of us in your issue.
nicolas_06@reddit
For AGI, it would have to exist first, but for me a true AGI would be humanoid, and I'd guess a first version could cost millions, have big batteries, and consume a few hundred watts. Anyway, to move something that may weigh 100 pounds and can also do physical activities, you are not restricted to 30W. Also, the first version might cost more like $100K-1M than $1K.
I don't see the link between AI eradicating poverty and whether it runs locally or not. But I think with enough demand we could see architectures like Nvidia's Project DIGITS or AMD's AI 300 series become what we commonly have in new laptops five years from now, and in smartphones in ten.
So it is just a matter of waiting.
Aaaaaaaaaeeeee@reddit
It is reasonable to expect 4W models on the Snapdragon Elite chips. 7B-8B range models run at 18 t/s, and prompt processing is 700 t/s; I have verified this personally. Phi-MoE will have this exact same performance once the software catches up. Probably need to wait a year though, and new models may arise with this exact specification (7B active parameters).
You can boost the model speed a further 2x, with or without further training of the model, as described in the TEAL paper and Q-Sparse. It should work at 4-bit/2-bit.
It's entirely possible to get GPT-4 level on a mobile or tablet, with all their bandwidth constraints, with the correct MoE fit for your RAM. You just need 16-64GB.
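The back-of-the-envelope behind these numbers: on bandwidth-bound hardware, decode speed is roughly memory bandwidth divided by the bytes of active weights read per token, which is why fewer active parameters (MoE) helps so much on phones. A quick sketch; the ~77 GB/s figure is an assumed, typical LPDDR5X phone bandwidth, not a measured one:

```python
def tokens_per_second(active_params_b, bits_per_weight, bandwidth_gbs):
    """Rough decode-speed ceiling: each token reads every active weight once."""
    bytes_per_token_gb = active_params_b * bits_per_weight / 8
    return bandwidth_gbs / bytes_per_token_gb

# 8B dense model at 4-bit on an assumed ~77 GB/s phone memory bus:
dense = tokens_per_second(8, 4, 77)
# MoE with only ~2B active parameters, same quantization and bus:
moe = tokens_per_second(2, 4, 77)
print(f"8B dense: ~{dense:.0f} t/s ceiling")
print(f"2B-active MoE: ~{moe:.0f} t/s ceiling")
```

Under those assumptions the dense ceiling lands right around the 18 t/s reported above, and the MoE ceiling is ~4x higher for the same bus, which is the whole argument for small-active-parameter models on edge devices.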
Defiant-Mood6717@reddit
That is very impressive. 4W, is that real? What is the quantization on that 8B model you verified?
I find this suspicious. So you are saying it's possible to run an 80B active-parameter model on 40W? Why don't they make a GPU that does that instead of a phone?
From what you tell me, it sounds like the Snapdragon Elite chip should be in datacenters instead of the power-hungry NVIDIA GPUs. There has to be a catch here I am not seeing.
Fast-Satisfaction482@reddit
First, the technology is still pretty new, and second, it is not as general as a GPU. It can achieve that size of model only by using their own quantization, and only for inference.
On the other hand, AI companies seem to be making money only with their frontier models, so tiny 8b models don't seem to be attractive to them.
Aaaaaaaaaeeeee@reddit
The 7-8B models on the Qualcomm 8 Elite's HTP backend use 4-5W. I didn't say 80B active; I would guess much less is actually needed (DeepSeek V3 is 37B active). Nvidia GPUs have a place: they have high bandwidth and high compute. Edge devices often have low bandwidth, but you can min-max on the compute-heavy stuff, since you can't change the RAM bandwidth speed.
It's very interesting to theorize about the most optimal model you can run within the constraints of these things. If we never get better hardware for cheap, we will still be winners.
MmmmMorphine@reddit
Well shit, yeah, that's essentially the energy efficiency of the human brain. We aren't gonna get there for a long long time short of actually using brain-machine interfaces with brain organoids (or similar mechanisms leveraging biological systems)
nicolas_06@reddit
Also as most people use phones first, it is more like 1W.
Lissanro@reddit
Even though there are some mainstream apps (like VLC, for example) that integrate lightweight AI, the main issue is that most people do not have enough memory for good general AI. And even when there is enough VRAM, when AI is being used it blocks other applications from running unless there is a huge excess of memory. For example, when I am using Mistral Large 2411 123B at 5bpw along with a 7B draft model, they consume nearly all of my 96GB of VRAM; even a video player may produce out-of-memory errors.
For a single-GPU system, even a 7B-32B LLM can consume most or all of the available VRAM. But mainstream apps generally aim to be lightweight and efficient - this is why neither apps nor games use AI widely yet, it's just too resource-heavy for today's typical hardware. Especially given that the user usually wants to run more than one app, and each app usually wants to use its own AI it was developed for.
I have no doubt that this will change in the future, but the change is going to be gradual: we will see more and more applications integrating lightweight, specialized AI. Eventually, when a typical system has many terabytes of fast memory (or at least a few hundred GB), we will see more advanced features and heavier AI integrated into mainstream apps. But it may take a while; this is not going to happen in just a few years despite how fast AI is being developed - best-case scenario, by the next decade.
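The VRAM arithmetic behind that example is simple: weight memory ≈ parameter count × bits-per-weight / 8. A quick sketch; the draft model's quantization is my assumption (same 5 bpw), and KV cache plus activations are what fills the remaining gap up to 96GB:

```python
def weight_gb(params_billion, bits_per_weight):
    """Approximate memory for model weights alone, in GB."""
    return params_billion * bits_per_weight / 8

main = weight_gb(123, 5)   # Mistral Large 2411, 123B at 5 bpw
draft = weight_gb(7, 5)    # 7B draft model, assuming the same 5 bpw
print(f"main model weights:  ~{main:.1f} GB")
print(f"draft model weights: ~{draft:.1f} GB")
print(f"total before KV cache/activations: ~{main + draft:.1f} GB")
```

Weights alone land around 81GB here, so with KV cache and working buffers on top, "nearly all of 96GB" is exactly what you'd expect.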
Ruin-Capable@reddit
Jetbrains single-line code completion is a completely local AI thing.
They also have a co-pilot like AI tool that requires a subscription, even when you run it on prem.
intofuture@reddit (OP)
I think multi-line completion is now done locally in Jetbrains too [source]
Do you know anything about how it decides whether to do things locally or in the cloud?
Ruin-Capable@reddit
The AI assistant requires a Pro subscription. The single-line completion does not. There is a free plugin called "Continue" that works with local OpenAI-compatible endpoints (I've used it with the LM Studio endpoints).
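For anyone wanting to try that setup: Continue can be pointed at any OpenAI-compatible server. Roughly what a `config.json` model entry looked like when I last checked (field names may have changed between Continue versions, and port 1234 is LM Studio's default, so treat this as a sketch rather than authoritative):

```json
{
  "models": [
    {
      "title": "Local (LM Studio)",
      "provider": "openai",
      "model": "MODEL_NAME",
      "apiBase": "http://localhost:1234/v1"
    }
  ]
}
```

`MODEL_NAME` is whatever identifier the LM Studio server reports for the model you've loaded.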
cobbleplox@reddit
Sadly I had to make a PowerPoint presentation, and I had never used PowerPoint before. So I shoved some text and an image on each slide, constantly pressed the AI designer button, and selected the least worst one. I'm glad it was there. Had to manually remove all the shitty animations though.
Top-Salamander-2525@reddit
You would be better off using a different LLM to generate a markdown version of the presentation for you. It might require a bit of finessing the image names etc., but they're very good at making markdown.
You can then use various tools to convert markdown to PowerPoint (or other presentation formats) and choose styles.
https://gist.github.com/johnloy/27dd124ad40e210e91c70dd1c24ac8c8
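One concrete tool for that conversion step is pandoc, which can emit .pptx directly from markdown (headings at the slide level start new slides), and can optionally inherit styles from an existing deck:

```shell
# Write a tiny markdown deck, then convert it to PowerPoint with pandoc.
cat > deck.md <<'EOF'
## First slide

- one point
- another point

## Second slide

Some body text.
EOF

pandoc deck.md -o deck.pptx

# Optional: inherit theme/styles from an existing presentation:
# pandoc deck.md -o deck.pptx --reference-doc=styles.pptx
```

The `--reference-doc` route is how you keep corporate templates without hand-styling every slide.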
intofuture@reddit (OP)
Lmao. Didn't realize powerpoint had an AI designer thing that runs locally
cobbleplox@reddit
Oh you said locally. It's ms copilot in this case.
intofuture@reddit (OP)
Ah I see. Yeh, I'm not really sure which MS AI features run locally. I would've thought some copilot stuff might. They do seem to talk/post things about local AI every now and then
CtrlAltDelve@reddit
Kerlig has been one of my absolute favorite tools. It works with both cloud models and local models.
For spelling/grammar/instant summarization, I have found it to be phenomenal. It also has opened up my eyes to how extremely capable smaller 8B models are when used specifically for these purposes.
Well worth the money in my opinion: https://www.kerlig.com/
(MacOS Only)
jaarson@reddit
Thanks for the shout-out! 🙇♂️
intofuture@reddit (OP)
Interesting! Haven't heard of it. Guess it's from an indie dev?
CtrlAltDelve@reddit
Yep it absolutely is :)
Not me though! Dev is /u/jaarson
frivolousfidget@reddit
Apple Intelligence runs a bunch of stuff locally
eras@reddit
Firefox does local language translation.
intofuture@reddit (OP)
Very cool
Majestic-Quarter-958@reddit
I created this app a while ago that keeps track of your local files using an LLM:
https://github.com/AIxHunter/FileWizardAi
intofuture@reddit (OP)
Nice one, looks cool. The post was supposed to be more about non-open-source / non-indie-dev apps though
ABC4A_@reddit
Home Assistant. You can use Ollama to help with its voice assistant feature
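For anyone wiring this up: Home Assistant's Ollama integration just talks to a local Ollama server over HTTP, so you can sanity-check the server itself with curl before touching Home Assistant. The model name below is an example; use whatever you've actually pulled:

```shell
# Assumes an Ollama server running on its default port (11434)
# with a model already pulled, e.g. `ollama pull llama3.2`.
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Turn on the kitchen lights.",
  "stream": false
}'
```

If that returns a JSON response with a `response` field, the server side is fine and any remaining issues are in the Home Assistant integration config.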
Aaaaaaaaaeeeee@reddit
The FUTO Keyboard app - it's a good keyboard with whisper.tiny (ggml) built in. I use it for everything phone-related
FullOf_Bad_Ideas@reddit
Not really what you wanted but I like this Libreoffice extension.
https://github.com/balisujohn/localwriter
intofuture@reddit (OP)
Haha yeh, this would def fall into my classification of "open-source/indie-dev". Still cool though!
JuniorConsultant@reddit
It's debatable how useful it is in its current form, but Proton Mail has a feature called "Proton Scribe". It's a feature to draft emails, change tone, formatting, etc. It's a small model and not useful for anything outside of English at all.
intofuture@reddit (OP)
Nice, yeh Proton are a cool company