Any "mainstream" apps with genuinely useful local AI features?
Posted by intofuture@reddit | LocalLLaMA | View on Reddit | 45 comments
Curious if any of you actually regularly use features in apps with local AI processing?
When I say "mainstream app", I mean more like PyCharm from JetBrains (i.e. making lots of money, large teams behind them, etc.) than an open-source/indie dev app.
And I'm more talking about a feature in an app (which does a bunch of things other than that AI feature), as opposed to an app that's entirely about using AI locally, like Ollama, LMStudio, etc.
I'm also not talking about OS features, e.g. auto-complete on iPhones. More interested in apps that you've downloaded.
Currently, the only thing I can think of in my day-to-day is code completion in PyCharm, but even that is now some kind of hybrid local/cloud thing.
intofuture@reddit (OP)
Great take
xmmr@reddit
VLC
intofuture@reddit (OP)
Oh yeh I saw their subtitle thing. Quite cool
AgnosticAndroid@reddit
PotPlayer also. Recently got whisper integration for generating subtitles on the fly.
intofuture@reddit (OP)
Nice. Have you used it? Is it decent?
xmmr@reddit
I fear none of them use large-v3 but something lighter, and considering even large-v3 performs poorly without audio tweaks, I can't imagine anything below that...
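For context on what "generating subtitles on the fly" involves: once a whisper-family model produces timestamped segments, turning them into an .srt file is trivial. A minimal stdlib-only sketch, using hypothetical segments shaped like whisper's `transcribe()` output (`start`/`end` in seconds plus `text`):

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Render whisper-style segments (dicts with start/end/text) as SRT blocks."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

# Hypothetical segments, just to show the shape of the data:
segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 5.0, "text": " General Kenobi."},
]
print(segments_to_srt(segments))
```

The hard part (and the quality complaint above) is entirely in the transcription model, not in this formatting step.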
Defiant-Mood6717@reddit
The lack of such apps highlights the energy inefficiency of modern LLMs. We need to do a lot more work to get 100T-parameter models running on 30W of power; otherwise we will never see such apps. Instead, they will simply run on cloud services and APIs.
intofuture@reddit (OP)
Smaller models that are locally feasible can be good enough for certain tasks though, no?
Think JetBrains' model is only like 100MB. I find it pretty good in terms of both code quality and system utilization.
Defiant-Mood6717@reddit
So LLMs are just for basic code completion?
Where is your ambition? Do you not want agents, AGI, to eradicate poverty and create massive value?
Do you not want them to run on everyone's local machine on 30W of power? My point is they will never run on your local machine if we can't make hardware more efficient, since they require trillions of parameters. You will NEVER have AGI with 100M parameters; that is like saying an ant-sized brain with the right training can get to human-level intelligence. Physically impossible.
Let's instead focus on making hardware like the brain, hardware that can run 100T-parameter models on sandwich power, 30W. Only then may these apps start to exist.
intofuture@reddit (OP)
Fair point. Agree we need to see some big improvements to HW. There could also be some big innovations on the SW side, though, which would mean we don't necessarily need 100T-param models for insanely useful AI.
Defiant-Mood6717@reddit
I agree 100T parameters is not exactly a minimum requirement; I use it as an example because the human brain has 100T connections. But yeah, I would say that for a future really useful model, we need to be able to run at least 0.5-1T-parameter models on 30W. Things like architecture help a lot in making use of those parameters efficiently, so yeah, software is also important, if we consider architecture software.

For example, MoE architectures allow us to have 1T parameters but only use 100B at a time, which is very efficient and actually what the human brain does. We don't use all 100T connections all the time; this is the classical saying that we only use 10% of our brain or whatever. That part is true: our brain is also MoE.
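The MoE idea above can be sketched concretely: a router scores all experts per token but only *runs* the top-k, so a fraction of the total parameters is touched each step. A toy illustration in plain Python (the "experts" here are just bias vectors for brevity; in a real MoE each would be a full feed-forward block):

```python
import math
import random

random.seed(0)

NUM_EXPERTS, TOP_K, DIM = 8, 2, 4

# Toy "experts": each is a bias vector here; a real MoE uses full FFNs.
experts = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
# Router: one weight vector per expert, scoring the token representation.
router = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(token):
    # Score every expert cheaply, but only execute the top-k of them.
    scores = [sum(w * x for w, x in zip(router[e], token)) for e in range(NUM_EXPERTS)]
    top = sorted(range(NUM_EXPERTS), key=lambda e: scores[e], reverse=True)[:TOP_K]
    gates = softmax([scores[e] for e in top])
    out = [0.0] * DIM
    for g, e in zip(gates, top):
        for i in range(DIM):
            out[i] += g * (token[i] + experts[e][i])  # "running" expert e
    return out, top

output, active = moe_forward([0.5, -1.0, 0.3, 0.2])
print(f"active experts: {sorted(active)} "
      f"({TOP_K}/{NUM_EXPERTS} = {TOP_K / NUM_EXPERTS:.0%} of experts per token)")
```

With 1T total parameters and ~100B active, the same routing trick means each token pays roughly a tenth of the dense compute and memory traffic.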
shokuninstudio@reddit
We use our whole brain all the time. The 10% thing is a silly myth. In the past this meme was 5% and even 3%.
https://en.wikipedia.org/wiki/Ten-percent-of-the-brain_myth
Defiant-Mood6717@reddit
You completely misunderstood what I meant by only using 10%. I meant we use only 10% AT A TIME, but yes, we obviously do use all of it. I later researched a bit and it's even lower than that: your brain has a total of 100T synapses (parameters), but only millions to billions are active at a given time.
shokuninstudio@reddit
That is still nonsense. You might be the only mammal alive who uses 10% of their brain “at a time” but don’t include the rest of us in your issue.
nicolas_06@reddit
For AGI, it would have to exist first, but for me a true AGI would be humanoid, and I'd guess a first version could cost millions, have big batteries, and consume a few hundred watts. Anyway, to move something that may weigh 100 pounds and can also do physical activities, you are not restricted to 30W. Also, the first version might cost more like $100K-1M than $1K.
I don't see the link between AI eradicating poverty and whether it runs locally or not. But I think with enough demand we could see architectures like Nvidia's Project DIGITS or AMD's AI 300 series become what we commonly have in new laptops five years from now, and in smartphones in ten.
So it is just a matter of waiting.
Aaaaaaaaaeeeee@reddit
It is reasonable to expect 4W models on the Snapdragon Elite chips. 7B-8B range models run at 18 t/s, and prompt processing is 700 t/s; I have verified this personally. Phi-MoE will have this exact same performance once the software catches up. Probably need to wait a year though, and new models may arise with this exact specification (7B active parameters).
You can boost the model speed a further 2x, with or without further training of the model, as described in the TEAL paper and Q-Sparse. It should work at 4-bit/2-bit.
It's entirely possible to get GPT-4 level on a mobile or tablet, with all their bandwidth constraints, with the correct MoE fit for your RAM. You just need 16-64GB.
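The back-of-the-envelope behind these numbers: on bandwidth-bound hardware, decode speed is roughly memory bandwidth divided by the bytes of active weights read per token, which is why fewer active parameters (MoE) helps so much on phones. A quick sketch; the ~77 GB/s figure is an assumed, typical LPDDR5X phone bandwidth, not a measured one:

```python
def tokens_per_second(active_params_b, bits_per_weight, bandwidth_gbs):
    """Rough decode-speed ceiling: each token reads every active weight once."""
    bytes_per_token_gb = active_params_b * bits_per_weight / 8
    return bandwidth_gbs / bytes_per_token_gb

# 8B dense model at 4-bit on an assumed ~77 GB/s phone memory bus:
dense = tokens_per_second(8, 4, 77)
# MoE with only ~2B active parameters, same quantization and bus:
moe = tokens_per_second(2, 4, 77)
print(f"8B dense: ~{dense:.0f} t/s ceiling")
print(f"2B-active MoE: ~{moe:.0f} t/s ceiling")
```

Under those assumptions the dense ceiling lands right around the 18 t/s reported above, and the MoE ceiling is ~4x higher for the same bus, which is the whole argument for small-active-parameter models on edge devices.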
Defiant-Mood6717@reddit
That is very impressive. 4W, is that real? What is the quantization on that 8B model you verified?
I find this suspicious. So you are saying it's possible to run an 80B active-parameter model on 40W? Why don't they make a GPU that does that instead of a phone?
From what you tell me, it sounds like the Snapdragon Elite chip should be in datacenters instead of the power-hungry NVIDIA GPUs. There has to be a catch here I am not seeing.
Fast-Satisfaction482@reddit
First, the technology is still pretty new, and second, it is not as general as a GPU. It can achieve that size of model only by using their own quantization, and only for inference.
On the other hand, AI companies seem to be making money only with their frontier models, so tiny 8b models don't seem to be attractive to them.
Aaaaaaaaaeeeee@reddit
The 7-8B models on the Qualcomm 8 Elite's HTP backend use 4-5W. I didn't say 80B active; I would guess much less is actually needed (DeepSeek V3 is 37B active). Nvidia GPUs have a place: they have high bandwidth and high compute. Edge devices often have low bandwidth, but you can min-max on the compute-heavy stuff, since you can't change the RAM bandwidth speed.
It's very interesting to theorize about the most optimal model you can run within the constraints of these things. If we never get better hardware for cheap, we will still be winners.
MmmmMorphine@reddit
Well shit, yeah, that's essentially the energy efficiency of the human brain. We aren't gonna get there for a long long time short of actually using brain-machine interfaces with brain organoids (or similar mechanisms leveraging biological systems)
nicolas_06@reddit
Also as most people use phones first, it is more like 1W.
Lissanro@reddit
Even though there are some mainstream apps (like VLC, for example) that integrate lightweight AI, the main issue is that most people do not have enough memory for good general AI. And even when there is enough VRAM, when AI is being used it blocks other applications from running unless there is a huge excess of memory. For example, when I am using Mistral Large 2411 123B at 5bpw along with a 7B draft model, they consume nearly all of my 96GB of VRAM; even a video player may produce out-of-memory errors.
For a single-GPU system, even a 7B-32B LLM can consume most or all of the available VRAM. But mainstream apps generally aim to be lightweight and efficient - this is why neither apps nor games use AI widely yet, it's just too resource-heavy for today's typical hardware. Especially given that the user usually wants to run more than one app, and each app usually wants to use its own AI it was developed for.
I have no doubt that this will change in the future, but the change is going to be gradual: we will see more and more applications integrating lightweight, specialized AI. Eventually, when a typical system has many terabytes of fast memory (or at least a few hundred GB), we will see more advanced features and heavier AI integrated into mainstream apps. But it may take a while; this is not going to happen in just a few years despite how fast AI is being developed - best-case scenario, by the next decade.
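The VRAM arithmetic behind that example is simple: weight memory ≈ parameter count × bits-per-weight / 8. A quick sketch; the draft model's quantization is my assumption (same 5 bpw), and KV cache plus activations are what fills the remaining gap up to 96GB:

```python
def weight_gb(params_billion, bits_per_weight):
    """Approximate memory for model weights alone, in GB."""
    return params_billion * bits_per_weight / 8

main = weight_gb(123, 5)   # Mistral Large 2411, 123B at 5 bpw
draft = weight_gb(7, 5)    # 7B draft model, assuming the same 5 bpw
print(f"main model weights:  ~{main:.1f} GB")
print(f"draft model weights: ~{draft:.1f} GB")
print(f"total before KV cache/activations: ~{main + draft:.1f} GB")
```

Weights alone land around 81GB here, so with KV cache and working buffers on top, "nearly all of 96GB" is exactly what you'd expect.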
Ruin-Capable@reddit
Jetbrains single-line code completion is a completely local AI thing.
They also have a co-pilot like AI tool that requires a subscription, even when you run it on prem.
intofuture@reddit (OP)
I think multi-line completion is now done locally in Jetbrains too [source]
Do you know anything about how it decides whether to do things locally or in the cloud?
Ruin-Capable@reddit
The AI assistant requires a Pro subscription. The single-line completion does not. There is a free plugin called "Continue" that works with local OpenAI-compatible endpoints (I've used it with the LM Studio endpoints).
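For anyone wanting to try that setup: Continue can be pointed at any OpenAI-compatible server. Roughly what a `config.json` model entry looked like when I last checked (field names may have changed between Continue versions, and port 1234 is LM Studio's default, so treat this as a sketch rather than authoritative):

```json
{
  "models": [
    {
      "title": "Local (LM Studio)",
      "provider": "openai",
      "model": "MODEL_NAME",
      "apiBase": "http://localhost:1234/v1"
    }
  ]
}
```

`MODEL_NAME` is whatever identifier the LM Studio server reports for the model you've loaded.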
cobbleplox@reddit
Sadly I had to make a PowerPoint presentation, and I had never used PowerPoint before. So I shoved some text and an image on each slide, constantly pressed the AI designer button, and selected the least worst one. I'm glad it was there. Had to manually remove all the shitty animations though.
Top-Salamander-2525@reddit
You would be better off using a different LLM to generate a markdown version of the presentation for you. It might require a bit of finessing the image names etc., but they're very good at making markdown.
You can then use various tools to convert markdown to PowerPoint (or other presentation formats) and choose styles.
https://gist.github.com/johnloy/27dd124ad40e210e91c70dd1c24ac8c8
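One concrete tool for that conversion step is pandoc, which can emit .pptx directly from markdown (headings at the slide level start new slides), and can optionally inherit styles from an existing deck:

```shell
# Write a tiny markdown deck, then convert it to PowerPoint with pandoc.
cat > deck.md <<'EOF'
## First slide

- one point
- another point

## Second slide

Some body text.
EOF

pandoc deck.md -o deck.pptx

# Optional: inherit theme/styles from an existing presentation:
# pandoc deck.md -o deck.pptx --reference-doc=styles.pptx
```

The `--reference-doc` route is how you keep corporate templates without hand-styling every slide.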
intofuture@reddit (OP)
Lmao. Didn't realize powerpoint had an AI designer thing that runs locally
cobbleplox@reddit
Oh you said locally. It's ms copilot in this case.
intofuture@reddit (OP)
Ah I see. Yeh, I'm not really sure which MS AI features run locally. I would've thought some copilot stuff might. They do seem to talk/post things about local AI every now and then
CtrlAltDelve@reddit
Kerlig has been one of my absolute favorite tools. It works with both cloud models and local models.
For spelling/grammar/instant summarization, I have found it to be phenomenal. It also has opened up my eyes to how extremely capable smaller 8B models are when used specifically for these purposes.
Well worth the money in my opinion: https://www.kerlig.com/
(MacOS Only)
jaarson@reddit
Thanks for the shout-out! 🙇♂️
intofuture@reddit (OP)
Interesting! Haven't heard of it. Guess it's from an indie dev?
CtrlAltDelve@reddit
Yep it absolutely is :)
Not me though! Dev is /u/jaarson
frivolousfidget@reddit
Apple Intelligence runs a bunch of stuff locally
eras@reddit
Firefox does local language translation.
intofuture@reddit (OP)
Very cool
Majestic-Quarter-958@reddit
I created this app a while ago that keeps track of your local files using an LLM:
https://github.com/AIxHunter/FileWizardAi
intofuture@reddit (OP)
Nice one, looks cool. The post was supposed to be more about non-open-source / non-indie-dev apps though
ABC4A_@reddit
Home Assistant. You can use Ollama to help with its voice assistant feature
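For anyone wiring this up: Home Assistant's Ollama integration just talks to a local Ollama server over HTTP, so you can sanity-check the server itself with curl before touching Home Assistant. The model name below is an example; use whatever you've actually pulled:

```shell
# Assumes an Ollama server running on its default port (11434)
# with a model already pulled, e.g. `ollama pull llama3.2`.
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Turn on the kitchen lights.",
  "stream": false
}'
```

If that returns a JSON response with a `response` field, the server side is fine and any remaining issues are in the Home Assistant integration config.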
Aaaaaaaaaeeeee@reddit
The FUTO Keyboard app - it's a good keyboard with whisper.tiny (ggml) built in. I use it for everything phone-related
FullOf_Bad_Ideas@reddit
Not really what you wanted but I like this Libreoffice extension.
https://github.com/balisujohn/localwriter
intofuture@reddit (OP)
Haha yeh, this would def fall into my classification of "open-source/indie-dev". Still cool though!
JuniorConsultant@reddit
It's debatable how useful it is in its current form, but Proton Mail has a feature called "Proton Scribe". It's a feature to draft emails, change tone, formatting, etc. It's a small model and not useful for anything outside of English at all.
intofuture@reddit (OP)
Nice, yeh Proton are a cool company