Intel NPU cannot run an LLM, can it?
Posted by wossnameX@reddit | LocalLLaMA | View on Reddit | 7 comments
I think it can. And the Arc iGPU on many laptops is "good enough" for many use cases.
I wrote code for a work project under GDPR; it worked well enough. 15,000 images compared overnight; it took about 7 hours.
Slow, but secure.
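(For scale: 15,000 images in roughly 7 hours works out to about 2,100 images per hour, or around 1.7 seconds per image.)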
wossnameX@reddit (OP)
...and once that work project was done, I thought: it would be tiresome to rewrite all this code for the next problem.
So I made an OpenAI-compatible API endpoint.
Then an Ollama-compatible API endpoint.
And I just kept adding features.
Suddenly I had a system that could run a VL (vision-language) LLM on, say, the Arc iGPU and a text model on the NPU.
Slow, but still usable - and with how quickly small models are improving these days, it's only a matter of time until this is genuinely usable in real time.
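To give a concrete idea of what "OpenAI-compatible" buys you: any standard client can just be pointed at the local server. A minimal sketch, assuming the server is reachable at http://localhost:8000/v1; the base URL, API key handling, and model name here are placeholders for illustration, not necessarily what NoLlama actually exposes:

    # Minimal sketch: reuse the standard OpenAI Python client against a
    # local OpenAI-compatible server. The base_url and model name are
    # assumptions for illustration, not NoLlama's documented defaults.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8000/v1",  # assumed local endpoint
        api_key="not-needed",                 # local servers typically ignore the key
    )

    reply = client.chat.completions.create(
        model="qwen2.5-vl-3b",  # hypothetical model name
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(reply.choices[0].message.content)

The point is that existing tooling written against the OpenAI API keeps working; only the base URL changes.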
wossnameX@reddit (OP)
So I ended up with NPU Ollama, Not Ollama (by a long shot), or whatever backronym you prefer :-)
https://github.com/aweussom/NoLlama
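For the Ollama-compatible side, a rough sketch of what a call could look like; the port (Ollama's default 11434), the /api/generate route, and the model tag are assumptions for illustration, not necessarily NoLlama's defaults:

    # Rough sketch: call an Ollama-style /api/generate endpoint.
    # Host, port, and model tag are placeholders, not NoLlama's documented defaults.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.2:3b",                       # hypothetical model tag
            "prompt": "Summarize GDPR in one sentence.",
            "stream": False,                              # single JSON response instead of a stream
        },
        timeout=300,
    )
    resp.raise_for_status()
    print(resp.json()["response"])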
No_Afternoon_4260@reddit
Intel NPU and ollama
Feels like a nightmare happening
wossnameX@reddit (OP)
Yes, running LLMs locally is terrifying. Also: not Ollama.
SSOMGDSJD@reddit
I respect the hustle. What is your rig for this, an Arrow Lake laptop? Or a Core Ultra 269K or whatever?
wossnameX@reddit (OP)
Core Ultra 7 258V
anubhav_200@reddit
It can, check this https://www.reddit.com/r/LocalLLaMA/s/ZR4wLZNKCj