Intel NPU cannot run an LLM, can it?
Posted by wossnameX@reddit | LocalLLaMA | View on Reddit | 7 comments
I think it can. And the Arc iGPU on many laptops is "good enough" for many use cases.
I wrote code for a work project under GDPR; it worked well enough. 15,000 images compared overnight; it took about 7 hours.
Slow, but secure.
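(For scale: 15,000 images in roughly 7 hours works out to about 2,100 images per hour, or around 1.7 seconds per image.)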
wossnameX@reddit (OP)
...and once that work project was done, I thought: it would be tiresome to rewrite all this code for the next problem.
So I made an OpenAI-compatible API endpoint.
Then an Ollama-compatible API endpoint.
And I just kept adding features.
Suddenly I had a system that could run a VL (vision-language) LLM on, say, the Arc iGPU and a text model on the NPU.
Slow, but still usable - and with how quickly small models are improving these days, it's only a matter of time until this is genuinely usable in real time.
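To give a concrete idea of what "OpenAI-compatible" buys you: any standard client can just be pointed at the local server. A minimal sketch, assuming the server is reachable at http://localhost:8000/v1; the base URL, API key handling, and model name here are placeholders for illustration, not necessarily what NoLlama actually exposes:

    # Minimal sketch: reuse the standard OpenAI Python client against a
    # local OpenAI-compatible server. The base_url and model name are
    # assumptions for illustration, not NoLlama's documented defaults.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8000/v1",  # assumed local endpoint
        api_key="not-needed",                 # local servers typically ignore the key
    )

    reply = client.chat.completions.create(
        model="qwen2.5-vl-3b",  # hypothetical model name
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(reply.choices[0].message.content)

The point is that existing tooling written against the OpenAI API keeps working; only the base URL changes.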
wossnameX@reddit (OP)
So I ended up with NPU Ollama, Not Ollama (by a long shot), or whatever backronym you prefer :-)
https://github.com/aweussom/NoLlama
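For the Ollama-compatible side, a rough sketch of what a call could look like; the port (Ollama's default 11434), the /api/generate route, and the model tag are assumptions for illustration, not necessarily NoLlama's defaults:

    # Rough sketch: call an Ollama-style /api/generate endpoint.
    # Host, port, and model tag are placeholders, not NoLlama's documented defaults.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.2:3b",                       # hypothetical model tag
            "prompt": "Summarize GDPR in one sentence.",
            "stream": False,                              # single JSON response instead of a stream
        },
        timeout=300,
    )
    resp.raise_for_status()
    print(resp.json()["response"])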
No_Afternoon_4260@reddit
Intel NPU and ollama
Feels like a nightmare happening
wossnameX@reddit (OP)
Yes, running LLMs locally is terrifying. Also: not Ollama.
SSOMGDSJD@reddit
I respect the hustle. What is your rig for this, an Arrow Lake laptop? Or a Core Ultra 269K or whatever?
wossnameX@reddit (OP)
Core Ultra 7 258V
anubhav_200@reddit
It can, check this https://www.reddit.com/r/LocalLLaMA/s/ZR4wLZNKCj