Running DeepSeek locally using ONNX Runtime
Posted by DangerousGood4561@reddit | LocalLLaMA | 7 comments
Just wanted to drop this here for anyone interested in running models locally using ONNX Runtime. The focus here is on using the NPU in the Snapdragon X Elite, but the approach can be extended to other systems as well!
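For context, here's a minimal sketch of what the generation loop looks like with the onnxruntime-genai Python package. The model folder path is a placeholder for a DeepSeek-R1-Distill-Qwen model already exported to ONNX; which execution provider gets used (e.g. QNN for the NPU) is set in the model folder's genai_config.json, not in this code. Note that `append_tokens` is the API in recent onnxruntime-genai releases; older versions passed `input_ids` through `GeneratorParams` instead.

```python
import onnxruntime_genai as og

# Load the exported model folder (placeholder path; the folder's
# genai_config.json determines the execution provider).
model = og.Model("./deepseek-r1-distill-qwen-7b-onnx")
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

params = og.GeneratorParams(model)
params.set_search_options(max_length=512)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("Why is the sky blue?"))

# Stream tokens until the model emits EOS or hits max_length.
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```

In practice you'd also apply the model's chat template to the prompt; it's omitted here to keep the sketch short.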
Willing_Landscape_61@reddit
You forgot the operative word "distill".
DangerousGood4561@reddit (OP)
That’s fair, I wish I could edit the title since it seems to be a trigger for some.
Tenzu9@reddit
So damn infuriating to keep seeing the distills referred to as vanilla "DeepSeek". I click on a post expecting a scrappy CPU + RAM setup or a crazy GPU cluster. Instead, I see someone asking about the mediocre 8B Qwen3 distill (or the old Qwen2.5/Llama3 ones).
DangerousGood4561@reddit (OP)
It’s more about showing an alternative method for running LLMs locally than about the LLM itself. I could just as easily have titled it "Run Mistral locally using ONNX Runtime".
loyalekoinu88@reddit
What flavor of DeepSeek?
DangerousGood4561@reddit (OP)
The Qwen distills: 7B and 1.5B.
Conscious_Chef_3233@reddit
That's closer to running Qwen, since the distills don't use the DeepSeek architecture.