Running DeepSeek locally using ONNX Runtime
Posted by DangerousGood4561@reddit | LocalLLaMA | 7 comments
Just wanted to drop this here for anyone interested in running models locally using ONNX Runtime. The focus here is on using the NPU in the Snapdragon X Elite, but the approach can be extended to other systems as well!
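For context, here's a minimal sketch of what the generation loop looks like with the onnxruntime-genai Python package. The model folder path is a placeholder for a DeepSeek-R1-Distill-Qwen model already exported to ONNX; which execution provider gets used (e.g. QNN for the NPU) is set in the model folder's genai_config.json, not in this code. Note that `append_tokens` is the API in recent onnxruntime-genai releases; older versions passed `input_ids` through `GeneratorParams` instead.

```python
import onnxruntime_genai as og

# Load the exported model folder (placeholder path; the folder's
# genai_config.json determines the execution provider).
model = og.Model("./deepseek-r1-distill-qwen-7b-onnx")
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

params = og.GeneratorParams(model)
params.set_search_options(max_length=512)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("Why is the sky blue?"))

# Stream tokens until the model emits EOS or hits max_length.
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```

In practice you'd also apply the model's chat template to the prompt; it's omitted here to keep the sketch short.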
Willing_Landscape_61@reddit
You forgot the operative word "distill".
DangerousGood4561@reddit (OP)
That’s fair, I wish I could edit the title since it seems to be a trigger for some.
Tenzu9@reddit
So damn infuriating to keep seeing the distills referred to as vanilla "DeepSeek". I click on a post expecting a scrappy CPU + RAM setup or a crazy GPU cluster. Instead, I see someone asking about the mediocre 8B Qwen3 distill (or the old Qwen2.5/Llama3 ones).
DangerousGood4561@reddit (OP)
It’s more about showing an alternative method for running LLMs locally than about the LLM itself. I could just as easily have titled it "Run Mistral locally using ONNX Runtime".
loyalekoinu88@reddit
What flavor of DeepSeek?
DangerousGood4561@reddit (OP)
The Qwen distills: 7B and 1.5B.
Conscious_Chef_3233@reddit
That's closer to running Qwen, since the distills don't use the DeepSeek architecture.