Ever wonder how much cost you can save when coding with a local LLM?

Posted by bobaburger@reddit | LocalLLaMA | 147 comments

For the past few days, I've been using Qwen3.5 35B A3B (Q2_K_XL and Q4_K_M) inside Claude Code to build a pet project.

The model was able to complete almost everything I asked. There were some intelligence issues here and there, but so far the project is pretty much usable. Within Claude Code, even the Q2 quant was very good at picking the right tools/skills, spawning subagents to write code, verifying the results, and so on.

And here comes the interesting part: in the latest session (see the screenshot), the model worked for 2 minutes and consumed 2M tokens, and `ccusage` estimated that, had I been using Claude Sonnet 4.6, it would have cost me $10.85.

For all of that, I paid nothing except two minutes of electricity at 400W for the PC.
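For a rough sense of scale, here's a back-of-envelope comparison of that session's electricity cost against the API estimate. The $0.15/kWh rate is my own assumption (plug in your local tariff); the 400W, 2 minutes, and $10.85 figures are from the session above.

```python
# Back-of-envelope: local electricity cost vs. the ccusage API estimate.
POWER_W = 400          # PC power draw during the session
MINUTES = 2.0          # session length
RATE_PER_KWH = 0.15    # assumed electricity price in USD/kWh -- adjust for your region
API_ESTIMATE = 10.85   # ccusage estimate for Claude Sonnet 4.6, USD

kwh = (POWER_W / 1000) * (MINUTES / 60)   # energy used: ~0.013 kWh
electricity_cost = kwh * RATE_PER_KWH     # ~ $0.002
savings_factor = API_ESTIMATE / electricity_cost

print(f"Electricity: ${electricity_cost:.4f} vs API estimate: ${API_ESTIMATE:.2f}")
print(f"Roughly {savings_factor:,.0f}x cheaper per session")
```

Even if your hardware draws more or your power is pricier, the gap is several orders of magnitude for a session like this (ignoring the upfront hardware cost, of course).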

Also, given the current situation of the Qwen team, it's sad to think about the uncertainty: will more open-source Qwen models keep coming, or will it end up like Meta's Llama?