Compared actual usage costs for Chinese AI models. Token efficiency changes everything.

Posted by YormeSachi@reddit | LocalLLaMA | 41 comments

Everyone talks about per-token pricing but nobody mentions token efficiency. How many tokens does it take to complete the same task?

Tested this with coding tasks because that's where I actually use these models.
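For anyone who wants to replicate this: the OpenAI-compatible APIs these providers expose return a `usage` object with each response, so per-task token counts can be read straight off the payload. A minimal sketch, assuming the standard `prompt_tokens`/`completion_tokens` field names (the response dict here is a trimmed-down stand-in, not a real API call):

```python
def task_tokens(response: dict) -> int:
    """Total billed tokens for one API response (OpenAI-compatible schema)."""
    usage = response["usage"]
    return usage["prompt_tokens"] + usage["completion_tokens"]

# Trimmed-down example payload, shaped like a real chat completion response
response = {"usage": {"prompt_tokens": 1200, "completion_tokens": 7000}}
print(task_tokens(response))  # 8200
```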

glm-4.6: $0.15 input / $0.60 output

Kimi K2: $1.50-2.00

MiniMax: $0.80-1.20

deepseek: $0.28

deepseek looks cheapest on paper. But that's not the whole story.

Token efficiency (same task):

Gave each model identical coding task: "refactor this component to use hooks, add error handling, write tests"

glm: 8,200 tokens average

deepseek: 14,800 tokens average

MiniMax: 10,500 tokens average

Kimi: 11,000 tokens average

glm uses 26% fewer tokens than Kimi, 45% fewer than deepseek.
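Those percentages fall straight out of the averages (the Kimi figure is closer to 25.5% than 26%, the deepseek one checks out); a quick sanity check:

```python
# Average tokens per task, from the test above
avg_tokens = {"glm": 8_200, "deepseek": 14_800, "minimax": 10_500, "kimi": 11_000}

def savings_vs(model: str, baseline: str = "glm") -> float:
    """Percent fewer tokens the baseline uses compared to another model."""
    return (1 - avg_tokens[baseline] / avg_tokens[model]) * 100

print(round(savings_vs("kimi"), 1))      # 25.5
print(round(savings_vs("deepseek"), 1))  # 44.6
```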

Real cost for that task:

glm: ~$0.04 (4 cents)

deepseek: ~$0.03 (3 cents) - looks cheaper

MiniMax: ~$0.05 (5 cents)

Kimi: ~$0.09 (9 cents)

But wait. If you do 100 similar tasks:

glm: ~820K total tokens, cost $0.40-0.50

deepseek: ~1.48M total tokens, cost $0.41 - basically the same as glm despite the lower per-token price

MiniMax: ~1.05M total tokens, cost $0.50-0.60

Kimi: ~1.1M total tokens, cost $0.90-1.00
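The glm vs deepseek comparison is just price times volume. A rough sketch using the 100-task totals above, billing glm pessimistically at its $0.60/M output rate (the post doesn't split tasks into input vs output tokens, so treat the glm number as a worst case):

```python
def run_cost(total_tokens: int, price_per_million: float) -> float:
    """Dollar cost of a token volume at a flat per-million-token price."""
    return total_tokens / 1_000_000 * price_per_million

# 100-task token totals from above, at each model's (assumed flat) rate
glm = run_cost(820_000, 0.60)        # worst case: everything at output price
deepseek = run_cost(1_480_000, 0.28)
print(f"glm ${glm:.2f} vs deepseek ${deepseek:.2f}")  # glm $0.49 vs deepseek $0.41
```

Even charged entirely at its higher output rate, glm lands within a few cents of deepseek, because it emits roughly half the tokens.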

Token efficiency beats per-token price. glm generates less verbose code, fewer explanatory comments, tighter solutions. deepseek tends to over-explain and generate longer outputs.

For businesses doing thousands of API calls daily, glm's efficiency compounds into real savings even though it's not the absolute cheapest per-token.

Switched to glm for production workloads. Monthly costs dropped 60% vs previous setup. Performance is adequate for 90% of tasks.

deepseek's pricing looks great until you realize you're using ~80% more tokens per task (14.8K vs 8.2K in my tests). The savings disappear.

Anyone else measuring token efficiency? Feel like this is the underrated metric everyone ignores.