Trying to understand how Claude Code token usage maps to quota consumption.
Posted by Eastern_Campaign5482@reddit | LocalLLaMA | 3 comments
I ran into something confusing with Claude Code and wanted to check if others have seen this.
My usage:
- 2 conversations total
- One conversation: ~70k–100k tokens (as reported by Claude Code)
- Another conversation: ~11k tokens
- Mostly just reviewing code, no heavy generation or large tasks
So in total, roughly ~100k tokens reported.
However, this already consumed over 90% of my 5-hour quota.
This seems highly disproportionate.
Possible explanations I’m considering:
- Hidden token usage (tool calls, file operations, etc.)
- Context being repeatedly reprocessed
- Reported token count not reflecting actual billed usage
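The "context being repeatedly reprocessed" explanation alone can account for a large gap. A rough sketch (hypothetical numbers, not Anthropic's actual accounting): every turn resends the whole conversation so far, so cumulative tokens processed grow quadratically even when the final reported context grows linearly.

```python
# Hypothetical illustration: each turn resends the entire conversation
# so far, so cumulative processed tokens far exceed the final context size.
def cumulative_processed(tokens_added_per_turn):
    context = 0  # conversation size so far
    total = 0    # tokens actually processed across all turns
    for added in tokens_added_per_turn:
        context += added
        total += context  # whole context is reprocessed this turn
    return context, total

# 10 turns, each adding ~10k tokens of messages/tool output:
final_context, total_processed = cumulative_processed([10_000] * 10)
print(final_context)    # → 100000  ("reported" conversation size)
print(total_processed)  # → 550000  (tokens actually run through the model)
```

Under this toy model, a conversation that "reports" ~100k tokens could plausibly have processed 5x+ that amount, before even counting system prompts or tool calls.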
I’ve already contacted support to clarify.
Also, is there any reliable way to track how token usage maps to quota consumption in Claude Code?
Right now, the reported token count (\~100k) doesn’t explain the actual quota usage (90%+), which makes it very hard to predict or control usage.
Curious if others have experienced something similar or can explain how this is supposed to work.
If support provides any explanation, I’ll update this thread.
Hyp3rSoniX@reddit
You might want to ask this in r/Anthropic instead. There you will find more people actually using Claude Code.
Here you will mostly find folks who use locally runnable or open source/weight models instead.
About your question:
Claude Code has some baked-in system prompts and tools that fill up the context on their own. In my case, a fresh session without any message already takes up 23.8k tokens of the context window.
Anthropic keeps updating and changing the system prompt, so that number will change from time to time. The model is also secretive about its system prompt: when asked to show it, it will refuse.
Also, calls have a prompt cache TTL (time to live) of 5 minutes. So if the model answers and you wait longer than 5 minutes to re-prompt, the cache will have expired and your entire current context will be deducted from your quota again. If you stay within the prompt-cache window and the cache does get hit, only newly added or generated tokens are "billed".
I'm also not sure about the model's thinking/reasoning tokens. Even though they're hidden in the CLI by default, I don't know whether they're sent back and forth in the background as well, which could also explain the usage.
But in general, the $20 Claude subscription yields very little usage. Some people can't even get one prompt finished within the quota.
user92554125@reddit
The default system prompt is bloated as f. I use this to patch the system prompt alongside some usability tweaks. Been using it for ~2 months without bans, and context usage is greatly improved with no performance degradation.
I'm not the author of the tool, and not sure if using it can flag your account.
sagiroth@reddit
What you're experiencing is the bloated initial system prompt that's embedded into Claude Code. At work I often just ask a simple question and bam, 6% of my 4-hour quota is spent. The next one costs much less.