Client had 4 agents on GPT-4o. One was classifying documents. That one alone had 91% savings potential.

Posted by Dramatic_Strain7370@reddit | LocalLLaMA

I do some consulting work with AI startups. One client was upset with their OpenAI bill: they had 4 agents in production and felt they were overpaying, but weren't sure by how much. They also didn't have a good intuition for how to evaluate alternative models.

I looked at what each agent was actually doing:

All four were running on GPT-4o, which costs $2.50 per 1M input tokens and $10 per 1M output tokens. Every request went to that same model, no matter how simple it was (not good).
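To put those prices in per-request terms, here is a minimal sketch of the arithmetic. The two prices are the GPT-4o figures quoted above; the token counts in the example are made up for illustration and are not from the client's workload.

```python
# GPT-4o list prices quoted in the post, in dollars per 1M tokens.
INPUT_PRICE_PER_M = 2.50
OUTPUT_PRICE_PER_M = 10.00


def request_cost(input_tokens: int, output_tokens: int,
                 in_price: float = INPUT_PRICE_PER_M,
                 out_price: float = OUTPUT_PRICE_PER_M) -> float:
    """Dollar cost of one request, given its token counts and per-1M prices."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical request: 2,000 input tokens and 500 output tokens.
cost = request_cost(2_000, 500)
print(f"${cost:.4f} per request")  # prints "$0.0100 per request"
```

Output tokens cost 4x as much as input tokens at these prices, so a short answer to a long prompt and a long answer to a short prompt can end up costing about the same.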

When I broke down how many of each agent's prompts were simple enough for a cheaper model, the picture got interesting:

| Agent | Simple prompts | Potential savings with model switching |
|---|---|---|
| SEC summarization | ~40% | 65–77% |
| Financial chatbot | ~75% | 77–83% |
| Document classification | ~80% | 91% |
| Monitoring | ~80% | 83% |

The SEC summarization agent is more nuanced: financial filings are complex, so a larger share of its prompts stayed on the premium model, and each prompt carries around 30K input tokens. The classification and monitoring agents, though, were doing straightforward tasks on an expensive model for no real reason.
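For a rough idea of what "model switching" can look like in code, here is a minimal sketch of a per-request router. Everything in it is an assumption for illustration: the post doesn't name the cheaper model (`gpt-4o-mini` is just a plausible same-family choice), the task labels are invented, and the length threshold is a crude stand-in for whatever complexity check a real router would use.

```python
from dataclasses import dataclass

PREMIUM_MODEL = "gpt-4o"
CHEAP_MODEL = "gpt-4o-mini"  # assumption: the post does not name the cheaper model

# Hypothetical task labels, loosely modeled on the agents described above.
SIMPLE_TASKS = {"classification", "monitoring"}


@dataclass
class Request:
    task: str
    prompt: str


def pick_model(req: Request, max_simple_chars: int = 4_000) -> str:
    """Send short prompts on simple tasks to the cheap model, everything else to premium."""
    if req.task in SIMPLE_TASKS and len(req.prompt) <= max_simple_chars:
        return CHEAP_MODEL
    return PREMIUM_MODEL


# A short classification prompt is routed to the cheap model;
# a filing summary stays on the premium one.
print(pick_model(Request("classification", "Is this document an invoice or a contract?")))
print(pick_model(Request("sec_summarization", "Summarize the risk factors in this 10-K.")))
```

A real setup would probably decide per prompt with something smarter than task name plus length, but the shape is the same: the decision happens before the API call, and the premium model stays the default.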

To make this easier to estimate for other setups, I built a quick LLM savings calculator. Enter your monthly spend, primary model, and workload type — it estimates what you'd save routing simple prompts to a cheaper model in the same provider family.
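The core estimate behind a calculator like this can be sketched in a few lines. To be clear, this is not the calculator's actual logic, just a back-of-the-envelope version under stated assumptions: simple prompts account for a fixed share of spend, and the cheaper model costs a fixed fraction of the premium one. The 0.06 ratio below is an assumption (roughly the gap between GPT-4o and GPT-4o mini list prices; check current pricing), and the $10,000 bill is made up. It will not necessarily reproduce the percentages quoted above, since those depend on token mix and model choice that this simple version ignores.

```python
def estimate_monthly_savings(monthly_spend: float,
                             simple_share: float,
                             cheap_to_premium_ratio: float) -> tuple[float, float]:
    """Estimate savings from routing simple prompts to a cheaper model.

    monthly_spend: current monthly bill in dollars, all on the premium model.
    simple_share: fraction of that spend (0 to 1) that comes from simple prompts.
    cheap_to_premium_ratio: cheaper model's price as a fraction of the premium price.
    Returns (dollars saved per month, fraction of the bill saved).
    """
    if not 0.0 <= simple_share <= 1.0:
        raise ValueError("simple_share must be between 0 and 1")
    if not 0.0 <= cheap_to_premium_ratio <= 1.0:
        raise ValueError("cheap_to_premium_ratio must be between 0 and 1")
    saved = monthly_spend * simple_share * (1.0 - cheap_to_premium_ratio)
    fraction = saved / monthly_spend if monthly_spend else 0.0
    return saved, fraction


# Hypothetical inputs: $10,000/month bill, 80% of spend on simple prompts,
# cheaper model priced at 6% of the premium model (assumption).
saved, fraction = estimate_monthly_savings(10_000, 0.80, 0.06)
print(f"about ${saved:,.0f}/month saved ({fraction:.0%} of the bill)")
```

The useful part is seeing which input dominates: once the cheaper model is an order of magnitude cheaper, the savings are driven almost entirely by how much of the workload can be routed to it.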

Disclosure: I'm a founder building in this space — the calculator ended up as a free tool on our website. Drop a comment if you want the link, happy to share.

Curious what others are using to track and optimize LLM spend?