Client had 4 agents on GPT-4o. One was classifying documents. That one alone had 91% savings potential.

Posted by Dramatic_Strain7370@reddit | LocalLLaMA | 15 comments

I do some consulting work with AI startups. One client was upset with their OpenAI bill: they had 4 agents in production and felt they were overpaying but weren't sure by how much. They also didn't have great intuition for how to go about evaluating the models.

I looked at what each agent was actually doing:

SEC report summarization: processing long financial filings into summaries
Financial advisory chatbot: answering client questions about portfolios
Document classification: categorizing documents by type and urgency
Monitoring agent: checking system health and flagging anomalies

All four were running on GPT-4o, which costs $2.50 per 1M input tokens and $10 per 1M output tokens. They used the same model for every request (not good).
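A minimal sketch of the alternative, per-request model switching, might look like the following. It is only an illustration: the `gpt-4o-mini` choice for the cheaper tier, the `MAX_SIMPLE_CHARS` threshold, and the length-based `looks_simple` heuristic are all assumptions, not anything the client actually ran.

```python
# Hypothetical routing sketch: the cheap model and the heuristic are assumptions.
PREMIUM_MODEL = "gpt-4o"
CHEAP_MODEL = "gpt-4o-mini"  # assumed cheaper model in the same provider family

# Assumed threshold: prompts longer than this many characters count as complex.
MAX_SIMPLE_CHARS = 2_000


def looks_simple(prompt: str) -> bool:
    """Crude stand-in for a real complexity classifier."""
    return len(prompt) <= MAX_SIMPLE_CHARS


def pick_model(prompt: str) -> str:
    """Send simple prompts to the cheap model, everything else to premium."""
    return CHEAP_MODEL if looks_simple(prompt) else PREMIUM_MODEL


if __name__ == "__main__":
    print(pick_model("Classify this document: invoice, urgent or not?"))
    print(pick_model("Summarize this filing: " + "x" * 30_000))
```

A real router would likely use something better than prompt length (task type, a small classifier, or confidence checks on the cheap model's answer), but the shape stays the same: decide per request instead of per deployment.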

When I broke down what each agent was actually asking the model to do, the picture got interesting:

Agent	Share of simple prompts	Potential savings with model switching
SEC summarization	~40%	65–77%
Financial chatbot	~75%	77–83%
Document classification	~80%	91%
Monitoring	~80%	83%

SEC summarization is more nuanced: financial filings are complex, so a higher percentage of prompts stayed on the premium model. The input is also large, around 30K tokens per prompt. But the classification and monitoring agents were doing straightforward tasks on an expensive model for no real reason.
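To put the 30K-token input in dollar terms, here is a rough per-request cost calculation at the GPT-4o prices quoted in the post. The 1,000 output tokens are an assumption for illustration; the post does not give summary lengths.

```python
# Prices from the post: GPT-4o at $2.50 per 1M input tokens, $10 per 1M output tokens.
INPUT_PRICE_PER_M = 2.50
OUTPUT_PRICE_PER_M = 10.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at the prices above."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# 30K input tokens is the figure from the post; 1,000 output tokens is assumed.
cost = request_cost(30_000, 1_000)
print(f"${cost:.3f} per request")
```

Under these assumptions the input side alone is $0.075 per request, so for an input-heavy agent like this one the input price matters more than the output price.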

To make this easier to estimate for other setups, I built a quick LLM savings calculator. Enter your monthly spend, primary model, and workload type — it estimates what you'd save routing simple prompts to a cheaper model in the same provider family.
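As a rough idea of the kind of estimate involved, here is a simplified sketch. It is not the calculator's actual formula, which is not given here: the function name, its parameters, and the example inputs are hypothetical, and it assumes simple and complex prompts cost the same on average.

```python
def estimated_savings(monthly_spend: float, simple_share: float, price_ratio: float) -> float:
    """Estimated monthly savings in dollars under a simple routing model.

    monthly_spend: current monthly spend on the premium model
    simple_share:  fraction of spend attributable to simple prompts (0 to 1)
    price_ratio:   cheap-model price divided by premium-model price (0 to 1)
    """
    if not 0 <= simple_share <= 1 or not 0 <= price_ratio <= 1:
        raise ValueError("simple_share and price_ratio must be between 0 and 1")
    # Only the simple share moves to the cheap model; the rest stays at full price.
    return monthly_spend * simple_share * (1 - price_ratio)


# Hypothetical inputs: $10,000/month, 80% simple, cheap model at 6% of the premium price.
print(round(estimated_savings(10_000, 0.80, 0.06), 2))
```

Note that this model caps savings at the simple share, so it does not reproduce the table above, where the classification agent shows 91% savings at ~80% simple prompts. Those figures rest on details not spelled out in the post.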

Disclosure: I'm a founder building in this space — the calculator ended up as a free tool on our website. Drop a comment if you want the link, happy to share.

Curious what others are using to track and optimize LLM spend?

Silver-Champion-4846@reddit

I'm confused, isn't gpt4o dead?

Dramatic_Strain7370@reddit (OP)

It is not dead. It is cheaper, and many companies don't change the model they started with.

Silver-Champion-4846@reddit

Oh that thing where companies just stick to what works until they get a reason to change it, which includes it no longer being available?

Dramatic_Strain7370@reddit (OP)

gpt-4o is available. This is the output of a curl I just wrote to re-confirm. I am hiding keys etc.

>>> curl .... -d '{"model":"gpt-4o","max_tokens":100,"messages":[{"role":"user","content":"tell me about usa"}]}'

RESPONSE BACK

{
  "id": "chatcmpl-DaNgBizukgyhVbH1WGT7gu1Cvs4VS",
  "object": "chat.completion",
  "created": 1777563203,
  "model": "gpt-4o-2024-08-06",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The United States of America (USA) is a federal republic composed of 50 states, a federal district, five major self-governing territories, and various possessions. Here are key aspects about the USA:\n\n1. **Geography**: \n - The USA is the third-largest country by land area, with diverse geography including mountains (such as the Rockies and Appalachians), plains, forests, deserts, and coastlines along the Atlantic and Pacific Oceans.\n - The country is bordered by",
        "refusal": null,
        "annotations": []
      },
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 11,
    "completion_tokens": 100,
    "total_tokens": 111,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  },
  "service_tier": "default",
  "system_fingerprint": "fp_8aed6409fd"
}

Silver-Champion-4846@reddit

yes yes I wasn't doubting you.

Dramatic_Strain7370@reddit (OP)

The model leaderboard can be found here as an easy cheat sheet: https://www.cloudidr.com/llm-pricing

Dramatic_Strain7370@reddit (OP)

The LLM cost savings calculator link is: https://www.cloudidr.com/savings-calculator?utm_source=reddit&utm_medium=comment

parasen16@reddit

sounds like you've got a solid grasp on their usage, which is key. switching models can really drive those savings, especially if the tasks vary in complexity. for instance, you could save a lot on the document classification agent by opting for a lighter model, since that doesn't need the full power of something like GPT-4o. easy win there. a friend of mine recently used the Safe AI Starter Kit to set up protocols for handling sensitive data while using AI, which helped them streamline costs and keep everything compliant. worth looking into if they’re handling any confidential info. keep digging, you’ve got this!

MelodicRecognition7@reddit

bro adjust your spambots, they are too obvious.

CalligrapherFar7833@reddit

Retarded vibe slop post

parasen16@reddit

sounds like you've got a solid grasp on their usage. switching models could definitely save them some cash, especially since each agent has different needs. for something like document classification, a lighter model might do the trick with minimal quality loss. when it comes to managing sensitive info, i found the Safe AI Starter Kit pretty handy for creating a safe protocol while using AI tools. that way, they can optimize costs without freaking out about data leaks. keep pushing them to refine their strategies!
