Multi-provider LLM fallback in Python: rolling your own ProviderPool vs existing solutions
Posted by aminoy77@reddit | Python | 6 comments
Building a CLI agent that falls back automatically between OpenRouter, Ollama, OpenAI, Anthropic and Gemini when one hits rate limits or goes down. Ended up building a ProviderPool class that tracks exhausted providers with timestamps and retries after a configurable window. Works well but feels like something that should already exist as a library. Searched PyPI and couldn't find anything purpose-built for this. Most LLM libraries handle single-provider retries but not cross-provider fallback. Curious if others have solved this differently or know of something I missed.
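For context, a stripped-down sketch of the pattern (illustrative only, not the actual class; the provider clients are assumed to expose a single `complete()` call):

```python
import time

class ProviderPool:
    """Round-robin over providers, skipping any marked exhausted until a cooldown expires."""

    def __init__(self, providers, cooldown_seconds=300):
        self.providers = providers      # list of (name, client) pairs, in priority order
        self.cooldown = cooldown_seconds
        self.exhausted_at = {}          # provider name -> timestamp of last failure

    def _available(self):
        now = time.monotonic()
        for name, client in self.providers:
            failed_at = self.exhausted_at.get(name)
            if failed_at is None or now - failed_at >= self.cooldown:
                yield name, client

    def complete(self, prompt):
        for name, client in self._available():
            try:
                return client.complete(prompt)   # hypothetical provider interface
            except Exception:                    # rate limit / outage -> mark exhausted
                self.exhausted_at[name] = time.monotonic()
        raise RuntimeError("all providers are currently exhausted")
```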
TheseTradition3191@reddit
worth noting if anthropic is in your pool: their responses include `cache_read_input_tokens` and `cache_write_input_tokens` alongside the regular input/output counts, and theyre priced very differently (cache reads are roughly 10x cheaper than input tokens). if your normalized shape only maps input_tokens you'll silently miscalculate cost on any cached calls.
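rough sketch of what I mean, keeping the cache buckets as separate fields so they can be priced on their own (field names follow anthropic's usage block; the price keys are placeholders, look up current rates yourself):

```python
from dataclasses import dataclass

@dataclass
class Usage:
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_input_tokens: int = 0
    cache_write_input_tokens: int = 0

def from_anthropic(usage: dict) -> Usage:
    # Keep the cache counters separate instead of folding them into input_tokens.
    return Usage(
        input_tokens=usage.get("input_tokens", 0),
        output_tokens=usage.get("output_tokens", 0),
        cache_read_input_tokens=usage.get("cache_read_input_tokens", 0),
        cache_write_input_tokens=usage.get("cache_write_input_tokens", 0),
    )

def cost_usd(u: Usage, prices: dict) -> float:
    # prices: USD per token per bucket, e.g. {"input": ..., "output": ..., "cache_read": ..., "cache_write": ...}
    return (u.input_tokens * prices["input"]
            + u.output_tokens * prices["output"]
            + u.cache_read_input_tokens * prices["cache_read"]
            + u.cache_write_input_tokens * prices["cache_write"])
```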
also on the health check, anthropic has two separate rate limit buckets: requests-per-minute and tokens-per-minute. a successful 1-token ping doesnt confirm the tokens/minute bucket has headroom. the x-ratelimit-remaining-tokens response header after real calls is more reliable for deciding when to restore that provider.
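something like this for the restore decision (assumes an httpx-style response with a `.headers` mapping; header name as above, double-check it against the provider docs):

```python
def tokens_remaining(response, header="x-ratelimit-remaining-tokens"):
    # Read the remaining token budget off a real response; None if the header is absent.
    value = response.headers.get(header)
    return int(value) if value is not None else None

def ok_to_restore(response, min_tokens=2000):
    remaining = tokens_remaining(response)
    return remaining is not None and remaining >= min_tokens
```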
for existing solutions, litellm has router-level fallbacks between providers if you havent looked at it recently. its not obvious from the top-level docs but its in the router section. might save you some of the plumbing depending on how much control you need over the retry windows
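the router config looks roughly like this (from memory of the litellm router docs, so treat the parameter names as approximate and verify before copying):

```python
from litellm import Router

# Shape is from memory of litellm's router docs; verify parameter names before relying on this.
router = Router(
    model_list=[
        {"model_name": "primary", "litellm_params": {"model": "openai/gpt-4o-mini"}},
        {"model_name": "backup", "litellm_params": {"model": "anthropic/claude-3-5-haiku-latest"}},
    ],
    fallbacks=[{"primary": ["backup"]}],   # if "primary" fails, retry the same call on "backup"
)

resp = router.completion(model="primary", messages=[{"role": "user", "content": "ping"}])
```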
Substantial-Cost-429@reddit
This is a pattern more people should build out — provider redundancy is critical for any production LLM app and most libraries punted on it.
A few thoughts:
**litellm** does handle multi-provider fallback via its `fallbacks` param, though it's more config-heavy than a custom ProviderPool. Worth benchmarking against your implementation to see if it's worth the dependency.
**openai-fallback** on PyPI is another lightweight option but hasn't been maintained well.
For your ProviderPool: one thing to add is **provider health scoring** over time — don't just track "exhausted" state, but keep a rolling success rate per provider and weight selection toward more reliable providers. Timestamps-based backoff is a start but a Bayesian approach to provider selection gets you much further.
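A minimal version of that Bayesian selection is just Thompson sampling over a Beta posterior per provider; sketch below, purely illustrative:

```python
import random

class ProviderScorer:
    """Thompson sampling over a Beta(successes+1, failures+1) posterior per provider."""

    def __init__(self, providers):
        self.stats = {p: [1, 1] for p in providers}   # provider -> [alpha, beta], uniform prior

    def record(self, provider, success):
        self.stats[provider][0 if success else 1] += 1

    def pick(self):
        # Sample each provider's posterior success rate; the highest draw wins, which
        # balances exploring flaky providers against exploiting reliable ones.
        draws = {p: random.betavariate(a, b) for p, (a, b) in self.stats.items()}
        return max(draws, key=draws.get)
```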
We build a lot of agentic workflows on top of multi-provider setups like this and keep an open-source repo of agent configs and patterns at https://github.com/caliber-ai-org/ai-setup — some of the tool integration configs there deal with exactly this kind of resilient multi-provider setup. Might be a useful reference.
Otherwise_Wave9374@reddit
This is exactly the kind of glue code that ends up getting rewritten in every agent project. I like the idea of tracking provider exhaustion with timestamps; it's basically a circuit-breaker per provider plus a simple scheduler.
One thing that helped me was a single normalized response shape (tokens, latency, finish_reason, tool_calls, etc.) so the rest of the agent stack doesn't care which vendor answered.
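Something like this is enough (field names are just what I happened to use, not from any particular library):

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class NormalizedResponse:
    provider: str
    text: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    finish_reason: str                       # e.g. "stop", "length", "tool_calls"
    tool_calls: list = field(default_factory=list)
    raw: Optional[Any] = None                # keep the vendor-specific response for debugging
```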
Also, if you haven't already, it's worth adding per-provider health checks (tiny 1-token ping) so you can re-enable a provider sooner than the retry window.
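The ping itself can be as dumb as a 1-token completion; the sketch below assumes an OpenAI-style chat client, adapt per provider:

```python
def is_healthy(client, model):
    # Smallest request we can make; any successful response counts as healthy.
    try:
        client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": "hi"}],
            max_tokens=1,
        )
        return True
    except Exception:
        return False
```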
If you're building more agent-y workflows around this, I've been collecting patterns around tool routing + fallbacks too: https://www.agentixlabs.com/
aminoy77@reddit (OP)
The circuit-breaker framing is exactly right — that's essentially what it is, just applied to LLM providers instead of services.
The normalized response shape is something I partially have but not fully — I standardize enough for the agent to work but latency and token tracking aren't unified yet. Good call, adding that to the backlog.
The 1-token health check is clever. Right now I just wait for the retry window, but proactive re-enabling would make the fallback much smoother in practice.
Checking out agentixlabs now.
Evs91@reddit
This is why tools like LiteLLM Gateway were made.
aminoy77@reddit (OP)
Looked at LiteLLM Gateway — it's a great option if you're running a server. My use case is a local CLI agent, so I wanted something embedded with no extra process to manage. Rolling my own ProviderPool ended up being ~80 lines and fits the use case better.