Multi-provider LLM fallback in Python: rolling your own ProviderPool vs existing solutions
Posted by aminoy77@reddit | Python | 6 comments
Building a CLI agent that falls back automatically between OpenRouter, Ollama, OpenAI, Anthropic and Gemini when one hits rate limits or goes down. Ended up building a ProviderPool class that tracks exhausted providers with timestamps and retries after a configurable window. Works well but feels like something that should already exist as a library. Searched PyPI and couldn't find anything purpose-built for this. Most LLM libraries handle single-provider retries but not cross-provider fallback. Curious if others have solved this differently or know of something I missed.
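For context, a stripped-down sketch of the pattern (illustrative only, not the actual class; the provider clients are assumed to expose a single `complete()` call):

```python
import time

class ProviderPool:
    """Round-robin over providers, skipping any marked exhausted until a cooldown expires."""

    def __init__(self, providers, cooldown_seconds=300):
        self.providers = providers      # list of (name, client) pairs, in priority order
        self.cooldown = cooldown_seconds
        self.exhausted_at = {}          # provider name -> timestamp of last failure

    def _available(self):
        now = time.monotonic()
        for name, client in self.providers:
            failed_at = self.exhausted_at.get(name)
            if failed_at is None or now - failed_at >= self.cooldown:
                yield name, client

    def complete(self, prompt):
        for name, client in self._available():
            try:
                return client.complete(prompt)   # hypothetical provider interface
            except Exception:                    # rate limit / outage -> mark exhausted
                self.exhausted_at[name] = time.monotonic()
        raise RuntimeError("all providers are currently exhausted")
```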
TheseTradition3191@reddit
worth noting if anthropic is in your pool: their responses include `cache_read_input_tokens` and `cache_write_input_tokens` alongside the regular input/output counts, and theyre priced very differently (cache reads are roughly 10x cheaper than input tokens). if your normalized shape only maps input_tokens you'll silently miscalculate cost on any cached calls.
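rough sketch of what I mean, keeping the cache buckets as separate fields so they can be priced on their own (field names follow anthropic's usage block; the price keys are placeholders, look up current rates yourself):

```python
from dataclasses import dataclass

@dataclass
class Usage:
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_input_tokens: int = 0
    cache_write_input_tokens: int = 0

def from_anthropic(usage: dict) -> Usage:
    # Keep the cache counters separate instead of folding them into input_tokens.
    return Usage(
        input_tokens=usage.get("input_tokens", 0),
        output_tokens=usage.get("output_tokens", 0),
        cache_read_input_tokens=usage.get("cache_read_input_tokens", 0),
        cache_write_input_tokens=usage.get("cache_write_input_tokens", 0),
    )

def cost_usd(u: Usage, prices: dict) -> float:
    # prices: USD per token per bucket, e.g. {"input": ..., "output": ..., "cache_read": ..., "cache_write": ...}
    return (u.input_tokens * prices["input"]
            + u.output_tokens * prices["output"]
            + u.cache_read_input_tokens * prices["cache_read"]
            + u.cache_write_input_tokens * prices["cache_write"])
```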
also on the health check, anthropic has two separate rate limit buckets: requests-per-minute and tokens-per-minute. a successful 1-token ping doesnt confirm the tokens/minute bucket has headroom. the x-ratelimit-remaining-tokens response header after real calls is more reliable for deciding when to restore that provider.
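something like this for the restore decision (assumes an httpx-style response with a `.headers` mapping; header name as above, double-check it against the provider docs):

```python
def tokens_remaining(response, header="x-ratelimit-remaining-tokens"):
    # Read the remaining token budget off a real response; None if the header is absent.
    value = response.headers.get(header)
    return int(value) if value is not None else None

def ok_to_restore(response, min_tokens=2000):
    remaining = tokens_remaining(response)
    return remaining is not None and remaining >= min_tokens
```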
for existing solutions, litellm has router-level fallbacks between providers if you havent looked at it recently. its not obvious from the top-level docs but its in the router section. might save you some of the plumbing depending on how much control you need over the retry windows
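the router config looks roughly like this (from memory of the litellm router docs, so treat the parameter names as approximate and verify before copying):

```python
from litellm import Router

# Shape is from memory of litellm's router docs; verify parameter names before relying on this.
router = Router(
    model_list=[
        {"model_name": "primary", "litellm_params": {"model": "openai/gpt-4o-mini"}},
        {"model_name": "backup", "litellm_params": {"model": "anthropic/claude-3-5-haiku-latest"}},
    ],
    fallbacks=[{"primary": ["backup"]}],   # if "primary" fails, retry the same call on "backup"
)

resp = router.completion(model="primary", messages=[{"role": "user", "content": "ping"}])
```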
Substantial-Cost-429@reddit
This is a pattern more people should build out — provider redundancy is critical for any production LLM app and most libraries punted on it.
A few thoughts:
**litellm** does handle multi-provider fallback via its `fallbacks` param, though it's more config-heavy than a custom ProviderPool. Worth benchmarking against your implementation to see if it's worth the dependency.
**openai-fallback** on PyPI is another lightweight option but hasn't been maintained well.
For your ProviderPool: one thing to add is **provider health scoring** over time — don't just track "exhausted" state, but keep a rolling success rate per provider and weight selection toward more reliable providers. Timestamps-based backoff is a start but a Bayesian approach to provider selection gets you much further.
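A minimal version of that Bayesian selection is just Thompson sampling over a Beta posterior per provider; sketch below, purely illustrative:

```python
import random

class ProviderScorer:
    """Thompson sampling over a Beta(successes+1, failures+1) posterior per provider."""

    def __init__(self, providers):
        self.stats = {p: [1, 1] for p in providers}   # provider -> [alpha, beta], uniform prior

    def record(self, provider, success):
        self.stats[provider][0 if success else 1] += 1

    def pick(self):
        # Sample each provider's posterior success rate; the highest draw wins, which
        # balances exploring flaky providers against exploiting reliable ones.
        draws = {p: random.betavariate(a, b) for p, (a, b) in self.stats.items()}
        return max(draws, key=draws.get)
```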
We build a lot of agentic workflows on top of multi-provider setups like this and keep an open-source repo of agent configs and patterns at https://github.com/caliber-ai-org/ai-setup — some of the tool integration configs there deal with exactly this kind of resilient multi-provider setup. Might be a useful reference.
Otherwise_Wave9374@reddit
This is exactly the kind of glue code that ends up getting rewritten in every agent project. I like the idea of tracking provider exhaustion with timestamps; it's basically a circuit-breaker per provider plus a simple scheduler.
One thing that helped me was a single normalized response shape (tokens, latency, finish_reason, tool_calls, etc.) so the rest of the agent stack doesn't care which vendor answered.
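Something like this is enough (field names are just what I happened to use, not from any particular library):

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class NormalizedResponse:
    provider: str
    text: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    finish_reason: str                       # e.g. "stop", "length", "tool_calls"
    tool_calls: list = field(default_factory=list)
    raw: Optional[Any] = None                # keep the vendor-specific response for debugging
```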
Also, if you haven't already, it's worth adding per-provider health checks (tiny 1-token ping) so you can re-enable a provider sooner than the retry window.
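The ping itself can be as dumb as a 1-token completion; the sketch below assumes an OpenAI-style chat client, adapt per provider:

```python
def is_healthy(client, model):
    # Smallest request we can make; any successful response counts as healthy.
    try:
        client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": "hi"}],
            max_tokens=1,
        )
        return True
    except Exception:
        return False
```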
If you're building more agent-y workflows around this, I've been collecting patterns around tool routing + fallbacks too: https://www.agentixlabs.com/
aminoy77@reddit (OP)
The circuit-breaker framing is exactly right — that's essentially what it is, just applied to LLM providers instead of services.
The normalized response shape is something I partially have but not fully — I standardize enough for the agent to work but latency and token tracking aren't unified yet. Good call, adding that to the backlog.
The 1-token health check is clever. Right now I just wait for the retry window, but proactive re-enabling would make the fallback much smoother in practice.
Checking out agentixlabs now.
Evs91@reddit
This is why tools like LiteLLM Gateway were made.
aminoy77@reddit (OP)
Looked at LiteLLM Gateway — it's a great option if you're running a server. My use case is a local CLI agent, so I wanted something embedded with no extra process to manage. Rolling my own ProviderPool ended up being ~80 lines and fits the use case better.