Running Claude Code + Codex + a local 35B as a three-tier agent stack. Why I stopped betting on one provider.

Posted by Joozio@reddit | LocalLLaMA

Two months ago I simplified to Claude Max 20x only. $200/month, one CLI, one model. For my autonomous agent it covered everything.

Last week I renewed Codex Pro at another $200/month after Opus 4.7 made my default workflow untenable. Now I run three tiers:

  1. Claude Code > tuned skills, Anthropic prompt caching (a 5x cost saver when it hits), familiar hooks
  2. Codex (GPT-5.4) > better web search, deeper codebase maps, generous usage ceiling
  3. Local 35B on Mac Mini M4 > classify / route / small-glue tasks, cheap preprocessing

My agent has a dynamic switcher. Same memory, same skills, same routing. Only the harness + model flip. Research-heavy work and architectural refactors go to Codex. Agent-tuned automation glue stays on Claude Code. Classification and routing go to the local 35B.
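The switcher itself can be boring. A minimal sketch of the idea, assuming nothing about the author's actual implementation (the backend names, model strings, and task categories below are my own illustration):

```python
# Hypothetical three-tier switcher: memory/skills/routing stay shared,
# only the (harness, model) pair flips per task category.
from dataclasses import dataclass

@dataclass(frozen=True)
class Backend:
    harness: str  # which CLI/agent harness to invoke
    model: str    # model identifier passed to that harness

# Tier map mirroring the split described in the post (illustrative names).
BACKENDS = {
    "research":   Backend("codex",       "gpt-5.4"),
    "refactor":   Backend("codex",       "gpt-5.4"),
    "agent_glue": Backend("claude-code", "opus"),
    "classify":   Backend("local",       "35b-q4"),
    "route":      Backend("local",       "35b-q4"),
}

def route(task_category: str) -> Backend:
    """Pick a backend for a task; unknown work defaults to the cheap local tier."""
    return BACKENDS.get(task_category, BACKENDS["classify"])
```

The useful property is that the map is the only thing that changes when a provider gets lazy or a new model ships; the agent's memory and skills never see the switch.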

The reason for the pivot off Claude-only: Opus 4.7 at default effort is measurably lazier than 4.6. AMD's Senior Director of AI (Stella Laurenzo) filed GitHub #42796 with her team's 6,852-session analysis: the Read:Edit ratio dropped 70% and API requests rose 80x, for worse output. The honest caveat: 4.7 at max reasoning still delivers, but max burns 3-4x more tokens, so my weekly ceiling arrives on Tuesday instead of Friday.

Local LLMs are my cheap third lane, not a substitute for the frontier. A 35B at Q4 is good enough for classification and small glue, not for architectural refactors.
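The trick that makes a Q4 35B usable for that lane is constraining it to answer with a single label and refusing to trust anything else. A sketch of that pattern (the prompt shape, label set, and "escalate" fallback are my assumptions, not the author's code; the actual completion call to the local server is omitted):

```python
# Hypothetical label-forced classification for a local 35B tier.
# The model is asked for exactly one word; anything it can't be pinned
# to escalates to a frontier model instead of guessing.

LABELS = ("classify", "route", "glue", "escalate")

def build_prompt(task: str) -> str:
    """One-word-answer prompt sent to the local model."""
    return (
        f"Answer with exactly one word from: {', '.join(LABELS)}.\n"
        f"Task: {task}\n"
        "Label:"
    )

def parse_label(completion: str) -> str:
    """Extract the label from the model's reply; unknown output escalates."""
    stripped = completion.strip()
    if not stripped:
        return "escalate"
    word = stripped.split()[0].lower().rstrip(".,")
    return word if word in LABELS else "escalate"
```

Parsing defensively matters more than the prompt: small quantized models drift into chatty answers, and "escalate on anything unparseable" keeps the cheap tier from silently misrouting work.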

Net: $400/month in subscriptions plus negligible local cost. Roughly 2x the weekly throughput I had a month ago.

Anyone else running a multi-provider stack with a local preprocessing tier? Curious what splits you have settled on.

Full write-up and the switcher design: https://thoughts.jock.pl/p/opus-4-7-codex-comeback-2026