OpenAI's moat didn't leak, three forces broke it at once

Posted by medi6@reddit | LocalLLaMA

Everyone's framing the past two weeks as OpenAI vs Google. ChatGPT DAUs dropped 6% after Gemini 3 launched. Sam Altman declared "Code Red" internally and paused ads, shopping, and health agents. All hands on deck.

But I don't think that's the story. The story is that three forces converged in the same month, and OpenAI can't outrun all of them at once.

**Force 1: China's November**

Chinese labs shipped 15 open-weight models in November. Not research previews. Production-ready, MIT-licensed models.

**The headline:** Moonshot AI's Kimi K2 Thinking. 1 trillion parameters, 32 billion active. It scores 67 on Artificial Analysis. For context, GPT-5 medium scores 66.

And it comes with quite a price difference:

- GPT-5 medium: $3.44/M tokens
- Kimi K2 Thinking: $1.07/M tokens
- DeepSeek V3.2: $0.32/M tokens

That's roughly a 10x spread within the same capability tier, and all the cheap ones are now open-weight.

November also brought VibeThinker (beats DeepSeek-R1 on AIME math), Step-Audio-R1 (first open audio reasoning model), HunyuanVideo 1.5 (8.3B video gen), and a half-dozen others. The pace didn't slow down for a single week.

**Force 2: efficiency ate scale**

Two years ago, bigger meant better. The compute barrier was the moat. Not anymore.

OpenAI's own leaked model proves it. "Garlic" is smaller than their flagships but reportedly beats GPT-4.5 on coding and reasoning. They're targeting "big-model intelligence in smaller architectures." It's expected to ship as GPT-5.2 or 5.5 in early 2026.

Mistral dropped 10 open-weight models yesterday. The flagship is a 675B MoE. But the real story is Ministral 3B. It runs entirely in your browser via WebGPU. No server. No API call. No cloud bill. Three billion parameters, SOTA for its size class, running on your laptop.

The paradigm flipped: efficient is the new big.

**Force 3: silicon broke free**

Amazon's Trainium3 launched December 2. It's a 3nm chip, 4x faster than Trainium2 and 40% more energy efficient. Anthropic is already using it for Claude.
The bigger news: Trainium4 will support NVIDIA's NVLink Fusion. You'll be able to mix AWS silicon with NVIDIA hardware in the same cluster. The CUDA lock-in that kept everyone on NVIDIA is cracking.

Same day, Tether released QVAC Fabric, the first production-ready framework for fine-tuning LLMs on consumer GPUs and mobile devices. It covers Qualcomm Adreno, ARM Mali, and Apple Silicon, and it's Apache 2.0.

Cloud inference costs used to be the barrier to entry. That barrier is falling.

**Bottom line**

OpenAI's panic isn't about Gemini beating ChatGPT on some benchmark. It's about the simultaneous arrival of:

- Open-weight models that match proprietary ones at 1/10th the price
- Efficient architectures that make scale advantages obsolete
- Silicon that breaks cloud monopolies

The moat didn't spring a leak. Three forces broke it at once. And no single company can outrun physics.

I wrote up the full analysis with data tables and more context: [https://www.whatllm.org/blog/three-forces-broke-openai-moat](https://www.whatllm.org/blog/three-forces-broke-openai-moat)
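
For anyone who wants to sanity-check the "10x spread" from Force 1, here is a minimal sketch of the arithmetic. The prices are the per-million-token figures quoted in this post; the dictionary and function names are made up for illustration and don't come from any pricing API.

```python
# Prices in USD per million tokens, exactly as quoted in the post.
# (Names below are illustrative, not from any real pricing API.)
PRICES_PER_M_TOKENS = {
    "GPT-5 medium": 3.44,
    "Kimi K2 Thinking": 1.07,
    "DeepSeek V3.2": 0.32,
}


def price_ratios(prices, baseline):
    """Return baseline_price / model_price for every model in `prices`."""
    base = prices[baseline]
    return {name: base / price for name, price in prices.items()}


ratios = price_ratios(PRICES_PER_M_TOKENS, "GPT-5 medium")

# List models from smallest to largest price gap against the baseline.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"GPT-5 medium costs {ratio:.2f}x the price of {name}")
```

On these numbers the gap between GPT-5 medium and DeepSeek V3.2 comes out to about 10.75x, and the gap to Kimi K2 Thinking to about 3.2x, so "10x" describes the two ends of the list rather than every pair.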