Local AI video pipeline review: Qwen3 27B beat Gemma 4 26B for tool calling

Posted by Practical_Low29@reddit | LocalLLaMA | 14 comments

Watched All About AI's 100% local Fireship-style video automation experiment over the weekend (link in comments). A few things worth flagging if you're trying the same stack.

Tool calling reliability was where the two diverged. Gemma 4 26B kept getting stuck in tool-call loops on his rig. Qwen3 27B handled the same orchestration cleanly, no wasted thinking tokens. That gap is bigger than benchmark numbers suggest once you push real agent workflows through it.
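If you want the failure mode concretely, here's a minimal sketch of the kind of loop guard I'd bolt onto any local agent harness. `call_model` and `run_tool` are hypothetical stand-ins for whatever your stack uses, not anything from his setup:

```python
import json

MAX_REPEATS = 3

def agent_loop(messages, call_model, run_tool, max_steps=20):
    seen = {}  # (tool name, canonical args) -> repeat count
    for _ in range(max_steps):
        reply = call_model(messages)      # hypothetical: returns a dict from your server
        calls = reply.get("tool_calls") or []
        if not calls:
            return reply["content"]       # plain-text answer, we're done
        for call in calls:
            fp = (call["name"], json.dumps(call["arguments"], sort_keys=True))
            seen[fp] = seen.get(fp, 0) + 1
            if seen[fp] > MAX_REPEATS:
                raise RuntimeError(f"tool-call loop detected on {call['name']}")
            result = run_tool(call["name"], call["arguments"])
            messages.append({"role": "tool", "name": call["name"],
                             "content": json.dumps(result)})
    raise RuntimeError("hit max_steps without a final answer")
```

A repeat cap like this is what turns "stuck in a loop for an hour" into a fast, visible failure you can actually count.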

For images he ran Z-Image Turbo locally off Hugging Face. Open weights, no API spend. Solid for meme-style cards. Portrait shots are where you'd probably reach for a Flux or Seedream call instead.
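Running that kind of checkpoint locally is only a few lines with diffusers. I'm using stabilityai/sdxl-turbo as a stand-in repo id since I haven't checked the exact diffusers entry point for the model he used; swap in whichever turbo checkpoint you pulled:

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo",            # stand-in id, swap for your checkpoint
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Turbo-style checkpoints are distilled for very few steps and no CFG.
image = pipe(
    "flat meme-style card, bold caption, solid colour background",
    num_inference_steps=1,
    guidance_scale=0.0,
).images[0]
image.save("card.png")
```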

Orchestration was OpenCode end-to-end. Context usage climbed to 174K tokens and the to-do list wasn't fully completed in one shot. He stepped away from the rig mid-run and came back to a partial result, which is honestly the realistic version of "AI did the work for me".
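Partial runs are exactly why I checkpoint agent to-do lists to disk. This isn't OpenCode's internals, just the generic pattern, with `execute()` as a hypothetical hook for your agent call:

```python
import json
from pathlib import Path

STATE = Path("todo_state.json")

def load_tasks(initial_tasks):
    # Resume from disk if a previous run checkpointed, else start fresh.
    if STATE.exists():
        return json.loads(STATE.read_text())
    return [{"task": t, "done": False} for t in initial_tasks]

def run(initial_tasks, execute):
    tasks = load_tasks(initial_tasks)
    for item in tasks:
        if item["done"]:
            continue                      # finished in an earlier run, skip
        execute(item["task"])             # hypothetical hook: your agent call
        item["done"] = True
        STATE.write_text(json.dumps(tasks, indent=2))  # checkpoint every step
```

With that, walking away mid-run costs you nothing: rerun the script and it picks up at the first unfinished item.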

For people who don't want to run a 27B model locally, the Qwen3 family is available from a few inference providers, so the API path gets you the same weights without buying the GPU upfront. Tool-call behavior should hold since the model is the same.
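Most of those providers expose an OpenAI-compatible endpoint, so the tool-calling side looks the same as local. The base_url, model id, and `render_scene` tool below are placeholders I made up, not any specific provider's values:

```python
from openai import OpenAI

# base_url, api_key, and model id are placeholders; check your provider's docs.
client = OpenAI(base_url="https://your-provider.example/v1", api_key="sk-...")

tools = [{
    "type": "function",
    "function": {
        "name": "render_scene",          # hypothetical tool from a video pipeline
        "description": "Render one scene of the video",
        "parameters": {
            "type": "object",
            "properties": {"script": {"type": "string"}},
            "required": ["script"],
        },
    },
}]

resp = client.chat.completions.create(
    model="qwen3-27b",                   # placeholder; ids vary per provider
    messages=[{"role": "user", "content": "Render the intro scene."}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```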

If you've benchmarked Qwen3's tool-calling failure rate against DeepSeek V4 on a specific stack (OpenClaw, Aider, a custom loop), I'd love to see the actual numbers.
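For what it's worth, here's the rough harness shape I'd trust those numbers from: same prompts, same tools, fixed trial count per model. `run_agent` is a hypothetical hook into whatever stack you're testing:

```python
# run_agent is a hypothetical hook into your stack; the contract here is
# that it raises RuntimeError on a loop or malformed tool call.
def failure_rate(model, prompts, run_agent, trials=20):
    failures, total = 0, 0
    for prompt in prompts:
        for _ in range(trials):
            total += 1
            try:
                run_agent(model, prompt)
            except RuntimeError:
                failures += 1
    return failures / total

# Usage: print(failure_rate("qwen3-27b", PROMPTS, run_agent))
#        print(failure_rate("deepseek-v4", PROMPTS, run_agent))
```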