Is Gemma 4 26B-A4B worse than Qwen 3.5 35B-A3B with tool calls, even after all the fixes?

Posted by Borkato@reddit | LocalLLaMA | 29 comments

I’m trying it on my home-grown tool-call setup with llama.cpp and it’s just NOT working. Like it makes the DUMBEST mistakes.

I got the official template from Google, I updated CUDA to 13.1 (NOT 13.2, which apparently has issues), I’m not quantizing the cache, I’m running it at Q4, and I tried Bartowski, Unsloth, and a Heretic version… like, what the hell.

It does things like call tools that don’t exist even though my wrapper clearly tells it what tools exist.
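Not a fix for the model itself, but a guard in the wrapper can at least catch hallucinated tool names before anything executes and feed an error back so the model can retry. A minimal sketch, assuming tool calls arrive as OpenAI-style `{"name": ..., "arguments": ...}` dicts — the tool names and function here are hypothetical, not from the original post:

```python
# Guard against hallucinated tool calls: reject any call whose name
# isn't in the set of tools the wrapper actually advertised.
# Structure assumes an OpenAI-style {"name": ..., "arguments": ...} payload.

DECLARED_TOOLS = {"get_weather", "search_files"}  # whatever your wrapper exposes

def validate_tool_call(call: dict) -> tuple[bool, str]:
    """Return (ok, message). On failure, the message can be sent back
    to the model as a tool result so it can self-correct."""
    name = call.get("name", "")
    if name not in DECLARED_TOOLS:
        return False, f"Unknown tool '{name}'. Available: {sorted(DECLARED_TOOLS)}"
    return True, "ok"

# A hallucinated call gets rejected with a corrective message:
ok, msg = validate_tool_call({"name": "read_email", "arguments": {}})
```

Looping that rejection message back as the tool response often gets a model to pick a real tool on the second try, though it obviously doesn't cure the underlying behavior.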

I’m super disappointed because I love its personality so much more than Qwen’s. Please, someone help!