Gemma4, all variants fail in Tool Calling
Posted by Voxandr@reddit | LocalLLaMA | View on Reddit | 70 comments
Folks who praise Gemma4 above Qwen 3.5 are not serious users. Nobody cares about one-shot chat prompts in this age of agentic engineering.
It is failing seriously and we cannot use it in any proper coding agent: Cline, RooCode.
Tried UD quants up to Q8; all fail.

Force88@reddit
Same with ollama (well, I only know how to use ollama lol). It can't search the internet either, with the ollama Windows app or OpenWebUI...
Relative-Republic-27@reddit
for me tool calling was working on continue with ollama. llama.cpp gives me the issue
Voxandr@reddit (OP)
Looks like all kinds of tool calls are broken; I haven't seen any post here about working coding-agent tool calls.
Chupa-Skrull@reddit
Do you think it's common or normal to post "hey everybody, tool calls are working for me, just wanted to let you know. Have a great day."
Monad_Maya@reddit
Works ok with VSCodium + Roocode (3.51.1) and llama.cpp b8665.
Model is Gemma 4 26B A4B, IQ4_XS from Unsloth.
Voxandr@reddit (OP)
I am trying with vLLM, and even with vLLM it fails hard.
aldegr@reddit
vLLM still requires a few fixes: https://github.com/vllm-project/vllm/pull/39027
Voxandr@reddit (OP)
Looks like I gotta wait a few weeks.
aldegr@reddit
Llama.cpp has a custom template in its repo that helps with agentic flows. It’s very similar to the vLLM changes in this PR.
models/templates/google-gemma-4-31B-it-interleaved.jinja. It does require an agent that properly sends back reasoning, such as OpenCode or Pi. Unsure how the VSCode agents work nowadays. In short, the original templates were hamstrung for agents.
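For anyone wanting to try that template, a minimal sketch of pointing llama-server at it (the GGUF filename and paths here are illustrative placeholders, not verified for this setup):

```shell
# Sketch: serve Gemma 4 with the interleaved template from the llama.cpp repo.
# --jinja enables the Jinja chat-template engine; --chat-template-file
# overrides whatever template is baked into the GGUF.
llama-server \
  -m ./gemma-4-26B-A4B-it-Q8_0.gguf \
  --jinja \
  --chat-template-file ./models/templates/google-gemma-4-31B-it-interleaved.jinja \
  --port 8080
```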
Voxandr@reddit (OP)
I am gonna run with it and report.
ivandagiant@reddit
Update?
Voxandr@reddit (OP)
Now with Cline it can go on for about 4 steps before starting to give tool-call errors, while Qwen models have no problem even when I let them build and run for a whole day.
yoracale@reddit
Have you tried Gemma 4 tool calling via Unsloth Studio? It works even for Gemma 4B 4-bit.
Here's an example of Gemma 4 4B 4bit executing code: https://x.com/i/status/2040161518898319728
lenne0816@reddit
That works for me too, but after 15k+ context all hell breaks loose and it starts hallucinating without ever making sense again until a chat reset.
yoracale@reddit
Oh ok, interesting. Do you have an example so we can debug? Thanks for trying it out btw!
lenne0816@reddit
I retry a very basic workflow again and again: explore a remote server via SSH, inventory its services in a gethomepage-style YAML, then merge that YAML with my actual homepage YAML. I can never get past the merging stage; it always collapses around there.
Voxandr@reddit (OP)
I will give it a try.
Lorian0x7@reddit
Things are not properly implemented yet; why don't you help resolve the issue instead of just complaining?
Voxandr@reddit (OP)
So I don't have the right to complain about what doesn't work? With that mentality, all the software we use would be full of bugs because everybody is busy ass-licking the developers. I am asking because others are praising Gemma4 like the return of Christ when much better-working models exist, so I am checking what's wrong on my end.
kataryna91@reddit
You claimed "Folks who praise Gemma4 above Qwen 3.5 are not serious users," when the actual problem is that you have no clue what you are doing, while people who know what they are doing (or "not serious users," as you call them) have a 100% successful tool-call rate.
Voxandr@reddit (OP)
Can you show me how and where those claims of 100% working are? I really want to use Gemma 4 for agentic uses; it would be perfect if it is intelligent. I prefer an American model over a Chinese model running 24/7 taking control of the machine, but it's an inevitable truth that it is not tuned for this.
Lorian0x7@reddit
Yeah, exactly, and I'm telling you what's wrong with you. Coming here with this entitled attitude, saying everyone is not serious because Gemma doesn't fit your specific usage, is a little pretentious, isn't it? We all know there are still issues with the model and that tool calling is not its strength. It's like complaining about an alpha version of something not working correctly.
Voxandr@reddit (OP)
I am frustrated that Gemma 4 doesn't work at all for agentic tool use, yet all the posts are bootlicking Gemma 4 when a lot of better models exist. What's wrong with that?
And I want to check what's wrong with my setup; that's why I posted.
I don't consider someone who casually chats with a chatbot a serious user. More serious use cases are coding, agentic workflows, or ERP.
CommonPurpose1969@reddit
I spent the whole weekend changing a project to make it work with Gemma 4 E2B and E4B. And it is subpar compared to Qwen 3.5 4B, and I am not trying to shill for Qwen here. I really wanted Gemma 4 to work out, but it hasn't. I understand your frustration.
Gemma 4 is also very sensitive to the signature of tool calls, among other things. Sometimes it generates a tool call only to then write the answer itself. Other times it says it is going to call a tool, but it won't; it waits to be prompted to do so. Then it runs in circles, generating the same token sequence.
Lorian0x7@reddit
I didn't see any post bootlicking Gemma4; everyone is saying Qwen3.5 is superior for coding while Gemma is superior for RP.
Express_Quail_1493@reddit
I low-key feel like it has a lot to do with the security guardrails Google added. Reading the model's reasoning tag is like watching an anxious rabbit that treats every piece of code like a risk-management ritual.
qubridInc@reddit
Exactly. If a model can't reliably handle tool calling, it's not agent-ready, no matter how good it looks in one-shot demos.
a_beautiful_rhind@reddit
You may want to test VLLM. llama.cpp support isn't 100% yet.
Voxandr@reddit (OP)
Is that a llama.cpp problem? I had synced to the latest llama.cpp so far.
a_beautiful_rhind@reddit
yes, the model is pretty different from past ones and it has been slowly getting better.
Voxandr@reddit (OP)
vllm version 0.19.0
inference-1 | (APIServer pid=1) Value error, The checkpoint you are trying to load has model type `gemma4` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
Using docker.
a_beautiful_rhind@reddit
did you not upgrade to transformers 5.5?
Voxandr@reddit (OP)
So their Docker image has no transformers 5.5, I guess. Gonna try installing directly.
sisyphus-cycle@reddit
They have a Gemma specific docker image btw
Voxandr@reddit (OP)
Trying that, with awq-4bit. I have 2x 4070 Ti Super on this desktop (32GB VRAM total); 4-bit should easily fit, but it's OOMing. Any specific vLLM configs? I am using:
vllm serve --model "cyankiwi/gemma-4-26B-A4B-it-AWQ-4bit" -tp 2 --port 4000 --gpu-memory-utilization 0.94 --kv-cache-dtype fp8_e4m3 --max-model-len 20000 -ep 2
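Not a verified fix, but when vLLM OOMs on a tight VRAM budget the usual knobs are the memory-utilization headroom, the context length, and dropping expert parallelism. A sketch of the same launch with more conservative values (the numbers are guesses to try, not tested for this model):

```shell
# Sketch: same model, reduced memory pressure.
# - lower --gpu-memory-utilization to leave headroom for activation spikes
# - shrink --max-model-len so the KV cache fits
# - drop expert parallelism (-ep) and rely on tensor parallelism alone
vllm serve "cyankiwi/gemma-4-26B-A4B-it-AWQ-4bit" \
  -tp 2 --port 4000 \
  --gpu-memory-utilization 0.85 \
  --kv-cache-dtype fp8_e4m3 \
  --max-model-len 8192
```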
Voxandr@reddit (OP)
I got it working on Strix Halo but the results are a disaster. It cannot even do a proper grep search. Gotta wait a while, I guess.
sisyphus-cycle@reddit
Hm, yeah, idk. I can only run it at work where we have an A100. It works fine for us, but we're using fp16 safetensors, so that could be why, idk.
Vardermir@reddit
I’ve been running Gemma 4 (Nvidia NVFP4 and QuantTrio’s) myself, but I have a weird issue where it never reasons. Using the recommended settings that nvidia provides for both. Is this something you’ve come across? Or still an implementation issue?
a_slay_nub@reddit
I've seen a lot of pull requests in vLLM post-0.19.0; I'm waiting a few weeks before bothering, tbh.
Voxandr@reddit (OP)
Yeah, looks like I need to come back after a few weeks. Gonna stick with Qwen 3.5 122B for planning and Qwen Next Coder for coding for now. I tried vLLM and even grepping fails.
DinoAmino@reddit
I couldn't get an fp8 Gemma 4 31B to run on 0.19.0. I could only run it using the gemma4-labeled Docker image, branched from 0.18.2. Even then, endless tool looping sometimes occurred. Almost there, but not quite.
a_beautiful_rhind@reddit
Yeah, these models are taking a while everywhere.
a_slay_nub@reddit
I mean, these things take time and the errors are usually very subtle and tricky. With how tuned these models are, a single errant space can cause issues with a prompt template nowadays.
In the meantime, I choose to be grateful for what we have. GPT-OSS still works excellently. On release, it was shite too, and it took a while to get the kinks worked out. We just fixed the tool calling on our version because we had an older version of the tokenizer.
a_beautiful_rhind@reddit
I am already able to use it with chatML and other templates it saw. The current errors haven't been subtle for me, just intermittent.
Stuff like this: https://i.ibb.co/CpKLp28H/31b-miku.png
send-moobs-pls@reddit
There's always been a weird amount of Google "fans"
FullstackSensei@reddit
I don't think anybody claimed llama.cpp support for Gemma 4 is/was done.
People keep testing the same broken thing, and reporting the same issue every day.
Voxandr@reddit (OP)
They are mindlessly praising it for normal chatbot functions, then.
ContextLengthMatters@reddit
I am using Gemma in oMlx and can hit tool calls every time. My problem with Gemma isn't its ability to do tool calls; it's that it straight up refuses to consider them because the reasoning isn't as in-depth.
I will say I'm not on the Gemma hype train because I have enough ram for a 120b moe and qwen3.5 delivers. My own use cases seem to be handled by qwen better when it comes to agentic stuff. Maybe if Gemma released a larger MoE that would change.
Voxandr@reddit (OP)
Yeah, I am going back to the 122B MoE and 3.5 for now.
FullstackSensei@reddit
No, just good old lack of reading comprehension
ATK_DEC_SUS_REL@reddit
I’m fortunate to have access to an H200 for experimenting with Gemma 4-31b. I’m using manual generation loops and I’m very happy with Gemma. You guys are going to love it when llama.cpp is stable!
(Granted I’m training my own adapters and measuring behavior, not tool calling exclusive.)
Monkey_1505@reddit
I don't think most LLM users use agents.
DrMissingNo@reddit
Not my experience; using LM Studio, Gemma has never failed to use my MCPs.
Voxandr@reddit (OP)
That's good to know; right now it cannot even run a grep command properly with Cline.
Have you tried agentic coding?
DrMissingNo@reddit
Ouch...
Not yet. Haven't found the time unfortunately.
egomarker@reddit
Skill issue. Debug tool call problems yourself and update your agentic tools. If you are a serious user.
Voxandr@reddit (OP)
Why stop there? We should write our own inference engine from scratch.
MaxKruse96@reddit
Have you considered, I don't know, that Cline isn't optimal for small LLMs?
henk717@reddit
Roo works rather well with Qwen3.5-27B for me
Voxandr@reddit (OP)
What do you mean? Cline works amazingly well with Qwen3.5, even 9B, and with Qwen3 Coder Next.
RetiredApostle@reddit
Google's endpoint can use tools in OC.
somerussianbear@reddit
Working on oMLX. My issue now is thinking loops: it starts to hallucinate and repeat itself, like Gemini in recent memes.
nickm_27@reddit
There are plenty of use cases for tool calling other than coding.
For the voice assistant use case, Qwen3.5 was quite disappointing in my thorough testing, often narrating tool calls instead of actually calling the tool. It also didn't follow some of the more complex behavior instructions correctly. Qwen3 Instruct was actually better at this than Qwen3.5. Gemma4 has been great, though, perfectly following the instructions and having no issues calling the tools (after the specialized parser fix 4 days ago).
Voxandr@reddit (OP)
Now it cannot even call the tools; what's the point of those use cases?
nickm_27@reddit
Again, with llama.cpp it's calling tools all day as a voice assistant with no problems. Just because it doesn't work in a code editor doesn't mean it doesn't work elsewhere.
I’m referring to https://github.com/ggml-org/llama.cpp/pull/21418
Voxandr@reddit (OP)
Hmm, that was merged 4 days ago, so it should work well for me. My build is from just a few hours ago.
Danmoreng@reddit
What inference engine?
Voxandr@reddit (OP)
llama.cpp, latest as of a few hours ago.
AstraMythos@reddit
Tool-calling hiccups like this show why tight oversight is key for agent safety; don't skip runtime checks before rolling agents out. What's the exact failure mode you're seeing with Gemma4?
Voxandr@reddit (OP)
I don't even know where it starts, because it won't even start; it just fails all day:
- It cannot even call tools with proper parameters; tags corrupt it.
- E4B can call tools, but the output of Cline
So the model may be smart for normal users, but it is not tuned for agentic tool calls at all.
Such a shame.