Gemma4, all variants fail in Tool Calling
Posted by Voxandr@reddit | LocalLLaMA | View on Reddit | 70 comments
Folks who praise Gemma4 above Qwen 3.5 are not serious users. Nobody cares about one-shot chat prompts in this age of agentic engineering.
It is failing seriously and we cannot use it in any proper coding agent: Cline, RooCode.
Tried UD quants up to Q8; all fail.

Force88@reddit
Same with ollama (well, I only know how to use ollama lol). It can't search the internet either, with the ollama Windows app or OpenWebUI...
Relative-Republic-27@reddit
for me tool calling was working on continue with ollama. llama.cpp gives me the issue
Voxandr@reddit (OP)
Looks like all kinds of tool calls are broken; I haven't seen any post here about working coding-agent tool calls.
Chupa-Skrull@reddit
Do you think it's common or normal to post "hey everybody, tool calls are working for me, just wanted to let you know. Have a great day."
Monad_Maya@reddit
Works ok with VSCodium + Roocode (3.51.1) and llama.cpp b8665.
Model is Gemma 4 26B A4B, IQ4_XS from Unsloth.
Voxandr@reddit (OP)
I am trying with vLLM, and even with vLLM it fails hard.
aldegr@reddit
vLLM still requires a few fixes: https://github.com/vllm-project/vllm/pull/39027
Voxandr@reddit (OP)
Looks like I gotta wait a few weeks.
aldegr@reddit
Llama.cpp has a custom template in its repo that helps with agentic flows. It’s very similar to the vLLM changes in this PR.
models/templates/google-gemma-4-31B-it-interleaved.jinja. It does require an agent that properly sends back reasoning, such as OpenCode or Pi. Unsure how the VSCode agents work nowadays. In short, the original templates were hamstrung for agents.
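For anyone wanting to try that template, a minimal sketch of pointing llama-server at it (the GGUF filename and paths here are illustrative placeholders, not verified for this setup):

```shell
# Sketch: serve Gemma 4 with the interleaved template from the llama.cpp repo.
# --jinja enables the Jinja chat-template engine; --chat-template-file
# overrides whatever template is baked into the GGUF.
llama-server \
  -m ./gemma-4-26B-A4B-it-Q8_0.gguf \
  --jinja \
  --chat-template-file ./models/templates/google-gemma-4-31B-it-interleaved.jinja \
  --port 8080
```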
Voxandr@reddit (OP)
I am gonna run with it and report.
ivandagiant@reddit
Update?
Voxandr@reddit (OP)
Now with Cline it can go on for about 4 steps before starting to give tool-call errors, while Qwen models have no problem even when I let them build and run for a whole day.
yoracale@reddit
Have you tried Gemma 4 tool calling via Unsloth Studio? It works even for Gemma 4B 4-bit.
Here's an example of Gemma 4 4B 4bit executing code: https://x.com/i/status/2040161518898319728
lenne0816@reddit
That works for me too, but after 15k+ context all hell breaks loose and it starts hallucinating without ever making sense again until a chat reset.
yoracale@reddit
Oh ok, interesting. Do you have an example so we can debug? Thanks for trying it out btw!
lenne0816@reddit
I retry a very basic workflow again and again: explore a remote server via SSH, inventory its services in a gethomepage-style YAML, then merge that YAML with my actual homepage YAML. I can never get past the merging stage; it always collapses around there.
Voxandr@reddit (OP)
I will give it a try.
Lorian0x7@reddit
Things are not properly implemented yet; why don't you help resolve the issue instead of just complaining?
Voxandr@reddit (OP)
So I don't have the right to complain about what doesn't work? With that mentality, all the software we use would be full of bugs because everybody is busy ass-licking the developers. I am asking because others are praising Gemma4 like the return of Christ when much better-working models exist, so I am checking what's wrong on my end.
kataryna91@reddit
You claimed "Folks who praise Gemma4 above Qwen 3.5 are not serious users," when the actual problem is that you have no clue what you are doing, while people who know what they are doing (or "not serious users," as you call them) have a 100% successful tool-call rate.
Voxandr@reddit (OP)
Can you show me how and where those claims of 100% working are? I really want to use Gemma 4 for agentic uses; it would be perfect if it is intelligent. I prefer an American model over a Chinese model running 24/7 taking control of the machine, but it's an inevitable truth that it is not tuned for this.
Lorian0x7@reddit
Yeah, exactly, and I'm telling you what's wrong with you. Coming here with this entitled attitude, saying everyone is not serious because Gemma doesn't fit your specific usage, is a little pretentious, isn't it? We all know there are still issues with the model and that tool calling is not its strength. It's like complaining about an alpha version of something not working correctly.
Voxandr@reddit (OP)
I am frustrated that Gemma 4 doesn't work at all for agentic tool use, yet all the posts are bootlicking Gemma 4 when a lot of better models exist. What's wrong with that?
And I want to check what's wrong with my setup; that's why I posted.
I don't consider someone who casually chats with a chatbot a serious user. More serious use cases are coding, agentic workflows, or ERP.
CommonPurpose1969@reddit
I spent the whole weekend changing a project to make it work with Gemma 4 E2B and E4B. And it is subpar compared to Qwen 3.5 4B, and I am not trying to shill for Qwen here. I really wanted Gemma 4 to work out, but it hasn't. I understand your frustration.
Gemma 4 is also very sensitive to the signature of tool calls, among other things. Sometimes it generates a tool call only to then write the answer itself. Other times it says it is going to call a tool, but it won't; it waits to be prompted to do so. Then it runs in circles, generating the same token sequence.
Lorian0x7@reddit
I didn't see any post bootlicking Gemma4; everyone is saying Qwen3.5 is superior for coding while Gemma is superior for RP.
Express_Quail_1493@reddit
I low-key feel like it has a lot to do with the security guardrails Google added. Reading the model's reasoning tag is like watching an anxious rabbit that treats every piece of code like a risk-management ritual.
qubridInc@reddit
Exactly. If a model can't reliably handle tool calling, it's not agent-ready, no matter how good it looks in one-shot demos.
a_beautiful_rhind@reddit
You may want to test VLLM. llama.cpp support isn't 100% yet.
Voxandr@reddit (OP)
Is that a llama.cpp problem? I had synced to the latest llama.cpp so far.
a_beautiful_rhind@reddit
yes, the model is pretty different from past ones and it has been slowly getting better.
Voxandr@reddit (OP)
vllm version 0.19.0
inference-1 | (APIServer pid=1) Value error, The checkpoint you are trying to load has model type `gemma4` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
Using docker.
a_beautiful_rhind@reddit
did you not upgrade to transformers 5.5?
Voxandr@reddit (OP)
So their Docker image has no transformers 5.5, I guess. Gonna try installing directly.
sisyphus-cycle@reddit
They have a Gemma specific docker image btw
Voxandr@reddit (OP)
Trying that, with awq-4bit. I have 2x 4070 Ti Super on this desktop (32GB VRAM total); 4-bit should easily fit, but it's OOMing. Any specific vLLM configs? I am using:
vllm serve --model "cyankiwi/gemma-4-26B-A4B-it-AWQ-4bit" -tp 2 --port 4000 --gpu-memory-utilization 0.94 --kv-cache-dtype fp8_e4m3 --max-model-len 20000 -ep 2
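Not a verified fix, but when vLLM OOMs on a tight VRAM budget the usual knobs are the memory-utilization headroom, the context length, and dropping expert parallelism. A sketch of the same launch with more conservative values (the numbers are guesses to try, not tested for this model):

```shell
# Sketch: same model, reduced memory pressure.
# - lower --gpu-memory-utilization to leave headroom for activation spikes
# - shrink --max-model-len so the KV cache fits
# - drop expert parallelism (-ep) and rely on tensor parallelism alone
vllm serve "cyankiwi/gemma-4-26B-A4B-it-AWQ-4bit" \
  -tp 2 --port 4000 \
  --gpu-memory-utilization 0.85 \
  --kv-cache-dtype fp8_e4m3 \
  --max-model-len 8192
```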
Voxandr@reddit (OP)
I got it working on Strix Halo but the results are a disaster. It cannot even do a proper grep search. Gotta wait a while, I guess.
sisyphus-cycle@reddit
Hm, yeah, idk. I can only run it at work where we have an A100. It works fine for us, but we're using fp16 safetensors, so that could be why, idk.
Vardermir@reddit
I’ve been running Gemma 4 (Nvidia NVFP4 and QuantTrio’s) myself, but I have a weird issue where it never reasons. Using the recommended settings that nvidia provides for both. Is this something you’ve come across? Or still an implementation issue?
a_slay_nub@reddit
I've seen a lot of pull requests in vLLM post-0.19.0; I'm waiting a few weeks before bothering, tbh.
Voxandr@reddit (OP)
Yeah, looks like I need to come back after a few weeks. Gonna stick with Qwen 3.5 122B for planning and Qwen Next Coder for coding for now. I tried vLLM and even grepping fails.
DinoAmino@reddit
I couldn't get an fp8 Gemma 4 31B to run on 0.19.0. I could only run it using the gemma4-labeled Docker image, branched from 0.18.2. Even then, endless tool looping sometimes occurred. Almost there, but not quite.
a_beautiful_rhind@reddit
Yeah, these models are taking a while everywhere.
a_slay_nub@reddit
I mean, these things take time and the errors are usually very subtle and tricky. With how tuned these models are, a single errant space can cause issues with a prompt template nowadays.
In the meantime, I choose to be grateful for what we have. GPT-OSS still works excellently. On release, it was shite too, and it took a while to get the kinks worked out. We just fixed the tool calling on our version because we had an older version of the tokenizer.
a_beautiful_rhind@reddit
I am already able to use it with chatML and other templates it saw. The current errors haven't been subtle for me, just intermittent.
Stuff like this: https://i.ibb.co/CpKLp28H/31b-miku.png
send-moobs-pls@reddit
There's always been a weird amount of Google "fans"
FullstackSensei@reddit
I don't think anybody claimed llama.cpp support for Gemma 4 is/was done.
People keep testing the same broken thing, and reporting the same issue every day.
Voxandr@reddit (OP)
They are mindlessly praising it for normal chatbot functions, then.
ContextLengthMatters@reddit
I am using Gemma in oMlx and can hit tool calls every time. My problem with Gemma isn't its ability to do tool calls; it's that it straight up refuses to consider them because the reasoning isn't as in-depth.
I will say I'm not on the Gemma hype train because I have enough ram for a 120b moe and qwen3.5 delivers. My own use cases seem to be handled by qwen better when it comes to agentic stuff. Maybe if Gemma released a larger MoE that would change.
Voxandr@reddit (OP)
Yeah, I am going back to the 122B MoE and 3.5 for now.
FullstackSensei@reddit
No, just good old lack of reading comprehension
ATK_DEC_SUS_REL@reddit
I’m fortunate to have access to an H200 for experimenting with Gemma 4-31b. I’m using manual generation loops and I’m very happy with Gemma. You guys are going to love it when llama.cpp is stable!
(Granted I’m training my own adapters and measuring behavior, not tool calling exclusive.)
Monkey_1505@reddit
I don't think most LLM users use agents.
DrMissingNo@reddit
Not my experience; using LM Studio, Gemma has never failed to use my MCPs.
Voxandr@reddit (OP)
That's good to know; right now it cannot even run a grep command properly with Cline.
Have you tried agentic coding?
DrMissingNo@reddit
Ouch...
Not yet. Haven't found the time unfortunately.
egomarker@reddit
Skill issue. Debug tool call problems yourself and update your agentic tools. If you are a serious user.
Voxandr@reddit (OP)
Why stop there? We should write our own inference engine from scratch.
MaxKruse96@reddit
Have you considered, I don't know, that Cline isn't optimal for small LLMs?
henk717@reddit
Roo works rather well with Qwen3.5-27B for me
Voxandr@reddit (OP)
What do you mean? Cline works amazingly well with Qwen3.5, even 9B, and with Qwen3 Coder Next.
RetiredApostle@reddit
Google's endpoint can use tools in OC.
somerussianbear@reddit
Working on oMLX. My issue now is thinking loops: it starts to hallucinate and repeat itself, like Gemini in recent memes.
nickm_27@reddit
There are plenty of use cases for tool calling other than coding.
For the voice assistant use case, Qwen3.5 was quite disappointing in my thorough testing, often narrating tool calls instead of actually calling the tool. It also didn't follow some of the more complex behavior instructions correctly. Qwen3 Instruct was actually better at this than Qwen3.5. Gemma4 has been great, though, perfectly following the instructions and having no issues calling the tools (after the specialized parser fix 4 days ago).
Voxandr@reddit (OP)
Now it cannot even call the tools; what's the point of those use cases?
nickm_27@reddit
Again, with llama.cpp it's calling tools all day as a voice assistant with no problems. Just because it doesn't work in a code editor doesn't mean it doesn't work elsewhere.
I’m referring to https://github.com/ggml-org/llama.cpp/pull/21418
Voxandr@reddit (OP)
Hmm, that was merged 4 days ago, so it should work well for me. My build is from just a few hours ago.
Danmoreng@reddit
What inference engine?
Voxandr@reddit (OP)
llama.cpp, latest as of a few hours ago.
AstraMythos@reddit
Tool-calling hiccups like this show why tight oversight is key for agent safety; don't skip runtime checks before rolling agents out. What's the exact failure mode you're seeing with Gemma4?
Voxandr@reddit (OP)
I don't even know where it starts, because it won't even start; it just fails all day:
- It cannot even call tools with proper parameters; tags corrupt it.
- E4B can call tools, but the output of Cline
So the model may be smart for normal users, but it is not tuned for agentic tool calls at all.
Such a shame.