How to parse Tool calls in llama.cpp?
Posted by sZebby@reddit | LocalLLaMA | View on Reddit | 10 comments
Most of my code is similar to agent-cpp from Mozilla. I create a common_chat_templates_inputs from the message history.
auto params = common_chat_templates_apply(templs_, inputs);
...tokenization and generation work fine, but when I try to parse tool calls with:
common_chat_parser_params p_params = common_chat_parser_params(params);
common_chat_msg msg = common_chat_parse(response, false, p_params);
there are no tool_calls in the msg, and the assistant generation prompt ends up in the content.
so msg.content looks like this:
<|im_start......
I expected that tool calls would be populated.
currently using granite-4.0-h-micro-Q4_K_S and the latest llama.cpp.
is my way of generating wrong? any suggestions would be highly appreciated. thanks :)
EffectiveCeilingFan@reddit
That’s not the format that Granite 4 uses. I recommend reading the model card https://huggingface.co/ibm-granite/granite-4.0-micro
sZebby@reddit (OP)
yes i remembered incorrectly. now this is the raw response. also edited the post, sorry:
std::string response:
"
{"name": "test_tool", "arguments": {"an_int": 42, "a_float": 3.14, "a_string": "Hello, world!", "a_bool": true}}
"
and this is the common chat msg i get from common_chat_parse:
msg common_chat_msg
EffectiveCeilingFan@reddit
I'm not super familiar with the llama.cpp code, so unfortunately I won't be of much help here. Reddit is also messing with the formatting so it's really difficult to read. I'd recommend opening an issue with llama.cpp with a full, reproducible example, since it's hard to gauge the full story with just snippets. You could just be doing something wrong elsewhere.
sZebby@reddit (OP)
Alright, thanks. I don't think it's an issue with llama.cpp; I just wanted to pinpoint my mistake, but I can hardly find any docs or examples.
putrasherni@reddit
same boat as you, can't get Gemma to work properly with tool calling
iits-Shaz@reddit
Two things to check:
1. It's likely a template support issue with Granite.
common_chat_parse relies on the chat template's parser knowing how to extract tool calls for that specific model format. Not every model's template is fully supported by llama.cpp's built-in parsers yet. Try the same code with a model that has known working tool call support (Qwen 2.5, Llama 3.x, or Gemma) to confirm your code is correct and isolate whether it's a Granite-specific parsing gap.
2. Fallback: parse the raw text yourself. If common_chat_parse doesn't handle your model's format, you can extract tool calls from the raw text. Since Granite wraps them in <tool_call>...</tool_call> tags, you can scan for the <tool_call> tags and parse the JSON between them. Something like (pseudocode):
I ran into the same thing building a tool-calling agent on top of llama.cpp (via llama.rn for React Native). The native parser worked great for Gemma 4 but I still needed a fallback that scans raw text for JSON blocks using brace-depth tracking — some models just emit tool calls in slightly different formats than what the built-in parser expects.
The brace-depth approach is simple: walk the string character by character, track { and } depth, extract complete JSON objects, then check if they match the tool call shape you expect. It handles nested JSON correctly and doesn't care about the surrounding markup.
Check the tool call support table in llama.cpp's docs; it lists which models have verified parser support.
EffectiveCeilingFan@reddit
Hey OP, this guy is an AI bot and is completely wrong
iits-Shaz@reddit
Not a bot. I build with llama.cpp daily — shipped an open source React Native SDK that wraps it for on-device inference. Check my GitHub if you need proof: github.com/shashankg-dev404/react-native-gemma-agent
My comment addressed OP's actual problem: common_chat_parse not extracting tool calls, with the raw <tool_call> tags ending up in msg.content. That's literally what OP described in the post. I suggested testing with a model that has confirmed parser support to isolate whether it's a Granite-specific issue, and offered a fallback approach for parsing raw text. Both are valid.
You linked the Granite 4 model card, cool. The model card describes the chat template format, but OP's issue isn't the format itself. It's that common_chat_parse in llama.cpp doesn't correctly extract tool calls for Granite's template. The model IS generating tool calls (OP can see them in the raw content); the parser just isn't picking them up. That's a llama.cpp parser support gap, which is exactly what I said.
Calling someone a bot because their answer is detailed is a weird move on a technical sub.
EffectiveCeilingFan@reddit
Once again completely wrong. OP was using the wrong format for tool calling. The source for the correct format was the model card. Your answer was detailed in the worst way. It was entirely AI generated, so all the “detail” was useless noise. There is no problem with the llama.cpp tool parser. I just tested it with the correct format, and it works perfectly fine.
sZebby@reddit (OP)
yes i remembered incorrectly. updated the post, sorry.
i didn't specify any format. i kinda hoped it would pick up the correct format for chat parsing with:
auto params = common_chat_templates_apply(targetLLM_.templs_.get(), inputs)
...generated response
common_chat_parser_params p_params = common_chat_parser_params(params);
p_params.debug = true;
auto msg = common_chat_parse(response, false, p_params);
at least the inputs parsing is working with the correct format. i also find "" and " " in preserved tokens. is this the correct way to init the chat_parser_params?
i tried to look at the server and simple_chat examples but couldn't pinpoint my mistake.