How to parse Tool calls in llama.cpp?
Posted by sZebby@reddit | LocalLLaMA | View on Reddit | 10 comments
Most of my code is similar to agent-cpp from Mozilla. I create a common_chat_templates_inputs from the message history.
auto params = common_chat_templates_apply(templs_, inputs);
...tokenization and generation work fine, but when I try to parse tool calls with:
common_chat_parser_params p_params = common_chat_parser_params(params);
common_chat_msg msg = common_chat_parse(response, false, p_params);
there are no tool_calls in the msg, and the assistant generation prompt ends up in the content.
so msg.content looks like this:
<|im_start......
I expected that tool calls would be populated.
currently using granite-4.0-h-micro-Q4_K_S and the latest llama.cpp.
is my way of generating wrong? any suggestions would be highly appreciated. thanks :)
EffectiveCeilingFan@reddit
That’s not the format that Granite 4 uses. I recommend reading the model card https://huggingface.co/ibm-granite/granite-4.0-micro
sZebby@reddit (OP)
yes i remembered incorrectly. now this is the raw response. also edited the post, sorry:
std::string response:
"
{"name": "test_tool", "arguments": {"an_int": 42, "a_float": 3.14, "a_string": "Hello, world!", "a_bool": true}}
"
and this is the common chat msg i get from common_chat_parse:
msg common_chat_msg
EffectiveCeilingFan@reddit
I'm not super familiar with the llama.cpp code, so unfortunately I won't be of much help here. Reddit is also messing with the formatting so it's really difficult to read. I'd recommend opening an issue with llama.cpp with a full, reproducible example, since it's hard to gauge the full story with just snippets. You could just be doing something wrong elsewhere.
sZebby@reddit (OP)
Alright, thanks. I don't think it's an issue with llama.cpp; I just wanted to pinpoint my mistake, but I can hardly find any docs or examples.
putrasherni@reddit
same boat as you, can't get Gemma to work properly with tool calling
iits-Shaz@reddit
Two things to check:
1. It's likely a template support issue with Granite.
common_chat_parse relies on the chat template's parser knowing how to extract tool calls for that specific model format. Not every model's template is fully supported by llama.cpp's built-in parsers yet. Try the same code with a model that has known working tool call support (Qwen 2.5, Llama 3.x, or Gemma) to confirm your code is correct and isolate whether it's a Granite-specific parsing gap.
2. Fallback: parse the raw text yourself. If common_chat_parse doesn't handle your model's format, you can extract tool calls from the raw text. Since Granite wraps them in <tool_call>...</tool_call> tags, you can scan for the <tool_call> tags and parse the JSON between them. Something like (pseudocode):
I ran into the same thing building a tool-calling agent on top of llama.cpp (via llama.rn for React Native). The native parser worked great for Gemma 4 but I still needed a fallback that scans raw text for JSON blocks using brace-depth tracking — some models just emit tool calls in slightly different formats than what the built-in parser expects.
The brace-depth approach is simple: walk the string character by character, track { and } depth, extract complete JSON objects, then check if they match the tool call shape you expect. It handles nested JSON correctly and doesn't care about the surrounding markup.
Check the tool call support table in llama.cpp's docs; it lists which models have verified parser support.
EffectiveCeilingFan@reddit
Hey OP, this guy is an AI bot and is completely wrong
iits-Shaz@reddit
Not a bot. I build with llama.cpp daily — shipped an open source React Native SDK that wraps it for on-device inference. Check my GitHub if you need proof: github.com/shashankg-dev404/react-native-gemma-agent
My comment addressed OP's actual problem: common_chat_parse not extracting tool calls, with the raw <tool_call> tags ending up in msg.content. That's literally what OP described in the post. I suggested testing with a model that has confirmed parser support to isolate whether it's a Granite-specific issue, and offered a fallback approach for parsing raw text. Both are valid.
You linked the Granite 4 model card, cool. The model card describes the chat template format, but OP's issue isn't the format itself. It's that common_chat_parse in llama.cpp doesn't correctly extract tool calls for Granite's template. The model IS generating tool calls (OP can see them in the raw content); the parser just isn't picking them up. That's a llama.cpp parser support gap, which is exactly what I said.
Calling someone a bot because their answer is detailed is a weird move on a technical sub.
EffectiveCeilingFan@reddit
Once again completely wrong. OP was using the wrong format for tool calling. The source for the correct format was the model card. Your answer was detailed in the worst way. It was entirely AI generated, so all the “detail” was useless noise. There is no problem with the llama.cpp tool parser. I just tested it with the correct format, and it works perfectly fine.
sZebby@reddit (OP)
yes i remembered incorrectly. updated the post, sorry.
i didn't specify any format. i kinda hoped it would pick up the correct format for chat parsing with:
auto params = common_chat_templates_apply(targetLLM_.templs_.get(), inputs)
...generated response
common_chat_parser_params p_params = common_chat_parser_params(params);
p_params.debug = true;
auto msg = common_chat_parse(response, false, p_params);
at least the inputs parsing is working with the correct format. i also find "" and " " in preserved tokens. is this the correct way to init the chat_parser_params?
i tried to look at the server and simple_chat examples but couldn't pinpoint my mistake.