Has anyone tried fine-tuning on framework-specific toolsets?

Posted by AnticitizenPrime@reddit | LocalLLaMA | View on Reddit | 1 comments

One setback of smaller local models seems to be their reliability in calling tools for the harness they're plugged into.

I personally tried out Gemma 4 with Hermes Agent, and Gemma kept ignoring Hermes' tools - for example, it kept trying to call the 'google-search' tool it was trained with instead of the web-search tool it was instructed to use.

I have never fine tuned and don't know much about it, but is this something that can be improved through fine tuning? Say, tuning the model specifically on Hermes tool calls.

Is this a proper use case for fine-tuning?

[-]

tonyboi76@reddit

Yes it can be improved through fine tuning, but I would try a few cheaper things first before paying for compute. Start with system prompt plus few-shot examples showing the exact tool name you want called. Sometimes the model just needs to see two or three correct usages before the bias from training data fades. Next try constrained generation with something like outlines or guidance, where you give it the exact JSON schema of allowed tool names and the model literally cannot emit anything else. The third try is a different base model. Qwen 2.5 and Llama 3.1 instruct variants are generally better at following tool schemas they have not seen in training, because Gemma was not trained heavily on agent harness tool calls and falls back to the search names from its broader training.

If those all fail, LoRA fine tuning on 500 to 1000 examples of your specific tool-call format is the right next step. Use the actual Hermes schema in the prompt template and the model learns to follow it. The risk to know about: catastrophic forgetting where the tuned model gets worse at unrelated tasks, so always benchmark on a general capabilities suite after.