Are you guys actually using local tool calling or is it a collective prank?

[-]

SNThrailkill@reddit

I find openweb UI to not be a great harness. However like others have said, I'm having much more success with it on opencode which is awesome for coding but not so much for personal tasks. Looking for something to handle that for me still.

[-]

Far-Low-4705@reddit

yeah... openwebui sucks.

there are issues with prompt duplication, prompt injection, forcing you to use rag (which for local models forces unloading model, loading embedding model, unloading embedding, reloading model (then full prompt reproccessing)), and prompt cache invalidation forcing full prompt reproccessing every 5 seconds.

and the frustrating thing is that none of these issues should exist, they are so simple. yet they do exist, and they consistently have these problems THAT PERSIST FOR SEVERAL UPDATES...

like how do you struggle with NOT duplicating a prompt 10 times???

Also their entire llm backend relies on langchain...

[-]

cdshift@reddit

I was a huge openwebui user for like 2 years. It was great as a chat interface, once I started using opencode I completely switched over

[-]

cviperr33@reddit

wait till you discover Hermes and linux

[-]

cdshift@reddit

I use opencode web on a Linux box right now. Its not as simple as text but its close because I can still edit the apps I built from my phone

[-]

FatheredPuma81@reddit

I keep seeing people say to use Hermes (and no longer seem to recommend OpenClaw) but few people actually say why its so good...

[-]

cviperr33@reddit

Well man i legit live in the "feature" , i never bothered tried actually going to linux because of the steep learning curve. But linux is like built for those agents , it literally unlocks the full potentional. And because the agent is soo fast , everything that i would do manually , i just do it with my agent.

Here is an example , im chatting with the agent from discord , and we do some bench marking tests , then i decide that i want to save those in a DB , so i tell him to install postresql and create a db and everything and put these results we got there so i can later retrive them instead of storing 100 files in 100 folders. In just under 15 seconds , the agents installs it via pip , creates the db configures it creates the schema , everything instantly.

I basically control my OS with just my text , i could have a TTS hooked up too so its like in the hacker movies but its legit real and usable , if its runs at 100+ tk/s everything happens instantly.

I no longer read guides how to setup things , i just post the link in discord and i tell him install it and he does everything for me in under a minute.

You can also use it to delegate a opencode coding agent , and he likes supervises it , you just specify the project scope and requirments and everything is done automaticaly. Or when i encounter a bug with hermes , i just tell it , submit this issue we had to the hermes repo , and 5 seconds later its submitted with full details. It can control git in any way you want it to.

[-]

cyberdork@reddit

I basically control my OS with just my text , i could have a TTS hooked up

So instead of typing ls -l you say: 'Yeah, uh, so can you show me the contents of that folder, ok?"
The future is now!

[-]

FatheredPuma81@reddit

Okay NGL that sounds pretty insane. Sounds like it makes Linux go from unusable to actually usable for a Windows user lol.

[-]

cviperr33@reddit

Exactly lol and im so happy i made the switch , like linux is literally built for this 30 years ago :D
The funniest thing was when my agent figured out how to use my sudo password , with gemma 4 we always hit a wall when we want to edit like driver files or something deep , but with qwen it just asked me if i want to give it my password and he "echo'ed" the password in the cmd call and it somehow worked and he has like sudo privilges now :D he wrote it inside his memory so i never have to tell it again lol
Ofcourse this is a big security concern and everything i do is yolo , but im just enjoying the ride and i dont care if it fucked up my OS , i dont hold anything of value on my disks.

[-]

themule71@reddit

Well virtual machines exist, so do containers. Both are available out of the box in Linux. You can run your agent in a confined environment, and it may destroy that instead of your OS. Also you can instruct the agent to configure sudo so that you/it don't need a password. If you're comfortable with the agent using your password, make it one you've never used before and won't use again. Last thing you want is your agent accessing your email because he "guessed" the password by trying the stored one.

[-]

rakarsky@reddit

This is nightmare fuel.

[-]

FatheredPuma81@reddit

Here's a link to the docs [link] try and read it some time.

- Yours truly, the OpenWebUI dev.

P.S. You're banned from the Github.

If he was here and you used the same username on Github :).

[-]

Taenk@reddit

What do you like? Because I’m trying to resist „I’ll just code my own“

[-]

FatheredPuma81@reddit

If you're on llama.cpp(llama-server) then just use the built in GUI imo. Its only issue is poor MCP support. It even has the ultra rare feature of letting you edit or delete LLM responses. If you're on other platforms then idk ask Grok lol but don't bother with Msty or AnythingLLM they're both awful for their own reasons.

[-]

Far-Low-4705@reddit

honestly, thats where im leaning as well lol

Im not a front end dev, but id like to design my own actually functional back end, so i was thinking of designing a functional backend and having a local model create a simplistic light weight front end

[-]

IShitMyselfNow@reddit

Try hermes agent it's been great for me with even Qwen 3.5 4B

[-]

SNThrailkill@reddit

Yeah I'm hearing a lot of things about it. Still cautious after the quick adoption and security nightmare that was openclaw. Good to know it's doing well even with a small model!

[-]

IShitMyselfNow@reddit

You don't have to use it with a gateway (e.g. Telegram). You can just use it as a CLI tool if you want yknow?

[-]

SNThrailkill@reddit

Yeah the gateway isn't the issue. It's the security. Like if you look at the project it's got 3.6k PRs and 2k issues. Almost 2 weeks ago it was half that. If they merge one bad PR then that's openclaw 2.0. I love the promise and ideal and Nous seems legit, been reading some of their research. Just cautious.

[-]

candraa6@reddit

if you want to be cautious, just fork it to make the version "semi-frozen", and before running, clone it and ask AI to do security review check of the codebase. at least that's what I would do if I need to use untrusted opensource code these days. matter of fact, that's bare minimum of any opensource project these days, trusted or not, cause the supply chain attack also got away even in popular opensource projects

[-]

erisian2342@reddit

I haven’t tried it either and I share your security concerns. I wonder how easily it can be sandboxed. e.g. running in a non-privileged container, all tool calls routed through a proxy that enforces my policies, mounting only the specific directories on the file system that it’s allowed to have access to, disable direct internet access, etc. Unfortunately I know too little about agent setup and configuration to answer my own questions.

[-]

Dthen_@reddit

I was reading their docs the other day and pretty easily actually. There's a few different options for sandboxing but most interesting is you can make it run its commands in a docker container.

[-]

thirteen-bit@reddit

I've tried it once in podman using this docker guide, so entire app will be in the container: https://hermes-agent.nousresearch.com/docs/user-guide/docker

Not sure why the docker guide is not mentioned in main README.md

[-]

erisian2342@reddit

I’m setting up an Ubuntu VM to try it out. Instead of mounting the source folder directly, I’ll have it submit PRs to the repo instead. It looks like a lot of fun!

[-]

jazir55@reddit

https://github.com/cloudflare/moltworker

This seems viable

[-]

Watchguyraffle1@reddit

I switched. Hermes is a and does a fantastic bit of coding.

[-]

Borkato@reddit

Use opencode to vibe code one for personal tasks! It’s great :p

[-]

Cute_Obligation2944@reddit

I'm pretty convinced opencode can do anything...

[-]

joelanman@reddit

I found llama.cpp to work better than lm studio personally. It even has a web gui these days

[-]

Worried-Squirrel2023@reddit

honestly the tool calling experience depends way more on the harness than the model. I was having the exact same "it says it created the file but nothing exists" problem until I switched away from Open WebUI for coding tasks. opencode or something similar where the tool execution is tighter made a huge difference for me. the model isn't lying about creating the file, it just doesn't have a real tool that actually writes to disk in most setups.

also fwiw Qwen3.6 is solid but you gotta make sure reasoning mode isn't eating your tool call formatting. if your harness doesn't strip the think tags properly it can mess with the structured output.

[-]

r1str3tto@reddit

Are you using Ollama as your inference service? If so, stop. It’s garbage. It defaults to a stupid-small context window and silently clips the context. Second, in Open WebUI, you must enable “native” tool-calling. The default is old prompted tool-calling for some unknown reason. This doesn’t work.

I can assure you that the models you mentioned (except maybe OSS-20B, which I never found to be good) can use the OpenTerminal. But it’s more of a toy than a useful feature in my opinion, because there is no real scaffolding or context management.

[-]

WolpertingerRumo@reddit

Yeah. Two different ways, actually:

One is with „default tool calling“, and I wrote a script that gives exactly the tools I want for each scenario. A whole lot of work, but this is a customer facing bot, and correct tool calling is important. Keyword based, with a small LLM as a fallback for intent.

The other one is native tool calling. Works well if the local tools are well described, and the system prompt has a little bit of guidance.

[-]

Spirited_Chard5972@reddit

I used models as small as qwen 3.5 0.8B with opencode and it was working oki, it was using the write tool to always edit files for example but works, but 4B was editing, creating, fetching from the web, running scripts to test its work with minimal problems

[-]

usakarokujou@reddit

Wait what? 0.8? Podrias decirnos que estás haciendo con el, que resultados has tenido y como lo configuraste?

[-]

Taikatohtori@reddit

I don't know what people are on about openwebui being shit. A lot of it is down to your system prompt and skills. Qwen3.5-27b works great with open terminal for me, it can independently do a lot of stuff on linux. It might not be the best troubleshooter but it can for sure create files according to instructions etc.

[-]

CriticalCup6207@reddit

Running it in production, not a prank. The trick is you can't use it the same way you'd use GPT-4 tool calling. You need to:
(1) keep the tool schema simple — nested objects kill reliability,
(2) use grammar sampling where your inference stack supports it,
(3) add a validation layer that catches malformed calls before they hit your actual tool. With those guardrails, Qwen3.6 and Llama 3.3 are both reliable enough for real workflows. Without them, it's a disaster.

[-]

AI_Tonic@reddit

openweb ui is a ui , you need to actually program the tool use into the agent , so it can write edit files and also run them . it's more involved than just using webui

[-]

jacek2023@reddit

It works for sure with opencode

[-]

lqvz@reddit

And pi.dev as well.

[-]

jacek2023@reddit

I use roo code at this moment, also no issues with tools

[-]

Ok_Chipmunk_9167@reddit

I have trouble with roo sometimes. Especially on smaller models, gemma4-e4b, for instance. But still, it can work. It's just finicky sometimes.

Not that I recommend vibe coding with that. It simply will not work. It's fun to try, nevertheless haha

[-]

jazir55@reddit

I've had trouble with Roo tool calling for almost two years now. I think it's fundamentally a Roo problem and not the models at this point. Have you tried it with KiloCode? Curious to see how your luck would be there, Kilo is a Roo fork developed by a separate team.

[-]

jacek2023@reddit

I use it with 26B now

[-]

nikhilprasanth@reddit

Opencode has been working really well for me. Once in a while there will be some edit failure but the model will correct itself. Also with hermes agent the tool calling has been really good with Qwen models. Tried gpt oss 20B , while it is fast the failures are much more. I guess that's due to the tool calling format of the Gpt oss series .

[-]

valdocs_user@reddit

I've never gotten tool calling working with a local model. Same result as you: it claims it's done something but no file changes.

[-]

BlackMetalB8hoven@reddit

I'm using my own tool calling comfyui API to edit images in open web UI. Works fine

[-]

apVoyocpt@reddit

I normally use Claude code but for a test I used OpenCode and a local qwen code 3.5 27b and it worked really well. Set up a docker, installed flask and made a hello run flask page.

[-]

Awkward-Customer@reddit

the community keeps praising the tool calling feature just to cope

I haven't seen people "praising the tool calling feature". When i last looked at openwebui's tool calling most people agreed it was still pretty weak, partly due to the local models' own abilities. what sized quants are you working with?

to create a single file with data is very finicky when it works

What does this mean? what are you actually asking it to do? what's your prompt?

and I am not being vague

If your prompts are anything like this post, i suspect you are indeed being quite vague ;-).

[-]

Mayion@reddit (OP)

Telling a model to collect data from the net and compiling it into a text file is not vague. It is a single sentence. I don't know what's with local AI redditors just waiting on the chance to blame the user and never the tool/model.

No model is less than Q5K_M.

[-]

Awkward-Customer@reddit

If someone is struggling to build a fence, I'm not gonna blame the tools they're using. Either they're using the wrong tools (their fault) or they're using the tools wrong (also their fault). Why would I blame the tools in this case?

I'm curious about the single sentence you're using here, because "collecting data from the net and compiling it into a text file" sounds like a pretty vague request.

[-]

Big_Actuator3772@reddit

ya but we're not building fuckn fences are we..

[-]

APersonNamedBen@reddit

Telling a model to collect data from the net and compiling it into a text file is not vague. It is a single sentence.

They just said that. Don't dig a hole next to him.

[-]

Awkward-Customer@reddit

No one knows wtf OP is building. Clearly their LLM doesn't either.

[-]

Savantskie1@reddit

A single sentence isn't going to give the LLM all of the information it needs, it's definitely you problem and a prompting issue. LLMs don't have nearly the same intuition as you. It's a file on your computer. It gives you what you give it. I had this exact same problem until I realized I was expecting general intelligence out of a 30gb file. Get gud at prompting.

[-]

HopePupal@reddit

you didn't mention which quants you're using. running an aggressive quant can be an issue, especially with small models. and by aggressive i mean under Q6 or maybe Q5 if the model's very quantization tolerant.

never had a problem with Qwen 3.5 tool calling on OpenCode and llama.cpp, OpenCode and LM Studio's llama runtime, or just LM Studio. as of about a month ago i think the model, quants, llama.cpp, and LM Studio runtimes are all stable and debugged. you might check to see if your quants have been updated since you got them.

except i vaguely remember some problems with GPT-OSS having a weird tool call format, but i think modern versions of llama.cpp have fixed that?

[-]

Jeidoz@reddit

aggressive i mean under Q6 or maybe Q5

I have used only Q4 cuz LM Studio told it max size which can fully offload to my 24GB GPU. It works with tool well, but I always supposed that "aggresive" is lower than Q4. From most of benchmarks Q4-Q5 are alright and not degraded much (relativly to Q1-Q3).

[-]

Sufficient_Prune3897@reddit

Also depends on the model. In general, a dense 30b will suffer less than a 30b3a moe. Although, it still depends on the specific model and quant technique

[-]

FatheredPuma81@reddit

Pretty sure its the other way around isn't it? MoE's should be more tolerant to Quantization than Dense models? Larger models in general are more tolerant too.

[-]

HopePupal@reddit

if it's working for you, great, but besides tool calling issues, i also noticed frequent failures to finish tasks with Q4 quants of small models on the kind of work i'm doing with them. i'd classify Q4 as aggressive and sub-Q4 as "deep fried", at least for small models.

[-]

Jeidoz@reddit

Just for clarification: Define your vision of "small models". IMO 27-35b are already openning/at beginning mid-sized category.

[-]

HopePupal@reddit

my sloppy definition is "anything i can fit on a single dGPU in 2026"

[-]

yoracale@reddit

Actually most toolcalling issues is because of the tooling. Even 2bit Qwen3.6 toolcalling manages to work perfectly. I tried 1bit and that even works

[-]

HopePupal@reddit

oh, wild. which part of the tooling is likely to be the issue? the tool call parser, or something about the prompt?

[-]

yoracale@reddit

The tool calling parser

[-]

ImHiiiiiiiiit@reddit

Yeah, I'm sure of the math.

[-]

HopePupal@reddit

?

[-]

FineClassroom2085@reddit

Yup, you’re using chat harnesses instead of work harnesses. Use OpenCode or something equivalent.

[-]

Mayion@reddit (OP)

I am not really looking for a coding agent. Just an interface capable of scraping and manipulating data, and OWUI seems perfect for that using Docker. Using Fetch for example paired with Searxing, I can have it auto-post articles based on trending news. Or search for something and save it, and the list goes on.

But right now it is not behaving as expected for some reason.

[-]

funkyman228@reddit

Opencode can do all that, although I use pi agent nowadays. There are less coding agents and more terminal agents.

[-]

FineClassroom2085@reddit

Because your harness doesn’t have the tools you’re trying to use. Look for some MCP servers that can do what you’re trying to do and install them in your harness. Just because you aren’t coding doesn’t mean OpenCode isn’t a good harness for a few of those things (like internet search, file manipulation etc.)

[-]

Greedy-Bear6822@reddit

Check tool support by your GUI for the models that you're using.
Model saying that it did file manipulation, while it clearly didn't, means that model tool-calls were not parsed properly / not supported.
On occasion, models can produce different tagging , , etc.
So multi-format fallback parsing of tool calls is necessary, meanwhile being complex and unlikely to be supported by simple agent harnesses.

[-]

FatheredPuma81@reddit

Open WebUI w/ Terminal on Docker

I found your problem. OpenWebUI is a buggy mess and it won't get fixed I doubt its presenting the Terminal to the LLM properly. I've had numerous models use Windows Powershell and CMD commands without any issues in OpenCode though that kind of is an issue in itself because the model should be prioritizing tools but it is what it is..

[-]

FinBenton@reddit

Im using llama.cpp to host qwen and cline to infer it, does tool calling all day with no problems.

[-]

boutell@reddit

Have you shared exactly what you are doing in every detail?

It matters. I was mistakenly using a llama.cpp command line option that caused models to respond as if I had asked a random question. It was fun but not useful. I stopped using that option and they became a whole lot more useful.

Also, what is your hardware?

[-]

gurilagarden@reddit

I don't mean this to sound condescending, but if you want to do serious ai-assisted coding, use a serious ai-coding assistant. Opencode, Pi, Kilo, whatever. Use a real coding harness, openwebui ain't it.

[-]

DataPhreak@reddit

Have you even looked at the logs yet?

[-]

Confident_Ideal_5385@reddit

I've managed to get the qwen3.5 small models (9b and 4b) to successfully make tool calls, but that's in a very custom stack with grammar constrained sampling to enforce the schema after the model emits the token (which is a distinct token in Qwen's grammar.). The 27b (and the older 32b qwen3) "just work" even without the sampling constraints (although you obviously dont wanna use DRY or XTC while sampling tool calls.)

The 35B and 27B are both perfectly capable of calling tools in coding harnesses via an openai completions api endpoint from what I've seen, too (as insane as the openai api is, it doesn't get in the way too badly here).

FWIW i wasn't ever able to get tool calls to work in open webui. That's probably a me problem. Idk.

For Qwen, specifically, I'd suggest: - put a list of tools in the system message with a jsonschema for each tool's arg list. Even 4B-sized models can parse json pretty damn well. - detect the token and swap samplers to something that enforces pure JSON until you sample (make sure your json sampler still allows this token) - push <|im_end|> to the KV cache before starting the tool turn if you didn't wait for EOG before interrupting the assistant turn

I can't speak to Gemma or GPT-OSS, I'd assume this advice is broadly applicable although you'd need to adjust for the syntax the thing was trained on (json vs xml vs whatever) and the specific tokens (i guess not every vocab has dedicated tokens for this stuff, YMMV.)

[-]

Skelshy@reddit

Try something like Opencode. You need a tool to handle the tool calling. Possibly some mcp servers like filesystem and fetch.

[-]

CapeChill@reddit

Open web ui is great for the chat side. There better code options and they need to nail tool calls for smaller models. Qwen3 coder next at 80b was the first local model I got any consistency out of and everyone gets better. The code is the special par now it seems.

[-]

o0genesis0o@reddit

Something is wrong with your setup. Even the tiny ones like Gemma 4 e2b can reason and call tools reliably to get some tasks done with janky home cooked harness. The model generates tool call output, Llamacpp intercepts and parses and returns in OpenAI format, the harness execute tool call and send back llamacpp to feed into the model. No problem.

[-]

LienniTa@reddit

local tooling is good since gpt-oss-120b you were under the rock for more than half a year

[-]

different_tom@reddit

Work with it, tell it that it's wrong and should look for the file to confirm it exists. I've been able to overcome most issues like this just by working together with it and solving the problem with evidence.

[-]

mlhher@reddit

Usually the issue is (since you already went through multiple models), the quant you are using or the harness. For quants you should definitely try Q4_K_XL or bigger of whatever model you are using.

For the harness you have to understand that most (all) harness currently out there are dumb wrappers. They are made with the assumption that you feed them some big beefy cloud model.

I have been using Late ( https://github.com/mlhher/late ) and have never looked back since (yes I am the dev disclaimer).

It works so well, that I legitimately do not remember when the last time was it got a tool call wrong (if at all). I rarely even need to guide it I can just tell it a prompt and for the vast majority of tasks it surprisingly does not even need any guidance. All in 5GB VRAM (around 30t/s). From the feedback I have been hearing other peoples experience has been pretty much the same, including people telling me that the same model feels smarter with Late than with other harnesses; likely due to the way Late handles context and orchestration or rather other harnesses lack thereof.

But don't take my word for it and gladly try it out for yourself if you want to.
I use it with Qwen3.5-35B-A3B-Q4_K_XL for virtually all of my dev work.

[-]

yoracale@reddit

Actually mostly toolcalling issues are with the tooling. Even 2bit Qwen3.6 toolcalling managed to work perfectly. I tried 1bit and that even works

[-]

Mayion@reddit (OP)

Will make sure to look at it later. Appreciated.

I am running Q5K_M for my models with LM Studio and OWUI updated.

[-]

wombweed@reddit

In openwebui do you have native tool calling enabled? The difference was dramatic for me after I turned it on. As others have said I think opencode is better if you want to do terminal stuff. I’ve found Gemma 4 to be pretty ok on openwebui, qwen3.6 is generally higher quality for chat and code but for some reason seems to get more confused trying to run shell commands asynchronously in openwebui specifically, not sure why.

[-]

Mayion@reddit (OP)

Yes. Here is me trying Gemma again. Tool calling is set to native and so are the rest of the models. Sorry the image is too big, but if you zoom in, the html file is zero bytes and every file in the rest of the directories is also empty.

[-]

whichsideisup@reddit

It’s not parsing the Gemma output correctly from the look of it. Might be a chat template issue or similar.

[-]

logic_prevails@reddit

Yeah opencode works great for me

[-]

Unable-Lack5588@reddit

Having been through this whole, why does my SOTA FP8 model suck so bad, the magic was turning native tools on the admin->settings->model and setting the "Function Calling" to native

[-]

cershrna@reddit

Seconded, it didn't work well till I enabled native function calls but I agree with most people here that OWUI isn't a great harness for agentic work in general

[-]

No_Swimming6548@reddit

It's actually "function calling" in the controls and yes native must be selected. I tried qwen 3.6 with owui + open terminal and it indeed works perfectly. OP, the problem is configuration, not the model.

[-]

jon23d@reddit

I’m doing fully autonomous feature development. I assign it a ticket, and come back to a PR. All local.

[-]

trycatch1@reddit

You are certainly doing something wrong. Judging by symptoms - you set too tiny context, the memory about the tools got trimmed and the model doesn't know about the tools. Or it can be something similar. Either way, your models are not aware about your environment for some reason.

[-]

Flamenverfer@reddit

The only reason I still use gpty is the interstates web search. If anyone here has suggestions I’m all ears

[-]

ayylmaonade@reddit

I've been using local AI "seriously" since about Jan last year, and around april I got seriously into tool use. I use Open-WebUI as my main WebUI, and I can't say I've experienced these issues. I mainly use Qwen models, which have been near flawless. But everything from Gemma3+, Mistral Small 3+, Ministral 3, GLM, LFM to NVIDIA and Kimi have worked great for tool calling.

There's of course a spectrum of quality/ability, and models hallucinating tool calls will happen sometimes. Not to mention you're running Gemma, and Gemma 4 is extremely bad at actually deciding to call tools, but does well whenever it does use them. Make sure you've got models set to use "native function calling" in OpenWebUI if you haven't already.

Qwen3.5, 3.6 and GPT-OSS should be really good at tool-calling. Surprised you mentioned them, tbh.

[-]

Savantskie1@reddit

I solve the mcp not having good explanation by giving explanations in the system prompt in a model card in openwebui. I then assign a model to that model card.

[-]

Savantskie1@reddit

Have you made sure that open terminal is configured right with openwebui?

[-]

gwillen@reddit

Your harness (open webui) is likely misconfigured somehow. Hard to say how. It's also possible your model configuration is broken. The early downloads of gemma4 had issues; if you downloaded it on day one and never again, it probably has broken tool calling.

[-]

chibop1@reddit

It's the combination of engine, prompt template, configuration, and client.

I tried combination of Openclaw, Ollama, qwen3.5-27b, and it was able to sign up for an email account as well as sending me an email by itself using Chromeum browser.

I had assigned only 64K context, so it was able to compact and create memory files along the way to accomplish the task.

[-]

ionizing@reddit

yes. here is qwen3.5-122B-A10B enjoying shell usage:

[-]

ionizing@reddit

[-]

crossfitdood@reddit

I use it with my inventory app for my company. I use qwen 3.5 and I can ask for stock levels on a particular item or say my top 10 most used items this week or month and it will tell me and give me the link to those items etc. I’m currently working on getting it to be able to parse packing slips and add to the inventory automatically. Almost there. But yes they can tool call. Not anywhere near the same quality as subscription models but still usable.

[-]

vex_humanssucks@reddit

Good breakdown. The sweet spot for local tool calling right now seems to be structured output via constrained decoding rather than raw function calling -- models are more reliable when you give them a JSON schema to fill than when they're generating arbitrary function calls. Worth benchmarking both approaches if you haven't.

[-]

ravage382@reddit

Make sure you are using the correct sample settings in LM studio. Wrong settings will cause that type of hallucinating behavior. Also, if the terminal docker disconnects, it will either hallucinate what it was working on or spit everything back out into the chat UI. It happens more frequently if you have more than 1 window open, with the terminal connected.

I used my open webui and terminal setup for 3 actual days of work this week and it was amazing. I use llama.cpp though. You may consider giving that a go. Mine was nailing tool use with qwen 3.5 122b, qwen 3.6 did well today and gpt 120b is still doing ok, though i have to tell it repeatedly to use the terminal environment.

[-]

universesnm@reddit

qwen 3.6 35B with opencode and enable preserve thinking

[-]

StanPlayZ804@reddit

Maybe its your quantization? In Open WebUI with native tool calling enabled, I got Qwen 3.5 27B (my current go-to for agentic stuff and coding) to set up an Open WebUI instance in OpenTerminal all by itself with one simple prompt. It looked up the docs, tried to set it up, realized that Docker Daemon wasn't running, pivoted to python, and successfully got an instance up. One prompt. It is highly likely its your quantization or that you didn't have native tool calling enabled. I run all my models in BF16.

[-]

Legitimate-Dog5690@reddit

Very much liking Qwen Code CLI, I've been using it with a local 3.6 35b for a Claude Code light experience. It's more than happy to hunt through a big codebase, find bugs, suggest changes, loving it.

Should really try OpenCode.

[-]

Pleasant-Shallot-707@reddit

You’re just bad at this

[-]

FORNAX_460@reddit

Dude the issue is definitely with your harness, cause ive done that and much more in just lm studio with mcp tools! Although i mostly use opencode but sometimes when its not necessary i just use plain lm studio with mcp tools and it does get the job done most of the time.

[-]

samorollo@reddit

Maybe you are using cuda 13.2 with q4 quants? Apparently it's CUDA regression, should be fixed with 13.3

[-]

aldegr@reddit

OpenWebUI is awful for newer models. It does not handle reasoning as expected, i.e. it returns it back in <think>..</think> tags which only works for certain models. The expectation is to return it back in the reasoning_content field in the API. It also defaults to the "prompted" tool calling approach, not native tool calling, last I checked. It works fine for chat, poor for anything requiring tool calling.

[-]

dinerburgeryum@reddit

You can enable native tool calling in the model settings. Bone stick defaults however do default to old “prompted” tool calls.

[-]

aldegr@reddit

True, and that helps to an extent. The main issue is that reasoning traces are not handled properly. This is important for agentic sessions, since those traces are kept within the conversation as the model reasons between tool calls. Important for gpt-oss, minimax m2+, kimi k-2, pretty much every new reasoning model that came out since mid 2025.

[-]

dinerburgeryum@reddit

Oh yeah I was talking exclusively about tool calling but you’re also correct about reasoning traces. I migrated away from OWUI some time ago but I’m not surprised it’s still janky.

[-]

createthiscom@reddit

I’ve seen multiple reports that tool calling doesn’t work very well with Gemma 4. Tool calling works great in many other models.

[-]

StardockEngineer@reddit

My coding agent has made 984 tool calls just this morning with Qwen 3.6 35B

cat * | rg 'toolName":\s*"([^"]+)"' -o | sort | uniq -c | sort -rn                                                                                           [0]
 515 toolName":"bash"
 206 toolName":"read"
 163 toolName":"edit"
  30 toolName":"write"
  27 toolName":"task_create"
  11 toolName":"read_inbox"
   5 toolName":"spawn_teammate"
   5 toolName":"send_message"
   5 toolName":"process_shutdown_approved"
   3 toolName":"task_update"
   3 toolName":"task_list"
   2 toolName":"team_shutdown"
   2 toolName":"team_create"
   2 toolName":"mcp"
   2 toolName":"list_teammates"
   1 toolName":"lsp"
   1 toolName":"clear_tasks"
   1 toolName":"broadcast_message"

Thinking it might be you.

[-]

LumbarJam@reddit

Pi.dev + Qwen3.6 35B Q8 is working very well with preserve_thinking=true. Please make sure preserve_thinking=true is enabled—it makes a big difference. I’m seeing very few tool-call mistakes, and when they happen it usually self-corrects.

[-]

PathIntelligent7082@reddit

mybe you actually just don't know where's the working dir where files have been written

[-]

H_DANILO@reddit

You must be doing something wrong, opencode has been working wonders for me

[-]

Mayion@reddit (OP)

I really can't think of anything else I might be doing wrong. Q5, latest Unsloth, updated everything and still not working - so either the models are faulty or Open WebUI is.

Will give OpenCode a go though. Does it also have a terminal sort of thing similar to OWUI's? E.g. asking it to create a file, search the web etc.

[-]

H_DANILO@reddit

I believe the websearch capability is off by default because it has increased security risk(it might end up on a website that has a poisoning prompt and that might turn its own execution into malicious)

[-]

deejeycris@reddit

Nah it should work especially if you nudge the model amd you didnt get one of the heavily quantized ones which arent great at tool calling.

[-]

HumbleTech905@reddit

Just curious, what is your hardware setup ?

[-]

iChrist@reddit

Works for me with OpenWebui, model can use the terminal to execute many different commands. tried yt-dlp, editing images, creating gifs, all work with Qwen3.5-27B

Have you set tool calling to enabled in the model settings? context should also be at least 32k-64k and not like 8k.

I use llama cpp directly, so maybe something with LM Studio + OpenWebui could potentially cause issues.

Its not trolling, local models can do wonderful things with tools.

[-]

fragment_me@reddit

I just use vs code Kilo Vode version 5. Even if I’m not coding I still really like using kilo code or similar extensions. They just work really well for working with files.

[-]

sword-in-stone@reddit

with Hermes or Claude code, qwen 3.5 27b or qwen 3.5 35b work like butter

[-]

1ncehost@reddit

Also an opencode user here, and having a good time with tools.

[-]

Several_Industry_754@reddit

I run Claude CLI against local. Works like a dream.

[-]

havnar-@reddit

Use pi and add the “official” addons

[-]

ProfessionalSpend589@reddit

running Open WebUI w/ Terminal

These are not the agents you’re looking for… Move along!

Joking aside, I’ve tried only opencode until now and it generates code and specifications and todo and implementation instructions. Also follows instructions to add logging and other parts.

Whatever it’s building it’s not working at all, but creating files works great!

[-]

Aizen_keikaku@reddit

Harness matters. I’ve had bad experiences with long contexts on Roo Code & Continue.dev.

https://pi.dev/ has been excellent tho. In my experience Gemma 4 is poor in general with tool use

[-]

Waarheid@reddit

https://pi.dev is the goat, super small system prompt and very extensible. I run it in a container while experimenting though since it by default never asks for permission

[-]

IONaut@reddit

I've been using kilo code extension on VS code with Qwen3.5 27B and Qwen 3.6 and it uses tools flawlessly.

[-]

OkFly3388@reddit

Even with qwen3.5 35b I managed to have custom working agentic pipeline, running 4b quants on my rtx4090 with full context. And also roo code\cline extensions works perfectly inside vscode. There are questions about models being not that smart to do long tasks or something. But they at least tried, and editing\creating tens files while refactoring codebase is kinda regular things for me.

IDK, you doing something wrong.

[-]

charmander_cha@reddit

Olha, eu uso opencode com modelos locais e ele definitivamente faz coisas para mim

[-]

robogame_dev@reddit

OP post a link to the tools, a screenshot of the OWUI agent’s chat box showing the tools enabled, and a screenshot of it’s thinking process as it uses the tool.

I use OWUI and the same models as you and they work fine for me, so I’m sure it’s a configuration issue.

[-]

Elegant_Tech@reddit

Connect the mcp's to lm studio and try again just in the chat window. Qwen3.6 can run it's own agent loop and do all the work without opencode or an IDE. At least that would help remove the webui variable and help isolate if a quant issue.

[-]

dsartori@reddit

OpenWebUI is the weak link in your chain. Try using another product and see if tool calling improves. I use Cline in VSCode and it works great with all the local models.

[-]

robogame_dev@reddit

Sure OWUI tool calling might not be optimal, but it works fine for me for all the use cases that so is having issues with on identical models … but it is not the cause of OP’s problems - they clearly have either misconfigured it or got bad tools.

[-]

Eyelbee@reddit

I am also yet to find a no-nonsense toolcalling workflow for local llm use. I am picky when it comes to workflows so I hate using stuff like openwebui and lm studio for several reasons. I use barebones llama.cpp with my own launcher and its own web ui is not good for tool calls. Only local tool call I use is when I'm using Roo Code. That has its own harness which seems to work nicely with both qwen and gemma dense models.

[-]

rvistro@reddit

Try using roo code or opencode.

[-]

Thrumpwart@reddit

Roo is fantastic with 3.6.

[-]

eugene20@reddit

Kilo code v5 in VS Code worked great for me connecting to models running with LM Studio.
But chatting to the same model with the chat in LM Studio and it would just pretend to write files, I didn't get round to digging into what was the missing link to get that to work as through the IDE was what I needed anyway.

Kilo v5 is based on roo code, I don't like Kilo v7 which is based on opencode they have a lot to fix.

[-]

film_man_84@reddit

I have been using now couple of days Gemma 4 E4B with tool calling using Pi agentic framework. So far I have been able to create simple website monitor what checks if site is up or down and it reads sites and what text there should be from text file. This script was purely coded with Pi + Gemma 4 E4B model.

Also I was able to create another script what fetches images from wallhaven.cc and reads configs and API from text file if file exists.

Both scripts were done with Python and that python script was written by Pi with Gemma 4 E4B model, so yeah, it is usable.

[-]

benevbright@reddit

Gemma4 is not usable with agentic flow at all for me. It just never works, behaving completely dumb confident engineer and FAST, which is problem when you're dumb. Qwen3.6 is amazing on the other hand for coding. The thing is you need to find a right variant for you. I had issues with 3 different variants (mostly MLX) on tool calling. The input value was stringified when it's calling a tool, which is wrong. But once I found the right one (unsloth/qwen3.6-35B-A3B-GGUF-q8 for my 64GB Mac). And I'm just so happy to use this one because it's working really well. I guess you don't use Mac since you mentioned dense models in your post. But try to find one variant that works well.

[-]

benevbright@reddit

feel free to use my tool: https://www.npmjs.com/package/ai-agent-test it logs the raw json session so you can observe what exactly went wrong.

[-]

StupidityCanFly@reddit

I have qwen-27b-nvfp4 running browsing agents with 98% reliability in production. Each agent grows its context up to 200k tokens, per my stats. I do a lot of pre/post processing in code to ensure input/output has the right syntax and contents. But that’s really just mostly sanity checks and JSON fixes/cleanup.

[-]

BrightRestaurant5401@reddit

I used gemma and qwen from unsloth in cline and llama-server and that worked fine, so its definitely possible.
Gemini also helped me set it up in python to do requests to llama-server and that also worked fine.