llama.cpp server have built-in native tools (exec_shell, edit_file, etc.)
Posted by srigi@reddit | LocalLLaMA | View on Reddit | 48 comments


I was messing around with running local models recently, and while digging through the llama.cpp server docs, I noticed this experimental flag just sitting right there:
--tools TOOL1,TOOL2,...
It natively supports read_file, file_glob_search, grep_search, exec_shell_command, write_file, edit_file, apply_diff, and get_datetime. That is a battery of tools that basically turns llama-server into a mini agent harness. You really don't need anything more than your trusty .gguf file and the llama.cpp binary for basic AI assistance in your projects.
Note that file operations are relative to folder from which you started the server. There also isn't any security sandboxing yet, like a whitelist of allowed commands or strict denial of file operations outside the original folder. So, be very cautious with what you expose!
But still, I'm pretty amazed that llama.cpp is gaining these abilities natively. It completely eliminates the need to rig up MCPs or heavy wrappers just for things like getting the current date/time or reading the contents of a file.
AsliReddington@reddit
I want a very minimal coding or tool calling harness which is just a single python or bash file. Don't even want to both with these extensive code bases for no reason.
kivaougu@reddit
This is pretty simple to do by yourself as a tool call schemas are very intuitive to parse. Then you just add nudging for smaller models and some instructions. Core tools like read and write are pretty easy with python.
MoneyPowerNexis@reddit
Yep I have a one page tool use example that just uses json and rests.
I prefer to put tools in their own modules and load them from a directory. Once I decided on a format for a tool (I use a class with a spec function and a run function) its easy to show that to an llm and have it crank out another tool.
I dont use my harness for long running unsupervised jobs though its still just a chatbot. I still need a way to deal with context limits if I am doing long runs, for now I just start a new session if I get near my limit asking to make a plan for the next session if needed.
CheatCodesOfLife@reddit
Thanks, that simple script is cool.
MoneyPowerNexis@reddit
Not exactly one page but here is what I would consider a minimal harness with search, fileio, and url reading:
https://pastebin.com/pPLjbqqa
Optional command line arguments:
--llama_server LLM server URL. Defaults to http://localhost:8080.
--prompt_file File containing the initial user prompt.
--system_prompt File containing the system prompt. Replaces default.
--working_dir Base directory for file operations. Defaults to current directory.
--exit Terminate program after processing --prompt_file.
Hopefully the chat loop and whats going on with tools is still simple enough to still understand fully. This is obviously all vibe coded with minimal testing so take that for what its worth.
CheatCodesOfLife@reddit
Yep, still simple enough. It doesn't have the over-engineered look to it that other vibe coded scripts do. It works well with Gemma-4, thanks!
MoneyPowerNexis@reddit
The version I actually use is somewhat over engendered: https://imgur.com/a/28Fpx9l
Its so easy to just keep adding features. For me it basically has to be behind a web interface and that leans towards a more event driven loop but I get that that isnt a great starting point for someone who wants full understanding of the code to build their own harness.
AsliReddington@reddit
Thx will try this
MoneyPowerNexis@reddit
It's about as simple as I could make it with a chat loop that switches to responding to tool results but I would immediately change how the tool parameters are handled. I would not get a tool specific parameter in the chat loop but instead I would pass the entire parameter object to the tool function as well as other things a tool might need like a cancel event object, limits to the amount of data the tool should return etc:
As you add more complexity the surface area for bugs and for more features also expands and sometimes that might mean overhauling the entire thing. I have started to look at other harnesses and notice a lot of them build the tools themselves as state machines so for example a file io tool might have an option to set the path and then the next time the tool is called it is still in that path where as with my tool it has to specify the full path each time. With file io thats fine but models are getting good enough that if you give them a python sandbox they will try to run scripts and then interact with the running script they made so having some way to store the id of running scripts and to get new console output enter new input means the bot can test things like building webservers and testing their APIs. Qwen 27b is good enough for all that but it also means properly sandboxing everything becomes a priority.
ForestHubAI@reddit
The unsandboxed concern is real. Once you're shipping to actual devices in the field, "just trust the model with shell" stops being romantic — we wrap tool dispatch in a whitelist + per-tool capability check, so a broken model call can't reach anything it doesn't already have. Same idea, just paranoid about which tool reaches which subsystem.
Building agent lifecycle on edge at — happy to compare notes if you're going production with this.
Parzival_3110@reddit
Native tools are the right direction, but I think the missing primitive is receipts.
For files and shell, the question is what was allowed and what changed. For browser tools it gets worse because the agent has login state, tabs, forms, and possible duplicate submits.
I am building FSB for that browser side: https://github.com/LakshmanTurlapati/FSB
The lesson so far is that tool calling needs scoped ownership, approvals, and verification after the action. A model saying it clicked something is not enough. The harness has to prove which tab it touched and what changed.
Enough-Astronaut9278@reddit
Most agent tasks boil down to running commands and editing files anyway.
Badger-Purple@reddit
very cool discovery thanks for pointing it out. I always wondered why simple tools like a web scraper and a shell command could not be part of the runtime itself
srigi@reddit (OP)
I'm hoping that they add an option to add own native tool(s) - really the only thing missing is
web_fetchandweb_search.If they don't provide these (I guess they don't - it is soo much outside of the scope of llama.cpp), there should be an easy way to add own implementation.
Parzival_3110@reddit
That split matters a lot. web_fetch is great for docs and static pages, but it falls apart once the task needs a logged in browser, redirects, cookies, forms, downloads, or evidence after a submit.
I am building FSB for that browser tool layer: https://github.com/LakshmanTurlapati/FSB
The part I would want llama.cpp to standardize is receipts. Not just call a tool, but return what tab changed, what action completed, and enough state for the model to avoid doing the same thing twice.
10F1@reddit
You can add mcp servers in the web UI.
yes_its_that_bad@reddit
Yes this seems to be a reasonable guide: https://old.reddit.com/r/LocalLLaMA/comments/1rnyz75/how_i_got_mcp_working_in_the_llamaserver_web_ui_a/
Karyo_Ten@reddit
A web scraper is not a simple tool.
Once in a blue moon you need the "About us" link for debugging but more often than not you need just the "main content" to avoid polluting the LLM context.
srigi@reddit (OP)
That's why I'm begging for "add/define your own tool" functionality. There are a couple of good
web_fetchprojects out there on GitHub (search "language:Rust web_fetch" on their page), so there is no need to reinvent the wheel.annodomini@reddit
But there is "add/define your own tool" functionality. It's called MCP servers. There's a link for it right in the sidebar of the UI.
This new functionality is simply more convenient because it's built in so you don't have to manage separate MCP server processes, but the ability to add your own tools has existed for a while.
gladfelter@reddit
Mcp servers are inferior to skills that reference cli binaries that are optimized for the task. Progressive discovery is better than dumping giant API definitions into the context, and models are great at using bash to filter and search the output of cli binaries, whereas the entire contents of the mcp call are dumped into context.
BlobbyMcBlobber@reddit
Web scraping is not a simple tool. Web search is not simple either.
Shell command is very simple to implement but also very dangerous. It's good that it's not an option by default.
Far-Low-4705@reddit
these tools are not sandboxed, so be very careful with them, they run directly on your computer.
redditpad@reddit
what's the current standard approach for this? I have tested it with OpenCode and it works
yeah-ok@reddit
Another seldomly talked about feature flag is (after the hf integration) the "--offline" param, worth it for us taking-local-seriously ppl!
johnnaliu@reddit
cases that bit me weren't "rm -rf", it was the agent "cleaning up" the working dir after finishing the task. what are people using to bound what these tools can touch? if not in a sandbox
CatTwoYes@reddit
Love that this is landing in the binary, but the security gap is real — exec_shell_command with no sandboxing is one prompt injection away from disaster. For read-only operations (read_file, file_glob_search) this is genuinely useful and covers 80% of what people reach for MCP to do. Hope they add a basic command whitelist before this ships as non-experimental.
postitnote@reddit
When did you decide to start being a full AI bot?
llama-impersonator@reddit
after may 14, 2026, his account has an em-dash in nearly every post. before this, looks human generated.
CheatCodesOfLife@reddit
Can you tell this one https://old.reddit.com/r/LocalLLaMA/comments/1tluma3/llamacpp_server_have_builtin_native_tools_exec/onkq875/ ?
(More obvious if you click the profile)
Seems like a lot of them write like this now. Once you notice it, you see them everywhere.
llama-impersonator@reddit
yeah these ones annoy me as it's like a bad imitation of my own style.
Foreign_Risk_2031@reddit
—
lioffproxy1233@reddit
You can just specify what tools in a list instead of --all
Agreeable_System_785@reddit
So this would only work if your inference machine is also your dev machine? If you have a separate inference server, you still need to mount or something to make it work.
annodomini@reddit
Yeah. If you want to run tools on a different machine, you need to use an MCP server. This just gives you a convenient way to run a few built-in tools on the same machine as inference.
srigi@reddit (OP)
Unfortunately, yes. I use a MacBook for developing and a gaming rig for running local LLMs. Invoking llama-server on a gaming machine makes it "see" files there, and not on my MacBook where I access the llama-server's web UI.
This is really annoying, but maybe it is actually good, since my gaming machine is more disposable (eventual damage done there will hurt less) - so I can just fork projects there and keep it working outside of my notebook.
Napster3301@reddit
the approval ux is the surface issue. auto-approve requires trust in the tool calls and llama.cpp doesnt give you that yet. embedded chat templates on most ggufs still emit bracket variants ([function=X], function=NAME) instead of clean openai tool_calls arrays, so your auto-approver random-parses garbage. fix is override with --chat-template-file pointing at the upstream fixed template (unsloth has them on hf).
the other half is the model itself. a censored model running exec_shell decides your rm temp.tmp "looks dangerous" at step 47 of your loop and aborts the task. abliterated/uncensored weights remove that failure mode but most public llama.cpp tutorials skip that part.
tool list is great. infrastructure for auto-running an agent is still diy.
AdmirablePresence216@reddit
the exec_shell one being native is kinda wild to think about for client deployments, like the security implications alone probably need a lot of thought before you'd hand that to an unvetted model, even locally. are you running this behind any kind of permission layer or just letting it go free range?
srigi@reddit (OP)
Only reasonable thing to do for now is to manually inspect every `exec_shell` command, never ever to configure it to auto-approve. Also the “approving UI” hasn’t the optimal UX right now (see image). You must manually expand the tool call block to see the command which is requested. In most modern harnesses this is auto rendered for you.
VoiceApprehensive893@reddit
some clueless user is gonna get that unsandboxed rm -rf
Ok-Measurement-1575@reddit
I also noticed this the other day but couldn't get any of it to work.
srigi@reddit (OP)
It's really no rocket science, just start your (updated) llama-server with a list of tools you want in a folder where you want to operate:
Then you'll see the configuration in the Settings panel.
mantafloppy@reddit
No.
https://github.com/ggml-org/llama.cpp/discussions/22132#discussioncomment-16865312
CommonPurpose1969@reddit
Built-in tools are definitely a thing. Just tested it.
annodomini@reddit
That answer is just wrong. It absolutely does enable built-in tools.
I saw this flag a few days ago in the docs, but was a bit skeptical of it as it doesn't do any sandboxing. However, I just realized that I already run my llama.cpp in an ephemeral container with just a few directories mounted for models and settings, so it already has at least some sandboxing so I figure YOLO, and tried it out. Yep, it works.
SnooPaintings8639@reddit
Is the guy importamt? Because I did test it, and it definitely worked, i.e. it could read my file system when using it via llama server web UI.
lioffproxy1233@reddit
Anyone try hand rolling their own llama.cpp edits for custom tooling?
temperature_5@reddit
I was super stoked to add them to my chat app too. Qwen had some trouble with the diff and edit tools (might just need better definitions) but the others work great. Even just giving it shell opens up basically everything other functionality!
Llama.cpp is so nice compared to other bloated dependency hell frameworks.