Home-rolled loop agent is surprisingly effective
Posted by DeltaSqueezer@reddit | LocalLLaMA | View on Reddit | 28 comments
I created a small demo to illustrate how agents work compared to a standard chat bot.
Afterwards, I played with the simple loop and added 5 tools: grep, glob, read_file, write_file, edit_file and gave it a code editing task to see how it fared with no system prompt or other guidance.
Remarkably, this minimal harness not only completed the task, it did so quickly using small local models. The absence of massive prompts and safeguards also made it very fast.
I didn't expect something this crude to work so well, but it did. I encourage those interested to try rolling your own and you may be surprised by how effective it is.
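The loop described above is small enough to sketch in a few lines. This is an illustrative reconstruction, not the OP's actual code: `call_model` stands in for whatever chat-completion client you use, and the tool names mirror the ones listed in the post (only three are shown here).

```python
from pathlib import Path

# Minimal tool registry: each tool is a plain function returning a string.
def read_file(path):
    return Path(path).read_text()

def write_file(path, content):
    Path(path).write_text(content)
    return f"wrote {len(content)} chars to {path}"

def grep(path, pattern):
    hits = [line for line in Path(path).read_text().splitlines() if pattern in line]
    return "\n".join(hits) or "(no matches)"

TOOLS = {"read_file": read_file, "write_file": write_file, "grep": grep}

def agent_loop(call_model, task, max_steps=20):
    """Drive the model until it answers. `call_model(messages)` returns either
    {"tool": name, "args": {...}} or {"answer": text}."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = call_model(messages)
        if "answer" in action:
            return action["answer"]
        result = TOOLS[action["tool"]](**action["args"])
        # Feed the tool result back so the model sees it on the next turn.
        messages.append({"role": "tool", "content": result})
    return "stopped: max steps reached"
```

The whole "agent" is the loop plus the registry: the model picks a tool, the harness runs it, and the result goes straight back into the conversation.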
thepetek@reddit
Yea this is pretty much what Langchain found with deepagents. Give the model a very small set of tools and it works far better
TokenRingAI@reddit
One of the most powerful patterns for small local models is using tool call results to reprompt the model. Small models will generally follow these instructions outright, preferring them over the context or their own internal direction.
For example, when a file diff is returned to the model, you can append instructions to the end of the tool result, telling the model to verify what it just wrote using X linter or Z type checker, or to immediately grep for the names of any functions it just updated and dispatch sub-agents to update any files affected by the change.
In general, tool calls do not need to return data, they can also include instructions.
In a general agent harness, these instructions are difficult to generalize.
In your own controlled environment, you can tailor them to make small models grind endlessly on stuff.
That's what the small models excel at. Grinding endlessly.
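The pattern described above — data plus an injected instruction in the same tool result — can be sketched like this. The `FOLLOWUP` text and tool shape are illustrative assumptions, not the commenter's actual product code:

```python
from pathlib import Path

# Hypothetical follow-up appended to every edit result; small models tend to
# treat the tail of a tool result as their next instruction.
FOLLOWUP = (
    "\n\nNEXT STEP: run the linter on this file, then grep for each function "
    "you changed and update any callers affected by the change."
)

def edit_file(path, match, replace):
    # Apply the edit, then return the result with the instruction attached.
    text = Path(path).read_text()
    Path(path).write_text(text.replace(match, replace, 1))
    result = f"replaced {match!r} with {replace!r} in {path}"
    return result + FOLLOWUP
```

The model asked for an edit; what it gets back is the edit confirmation plus marching orders for the next turn.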
HopePupal@reddit
why not just use deterministic hooks? run the linter or type checker after any modification to files in that language
TokenRingAI@reddit
We already do, diff + linting. I was just giving a simple example.
One of the things I am working on adding to our coding product is file-based triggers that let you prompt-inject specific instructions based on a file glob. So if a file in a certain path gets updated, you might prompt-inject an instruction to update the docs.
You can do that in the tool call, or at the end of the agent run, and there are benefits to each method. In the tool call is slower but more comprehensive. Doing it at the end of the agent run often misses things.
Small models tend to need a ratchet mechanism to keep them pushing forward; we usually start with a plan or todo list and then force the small model to keep going, step by step, based on that todo.
If at each step of the todo you can force the model to run deterministic steps based on a prompt injection, you can get more work done in a single run.
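A file-based trigger table like the one described could look like this. The globs and instruction strings are invented examples, not the commenter's actual configuration:

```python
import fnmatch

# Hypothetical trigger table: file glob -> instruction to inject whenever a
# matching file is modified by a tool call.
TRIGGERS = {
    "src/api/*.py": "The public API changed: update docs/api.md to match.",
    "migrations/*.sql": "Schema change detected: regenerate the ORM models.",
}

def injected_instructions(changed_path):
    """Collect every instruction whose glob matches the changed file."""
    return [msg for pat, msg in TRIGGERS.items() if fnmatch.fnmatch(changed_path, pat)]
```

The harness would append the returned instructions either to the tool result (slower but more thorough) or at the end of the agent run, as discussed above.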
MotokoAGI@reddit
Interesting. What small models have you had success doing this with? How are you adding instructions to the end of the tool call? As part of the tool call output?
TokenRingAI@reddit
All of them, from small to large. But you get the best results from small models since they are less deterministic.
What I described is essentially the friendly equivalent of a prompt injection attack
traveddit@reddit
So you just omit the result for next turn? Don't think that's the way.
TokenRingAI@reddit
I might have misphrased this. When returning results from a tool call (whatever that might be: data, text, etc.), you can append instructions to the end of the result that advise the model on the next step it should take.
To put it simply, the result of a tool can instruct the model what to do next.
Basically, you are prompt injecting the model.
tavirabon@reddit
I got a solid chuckle out of this. I hope you at least limit its system permissions, can't have the model thinking it needs to upgrade your kernel to solve a problem.
Qwen30bEnjoyer@reddit
My perspective is that if you just shove the little bastard in a rubber room (a VPS or VM) and let him loose on tasks that don't require pay-as-you-go APIs or personally identifiable information, you don't need all those pesky safeguards anyway.
FullstackSensei@reddit
Why would it ever need system access at all? Good old development tools never needed system access to do their thing, so why would an LLM need it now?
I'm building a similar harness and the LLM doesn't get any console access at all. It can run predefined commands via an MCP-like JSON interface: read file, write file, commit, etc.
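A no-shell, JSON-commands-only interface like the one described might look like this sketch. The command names follow the comment; the sandbox-root check is my own addition to illustrate why this design avoids needing system access:

```python
import json
from pathlib import Path

def handle(request_json, root):
    """Dispatch one JSON command against a sandboxed root. No shell involved:
    only predefined commands exist, and paths are confined to `root`."""
    req = json.loads(request_json)
    cmd, args = req["cmd"], req.get("args", {})
    root = Path(root).resolve()
    target = (root / args.get("path", "")).resolve()
    # Reject any path that resolves outside the sandbox root.
    if root != target and root not in target.parents:
        return json.dumps({"error": "path escapes sandbox"})
    if cmd == "read_file":
        return json.dumps({"content": target.read_text()})
    if cmd == "write_file":
        target.write_text(args["content"])
        return json.dumps({"ok": True})
    return json.dumps({"error": f"unknown command: {cmd}"})
```

Because the LLM can only ever emit one of these JSON commands, the blast radius is bounded by what the dispatcher implements.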
DeltaSqueezer@reddit (OP)
I guess one end of the spectrum is to just give it 'bash' as that's a universal tool :D
tavirabon@reddit
https://www.theregister.com/2024/10/02/ai_agent_trashes_pc/
DeltaSqueezer@reddit (OP)
I didn't give it bash access, but I guess in theory it could still do a lot of damage just with the edit_tool.
tavirabon@reddit
It was an exaggeration, but also the first example that always comes to mind when being reckless with agents. I read a blog once about experimenting with giving agents full admin rights; at some point it decided it needed to update the kernel to fix a performance issue, left it in a broken state, and rebooted into a bricked OS.
minnsoup@reddit
This sounds almost identical to a talk the creator of Pi gave...
DeltaSqueezer@reddit (OP)
This is so addictive. I added a calculator tool, and more importantly, a tool which gives AGI:
Far-Low-4705@reddit
this is what i like about local models
you really dont need claude code. you can make your own agent/assistant.
It is so much more transparent, you can see what is going on, you understand it, the code is super short and minimalistic, and you control everything, its also just so much more fun
MoneyPowerNexis@reddit
I rolled my own and find myself using it quite a lot, preferring it more and more over online services. I have a tool loader that loads every class with a run function and a spec from the modules in my tools folder. With that, I can give an example tool class to an LLM and it will build more tools following that pattern, so I got search, file I/O, and a Python sandbox up and running pretty quickly. Search and file I/O alone cover 90% of my use cases, but I can see myself adding complexity over time. It's really nice to set up an image generation server and give the agent a tool to use it, though I'm not exactly getting anything done playing with that, so I can quickly disable tools by changing the file extension in my tools folder and reloading.
I have instructed the LLM that if I give it a hashtag, it should look in that folder for the specified file and follow the instructions in it, which is pretty nice for common tasks: https://imgur.com/a/iSCZJMc
I know for something like this I could probably just use a regex without an LLM, but if, say, I ask it to follow instructions to embed Google Maps and it doesn't know the coordinates, it will search for them: https://imgur.com/a/NinyIfD
In this case it decided to save the search results to a file. I gave it that ability after limiting the size of web results that go into context, and now it figures out whether to read the file directly or parse it with its Python sandbox if it's too big.
I've been surprised quite a bit by how differently models chain tool use together.
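The tool-loader pattern described above (scan a folder, collect classes exposing `run` and `spec`) can be sketched as follows. The function and attribute names match the comment's description; everything else is an illustrative assumption:

```python
import importlib.util
from pathlib import Path

def load_tools(folder):
    """Load every .py module in `folder` and collect classes that define both
    a `run` method and a `spec` attribute. Renaming a file to anything other
    than .py drops its tools from the registry on the next reload."""
    tools = {}
    for path in Path(folder).glob("*.py"):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        for name in dir(module):
            obj = getattr(module, name)
            if isinstance(obj, type) and hasattr(obj, "run") and hasattr(obj, "spec"):
                tools[name] = obj
    return tools
```

Because tools are discovered rather than registered by hand, the LLM can write a new tool file into the folder and it becomes available on the next reload — and changing the extension disables it, exactly as described.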
Hot-Employ-3399@reddit
What have you used for edit_file? Search and replace? Writing diffs? Manual diffs?
IME Qwen is kinda bad at diffs even if you polish them afterwards (line count, prefix/suffix context).
DeltaSqueezer@reddit (OP)
It is search and replace: you supply the string to match and the string to replace it with. To avoid wrong matches, the match string you provide also has to be unique (unless you tell it to replace all occurrences).
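The uniqueness rule described here fits in a few lines. A minimal sketch, with the `replace_all` flag name assumed rather than taken from the OP's code:

```python
from pathlib import Path

def edit_file(path, match, replace, replace_all=False):
    """Search-and-replace editor: `match` must occur exactly once in the file
    unless `replace_all` is set, which guards against ambiguous edits."""
    text = Path(path).read_text()
    count = text.count(match)
    if count == 0:
        return "error: match string not found"
    if count > 1 and not replace_all:
        return f"error: match string occurs {count} times; provide a unique match or set replace_all"
    Path(path).write_text(text.replace(match, replace))
    return f"replaced {count} occurrence(s)"
```

Returning the error as a string (rather than raising) matters here: the model sees the complaint in the tool result and can retry with a longer, unique match.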
ionizing@reddit
Not OP, but in my own interface I finally got Qwen 3.5 to excel at using `sed -i` for targeted edits. It does almost all of its work in bash commands: it gets context around the edit area first, then makes the edit, then often checks the edit as well before moving on to the next one. Before this I had tried many variations of an edit_file tool, and each time it made some sort of mistake. Eventually I removed most tools, and now I give it read file, write file, and restricted shell access, where it actually has lots of training data to work from, versus some unique tool definition it might not understand. Anyhow, it's a beast now.
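A "restricted shell" like the one mentioned is typically an allowlist check in front of the real shell. This is a guess at the shape, not the commenter's implementation; the allowed program set and rejected metacharacters are illustrative:

```python
import shlex

# Hypothetical allowlist for a restricted shell tool: only these programs may
# run, and shell metacharacters (chaining, pipes, redirection, substitution)
# are rejected outright before anything executes.
ALLOWED = {"sed", "grep", "cat", "head", "tail", "ls", "wc"}

def is_allowed(command_line):
    if any(ch in command_line for ch in ";|&><`$"):
        return False
    try:
        argv = shlex.split(command_line)
    except ValueError:  # unbalanced quotes etc.
        return False
    return bool(argv) and argv[0] in ALLOWED
```

This keeps the model on commands it has heavy training data for (like `sed -i`) while blocking destructive or compound invocations.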
megadonkeyx@reddit
I've been using my own AvaloniaUI-based C# "agent loop" for a few months. It's just tool calling in a chat loop with feedback.
The best part about your own loop is that you can have it in your favorite language and focus on the bits that matter to you.
DeltaSqueezer@reddit (OP)
TomLucidor@reddit
GitHub or it didn't happen
segmond@reddit
Yeah, they are so simple. You can have any of the big models vibe one up in one shot in under 1000 lines and it would work.
DeltaSqueezer@reddit (OP)
Yup. Initially it was just two files: loop.py for the main loop and tools.py with the tool definitions. That's still the core functionality.
I then broke it down to make it more maintainable and added some quality-of-life features.
Mike-devs@reddit
Repository?