Looking for feedback: local AI agent that executes tasks (planning → code → execution → verification)
Posted by Outside-System-3698@reddit | LocalLLaMA | 7 comments
Hi all,
I’m experimenting with an agent design and would like some feedback.
I’ve been working on a local-first AI engineering agent called ZERO.
Instead of just generating text, the system actually executes tasks:
Requirement → Planning → Code → Execution → Verification
Current demos:
- Requirement demo:
Takes a requirement.txt and produces structured outputs like:
- project_summary.txt
- implementation_plan.txt
- acceptance_checklist.txt
- Mini build demo:
Takes requirement + input data and:
- generates Python code (number_stats.py)
- executes it
- produces verified output (stats_result.txt)
Everything runs locally, with visible artifacts and task state.
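For the mini build demo, the generated script looks roughly like this. This is a hypothetical sketch of the kind of code the agent produces; the actual number_stats.py in the repo may differ, and the input file name is just a placeholder:

```python
# Hypothetical sketch of the kind of script the mini build demo generates.
# Input file name is a placeholder; the real number_stats.py may differ.
import statistics
from pathlib import Path

def main() -> None:
    # Read whitespace-separated numbers from the input data file.
    numbers = [float(tok) for tok in Path("input_data.txt").read_text().split()]

    # Compute simple summary statistics.
    lines = [
        f"count: {len(numbers)}",
        f"min: {min(numbers)}",
        f"max: {max(numbers)}",
        f"mean: {statistics.mean(numbers)}",
    ]

    # Write the output artifact that the verification step checks for.
    Path("stats_result.txt").write_text("\n".join(lines) + "\n")

if __name__ == "__main__":
    main()
```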
This is more of an engineering agent runtime than a chatbot.
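To make "runtime" concrete, here is a minimal sketch of how I think about the loop. Function names and artifact names are placeholders, not ZERO's actual internals:

```python
# Minimal sketch of the requirement -> planning -> code -> execution -> verification
# loop. Names are placeholders, not the actual ZERO implementation.
import subprocess
from pathlib import Path
from typing import Callable

def run_task(requirement_path: str, llm: Callable[[str], str], workdir: str = "artifacts") -> bool:
    out = Path(workdir)
    out.mkdir(exist_ok=True)
    requirement = Path(requirement_path).read_text()

    # Planning: ask the local model for a plan and persist it as an artifact.
    plan = llm("Write an implementation plan for this requirement:\n" + requirement)
    (out / "implementation_plan.txt").write_text(plan)

    # Code: generate a script from the plan and persist it as an artifact.
    script = out / "solution.py"
    script.write_text(llm("Write a Python script implementing this plan:\n" + plan))

    # Execution: run the generated script in a subprocess and keep the log.
    proc = subprocess.run(["python", str(script)], capture_output=True, text=True)
    (out / "execution_log.txt").write_text(proc.stdout + proc.stderr)

    # Verification: here just the exit code; the real step would also check the
    # produced outputs against the acceptance checklist.
    ok = proc.returncode == 0
    (out / "task_state.txt").write_text("done" if ok else "failed")
    return ok
```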
Repo (with demos and execution traces):
https://github.com/setsuna701031/ZERO_AI
Would love feedback on:
- Does this approach to agent loops make sense?
- Where would you draw the boundary between planning and execution?
- What would you prioritize next (reliability vs capability)?
pdycnbl@reddit
I'm doing something similar but in TypeScript. In principle it's the same approach, but I'm building a much more focused agent for file editing only: it can be given instructions in English and it figures out how to make the changes to the file. It's a single file that takes a prompt, creates a plan, and executes the plan. I'm targeting sub-2B models. It works, but reliability is still an issue. It sometimes adds spurious changes, like removing unrelated comments from code.
Outside-System-3698@reddit (OP)
This is super interesting — I’m seeing very similar issues on my side.
Especially the part where the agent introduces unintended changes (like removing unrelated code or comments). That seems to be a common failure mode when planning and execution are too tightly coupled.
In my current setup, I'm trying to separate planning from execution more strictly, and then add a verification step after execution to catch unintended modifications.
It’s still not perfect, but it helps reduce silent corruption.
Curious — are you doing any post-execution validation or diff checking?
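For reference, this is roughly what I mean by diff checking, as a simplified sketch rather than the actual implementation in the repo: flag any changed lines that the plan never said it would touch.

```python
# Simplified sketch of a post-execution diff check: flag changed lines that
# don't match anything the plan said it would touch.
import difflib

def unexpected_changes(before: str, after: str, allowed_fragments: list[str]) -> list[str]:
    """Return changed lines that can't be matched to an allowed plan fragment."""
    diff = difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="")
    suspicious = []
    for line in diff:
        # Skip the file headers, keep only added/removed lines.
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---")):
            changed = line[1:].strip()
            if changed and not any(frag in changed for frag in allowed_fragments):
                suspicious.append(line)
    return suspicious
```

If that returns anything, the change gets rejected or sent back for re-planning.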
pdycnbl@reddit
Yes, it reads the file again and confirms that the changes are present after every change. The problem is in the planning stage itself: it adds instructions for unrelated changes that were not in the original task but which it "thought" should be there. After that, it works relatively well.
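That read-back check, as a minimal sketch of the idea (their agent is TypeScript; Python here only to keep the examples consistent):

```python
# Minimal read-back check: re-open the file and confirm the intended edit landed.
from pathlib import Path

def change_applied(path: str, expected_snippet: str) -> bool:
    return expected_snippet in Path(path).read_text()
```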
Outside-System-3698@reddit (OP)
That makes a lot of sense — I’m seeing the same pattern.
Execution checks help ensure that changes are applied, but they don’t prevent the planner from introducing extra intent.
I’m starting to think the core issue is that the planner is allowed to “extend” the task instead of strictly interpreting it.
One idea I've been experimenting with is validating the planner's output against the original requirement before anything is executed, so every plan step has to map back to something that was actually asked for.
Almost like a "requirement → plan traceability" constraint.
Have you tried anything like constraining the planner output or validating it before execution?
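Roughly what I have in mind, as a toy sketch (not something that's in the repo yet). A real check would probably ask the model to cite the requirement line each step comes from, but the idea is the same:

```python
# Toy "requirement -> plan traceability" check: reject plan steps that share no
# content words with the original requirement. Purely illustrative.
import re

STOPWORDS = {"the", "a", "an", "to", "and", "of", "in", "for", "it", "this", "that"}

def content_words(text: str) -> set[str]:
    return {w for w in re.findall(r"[a-z0-9_]+", text.lower()) if w not in STOPWORDS}

def untraceable_steps(requirement: str, plan_steps: list[str]) -> list[str]:
    """Return plan steps that can't be traced back to the requirement."""
    req_words = content_words(requirement)
    return [step for step in plan_steps if not (content_words(step) & req_words)]
```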
genielabs@reddit
Try with HomeGenie. I've also been experimenting with a 1.7B model (qwen3) and it is quite reliable so far.
genielabs@reddit
Reading what you are trying to achieve made me think of this example I wrote: Multi-Agent Newsroom. The good thing is that once the AI "creates" its agents, they are deterministic, because their loop is translated into code you can actually verify and edit if needed.
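A generic illustration of that "loop translated into code" idea (not HomeGenie's actual output, just what a compiled, deterministic agent loop can look like once it's plain code you can read and edit):

```python
# Once the loop is ordinary code, the control flow is fixed and reviewable
# instead of being re-decided by the model on every run.
def newsroom_loop(fetch_headlines, summarize, publish) -> None:
    for headline in fetch_headlines():
        publish(summarize(headline))
```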
Outside-System-3698@reddit (OP)
Additional context:
This is not meant to be a polished product yet — more like an engineering experiment around agent loops.
What I’m trying to figure out is:
- how far a local agent can go without cloud orchestration
- how to keep execution transparent (artifacts, logs, outputs)
- where the boundary between planner and executor should be
If anyone has built similar systems or experimented with agent runtimes, I’d really like to hear your experience.