TDD enforcement primitive for coding agents — Tests the agent literally can't modify (formal spec + 120-line Python impl, MIT)
Posted by S-J-Rau | LocalLLaMA
Spent the last few months building AI agent frameworks locally after running
into the same failure mode over and over: the agent writes code, a test fails,
and instead of fixing the code it "fixes" the test. AutoGen, LangChain,
CrewAI: none of them prevent this structurally. It's not a model-quality issue.
So I extracted a standalone primitive from the larger framework I've been
validating and wrote it up with formal invariants. Diagram attached.
Four primitives:
1. Blueprint Layer — plans tests in a context the agent can't see
2. Test Queue — append-only, ordered
3. TestLock — SHA-256 seal at commit time; agent gets the hash, never the source
4. Gate Condition — code gen only if a sealed test is RED
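To make the queue and seal concrete, here's a minimal stdlib-only sketch of primitives 2 and 3. All names here are mine for illustration, not taken from the repo: the agent-facing side only ever sees the SHA-256 digest, while the test source stays in a hidden store that supports append and verify but no mutation.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class SealedTest:
    test_id: str
    digest: str  # SHA-256 of the test source; this is all the agent sees

class TestQueue:
    """Append-only, ordered queue of sealed tests (hypothetical sketch)."""

    def __init__(self):
        self._sealed = []    # agent-visible entries (hashes only), in commit order
        self._sources = {}   # test_id -> source, never exposed to the agent

    def commit(self, test_id: str, source: str) -> SealedTest:
        # No re-commit, no overwrite: once sealed, a test is immutable (I2).
        if test_id in self._sources:
            raise ValueError(f"{test_id} is already sealed")
        digest = hashlib.sha256(source.encode()).hexdigest()
        self._sources[test_id] = source
        entry = SealedTest(test_id, digest)
        self._sealed.append(entry)
        return entry

    def verify(self, test_id: str) -> bool:
        # Tamper check: re-hash the stored source against the seal.
        entry = next(e for e in self._sealed if e.test_id == test_id)
        actual = hashlib.sha256(self._sources[test_id].encode()).hexdigest()
        return actual == entry.digest
```

Committing returns the seal, a second commit of the same id raises, and verify catches any out-of-band edit to the stored source.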
Three invariants (falsifiable):
I₁ TEMPORAL: commit(t) < generate(c)
I₂ STRUCTURAL: locked(t) → ¬modifiable(agent, t)
I₃ BEHAVIORAL: generate(c) iff ∃ t: locked(t) ∧ fails(t)
Feedback loop: Test fails → agent retries CODE, never tests. Tests are reality.
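The gate condition (I₃) can be sketched in a few lines: code generation is permitted only while at least one sealed test is RED, so a fully green suite closes the gate. Again a hypothetical sketch, not the repo's API; here a RED test is simply one whose source raises when executed.

```python
def run_test(source: str, namespace: dict) -> bool:
    """Return True if the sealed test passes (GREEN), False if it fails (RED)."""
    try:
        # Run against a copy so a test can't mutate the shared namespace.
        exec(source, dict(namespace))
        return True
    except Exception:  # AssertionError, NameError, etc. all count as RED
        return False

def gate_allows_generation(sealed_sources: dict, namespace: dict) -> bool:
    # I3 BEHAVIORAL: generate(c) iff there exists a locked test t that fails.
    return any(not run_test(src, namespace) for src in sealed_sources.values())
```

With a sealed test `assert add(1, 2) == 3` and no `add` defined, the gate is open; once the agent's code makes the test pass, the gate closes and generation stops.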
Works with any local LLM. I've tested the larger framework across 9 actors
(Qwen, DeepSeek, Claude, Codex, Gemini, Nemotron) with score variance under
15 points, which supports the actor-exchangeability claim. The reference
implementation is Python stdlib only.
Paper: https://doi.org/10.5281/zenodo.19393854 (STP standalone)
Context: https://doi.org/10.5281/zenodo.19378044 (Triple-A Thesis)
Code: https://github.com/SebazzProductions/sealed-test-paradigm.git
Happy to answer questions about the invariants or why this matters more for
local/autonomous agents than cloud ones.