Decent model to "quickly" recognize rule violations?

Posted by xephadoodle@reddit | LocalLLaMA | View on Reddit | 7 comments

Hello all, I am building an AI agent orchestrator of sorts, and am wanting to be able to add in a local model that could quickly recognize whether the ai agents are breaking basic rules, like trying to stash files to avoid fixing tests, or mentioning anything about "simplifying" the code or tests (always a bad sign the agent is going the lazy route), etc.

I have a 24gb nvidia on hand, but I am unsure which models could be given some basic rule context and do reliable/quick flagging of violations.

Thanks in advance, and sorry if this might be a dumb/impossible question.

[-]

jduartedj@reddit

for that kind of low-latency classification, you don't really need a big general model. on 24gb you could run something like Qwen2.5-7B-Instruct or Llama-3.1-8B and get sub-second judgements with vllm or llama.cpp, but honestly even smaller models like Qwen2.5-3B finetuned (or just well-prompted with a few examples) work surprisingly well for binary/categorical classification.the trick is structuring the prompt as a strict yes/no with a fixed schema output (json with violation: bool, rule_id: str). that way you avoid the model rambling and you can hard-fail on parse errors. i'd also keep the rules list short per call (maybe 5-8 rules max in context), and route between specialized prompts if you have many categories. helps a lot with consistency.not a dumb question btw, this is basically how a lot of guardrail systems work in production

[-]

xephadoodle@reddit (OP)

Awesome, thanks for this, it was quite helpful

[-]

jduartedj@reddit

glad it helped! good luck with the orchestrator, sounds like a cool project

[-]

ComplexType568@reddit

I have a feeling anything 3.5 would beat that.

[-]

jduartedj@reddit

yeah fair, qwen3 7b/8b would prob smoke 2.5 at this kinda task. i kinda defaulted to 2.5 cuz the tooling is so well-baked at this point but you're right, the newer gen handles structured output way better out of the box. only thing i'd watch is vram headroom for the longer context windows on 3.5, but for short rule-check prompts its a non-issue

[-]

madsheepPL@reddit

oss safeguard 20b comes to mind

[-]

xephadoodle@reddit (OP)

That does seem like it fits the bill, but how "fast" is it? I am hoping to find a solution that could handle monitoring 3-5 agents at once. They are not super fast agents though, think Claude/GPT-5/etc.