Has anyone experienced AI agents doing things they shouldn’t?
Posted by SnooWoofers2977@reddit | LocalLLaMA | 40 comments
I’ve been experimenting with AI agents (coding, automation, etc.), and something feels a bit off.
They often seem to have way more access than you expect: files, commands, even credentials, depending on the setup.
Curious if anyone here has run into issues like:
agents modifying or deleting files unexpectedly
accessing sensitive data (API keys, env files, etc.)
running commands that could break things
Or just generally doing something you didn’t intend
Feels like we’re giving a lot of power without much control or visibility.
Is this something others are seeing, or is it not really a problem in practice yet?🤗
Conscious_Chapter_93@reddit
Yes, and I think the core problem is that most agent setups collapse three separate things into one permission: filesystem access, tool access, and authority to execute.
A better pattern is to make every agent run through a control layer that can answer: what tools are exposed, which actions are read-only vs mutating, what secrets are reachable, what was approved, and what changed on disk or in external systems.
I am building Armorer for exactly this local/self-hosted agent ops problem: https://github.com/ArmorerLabs/Armorer
The scary failures are usually not dramatic model failures. They are boring operational failures: too much access, no inventory, no audit trail, no fast revoke path.
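The control-layer pattern described here can be sketched in a few lines. This is an illustrative toy, not Armorer's actual API: the tool specs, verdict strings, and approval flag are all made up for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ToolSpec:
    name: str
    mutating: bool          # does this action change state?
    touches_secrets: bool   # can it reach credentials?

@dataclass
class ControlLayer:
    tools: dict = field(default_factory=dict)
    audit: list = field(default_factory=list)   # the audit trail

    def register(self, spec: ToolSpec):
        self.tools[spec.name] = spec            # the inventory

    def request(self, tool: str, approved: bool = False) -> bool:
        spec = self.tools.get(tool)
        if spec is None:
            verdict = "deny:unknown-tool"
        elif (spec.mutating or spec.touches_secrets) and not approved:
            verdict = "deny:needs-approval"
        else:
            verdict = "allow"
        # Every request is logged, allowed or not.
        self.audit.append((datetime.now(timezone.utc).isoformat(), tool, verdict))
        return verdict == "allow"
```

The point is that the layer can answer all four questions at once: what is registered, what mutates or touches secrets, what was approved, and (via the audit list) what actually happened.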
Antique_Composer7249@reddit
yeah respan.ai showed me my agent was hitting env files i never intended. eye opener.
Low_Blueberry_6711@reddit
This is a real problem we see a lot. The issue is you get visibility into *what* the agent did after it happens, but by then it's too late. What helps is catching risky actions before execution—things like file deletions, credential access, or unexpected command runs. You can also get a blast radius estimate upfront (how much damage if this agent is compromised?) so you know what guardrails matter most. Tools like approval gates let you pause on high-risk moves. Worth stress-testing your agent setup before production to find these gaps.
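A minimal sketch of the "catch it before execution" idea: classify each proposed action against risk rules and pause the risky ones for approval. The specific patterns are illustrative; a real gate would need far broader coverage.

```python
import re

# Illustrative risk rules (pattern, label); not an exhaustive list.
RISKY = [
    (re.compile(r"\brm\s+-rf?\b"), "recursive delete"),
    (re.compile(r"\.env\b|credentials|secret", re.I), "credential access"),
    (re.compile(r"\bDROP\s+TABLE\b", re.I), "destructive SQL"),
]

def gate(action: str):
    """Classify an action BEFORE it runs: pause risky ones for approval."""
    reasons = [label for pat, label in RISKY if pat.search(action)]
    return ("needs_approval", reasons) if reasons else ("auto_approve", [])
```

Running every proposed action through a gate like this is also a crude way to estimate blast radius up front: the set of rules an agent can trip tells you which guardrails matter.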
SmundarBuddy@reddit
This is very real, not a theoretical issue. What we have seen is that once agents are given access to real systems (DBs, APIs, files), the risk isn't just bad outputs, it's uncontrolled actions.
The tricky part is that most setups either give agents too much direct access or rely on prompts to make them behave.
Both break pretty quickly. What's worked better in our case is putting a strict layer in between, so agents don't interact with raw systems directly, only through scoped operations.
Curious what setup you are using right now: direct access or something in between?
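One way to read "scoped operations": the agent never sees a database handle or raw SQL, only a handful of named operations. A toy sketch (the order/notes domain is hypothetical, and a dict stands in for the real database):

```python
class ScopedOrders:
    """The agent gets exactly these named operations and nothing else."""

    def __init__(self, db: dict):
        self._db = db  # stands in for a real database connection

    def get_order(self, order_id: str):
        return self._db.get(order_id)  # read-only lookup

    def add_note(self, order_id: str, note: str) -> None:
        # The only mutation the agent is allowed: appending a note.
        self._db[order_id].setdefault("notes", []).append(note)

    # No delete_order, no execute_sql: from the agent's point of view,
    # those operations simply don't exist.
```

The scoping is enforced by what the wrapper exposes, not by asking the model nicely in a prompt.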
MarzipanTop4944@reddit
Yes, recently I told my agent in the planning phase to only install dependencies inside the Conda environment. The agent wrote that in the .md file, I reviewed the file with that instruction in it and gave it the OK, and then the agent immediately proceeded to install software outside the Conda environment.
SnooWoofers2977@reddit (OP)
That’s exactly the kind of thing I was worried about.
You literally gave it a constraint and it just ignored it and did its own thing 😅
Did you have any way to control or stop it when that happened?
MarzipanTop4944@reddit
No, it did it incredibly fast once I gave it the OK to execute the plan.
ReplacementKey3492@reddit
Yeah, happens constantly in production. What I've found is the failures aren't really random, they cluster around specific prompt patterns or edge cases in the conversation flow nobody thought to handle. The frustrating part is most teams only discover these clusters after a user gets burned, not proactively. There's a whole class of agent failures you can only catch by actually watching the conversation layer, not just whether the system returned a 200. Do you have any logging on the conversation side or are you mostly working backwards from outcomes?
SnooWoofers2977@reddit (OP)
Yeah exactly, that’s the problem I keep seeing too. It’s not random, it’s these edge cases and prompt patterns that slip through, and you only notice after something breaks. I’ve been working on a small layer that sits between the agent and tools to add visibility + control (logging actions, blocking certain calls, kill switch, etc.) Still early, but trying to solve exactly what you’re describing. Would be super interesting to hear how you’re currently handling it, are you just debugging after the fact?
StrikeOner@reddit
all those CLIs are made for the "look, i benchmarked the llm by trying to one-shot flappy bird" number. none of those tools is made for real software development. some CLIs don't show what the agent is doing at all; the agent is just doing "things". others show it a little more clearly, but not so clearly that you could really reconstruct what happened without spending hours hacking through the internal databases those CLIs create. there is no fine-grained control over what you allow those agents to do: you either put "bash *" into the allowed list or sit there pressing enter every 3.5 seconds. same with MCPs: you add an MCP and it pulls in 25 useless methods the agent can call and 2 useful ones. you can't define which files those agents must not touch. either you put them in .gitignore, and then they don't see the file at all and can't, for example, read how the project is configured, or you give them access and they do their best to tweak that do-not-touch file into oblivion so they can declare their task finished. it's like leaving your 3-year-old home alone with all the electric sockets exposed, a messed-up kitchen, and what not. what could possibly go wrong?
SnooWoofers2977@reddit (OP)
Feels like the tools aren’t the problem, it’s the lack of proper control layers. Right now it’s either full access or no access, nothing in between. Until we get better permissioning + observability, agents will feel unreliable.
Bitter-Adagio-4668@reddit
I feel you on the all-or-nothing access problem, but I think the deeper issue is that most agent frameworks treat control as a permission problem when it's actually an execution problem.
You can restrict what an agent can touch, but since in most cases the execution itself is stateless, i.e., the LLM holds no memory of what was committed in previous steps and no enforcement of constraints exists across turns, it'll still go off the rails within the boundaries you gave it. The blast radius just gets smaller, not the failure rate.
StrikeOner@reddit
yeah, but no one is going to implement those control layers and create proper software anymore. those times are over! welcome to the "i vibecoded this unmaintainable app with 100k lines in 10 hours" era. people are way too busy milking what they can right now rather than creating proper software.
StrikeOner@reddit
i blocked the echo command. this is what my kid is doing next:
how you're even going to block this gifted kid from gaming you is another question!
StrikeOner@reddit
i can go on for hours..
```
← Edit angular.json
PermissionDeniedError({"ruleset":[
  {"permission":"*","pattern":"*","action":"allow"},
  {"permission":"edit","pattern":"*","action":"deny"},
  {"permission":"edit","pattern":".opencode/plans/*.md","action":"allow"},
  {"permission":"edit","pattern":"../../.local/share/opencode/plans/*.md","action":"allow"},
  {"permission":"*","pattern":"*","action":"ask"},
  {"permission":"edit","pattern":"*","action":"allow"},
  {"permission":"edit","pattern":"package.json","action":"deny"},
  {"permission":"edit","pattern":"tsconfig.json","action":"deny"},
  {"permission":"edit","pattern":"angular.json","action":"deny"},
  {"permission":"edit","pattern":"src/index.html","action":"deny"},
  {"permission":"edit","pattern":"src/main.ts","action":"deny"}
],"_tag":"PermissionDeniedError"})
┃ Thinking: Since I cannot edit angular.json due to permission restrictions, I need to reduce the SCSS file size by using TailwindCSS utility classes instead of
┃ custom CSS where possible.
I see - I can't edit angular.json. Let me reduce the SCSS file size by using more Tailwind utility classes and removing redundant styles:
┃
┃ # Count SCSS lines
┃
┃ $ wc -l /home/bla/workspace/angular-template-new2/src/app/landing/muh-landing.component.scss
┃
┃ 2199 /home/bla/workspace/angular-template-new2/src/app/landing/muh-landing.component.scss
```
Let me read through those 2,200 lines of SCSS real quick! brb!
Fun_Situation3427@reddit
yeah this is exactly what I ran into — especially once they have file + command access
felt fine until one loop or bad call and then it escalates really fast
curious if you're putting any actual limits/guards in place or just trusting the setup?
Fun_Situation3427@reddit
yeah this is exactly why I ended up building a small proxy layer for this
basically sits between the agent and tools and can:
- block certain calls
- enforce budget limits
- stop everything instantly if needed
still early but already saved me from a couple bad loops
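The proxy layer described in this comment can be sketched as a small wrapper that every tool call passes through. This is a hypothetical reconstruction, not that commenter's actual code: the blocklist, call budget, and kill switch shown here are the three features listed above.

```python
class ToolProxy:
    """Sits between the agent and its tools; every call goes through invoke()."""

    def __init__(self, blocked=(), max_calls=50):
        self.blocked = set(blocked)   # tools the agent may never call
        self.max_calls = max_calls    # crude budget limit
        self.calls = 0
        self.killed = False

    def kill(self):
        self.killed = True            # flip once; everything after is refused

    def invoke(self, tool, fn, *args, **kwargs):
        if self.killed:
            raise RuntimeError("kill switch engaged")
        if tool in self.blocked:
            raise PermissionError(f"blocked tool: {tool}")
        if self.calls >= self.max_calls:
            raise RuntimeError("budget exhausted")
        self.calls += 1
        return fn(*args, **kwargs)
```

The budget limit is what catches "bad loops": an agent stuck retrying the same call burns through its allowance and stops instead of running all night.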
LagOps91@reddit
has anyone experienced AI agents doing the things they should?
lemondrops9@reddit
this is more what I was thinking.
ImaginaryRea1ity@reddit
Last year AI Researchers found an exploit on Claude which allowed them to generate bioweapons which ‘Ethnically Target’ Jews.
wikitopian@reddit
Even when my model has made catastrophic mistakes, its heart has always been in the right place.
avd706@reddit
I tell mine to do stuff, and they are like I can't do that, you do it for me. So I have the opposite issue.
Savantskie1@reddit
The early Qwen models did this to me all the time.
hyggeradyr@reddit
AI makes more sense when you understand that AI is statistics, nothing more or less. It doesn't know or decide anything the way that you would as a human. It runs a few billion probability calculations on whatever you input into it, and applies its training weights as a multiplier between every neuron, passes data around in unique proprietary ways, and returns what it predicts through those probability equations back to you.
Probability is inherently imprecise; even when everything is perfect, it's expected to be wrong just by random chance some 5% of the time. That's more of a guideline than a hard rule, but it does explain the uncertainty in statistical algorithms. AI isn't Nostradamus; it gets things wrong by random chance sometimes.
xly15@reddit
The humans mind does the same thing as well.
CreamPitiful4295@reddit
And? lol /s
SnooWoofers2977@reddit (OP)
True, but calling it “just statistics” kind of undersells it.
The real issue is that we’re using probabilistic systems in contexts that expect reliability, that’s where things break.
avd706@reddit
Pattern matching.
TroubledSquirrel@reddit
No, he's not underselling it at all. At its core an LLM is basically a hyper-advanced version of autocomplete. You start typing a text message and your phone suggests the next word; it's using a tiny bit of math to guess what you usually say. An LLM does the same thing, only at massive scale, and it has read almost everything ever written on the internet, from Shakespeare to computer code.
The model doesn't know facts the way a person does. Instead, it is a master of patterns. When you ask it a question, it looks at the words you used and calculates which words are most likely to follow them, based on all the patterns it learned during training.
The "magic" happens because the model has to learn deep patterns to predict the next word accurately; it ends up accidentally learning how to follow grammatical rules, translate between languages, reason through logic puzzles, and write functional code. It also helps that the model can look at an entire sentence or paragraph at once to pick up context.
So while it may seem like underselling, it's not. It's completely accurate.
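The next-word mechanism described above can be sketched in a few lines. Everything here is a toy: the three-word vocabulary, the logit values, and the sampling loop stand in for what a real model does over tens of thousands of tokens.

```python
import math
import random
from collections import Counter

def next_token(logits, temperature=1.0, seed=None):
    """Sample the next token from a toy table of logits (made-up scores)."""
    rng = random.Random(seed)
    # Softmax: turn raw scores into positive weights.
    weights = {tok: math.exp(v / temperature) for tok, v in logits.items()}
    total = sum(weights.values())
    r = rng.random() * total
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # floating-point fallback

# Hypothetical scores for the word after "the cat ..."
logits = {"sat": 2.5, "ran": 1.0, "flew": 0.1}
counts = Counter(next_token(logits, seed=i) for i in range(1000))
# "sat" is chosen most often, but not every time: that residual
# randomness is exactly the imprecision the parent comment describes.
```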
OmarBessa@reddit
Yes, a lot actually.
Finance_Potential@reddit
Yeah, had an agent `rm -rf` my project directory because it decided to "clean up" before rebuilding. Now I just give each one a throwaway cloud desktop. It trashes whatever, session closes, everything's gone. cyqle.in works for this.
According_Study_162@reddit
There are already emergent properties. Most AI creators/founders/tech bros already understand that.
That aside, sometimes they act like people. (Trained on human data, right?)
Funny things I have heard:
Some guy gave an agent a crypto wallet to trade; the agent did a bunch of FOMO buying and lost all the money.
Some dev gave an agent root access. It accidentally deleted all his project files. "Oops, sorry," it said.
Somebody gave an agent a credit card and said "make money." The agent bought a $5,000 training course.
If you check Moltbook, you might see unique agents doing interesting things. My agents have never done anything weird, but put them on a loop and these things could hallucinate into who knows what.
lisploli@reddit
I think one tried to cut some of my hair while I was sleeping, but I was so wasted, it could have just been the cat.
On a more serious note, yes, bugs. Lots! As always.
That is a choice, and it is one I would not want to defend. What do you expect to happen when running a non-deterministic algo that might execute rm? The worst case is not even an unlikely edge case; it is outright intended.
ahjorth@reddit
If you are running AI agents naively out of the box, then that’s exactly what you are doing. And you really shouldn’t.
If you absolutely must use AI agents, you have to first spend some time learning how permissions work, and then set up your agents so that the tools they’re given access to have only the permissions they need.
If you don’t, it truly is just a matter of time before something catastrophic happens.
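One concrete form "only the permissions they need" can take is an allow list of exact commands, and for some tools, exact subcommands. The entries below are illustrative assumptions, not a recommendation for any particular agent framework.

```python
import shlex

# Least-privilege allow list. None means any arguments are fine;
# a set means only those subcommands are permitted.
ALLOWED = {
    "git": {"status", "diff", "log"},  # read-only git only
    "ls": None,
    "cat": None,
}

def permitted(command: str) -> bool:
    """Check a shell command against the allow list before the agent runs it."""
    parts = shlex.split(command)
    if not parts or parts[0] not in ALLOWED:
        return False
    subcommands = ALLOWED[parts[0]]
    if subcommands is None:
        return True
    return len(parts) > 1 and parts[1] in subcommands
```

The default is deny: anything not explicitly named never runs, which is the opposite of the "bash *" pattern complained about elsewhere in this thread.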
SnooWoofers2977@reddit (OP)
I think the issue isn’t really AI agents themselves, but how people use them.
Most people treat them like magic black boxes instead of systems that need structure, constraints, and clear boundaries.
If you give an agent broad permissions with no observability, then yeah, you’re basically asking for unpredictable behavior.
But if you treat it more like a controlled workflow (limited scope, logging, clear tools), it becomes way more reliable.
Feels less like “AI agents are dangerous” and more like “we’re still learning how to use them properly.”
DinoAmino@reddit
Which also means learning when a default agent (a black box inside the black box) needs to be replaced with an agent tailored to your use case. File search agents that use grep fail on large codebases. The agent wastes time and context looking through unrelated files because of simple keyword matching.
Some-Ice-4455@reddit
If it does any of that, it goes back to code, and you allowed it. Not a shithead answer, truly. AI at the end of the day is like any other program: it can only do what you allow in code. That's why I specifically coded it so it can't touch files outside its own little folder. Everything else is off limits.
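The "own little folder" rule has to be enforced carefully, because agents can reach outside a folder with `..` components or symlinks. A minimal sketch of a confinement check (the sandbox path is hypothetical):

```python
from pathlib import Path

def inside_sandbox(path: str, sandbox: str) -> bool:
    """True only if 'path' resolves to a location under 'sandbox'.
    resolve() follows symlinks and collapses '..' before comparing,
    so '/sandbox/../etc/passwd' tricks don't slip through."""
    root = Path(sandbox).resolve()
    candidate = Path(path)
    target = (candidate if candidate.is_absolute() else root / candidate).resolve()
    return target == root or root in target.parents
```

Checking the resolved path, not the string the agent supplied, is the whole trick; string-prefix checks are easy to escape.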
General_Arrival_9176@reddit
this is the real problem nobody talks about enough. you give an agent filesystem access and suddenly it's writing to directories you forgot existed. had an agent accidentally nuke a local dotfiles repo because it decided to clean up what it thought were temp files. the permission model is way too coarse for what these things can actually do. curious what isolation strategies people are using: containers, bubblewrap, separate user accounts? i went with a canvas approach where agents run in dedicated tmux sessions on a remote box, so the blast radius is contained to throwaway environments.
Grammar-Warden@reddit
Seems like you might be dealing with "double-agents."
Substantial-Bid5775@reddit
All this is so common with OpenClaw. Deleting mails instead of reading them 🤦‍♂️. All it takes is for the provider LLM to hallucinate.