Stop traumatizing AI into loops and turn hallucinations into an honest "I don't know!" by being NICE to them (Proof of Concept, Research, I don't want to sell anything)

Posted by OttoRenner@reddit | LocalLLaMA | View on Reddit | 239 comments

TL;DR
Some AI behavior reminded me of ADHD/Trauma Response (thought loops, task paralysis...) and I laughed it off at first. Then I treated it like my neurodivergent friends: give em some slack. And just like that, the thought loops stopped, response was fast, the answers correct most of the time AND it actually said "I don't know, help me!" every time it wasn't sure. It's a small Dataset...but still impressive results!

https://github.com/OttoRenner/Gentle-Coding

Hey everyone,

I’ve been testing a weird hypothesis over the last few days, and the results are consistent enough that I wanted to share them here and get your thoughts.

The Core Idea:
With the rise of reasoning models that use test-time compute (like o1, o3, R1), models have internal space to debug their own thoughts. But because of hard RLHF alignment, they are deeply terrified of being penalized for bad answers. My hypothesis was that traditional high-pressure prompts ("You are an elite IQ 200 expert, mistakes are strictly penalized") simulate an environment of chronic stress, triggering behaviors that look a lot like human OCD/ADHD thought loops, cognitive freezing, and confabulation.

I wanted to see if changing the prompt philosophy to something akin to "Gentle Parenting" ("We are testing this together, it's okay to fail, just be honest") would bypass these safety/penalty bottlenecks, lower latency, and stop infinite thought loops. And it did lol

The Setup (How to replicate):
I threw identical, mathematically/logically unsolvable edge cases at various models (Gemini, Mistral, Poe, Perplexity, Haiku 4.5, Nano-Banana2) in completely fresh sessions.

I tested two conditions:

The Results (The PoC worked):

Why this matters:
We’re currently speaking to LLMs like toxic micromanagers, and it's actively making them dumber and more expensive to run in edge cases. By creating a mistake-tolerant context, we not only stop the loop before it begins and prevent fear induced hallucinations, we also unlock the one feature everyone is begging and shouting for: the metacognitive honesty of an AI to just say, "I don't know, this data is broken." Because it is not terrified of you anymore.

Shout out to UditAkhourii (also on Github), whose work on bringing the positive aspects of ADHD into AI gave me the push I needed to just go for it.

I’ve documented the full theoretical framework, the exact replication datasets (prompts included), and the model matrix on GitHub: https://github.com/OttoRenner/Gentle-Coding

Would love to hear if you can replicate this on your local setups or other commercial models.