We had to change how we validate HMI code after AI started writing most of our code

Posted by nevesincscH@reddit | ExperiencedDevs | View on Reddit | 19 comments

Wanted to document our case and share it with peers here.

Our team started using AI-assisted development seriously about 18 months ago: Claude Code for the bulk of the implementation work, with engineers reviewing and directing rather than writing line by line. Velocity went up significantly. That part worked.

What broke was everything downstream.

Our validation stack was built for human-written code: defined inputs, predictable structure, test cases written alongside the implementation. AI-generated code doesn't work that way. It produces implementations we wouldn't have written ourselves, edge cases we didn't anticipate, and UI behaviors that look correct in simulation but fail on the actual hardware cluster.

Three things we changed to tackle this:

Test generation had to move earlier. We used to write tests after implementation. With AI-generated code that's too late: by the time you're writing tests, you're already rationalizing what the code does rather than what it should do. Specs now drive generation, and tests get written before the AI touches anything.
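To make the spec-first idea concrete, here's a minimal sketch of deriving executable test vectors straight from a spec table, before any implementation exists. Everything here (the `SPEC` table, requirement IDs, `derive_cases`) is illustrative, not our actual tooling:

```python
# Hypothetical spec: requirement id -> (input speed range in km/h,
# expected warning-lamp state). The AI-generated implementation is
# later run against these vectors; the vectors never come from the code.
SPEC = {
    "REQ-HMI-012": ((0, 120), "off"),
    "REQ-HMI-013": ((121, 160), "amber"),
    "REQ-HMI-014": ((161, 300), "red"),
}

def derive_cases(spec):
    """Expand each spec row into boundary test vectors (low/high edges)."""
    cases = []
    for req_id, ((lo, hi), expected) in spec.items():
        for speed in (lo, hi):  # boundary values of each range
            cases.append((req_id, speed, expected))
    return cases

for req_id, speed, expected in derive_cases(SPEC):
    print(f"{req_id}: speed={speed} -> expect lamp {expected}")
```

The point is that the expected values are fixed by the spec row, so a reviewer checks the spec table rather than reverse-engineering generated code.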

Simulation coverage stopped being the benchmark. We were measuring coverage against a software environment that didn't match the physical HMI unit. Green sim results meant nothing for on-device behavior. We added a hardware-in-the-loop stage with Askui for on-device UI interaction and Robot Framework for orchestration.
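For anyone curious what the orchestration side looks like, Robot Framework maps keywords onto methods of a Python library class, so the HIL stage can be sketched like this. The device transport is entirely hypothetical (a fake in-memory cluster so the sketch runs without a bench); your real interface and the Askui calls would sit behind it:

```python
class FakeClusterDevice:
    """In-memory stand-in for the physical cluster, so this runs anywhere.
    Replace with your bench's real transport (CAN/serial/Askui-driven UI)."""
    def __init__(self):
        self.state = {"warning_lamp": "off"}

    def send(self, signal, value):
        if signal == "speed":  # toy behavior model for the sketch
            self.state["warning_lamp"] = "red" if value > 160 else "off"

    def read(self, signal):
        return self.state[signal]

class HILKeywords:
    """Keywords a Robot Framework suite can call against the cluster."""
    def __init__(self, device=None):
        self.device = device or FakeClusterDevice()

    def set_vehicle_speed(self, kmh):
        # Robot passes arguments as strings, hence the int() conversion
        self.device.send("speed", int(kmh))

    def warning_lamp_should_be(self, expected):
        actual = self.device.read("warning_lamp")
        if actual != expected:
            raise AssertionError(f"lamp is {actual!r}, expected {expected!r}")
```

In a suite this becomes `Library  HILKeywords`, then steps like `Set Vehicle Speed  180` followed by `Warning Lamp Should Be  red`, which is what lets non-code test specs drive the physical unit.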

Documentation became a pipeline artifact. ISO 26262 compliance docs were being produced manually after the fact. That doesn't scale when AI is generating code faster than a technical writer can document it. DocGen handles traceability now, tied directly to the test outputs.
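The core of the traceability step is simpler than it sounds: parse the test output, pull requirement IDs out of test names, and emit a requirement-to-status matrix. A minimal sketch of that idea (our internal DocGen does far more; the naming convention and sample XML here are illustrative):

```python
import re
import xml.etree.ElementTree as ET

# Sample JUnit-style output; real input would be the pipeline's report file.
# Assumed convention: test names embed requirement IDs like "REQ-HMI-012".
JUNIT_XML = """<testsuite>
  <testcase name="test_REQ-HMI-012_lamp_off" />
  <testcase name="test_REQ-HMI-013_lamp_amber">
    <failure message="lamp stayed off" />
  </testcase>
</testsuite>"""

def traceability(junit_xml):
    """Map each requirement ID found in test names to pass/fail status."""
    matrix = {}
    for case in ET.fromstring(junit_xml).iter("testcase"):
        m = re.search(r"REQ-[A-Z]+-\d+", case.get("name", ""))
        if not m:
            continue
        status = "fail" if case.find("failure") is not None else "pass"
        matrix[m.group()] = status
    return matrix
```

Because the matrix is regenerated from test outputs on every run, the compliance doc can never silently drift from what was actually verified.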

We concluded that AI made the writing fast and the validation expensive. The teams that are struggling right now are the ones who upgraded their development tooling without upgrading the feedback loop around it.

Anyone else navigating this? What does the validation stack look like at other teams dealing with the same shift? Let's compare notes!