We had to change how we validate HMI code after AI started writing most of our code

Posted by nevesincscH@reddit | ExperiencedDevs | View on Reddit | 19 comments

Wanted to document our case and share it with peers here.

Our team started using AI-assisted development seriously about 18 months ago: Claude Code for the bulk of the implementation work, with engineers reviewing and directing rather than writing line by line. Velocity went up significantly. That part worked.

What broke was everything downstream.

Our validation stack was built for human-written code: defined inputs, predictable structure, test cases written alongside the implementation. AI-generated code doesn't work that way. It produces implementations we wouldn't have written ourselves, edge cases we didn't anticipate, and UI behaviors that look correct in simulation but fail on the actual hardware cluster.

Three things we changed to tackle this:

Test generation had to move earlier. We used to write tests after implementation. With AI-generated code that's too late: by the time you're writing tests, you're already rationalizing what the code does rather than what it should do. Specs now drive generation, and tests get written before the AI touches anything.
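To make the spec-first idea concrete, here's a minimal sketch of deriving executable test vectors straight from a spec table, before any implementation exists. Everything here (the `SPEC` table, requirement IDs, `derive_cases`) is illustrative, not our actual tooling:

```python
# Hypothetical spec: requirement id -> (input speed range in km/h,
# expected warning-lamp state). The AI-generated implementation is
# later run against these vectors; the vectors never come from the code.
SPEC = {
    "REQ-HMI-012": ((0, 120), "off"),
    "REQ-HMI-013": ((121, 160), "amber"),
    "REQ-HMI-014": ((161, 300), "red"),
}

def derive_cases(spec):
    """Expand each spec row into boundary test vectors (low/high edges)."""
    cases = []
    for req_id, ((lo, hi), expected) in spec.items():
        for speed in (lo, hi):  # boundary values of each range
            cases.append((req_id, speed, expected))
    return cases

for req_id, speed, expected in derive_cases(SPEC):
    print(f"{req_id}: speed={speed} -> expect lamp {expected}")
```

The point is that the expected values are fixed by the spec row, so a reviewer checks the spec table rather than reverse-engineering generated code.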

Simulation coverage stopped being the benchmark. We were measuring coverage against a software environment that didn't match the physical HMI unit. Green sim results meant nothing for on-device behavior. We added a hardware-in-the-loop stage with Askui for on-device UI interaction and Robot Framework for orchestration.
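For anyone curious what the orchestration side looks like, Robot Framework maps keywords onto methods of a Python library class, so the HIL stage can be sketched like this. The device transport is entirely hypothetical (a fake in-memory cluster so the sketch runs without a bench); your real interface and the Askui calls would sit behind it:

```python
class FakeClusterDevice:
    """In-memory stand-in for the physical cluster, so this runs anywhere.
    Replace with your bench's real transport (CAN/serial/Askui-driven UI)."""
    def __init__(self):
        self.state = {"warning_lamp": "off"}

    def send(self, signal, value):
        if signal == "speed":  # toy behavior model for the sketch
            self.state["warning_lamp"] = "red" if value > 160 else "off"

    def read(self, signal):
        return self.state[signal]

class HILKeywords:
    """Keywords a Robot Framework suite can call against the cluster."""
    def __init__(self, device=None):
        self.device = device or FakeClusterDevice()

    def set_vehicle_speed(self, kmh):
        # Robot passes arguments as strings, hence the int() conversion
        self.device.send("speed", int(kmh))

    def warning_lamp_should_be(self, expected):
        actual = self.device.read("warning_lamp")
        if actual != expected:
            raise AssertionError(f"lamp is {actual!r}, expected {expected!r}")
```

In a suite this becomes `Library  HILKeywords`, then steps like `Set Vehicle Speed  180` followed by `Warning Lamp Should Be  red`, which is what lets non-code test specs drive the physical unit.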

Documentation became a pipeline artifact. ISO 26262 compliance docs were being produced manually after the fact. That doesn't scale when AI is generating code faster than a technical writer can document it. DocGen handles traceability now, tied directly to the test outputs.
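The core of the traceability step is simpler than it sounds: parse the test output, pull requirement IDs out of test names, and emit a requirement-to-status matrix. A minimal sketch of that idea (our internal DocGen does far more; the naming convention and sample XML here are illustrative):

```python
import re
import xml.etree.ElementTree as ET

# Sample JUnit-style output; real input would be the pipeline's report file.
# Assumed convention: test names embed requirement IDs like "REQ-HMI-012".
JUNIT_XML = """<testsuite>
  <testcase name="test_REQ-HMI-012_lamp_off" />
  <testcase name="test_REQ-HMI-013_lamp_amber">
    <failure message="lamp stayed off" />
  </testcase>
</testsuite>"""

def traceability(junit_xml):
    """Map each requirement ID found in test names to pass/fail status."""
    matrix = {}
    for case in ET.fromstring(junit_xml).iter("testcase"):
        m = re.search(r"REQ-[A-Z]+-\d+", case.get("name", ""))
        if not m:
            continue
        status = "fail" if case.find("failure") is not None else "pass"
        matrix[m.group()] = status
    return matrix
```

Because the matrix is regenerated from test outputs on every run, the compliance doc can never silently drift from what was actually verified.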

We concluded that AI made the writing fast and the validation expensive. The teams that are struggling right now are the ones who upgraded their development tooling without upgrading the feedback loop around it.

Anyone else navigating this? What does the validation stack look like at other teams dealing with the same shift? Let's compare notes!