Management doesn't care about determinism anymore within our systems

Posted by Nezrann@reddit | ExperiencedDevs | 15 comments

I work as an SDET and have seen a number of AI tools cross my desk, each heralded as the second coming of Christ.

A staggering number of these tools are built around "automated regression": you give an agent some kind of prompt, have it reason over an image or data, and have it answer questions. Has something changed? Has something broken? Do you see X?

To me, determinism is the single most important thing about doing regression. If I can't guarantee my inputs and outputs, what the hell is the point?

But, slap AI on the tool and management will tell you quickly, "use it".

experienceddevsb@reddit

This flair is only allowed on Wednesday and Saturday (UTC). Please repost on an allowed day. Intentionally trying to circumvent this rule will result in a suspension. See: https://www.reddit.com/r/ExperiencedDevs/comments/1rfhdrg/moderation_changes/

Healthy_Albatross_73@reddit

Babe it was my turn to complain about AI today!

me_myself_ai@reddit

> To me, determinism is the single most important thing about doing regression

This doesn't really apply to companies working at scale. I know, I know, it felt icky to me too. But it's just how the economics pencil out, and it has been since long before AI.

throwaway_0x90@reddit

I'll keep repeating it as many times as necessary:

***measurable impact***

Do not just go off of "vibes"; present metrics to management showing that errors cause problems. Or let them learn the way Amazon did.
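
One way to make "measurable impact" concrete is to run a tool against a build with defects you already know about and count what it catches, what it misses, and what it invents. A minimal Python sketch, assuming you keep a labeled list of known defect IDs; `score_tool`, `Metrics`, and the IDs below are hypothetical:

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Metrics:
    true_positives: int
    false_positives: int
    false_negatives: int

    @property
    def precision(self) -> float:
        """Share of flags that were real defects."""
        flagged = self.true_positives + self.false_positives
        return self.true_positives / flagged if flagged else 0.0

    @property
    def recall(self) -> float:
        """Share of known defects the tool actually flagged."""
        actual = self.true_positives + self.false_negatives
        return self.true_positives / actual if actual else 0.0


def score_tool(flagged: Set[str], known_defects: Set[str]) -> Metrics:
    """Compare the issue IDs a tool flagged with a labeled set of real defects."""
    return Metrics(
        true_positives=len(flagged & known_defects),
        false_positives=len(flagged - known_defects),
        false_negatives=len(known_defects - flagged),
    )


# Hypothetical data: defects seeded into a build vs. what the AI tool reported.
known = {"BUG-1", "BUG-2", "BUG-3", "BUG-4"}
ai_flags = {"BUG-1", "BUG-2", "NOISE-7"}
m = score_tool(ai_flags, known)
print(f"precision={m.precision:.2f} recall={m.recall:.2f}")  # precision=0.67 recall=0.50
```

Precision tells you how often a flag is real; recall tells you how many real defects the tool found. Tracked per release, both are easy to put in front of management.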

Nezrann@reddit (OP)

Great point, I'll keep that in mind.

I never want to come off as abrasive or anything, but I also feel like I should be pushing back more than I am, and of course presentable data is useful to that end.

throwaway_0x90@reddit

Throughout my career, I've found that objective data always wins, mainly because no (good) manager wants to get caught making mistakes when there's a solid paper trail showing they were warned.

If there's a horrible bug or a major outage and a postmortem meeting happens, and QA can show a big ol' list of bugs, emails, and documented risks/caveats that management ignored, then that's very bad for them.

bcameron1231@reddit

> But, slap AI on the tool and management will tell you quickly, "use it".

We should absolutely use it. But we should also do our own review/testing as well.

It's not a replacement; it's a tool, and it could certainly catch a few things we'd otherwise miss.

Never hurts to have another set of (digital) eyes.

Nezrann@reddit (OP)

Absolutely. I guess what I mean is that even the ones that aren't that great get the go-ahead from management without much forethought.

More and more of our systems are moving away from determinism and that seems alarming to me.

bcameron1231@reddit

Yea, I think that's fair and a healthy take. We just need to always remember that AI is currently a tool, not a replacement, and for the foreseeable future we need to be heavily involved. Convincing management of that, on the other hand? Yea, certainly a struggle.

For me, it's worth it as long as we aren't spending more time reviewing the AI output than we would spend on our own validation. Even with some of the earliest models, I haven't found that to happen (yet), despite the number of false positives they produce.

And it certainly doesn't replace good automated testing and unit testing.

itijara@reddit

I have issues with these systems, but non-determinism is not really one of them. I have seen "deterministic" tests go from green locally to red in GitHub because of things outside the tests themselves. Obviously, you don't want your tests randomly flipping, but what you really want are tests that actually catch issues and don't point out issues that don't exist. Most of the problems I have seen are with false negatives and false positives. If a system uses slightly different test data each time but always catches the same set of issues, I am OK with that. If it uses the same data and never catches an issue, that is a problem.
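
Varied test data and reproducibility are not mutually exclusive. A common approach, and the idea behind property-based testing libraries such as Hypothesis, is to generate fresh data from a seeded random number generator and log the seed, so any failure can be replayed exactly. A stdlib-only sketch; `make_order`, `order_total_cents`, and the item-order invariant are hypothetical stand-ins for a real system under test:

```python
import random
import time
from typing import Optional


def make_order(rng: random.Random) -> dict:
    """Hypothetical data generator: an order with 1-5 random line items."""
    return {
        "items": [
            {"qty": rng.randint(1, 10), "unit_price_cents": rng.randint(1, 10_000)}
            for _ in range(rng.randint(1, 5))
        ]
    }


def order_total_cents(order: dict) -> int:
    """Hypothetical system under test."""
    return sum(item["qty"] * item["unit_price_cents"] for item in order["items"])


def run_property_check(seed: Optional[int] = None, runs: int = 200) -> int:
    """Check that an order's total does not depend on line-item order."""
    if seed is None:
        seed = time.time_ns()  # different data on every run...
    print(f"property check seed={seed}")  # ...but logged, so a failure can be replayed
    rng = random.Random(seed)
    for case in range(runs):
        order = make_order(rng)
        shuffled = {"items": list(order["items"])}
        rng.shuffle(shuffled["items"])
        assert order_total_cents(order) == order_total_cents(shuffled), (
            f"seed={seed} case={case}: total changed when items were reordered"
        )
    return seed
```

Calling `run_property_check()` explores new data on each run; calling `run_property_check(seed=...)` with a logged seed regenerates exactly the same cases.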

Nezrann@reddit (OP)

Yeah, that's an interesting problem to solve. I guess it could be useful for agents to fill in the gaps with changed data and expectations.

Something to think about for sure.

Reasonable_Working47@reddit

One pattern I've seen work well is to create deterministic scripts that plug into the non-deterministic AI layer (and to use AI to create that deterministic layer as well). If you add AI skills that call those homegrown scripts, I think it's extremely powerful.

If it's AI all the way down, then yes, I think that isn't such a great idea. But I think testing can now be far more powerful if AI is integrated into the workflow.
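
A minimal sketch of that split, assuming the agent returns its tool choice as JSON: the model only decides *which* deterministic check to run and with what arguments, and an unknown tool name is rejected rather than improvised. `TOOLS`, the two checks, and the JSON shape are hypothetical; real agent frameworks define their own tool-calling formats.

```python
import json
from typing import Callable, Dict


# Deterministic layer: plain functions whose verdict never varies (hypothetical checks).
def check_http_status(actual: int, expected: int) -> bool:
    return actual == expected


def _version_tuple(version: str) -> tuple:
    return tuple(int(part) for part in version.split("."))


def check_firmware_version(reported: str, minimum: str) -> bool:
    return _version_tuple(reported) >= _version_tuple(minimum)


TOOLS: Dict[str, Callable[..., bool]] = {
    "check_http_status": check_http_status,
    "check_firmware_version": check_firmware_version,
}


def run_tool_call(raw_model_output: str) -> bool:
    """Execute a tool call proposed by the non-deterministic model.

    The model picks which check to run and with what arguments;
    the pass/fail verdict always comes from deterministic code.
    """
    call = json.loads(raw_model_output)
    name = call["tool"]
    if name not in TOOLS:
        raise ValueError(f"model requested unknown tool: {name!r}")
    return TOOLS[name](**call["args"])


# Stand-in for a model response; a real agent framework would produce this.
proposed = '{"tool": "check_firmware_version", "args": {"reported": "2.10.1", "minimum": "2.9.0"}}'
print(run_tool_call(proposed))  # True
```

Comparing versions as integer tuples keeps "2.10.1" above "2.9.0", which a plain string comparison would get wrong.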

Nezrann@reddit (OP)

Yeah this is kind of how I have leveraged it in my own tooling.

I built the framework we use, and I leverage AI to do NLP on test cases from PRDs and whatnot to fire off workflows. It then turns the workflows it generates into the steps it will use from then on, to avoid any drift in actionable sequences.

It calls to the framework itself and uses all of the utilities and functions there as opposed to coming up with its own way of doing things.
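
The "generate once, then replay" idea can be sketched like this, assuming generated workflows are lists of steps that name actions from the framework. Steps are produced on the first run, validated against the framework's known actions, frozen to disk keyed by a hash of the test-case text, and replayed on every later run without calling the model. `ALLOWED_ACTIONS`, `load_or_freeze_steps`, and the step format are hypothetical:

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, List

# Hypothetical set of actions the framework already implements.
ALLOWED_ACTIONS = {"power_on", "send_command", "assert_reading"}


def case_key(test_case_text: str) -> str:
    """Stable key for a test case: same text, same key."""
    return hashlib.sha256(test_case_text.encode("utf-8")).hexdigest()[:16]


def load_or_freeze_steps(
    test_case_text: str,
    generate_steps: Callable[[str], List[dict]],
    store: Path,
) -> List[dict]:
    """Return frozen steps for a test case, calling the generator only once."""
    path = store / f"{case_key(test_case_text)}.json"
    if path.exists():
        return json.loads(path.read_text())  # replay: no model call, no drift
    steps = generate_steps(test_case_text)  # first run: the model proposes steps
    unknown = {step["action"] for step in steps} - ALLOWED_ACTIONS
    if unknown:
        raise ValueError(f"generated steps use unknown actions: {sorted(unknown)}")
    store.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(steps, indent=2))
    return steps


# Usage with a fake generator standing in for the model.
calls = []


def fake_generator(text: str) -> List[dict]:
    calls.append(text)
    return [
        {"action": "power_on"},
        {"action": "assert_reading", "args": {"sensor": "temp", "max": 70}},
    ]


with tempfile.TemporaryDirectory() as tmp:
    case = "Device reports temperature under 70C after boot"
    first = load_or_freeze_steps(case, fake_generator, Path(tmp))
    second = load_or_freeze_steps(case, fake_generator, Path(tmp))
    print(first == second, len(calls))  # True 1
```

Keying on a hash of the test-case text means an edited PRD produces a new key and a fresh generation, while an unchanged one keeps replaying the same steps.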

OverclockingUnicorn@reddit

Honestly, as long as false positives from these sorts of gen-AI-based tools are low enough, it's fine. Yeah, it might miss stuff, and yeah, the output of two identical runs might differ, but as long as it's flagging actual issues, it's fine. Those issues are still valid and still need fixes (or should be ignored, and the tool should be smart enough to check for identical preexisting issues before re-raising).
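
Checking for preexisting issues before re-raising usually comes down to giving each finding a stable fingerprint. A minimal sketch, assuming each finding has a `component` and a free-text `summary`; the normalization rules and field names are hypothetical. It only collapses reports that differ in numbers, case, or spacing; catching reworded duplicates would need something fuzzier.

```python
import hashlib
import re
from typing import Iterable, List, Set


def normalize(text: str) -> str:
    """Lowercase, replace digit runs with <n>, and collapse whitespace."""
    text = re.sub(r"\d+", "<n>", text.lower())
    return " ".join(text.split())


def fingerprint(issue: dict) -> str:
    key = f"{issue['component']}|{normalize(issue['summary'])}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


def new_issues(reported: Iterable[dict], known: Set[str]) -> List[dict]:
    """Keep only findings whose fingerprint hasn't been seen before."""
    seen = set(known)
    fresh = []
    for issue in reported:
        fp = fingerprint(issue)
        if fp not in seen:
            fresh.append(issue)
            seen.add(fp)  # also drops duplicates within the same run
    return fresh


known_fps = {fingerprint({"component": "login", "summary": "Button misaligned by 4px"})}
run = [
    {"component": "login", "summary": "button misaligned by 6px"},  # same issue, new number
    {"component": "checkout", "summary": "Total shows NaN"},
    {"component": "checkout", "summary": "Total  shows NaN"},  # duplicate in this run
]
print([issue["component"] for issue in new_issues(run, known_fps)])  # ['checkout']
```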

Too many false positives (as with any sort of system) are the main problem.

Nezrann@reddit (OP)

That's a very fair point. In our case, though, we work on systems that need to be as close to 100% reliable for our customers as possible, since our product is hardware-based with very narrow tolerances.

A failure could mean a huge loss of revenue, but I think having some kind of mix of the two is where I'm leaning. I truly just don't love it for things like visual regression.
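
For visual regression on a narrow-tolerance product, one deterministic option is a pixel diff where the tolerance is written down as numbers: identical inputs always give the same verdict, and the threshold itself is reviewable. A sketch on plain grayscale pixel lists; in practice the images would be loaded with a library such as Pillow, and the default thresholds here are hypothetical, not recommendations:

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale intensities 0-255, one list per row


def diff_images(baseline: Image, candidate: Image, per_pixel_tol: int) -> Tuple[int, int]:
    """Return (changed, total): pixels differing by more than per_pixel_tol."""
    if len(baseline) != len(candidate) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(baseline, candidate)
    ):
        raise ValueError("image dimensions differ")
    changed = total = 0
    for row_a, row_b in zip(baseline, candidate):
        for a, b in zip(row_a, row_b):
            total += 1
            if abs(a - b) > per_pixel_tol:
                changed += 1
    return changed, total


def visual_check(
    baseline: Image,
    candidate: Image,
    per_pixel_tol: int = 2,
    max_changed_ratio: float = 0.001,
) -> bool:
    """Pass if at most max_changed_ratio of pixels exceed the per-pixel tolerance."""
    changed, total = diff_images(baseline, candidate, per_pixel_tol)
    if total == 0:
        raise ValueError("empty image")
    return changed / total <= max_changed_ratio


base = [[100] * 10 for _ in range(10)]
noisy = [row[:] for row in base]
noisy[0][0] = 101  # within per-pixel tolerance
broken = [row[:] for row in base]
broken[5][5] = 200  # one pixel far out of tolerance: 1% of the image
print(visual_check(base, noisy), visual_check(base, broken))  # True False
```

A deterministic check like this could act as the gate, with any AI-based review kept advisory on top of it.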