The program changed the data!
Posted by pepper1009@reddit | talesfromtechsupport | 69 comments
Years ago, I did programming and support for a system that had a lot of interconnected data. Users were constantly fat-fingering changes, so we put in auditing routines for key tables.
User: it (the software) changed this data from XXX to YYY…the reports are all wrong now! Me: (Looking at audit tables) actually, YOU changed that data from XXX to YYY, on THIS screen, on YOUR desktop PC, using YOUR userID, yesterday at 10:14am, then you ran the report yourself at 10:22am. See…here’s the audit trail…. And just so we’re clear, the software doesn’t change the data. YOU change the data, and MY software tracks your changes.
Those audit routines saved us a lot of grief, like the time a senior analyst in the user group deleted and updated thousands of rows of account data, at the same time his manager was telling everyone to run their monthly reports. We tracked back to prove our software did exactly what it was supposed to do, whether there was data there or not. And the reports the analysts were supposed to pull, to check their work? Not one of them ran the reports…oh, yeah, we tracked that, too!
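For anyone wondering what an audit routine like this might look like, here is a minimal sketch, assuming a SQLite database driven from Python. The table, column, and user names (`accounts`, `accounts_audit`, `updated_by`, `jsmith`) are made up for illustration and are not from the original system, which evidently also captured the screen and the workstation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (
    id         INTEGER PRIMARY KEY,
    balance    TEXT,
    updated_by TEXT            -- hypothetical: the app writes the user ID here
);

CREATE TABLE accounts_audit (
    audit_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    account_id INTEGER,
    old_value  TEXT,
    new_value  TEXT,
    changed_by TEXT,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP   -- UTC in SQLite
);

-- Every change to the key column leaves a row behind: who, what, when.
CREATE TRIGGER accounts_audit_update
AFTER UPDATE OF balance ON accounts
BEGIN
    INSERT INTO accounts_audit (account_id, old_value, new_value, changed_by)
    VALUES (OLD.id, OLD.balance, NEW.balance, NEW.updated_by);
END;
""")

# Initial load, then a user "fat-fingers" a change.
conn.execute("INSERT INTO accounts VALUES (1, 'XXX', 'loader')")
conn.execute(
    "UPDATE accounts SET balance = 'YYY', updated_by = 'jsmith' WHERE id = 1"
)

audit_rows = conn.execute(
    "SELECT account_id, old_value, new_value, changed_by FROM accounts_audit"
).fetchall()
print(audit_rows)
```

Putting the logging in a trigger rather than in application code means the trail is written no matter which screen or script made the change, which is the property the story relies on.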
Bowerick_x_Wowbagger@reddit
I can't tell you how much I love my tracking data. "WHY IS THIS WRONG?!" Well, because you changed it. At 15:32:28 on the 15th if you really want to know.
KelemvorSparkyfox@reddit
Gotta love the audit columns.
"Help! I can't allocate stock to an order, and it needs to go in an hour!"
*Annoyed SQL sounds*
"That would be because you adjusted the allocated stock level to a point where the interface cannot find it. I've put it back - please don't adjust such stock, as it breaks things."
"I didn't!"
"I can see your user account against the records. Either you did it, you gave your login details to someone who did it, or your account has been hacked. Do I need to change your password as well?"
Rathmun@reddit
Better: "Well in that case someone else has your password. I'm locking your account until we can do a full audit to make sure they haven't done anything else with it."
KelemvorSparkyfox@reddit
That would have been pushing what little authority I had so far past its limits that it couldn't see them with a telescope. The only time I did anything like that was when a user crashed out of a non-core application, didn't tell me, and left it unusable for anyone else until I'd fixed it. On that occasion, I locked her out of it until she called me, and explained that she needs to let us know when it crashes. It was an Access application - crashing was its hobby. However, as her (in)action caused wider problems, her line manager was on board with me locking her out.
Rathmun@reddit
Pity. IT really needs more teeth for dealing with deliberately and aggressively incompetent users.
xeuful@reddit
"Well, the software should have known that I wasn't supposed to do that!"
monedula@reddit
To be fair, that is sometimes a justified complaint. I've met too many systems missing basic relational integrity.
love2kick@reddit
Even if the system is fool proof, most of the time users would just ignore warnings and do stupid things.
pepper1009@reddit (OP)
Manager: we want the system to be foolproof. Me: when did you start hiring fools to do complex financial analysis?
Responsible-End7361@reddit
For every foolproof system, nature produces a new and improved fool who can break it.
podgerama@reddit
The most intelligent thing in the known universe is stupidity. No matter what we do to eliminate it, it finds a new way around the countermeasures. It adapts and becomes stronger; it will not be denied.
WackoMcGoose@reddit
Stupidity is the fifth elemental force of the universe, alongside strong and weak nuclear, electromagnetism, and gravity. Quantum uncertainty is merely stupidity operating on a subatomic level.
vinyljunkie1245@reddit
I am constantly baffled, amused and impressed by the way users manage to completely destroy supposedly foolproof systems. I think they are the universe's way of testing itself and anyone who deals with them.
BrowncoatWantToBe@reddit
The universe is just a constant race between order trying to make things foolproof and chaos creating bigger fools...
Cthell@reddit
Amazing how some things never change, isn't it?
meitemark@reddit
Well, the answer that comes out will be correct, but it will be based on the wrong data. So it is correct, but still wrong.
Kilanya@reddit
Lawd that made me laugh so much.
HowBoutaHmmNah@reddit
Story of my life... I usually get two kinds of users when it comes to messing up data:
Person A - The user who blames the software, puts tech on blast, CC's their manager, my manager, the CEO, the President, and Tom Cruise, demanding an explanation of why said software is not working properly and messed up their data.
Person B - The user who emails me or my support team directly with something along the lines of, "I'm so sorry to bother you, but I think I messed something up really bad, can you help?"
Person A gets a reply (with all managers still on copy) that includes screenshots of the logs showing when, where, and how they messed up the data themselves, along with a polite (yet viciously passive-aggressive) "If you would like to schedule some training so we can show you how to avoid this mistake in the future, I'd be happy to jump on a call at the following times/days".
Person B gets a quick "Don't worry about it - I'll restore all the data from backup and we'll just pretend this didn't happen".
Person B has heard The System Administrator Song by Wes Borg. Person B is smart.
pepper1009@reddit (OP)
LOL…thinking HowBouta worked with me at some time…
honeyfixit@reddit
Whoa, I wasn't sure anybody still remembered Wes and his Dead Trolls. I loved their stuff. The live version of Welcome to the Internet Help Desk is my all-time favorite. If you've never seen it, here:
https://youtu.be/1LLTsSnGWMI?si=G1M9DevvmKim8N-u
The tech is 20 years out of date but the ideas are still relevant. I consider it a must see for all entry level techs.
HowBoutaHmmNah@reddit
Yep, good times. I'm getting up there in years, so no doubt they'll put me out to pasture soon... Scary thing is, I've actually had the "is your computer turned on?" support call - where it was, in fact, not turned on or even plugged in...
honeyfixit@reddit
Lol
glenmarshall@reddit
Human error is almost always the cause, whether it's bad data entry or bad programming. The second most common cause is divine intervention.
Reinventing_Wheels@reddit
Where do cosmic rays fall on this list?
We recently had conversations, at my day job, about whether it was necessary to add hamming codes to some data stored in flash memory. Cosmic rays were brought up during that conversation.
bobarrgh@reddit
Generally speaking, cosmic rays might change a single, random bit, but it isn't going to change large swaths of data to some other, perfectly readable data.
Reinventing_Wheels@reddit
That is exactly the thing hamming codes are designed to protect against. They can detect and correct a single bit error. They can also detect, but not correct, a 2 bit error. They add 75% overhead to your data, however.
bobarrgh@reddit
Sorry, I didn't understand the phrase "hamming codes". I figured it was just a typo.
A 75% overhead sounds like a major PITA.
Naturage@reddit
Much like some data has a check digit or md5 sum/hash primarily used to confirm its integrity, a Hamming code stores enough extra data to act as a check that the data is valid and, further, to do so in such a way that if you have a one-bit error in a block (4 data bits stored as 7), it can correct it to the right value. In a way, if you imagine a typical computer byte, every value is "meaningful", i.e. flipping any bit will yield another valid, but incorrect, byte. Using a Hamming code, "meaningful" values are 3+ bits apart, so a small error won't give you valid data.
It's a bit of an older system, but one that's both historically important and also solved a huge practical problem at the time; when computers ran on punch cards, a single mistake might break the whole lengthy computation. But Hamming's method made it so you had to make two errors within a 7-bit string to actually break anything, making the punching process incredibly more reliable.
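To make the description above concrete, here is a small sketch of a Hamming(7,4) encoder and single-error corrector in Python. The function names and the bit layout (parity bits at positions 1, 2 and 4) are one common textbook convention chosen for illustration, not anything specific to the flash-memory project discussed in this thread. Three parity bits per four data bits is where the 75% overhead figure comes from.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); position 0 means no error seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The three checks, read as a binary number, give the 1-based
    # position of a single flipped bit.
    position = s1 + 2 * s2 + 4 * s3
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], position


original = [1, 0, 1, 1]
sent = hamming74_encode(original)

received = list(sent)
received[4] ^= 1                      # a single flipped bit in transit

recovered, position = hamming74_decode(received)
print(recovered == original, position)
```

The decoder assumes at most one bit was flipped; with two flipped bits it will still "correct" something, just not the right thing.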
Loading_M_@reddit
To add on here: the modern variant of this, Reed-Solomon encoding, is why optical disks are so damn reliable. When you scratch a disk, the drive can't read the data under the scratch, but thanks to the redundancy algorithm, it can reconstruct the missing data the vast majority of the time.
Reinventing_Wheels@reddit
Hamming Code in case you want to go down that rabbit hole.
In our application, the overhead isn't a big deal. The data integrity is more important.
It's a relatively small amount of data and the added hardware cost and code complexity are almost inconsequential to the overall system.
WackoMcGoose@reddit
Not to be confused with a hammering code, which is what you use when you want to discreetly inform the PFY to bring the "hard reset" mallet.
Loading_M_@reddit
75% is quite a bit. If your processor can handle it, Reed-Solomon can do better for ~25%.
That being said, it likely isn't a big deal. Unless your device is getting shot into space, or exists in another particularly difficult environment, cosmic rays are exceedingly unlikely. I think it was MIT that did a meta analysis of a bunch of crash logs, and found that although several were due to some data getting changed, many of them happened in the same place as another. They concluded that it's way more likely to be the result of normal hardware failure, rather than cosmic rays.
MikeSchwab63@reddit
Oh Oh. Flash storage units now hold 3 or 4 bits with 8 or 16 voltage levels on a single storage unit.
thegreatgazoo@reddit
I remember parity bits where it would detect an error and just crash the system. Those were an 11% overhead.
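A rough sketch of that kind of parity check, assuming even parity with one extra bit stored alongside each 8-bit byte (one bit in every nine stored, which is presumably where the 11% comes from). The function names are illustrative only.

```python
def add_parity(byte):
    """Append an even-parity bit to an 8-bit value, giving a 9-bit word."""
    parity = bin(byte).count("1") % 2
    return (byte << 1) | parity


def parity_ok(word):
    """True if the 9-bit word still has an even number of 1 bits."""
    return bin(word).count("1") % 2 == 0


word = add_parity(0b10110010)

one_flip = word ^ 0b000010000        # one bit corrupted
two_flips = word ^ 0b000011000       # two bits corrupted

print(parity_ok(word), parity_ok(one_flip), parity_ok(two_flips))
```

A single flipped bit is detected but cannot be located, so there is nothing to repair and halting is about the only safe response. Two flipped bits cancel out and pass the check unnoticed.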
Naturage@reddit
If memory serves me right, a 2 bit error in Hamming code will lead it to correcting to the wrong output. It stores 16 possible values in 7 bits in a way that any 2 values are 3+ bits apart, but that means every of 2^7 combinations is either a genuine value + check digits, or off by one from a genuine value.
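That claim is small enough to check by brute force. The sketch below assumes the plain Hamming(7,4) code with parity bits at positions 1, 2 and 4 (an extended code with an eighth overall parity bit behaves differently and can flag double errors); the helper names are made up for the example.

```python
from itertools import product


def encode(d1, d2, d3, d4):
    """Plain Hamming(7,4): 4 data bits -> 7-bit codeword."""
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return (p1, p2, d1, p3, d2, d3, d4)


def distance(a, b):
    """Number of bit positions in which two words differ."""
    return sum(x != y for x, y in zip(a, b))


codewords = [encode(*bits) for bits in product((0, 1), repeat=4)]

# Smallest distance between any two distinct codewords.
min_distance = min(
    distance(a, b)
    for i, a in enumerate(codewords)
    for b in codewords[i + 1:]
)

# For each of the 2^7 possible words, count codewords within one bit of it.
neighbours = [
    sum(distance(word, c) <= 1 for c in codewords)
    for word in product((0, 1), repeat=7)
]

# Flip two bits of a valid codeword and "correct" to the nearest codeword.
sent = encode(1, 0, 1, 1)
garbled = list(sent)
garbled[0] ^= 1
garbled[6] ^= 1
decoded = min(codewords, key=lambda c: distance(garbled, c))

print(len(codewords), min_distance, set(neighbours), decoded == sent)
```

Every 7-bit word sits within one bit of exactly one codeword, so a two-bit error always lands next to a different valid value and gets silently "corrected" to it.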
therealblitz@reddit
Remember, a single bit could launch a missile. 🚀🚀🚀🚀
__wildwing__@reddit
Don’t forget Quantum Bogondynamics!!
Mr_ToDo@reddit
Does the devil count? Because Quickbooks corruption doesn't feel like something God sent to test us. Punish maybe, but I must have done something really bad to have to deal with things like that (I'm also of the mind that there must be some level of verification on the client side, or just some that doesn't happen at all, that network issues can cause database corruption, but I'm no programmer).
glenmarshall@reddit
It's human error. Computers do what they are programmed to do, including doing wrong things. If a program corrupts data it's a human-caused programming error.
cymruisrael@reddit
That sounds like a clear case of either a PEBKAC error or an ID10T error.
MCPhssthpok@reddit
Could also be a PICNIC error.
cymruisrael@reddit
Same thing, different acronym 😉
Stryker_One@reddit
SSDD
pspearing@reddit
SINGLE SIDED DOUBLE DENSITY?
Stryker_One@reddit
Same Shit, Different Day.
pspearing@reddit
I know, I was just showing my age.
Sir_Jimmothy@reddit
PENCIL - Person Exists; Not Considered Intelligent Life.
__wildwing__@reddit
And then there’s me, who can change languages (English to cuneiform) in one Access record and IT can’t figure out how. Followed the path, and nothing I did should have affected anything like that.
Counterpoint-RD@reddit
What surprises me most about this is that cuneiform still counts as a supported language (or maybe better, writing system), as it hasn't been used in anger in, what, 2500 years or so? 3000? Guess you'll have to thank the Unicode Consortium for that particular predicament: a few flipped bits, and now your database record is able to summon some Sumerian chaos deity, or whatever 🤭...
BPDunbar@reddit
The last known cuneiform tablet is a Babylonian tablet concerning astronomical events in 75 CE. So it's fairly precisely dated to 1950 years ago.
Counterpoint-RD@reddit
Wow - okay, that's much more recent than I'd ever thought possible... Sounds like one guy watching stars was going, "Astronomy just isn't made like it used to be - let's go back to the roots...", like some scientist today writing his papers in Latin 😄👍...
ferky234@reddit
Who do I complain to about some inferior copper that I received ?
KelemvorSparkyfox@reddit
I, for one, welcome our ~~new~~ old Babylonian overlords.
anubisviech@reddit
I know this as "Folder/File X has vanished!"
- No, my smb log shows you moved it into a folder below, like the last 5 times you asked for a missing File/Folder.
Sirbo311@reddit
All the time with email folders. I just pull up the folder structure in exchange... "By chance, did you look in for XYZ?"
NotYetReadyToRetire@reddit
At one employer, close to half of my job was tracking down missing folders after yet another untrained user unknowingly did a drag/drop into another folder.
The argument over training always came down to "What if we train them and they leave?" with no consideration of "What if you don't train them and they stay?" - which is what many of them did.
robsterva@reddit
Clearly, that place had bigger issues than training...
C_M_O_TDibbler@reddit
I would like to point out this is entirely possible, see the Horizon Post Office scandal.
KelemvorSparkyfox@reddit
The most egregious programming error that I saw come out of the enquiry was that the EOD process locked up a key part of the communication process for something like 10 minutes, while the sub processes that tried to write transactions to it timed out after 10 seconds. As the trx IDs were generated by the locked part, there was no gap in them to show that any trx had been dropped. (Frankly, that any new trx could be generated during the EOD process is another major WTF on the part of Fujitsu.)
frac6969@reddit
This just happened to us last week. User complained that the exchange rate for an order got randomly changed. We pulled logs and proved that they changed it.
User was still arguing. I looked at the order and discovered that they must’ve looked at the order number and mistaken that for the date. I showed the order to the user and they pointed right at the order number and said, “See, I used the right date.”
Mr_ToDo@reddit
"The program changed the location of the date"
kagato87@reddit
It's frustrating how users try to blame the software.
10 times out of 10 a problem in the data is something a user did. The audit logs are so you can determine WHO made the mistake.
I feel sorry for anyone with users who blame the computer.
Computers are perfect. They do EXACTLY what they are designed, programmed, and instructed to do. And like the last six times, it was YOUR user who changed that setting, or failed to submit...
Sceptically@reddit
Not so much. Significantly better than the users, of course, but that's not saying much.
kagato87@reddit
But those are design and engineering flaws!
They have been remarkably stable lately, at least as long as you aren't stuffing your racks with white box, Lenovo, or non-redundant basics.
ryanlc@reddit
Stupid shit like this is why my team and I (I manage the cybersecurity team) REALLY push back on shared accounts. We get the request for them all the time.
There are still a few in our systems, because of stupid developers. But those few are the impetus behind users asking for more. Me and the CISO, my boss, keep telling them 'no' for reasons just like this.
And the team that creates accounts has figured out to not create them until we approve them (which we won't).
AlternativeBasis@reddit
Yep, a system I participated in creating had some extra breadcrumbs:
Records were never deleted, only inactivated, and the user/role that had deactivated was recorded.
Each record inserted had a 30-digit primary key, where the first 20 digits referenced the user/session/location that inserted the record. Hardcoded in a way that programmers couldn't get around. Ever.
Certain super-ultra-secret records had an extra access log, with no report or access code for it. Only the DBA could see the table.
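The first of those breadcrumbs (never delete, only deactivate, and record who did it) might look something like the following. This is a sketch under assumptions: SQLite via Python, with invented names (`containers`, `deactivated_by`, `asmith`). A trigger like this can be dropped by anyone with schema rights, so it is weaker than the "programmers couldn't get around it" design described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE containers (
    id             INTEGER PRIMARY KEY,
    label          TEXT NOT NULL,
    active         INTEGER NOT NULL DEFAULT 1,
    deactivated_by TEXT,
    deactivated_at TEXT
);

-- Hard deletes are refused outright; rows can only be flagged inactive.
CREATE TRIGGER containers_no_delete
BEFORE DELETE ON containers
BEGIN
    SELECT RAISE(ABORT, 'rows are deactivated, never deleted');
END;
""")


def deactivate(conn, row_id, user):
    """Soft-delete a row and record who did it and when (UTC)."""
    conn.execute(
        "UPDATE containers "
        "SET active = 0, deactivated_by = ?, deactivated_at = CURRENT_TIMESTAMP "
        "WHERE id = ?",
        (user, row_id),
    )


conn.execute("INSERT INTO containers (id, label) VALUES (1, 'C-001')")
deactivate(conn, 1, "asmith")

try:
    conn.execute("DELETE FROM containers WHERE id = 1")
    delete_blocked = False
except sqlite3.DatabaseError:
    delete_blocked = True

row = conn.execute(
    "SELECT active, deactivated_by FROM containers WHERE id = 1"
).fetchone()
print(row, delete_blocked)
```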
Able-Stretch9223@reddit
I'm currently battling an outside accountant trying to make every account as generic as possible and each time I think she understands it's yet another meeting with the CEO explaining why this is a seriously stupid idea.
The_Great_Chen@reddit
I loved it when audit tracking worked. But then I found out the dates and times changed by time zone and/or may be corrupted other ways. Trying to figure that out was a headache.
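One common way to avoid the time zone half of that headache is to store every audit timestamp in UTC and convert only when displaying it. A minimal sketch using Python's standard library follows; the date and the fixed offsets are invented for the example.

```python
from datetime import datetime, timedelta, timezone


def audit_timestamp():
    """Timestamp for a new audit row: always timezone-aware UTC."""
    return datetime.now(timezone.utc).isoformat()


# Hypothetical event: a user five hours behind UTC saves a change
# at 10:14 local time.
user_zone = timezone(timedelta(hours=-5))
local_event = datetime(2024, 3, 15, 10, 14, tzinfo=user_zone)

# What goes into the audit table.
stored = local_event.astimezone(timezone.utc).isoformat()

# What a support tech one hour ahead of UTC sees when reading it back.
viewer_zone = timezone(timedelta(hours=1))
shown = datetime.fromisoformat(stored).astimezone(viewer_zone).isoformat()

print(stored)
print(shown)
```

Both strings describe the same instant, so entries written from different offices still sort and compare correctly.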
alfredpsmurtz@reddit
I added some audit code for the same reason. "The container just disappeared from the system." No, you deleted it on xxx date...