The disappearing fault
Posted by Ndog4664@reddit | talesfromtechsupport | 19 comments
Time for a couple more badly written stories. Words are hard and I never went to college. You get what you get.
Get some DoorDash, maybe some Adderall or whatever your vice is, and enjoy.
My job is tech support related, but not directly. I work on anything from servers and networking to automation (belts, motors, bearings) and PLCs. I'm a jack of all trades and definitely a master of none.
:The disappearing fault
So one day Operations calls us about an output module fault. It looks like 7 modules lost communication. Well, we first checked the comm cables: two 40-pin cables that form a loop through modules 4-19. They seemed fine, but admittedly I avoid these cables because I hate them. They're bulky, have a bad retention mechanism, and are likely to cause more problems just from touching them. All the cables for comms, gates, actuators, and the safety loop go to a backplane that slots into the main control PCB. So we replaced the main PCB, and nothing happened except even more faults. Then we got a second one, which kinda worked, just with different faults this time. So we got a third one: most faults were gone except one, but communication was back for everything. At this point I called a remote SME (subject matter expert), who said to swap the board with another module to see if the fault followed the main board or stayed with the module. One problem: it did neither, it just disappeared. Doesn't make sense to me, but it's gone and the machine works.
Main lesson learned: just expect all your parts are bad.
:When the SME is wrong
So Operations calls about a machine intermittently stopping for a safety loop fault that never calls out where in the machine the fault is. The machine acts fine when not processing, but after anywhere from 30 seconds to 5 minutes of running it faults out. We arrive and start looking at interlocks but can't find anything. We keep pressing start until the fault shows up and doesn't immediately disappear. We checked a 24V safety aux contact attached to a relay. Even though it didn't test bad, it fails so commonly that we replaced it anyway. After checking every interlock we still couldn't figure it out.

So we called the remote SME. The one piece of diagnostic info I did have was that the aux contact had no power. When I told the SME this, he first said to replace it. I told him I already had, and also that I'm not trained on the equipment, so I was at a loss and needed schematics. He emailed me the schematics and wanted me to follow the circuit downstream of the aux contact, which didn't make sense to me because it wasn't getting power in the first place. I felt I should trace back toward where the power comes from and see where I got it back. So I lied and told him I would, then, with the self-confidence of a stupid person, did my own thing.

I found power going into the safety loop at the breakers (the breakers can tell the computer if they're tripped), but not coming out. I started wiggling the connectors on the back of each one until I found one that would make the machine go ready and not ready just from wiggling it. Yes, I wore gloves, if anyone from OSHA is asking. After the machine had been down for 5 hours, one aux contact and one breaker later, it was fixed.
To explain what was happening: the machine vibrates when running because of the motors, bearings, and belts. That vibration would cause the tab inside the breaker to disconnect momentarily, opening the safety loop and stopping the machine.
:When the senior tech doesn't compute
So on day shift they had a machine go down because the output modules weren't communicating. The senior tech (for day shift) found the module that wasn't communicating and replaced the board, but still couldn't fix it. Evening shift, with the 2 most senior techs out of everyone, refused to touch it. Finally, when I came in on night shift and asked why we had a machine down, I decided I would look at it. I'm still not trained, but based on what I learned last time, I asked if anyone had configured the board. All evidence pointed to no. So, one call to the SME for the DIP switch configuration document and some crawling inside the machine later, the machine worked...
Unsure if anyone else shares my frustration, but fixing stuff that more experienced and trained people should've fixed gives management unreasonable expectations of your abilities. I love solving problems; I don't love being put on a pedestal.
Btw, the downtime on that machine probably cost $150k-300k.
:How to solve a random person's problems from 500 miles away
So the techs at my company have a Facebook group for memes, but also for help when the SMEs are no help.
A person in another state posted that they'd had a machine down for over 7 days. The machine would only fault out if you tried to run it, and the fault was a communication fault from the operator PC to the on-site OCR (Optical Character Recognition) server. The weird part was that they could ping the server, and the PC and server both showed as connected in their respective software. They even ran a new cable from the switch, to no avail. I guess no one on site, and not the SME either, thought to actually look at what the switch was reporting. I had access to the monitoring for every facility, just not the ability to make switch configuration changes. I was bored, so I looked them up and saw a ton of errors. The port was configured correctly, so it was most likely a bad port.
So I messaged the guy. The cast: Me = me, Tech = the guy from that facility, and Supervisor = his boss.
Me- hey, I saw your FB post, I think it's the switch port
Tech- we are going to reboot again
Tech- I'm going to make a group chat with my supervisor
---new chat---
Me- hi, I think this is an issue with the IDF switch. Do you guys have anyone with Cisco CLI training and a login?
Supervisor- I think so, but he hasn't logged in in a while
Supervisor- SME says to check the switch at the machine. We replaced it but that didn't work. SME now says to replace the IDF switch
Me- before stopping all operations, let's just try another port
Tech- we need a ladder
Me- I see the switch lost power recently. Did you guys have a power outage?
Supervisor- actually yes, that's when the problems started
Me- please take the cable from port 6 and plug it into port 8
---Note: port 6 is for the machine having problems; port 8 is for a machine that is working
---23 messages and 4 hours of being ignored later
Me- please take the cable from port 6 and plug it into port 8
Supervisor- that worked, we think port 6 is bad
Me- plug the cable from port 8 into port 6 and see if it faults out too
Supervisor- it does
Me- that confirms 6 is bad. Have your tech open up and configure another port, and label port 6 as bad.
Supervisor- thank you!
Moral of the story: sometimes you need to repeat yourself, I guess. Still working on being assertive.
On the plus side, this interaction helped me pass the interview to become an SME. Just waiting for an open position.
:The normal tech support call
So we machine techs are only supposed to fix things related to the machinery and its functions, the "processing infrastructure side." We call anything unrelated the "LAN side" (printers and supervisor computers).
One problem: one on-site "LAN side" tech covered about 6 plants, almost all of them 120 miles apart. He could drive 5 hours for one call, and response time was about 2 weeks.
Because this guy was so overstretched, and because of my interest in tech, I would help when I could, even though he didn't want my help. It was against the union contract, but keeping the bosses happy was in everyone's interest. I mainly just helped with printer problems and was well known by management for solving them. After the print server/directory failed, I was the only one who got the printers working while we waited for it to be fixed. Anyways, here's the story.
-Over the dispatch radio
Supervisor- hey OP, can you help me with the printer by machine 9?
Me- On my way
---I arrive stage left
Supervisor- I can't get the printer to print, I think it's broken
Me- please bring up what you're trying to print
Me- press print or Ctrl+P, please
Me- can you select the printer labeled "printer by machine 9" instead of "Print to PDF," please?
--- exit stage right as it starts printing ---
When I was asked to work on their networking, though, I would say, "Only if you can provide a network diagram/topology." I knew perfectly well they couldn't, because they never made one for their side of the network. Their network closet was an actual bird's nest. You literally had to walk on the cables to get to the rack, and the cabling covered the rack and all the walls like vines on a tree. There were more unused cables run in there than used ones. Patch panels? What patch panels? I don't know how it ended up like that with only about 8 switches, 2 firewalls, and 2 routers.
Grammarly broke like halfway through this, so sorry not sorry.
Mikotos@reddit
My "favorite" is when they call you out to a tool and they're all like "WE'VE TRIED EVERYTHING" (but surprisingly nothing), so you press reset on an HMI and the tool runs just fine for the rest of the day.
androshalforc1@reddit
The flipside is the IT aura. I've had machines give an issue. Ok, it's happened before, just need to reset. One reset later, still an issue. Ok, try again, still an issue. Check the manual, manual says reset, follow the manual's step-by-step reset procedure, still an issue.
Call the tech over, reset the machine while they're on the way, tech walks into the room, machine boots up with no issue.
Ndog4664@reddit (OP)
I hate that. It's like, I swear there was a problem
Engineer_on_skis@reddit
I'm not IT*, but I'm savvy. I can frequently summon that aura. "Before we call IT, come over here EoS." They try it again and it works.
mafiaknight@reddit
The machine saw the tech
"!"
And got its shit together
Ndog4664@reddit (OP)
We had a tech who loved percussive maintenance
Ich_mag_Kartoffeln@reddit
A former colleague once encountered a printer issue. It wouldn't print a certain document. Didn't matter who tried to print it, or from which computer -- it would not print this particular document. It would print anything and everything else, but not this document. This critical document.
So Brian grabbed a spanner (the sort of spanner you need two hands to lift) and vented ALL his frustrations upon said printer. And since then, he has NEVER had a printer issue.
It is said that printers will refill their own paper trays in terror at the sounds of Brian's approaching footsteps....
Ndog4664@reddit (OP)
I'm not sure if it's malice or ignorance, but sometimes the operators will press off instead of start. It's not immediately apparent to them that they turned it off because the PC is on a different circuit, I guess. All they need to do is press the ON button.
meitemark@reddit
OFF stands for On, Function Fine. ON stands for Off, Negative.
UristImiknorris@reddit
You mean 3 light-milliseconds?
pockypimp@reddit
Oof, that's how it is at my current job. It's an older site and they use BIX for networking. BIX is a Belden setup that's far too complicated for its own good. You go from the wall jack to a "cassette" where you punch up to 5 cables into a comb shape on one side. Then on the other side you either have something punched down that terminates in an RJ45 to the switch, or a cross-connect to a hydra, which is basically a Cat5/6 cable punched down to a cassette, and you use twisted pair to connect the two cassettes.
Somehow you get gigabit connection through this.
Ndog4664@reddit (OP)
Damn, that doesn't sound fun. Luckily, theirs wasn't that bad; I've definitely seen worse online. I've since moved, but if they'd given me a topology, I would've gone and cleaned out the closet out of pure urge to do so. My side was very clean, with printouts, Excel spreadsheets, monitoring, quarterly backups, and a common SOP.
pockypimp@reddit
Yeah, that's what's odd. Previously I worked for technically the same company I work for now, just in a different operating unit. Local IT was... missing, since it was a retail environment. So they tested having local techs, and since it saved money they made it an actual position. Part of our duties back then was a couple of Excel spreadsheets that someone higher up had created. One had a switch diagram and jacks. The switch diagram had drop-down boxes so you could identify which VLAN was assigned to each port, and it updated that on the sheet with the jacks. You were supposed to fill out both, identifying what was plugged into which jack and which port on the switches. Everything was Cat5 at the time, all terminated to patch panels. I mean, I had been at the company since '97 working retail, and they had the patch panels back then.
My current job is part of a different division of IT, in what used to be a different operating group. The things I find here are so different and old compared to what the retail side came in with when they got bought, it's astounding.
Ndog4664@reddit (OP)
So I moved to a new facility, but I think the same applies. On the machine processing side, they teach a 5-week course on SOP, documentation, and how to do the job. As far as I can tell, they don't get that. But their job requires IT certs and mine doesn't, which is super confusing. From what I was told, any time they had an issue they would just run another patch cable without removing the old one because it was too tangled, making the problem worse. I think he's just spread so thin he doesn't have time to do it properly.
opschief0299@reddit
Great job, your writing is just fine 🙂
kg7qin@reddit
My favorite is when someone is supposed to have been on machine X all day, and then at about 3:15 the lead/supervisor comes to find you because they can't load a program. The controller had been offline for maintenance/upgrades, so it didn't have anything loaded, but now they're finally getting around to loading a program after about 9 hours of being on said machine.
It has been a 50/50 mix of them either not being familiar with the controller (there are several different ones in use) or the Quinx box needing to be rebooted, or its power or network cable pulled and reseated.
Ndog4664@reddit (OP)
Ours are pretty simple. Everything is controlled by software on a Windows XP PC, with an easy-to-use GUI where they select and load the program they need for the operation they're trying to do. Not that the operators won't just purposely break the machine to go on a long break, though. It is convenient that the software reports everything to a monitoring system, so we can see what's happening remotely.
archina42@reddit
Actually - well written - quite interesting