Network outage in the mornings
Posted by Finn_Storm@reddit | talesfromtechsupport | 12 comments
The last two posts reminded me of a recurring network outage we had at one of our customers' sites. It wasn't my problem initially, but I decided to help out because of how stubborn it was.
Customer comes in (after like two weeks, because why would you want to speed things up) and says their C&C machines lose Internet in the morning, from startup until anywhere from 15 minutes to 5 hours later. No other devices had this issue.
My colleague didn't trust the small desktop-grade switch the machines were on and replaced it with a new one, but this didn't solve the issue. We go back and forth with the vendor for a while, but they don't want to come onsite to troubleshoot with us, and they can't remote in while the problem is occurring.
At this point I step in, trusting that my colleague has done the basic troubleshooting steps, which will come back to bite us later. Perhaps the machine's internal NIC is defective, so we try a USB NIC adapter, without success.
I also set up an iperf/pingplotter kit and come across some weird values. The network comes back online for 6 seconds every minute like clockwork, but this isn't enough for Windows (or the application) to realize Internet is back up and running.
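For anyone wanting to spot a pattern like this in their own ping logs, here is a minimal sketch of the idea. It is not the tooling used here; the function names and the synthetic one-ping-per-second data are made up for illustration. It groups consecutive successful pings into "up" windows and measures the gap between window starts.

```python
from typing import List, Tuple

Sample = Tuple[float, bool]  # (timestamp in seconds, ping succeeded?)


def up_windows(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Group consecutive successful pings into (first_ok, last_ok) windows."""
    windows = []
    start = None
    last_ok = None
    for ts, ok in samples:
        if ok:
            if start is None:
                start = ts  # a new up window begins
            last_ok = ts
        elif start is not None:
            windows.append((start, last_ok))  # window ended on this failure
            start = None
    if start is not None:
        windows.append((start, last_ok))  # log ended while still up
    return windows


def window_periods(windows: List[Tuple[float, float]]) -> List[float]:
    """Time between the starts of consecutive up windows."""
    return [b[0] - a[0] for a, b in zip(windows, windows[1:])]


# Synthetic log (an assumption, not real data): one ping per second for
# three minutes, succeeding only during the first 6 seconds of each minute.
samples = [(float(t), t % 60 < 6) for t in range(180)]
windows = up_windows(samples)
periods = window_periods(windows)
print(windows)
print(periods)
```

If the periods all come out (nearly) identical, something is cycling on a timer rather than failing at random, which points away from a flaky cable or NIC.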
Okay, so something is definitely going on with the network. I rack my memory and recall that an external contractor had called us two months earlier to ask whether we had an issue with one of our APs at this site (the answer was yes), so I called them up and asked what they did that day.
After a lot of back and forth, I learn that we had contracted them to install a switch and two APs in/near a conference room. Now, normally this wouldn't be a problem, you'd say, right?
Wrong. Every day, this company turned off the main breaker to the production machines. And because the contractor had pulled a cable from one of the C&C machine switches (instead of the core switches), the newly installed switch and APs would lose Internet connectivity whenever that happened and establish a new connection via mesh.
The switches and APs we have are not smart enough to release a mesh connection when a wired connection appears again, so this would create a loop. Disabling mesh instantly fixed the issue, even though it caused a network disruption late in the day for the conference room.
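To make the loop concrete, here is a rough sketch that treats the network as a graph and checks whether the active links contain a cycle, using union-find. The device names and the exact link layout are assumptions for illustration only; the real topology at this site may well differ.

```python
from typing import Dict, Iterable, Tuple

Link = Tuple[str, str]


def has_loop(links: Iterable[Link]) -> bool:
    """Return True if the undirected links contain a cycle (union-find)."""
    parent: Dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True  # both ends already connected: this link closes a loop
        parent[root_a] = root_b
    return False


# Hypothetical topology: every name below is made up.
wired = [
    ("core_switch", "machine_switch"),   # feeds the production machines
    ("machine_switch", "conf_switch"),   # the contractor's cable
    ("conf_switch", "conf_ap"),
    ("core_switch", "office_ap"),        # an AP wired to the core
]
mesh = [("conf_ap", "office_ap")]        # wireless uplink formed while power was off

print(has_loop(wired))         # wired links only
print(has_loop(wired + mesh))  # wired links plus the mesh link that was never released
```

With only the wired links the graph is a tree; once the mesh link stays up after the wired path returns, there are two paths between the same devices, which is the loop described above.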
Hours spent fishing for red herrings and talking to management: 32-ish
Hours spent actually fixing the issue: 0.5
Hours spent trying to talk some common sense into my colleague and myself about checking the basics first: infinity + ongoing
fatty1179@reddit
Repeat after me: mesh networks do NOT mean your devices can roam from one AP to another
Lowe-me-you@reddit
True, mesh networks can be tricky. Just because they're marketed for seamless connectivity doesn't mean they'll handle every scenario gracefully, especially with wired connections involved...
anubisviech@reddit
I know a few suppliers of personal routers who mix those terms up on purpose, because they expect the customer to NOT have a line between APs.
Finn_Storm@reddit (OP)
Yeah, I'm aware of what it is. I'm just not the primary engineer for this customer, so I didn't have the chance to disable it outright
asmcint@reddit
Technically correct but most modern access points support Fast Roaming now, so that technicality is becoming less relevant over time.
jobblejosh@reddit
Just trying to work out, do you mean CNC (Computer Numerical Control) machines, not 'C&C'?
Reason being they sound very similar and I've never heard of C&C before in a similar context.
Moneia@reddit
How are they going to keep their Command & Conquer rankings if they can't log on first thing🙹
K-o-R@reddit
NOT EVEN NETWORK OUTAGES CAN STOP KANE!
Finn_Storm@reddit (OP)
Yeah, I guess I forgot it's spelled CNC, but they basically sound the same.
Whoopsie
Id10t_techsupport@reddit
Try putting a bandaid on it and continue to add more hardware for stacks and move cables to a new subnet to fix the problem
ascii122@reddit
That's like when I got called out because my solar-powered wifi repeater on this farm failed. I get out there and welp.. the grape leaves grew over the panel
InfiltraitorX@reddit
Mesh connections should always be disabled unless specifically needed