Why I can never be a sysadmin; or, Why is software like this?

Posted by OnTheEdgeOfFreedom@reddit | sysadmin

This is not a very serious post; I'm just screaming into the void and hoping a few laughs and nods echo back, though there is a serious question at the end of it all. Below is an email I sent to my friends at 5am, after I spent all night getting a Linux laptop running again. Of note: I know what I'm doing when I write code, but I'm completely useless at systems administration. My palms sweat if I need sudo for anything. I cringe at touching config files. dpkg? I don't do drugs, man, keep that hard stuff out of my life...

Without Google I'd never be able to maintain anything. So when my laptop boots and there's not even an option to connect to the network... I'm sure you all nod and know exactly what happened, but I didn't. And while there's humor in trying to resurrect a laptop on Easter morning, it's not the kind of humor I like at 3am.

My email to my friends follows. It's intended for humor, but please consider the question at the end: why is it even like this? We've had OSes for 50+ years, and this still happens?

---

I remember an old "Peanuts" quote: I love humanity, it's people I can't stand.

While I agree with that, I have my own version: I love programming, it's computer systems I can't stand.

I bought a new cell phone recently, because if you live in Costa Rica you need a Costa Rican phone number to do anything, and I didn't want to give up my US number, so yeah. I got something Samsung/Android-based, cleaned off all the crapware games that immediately started nagging me to play them, and got it all set up... The very next day, it died. Black screen no matter what I tried, but I could still wave the phone to turn on the flashlight, so I knew something in there was working. I just couldn't use it. On new hardware? Why?

Tonight I thought I'd wind down from the game with some music, and fired up my laptop because for just music I don't need the full tower system.

Hm, no internet. Starlink glitched again?

But Starlink was working fine... hm, no list of available wifi. In fact, no option to even show the available wifi networks.

What?

I plugged in the Ethernet cable. Nothing. I plugged in my Apple phone to use it as a hotspot over USB.

Nothing. How is this possible? The laptop's been working fine for days. I didn't do an update. How can so much hardware fail at once?

Google time (on the tower system, because the laptop clearly wasn't going there). lsusb, lspci... the hardware is there. Searching for other causes... no, I'm sure the drivers are fine, I didn't update anything.

Wait. Where did the drivers go?

modprobe. Nothing.

Half the system is missing. Disk failure? I mean my wife's tower has a dying disk, maybe it's contagious. Run badblocks. Crunch crunch crunch...

Disks are fine. My personal files are all there. The disks are ok, so...?

More Google. All it's coming up with is some sort of failed update, which I know I didn't do, because I have an unholy dread of updates. Ok, let's look...

The last update happened... 3 days ago?! Without telling me!? And based on the file sizes, it ran without completing, probably when the battery died, because initrd is a fraction of the size of the last good version.
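The "initrd is a fraction of the size" clue can be checked mechanically. What follows is a minimal sketch, not what I actually ran that night. It assumes the Debian/Ubuntu naming convention of `/boot/initrd.img-<kernel version>`. The function name `suspicious_initrds` and the 50% cutoff are made up for illustration. It flags any initrd image much smaller than the largest one sitting next to it:

```python
from pathlib import Path


def suspicious_initrds(boot_dir="/boot", ratio=0.5):
    """Return names of initrd images much smaller than the largest one.

    Assumes Debian/Ubuntu-style names (initrd.img-<version>). The `ratio`
    cutoff is an arbitrary, hypothetical threshold: an image smaller than
    ratio * (largest image size) is flagged as possibly truncated.
    """
    boot = Path(boot_dir)
    if not boot.is_dir():
        return []
    # Only the versioned images; the plain initrd.img / initrd.img.old
    # symlinks don't match this pattern.
    images = [p for p in boot.glob("initrd.img-*") if p.is_file()]
    if not images:
        return []
    largest = max(p.stat().st_size for p in images)
    return sorted(p.name for p in images if p.stat().st_size < ratio * largest)
```

Calling `suspicious_initrds()` with no arguments scans `/boot`. A small image isn't proof of a broken update, because kernels and initramfs configs differ. It's just a cheap first hint.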

Try to reboot into grub to see if there's an option to boot into the previous version. There should be. Maybe there is; I'll never know. It's just about impossible to time the keypress right to get into grub, and when you do get in, it freezes as you type commands. Mid-command, before you hit return. Ten or so cycles of reboots, nope...

I'm not sure why there's not a simple command to say "I don't care, delete the current OS and go back to the previous one." But apt wasn't working, and it's now 3am. Google kept lying. Fail. Fail. Fail.

In the end I had to make a rescue disk. It turns out that rescue disks don't have a tidy command to roll the OS back either. More Google. You have to mount a handful of different directories (and what is chroot, anyway?), then modify root's path, and in the end apt purge still doesn't work, so you end up taking a sledgehammer to things with dpkg --remove --force-all. And don't forget to reconfigure grub, because dpkg isn't your nanny, even if I need one.

Finally, reboot... oh look the internet is back. 5am. I can see the pre-dawn light out my window.

I've been using Linux for years. I remember the untimely birth of Windows, 40 years ago. And I know the horrid truth about them: neither of them is ready for prime time yet.

Fundamentally, no system should ever boot into an incomplete install. There should be a pointer to the active install, and it shouldn't be moved to a new one until the install finishes cleanly and passes some sort of self-check. Roughly speaking, the failed update was like putting a pie in the oven before you put the pie together; it makes no sense. But no, grub just looks for the highest version number and has no idea what's valid or invalid. Oh, it doesn't work and the commands to change things fail? Sucks to be you, pathetic userland victim.
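To make the "pointer that only moves after a clean install" idea concrete, here's a minimal sketch of one common way to do it: an atomic symlink swap. This is not how apt or grub actually work. The directory layout (`versions/<version>/` plus a `current` symlink) and the `install`/`self_check` callbacks are hypothetical stand-ins. The point is that the pointer changes in one atomic rename, and only at the very end:

```python
import os
from pathlib import Path
from typing import Callable


def activate_if_healthy(root: Path, version: str,
                        install: Callable[[Path], None],
                        self_check: Callable[[Path], bool]) -> bool:
    """Install `version` beside the current one; switch only if it checks out.

    Hypothetical layout: root/versions/<version>/ holds each install, and
    root/current is a symlink to the active one. Returns True if switched.
    """
    target = root / "versions" / version
    target.mkdir(parents=True, exist_ok=False)
    try:
        install(target)             # if this raises, or the process dies here,
        if not self_check(target):  # 'current' has not been touched yet
            return False
    except Exception:
        return False                # leaves a partial dir, but the old pointer stands
    # Build the new pointer under a temporary name (relative to root), then
    # rename it over the old one. os.replace is a single rename, which is
    # atomic on POSIX filesystems: 'current' points at old or new, never neither.
    tmp_link = root / "current.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    tmp_link.symlink_to(Path("versions") / version)
    os.replace(tmp_link, root / "current")
    return True
```

If the battery dies halfway through `install`, the next boot still follows `current` to the old, known-good version. That's the whole trick.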

So now I've discovered the unattended-upgrades service and taken a sledgehammer to that too, because I never want a machine doing stuff behind my back.

WHY is it like this? 50+ years of OS development and all we have is systems that can't survive a low battery?

I'm going to bed, annoyed.