Cannot spawn processes. Best way to shut down?
Posted by TheLinuxMailman@reddit | linuxadmin | 8 comments
My Ubuntu 20.04 server is in an odd state. I cannot execute any command:
-bash: fork: retry: Resource temporarily unavailable
I can echo * (shell builtin) and see file names.
This is in a bash I previously ssh'd into, which has root. Ya, I'm one of those people who likes to keep root ssh open (sudo -i) for root commands I am frequently doing right now, in addition to ordinary user shells.
I am fairly certain I have free disk space on /.
Postfix is still running and receiving and storing mail, which I can see on my alpine on my logged-in user account shell. Both were running when this no-fork situation started.
What steps can I take next in my constrained situation before pressing reset? FS is ext4 on RAID1, so I don't expect anything worse from that than a RAID resync, maybe.
I guess I could disconnect the network and let the FS caches flush before rebooting. How long?
What can I write to in /sys from the open shell that will shut down more gracefully and/or flush caches just before resetting?
Finally, any idea what is going on?
CyberKiller40@reddit
Raising Elephants Is So Utterly Boring 😉🐧
michaelpaoli@reddit
Probably process table full.
If you've got shell as root, great, that makes it much easier.
But yeah, if you consistently can't fork any processes, you'll probably want to do the cleanest shutdown (or reboot) feasible under the circumstances ... and that means bypassing pretty much anything that would attempt to fork, as that may just hang the shutdown attempt indefinitely.
So, e.g.:
cd / && exec halt -f -f
Can use reboot instead of halt, if desired. Check the relevant man page for your distro, as, depending upon command(s) installed, and init system, the particular halt/reboot commands provided may vary somewhat in their syntax and behavior. Most (all?) Linux versions of the halt(8) and reboot(8) commands will by default do (or at least attempt) a sync before halt/reboot.
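The exec in the command above matters: it replaces the current shell with halt instead of forking a child, so it can still work when fork fails. A small sketch that shows this, run in a throwaway child shell so it does not replace your own session (and which therefore needs fork itself, so try it on a healthy machine):

```shell
# The inner shell prints its PID, then exec's a second shell that prints its own PID.
# Because exec replaces the process image rather than forking, both PIDs are identical.
pids=$(sh -c 'echo $$; exec sh -c "echo \$\$"')
echo "$pids"   # two lines carrying the same PID
```

The flip side is that if the exec'd command returns or fails, the shell you ran it from is gone, which is worth keeping in mind when it is your only working root shell.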
One can also use sysrq as others have commented and pointed to relevant further information, so I won't duplicate that info here, but that's another approach - that can also be useful if/when one has console access, but can't even get login (at least if the system is configured to allow such from console).
Also, even without shell or login capability, having a peek at console may be quite useful - it may well be spewing diagnostics about the fork failures - most notably is it because the process table is full, or is it being caused by some other critical resource exhaustion (e.g. out of RAM - though that would typically have a somewhat different set of symptoms along with the fork failures).
stormcloud-9@reddit
You may have done something to launch a bajillion processes, and have exhausted some sort of limit (ulimit, or pid_max). You could also have exhausted the available open file descriptors.
There are ways to investigate this since you have an open shell. But they're not simple to describe.
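As a rough sketch of such an investigation, the following sticks to shell builtins (read, echo, globbing, redirection), so it should not need to fork. The variable names are only illustrative, and the /proc paths are the standard Linux ones, assumed to be readable from the root shell:

```shell
# Kernel-wide ceilings on PIDs and threads.
read -r pid_max < /proc/sys/kernel/pid_max
read -r threads_max < /proc/sys/kernel/threads-max

# Fourth field of /proc/loadavg is "runnable/total" scheduling entities (tasks).
read -r l1 l5 l15 tasks lastpid < /proc/loadavg

# Allocated, unused, and maximum file handles, system-wide.
read -r fh_alloc fh_unused fh_max < /proc/sys/fs/file-nr

# Count processes by counting the numeric directories in /proc.
nprocs=0
for d in /proc/[0-9]*; do nprocs=$((nprocs + 1)); done

echo "pid_max=$pid_max threads-max=$threads_max"
echo "processes=$nprocs tasks(runnable/total)=$tasks"
echo "file handles: allocated=$fh_alloc max=$fh_max"

# Per-process limits of this shell (includes "Max processes" and "Max open files").
while read -r line; do
    case $line in
        "Max processes"*|"Max open files"*) echo "$line" ;;
    esac
done < /proc/self/limits
```

If the process or task count is sitting close to pid_max, threads-max, or the "Max processes" limit, that points at the process table; if allocated file handles are close to the maximum, that points at file descriptors instead.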
If you have some sort of lights-out management, use it to send a CTRL+ALT+DEL.
If not, you can try
kill -INT 1
This will cause systemd to gracefully reboot.
If your system is borked to where that doesn't work, your next option would be sysrq:
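A minimal sketch of what a sysrq sequence from a root shell typically looks like, assuming /proc/sysrq-trigger exists on the system. The helper name sysrq and its optional second argument are made up for illustration, and this is not necessarily the exact set of commands originally posted:

```shell
# Hypothetical helper: write one sysrq command letter to the trigger file.
# echo is a shell builtin and the redirection is done by the shell, so no fork is needed.
# The optional second argument overrides the target file (useful for a dry run).
sysrq() {
    echo "$1" > "${2:-/proc/sysrq-trigger}"
}

# Typical order, run by hand with a few seconds between steps:
#   sysrq s    # emergency sync of all mounted filesystems
#   sysrq u    # remount all filesystems read-only
#   sysrq b    # reboot immediately, without any further sync or unmount
```

The calls are left commented out on purpose, since the last one reboots the machine on the spot. The sync runs in the background, so give it a few seconds before moving on to the next step.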
If you don't have sysrq, then you're pretty hosed.
nospacebar14@reddit
Just for my own learning, what does that block of commands do?
TheLinuxMailman@reddit (OP)
I had no idea either so looked this up. Neat. I was unaware of all these options.
Linux Magic System Request Key Hacks
TheLinuxMailman@reddit (OP)
Fantastic detail. Thanks! Because of other activities that delayed me today I'm going to delay this to tomorrow AM so I don't have to do messy restoration work late today, if it arises (unlikely I think).
After I posted I was remembering that I could kill init/1 and maybe things would nicely shut down, but I was away and could not remember if kill was a shell builtin.
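For what it's worth, kill is a builtin in bash, so it works without forking. A quick way to check this for any command from the stuck shell is type, which is itself a builtin:

```shell
# Ask the shell how it would resolve each name; builtins need no fork to run.
type kill    # bash reports: kill is a shell builtin
type echo
type read
```

Anything type reports as a path such as /usr/bin/something is an external program and will hit the same fork failure.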
mriswithe@reddit
Based on it being a mail server and unable to start processes, my bet is you are out of open file handles. This means you are not keeping up with Postfix's needs. Whether it's disk IO, CPU, or memory, something is behind, and badly enough that your machine is super hosed.
For a clean way to restart it? No idea. I would probably poke the power button once and see if Linux starts shutting down or if it is too hung. If it is too hung, I would yank the power and eat my failure and try and recover.
TheLinuxMailman@reddit (OP)
Thanks. You helped me remember that I think the power button does cause an event on a short press, not a power off. So I'll try that on Thursday first thing, when I am fresh and have the day to deal with any worse outcome after a reset.
I've had to do that very occasionally and at worst the RAID had to resync (and maybe I had some detached temp files) so that will definitely be my fallback. Thank you.