Best way to limit total memory used by all users on a shared multi-user system

Posted by pi_epsilon_rho@reddit | linuxadmin

Our site has many CentOS 7 and Rocky 8/9 Linux systems that are shared by many users concurrently via SSH login for ad-hoc interactive use. Many of these are large 128GB+ desktops sitting at one person's desk in a group: that person logs in locally, but many other users in the group SSH in to the desktop to run various analysis programs and do development.

Anyway, one thing that happens a lot is that one user will run MATLAB or some other program that consumes all the RAM in the box, slowing it to a crawl for everyone else. Eventually the kernel invokes its OOM killer. However, many system processes, though not killed by the OOM killer, end up stuck in a non-operating state.

One of these is SSSD, the main account-services daemon, which does not recover; it then prevents any new logins and hangs other processes on things like user name/ID lookups. One can restart sssd to fix it, but one cannot SSH to the box or even log in locally to do so, so most of the time we have to hard power-cycle the box.

One attempt I made at "fixing" this was to create the following rsyslog configuration in /etc/rsyslog.d/oom-sssd-restart.conf:

:msg, contains, "was terminated by own WATCHDOG" ^/usr/etc/sssd-restart.sh

as one usually sees that message in /var/log/messages when sssd gets into its hung state, but this has only worked about 50% of the time.
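The script it calls is nothing fancy; a minimal sketch of /usr/etc/sssd-restart.sh along these lines would do (this sketch assumes it only needs to bounce the daemon, and the exact contents may differ):

    #!/bin/bash
    # Minimal sketch: log the trigger, then bounce sssd.
    # rsyslog invokes this via the legacy "^program" execute action,
    # passing the matched log message as an argument.
    logger -t sssd-restart "watchdog message seen in syslog; restarting sssd"
    systemctl restart sssd.service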

Ultimately, I want to make sure that 4GB or so of each system's RAM is reserved for system processes (UID < 1000), or just limit users with UID > 1000 to 96% of the system's RAM. Is there any simple and accepted way to do this? I am NOT looking for a per-user memory limit via the /etc/security/limits.d/ mechanism; that does not work for what I want.
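The closest thing I have found to "reserve X for the system" is memory protection on system.slice; something like the following looks like it would express the 4GB reservation, though as I understand it MemoryMin= is a cgroup v2 control and needs a reasonably new systemd (the 4G figure is just my target from above):

    # Reserve ~4G of RAM for system services (maps to memory.min, cgroup v2 only).
    # User processes in user.slice then compete for the remainder.
    systemctl set-property system.slice MemoryMin=4G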

One thing I am looking at is using cgroup slices and running, for example on a 128G system:

systemctl set-property user.slice MemoryHigh=120G

It is unclear to me whether this requires cgroups v2, meaning changing GRUB on all boxes to add the kernel parameter systemd.unified_cgroup_hierarchy=1 and rebooting them.
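From what I can tell from systemd.resource-control(5), MemoryHigh= only takes effect on the unified (v2) hierarchy, with MemoryLimit= being the rough v1 counterpart, so the first step on each box is probably to check which hierarchy it is actually running. A quick check, plus the persistent drop-in equivalent of the set-property call (the drop-in filename here is just illustrative):

    # "cgroup2fs" = unified (v2); "tmpfs" = legacy or hybrid (v1)
    stat -fc %T /sys/fs/cgroup/

    # Persistent drop-in equivalent of the set-property call above;
    # the filename is arbitrary.
    mkdir -p /etc/systemd/system/user.slice.d
    cat > /etc/systemd/system/user.slice.d/50-memory.conf <<'EOF'
    [Slice]
    MemoryHigh=120G
    EOF
    systemctl daemon-reload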

BTW, I do use SLURM on an HPC cluster and consider it too heavy-handed and difficult a solution for an interactive desktop shared by users where local GUI login is used.