Our sysadmins have set up a user escalation script, and there's nothing wrong with it. It does some prechecks, sudos a script under the requested user, does some logging, then does an exec <configured shell>. When there are too many processes, it manages to run the second script as the user, but the exec command blocks, so it never gets to the profile script that would execute the ulimit command.
I traced it down to the default soft limit on the number of processes, which is 1024 for all users.
Our process count is way above that, so when the sudo script tries to create a new process for bash, it can't, and we are unable to sign into the account.
As the profile scripts never get run, this one can't be solved at the user level.
I started searching around. In /etc/security/limits.conf we had:
* - nproc 31768
I found that if I specifically added a user to that file then it would work, i.e.:
serviceaccount - nproc 4096
This serverfault/stackexchange discussion highlights the problem. This Red Hat bug request from 2008 shows that they requested this file
/etc/security/limits.d/90-nproc.conf
be added with this setup:
* soft nproc 1024
root soft nproc unlimited
This shows where the limit comes from and why manually specifying it in /etc/security/limits.conf worked (the more specific rule won).
The soft limit is 1024 for all non-root users but their hard limit is 31768, which means each process is initially limited to 1024 until a shell raises its own limit. All other processes that were started without a raised limit are unable to create more processes once the user is over 1024, including the bash shell invoked during our sudo script.
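The soft/hard distinction can be checked from a shell. This is a minimal sketch, assuming bash on Linux; the function names are hypothetical and not part of the sysadmins' script. It prints both limits for the current shell, reads the limits of an already running process from /proc, and raises the soft limit up to the hard limit, which an unprivileged user is allowed to do.

```shell
#!/usr/bin/env bash
# Sketch only: function names are made up for illustration.

# Print the soft and hard "max user processes" limits of this shell.
show_nproc_limits() {
    printf 'soft nproc: %s\n' "$(ulimit -Su)"
    printf 'hard nproc: %s\n' "$(ulimit -Hu)"
}

# Print "<soft> <hard>" for an existing process, read from /proc (Linux only).
# Useful for spotting processes started before any limit was raised.
proc_nproc_limits() {
    awk '/^Max processes/ {print $3, $4}' "/proc/$1/limits"
}

# Raise this shell's soft limit to its hard limit. Children started
# afterwards inherit the raised value; running processes are unaffected.
raise_soft_nproc() {
    ulimit -Su "$(ulimit -Hu)"
}

show_nproc_limits
```

Calling `proc_nproc_limits <pid>` on a long-running daemon shows whether it is still carrying the default soft limit.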
So for power users' systems you'd need to raise the nproc limit in /etc/security/limits.conf to at least 4096; Oracle apparently recommends this for Java. For a service account with multiple app servers, 8192 is probably safest.
In our profile script I output the ulimit and the current process count so we know if we're getting close to the limit.
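A profile snippet along these lines would do it. This is a sketch rather than our exact script: it assumes bash and the procps version of ps, and the function name is hypothetical. It counts threads rather than just processes, because on Linux each thread counts against the nproc limit.

```shell
# Hypothetical profile snippet: report task usage against the soft nproc limit.
report_nproc_usage() {
    local limit count
    limit=$(ulimit -Su)
    # -L lists one line per thread; -U selects by real user ID, which is
    # what the kernel charges the limit against; "pid=" suppresses the header.
    count=$(ps -L -U "$(id -ru)" -o pid= 2>/dev/null | wc -l | tr -d ' ')
    echo "nproc: ${count} tasks in use, soft limit ${limit}"
}

report_nproc_usage
```

If ps is unavailable the count is reported as 0, so treat a zero as "unknown" rather than as an idle account.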