Infiniroot Blog: We sometimes write, too.

Nginx socket() failed (24: Too many open files)

Published on May 9th 2019


Our monitoring informed me about an HTTP 500 error from a central reverse proxy running Nginx. Checking the error logs revealed the following issue:

2019/05/09 08:43:35 [crit] 25655#0: *524505514 open() "/usr/share/nginx/html/50x.html" failed (24: Too many open files)
[...]
2019/05/09 09:04:27 [alert] 28720#0: *59757 socket() failed (24: Too many open files) while connecting to upstream,

This means that the Nginx process had too many files open; sockets count as open files, too, and error 24 is EMFILE, the per-process limit. The surge in connections could also be seen on the Nginx status page. Here is the graph from check_nginx_status.pl:

[Graph: Nginx stats]
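
To see how close a process is to its limit, the open file descriptors can be counted directly. The following is a minimal sketch, assuming a Linux system with /proc mounted; the function name fd_count is made up for this example.

```shell
#!/bin/sh
# Count the file descriptors a process currently holds open (Linux /proc only).
# Reading the fd directory of another user's process usually requires root.
fd_count() {
    pid=$1
    # Each entry in /proc/<pid>/fd is one open descriptor (file, socket, pipe).
    # A PID that does not exist or is not readable yields a count of 0.
    ls "/proc/$pid/fd" 2>/dev/null | wc -l
}

# Example: descriptors held by the current shell.
fd_count "$$"
```

For the Nginx master process this could be called as fd_count "$(cat /run/nginx.pid)"; the worker PIDs have to be passed in the same way.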

The default is set to a limit of 4096 files per (worker) process, which can be seen in /etc/default/nginx:

# cat /etc/default/nginx
# Note: You may want to look at the following page before setting the ULIMIT.
#  http://wiki.nginx.org/CoreModule#worker_rlimit_nofile
# Set the ulimit variable if you need defaults to change.
#  Example: ULIMIT="-n 4096"
#ULIMIT="-n 4096"

However, don't be fooled: changing this file doesn't help. Instead, the limit needs to be set in /etc/security/limits.conf:

# tail /etc/security/limits.conf
#@faculty        hard    nproc           50
#ftp             hard    nproc           0
#ftp             -       chroot          /ftp
#@student        -       maxlogins       4

# Added Nginx limits
nginx       soft    nofile  30000
nginx       hard    nofile  50000

# End of file

Here a soft limit of 30k and a hard limit of 50k files are defined per nginx process.
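
The difference between the two values can be tried out in any shell: the soft limit is the one that is actually enforced, and a process may move it up or down as long as it stays at or below the hard limit. A small sketch, assuming a shell whose ulimit builtin supports -n (bash and dash do) and a hard limit of at least 256:

```shell
#!/bin/sh
# Show the soft and hard open-file limits of the current shell.
echo "soft: $(ulimit -Sn)  hard: $(ulimit -Hn)"

# Lower the soft limit inside a subshell only; the parent shell keeps its value.
# 256 is an arbitrary example value.
lowered=$(ulimit -Sn 256 && ulimit -Sn)
echo "soft limit inside subshell: $lowered"
```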

Note: I first tried this with www-data (the user the Nginx workers run as), but that didn't work, even though a user name is a valid "domain" in this config file.

Additionally, Nginx itself should be told how many files it may open. In the main config file /etc/nginx/nginx.conf add:

# head /etc/nginx/nginx.conf
user www-data;
worker_processes 4;
pid /run/nginx.pid;

# 2019-05-09 Increase open files
worker_rlimit_nofile 30000;
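
To compare the configured value with what the running processes really got, the directive can be read back from the config file. This is only a sketch: the function name rlimit_from_conf and the NGINX_CONF variable are made up here, and it assumes the directive sits on its own line in the main file rather than in an included one.

```shell
#!/bin/sh
# Print the value of worker_rlimit_nofile found in an nginx config file.
# Commented-out lines are skipped because their first field is not the directive.
rlimit_from_conf() {
    awk '$1 == "worker_rlimit_nofile" { sub(/;$/, "", $2); print $2 }' "$1"
}

# NGINX_CONF is a hypothetical override; the default is the path used in this post.
conf=${NGINX_CONF:-/etc/nginx/nginx.conf}
if [ -r "$conf" ]; then
    rlimit_from_conf "$conf"
fi
```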

After a service nginx restart, the limits of the worker processes can be checked:

# ps auxf | grep nginx
root      7027  0.0  0.3 103620 13348 ?        Ss   09:21   0:00 nginx: master process /usr/sbin/nginx
www-data  7028  8.6  1.0 127900 40724 ?        R    09:21   2:37  \_ nginx: worker process
www-data  7029  8.9  1.0 127488 40536 ?        S    09:21   2:44  \_ nginx: worker process
www-data  7031  9.5  1.0 127792 40896 ?        S    09:21   2:53  \_ nginx: worker process
www-data  7032  8.1  1.0 128472 41244 ?        S    09:21   2:29  \_ nginx: worker process

# cat /proc/7028/limits | grep "open files"
Max open files            30000                30000                files     

The "too many open files" errors disappeared from the Nginx logs after this change.

But what caused this sudden problem? As you can see in the graph above, the Writing (and Waiting) connections suddenly increased sharply. It turned out that an upstream server behind this reverse proxy had stopped working while this particular virtual host was receiving a lot of traffic. This caused general slowness, and Nginx held sockets and files open while waiting for the upstream to time out (a 504 in this case).

Different solution when using Systemd

Update: February 1st 2021

The above fix was written two years ago and was working fine on a system without Systemd as init system. However, when Nginx is started and controlled by Systemd, the limits defined in /etc/security/limits.conf are ignored: that file is applied by pam_limits to login sessions, and services started by Systemd do not pass through it. Instead, Systemd applies its own default limits. See Fredrik Averpil's blog post for additional info.

This can be nicely verified. An unlimited nofile limit was defined for multiple domains in /etc/security/limits.conf to see which would be applied to Nginx's processes:

root@nginx:~# cat /etc/security/limits.conf
[...]
# Added Nginx nofile limit
nginx       soft    nofile  50000
nginx       hard    nofile  80000
root       soft    nofile  unlimited
root       hard    nofile  unlimited
www-data    soft    nofile  unlimited
www-data    hard    nofile  unlimited

But after setting worker_rlimit_nofile in nginx.conf and restarting Nginx, the old limits are still in place:

root@nginx:~# ps auxf 
[...]
root     21114  0.0  0.4  64508 18264 ?        Ss   09:36   0:00 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
www-data 21115 40.0  0.6  70340 26880 ?        R    09:36   0:01  \_ nginx: worker process
www-data 21116  2.3  0.6  68888 25252 ?        S    09:36   0:00  \_ nginx: worker process
www-data 21117  7.0  0.6  68888 25376 ?        S    09:36   0:00  \_ nginx: worker process
www-data 21118 16.0  0.6  68888 25196 ?        S    09:36   0:00  \_ nginx: worker process
www-data 21119  0.0  0.5  68888 21312 ?        S    09:36   0:00  \_ nginx: cache manager process
www-data 21120  0.0  0.5  68888 20912 ?        S    09:36   0:00  \_ nginx: cache loader process
[...]


root@nginx:~# cat /proc/21114/limits
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             15598                15598                processes
Max open files            1024                 4096                 files     
Max locked memory         16777216             16777216             bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       15598                15598                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us       

Even the nginx master process, executed as root, still has a file limit of 1024 (soft) and 4096 (hard).

This obviously causes errors when Nginx needs to open new sockets or file handles, and the error log can contain events like this:

2021/02/01 09:28:13 [emerg] 28935#28935: open() "/var/log/nginx/example.com.access.log" failed (24: Too many open files)
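
Systemd can also be asked directly which open-file limit it applies to a unit. The sketch below assumes the unit is called nginx, as on this system; the helper prop_value is made up for this example and only strips the key from the KEY=VALUE line that systemctl show prints.

```shell
#!/bin/sh
# Strip the "KEY=" prefix from KEY=VALUE lines read on stdin.
prop_value() {
    cut -d= -f2-
}

# Query the limit only where systemctl is available at all.
if command -v systemctl >/dev/null 2>&1; then
    systemctl show nginx --property=LimitNOFILE 2>/dev/null | prop_value
fi
```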

To solve this, the limits must be changed in the Systemd service unit configuration for Nginx. The quickest way to do this is to copy the original Nginx service unit and add the LimitNOFILE option:

root@nginx:~# cp /lib/systemd/system/nginx.service /etc/systemd/system/
root@nginx:~# cat /etc/systemd/system/nginx.service
# Stop dance for nginx
# =======================
#
# ExecStop sends SIGSTOP (graceful stop) to the nginx process.
# If, after 5s (--retry QUIT/5) nginx is still running, systemd takes control
# and sends SIGTERM (fast shutdown) to the main process.
# After another 5s (TimeoutStopSec=5), and if nginx is alive, systemd sends
# SIGKILL to all the remaining processes in the process group (KillMode=mixed).
#
# nginx signals reference doc:
# http://nginx.org/en/docs/control.html
#
[Unit]
Description=A high performance web server and a reverse proxy server
Documentation=man:nginx(8)
After=network.target

[Service]
Type=forking
PIDFile=/run/nginx.pid
ExecStartPre=/usr/sbin/nginx -t -q -g 'daemon on; master_process on;'
ExecStart=/usr/sbin/nginx -g 'daemon on; master_process on;'
ExecReload=/usr/sbin/nginx -g 'daemon on; master_process on;' -s reload
ExecStop=-/sbin/start-stop-daemon --quiet --stop --retry QUIT/5 --pidfile /run/nginx.pid
TimeoutStopSec=5
KillMode=mixed
LimitNOFILE=500000

[Install]
WantedBy=multi-user.target

Note: A cleaner solution is to create a drop-in directory (/etc/systemd/system/nginx.service.d) and put the LimitNOFILE option into a config file there (any name ending in .conf), under a [Service] section. That way the packaged unit file stays untouched.
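
Such a drop-in could be created roughly like this. It is a sketch, not a tested deployment script: the function name write_nofile_dropin and the file name limits.conf are arbitrary choices for this example, and the limit of 500000 is simply the value used in this post.

```shell
#!/bin/sh
# Write a systemd drop-in that overrides only LimitNOFILE for a unit.
# $1 = drop-in directory, $2 = limit. The file name limits.conf is arbitrary;
# systemd reads every *.conf file in the directory.
write_nofile_dropin() {
    dir=$1
    limit=$2
    mkdir -p "$dir" || return 1
    printf '[Service]\nLimitNOFILE=%s\n' "$limit" > "$dir/limits.conf"
}

# On the real host (as root) this would be called as:
#   write_nofile_dropin /etc/systemd/system/nginx.service.d 500000
#   systemctl daemon-reload && systemctl restart nginx
```

Taking the directory as a parameter makes it possible to try the function against a scratch directory first and inspect the resulting file before touching /etc.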

After a systemctl daemon-reload (so that Systemd picks up the changed unit) and another Nginx restart, the new limits can be verified:

root@nginx:~# ps auxf|grep nginx
root     26732  0.0  0.0  14428  1012 pts/0    S+   10:03   0:00                      \_ grep --color=auto nginx
root     21636  0.0  0.4  64508 18260 ?        Ss   09:37   0:00 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
www-data 21637  0.1  0.6  68888 25368 ?        S    09:37   0:01  \_ nginx: worker process
www-data 21638 10.6  0.7  75440 32228 ?        R    09:37   2:42  \_ nginx: worker process
www-data 21639  1.6  0.6  69608 26276 ?        R    09:37   0:24  \_ nginx: worker process
www-data 21640 29.0  1.1  87836 44740 ?        S    09:37   7:22  \_ nginx: worker process
www-data 21641  0.0  0.5  68888 21368 ?        S    09:37   0:00  \_ nginx: cache manager process

root@nginx:~# cat /proc/21636/limits | grep "open files"
Max open files            500000               500000               files    

root@nginx:~# cat /proc/21637/limits |grep "open files"
Max open files            500000               500000               files    

Or even quicker, without having to manually find the PID of the Nginx master process:

root@nginx:~# cat /proc/$(pgrep -u root nginx)/limits|grep "open"
Max open files            500000               500000               files 

The limit configured in Systemd's service unit file for Nginx applies to both the master and the worker processes.
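
The one-liner with pgrep only works while exactly one matching PID is returned. To check master, workers and cache processes in one go, a small loop over all PIDs is more robust. A minimal sketch, assuming Linux /proc; the function name print_nofile_limits is made up for this example.

```shell
#!/bin/sh
# Print "PID soft hard" for the open-file limit of every PID given as argument.
print_nofile_limits() {
    for pid in "$@"; do
        # Skip PIDs that have vanished in the meantime or are not readable.
        [ -r "/proc/$pid/limits" ] || continue
        awk -v pid="$pid" '/^Max open files/ { print pid, $4, $5 }' "/proc/$pid/limits"
    done
}

# Example: the limits of the current shell.
print_nofile_limits "$$"
```

For Nginx, the call would be print_nofile_limits $(pgrep nginx), deliberately unquoted so that every PID becomes its own argument.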