If you use Nagios as your infrastructure monitoring tool, I am sure you have already run into the error “CRITICAL – Socket timeout after 10 seconds”.
What does “CRITICAL – Socket timeout after 10 seconds” error indicate?
This error does NOT necessarily indicate there is a problem with the host you are monitoring!
In most cases, “CRITICAL – Socket timeout after 10 seconds” is a false positive: it means that Nagios did not get a reply from the monitored host within a set amount of time.
By default, plugins such as check_nrpe use a socket timeout of 10 seconds. If Nagios does not get a reply from the monitored host within those 10 seconds, it marks the service as “CRITICAL – Socket timeout after 10 seconds”.
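To make the mechanism concrete, here is a minimal Python sketch of a Nagios-style check. It is not how check_nrpe is implemented. The function name check_tcp_reply, the approach of simply waiting for the first bytes of a reply on a TCP port, and the exact message text are assumptions for illustration. The sketch follows the Nagios plugin convention of exiting with 0 for OK and 2 for CRITICAL.

```python
import socket
import sys

# Standard Nagios plugin exit codes (1 = WARNING and 3 = UNKNOWN are unused here).
OK, CRITICAL = 0, 2


def check_tcp_reply(host: str, port: int, timeout: float = 10.0) -> tuple[int, str]:
    """Hypothetical check: connect to host:port and wait for the first bytes of a reply.

    The timeout applies to each blocking socket operation (connect, then recv),
    so a host that accepts the connection but answers too slowly is reported as
    CRITICAL even though it may be perfectly healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            data = sock.recv(1024)
    except socket.timeout:
        # No reply within `timeout` seconds: the false-positive case described above.
        return CRITICAL, f"CRITICAL - Socket timeout after {timeout:g} seconds"
    except OSError as exc:
        # Connection refused, host unreachable, or a DNS failure.
        return CRITICAL, f"CRITICAL - {exc}"
    return OK, f"OK - received {len(data)} bytes"


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage: python check_reply.py HOST PORT [TIMEOUT]
    limit = float(sys.argv[3]) if len(sys.argv) > 3 else 10.0
    code, message = check_tcp_reply(sys.argv[1], int(sys.argv[2]), limit)
    print(message)
    sys.exit(code)
```

A service that needs 15 seconds to answer fails this check with the default limit of 10 but passes with a limit of 20. Raising the limit is what the “-t” parameter does for check_nrpe.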
Sometimes system administrators have to use custom Nagios plugins or checks that take longer to run, and Nagios may then send false alarms such as “CRITICAL – Socket timeout after 10 seconds”.
How to fix “CRITICAL – Socket timeout after 10 seconds” error?
We can fix this by increasing the socket timeout from the default 10 seconds to, say, 20 seconds.
We do this by adding the “-t” parameter to the relevant command definition in the commands.cfg file on your Nagios server. The commands.cfg file is usually located at /usr/local/nagios/etc/objects/commands.cfg (if you compiled Nagios from source) or /etc/nagios/commands.cfg (if you installed Nagios from an RPM).
Read more about commands.cfg in my post “Nagios configuration – How to configure Nagios”.
BEFORE (/usr/local/nagios/etc/objects/commands.cfg):
define command {
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
AFTER (/usr/local/nagios/etc/objects/commands.cfg):
define command {
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -t 20
}
Other commands support the “-t” parameter too, for example check_http and check_tcp. Be sure to add “-t 20” to whichever command produced the “CRITICAL – Socket timeout after 10 seconds” alerts, and reload Nagios afterwards so the change takes effect 🙂
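If several command definitions need the same change, a small script can apply it. The sketch below uses a hypothetical helper, add_timeout, which is not part of Nagios. It assumes the usual layout of one directive per line inside each define command block, and it leaves a command_line alone if that line already contains a -t option. Back up commands.cfg and verify the configuration before reloading Nagios.

```python
import re

# One "define command { ... }" block; Nagios macros use $...$, not braces.
BLOCK_RE = re.compile(r"define\s+command\s*\{(.*?)\}", re.DOTALL)
NAME_RE = re.compile(r"^[ \t]*command_name[ \t]+(\S+)", re.MULTILINE)
LINE_RE = re.compile(r"^[ \t]*command_line[ \t]+.*$", re.MULTILINE)
HAS_TIMEOUT_RE = re.compile(r"\s-t\s")


def add_timeout(cfg_text: str, command_name: str, seconds: int = 20) -> str:
    """Hypothetical helper: append '-t <seconds>' to one command's command_line."""

    def patch_line(m: re.Match) -> str:
        line = m.group(0)
        if HAS_TIMEOUT_RE.search(line + " "):
            return line  # a -t option is already present; leave it alone
        return line.rstrip() + f" -t {seconds}"

    def patch_block(m: re.Match) -> str:
        block, body = m.group(0), m.group(1)
        name = NAME_RE.search(body)
        if name is None or name.group(1) != command_name:
            return block  # a different command: keep it unchanged
        new_body = LINE_RE.sub(patch_line, body, count=1)
        return block.replace(body, new_body, 1)

    return BLOCK_RE.sub(patch_block, cfg_text)
```

Calling add_timeout(text, "check_nrpe") makes the same change as the BEFORE/AFTER example for check_nrpe, and running it a second time changes nothing.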