Infiniroot Blog: We sometimes write, too.

Of course we cannot always share details about our work with customers, but nevertheless it is nice to show our technical achievements and share some of our implemented solutions.

Monitoring Windows hosts with NSClient++: Service and system checks using NRPE and API

Published on September 27th 2021


When monitoring Windows systems with open source monitoring software, such as Nagios, Naemon or Icinga, one of the most widely used solutions is NSClient++ (or nscp for short). NSClient++ has been around for a long time. The first public version, NSClient++ 0.0.1 (RC5), was released in 2005 (initial SourceForge release), and it has since become the de facto standard for monitoring Windows hosts with open source monitoring tools. But in recent years the monitoring ecosystem has changed, and it's time to adjust.

"Why? What happened?" You might ask. This is what this article is about.

Created as Nagios Remote Service Check Agent

NSClient++ (short for Nagios Service (Check) Client, with the ++ hinting at its programming language C++) was initially developed as a Nagios agent for remote check execution. The Nagios toolsets were initially developed only for Unix-based and Linux operating systems. These include NSCA (Nagios Service Check Acceptor, for submitting passive check results to the Nagios core server) and NRPE (Nagios Remote Plugin Executor, for accepting active check requests from the Nagios core server and sending back the results). NSClient++ bridged the gap and allowed service checks to run on Windows hosts, too.

One of the most common ways to integrate Windows hosts is NRPE. The NRPE client (the check_nrpe plugin on the monitoring server) talks to the NRPE server (part of the NSClient++ installation on the Windows host) and tells it which check command to execute. NSClient++ then runs the check internally (e.g. check_cpu) and returns the result as the response.
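On the wire, this exchange is a fixed-size binary packet. As an illustration, here is a minimal Python sketch that builds and verifies an NRPE v2 packet. It assumes the following layout, based on our reading of the NRPE 2.x C sources:

- a 16-bit packet version;
- a 16-bit packet type;
- a CRC32 checksum;
- a 16-bit result code;
- a 1024-byte, NUL-terminated buffer.

All fields are in network byte order, and the packet is padded to 1036 bytes. As with check_nrpe, the command and its arguments are joined with "!". The function names are hypothetical. The sketch neither opens the TCP connection (NRPE's default port is 5666) nor performs the SSL/TLS handshake.

```python
from __future__ import annotations

import struct
import zlib
from typing import Sequence

NRPE_PACKET_VERSION_2 = 2  # legacy protocol, the one NSClient++ understands
QUERY_PACKET = 1           # check_nrpe -> NRPE server
RESPONSE_PACKET = 2        # NRPE server -> check_nrpe
BUFFER_LENGTH = 1024       # fixed buffer size of the classic v2 packet

# Assumed header layout: version, type, CRC32, result code (network byte order)
HEADER = struct.Struct("!hhIh")
TRAILING_PAD = b"\0\0"     # assumption: C struct padding to 1036 bytes in total


def build_nrpe_v2_query(command: str, args: Sequence[str] = ()) -> bytes:
    """Build an NRPE v2 query packet, e.g. for check_drivesize!drive=C:."""
    text = "!".join([command, *args]).encode()  # arguments are separated by '!'
    if len(text) >= BUFFER_LENGTH:
        raise ValueError("command too long: the buffer also needs a NUL terminator")
    body = text.ljust(BUFFER_LENGTH, b"\0") + TRAILING_PAD
    # The CRC32 covers the whole packet while the CRC field itself is still 0
    crc = zlib.crc32(HEADER.pack(NRPE_PACKET_VERSION_2, QUERY_PACKET, 0, 0) + body)
    return HEADER.pack(NRPE_PACKET_VERSION_2, QUERY_PACKET, crc, 0) + body


def parse_nrpe_v2_packet(packet: bytes) -> tuple[int, int, int, str]:
    """Return (version, type, result code, text) after verifying the CRC32."""
    version, ptype, crc, result = HEADER.unpack_from(packet)
    body = packet[HEADER.size:]
    if zlib.crc32(HEADER.pack(version, ptype, 0, result) + body) != crc:
        raise ValueError("CRC32 mismatch")
    # The text ends at the first NUL byte; the rest is filler
    return version, ptype, result, body.split(b"\0", 1)[0].decode(errors="replace")
```

build_nrpe_v2_query("check_cpu") produces the packet that would be written to the encrypted connection. parse_nrpe_v2_packet() would then be applied to the server's answer.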

This worked fine for many years. However, first a dead and then a revived, evolved NRPE project caused several problems for NSClient++. Let's dig deeper into these problems.

NRPE is dead. No, it's alive! Oh, it's dead again.

One of the major problems of NRPE was that the project was stuck at version 2.15 for a long time. NRPE 2.15 was released in September 2013, with no sign whatsoever that a newer version would ever follow. NRPE was believed dead, and even NSClient++ referred to NRPE as legacy and insecure.

But then, in 2016, NRPE came back to life with a new project maintainer (John Frickson). It was released as version 3.0.0, and a lot changed between 2.15 and 3.0.0.

NRPE version 3.0.0 release

From the official NRPE 3.0.0 release post:

Not only was security addressed (NRPE had been highly criticized for it), another major problem was tackled as well: the "payload size". The payload is the size of the response from the NRPE server (NSClient++ on Windows), also known as the packet buffer length or packet size (https://support.nagios.com/kb/article/nrpe-packet-size-explained-518.html).

Before NRPE 3.0, the payload size was statically defined as 1024 bytes. Responses holding a lot of data (including performance data) were too big to handle, and the check_nrpe plugin would just show "CHECK_NRPE: Receive underflow - only 1026 bytes received (4 expected)" or something similar. Two additions solve this: the new -P / --payload-size parameter in check_nrpe, and the possibility to set a higher payload size in NSClient++ using the "payload length" configuration parameter. Together they allow large responses and lists (e.g. listing all services) to be handled.

Not long after NRPE 3.0.0 was released, development of NRPE picked up speed and more releases followed. In 2019, NRPE 4.0.0 was released, adding support for TLS 1.3 along with further fixes and enhancements. However, in January 2020 a "deprecation notice" was added to the NRPE repository, indicating that development of NRPE is officially over.

NRPE marked as deprecated

Only security fixes would be implemented in potential future releases. This basically means: NRPE is dead - again.

Using NRPE 3.x or 4.x with NSClient++

As mentioned above, the changes in NRPE 3.0.0 also introduced a larger DH key size to improve encryption security. As a general rule of thumb, anything below 2048 bits is considered weak. NSClient++ still uses a 512-bit key (nrpe_dh_512.pem) to encrypt NRPE communication. The file is located in the security directory of the NSClient++ installation path (usually C:\Program Files\NSClient++\security\nrpe_dh_512.pem).

NSClient++ NRPE DH default is a 512-bit key

This now causes communication problems when trying to establish a secure connection between the updated client (check_nrpe) and the server (NSClient++):

$ /usr/lib/nagios/plugins/check_nrpe -H windowshost.example.com
CHECK_NRPE: (ssl_err != 5) Error - Could not complete SSL handshake with monitoring: 1

To handle this, a new 2048-bit DH key should be created and configured in NSClient++. See this article on how to handle the NRPE to NSClient++ communication error "Could not complete SSL handshake".
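The key size refers to the prime stored in the DH parameters file, measured in bits. The following minimal, stdlib-only Python sketch checks which size a given .pem file actually has. It assumes the file contains a PKCS#3 "BEGIN DH PARAMETERS" block, the format written by openssl dhparam (e.g. openssl dhparam -out nrpe_dh_2048.pem 2048; the file name is only an example). The function name dh_prime_bits is hypothetical.

```python
from __future__ import annotations

import base64
import re

PEM_RE = re.compile(
    r"-----BEGIN DH PARAMETERS-----(.*?)-----END DH PARAMETERS-----", re.DOTALL
)


def _read_der_length(der: bytes, pos: int) -> tuple[int, int]:
    """Return (length, position after the length octets) of a DER length field."""
    first = der[pos]
    pos += 1
    if first < 0x80:               # short form: the length fits into 7 bits
        return first, pos
    num_octets = first & 0x7F      # long form: the next N octets hold the length
    return int.from_bytes(der[pos:pos + num_octets], "big"), pos + num_octets


def dh_prime_bits(pem_text: str) -> int:
    """Bit length of the prime p in a PKCS#3 DH parameters PEM block."""
    match = PEM_RE.search(pem_text)
    if match is None:
        raise ValueError("no DH PARAMETERS block found")
    der = base64.b64decode("".join(match.group(1).split()))
    # DHParameter ::= SEQUENCE { prime INTEGER, base INTEGER, ... }
    if der[0] != 0x30:
        raise ValueError("expected an ASN.1 SEQUENCE")
    _, pos = _read_der_length(der, 1)
    if der[pos] != 0x02:
        raise ValueError("expected an ASN.1 INTEGER (the prime)")
    length, pos = _read_der_length(der, pos + 1)
    prime = int.from_bytes(der[pos:pos + length], "big")
    return prime.bit_length()
```

Assuming a standard PKCS#3 file, reading the bundled nrpe_dh_512.pem should report 512. A newly generated replacement should report 2048.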

In addition to the encryption changes, NRPE 3.x and 4.x both use the newer NRPE v3 protocol in the background. However, NSClient++ never implemented NRPE v3 and only accepts the NRPE v2 protocol (called "legacy" in the NSClient++ documentation). Luckily, the newer NRPE versions support both the v3 and the v2 protocol. When check_nrpe 3.x or 4.x runs against NSClient++ without additional parameters, it first tries the newer NRPE v3 protocol. The plugin then detects that the server response uses NRPE v2 ("Invalid packet version received from server") and automatically falls back to the legacy v2 protocol:

$ /usr/lib/nagios/plugins/check_nrpe -H windowshost.example.com
CHECK_NRPE: Invalid packet version received from server.
I (0.5.2.35 2018-01-28) seem to be doing fine...

To force the check_nrpe plugin to use the legacy protocol from the start, use the new -2 parameter (introduced in NRPE 3.0.0):

$ /usr/lib/nagios/plugins/check_nrpe -H windowshost.example.com -2
I (0.5.2.35 2018-01-28) seem to be doing fine...

This way the warning "Invalid packet version received from server" no longer appears in the first line of the output. Otherwise it would cause problems in the central monitoring user interface, which usually displays the first line of a monitoring plugin's output.
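The fallback behaviour can be modelled in a few lines. This is a simplified Python sketch, not check_nrpe's actual code. It assumes the packet version is the first 16-bit field of every reply. The send callback is a hypothetical stand-in for the real TLS connection: it returns the raw reply for a query sent with the given protocol version. The model shows why the extra warning line only appears when -2 is not used.

```python
from __future__ import annotations

import struct
from typing import Callable

NRPE_PACKET_VERSION_2 = 2  # legacy protocol, the only one NSClient++ speaks
NRPE_PACKET_VERSION_3 = 3  # default protocol of check_nrpe 3.x and 4.x


def query_with_fallback(send: Callable[[int], bytes], force_v2: bool = False):
    """Return (protocol version used, raw reply, warning lines printed so far).

    send(version) is a hypothetical callback that sends the query using the
    given packet version and returns the raw reply from the server.
    """
    if force_v2:  # corresponds to check_nrpe -2
        versions = [NRPE_PACKET_VERSION_2]
    else:
        versions = [NRPE_PACKET_VERSION_3, NRPE_PACKET_VERSION_2]
    output_lines: list[str] = []
    for version in versions:
        reply = send(version)
        (received,) = struct.unpack_from("!h", reply)  # first header field
        if received == version:
            return version, reply, output_lines
        # This line ends up as the first line of the plugin output shown in the UI
        output_lines.append("CHECK_NRPE: Invalid packet version received from server.")
    raise ConnectionError("server did not answer with a supported packet version")
```

A fake server that always answers with version 2 yields one warning line without force_v2 and none with it, mirroring the two check_nrpe runs above.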

Handling higher payloads with NRPE

NRPE has one major problem: it has a default payload size of 1024 bytes. Responses from NSClient++ that exceed this limit result in an error. A good way to see this is to use the check_wmi check command to list all installed services on the target Windows host:

$ /usr/lib/nagios/plugins/check_nrpe -H windowshost.example.com -2 -c check_wmi -a 'query=select Name from Win32_Service'
CHECK_NRPE: Invalid packet type received from server.

Instead of truncating the response, the client (check_nrpe) immediately receives the error "Invalid packet type received from server".
Note: This is not the same error message as "invalid packet version"!

To handle this, use the new payload size parameter (-P / --payload-size). It is important to understand that this parameter only works in conjunction with the -2 parameter, i.e. with the legacy NRPE v2 protocol. NRPE v3 does support payload sizes up to 64KB (see the NRPE 3.0.0 changelog above), but as mentioned, NSClient++ does not support NRPE v3.
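Whether a given response fits into the payload can be estimated before touching any configuration. Here is a minimal Python sketch. It assumes, as in the NRPE C sources, that the buffer must hold the output plus a terminating NUL byte, and that the output is measured in UTF-8 bytes. Rounding up to whole kilobytes is merely a convention chosen here, not a documented requirement of NRPE or NSClient++. Both function names are hypothetical.

```python
DEFAULT_PAYLOAD_SIZE = 1024  # classic NRPE v2 buffer size in bytes


def fits_payload(output: str, payload_size: int = DEFAULT_PAYLOAD_SIZE) -> bool:
    """True if the output plus its NUL terminator fits into the buffer."""
    return len(output.encode("utf-8")) + 1 <= payload_size


def suggest_payload_size(output: str, step: int = 1024) -> int:
    """Smallest multiple of `step` bytes that can hold the output and its NUL byte."""
    needed = len(output.encode("utf-8")) + 1
    return -(-needed // step) * step  # integer ceiling division
```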

The payload size needs to be defined on both sides. On the client side, it goes into the check_nrpe command. On the server side, the payload length must be set in nsclient.ini, within the [/settings/NRPE/server] section:

[/settings/NRPE/server]
; Extended Payload
payload length=4096

In this case the server (NSClient++) is configured with a new payload length of 4 KB (4096 bytes) and restarted. With exactly the same size in the check_nrpe plugin, the large server response can now be handled:

$ /usr/lib/nagios/plugins/check_nrpe -H windowshost.example.com --payload-size=4096 -2 -c check_wmi -a "query=select Name from Win32_Service"
AdobeARMservice, AJRouter, ALG, AppIDSvc, Appinfo, AppMgmt, AppReadiness, AppVClient, AppXSvc, AssignedAccessManagerSvc, AudioEndpointBuilder, Audiosrv, autotimesvc, AxInstSV, BDESVC, BFE, BITS, BrokerInfrastructure, Browser, BTAGService, BthAvctpSvc, bthserv, camsvc, CDPSvc, CertPropSvc, ClickToRunSvc, ClipSVC, COMSysApp, CoreMessagingRegistrar, CryptSvc, CscService, DcomLaunch, defragsvc, DeviceAssociationService, DeviceInstall, DevQueryBroker, Dhcp, diagnosticshub.standardcollector.service, diagsvc, DiagTrack, DialogBlockingService, DispBrokerDesktopSvc, DisplayEnhancementService, DmEnrollmentSvc, dmwappushservice, Dnscache, DoSvc, dot3svc, DPS, DsmSvc, DsSvc, DusmSvc, Eaphost, edgeupdate, edgeupdatem, EFS, embeddedmode, EntAppSvc, EPWD, EventLog, EventSystem, Fax, fdPHost, FDResPub, fhsvc, FontCache, FrameServer, GoogleChromeElevationService, gpsvc, GraphicsPerfSvc, gupdate, gupdatem, hidserv, HvHost, icssvc, IKEEXT, InstallService, iphlpsvc, IpxlatCfgSvc, KeyIso, KtmRm, LanmanServer, LanmanWorkstation, lfsvc, LicenseManager, lltdsvc, lmhosts, LSM, LxpSvc, MapsBroker, MicrosoftEdgeElevationService, MixedRealityOpenXRSvc, MozillaMaintenance, mpssvc, MSDTC, MSiSCSI, msiserver, MsKeyboardFilter, NaturalAuthentication, NcaSvc, NcbService, NcdAutoSetup, Netlogon, Netman, netprofm, NetSetupSvc, NetTcpPortSharing, NgcCtnrSvc, NgcSvc, NlaSvc, nscp, nsi, ose, p2pimsvc, p2psvc, PcaSvc, PeerDistSvc, perceptionsimulation, PerfHost, PhoneSvc, pla, PlugPlay, PNRPAutoReg, PNRPsvc, PolicyAgent, Power, PrintNotify, ProfSvc, PushToInstall, QWAVE, RasAuto, RasMan, RemoteAccess, RemoteRegistry, RetailDemo, RmSvc, RpcEptMapper, RpcLocator, RpcSs, SamSs, SCardSvr, ScDeviceEnum, Schedule, SCPolicySvc, SDRSVC, seclogon, SecurityHealthService, SEMgrSvc, SENS, Sense, SensorDataService, SensorService, SensrSvc, SessionEnv, SgrmBroker, SharedAccess, SharedRealitySvc, ShellHWDetection, shpamsvc, smphost, SmsRouter, SNMPTRAP, spectrum, Spooler, sppsvc, SSDPSRV, ssh-agent, SstpSvc, 
StateRepository, stisvc, StorSvc, svsvc, swprv, SysMain, SystemEventsBroker, TabletInputService, TapiSrv, TermService, Themes, TieringEngineService, TimeBrokerSvc, TokenBroker, TracSrvWrapper, TrkWks, TroubleshootingSvc, TrustedInstaller, tzautoupdate, UevAgentService, uhssvc, UmRdpService, upnphost, UserManager, UsoSvc, VacSvc, VaultSvc, vds, VGAuthService, vm3dservice, vmicguestinterface, vmicheartbeat, vmickvpexchange, vmicrdv, vmicshutdown, vmictimesync, vmicvmsession, vmicvss, VMTools, vmvss, VSS, W32Time, WaaSMedicSvc, WalletService, WarpJITSvc, wbengine, WbioSrvc, Wcmsvc, wcncsvc, WdiServiceHost, WdiSystemHost, WdNisSvc, WebClient, Wecsvc, WEPHOSTSVC, wercplsupport, WerSvc, WFDSConMgrSvc, WiaRpc, WinDefend, WinHttpAutoProxySvc, Winmgmt, WinRM, wisvc, WlanSvc, wlidsvc, wlpasvc, WManSvc, wmiApSrv, WMPNetworkSvc, workfolderssvc, WpcMonSvc, WPDBusEnum, WpnService, wscsvc, WSearch, wuauserv, WwanSvc, XblAuthManager, XblGameSave, XboxGipSvc, XboxNetApiSvc, AarSvc_6f2b5, BcastDVRUserService_6f2b5, BluetoothUserService_6f2b5, CaptureService_6f2b5, cbdhsvc_6f2b5, CDPUserSvc_6f2b5, ConsentUxUserSvc_6f2b5, CredentialEnrollmentManagerUserSvc_6f2b5, DeviceAssociationBrokerSvc_6f2b5, DevicePickerUserSvc_6f2b5, DevicesFlowUserSvc_6f2b5, MessagingService_6f2b5, OneSyncSvc_6f2b5, PimIndexMaintenanceSvc_6f2b5, PrintWorkflowUserSvc_6f2b5, UdkUserSvc_6f2b5, UnistoreSvc_6f2b5, UserDataSvc_6f2b5, WpnUserService_6f2b5
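Because the value must match on both sides, one option is to derive the check_nrpe arguments from the nsclient.ini actually deployed. Here is a minimal Python sketch with the following assumptions:

- nsclient.ini parses as a regular INI file with Python's configparser, with interpolation disabled and duplicate keys tolerated;
- the plugin lives at /usr/lib/nagios/plugins/check_nrpe, as in the examples above;
- 1024 is used as a fallback when no payload length is set.

The function names are hypothetical.

```python
from __future__ import annotations

import configparser

CHECK_NRPE = "/usr/lib/nagios/plugins/check_nrpe"


def nrpe_payload_length(ini_text: str, default: int = 1024) -> int:
    """Read 'payload length' from the [/settings/NRPE/server] section."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    # fallback also covers a missing section or a missing key
    return parser.getint("/settings/NRPE/server", "payload length", fallback=default)


def check_nrpe_argv(host: str, command: str, args: list[str], payload_size: int) -> list[str]:
    """Build a check_nrpe command line using the legacy v2 protocol (-2)."""
    return [CHECK_NRPE, "-H", host, f"--payload-size={payload_size}", "-2",
            "-c", command, "-a", *args]  # -a has to be the last option
```

The resulting list can be passed to subprocess.run() as-is. This avoids shell quoting issues with arguments such as the WMI query.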

Replacing NRPE: NSClient++ API

Of course, the question arises why NRPE v3 was never implemented in NSClient++. The answer: NSClient++ implemented its own REST API and clearly focused development on it instead of chasing the NRPE developers. The API is enabled by loading the "WEBServer" module in nsclient.ini:

; Modules
[/modules]
CheckExternalScripts = 1
CheckHelpers = 1
CheckNSCP = 1
CheckDisk = 1
CheckSystem = 1
CheckWMI = 1
NSClientServer = 1
CheckEventLog = 1
NSCAClient = 1
NRPEServer = 1
CheckLogFile = 1
SimpleFileWriter = 1
SimpleCache = 1
WEBServer = 1

Similar to the NRPE server settings section, there is also a WEB server settings section:

# Section for WEB (WEBServer.dll) (check_WEB) protocol options.
[/settings/WEB/server]
allowed hosts=127.0.0.1,192.168.15.0/24
cache allowed hosts=true
certificate=${certificate-path}/certificate.pem
port=8443
threads=10

This config snippet allows requests from localhost (127.0.0.1) and from the range 192.168.15.0/24 to access the NSClient++ web server/API. By default, the web server listens on port 8443. A password can be defined in nsclient.ini, either in the [/settings/default] or in the [/settings/WEB/server] section.

Note: If the password is defined in the [/settings/default] section of nsclient.ini, the password is applied to all check types (using NSClient++, NRPE, API).
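Does a given monitoring server pass the allowed hosts filter? A minimal Python sketch can check this by reading the [/settings/WEB/server] section and testing an address with the ipaddress module. It assumes the entries are comma-separated and are plain IP addresses or CIDR networks. Other entries, such as host names (which NSClient++ can resolve via DNS, hence the cache allowed hosts option), are simply skipped here. The function names are hypothetical.

```python
from __future__ import annotations

import configparser
import ipaddress


def web_allowed_hosts(ini_text: str) -> list[str]:
    """Return the entries of 'allowed hosts' in [/settings/WEB/server]."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)  # interpolation=None leaves ${...} untouched
    raw = parser.get("/settings/WEB/server", "allowed hosts", fallback="")
    return [entry.strip() for entry in raw.split(",") if entry.strip()]


def is_allowed(client_ip: str, allowed: list[str]) -> bool:
    """True if client_ip matches one of the IP/CIDR entries."""
    address = ipaddress.ip_address(client_ip)
    for entry in allowed:
        try:
            network = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            continue  # e.g. a host name: would need DNS resolution, skipped here
        if address in network:
            return True
    return False
```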

This module also enables a password-protected user interface. It can be used to view the current system usage and the NSClient++ configuration, and also to execute checks manually. This is a great help for troubleshooting!

NSClient++ user interface showing system metrics

Manually executing check commands in NSClient++ UI

When switching to the "Queries" tab, a list of predefined checks can be selected.

NSClient++ internal check commands shown in user interface

Do they look familiar? They should: these are exactly the same internal NSClient++ commands that can be executed with NRPE. After a check command is selected, the check can be executed in the "Run" tab. Here is the result of a simple "check_drivesize" check:

NSClient++ check_drivesize executed in user interface

One of the tricky things with NSClient++ has always been finding the exact syntax for setting filters and thresholds. The input field is a great help here, as it automatically shows the additional arguments of the selected check:

NSClient++ check attributes showing up


The check commands can therefore be tested directly by adding arguments to them. Here the warning threshold was manually lowered to 50% disk usage (the default is 80%):

NSClient++ check executed in user interface with adjusted threshold

Executing checks via NSClient++ API

Of course, these checks should ultimately be executed via the NSClient++ API and not via the user interface. The URL for a check is the same as the one used for the user interface, followed by /query/check_command?arguments. Special characters (such as spaces or percent signs) need to be URL-encoded: a space becomes %20 and a percent sign becomes %25.
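Building such URLs by hand is error-prone. The following minimal Python sketch uses urllib.parse from the standard library, assuming arguments are passed as key=value query parameters as in the curl example that follows. The function name nscp_query_url is hypothetical. Note that quote also encodes characters like > (as %3E). This is stricter than the curl example, and a standards-compliant web server should decode it the same way.

```python
from __future__ import annotations

from urllib.parse import quote, urlencode


def nscp_query_url(host: str, command: str, arguments: list[tuple[str, str]],
                   port: int = 8443) -> str:
    """Build https://host:port/query/<command>?<arguments> with URL-encoded values."""
    base = f"https://{host}:{port}/query/{quote(command)}"
    # quote_via=quote encodes a space as %20 (not '+') and a percent sign as %25
    query = urlencode(arguments, quote_via=quote)
    return f"{base}?{query}" if query else base
```

Arguments are given as a list of pairs, so the same key may appear more than once if a check needs it.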

The following curl command runs the same check as executed manually in the UI before, with a warning threshold of 50% disk usage. Note that the password for the web server/API (defined in the default section of nsclient.ini) is sent as an additional HTTP header:

# curl -k -s -H 'password: 1234' 'https://windowshost.example.com:8443/query/check_drivesize?warning=used>50%25' | python -m json.tool
{
    "header": {
        "source_id": ""
    },
    "payload": [
        {
            "command": "check_drivesize",
            "lines": [
                {
                    "message": "WARNING C:\\: 31.287GB/58.918GB used",
                    "perf": [
                        {
                            "alias": "C:\\ used",
                            "float_value": {
                                "critical": 53.02614784240723,
                                "maximum": 58.91794204711914,
                                "minimum": 0.0,
                                "unit": "GB",
                                "value": 31.28728485107422,
                                "warning": 29.45897102355957
                            }
                        },
                        {
                            "alias": "C:\\ used %",
                            "float_value": {
                                "critical": 90.0,
                                "maximum": 100.0,
                                "minimum": 0.0,
                                "unit": "%",
                                "value": 53.0,
                                "warning": 50.0
                            }
                        },
                        {
                            "alias": "D:\\ used",
                            "float_value": {
                                "critical": 0.0,
                                "maximum": 0.0,
                                "minimum": 0.0,
                                "unit": "B",
                                "value": 0.0,
                                "warning": 0.0
                            }
                        }
                    ]
                }
            ],
            "result": "WARNING"
        }
    ]
}

This shows the same result as the user interface, but in JSON format, which is easy to parse. But does that mean the wheel needs to be reinvented and a monitoring plugin like check_nrpe needs to be written first? Luckily not: two existing monitoring plugins already do the job!
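Little is needed on the client side. The following minimal, standard-library-only Python sketch sends the password header like the curl call above and turns the JSON into classic plugin output. It assumes the response structure shown above (payload, lines, perf with float_value); other value types are skipped. Certificate verification is disabled to mirror curl -k, which is only acceptable for a self-signed certificate in a trusted network. The function names are hypothetical. The perfdata formatting is our own choice and not necessarily identical to what existing plugins print.

```python
from __future__ import annotations

import json
import ssl
import urllib.request

NAGIOS_EXIT_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def fetch_query(url: str, password: str, timeout: float = 10.0) -> dict:
    """GET an NSClient++ /query/ URL, sending the password as HTTP header."""
    context = ssl.create_default_context()
    context.check_hostname = False       # like curl -k: accept a self-signed certificate
    context.verify_mode = ssl.CERT_NONE
    request = urllib.request.Request(url, headers={"password": password})
    with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
        return json.load(response)


def to_plugin_output(document: dict) -> tuple[int, str]:
    """Turn the JSON answer into (exit code, 'command message | perfdata')."""
    result = document["payload"][0]  # one query, one payload entry
    messages, perfdata = [], []
    for line in result["lines"]:
        messages.append(line["message"])
        for perf in line.get("perf", []):
            value = perf.get("float_value")
            if value is None:
                continue  # other value types are not handled in this sketch
            perfdata.append(
                f"'{perf['alias']}'={value['value']:g}{value['unit']};"
                f"{value['warning']:g};{value['critical']:g};"
                f"{value['minimum']:g};{value['maximum']:g}"
            )
    text = f"{result['command']} {' '.join(messages)}"
    if perfdata:
        text += " | " + " ".join(perfdata)
    return NAGIOS_EXIT_CODES.get(result["result"], 3), text
```

Chaining fetch_query() and to_plugin_output() yields the exit code and text that a monitoring plugin would hand back to Icinga or Nagios.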

Using monitoring plugin check_nscp_api

The monitoring plugin check_nscp_api is part of Icinga 2 but can also be compiled manually from the source code. Users who have installed the icinga2 packages should find check_nscp_api in the default monitoring plugins path (/usr/lib/nagios/plugins). The plugin is installed through the icinga2-bin package:

root@icinga2:~# dpkg -S /usr/lib/nagios/plugins/check_nscp_api
icinga2-bin: /usr/lib/nagios/plugins/check_nscp_api

An alternative would be the check_nsc_web monitoring plugin.

Using the information from the NSClient++ user interface and API, the same check (check_drivesize) is run again with the same attributes (warning=used>50%). The plugin makes this even easier: the arguments are passed as a plain string and therefore do not require any URL encoding:

# /usr/lib/nagios/plugins/check_nscp_api -H windowshost.example.com -P 8443 --password 1234 -q check_drivesize -a "warning=used>50%"
check_drivesize WARNING C:\: 31.288GB/58.918GB used | 'C:\ used'=31.288002GB;29.458971;53.026148;0;58.917942 'C:\ used %'=53%;50;90;0;100 'D:\ used'=0B;0;0;0;0

Great! The plugin correctly returns the WARNING status for drive C: as disk usage is above 50% - and performance data is also shown.
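Everything after the | is performance data in the usual 'label'=value[UOM];warn;crit;min;max format from the Nagios plugin development guidelines. Here is a minimal Python sketch that splits such output into status text and metrics. It assumes that single-quoted labels may contain spaces (with '' as an escaped quote). Thresholds are kept as strings, since they may be ranges. The function name is hypothetical.

```python
from __future__ import annotations

import re

# 'label'=value[UOM];[warn];[crit];[min];[max]
PERF_RE = re.compile(r"('(?:[^']|'')*'|[^\s=']+)=(\S+)")
VALUE_RE = re.compile(r"([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)(.*)")


def parse_plugin_output(output: str) -> tuple[str, list[dict]]:
    """Split plugin output into status text and a list of perfdata dicts."""
    text, _, perf_part = output.partition("|")  # the first | starts the perfdata
    metrics = []
    for label, data in PERF_RE.findall(perf_part):
        if label.startswith("'"):
            label = label[1:-1].replace("''", "'")  # quoted label, '' escapes a quote
        fields = (data.split(";") + [""] * 5)[:5]
        match = VALUE_RE.fullmatch(fields[0])
        if match is None:
            continue  # e.g. 'U' for an unknown value is not handled here
        metrics.append({
            "label": label,
            "value": float(match.group(1)),
            "unit": match.group(2),
            "warn": fields[1] or None,
            "crit": fields[2] or None,
            "min": fields[3] or None,
            "max": fields[4] or None,
        })
    return text.strip(), metrics
```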

Comparison NRPE vs. API

As written above, one of the NRPE v2 problems is the size of the server response (payload length/size). In the example above, listing the services with the check_wmi command did not work without manually increasing the NRPE payload size. What about the API? Are there any limitations?

# /usr/lib/nagios/plugins/check_nscp_api -H windowshost.example.com -P 8443 --password 1234 -q check_wmi -a "query=select Name from Win32_Service"
check_wmi AdobeARMservice, AJRouter, ALG, AppIDSvc, Appinfo, AppMgmt, AppReadiness, AppVClient, AppXSvc, AssignedAccessManagerSvc, AudioEndpointBuilder, Audiosrv, autotimesvc, AxInstSV, BDESVC, BFE, BITS, BrokerInfrastructure, Browser, BTAGService, BthAvctpSvc, bthserv, camsvc, CDPSvc, CertPropSvc, ClickToRunSvc, ClipSVC, COMSysApp, CoreMessagingRegistrar, CryptSvc, CscService, DcomLaunch, defragsvc, DeviceAssociationService, DeviceInstall, DevQueryBroker, Dhcp, diagnosticshub.standardcollector.service, diagsvc, DiagTrack, DialogBlockingService, DispBrokerDesktopSvc, DisplayEnhancementService, DmEnrollmentSvc, dmwappushservice, Dnscache, DoSvc, dot3svc, DPS, DsmSvc, DsSvc, DusmSvc, Eaphost, edgeupdate, edgeupdatem, EFS, embeddedmode, EntAppSvc, EPWD, EventLog, EventSystem, Fax, fdPHost, FDResPub, fhsvc, FontCache, FrameServer, GoogleChromeElevationService, gpsvc, GraphicsPerfSvc, gupdate, gupdatem, hidserv, HvHost, icssvc, IKEEXT, InstallService, iphlpsvc, IpxlatCfgSvc, KeyIso, KtmRm, LanmanServer, LanmanWorkstation, lfsvc, LicenseManager, lltdsvc, lmhosts, LSM, LxpSvc, MapsBroker, MicrosoftEdgeElevationService, MixedRealityOpenXRSvc, MozillaMaintenance, mpssvc, MSDTC, MSiSCSI, msiserver, MsKeyboardFilter, NaturalAuthentication, NcaSvc, NcbService, NcdAutoSetup, Netlogon, Netman, netprofm, NetSetupSvc, NetTcpPortSharing, NgcCtnrSvc, NgcSvc, NlaSvc, nscp, nsi, ose, p2pimsvc, p2psvc, PcaSvc, PeerDistSvc, perceptionsimulation, PerfHost, PhoneSvc, pla, PlugPlay, PNRPAutoReg, PNRPsvc, PolicyAgent, Power, PrintNotify, ProfSvc, PushToInstall, QWAVE, RasAuto, RasMan, RemoteAccess, RemoteRegistry, RetailDemo, RmSvc, RpcEptMapper, RpcLocator, RpcSs, SamSs, SCardSvr, ScDeviceEnum, Schedule, SCPolicySvc, SDRSVC, seclogon, SecurityHealthService, SEMgrSvc, SENS, Sense, SensorDataService, SensorService, SensrSvc, SessionEnv, SgrmBroker, SharedAccess, SharedRealitySvc, ShellHWDetection, shpamsvc, smphost, SmsRouter, SNMPTRAP, spectrum, Spooler, sppsvc, SSDPSRV, ssh-agent, SstpSvc, 
StateRepository, stisvc, StorSvc, svsvc, swprv, SysMain, SystemEventsBroker, TabletInputService, TapiSrv, TermService, Themes, TieringEngineService, TimeBrokerSvc, TokenBroker, TracSrvWrapper, TrkWks, TroubleshootingSvc, TrustedInstaller, tzautoupdate, UevAgentService, uhssvc, UmRdpService, upnphost, UserManager, UsoSvc, VacSvc, VaultSvc, vds, VGAuthService, vm3dservice, vmicguestinterface, vmicheartbeat, vmickvpexchange, vmicrdv, vmicshutdown, vmictimesync, vmicvmsession, vmicvss, VMTools, vmvss, VSS, W32Time, WaaSMedicSvc, WalletService, WarpJITSvc, wbengine, WbioSrvc, Wcmsvc, wcncsvc, WdiServiceHost, WdiSystemHost, WdNisSvc, WebClient, Wecsvc, WEPHOSTSVC, wercplsupport, WerSvc, WFDSConMgrSvc, WiaRpc, WinDefend, WinHttpAutoProxySvc, Winmgmt, WinRM, wisvc, WlanSvc, wlidsvc, wlpasvc, WManSvc, wmiApSrv, WMPNetworkSvc, workfolderssvc, WpcMonSvc, WPDBusEnum, WpnService, wscsvc, WSearch, wuauserv, WwanSvc, XblAuthManager, XblGameSave, XboxGipSvc, XboxNetApiSvc, AarSvc_6f2b5, BcastDVRUserService_6f2b5, BluetoothUserService_6f2b5, CaptureService_6f2b5, cbdhsvc_6f2b5, CDPUserSvc_6f2b5, ConsentUxUserSvc_6f2b5, CredentialEnrollmentManagerUserSvc_6f2b5, DeviceAssociationBrokerSvc_6f2b5, DevicePickerUserSvc_6f2b5, DevicesFlowUserSvc_6f2b5, MessagingService_6f2b5, OneSyncSvc_6f2b5, PimIndexMaintenanceSvc_6f2b5, PrintWorkflowUserSvc_6f2b5, UdkUserSvc_6f2b5, UnistoreSvc_6f2b5, UserDataSvc_6f2b5, WpnUserService_6f2b5 |

With the API, the answer from NSClient++ is delivered via HTTP, which has no such fixed size limit. Technically speaking there are certain limits, such as request header size limits, but they don't apply in this scenario.

And what about speed? Which check returns a result faster?

root@icinga2:~# time /usr/lib/nagios/plugins/check_nrpe -H windowshost.example.com --payload-size=4096 -2 -c check_wmi -a "query=select Name from Win32_Service"
AdobeARMservice, AJRouter, [...]

real    0m0.079s
user    0m0.011s
sys    0m0.013s

root@icinga2:~# time /usr/lib/nagios/plugins/check_nscp_api -H windowshost.example.com -P 8443 --password 1234 -q check_wmi -a "query=select Name from Win32_Service"
check_wmi AdobeARMservice, AJRouter, [...] |

real    0m0.060s
user    0m0.025s
sys    0m0.010s

The time comparison was run several times in a row. Both check methods show almost the same response time, but the API method was slightly faster in 90% of all checks. Using the NSClient++ API is therefore definitely a worthy NRPE replacement!
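Such a comparison is easy to repeat. The following minimal Python sketch alternates between two command lines and reports each one's median wall-clock time plus the share of rounds won by the first. The function names are hypothetical. As with time, the process start-up overhead on the monitoring server is included. Alternating the commands, instead of running one series after the other, exposes both methods to similar network and host load.

```python
from __future__ import annotations

import statistics
import subprocess
import time


def run_once(argv: list[str]) -> float:
    """Wall-clock seconds for one execution (comparable to the 'real' value of time)."""
    start = time.perf_counter()
    # Plugins exit non-zero for WARNING/CRITICAL, so a non-zero exit is not an error
    subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False)
    return time.perf_counter() - start


def compare_commands(argv_a: list[str], argv_b: list[str], rounds: int = 10):
    """Return (median seconds A, median seconds B, share of rounds where A was faster)."""
    times_a, times_b = [], []
    for _ in range(rounds):
        # Alternate A and B so both see similar network and host conditions
        times_a.append(run_once(argv_a))
        times_b.append(run_once(argv_b))
    wins_a = sum(a < b for a, b in zip(times_a, times_b))
    return statistics.median(times_a), statistics.median(times_b), wins_a / rounds
```

For example, argv_a could be the check_nscp_api call and argv_b the check_nrpe call from the time examples above, each written as a list of arguments.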

Looking for monitoring specialists?

Infiniroot is more than a managed server hoster in Switzerland. We are experts in building solutions using open source software, and monitoring is one part of this. We love sharing our solutions and are available as consultants to help with your on-prem installation or to provide knowledge transfer, such as an introductory Icinga 2 workshop.