Nagios – An Introduction

What is Nagios?
Nagios is a framework for setting up monitoring of hosts, services, and networks.

What are the components of Nagios?

  • nagios – the main server software and web scripts
  • nagios-plugins – the common set of check scripts used to query services
  • nagios-nrpe – Nagios Remote Plugin Executor
  • nagios-nsca – Nagios Service Check Acceptor

nagios-nrpe and nagios-nsca are known as addons.

What is the difference between Nagios-Plugins and Nagios-AddOns?

Addons are extensions to the base system on the server, whereas plugins are the check scripts used to query services. This is the main difference between a Nagios plugin and a Nagios addon. There are two types of Nagios addons available:

  • Core-Addons (addons provided by the Nagios developer, Ethan Galstad)
  • Community-Addons (addons provided by the Nagios community)

Nagios Core Addons

  • Database – NDOUtils – NDOUtils allows you to export current and historical data from one or more Nagios instances to a MySQL database. Several community addons use this as one of their data sources.
  • Distribution
  1. NSCA – NSCA allows you to integrate passive alerts and checks from remote machines and applications with Nagios. Useful for processing security alerts, as well as redundant and distributed Nagios setups.
  2. NRPE – NRPE allows you to remotely execute Nagios plugins on other Linux/Unix machines. This allows you to monitor remote machine metrics (disk usage, CPU load, etc.). NRPE can also communicate with some of the Windows agent addons, so you can execute scripts and check metrics on remote Windows machines as well.

Nagios Architecture

Nagios consists of a central server running the Nagios daemon. It can be used to monitor many network services, such as SMTP, HTTP, and DNS, on remote hosts. It also supports SNMP, which lets you check things like processor load on routers and servers.

  • A service is anything on the remote host that you want checked. Its state can be one of: OK, Warning, Critical, or Unknown
  • A check is a script run on the central monitoring server whose exit status determines the state of the service monitored on the remote machine: 0 (OK), 1 (Warning), 2 (Critical), or 3 (Unknown)
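
To make the link between exit status and service state concrete, here is a minimal sketch of a check script in Python. It assumes the conventional plugin exit codes (0 OK, 1 Warning, 2 Critical, 3 Unknown). The disk-usage check, the function names, and the 80%/90% thresholds are made up for illustration and are not taken from any shipped plugin.

```python
import shutil

# Conventional plugin exit codes and the service states they stand for.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify_disk_usage(percent_used, warn=80.0, crit=90.0):
    """Return (exit_code, message) for a disk usage percentage.

    warn and crit are hypothetical thresholds chosen for this example.
    """
    if percent_used >= crit:
        code = CRITICAL
    elif percent_used >= warn:
        code = WARNING
    else:
        code = OK
    return code, f"DISK {STATE_NAMES[code]} - {percent_used:.1f}% used"


def main(path="/"):
    """Check one filesystem, print a status line, and return the exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # The check itself could not be carried out, so the state is Unknown.
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    code, message = classify_disk_usage(100.0 * usage.used / usage.total)
    print(message)
    return code
```

A real script would finish by calling sys.exit(main()) so that the returned code becomes the process exit status, which is the only thing Nagios needs in order to set the service state.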

How does Nagios Check the services?

Nagios is capable of monitoring hosts and services in two ways:

  • Active
  • Passive

Active checks

The main features of active checks are as follows:

  • Active checks are initiated by the Nagios daemon on the central server
  • Active checks are run on a regularly scheduled basis

How Are Active Checks Performed?
Active checks are initiated by the check logic in the Nagios daemon. When Nagios needs to check the status of a host or service it will execute a plugin and pass it information about what needs to be checked. The plugin will then check the operational state of the host or service and report the results back to the Nagios daemon. Nagios will process the results of the host or service check and take appropriate action as necessary (e.g. send notifications, run event handlers, etc).

Why use plugins to do active checks?

Nagios does not include any internal mechanisms for checking the status of hosts and services on your network. Instead, Nagios relies on external programs (called plugins) to do all the dirty work.

What Are Plugins?

Plugins are compiled executables or scripts (Perl scripts, shell scripts, etc.) that can be run from a command line to check the status of a host or service. Nagios uses the results from plugins to determine the current status of hosts and services on your network.

Nagios will execute a plugin whenever there is a need to check the status of a service or host. The plugin does something (notice the very general term) to perform the check and then simply returns the results to Nagios. Nagios will process the results that it receives from the plugin and take any necessary actions (running event handlers, sending out notifications, etc).
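
The contract described above (run a program, read its exit status and output) can be sketched in a few lines of Python. This is only an illustration of the plugin interface, not how the Nagios daemon is implemented. The function name run_plugin and the choice to report a failed or timed-out run as UNKNOWN are assumptions of this sketch.

```python
import subprocess

# Exit status to state name, following the usual plugin convention.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_plugin(argv, timeout=10):
    """Run a plugin command line and return (state_name, first_output_line)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Sketch-level choice: a plugin that cannot run yields UNKNOWN.
        return "UNKNOWN", f"could not run plugin: {exc}"
    # Any exit status outside 0..3 is also treated as UNKNOWN here.
    state = STATES.get(proc.returncode, "UNKNOWN")
    lines = proc.stdout.strip().splitlines()
    return state, (lines[0] if lines else "")
```

For example, run_plugin(["/path/to/some_plugin", "-H", "192.0.2.10"]) would return a pair such as ("OK", "...") depending on what that plugin prints. Both the path and the address are placeholders.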

Advantage of plugin architecture

The plugin architecture lets Nagios monitor just about anything you can think of, and you can write your own plugins if needed.

Disadvantage of plugin architecture

Nagios has absolutely no idea what it is that you’re monitoring. Nagios doesn’t understand the specifics of what’s being monitored – it just tracks changes in the state of those resources. Only the plugins themselves know exactly what they’re monitoring and how to perform the actual checks.

What Plugins Are Available?

There are plugins currently available to monitor many different kinds of devices and services, including:

  • HTTP, POP3, IMAP, FTP, SSH, DHCP
  • CPU Load, Disk Usage, Memory Usage, Current Users
  • Unix/Linux, Windows, and Netware Servers
  • Routers and Switches, etc.

When are active checks executed?

  • At regular intervals, as defined by the check_interval and retry_interval options in your host and service definitions
  • On-demand as needed

Regularly scheduled checks occur at intervals equaling either the “check_interval” or the “retry_interval” in your “host” or “service” definitions, depending on what type of state the host or service is in. If a host or service is in a HARD state, it will be actively checked at intervals equal to the check_interval option. If it is in a SOFT state, it will be checked at intervals equal to the retry_interval option.
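
The rule in the paragraph above fits in one small function. This is a sketch of the scheduling rule only. The function name is hypothetical, and the interval values in the usage note are example numbers, not defaults.

```python
def next_check_delay(state_type, check_interval, retry_interval):
    """Return the delay before the next regularly scheduled check.

    state_type is "HARD" or "SOFT"; the two intervals use whatever
    time unit the host or service definition uses.
    """
    kind = state_type.upper()
    if kind == "HARD":
        return check_interval
    if kind == "SOFT":
        return retry_interval
    raise ValueError(f"unknown state type: {state_type!r}")
```

With a check_interval of 5 and a retry_interval of 1, a service in a HARD state is rechecked after 5 time units, while a service in a SOFT state is rechecked after 1.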

On-demand checks are performed whenever Nagios sees a need to obtain the latest status information about a particular host or service.

Active checks are done using Nagios plugins or check_nrpe (from NRPE). The NRPE addon is designed to allow you to execute Nagios plugins on remote Linux/Unix machines. Normally Nagios runs the plugins on the monitoring node itself. With NRPE, the plugins are run on the monitored machine instead.
check_nrpe is used to initiate an active check from the Nagios machine. Nagios calls it much like a normal plugin, but it works differently: check_nrpe contacts the NRPE daemon running on a remote host, asks it to run a pre-configured check command, and returns the results to Nagios.

  • Nagios will execute the check_nrpe plugin and tell it what service needs to be checked
  • The check_nrpe plugin contacts the NRPE daemon on the remote host over an (optionally) SSL-protected connection
  • The NRPE daemon runs the appropriate Nagios plugin to check the service or resource
  • The results from the service check are passed from the NRPE daemon back to the check_nrpe plugin, which then returns the check results to the Nagios process.
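
The steps above can be sketched from the monitoring server's side as follows. The sketch builds a check_nrpe command line using its -H (host), -p (port), -t (timeout), and -c (command) options. The plugin path, the host name, and the remote command name check_disk are assumptions; the remote command has to be defined in the NRPE configuration on the monitored host for the call to succeed.

```python
import subprocess

# Assumed install location; it differs between distributions and source builds.
CHECK_NRPE = "/usr/lib/nagios/plugins/check_nrpe"


def build_check_nrpe_command(host, command, port=5666, timeout=10, plugin_path=CHECK_NRPE):
    """Build the argument list that asks the NRPE daemon on host to run command."""
    return [plugin_path, "-H", host, "-p", str(port), "-t", str(timeout), "-c", command]


def run_remote_check(host, command):
    """Run check_nrpe and return (exit_status, output), as any plugin would."""
    argv = build_check_nrpe_command(host, command)
    try:
        proc = subprocess.run(argv, capture_output=True, text=True)
    except OSError as exc:
        # check_nrpe is missing or not executable on this machine.
        return 3, f"UNKNOWN - cannot run check_nrpe: {exc}"
    # The exit status was produced by the plugin on the remote host
    # and relayed back unchanged by the NRPE daemon.
    return proc.returncode, proc.stdout.strip()
```

Calling run_remote_check("web01.example.com", "check_disk") would hand Nagios the same kind of result that a local plugin produces, which is why check_nrpe can be used like any other plugin.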

Passive Checks

Host and service checks that are performed and submitted to Nagios by external applications (running on the monitored hosts) are called passive checks. Passive checks can be contrasted with active checks, which are host or service checks that have been initiated by Nagios.

The NSCA (Nagios Service Check Acceptor) addon helps in performing passive checks.
NSCA is used when a passive check is initiated from a remote host. A wrapper script on the remote host, called by cron or some other method, executes a plugin there and passes the results to the send_nsca program. send_nsca contacts the NSCA daemon on the Nagios machine, which passes the results of the check to Nagios.
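
A wrapper of the kind described above might look like the following sketch. It assumes the default send_nsca input format for service checks: host name, service description, return code, and plugin output, separated by tabs and terminated by a newline. The file paths, host names, and function names are placeholders for illustration.

```python
import subprocess


def format_passive_result(host, service, return_code, output):
    """Build one tab-separated service check line for send_nsca."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError(f"invalid return code: {return_code}")
    # Tabs and newlines inside the output would break the line format.
    clean = output.replace("\t", " ").replace("\n", " ")
    return f"{host}\t{service}\t{return_code}\t{clean}\n"


def submit_passive_result(line, nagios_host,
                          send_nsca="/usr/sbin/send_nsca",      # assumed path
                          config="/etc/send_nsca.cfg"):         # assumed path
    """Pipe a formatted line to send_nsca and return its exit status."""
    proc = subprocess.run([send_nsca, "-H", nagios_host, "-c", config],
                          input=line, text=True, capture_output=True)
    return proc.returncode
```

A cron job on the monitored host could call format_passive_result("web01", "Backup Job", 0, "backup finished") and pass the resulting line to submit_passive_result along with the address of the Nagios server.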

When to use active and passive checks?
Typically, passive checks are used when a firewall prevents the Nagios server from making requests to the client, or when the client is running an application that is asynchronous; in other words, the time schedule for a service is erratic and cannot be fully determined. Security events are one example of a situation where you do not know when the event may occur. Passive checks may also be used for distributed monitoring, where you have multiple Nagios servers providing information to a master Nagios server.

Active checks occur when Nagios itself is responsible for checking the status of a device at regular intervals. By contrast, with a passive check the device reports its status to Nagios, typically only when that status changes.

For example, I may wish to use passive checks for most of the services on a device, but check its reachability with an active PING check using check_ping.

When does Nagios send us notifications?

When a service fails for the first time, Nagios will put that service in a “soft” state. Nagios will then check the service a configured number of times to see if it comes back up. If it does not come back up within that preconfigured number of checks, then Nagios will put the service in a “hard” state and notifications will be sent out. If the service recovers within those checks, Nagios will not send out notifications. So why do this? Well, event handlers can be used to perform actions based on a status change, whether it is a soft or hard state. For example, if you have an Apache Web service which fails, an event handler may be run to attempt to restart the Apache service. If the service comes back up while Nagios is checking it, then there’s probably no real reason to send out notifications.

But if the attempt to restart Apache fails, then Nagios will eventually put the service in a hard state, causing the notification to be sent out. The number of checks to perform before the state becomes “hard” is set by the max_check_attempts parameter, for both hosts and services. If you want notifications to always be sent out on the first failure, set max_check_attempts to 1.
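
The soft/hard logic can be sketched as a small state tracker. This is a simplified model written for this article, not the Nagios implementation: it only reports when a problem first becomes hard, and it leaves out recovery notifications, repeated notifications, and event handlers. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServiceState:
    max_check_attempts: int
    state: str = "OK"
    state_type: str = "HARD"
    attempt: int = 1

    def record(self, result):
        """Apply one check result; return True when a problem notification is due."""
        if result == "OK":
            # Recovery: reset the counter. Recovery notifications are not modeled.
            self.state, self.state_type, self.attempt = "OK", "HARD", 1
            return False
        if self.state != "OK" and self.state_type == "HARD":
            # Already a hard problem: this sketch sends no further notification.
            self.state = result
            return False
        # First failure starts at attempt 1; later soft failures count up.
        self.attempt = 1 if self.state == "OK" else self.attempt + 1
        self.state = result
        if self.attempt >= self.max_check_attempts:
            self.state_type = "HARD"
            return True
        self.state_type = "SOFT"
        return False
```

With max_check_attempts set to 3, three consecutive CRITICAL results give False, False, True: the first two leave the service in a soft state and the third makes it hard. With max_check_attempts set to 1, the very first failure is already hard.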

Hope this was useful and if you require any assistance feel free to Contact Us.

Written by actsupp-r0cks