This article applies as of PRTG 20

I would like to know more about the clustering feature of PRTG. Can you give me an introduction and some basic information about clustering? 


Cluster basics

One very helpful feature of PRTG is the cluster. A cluster consists of two or more installations of PRTG that work together to form a highly available monitoring system.

The objective is to reach true 100% uptime for the monitoring tool. If you use a cluster, uptime is no longer degraded by connection failures caused by an internet outage at a PRTG core server’s location, by failing hardware, or by downtime due to a software upgrade of the operating system or of PRTG itself.

How a cluster works

A cluster consists of one primary master node and one or more failover nodes. Each cluster node is a full installation of PRTG that could perform monitoring and alerting on its own. Cluster nodes are connected to each other using two TCP/IP connections. These connections communicate in both directions, and a single cluster node only needs to connect to one other cluster node to integrate into the cluster.

Normal cluster operation

Central configuration, distributed data storage, and central notifications

During normal operation, the primary master node is used to configure devices and sensors. It automatically distributes the configuration to all other cluster nodes in real time. All cluster nodes permanently monitor the network according to this common configuration, and each cluster node stores its results in its own database. This way, the storage of monitoring results is also distributed among the cluster nodes. The downside of this concept is that monitoring traffic and load on the network are multiplied by the number of cluster nodes, but this is not a problem for most usage scenarios.
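The load multiplication described above can be estimated with simple arithmetic. The following Python sketch is only an illustration: the function name, the sensor count, and the scanning interval are assumptions made for this example and are not values or APIs from PRTG. It also assumes that every sensor issues exactly one request per scanning interval and that every cluster node monitors every sensor.

```python
def requests_per_minute(sensors: int, interval_seconds: int, cluster_nodes: int) -> float:
    """Estimate the monitoring requests per minute that hit the network.

    Simplifying assumptions (not taken from PRTG itself):
    - each sensor sends one request per scanning interval
    - every cluster node monitors every sensor
    """
    per_node = sensors * 60 / interval_seconds
    return per_node * cluster_nodes


# Hypothetical numbers: 1,000 sensors, each scanned every 60 seconds.
single_node = requests_per_minute(1000, 60, cluster_nodes=1)
three_nodes = requests_per_minute(1000, 60, cluster_nodes=3)
print(single_node, three_nodes)
```

Under these assumptions, a three-node cluster generates three times the monitoring requests of a single PRTG installation.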

You can review the monitoring results by logging in to the PRTG web interface of any of the cluster nodes in read-only mode. Because the monitoring configuration is centrally managed, you can change it only on the master node.

If one or more cluster nodes discover downtime or threshold breaches, only the primary master node will send out notifications (for example, via email and SMS). So, you will not be flooded with notifications from all cluster nodes in the event of failures. Additionally, there is a Down (Partial) sensor status, which means that the sensor shows an error on some cluster nodes, but not on all.
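The notification and status behavior described above can be sketched in a few lines of Python. This is a simplified model, not PRTG code: the function names and node names are hypothetical, and only three sensor states are modeled, although PRTG knows more states than these.

```python
from typing import Dict


def cluster_sensor_status(error_on_node: Dict[str, bool]) -> str:
    """Combine the per-node results for one sensor into a single status.

    error_on_node maps a (hypothetical) node name to True if the sensor
    shows an error on that node.
    """
    if not error_on_node:
        raise ValueError("at least one cluster node is required")
    errors = sum(error_on_node.values())
    if errors == 0:
        return "Up"
    if errors == len(error_on_node):
        return "Down"
    # Error on some cluster nodes, but not on all of them.
    return "Down (Partial)"


def should_send_notification(node: str, current_master: str) -> bool:
    """Only the current master node sends notifications."""
    return node == current_master


results = {"node1": True, "node2": False}
print(cluster_sensor_status(results))
print(should_send_notification("node2", current_master="node1"))
```

In this model, a sensor that fails on node1 but not on node2 is reported as Down (Partial), and node2 stays silent because it is not the master node.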

Failure cluster operation

  • Failure scenario 1
    If one or more of the failover nodes are disconnected from the cluster (due to hardware or network failures), the remaining cluster nodes continue to work without disruption.
  • Failure scenario 2
    If the primary master node is disconnected from the cluster, one of the failover nodes becomes a failover master node. It takes over control of the cluster and also manages notifications until the primary master node reconnects to the cluster and takes back the master role.
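The two failure scenarios can be modeled with a short Python sketch. It is a simplified illustration and not the actual PRTG mechanism: the function and node names are hypothetical, and because the text above does not say which failover node takes over, the sketch assumes that failover nodes are tried in list order.

```python
from typing import List, Optional, Set


def current_master(nodes: List[str], connected: Set[str]) -> Optional[str]:
    """Return the node that currently holds the master role.

    nodes[0] is the primary master node; the remaining entries are failover
    nodes in an assumed priority order. The first connected node in that
    order holds the master role, so the primary master node takes the role
    back as soon as it reconnects.
    """
    for node in nodes:
        if node in connected:
            return node
    return None  # no cluster node is reachable


nodes = ["primary", "failover1", "failover2"]

# Failure scenario 1: a failover node is disconnected.
print(current_master(nodes, {"primary", "failover2"}))

# Failure scenario 2: the primary master node is disconnected.
print(current_master(nodes, {"failover1", "failover2"}))
```

In scenario 1 the primary master node keeps the master role, and in scenario 2 the first connected failover node acts as failover master node until the primary master node is back.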

Sample cluster configurations

Several cluster scenarios are possible in PRTG.

  • Single failover: This is the most common usage of the cluster. Both PRTG core servers monitor the same network. When there is downtime on cluster node 1, cluster node 2 automatically takes over the master role until cluster node 1 is back online.

    Single Failover
  • Double failover: This is a very advanced failover cluster. Even if two of the cluster nodes fail, network monitoring will still continue with a single cluster node (as failover master node) until the other cluster nodes are back online.
  • The following four-node scenario shows one cluster node in disconnected mode. You can disconnect a cluster node at any time for maintenance tasks or to keep a powered-off server on standby in case another cluster node’s hardware fails.

    Four-Node Scenario

Usage scenarios for the cluster

The cluster is quite versatile and covers the following usage scenarios.

Failover LAN cluster

PRTG runs on two (or more) servers inside the company LAN (close to each other from a network topology perspective). All cluster nodes monitor the LAN and only the current master node sends out notifications.

Objectives:

  • Reach 100% uptime for the monitoring system (for example, to control SLAs, create reliable billing data, and ensure that all failures create alarms if necessary)
  • Avoid monitoring downtimes

Failover WAN or multi-location cluster

PRTG runs on two (or more) servers that are distributed throughout a multi-segmented LAN or even geographically distributed around the globe. All cluster nodes monitor the same set of servers or sensors, but only the current master node sends out notifications.

Objectives:

  • Create multi-site monitoring results for a set of sensors
    —and/or—
  • Make monitoring and alerting independent from a single site, data center, or network connection

Cluster features

  • The cluster technology is completely built into PRTG. No third-party software is necessary.
  • Central configuration and notifications on the master node
  • Configuration data and status information are automatically distributed among cluster nodes in real time.
  • The storage of monitoring results is distributed to all cluster nodes.
  • Each cluster node can take over the full monitoring and alerting functionality in case of a failover.
  • Cluster nodes can run on different operating systems and different hardware or virtual machines. They should have similar system performance and resources.
  • Node-to-node communication is always secure using SSL-encrypted connections.
  • Automatic cluster update: You need to update to a newer PRTG version on one cluster node only. All other cluster nodes are automatically updated.
  • Connect remote probes to all your cluster nodes.

What is special about a cluster in PRTG (compared to similar products)?

  • Each node is truly self-sufficient (not even the database is shared).
  • Our cluster technology is 100% “home grown” and does not rely on any external cluster technology like Windows Cluster or others.

Disclaimer:
The information in the Paessler Knowledge Base comes without warranty of any kind. Use at your own risk. Before applying any instructions please exercise proper system administrator housekeeping. You must make sure that a proper backup of all your data is available.