I run a large PRTG installation with thousands of sensors in my network and want to set up a high-availability cluster for fail-safe monitoring. However, according to the PRTG system recommendations, more than 2,500 sensors in a cluster are not recommended and more than 5,000 sensors in a cluster are not officially supported.

How can I still set up fail-safe monitoring if my PRTG installation is too large for the cluster feature?

This article applies as of PRTG 22
High-availability cluster for large installations

The cluster feature does not officially support more than 5,000 sensors, and we also do not recommend that you set up a cluster with more than 2,500 sensors. Always keep in mind that the monitoring traffic and the load are multiplied by the number of cluster nodes. Depending on your individual setup, you might encounter performance issues.
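To illustrate how quickly the load grows, here is a rough back-of-the-envelope sketch. It assumes, as stated above, that the load scales linearly with the number of cluster nodes, and it ignores any other cluster overhead. The function name and the sensor counts are hypothetical examples.

```python
def total_sensor_scans(sensors: int, cluster_nodes: int) -> int:
    """Rough total load, assuming it scales linearly with the number of nodes."""
    return sensors * cluster_nodes


# Hypothetical example: 2,500 sensors on clusters with one to three nodes.
for nodes in (1, 2, 3):
    print(f"{nodes} node(s): {total_sensor_scans(2500, nodes):,} sensor scans per scanning cycle")
# Prints 2,500, 5,000, and 7,500 sensor scans for 1, 2, and 3 nodes.
```

Even a two-node cluster with 2,500 sensors therefore carries roughly the load of a single installation with 5,000 sensors.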

So, before you create a cluster with such a high number of sensors, contact our presales team so that we can discuss your options together. You can find some alternatives to a cluster below.

Alternatives to a cluster

The following alternatives neither replace a cluster nor provide equivalent features. The aim is to give you some ideas that you can implement to quickly get your PRTG installation up and running again (see also My PRTG has crashed and I can't restart it anymore. What can I do?).

We distinguish two cases here:

PRTG is running on a real hardware server.
PRTG is running on a virtual machine.

PRTG on real server hardware

If your PRTG installation is too large for a properly working cluster setup, you can instead use the following approach to recover PRTG as quickly as possible if it fails.

You need two physical servers, both with PRTG installed. One acts as the "master node" and the other as a standby node. Keep the standby server up to date by regularly updating it to the same PRTG version as the master node.

The master node runs PRTG and monitors your infrastructure. The standby server has PRTG installed, but its PRTG core server service and local probe service must be stopped. Regularly copy or synchronize all PRTG data on the master node, such as configuration files, monitoring data, and templates, to the standby server. You can do this with a custom script that only copies data that has changed since the last synchronization.
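A minimal Python sketch of such an incremental copy, assuming that a file counts as changed when its size or modification time differs from the copy made during the last run. The function name sync_changed_files is hypothetical, and the source path is an assumption: the PRTG data directory is typically C:\ProgramData\Paessler\PRTG Network Monitor, but check the path of your own installation.

```python
import shutil
from pathlib import Path


def sync_changed_files(source: Path, target: Path) -> list[Path]:
    """Copy new or changed files from source to target (hypothetical helper).

    A file counts as changed if its size or modification time (in whole
    seconds) differs from the existing copy. Deleted files are NOT removed
    from the target. Returns the relative paths of the copied files.
    """
    copied = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue  # directories are created on demand below
        rel = src_file.relative_to(source)
        dst_file = target / rel
        src_stat = src_file.stat()
        if dst_file.exists():
            dst_stat = dst_file.stat()
            if (dst_stat.st_size == src_stat.st_size
                    and int(dst_stat.st_mtime) == int(src_stat.st_mtime)):
                continue  # unchanged since the last synchronization
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 also preserves the modification time
        copied.append(rel)
    return copied
```

Comparing size and modification time is cheap, but it misses a change that keeps the same size within the same second. Compare file hashes instead if that matters for your data.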

Note: Copying the files requires that you stop the PRTG core server service and the local probe service on the master node.

To keep the downtime short, your script can proceed as follows:

Stop the Windows services of the PRTG core server and the local probe on the master node.
Copy all relevant data to a location where copying is fast, for example, a local drive.
Start the services of the PRTG core server and the local probe again.
Compress the copied data, transfer it to the standby server, and decompress it in the correct PRTG directory.
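The steps above can be sketched in Python as follows. This is a sketch under several assumptions, not a tested procedure: the Windows service names PRTGCoreService and PRTGProbeService are assumed (verify them in the Windows Services console), the function names are hypothetical, and the transfer of the archive to the standby server is left to a method of your choice.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Sequence

# Assumed Windows service names of the PRTG core server and the local probe.
SERVICES = ("PRTGCoreService", "PRTGProbeService")

Runner = Callable[[Sequence[str]], None]


def run_command(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError if it fails."""
    subprocess.run(list(cmd), check=True)


def snapshot_master(data_dir: Path, staging_dir: Path, archive_base: Path,
                    run: Runner = run_command) -> Path:
    """Steps 1 to 4 on the master node; returns the path of the zip archive."""
    # Step 1: stop the core server and the local probe ("net stop" waits for it).
    for service in SERVICES:
        run(["net", "stop", service])
    try:
        # Step 2: fast copy to a local staging location while PRTG is down.
        if staging_dir.exists():
            shutil.rmtree(staging_dir)
        shutil.copytree(data_dir, staging_dir)
    finally:
        # Step 3: restart the services, even if copying failed.
        for service in SERVICES:
            run(["net", "start", service])
    # Step 4: compress the staged copy while monitoring is already running again.
    return Path(shutil.make_archive(str(archive_base), "zip", root_dir=staging_dir))


def restore_on_standby(archive: Path, data_dir: Path) -> None:
    """Run on the standby server while its PRTG services are stopped."""
    shutil.unpack_archive(str(archive), str(data_dir))
```

The services are stopped only for the local copy in step 2; the slower compression and transfer run while the master node is monitoring again.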

You can use a separate PRTG freeware installation to monitor the status of the master node server. If the master node fails, you are notified and can start the PRTG services on the standby server so that it takes over monitoring your infrastructure.

You then have to manually reconfigure your remote probes so that they send monitoring data to the new PRTG core server. You also need to migrate your PRTG license from the old server to the new one.

PRTG on a virtual machine

When running PRTG in a virtual environment, you have two options to keep monitoring downtime as low as possible.

Note: Both approaches require manual action by the PRTG administrator to recover the PRTG installation after it goes down. In addition, the downtime leaves a gap in the monitoring data.

1. Use snapshots

The idea is to take VMware or Hyper-V snapshots of the virtual machine where the PRTG core server is running. A snapshot contains the status, disk data, and configuration of the virtual machine at a given point in time. Take snapshots regularly, but with care, because the performance of the virtual machine may decrease as snapshots accumulate.

If the virtual machine where PRTG is running crashes or fails, you can restore it quickly from the latest snapshot.

2. Use a VM backup

Hyper-V and VMware let you back up virtual machines. The backup should contain the configuration, the VM snapshots, and the virtual hard disks that the virtual machine uses.

If the virtual machine where PRTG is running crashes, you can restore it from a backup copy.
