Hello

Recently we had a problem with the failover node of our PRTG cluster, which left it disconnected from the master for about a week. After we turned the failover node on again, the cluster appeared to be back to normal (status was OK). After some time, however, I realized that it was no longer synchronizing all my changes. For example, I unpaused a sensor on the master, but it stayed paused on the failover node, so the sensor's overall state remained paused.

I decided to disconnect the cluster completely, delete the configuration on the failover node, and reconnect it with a new cluster ID, which solved the problem. My questions are about future situations like this. Is there a maximum time the cluster can be in an error state before I can expect synchronization problems? How do I handle an outage of a cluster that lasts for more than just a few hours? Do you have any best practices regarding this topic?


Article Comments

Hey Jasmin,

Thank you for your KB post.

  • Is there a maximum time the cluster can be in an error state before I can expect synchronization problems? There is no fixed time after which synchronization will fail. Once a disconnected node rejoins the cluster, the master checks the "version" of the configuration on the returning node. If the discrepancy is small, the master pushes only the changes to update the configuration on the node. If the discrepancy is too big, the master pushes the entire configuration file to the node so that the configuration is synchronized as quickly as possible.
  • How do I handle an outage of a cluster that lasts for more than just a few hours? Usually there is nothing to do, as PRTG synchronizes all important files automatically.
  • Do you have any Best Practices regarding this topic?
    • Check that the failover node runs exactly the same PRTG version as the master.
    • Check that the license is activated successfully on all nodes of the cluster.
    • If you encounter any other issues, please open a support ticket and send us the log files from all nodes of the cluster for further analysis.
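
To illustrate the decision described in the first point, here is a minimal Python sketch. It is not PRTG's actual implementation: the numeric revision counter, the `NodeState` type, the `choose_sync_mode` function, and the threshold of 50 are all assumptions made up for illustration, since no specific cutoff is stated above.

```python
from dataclasses import dataclass

# Hypothetical threshold: the reply above only says "very small" vs. "too big".
MAX_DELTA_REVISIONS = 50


@dataclass
class NodeState:
    """Illustrative view of a cluster node (not a PRTG data structure)."""
    name: str
    config_revision: int  # assumed monotonically increasing config "version"


def choose_sync_mode(master: NodeState, node: NodeState,
                     max_delta: int = MAX_DELTA_REVISIONS) -> str:
    """Decide how the master would bring a rejoining node up to date."""
    gap = master.config_revision - node.config_revision
    if gap == 0:
        return "in_sync"   # nothing to push
    if 0 < gap <= max_delta:
        return "delta"     # push only the missing changes
    # Node is far behind, or unexpectedly ahead of the master:
    # replace its configuration with the master's full copy.
    return "full"


master = NodeState("master", config_revision=1200)
print(choose_sync_mode(master, NodeState("failover", 1195)))  # short outage
print(choose_sync_mode(master, NodeState("failover", 400)))   # week-long outage
```

Under these assumptions, a node that was offline for a long time simply falls into the "full" branch, which matches the statement that there is no fixed time limit, only a size of discrepancy.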

Best regards,
Sven


Jun, 2017 - Permalink

Hi Sven,

Thank you for your reply. If a situation like this happens again, I'll follow your suggested steps.

Best regards, Jasmin


Jun, 2017 - Permalink