I had an issue recently where a server failed and its Ping sensor was, as expected, Down. However, even though the server was off the entire time, PRTG would periodically switch the sensor's state to Unknown and then back to Down.
The problem, of course, is that rather than remaining Down until the server was confirmed to be back up again, the flicking between the two non-Up states caused a new notification to be sent every time the sensor switched back to Down. I don't understand how or why the sensor would change state when nothing had changed on the server: it was shut down the entire time.
I can understand the point of going from Up to Unknown, where PRTG doesn't yet know whether something is definitely down, but I don't understand the point of the reverse.
The Unknown notifications all showed "No data since <2 minutes previously>", although I'm not sure what data PRTG thought it had received two minutes earlier, since the server was off and nothing in the logs indicates that it received any response.
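To illustrate why the flapping produces repeated notifications, here is a minimal Python sketch of two alerting rules. It is a hypothetical model, not PRTG's notification logic: the function names and the event format are assumptions. The events reuse the status sequence from the sensor log quoted further down in this thread, reading each "no data since" entry as an Unknown state. A rule that fires on every change to Down fires again after each Unknown episode; a rule that treats Unknown as "no new information" fires only once.

```python
from typing import Iterable, List, Tuple

# Hypothetical event format: (time, status) pairs as they might be read from a sensor log.
Event = Tuple[str, str]

# Sequence from the log quoted in this thread ("no data since" entries read as Unknown).
EVENTS: List[Event] = [
    ("11:35:52", "Down"),
    ("18:43:01", "Unknown"), ("18:43:23", "Down"),
    ("19:35:01", "Unknown"), ("19:36:02", "Down"),
    ("22:36:01", "Unknown"), ("22:36:23", "Down"),
    ("22:50:01", "Unknown"), ("22:50:22", "Down"),
    ("23:12:01", "Unknown"), ("23:12:11", "Down"),
]


def notify_on_every_down(events: Iterable[Event]) -> List[str]:
    """Naive rule: notify whenever the status changes to Down from any other status."""
    sent = []
    previous = None
    for time, status in events:
        if status == "Down" and previous != "Down":
            sent.append(time)
        previous = status
    return sent


def notify_on_down_after_up(events: Iterable[Event]) -> List[str]:
    """Alternative rule: ignore Unknown entirely; notify only when Down follows a non-Down state."""
    sent = []
    last_known = None  # last status that was not Unknown
    for time, status in events:
        if status == "Unknown":
            continue  # Unknown does not reset the alert state
        if status == "Down" and last_known != "Down":
            sent.append(time)
        last_known = status
    return sent


if __name__ == "__main__":
    print("naive rule:      ", notify_on_every_down(EVENTS))
    print("ignoring Unknown:", notify_on_down_after_up(EVENTS))
```

With this sequence the naive rule sends six notifications (11:35:52, 18:43:23, 19:36:02, 22:36:23, 22:50:22 and 23:12:11), while the alternative rule sends only the first one.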
Article Comments
Thanks for coming back to me. The sensor that was switching back and forth was actually the Ping sensor, and the other sensor on that device was already set to depend on it. (That sensor, of course, also kept going from Paused to Down whenever the Ping sensor briefly stopped showing as Down.)
The log doesn't really give any hints, just a series of status changes to Unknown and back to Down, with no obvious pattern that I can see. For instance:
11:35:52 down
18:43:01 no data since 18:40:51
18:43:23 down
19:35:01 no data since 19:33:44
19:36:02 down
22:36:01 no data since 22:34:43
22:36:23 down
22:50:01 no data since 22:48:13
22:50:22 down
23:12:01 no data since 23:10:13
23:12:11 down
This ping sensor is set for a scanning interval of 30 seconds.
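Because the sensor polls every 30 seconds, it can help to measure each "no data since" gap in scan intervals. The following Python sketch does that for the log entries above. The names `EPISODES`, `seconds_between` and `summarize` are hypothetical, and the reading of "no data since" as the time of the last scan result of any kind (a failed ping still produces a Down result) is an assumption, not something PRTG documentation is quoted on here.

```python
from datetime import datetime

SCAN_INTERVAL_S = 30  # scanning interval of the Ping sensor, as stated in the thread

# (time the status went Unknown, "no data since" timestamp, time it returned to Down)
# Values copied from the sensor log quoted above.
EPISODES = [
    ("18:43:01", "18:40:51", "18:43:23"),
    ("19:35:01", "19:33:44", "19:36:02"),
    ("22:36:01", "22:34:43", "22:36:23"),
    ("22:50:01", "22:48:13", "22:50:22"),
    ("23:12:01", "23:10:13", "23:12:11"),
]


def seconds_between(earlier: str, later: str) -> int:
    """Difference in seconds between two same-day HH:MM:SS timestamps."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(later, fmt) - datetime.strptime(earlier, fmt)
    return int(delta.total_seconds())


def summarize(episodes, interval_s=SCAN_INTERVAL_S):
    """Return (unknown_at, gap_s, gap_in_intervals, unknown_duration_s) per episode."""
    rows = []
    for unknown_at, no_data_since, down_at in episodes:
        gap = seconds_between(no_data_since, unknown_at)    # time without any result before Unknown
        unknown_for = seconds_between(unknown_at, down_at)  # how long the Unknown state lasted
        rows.append((unknown_at, gap, gap / interval_s, unknown_for))
    return rows


if __name__ == "__main__":
    for unknown_at, gap, intervals, unknown_for in summarize(EPISODES):
        print(f"{unknown_at}: no result for {gap} s (~{intervals:.1f} intervals), "
              f"Unknown for {unknown_for} s")
```

For these entries the gaps are 130, 77, 78, 108 and 108 seconds, i.e. between roughly 2.6 and 4.3 scan intervals each, and the Unknown state lasted 10 to 61 seconds before Down resumed. Under the assumption above, that would mean several consecutive scans produced no result at all, not even a failed ping.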
I've checked the Probe Health sensor for that period, as well as the general health sensors for the probe around that time, but there's no indication of any issues there. Regarding the 500 simultaneous scans in the scheduler: while we have 613 sensors on that probe, many of them, in particular the more resource-intensive ones, are set to much longer scanning intervals, so they don't end up running at the same time.
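One way to sanity-check the "they don't end up running at the same time" reasoning is Little's law: the average number of scans in flight equals the scan rate multiplied by the average scan duration, i.e. the sum of duration/interval over all sensors. The Python sketch below applies that. The `SensorGroup` type and the sensor mix (how the 613 sensors split across intervals, and how long each scan takes) are entirely hypothetical illustration values, not figures from this probe.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class SensorGroup:
    count: int          # number of sensors with this profile
    interval_s: float   # scanning interval in seconds
    duration_s: float   # typical time one scan takes, in seconds (assumed)


def scans_per_second(groups: Iterable[SensorGroup]) -> float:
    """Average rate at which scans are started across all sensors."""
    return sum(g.count / g.interval_s for g in groups)


def average_concurrent_scans(groups: Iterable[SensorGroup]) -> float:
    """Little's law: average scans in flight = start rate x average scan duration."""
    return sum(g.count * g.duration_s / g.interval_s for g in groups)


# Hypothetical mix adding up to 613 sensors; replace with real counts and durations.
HYPOTHETICAL_MIX: List[SensorGroup] = [
    SensorGroup(count=400, interval_s=30, duration_s=1),    # light sensors, e.g. Ping
    SensorGroup(count=150, interval_s=60, duration_s=3),    # medium sensors
    SensorGroup(count=63, interval_s=300, duration_s=20),   # intensive sensors on long intervals
]

if __name__ == "__main__":
    print(f"scans started per second: {scans_per_second(HYPOTHETICAL_MIX):.2f}")
    print(f"average scans in flight:  {average_concurrent_scans(HYPOTHETICAL_MIX):.2f}")
```

This gives a steady-state average only; momentary peaks can be higher, so a result far below a limit such as 500 is reassuring but does not rule out short bursts.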
Aug, 2021
Hi Keith,
Thanks for pointing out that it was the Ping sensor that was changing its status; this explains the behavior of the other sensors as well. 613 sensors indeed does not sound like an overload and should be handled easily.
I'd be happy to take a look at the extended log files though. Could you please forward the files from the affected probe by following this article?
Kind regards,
Felix Saure, Tech Support Team
Aug, 2021
Hi Keith,
Thanks for reaching out to us! First of all, it sounds like no dependencies are set for the device. I'd recommend setting the Ping sensor as the master object for its parent device, so that all other sensors on that device are paused while it is Down and only Ping checks are performed to determine when the device is available again.
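The dependency behavior described in this thread can be modeled with a short Python sketch. This is a simplified, hypothetical model inferred from what is reported here (dependent sensors are paused while the master is Down, and one went back to Down when the Ping sensor briefly turned Unknown); the function names are assumptions, not PRTG internals.

```python
from typing import List, Optional


def effective_status(own_status: str, master_status: Optional[str]) -> str:
    """Simplified model: a dependent sensor is paused only while its master is Down."""
    if master_status == "Down":
        return "Paused"  # dependency pause: the sensor's own result is not shown
    return own_status    # master Up, Unknown or absent: the sensor reports its own result


def dependent_timeline(master_statuses: List[str], own_status: str = "Down") -> List[str]:
    """Status of a dependent sensor on an unreachable device, for each master status."""
    return [effective_status(own_status, m) for m in master_statuses]


if __name__ == "__main__":
    master = ["Down", "Unknown", "Down", "Unknown", "Down"]
    for m, child in zip(master, dependent_timeline(master)):
        print(f"master {m:<8} -> dependent {child}")
```

In this model, every Unknown episode of the master briefly exposes the dependent sensor's own Down status (Paused, Down, Paused, Down, Paused), which matches the Paused-to-Down switching reported for the other sensor.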
The flapping Unknown status can also mean that PRTG was not able to perform the scans in time. This might be caused by a general overload of the probe, which can schedule up to 500 simultaneous scans in its scheduler, or by the probe being unavailable, so that PRTG did not receive any data for that period.
Does the sensor's "Logs" tab tell you more? Also check the corresponding Probe Health sensor to see whether its data shows gaps during the same period.
Finally, if you want us to take a look at the log files of the respective probe, use Write Probe Status Files and forward the logs to us for review. Please also let us know the IP address and hostname of the device you're referring to.
If you want to continue via email, kindly mention this KB post as well. Thanks!
Kind regards,
Felix Saure, Tech Support Team
Aug, 2021