We have an SNMP CPU Load sensor on a critical server. All cores spiked to 100% at the moment the sensor scanned, and this immediately sent the Down notification to our ticketing system, even though the sensor settings are set to:
Set sensor to warning for 1 interval, then set to down (recommended)
The scanning interval is 5 minutes, yet the logs show the trigger activating roughly a minute after the Down status. I would expect a second check to fail after entry 1) before a trigger fires.
2) 6/19/2016 3:33:01 PM SNMP CPU Load Notification Info State Trigger activated (Sensor/Source/ID:54776/0/1)
1) 6/19/2016 3:31:57 PM SNMP CPU Load Down 100 % (Total) is above the error limit of 95 % in Total
Article Comments
After sending my question in, I found the answer in the help documentation.
But I can't help asking whether this is the way it *should* be. CPU percentages are one of those metrics that can fluctuate wildly and sometimes need a wide berth to operate. A rule that amounts to "If PRTG catches the CPU over 95% for even one check, that's critical" leaves no room for that. We're forgoing the Down condition in order to avoid this.
This is going to create a number of false alerts and increased workload on our level 1 and 2 support.
Jun, 2016 - Permalink
Dear Mike
A notification trigger can be used in this case: add a threshold trigger and configure a delay ("Above X for Y seconds"). This way the sensor status is not affected, but you can still get an email, or an alert by other means, prompting you to check the server's CPU performance.
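To illustrate the idea behind an "Above X for Y seconds" delay, here is a minimal Python sketch. It is not PRTG code and does not claim to reproduce how PRTG evaluates the trigger internally; the class name `DelayedThreshold`, its fields, and the sample readings are all hypothetical. It only shows why a single spike does not fire such a rule while a sustained load does.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DelayedThreshold:
    """Hypothetical model of an 'Above X for Y seconds' rule (not PRTG's implementation)."""
    limit: float           # X: threshold value, e.g. 95 (% CPU)
    delay_seconds: float   # Y: how long the value must stay above the limit
    # Timestamp of the first reading in the current run of above-limit readings.
    _above_since: Optional[float] = field(default=None, init=False, repr=False)

    def update(self, timestamp: float, value: float) -> bool:
        """Feed one reading; return True once the value has stayed above the limit for the full delay."""
        if value <= self.limit:
            # Any reading at or below the limit resets the timer.
            self._above_since = None
            return False
        if self._above_since is None:
            self._above_since = timestamp
        return timestamp - self._above_since >= self.delay_seconds


# Assumed readings taken every 300 s (a 5-minute scanning interval): (seconds, CPU %).
readings = [(0, 100), (300, 40), (600, 100), (900, 100), (1200, 100)]
rule = DelayedThreshold(limit=95, delay_seconds=600)
fired = [rule.update(t, v) for t, v in readings]
print(fired)
```

In this sketch the isolated spike at the first reading never fires the rule, because the next reading resets the timer; only the final reading, after the load has stayed above 95% for 600 seconds, returns `True`.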
Jun, 2016 - Permalink
Dear Mike
If a sensor goes down because a channel limit is exceeded, it goes down instantly. The setting you mention only applies to sensor errors, such as a connection issue to the device.
PRTG has no setting to delay a down status triggered by a channel limit.
Jun, 2016 - Permalink