Some of our customers need very fine-grained control over how their servers are monitored. PRTG already has many configuration options, but they do not cover every need.
One of these needs is particularly problematic.
Use cases:
Some servers run processes that more or less saturate their CPUs for as long as those processes last. With a conventionally configured threshold, the sensor goes into an error state as soon as the threshold is crossed. As outlined above, this is not necessarily a problem as long as the situation does not last beyond the "usual" time.
Solution:
In this case, start by setting at least one threshold (limit) on the relevant CPU channel, then disable the limits. Next, create two notifications. The first enables the limits according to the criteria defined in the threshold trigger. The second disables them as soon as the threshold is crossed in the opposite direction.
Example: SetLimits_ON Notification
- Execute HTTP Action checked
URL: https://dash01.domain.tld/api/setobjectproperty.htm
POST: id=%sensorid&subtype=channel&subid=0&name=limitmode&value=1&username=<username>&passhash=<hash>
Example: SetLimits_OFF Notification
- Execute HTTP Action checked
URL: https://dash01.domain.tld/api/setobjectproperty.htm
POST: id=%sensorid&subtype=channel&subid=0&name=limitmode&value=0&username=<username>&passhash=<hash>
Threshold Trigger on CPU sensor
When Total (%) channel is Above 50 for at least 3600 seconds perform SetLimits_ON
When condition clears after a notification was triggered perform SetLimits_OFF
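For testing the call outside of PRTG, the request made by the two notifications can be sketched in Python. This is only a minimal sketch: the endpoint and POST fields are taken from the examples above, while the function name `build_limit_request`, the sensor ID `1234`, and the credentials are placeholders chosen for illustration. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint used by both notifications in the examples above.
API_PATH = "/api/setobjectproperty.htm"


def build_limit_request(base_url, sensor_id, channel_id, enable, username, passhash):
    """Build (but do not send) the POST request that toggles a channel's limits.

    enable=True mirrors SetLimits_ON (value=1), enable=False mirrors
    SetLimits_OFF (value=0).
    """
    fields = {
        "id": sensor_id,          # PRTG substitutes %sensorid here in a notification
        "subtype": "channel",
        "subid": channel_id,      # 0 in the examples above
        "name": "limitmode",
        "value": 1 if enable else 0,
        "username": username,
        "passhash": passhash,
    }
    body = urlencode(fields).encode("ascii")
    return Request(base_url.rstrip("/") + API_PATH, data=body, method="POST")


if __name__ == "__main__":
    # Placeholder values (assumptions): sensor 1234, user "apiuser".
    req = build_limit_request(
        "https://dash01.domain.tld", 1234, 0, True, "apiuser", "0000000000"
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("ascii"))
    # To actually send it: urllib.request.urlopen(req)
```

Keeping the build step separate from the send step makes it easy to check the exact body before pointing it at a production server.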
Points of Attention:
The account used must of course have the right to modify the sensor concerned. It is also important to use the internal URL so that the API call reaches the server directly and does not pass through external components.
Feature Request:
This solution works, but it remains very manual and requires a configuration effort that makes it hard to maintain over time. It would be preferable to have it integrated into PRTG's notification management, which would ensure stability and make configuration easy.
Article Comments
Hello Erhard,
Thanks for your reply!
To complete your remark:
- You are right. The changeover to the error state is what is needed, so that the SLA calculation, for example, can take the incident into account, and also simply so that it appears on the administrators' dashboard.
- Thanks for the tip, I thought that the POST action was mandatory to use placeholders.
- This is the whole subject of my feature request! It's a bit tedious to have to preconfigure thresholds. Additional request: it would be nice if the threshold trigger could work on the mean value over the period, in addition to the total value or primary channel.
Sincerely yours,
Matthieu
Jun, 2017 - Permalink
Hello Matthieu,
Does PRTG's unusual detection not detect when average values are being "violated" in this particular sensor? OK, it might take a few weeks until unusual detection starts to detect "anomalies" in a sensor, because it establishes a baseline over several weeks to notice when traffic is unusually high or low for a particular time of day for example.
Regarding this whole "Go into error with thresholds/go into error when limit was (b)reached for x times" topic: it is not something that we will see anytime soon, I'm afraid. There is indeed demand for this feature, but so far it has not been high enough to put effort into it, so for now I would not expect this to change in the near future.
Kind regards,
Erhard
Jun, 2017 - Permalink
Hello Matthieu,
Thank you very much for sharing this.
A few remarks from my point of view:
https://dash01.domain.tld/api/setobjectproperty.htm?id=%sensorid&subtype=channel&subid=0&name=limitmode&value=0&username=<username>&passhash=<hash>
Kind regards,
Erhard
Jun, 2017 - Permalink