I noticed this was added in one of the recent updates, and it's really great. I set up my own SNMP monitoring template a while ago using what look to be the same HP SNMP libraries.

One thing I noticed, though, is that all of your temperature channels take the limit HP has set for them and use it as their error limit. The problem is that HP servers are, by default, usually set to shut down automatically when that value is reached, whereas PRTG only shows an error once the value goes above it. It's quite possible that you would never see the temperature channel go into an error state, and you most likely won't have time to react before the automatic shutdown begins.

In my template I read in that value, then set the error limit 5 degrees below the shutdown limit and the warning limit 10-15 degrees below it. That gives me more time to check on an overheating system and shut it down gracefully if needed.
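A minimal Python sketch of the margin arithmetic described above, assuming the thresholds are in degrees Celsius and that a channel only changes state once the reading goes strictly above a limit (as described in this post). The names `ChannelLimits`, `limits_below_shutdown`, and `channel_state` are hypothetical and not part of PRTG; the 65-degree threshold is an example value, not taken from a real server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelLimits:
    """Hypothetical container for a channel's upper warning/error limits."""
    warning: float
    error: float


def limits_below_shutdown(shutdown: float,
                          error_margin: float = 5.0,
                          warning_margin: float = 10.0) -> ChannelLimits:
    """Place the error and warning limits a fixed margin below the shutdown threshold.

    Margins use the same unit as the threshold (assumed degrees Celsius).
    """
    if not 0 < error_margin < warning_margin:
        raise ValueError("expected 0 < error_margin < warning_margin")
    return ChannelLimits(warning=shutdown - warning_margin,
                         error=shutdown - error_margin)


def channel_state(reading: float, limits: ChannelLimits) -> str:
    """Assumed behaviour: a limit is only breached when the reading goes above it."""
    if reading > limits.error:
        return "error"
    if reading > limits.warning:
        return "warning"
    return "ok"


if __name__ == "__main__":
    shutdown = 65.0  # example threshold only
    naive = ChannelLimits(warning=55.0, error=shutdown)
    tuned = limits_below_shutdown(shutdown)
    # At the shutdown temperature the naive setup never reaches "error".
    print(channel_state(65.0, naive))  # warning
    print(channel_state(61.0, tuned))  # error
```

Passing `warning_margin=15.0` gives the wider 15-degree warning band mentioned above.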

Great work, though. I'm hoping you'll add memory and drive/RAID array status next; I already have templates for them, but setting up many channels and sensors can be a pain.


Article Comments

Thank you very much for the nice feedback.

Regarding the temperature channels: our test devices have three different thresholds for every temperature, described as "warning", "critical", and "shutdown". We simply used the "warning" threshold as our warning limit and the "critical" threshold as our error limit. If there was no critical threshold, we used the shutdown threshold instead. We did not want to change those limits, because this way we can explain exactly where each limit comes from.
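The threshold selection described above amounts to a simple fallback. The Python sketch below restates it; the function name `pick_limits` is hypothetical and is not meant to reflect how PRTG implements the sensor internally.

```python
from __future__ import annotations

from typing import Optional


def pick_limits(warning: Optional[float],
                critical: Optional[float],
                shutdown: Optional[float]) -> tuple[Optional[float], Optional[float]]:
    """Return (warning_limit, error_limit) following the rule described above.

    - the device's "warning" threshold becomes the warning limit
    - the "critical" threshold becomes the error limit
    - if there is no "critical" threshold, "shutdown" is used as the error limit
    """
    error = critical if critical is not None else shutdown
    return warning, error


if __name__ == "__main__":
    print(pick_limits(70.0, 80.0, 90.0))  # (70.0, 80.0)
    print(pick_limits(70.0, None, 90.0))  # (70.0, 90.0)
```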

A new channel, "Disk Controller Status", will be added to the HP ProLiant System Health sensor soon.

Please let us know which OIDs you use to monitor your memory, and we'll consider adding them to PRTG.


Nov, 2012

Hello Johannes,

Okay, that explains how your sensor works. You probably have newer servers than we do here; many of our ProLiant servers are a few years old, and I haven't tried the new sensor on the latest hardware yet. Most are DL/ML 360-380s, and at the moment they give us only one temperature threshold to set.

For memory status on our HP servers, I use the following OIDs:

1.3.6.1.4.1.232.6.2.14.12.1.6/11 (cpq he res mem2board) to get the memory controller error status/condition.

1.3.6.1.4.1.232.6.2.14.13.1.19/20 (cpq he res mem2module) to get the status and condition of each installed RAM module. This is very useful for quickly seeing whether a memory module has died after a reboot or become unseated during transport.
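A minimal Python sketch for turning the "/" shorthand above into the individual column OIDs you would enter as channels, assuming `6/11` means columns 6 and 11 of the same table entry. The helper names are hypothetical, and the condition mapping assumes the usual Compaq enumeration other(1), ok(2), degraded(3), failed(4); check it against the CPQHLTH-MIB before relying on it. The SNMP queries themselves are left to whatever tool you already use.

```python
from __future__ import annotations

# ASSUMPTION: the common Compaq/HP condition enumeration; verify the
# enumeration for each object against your copy of the CPQHLTH-MIB.
CONDITION_NAMES = {1: "other", 2: "ok", 3: "degraded", 4: "failed"}


def expand_columns(shorthand: str) -> list[str]:
    """Expand "<entry OID>.<col>/<col>" into one OID per table column."""
    prefix, _, columns = shorthand.rpartition(".")
    return [f"{prefix}.{col}" for col in columns.split("/")]


def instance_oid(column_oid: str, row_index: str) -> str:
    """Append a row index (as returned by an SNMP walk) to a column OID."""
    return f"{column_oid}.{row_index}"


def condition_name(value: int) -> str:
    """Translate a condition integer into a readable name ("unknown" if unmapped)."""
    return CONDITION_NAMES.get(value, "unknown")


if __name__ == "__main__":
    for shorthand in ("1.3.6.1.4.1.232.6.2.14.12.1.6/11",
                      "1.3.6.1.4.1.232.6.2.14.13.1.19/20"):
        for column in expand_columns(shorthand):
            print(column)
    print(instance_oid("1.3.6.1.4.1.232.6.2.14.13.1.20", "1"))
    print(condition_name(2))
```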

Hope that helps.


Nov, 2012

That helps us a lot. Thank you very much!


Nov, 2012