I added a VMware host to PRTG monitoring and this ambiguous result appeared under the warning status:
Sensor hostname [Hardware Status]
Warning. 61 elements return a warning state: System Board 1 Power Optimized 0 --- 0.7.1.117; Add-in Card 3 SD1 0 --- 0.11.3.122; Add-in Card 3 SD2 0 --- 0.11.3.123; BIOS 1 ECC Corr Err 65 --- 0.34.1.1; BIOS 1 ECC Uncorr Err 65 --- 0.34.1.2; BIOS 1 PCI Parity Err 65 --- 0.34.1.4; BIOS 1 PCI System Err 65 --- 0.34.1.5; BIOS 1 SBE Log Disabled 65 --- 0.34.1.6; BIOS 1 Unknown 65 --- 0.34.1.8; BIOS 1 CPU Protocol Err 65 --- 0.34.1.10; BIOS 1 CPU Machine Chk 65 --- 0.34.1.13; BIOS 1 Memory Spared 65 --- 0.34.1.17; BIOS 1 Memory Mirrored 65 --- 0.34.1.18; BIOS 1 PCIE Fatal Err 65 --- 0.34.1.24; BIOS 1 Chipset Err 65 --- 0.34.1.25; BIOS 1 Err Reg Pointer 65 --- 0.34.1.26; BIOS 1 Mem ECC Warning 65 --- 0.34.1.27; BIOS 1 USB Over-current 65 --- 0.34.1.29; BIOS 1 POST Err 65 --- 0.34.1.30; BIOS 1 Hdwr version err 65 --- 0.34.1.31; BIOS 1 Non Fatal PCI Er 65 --- 0.34.1.38; BIOS 1 Fatal IO Error 65 --- 0.34.1.39; BIOS 1 MSR Info Log 65 --- 0.34.1.40; BIOS 1 TXT Status 65 --- 0.34.1.42; BIOS 1 iDPT Mem Fail 65 --- 0.34.1.43; BIOS 1 Additional Info 65 --- 0.34.1.46; BIOS 1 CPU TDP 65 --- 0.34.1.47; BIOS 1 QPIRC Warning 65 --- 0.34.1.48; BIOS 1 QPIRC Warning 65 --- 0.34.1.49; BIOS 1 Link Warning 65 --- 0.34.1.50; BIOS 1 Link Warning 65 --- 0.34.1.51; BIOS 1 Link Error 65 --- 0.34.1.52; BIOS 1 MRC Warning 65 --- 0.34.1.53; BIOS 1 MRC Warning 65 --- 0.34.1.54; BIOS 1 Chassis Mismatch 65 --- 0.34.1.55; BIOS 1 FatalPCIErrOnBus 65 --- 0.34.1.56; BIOS 1 NonFatalPCIErBus 65 --- 0.34.1.57; BIOS 1 Fatal PCI SSD Er 65 --- 0.34.1.58; BIOS 1 NonFatalSSDEr 65 --- 0.34.1.59; BIOS 1 CPUMachineCheck 65 --- 0.34.1.60; BIOS 1 FatalPCIErARI 65 --- 0.34.1.61; BIOS 1 NonFatalPCIErARI 65 --- 0.34.1.62; BIOS 1 FatalPCIExpEr 65 --- 0.34.1.63; BIOS 1 NonFatalPCIExpEr 65 --- 0.34.1.64; BIOS 1 CPU Link Info 65 --- 0.34.1.66; BIOS 1 Chipset Info 65 --- 0.34.1.67; BIOS 1 Memory Config 65 --- 0.34.1.68; BIOS 1 QPI Link Err 65 --- 0.34.1.41; BIOS 1 LT/Flex Addr 65 --- 0.34.1.37; BIOS 1 OS Watchdog Time 65 --- 0.
What does this mean? Is this a status dump or actual triggered issues that should be tended to?
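To get an overview of a message like the one above, here is a minimal Python sketch that splits it into individual elements and counts them per component and value. It assumes each entry has the form "name value --- id" and that entries are separated by semicolons, as in the pasted text. The function names and the grouping rule (component = name up to the first standalone number) are my own assumptions and not part of PRTG. It does not interpret what the values mean.

```python
import re
from collections import Counter

# Assumed entry layout, taken from the pasted message: "<name> <value> --- <id>"
ELEMENT_RE = re.compile(
    r"^(?P<name>.+?)\s+(?P<value>-?\d+)\s+---\s+(?P<id>\d+(?:\.\d+)*)$"
)


def parse_elements(message):
    """Split a hardware-status message into (name, value, id) tuples."""
    # Drop the "Warning. N elements return a warning state:" prefix if present.
    _, _, body = message.rpartition("state:")
    elements = []
    for chunk in body.split(";"):
        match = ELEMENT_RE.match(chunk.strip())
        if match:  # skips truncated entries such as one ending in "--- 0."
            elements.append((match["name"], int(match["value"]), match["id"]))
    return elements


def component_of(name):
    """Hypothetical grouping rule: name up to the first standalone number."""
    words = name.split()
    for i, word in enumerate(words):
        if word.isdigit():
            return " ".join(words[: i + 1])
    return name


def count_by_component(elements):
    """Count elements per (component, value) pair."""
    return Counter((component_of(name), value) for name, value, _ in elements)


sample = (
    "Warning. 61 elements return a warning state: "
    "System Board 1 Power Optimized 0 --- 0.7.1.117; "
    "Add-in Card 3 SD1 0 --- 0.11.3.122; "
    "BIOS 1 ECC Corr Err 65 --- 0.34.1.1; "
    "BIOS 1 ECC Uncorr Err 65 --- 0.34.1.2; "
    "BIOS 1 OS Watchdog Time 65 --- 0."
)

for (component, value), count in count_by_component(parse_elements(sample)).items():
    print(f"{component}: {count} element(s) with value {value}")
```

Run against the full message, a summary like this makes it easier to see whether the warnings are spread across the hardware or concentrated in one component with one identical value.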
Article Comments
Angelas
Florian is right here. Make sure you monitor the ESX hosts directly, and not through vCenter. vCenter should be used to monitor VMs and Datastores. The ESX hosts should only be polled for Hardware Performance and Hardware Status.
Benjamin Day
Paessler Support
Dec, 2018 - Permalink
Hi guys.
It is a host and not vCenter; it was just named incorrectly, sorry for the confusion.
Version: VMware ESXi, 6.5.0, 8294253
> Is this a single host or a whole cluster?
This is from a single host.
> Did you look into Monitor - Hardware Status in VMware to see what is going on in there?
Under Alerts and Warnings, it is blank. Sensors has a few entries, System Event Log is blank.
> What vendor/model is the server hardware?
Dell PowerEdge R740 and VMware is a custom Dell image.
Dec, 2018 - Permalink
Well - I do not have single hosts here - but I have R740s and VMware 6.5 with no issues in PRTG.
In this case you might want to collect some support data and sensor log files and send them to support.
Unless the support team has another idea what it might be.
Do you have other hosts that work?
Regards
Florian
Dec, 2018 - Permalink
This host is working in the sense that it's serving VMs. This output from PRTG is the only indication anything is wrong but I am not sure how to understand what it's telling me.
Dec, 2018 - Permalink
Did you use a standard VMware image for this or a Dell optimized VMware vSphere installation like the following one?
You would find the specialized images on Dell's driver download page. This could possibly make a difference.
From what I see, your warnings show this:
Warning. 61 elements return a warning state: System Board 1 Power Optimized 0 --- 0.7.1.117 Add-in Card 3 SD1 0 --- 0.11.3.122 Add-in Card 3 SD2 0 --- 0.11.3.123 BIOS 1 ECC Corr Err 65 --- 0.34.1.1 BIOS 1 ECC Uncorr Err 65 --- 0.34.1.2 BIOS 1 PCI Parity Err 65 --- 0.34.1.4 BIOS 1 PCI System Err 65 --- 0.34.1.5 BIOS 1 SBE Log Disabled 65 --- 0.34.1.6 BIOS 1 Unknown 65 --- 0.34.1.8 BIOS 1 CPU Protocol Err 65 --- 0.34.1.10 BIOS 1 CPU Machine Chk 65 --- 0.34.1.13 BIOS 1 Memory Spared 65 --- 0.34.1.17 BIOS 1 Memory Mirrored 65 --- 0.34.1.18 BIOS 1 PCIE Fatal Err 65 --- 0.34.1.24 BIOS 1 Chipset Err 65 --- 0.34.1.25 BIOS 1 Err Reg Pointer 65 --- 0.34.1.26 BIOS 1 Mem ECC Warning 65 --- 0.34.1.27 BIOS 1 USB Over-current 65 --- 0.34.1.29 BIOS 1 POST Err 65 --- 0.34.1.30 BIOS 1 Hdwr version err 65 --- 0.34.1.31 BIOS 1 Non Fatal PCI Er 65 --- 0.34.1.38 BIOS 1 Fatal IO Error 65 --- 0.34.1.39 BIOS 1 MSR Info Log 65 --- 0.34.1.40 BIOS 1 TXT Status 65 --- 0.34.1.42 BIOS 1 iDPT Mem Fail 65 --- 0.34.1.43 BIOS 1 Additional Info 65 --- 0.34.1.46 BIOS 1 CPU TDP 65 --- 0.34.1.47 BIOS 1 QPIRC Warning 65 --- 0.34.1.48 BIOS 1 QPIRC Warning 65 --- 0.34.1.49 BIOS 1 Link Warning 65 --- 0.34.1.50 BIOS 1 Link Warning 65 --- 0.34.1.51 BIOS 1 Link Error 65 --- 0.34.1.52 BIOS 1 MRC Warning 65 --- 0.34.1.53 BIOS 1 MRC Warning 65 --- 0.34.1.54 BIOS 1 Chassis Mismatch 65 --- 0.34.1.55 BIOS 1 FatalPCIErrOnBus 65 --- 0.34.1.56 BIOS 1 NonFatalPCIErBus 65 --- 0.34.1.57 BIOS 1 Fatal PCI SSD Er 65 --- 0.34.1.58 BIOS 1 NonFatalSSDEr 65 --- 0.34.1.59 BIOS 1 CPUMachineCheck 65 --- 0.34.1.60 BIOS 1 FatalPCIErARI 65 --- 0.34.1.61 BIOS 1 NonFatalPCIErARI 65 --- 0.34.1.62 BIOS 1 FatalPCIExpEr 65 --- 0.34.1.63 BIOS 1 NonFatalPCIExpEr 65 --- 0.34.1.64 BIOS 1 CPU Link Info 65 --- 0.34.1.66 BIOS 1 Chipset Info 65 --- 0.34.1.67 BIOS 1 Memory Config 65 --- 0.34.1.68 BIOS 1 QPI Link Err 65 --- 0.34.1.41 BIOS 1 LT/Flex Addr 65 --- 0.34.1.37 BIOS 1 OS Watchdog Time 65 --- 0.
Unfortunately, I am not able to make much sense of it. Hopefully support can give you more advice.
Just wanted to mention that there are special ISO images for VMware on Dell servers, and I think for HP and some other vendors as well. Those can avoid some issues.
Additionally, the BIOS seems to be the main source of your issues; there might be an update available.
I would also look into the DRAC and the Dell system/hardware logs to make sure there is nothing going on.
Regards
Florian
Dec, 2018 - Permalink
Angelas,
Can you confirm what Florian asked? Also, you might consider opening a support ticket at this stage so someone from our support team can work directly with you.
Benjamin Day
Paessler Support
Dec, 2018 - Permalink
This raises a few questions:
From what I see, there seem to be either some hardware-detection issues on VMware or something is wrong with the host. PRTG sensors normally just access the API and read the information VMware provides in their REST API. This looks a lot like there is something going on at the host itself, but right now this is just a guess!
Regards
Florian Rossmark
www.it-admins.com
Dec, 2018 - Permalink