I've been trying to resolve a mutex timeout on an MSA 2040 SAN that we monitor. We had been monitoring the SAN successfully with the SSH sensors, but recently the sensors began to fail with mutex timeouts. I've tried lengthening the scanning intervals (from 60 seconds to 5 minutes), but without success. There are three sensors: one enclosure sensor and two logical disk sensors. They are currently paused. I've also tried adding a new enclosure sensor and a new logical disk sensor with the default configuration settings, and both fail with the same error.
I have another MSA 2040 at the same location that had no sensors configured (other than ping), so I tried adding an enclosure sensor to it. That worked. So the culprit seems to be something on the MSA 2040 that shows the issue. I think a reboot might resolve it, but is there another suggestion that avoids rebooting the SAN?
Article Comments
I have the same issue with the same device: all SSH sensors randomly go to the Down state with the error "Timeout while trying to aquire Mutex, try to use bigger Scanningintervals".
What is a mutex? Can I change this behavior somehow?
I changed the scanning interval for the MSA to 15 minutes, which is not really acceptable.
Dec, 2018 - Permalink
Vasily,
Is it possible for you to restart the SSH daemon on the SAN? Also, which version of PRTG are you running at this time?
Can you also check for any firmware updates for the MSA?
Benjamin Day
Paessler Support
Dec, 2018 - Permalink
We have had exactly the same issue for some time. We increased the scanning intervals and the shell and SSH timeouts, but with no luck: the sensors still go down at random. When we restart the PRTG probe/node, the problem does not occur for a certain time. For us it is not 100% clear that this is only a SAN problem.
PRTG-Version is 19.2.50.2842+ (latest stable)
Please understand that a firmware update for a SAN is not an ad hoc, just-to-try task; the systems have to remain available. We therefore need a more precise statement from you about where the problem occurs.
Please give us accurate guidance or analyze this bug from the PRTG perspective. (Is it 100% clear that the PRTG SSH sensor implementation works properly here?)
Thank you very much.
Aug, 2019 - Permalink
Hello,
The only other thing you can try to fix mutex errors is to break out your single device into multiple devices, and space out the sensors. By creating more instances of the device, the scanning is more spaced out, and new SSH sessions are created. This can help sometimes to mitigate the mutex errors.
Benjamin Day
[Paessler Support]
Aug, 2019 - Permalink
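To illustrate the advice above and the earlier question about what a mutex is: a mutex is a lock that lets only one task at a time use a shared resource. The following is a minimal Python sketch of the general concept, not PRTG's actual implementation. It assumes that all sensors on one device wait for the same lock (for example, one guarding a shared SSH session), and the function name and timings are hypothetical.

```python
import threading
import time


def run_sensors(num_sensors, scan_seconds, mutex_timeout):
    """Simulate sensors sharing one mutex; return how many timed out.

    Assumption for illustration only: every sensor on a device must hold
    the same lock while it scans, and gives up after mutex_timeout seconds.
    """
    device_mutex = threading.Lock()   # one lock per simulated device
    count_lock = threading.Lock()     # protects the counter below
    timed_out = [0]

    def sensor():
        # Wait for the shared lock, but only up to the timeout.
        if not device_mutex.acquire(timeout=mutex_timeout):
            with count_lock:
                timed_out[0] += 1
            return
        try:
            time.sleep(scan_seconds)  # stands in for a slow SSH scan
        finally:
            device_mutex.release()

    threads = [threading.Thread(target=sensor) for _ in range(num_sensors)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return timed_out[0]


if __name__ == "__main__":
    # Three sensors on one device, each scan slower than the timeout.
    print("one device:", run_sensors(3, 0.3, 0.1), "timeouts")
    # The same sensors split across three devices, each with its own lock.
    split = sum(run_sensors(1, 0.3, 0.1) for _ in range(3))
    print("three devices:", split, "timeouts")
```

In this model, sensors that queue behind a slow scan give up once their wait exceeds the timeout, while a sensor that has the lock to itself never waits. That is one plausible reason why splitting the sensors across several device entries can reduce the errors.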
Hello!
Since my last post we have changed the way we monitor our SAN device. I delegated this problem to a dedicated SAN administrator, and after some experiments he decided to switch from SSH monitoring to monitoring via the API. API requests are much faster and more stable than SSH. We can now use a scanning interval of 60 seconds for most sensors, and we have dramatically decreased the number of sensors. We use PowerShell to work with the SAN API.
Thank you!
Aug, 2019 - Permalink
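The poster's PowerShell scripts were not shared, so the following is only a rough sketch of the PRTG side of such an approach, written in Python rather than PowerShell. It formats values as the XML that a PRTG custom script sensor (EXE/Script Advanced) reads from standard output. The channel names and values are hypothetical, and fetching them from the SAN API is deliberately left out.

```python
import xml.etree.ElementTree as ET


def prtg_result(channels, text=None):
    """Build PRTG custom-sensor XML from a {channel name: value} mapping."""
    root = ET.Element("prtg")
    for name, value in channels.items():
        result = ET.SubElement(root, "result")
        ET.SubElement(result, "channel").text = name
        ET.SubElement(result, "value").text = str(value)
    if text:
        ET.SubElement(root, "text").text = text
    return ET.tostring(root, encoding="unicode")


def prtg_error(message):
    """Build the XML that puts the sensor into an error state with a message."""
    root = ET.Element("prtg")
    ET.SubElement(root, "error").text = "1"
    ET.SubElement(root, "text").text = message
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical values; a real script would read them from the SAN API.
    print(prtg_result({"Healthy Disks": 24, "Degraded Disks": 0}))
```

Returning several channels from one script is one way a setup like this could get by with fewer sensors than one SSH sensor per item.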
Vasiliy,
Glad to hear this, and thanks for sharing the idea!
Benjamin Day
[Paessler Support]
Aug, 2019 - Permalink
@Vasiliy, is there any chance you could share your PowerShell scripts to monitor the MSA? We're facing the same issue with the mutex timeouts on our MSAs.
Oct, 2019 - Permalink
Hello sarbuk!
The solution was created by our internal system administrator, and only he understands our needs and the logic of the scripts. I'm sorry, but he is a very busy guy who is not in the habit of writing manuals. All I can say is that we had quite specific tasks in mind before we started writing the scripts, so we don't have one generic script that replaces the native PRTG sensor.
Oct, 2019 - Permalink
Is it possible to restart the SSH server service on the SAN? It might just be something with the SSH service. That would prevent you from having to reboot the entire SAN.
Apr, 2018 - Permalink