For about two weeks, my ESXi server on an HP ProLiant DL380 G5 has been shutting down unexpectedly every two days.
After that I must restart it manually.
When I look at the vSphere dashboard, the only alert message is this:
Any idea on how to fix it?
Thanks in advance
That usually means the cache battery on the storage controller is dead and you must replace it. As to whether that is responsible for the ESXi host shutdowns, I don't know, but if you are using those internal drives in a RAID configuration, especially with write-back caching, you should plan to replace it ASAP.
Hello m4biz,
I advise logging in via SSH and going through the ESXi host logs; below are the files responsible for each function and their respective locations.
Then look at the HP server logs through HP Systems Insight Manager.
https://www.hpe.com/us/en/product-catalog/detail/pip.489496.html
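In case it helps with the log review mentioned above, here is a minimal Python sketch that scans copies of the host logs for lines that might relate to a shutdown. It assumes you have copied files such as /var/log/vmkernel.log and /var/log/hostd.log off the host first. The keyword list and the function name are my own assumptions, not anything ESXi defines, so adjust them to what you are hunting for.

```python
import sys

# Assumed keyword list (not defined by ESXi); extend it as needed.
KEYWORDS = ("shutdown", "power", "halt", "panic", "battery")


def find_suspect_lines(lines, keywords=KEYWORDS):
    """Return (line_number, text) pairs for lines containing any keyword.

    Matching is a plain case-insensitive substring test.
    """
    lowered = [k.lower() for k in keywords]
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if any(k in text.lower() for k in lowered):
            hits.append((number, text))
    return hits


if __name__ == "__main__":
    # Usage: python scan_logs.py vmkernel.log hostd.log
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for number, text in find_suspect_lines(handle):
                print(f"{path}:{number}: {text}")
```

A substring match will produce false positives, but it narrows things down enough to compare timestamps against the times the host went down.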
For the unexpected shutdown, log in to the server's iLO and check the System Management Logs.
André
Check whether there are any events in the iLO log, as the controller cache may be having problems. It could be faulty hardware or missing firmware updates.
Drivers & software for hp proliant dl380 g5 server:
Installing async drivers in ESXi 5.x and 6.x using esxcli and async driver VIB file (2137854):
https://support.hpe.com/hpesc/public/home/driverHome?sp4ts.oid=1121413&swLangOid=2&swEnvOid=4166
Hugs,
Hi daphnissov, thanks for your reply.
I'll try asap
No, a failed Smart Array battery won't take down your box, but any one or combination of the following will: a failing RAID controller (especially the onboard ones), a bad motherboard, bad power supplies, or bad AC/DC power regulators.
Yes, you are seeing a sensor alert for the battery, but that is just a coincidence given all the other stuff that can go bad over time in a decade-old box.
All the ProLiant sensors just tell you whether something is present or not present; they don't account for "works some of the time".
You can try a new battery swap, but the cost of the battery will be the same as replacing the entire box, your call.
You can also try to drop in a better smart array card, but the downtime to replace the entire box is just as long.
Hi Dave.
Sorry for the delay in my reply.
Is there any way to disable this check and stop the repeated shutdowns without replacing the battery pack?
My disks are not in any RAID configuration.
I guess you can try to physically detach the battery pack from the RAID controller.
Hi Finikiez,
thanks for your reply.
What happens if I do this?
Will the server still work?
The cache battery is necessary to avoid data corruption in case of unexpected power loss when the write cache is enabled.
So try disabling the write cache in the controller's BIOS or from the ACU CLI first.
If this doesn't help, try detaching the battery physically.
When you disable the write cache, expect write performance to degrade.
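For the ACU CLI route, here is a rough Python sketch that only builds the command lines so you can review them before running anything. It assumes the HP `hpacucli` tool is installed and on the path, and that the controller sits in slot 0 with logical drive 1; those numbers are placeholders, so check `ctrl all show config` for the real ones. As far as I know, `aa=disable` turns off the array accelerator (the controller cache) for a logical drive, but please verify against the CLI's built-in help for your version.

```python
def acu_status_command(binary="hpacucli"):
    """Command that reports controller, cache and battery status."""
    return [binary, "ctrl", "all", "show", "status"]


def acu_disable_cache_command(slot, logical_drive, binary="hpacucli"):
    """Command that disables the array accelerator on one logical drive.

    slot and logical_drive are placeholders supplied by the caller.
    """
    return [binary, "ctrl", f"slot={slot}", "ld", str(logical_drive),
            "modify", "aa=disable"]


if __name__ == "__main__":
    # Print the commands instead of executing them, so nothing changes
    # on the controller until you have checked slot and drive numbers.
    print(" ".join(acu_status_command()))
    print(" ".join(acu_disable_cache_command(slot=0, logical_drive=1)))
```

Run the status command first; if it already shows the cache as disabled because of the failed battery, there is nothing left to turn off.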
The check alone would not be causing the shutdown, so there's no need to remove it. Your best bet is to check through the iLO to see what's actually happening, as it sounds like it could be a hardware fault given that you're using a G5 server. Apart from that, check through the ESXi host logs to see if there are any software errors.
Hi imacfj, thanks for your reply.
I've just configured iLO 2 and I've found this:
At this point I think that I must replace the battery pack:
What do you think?
If you can replace the battery, obviously you should do so.
If you can't, disable the write cache on the controller.
Hi!
I've solved the issue.
I replaced the battery pack, but that did not solve the problem.
The real problem was related to my APC Smart-UPS.
After contacting APC support, I simply restarted the UPS using a very simple procedure that APC support emailed to me, and everything has been working fine for about three weeks now.
I hope my feedback will be useful to others with a similar issue.