We're running vCenter and ESXi 7.0.3 on Dell R730 servers with dual power supplies. vCenter monitoring shows a power supply alert on one host, but the host itself does not show any errors. All lights are green, there's no error on the front LED panel, and the diagnostics come back clean. This happened after the PDUs were replaced in the datacenter racks, so I suspect that was the cause, but I would expect the hardware itself to show an error.
Has anyone else here seen a case like this? I suspect I'll need some justification to request a new PS under warranty or to get the datacenter managers to troubleshoot without alerts on the hardware.
Connect to iDRAC and check the machine logs. Since the PDUs have been replaced, if the condition persists or recurs, make sure the plugs are seated correctly in the sockets. I have seen "problems" (to use a polite euphemism) like these more than once.
Thanks. We did check the connections right after we first saw the problem. All tight, so I wonder if there could be a voltage issue that leaves enough power to light the green LEDs but nothing else. Still weird that it wouldn't show up on the server itself, though.
Someone suggested checking the PS firmware version, so I'm heading down to do that.
Well, a "loss of power input" (or the like) does not imply a hardware failure, and it may be so short-lived that by the time you travel to the site for a visual inspection, looking at the lights or front panel accomplishes nothing. That's why I told you to consult the machine logs via iDRAC. There may be firmware updates for the machine (including the PSUs) that you haven't applied yet, but whether to apply them depends on considerations I really can't comment on.
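If you'd rather check this remotely than walk to the rack, here is a minimal sketch of reading what iDRAC reports for each PSU over Redfish. It rests on several assumptions: your iDRAC firmware is recent enough to expose Redfish, the power resource sits at the path below (the usual Dell location, but confirm it on your box), and the fields `LineInputVoltage` and `Status.Health` are populated. The helper names `fetch_power` and `flag_suspect_psus` and the sample payload are made up for illustration.

```python
import base64
import json
import ssl
import urllib.request

# Assumed Dell path for the Redfish Power resource; confirm it on your iDRAC.
POWER_PATH = "/redfish/v1/Chassis/System.Embedded.1/Power"


def fetch_power(host, user, password):
    """GET the Power resource from iDRAC (hypothetical helper, not called below)."""
    req = urllib.request.Request(f"https://{host}{POWER_PATH}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    # iDRACs often use self-signed certs; skipping verification is a lab shortcut.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, context=ctx, timeout=10) as resp:
        return json.load(resp)


def flag_suspect_psus(power):
    """Return (name, reason) for each PSU that reads no input or is unhealthy."""
    suspects = []
    for psu in power.get("PowerSupplies", []):
        name = psu.get("Name", "unknown")
        health = (psu.get("Status") or {}).get("Health")
        volts = psu.get("LineInputVoltage")
        if volts is not None and volts == 0:
            suspects.append((name, "zero input voltage"))
        elif health not in (None, "OK"):
            suspects.append((name, f"health={health}"))
    return suspects


# Made-up sample shaped like a Redfish Power payload, for illustration only.
sample = {
    "PowerSupplies": [
        {"Name": "PS1 Status", "LineInputVoltage": 0,
         "Status": {"Health": "Critical"}},
        {"Name": "PS2 Status", "LineInputVoltage": 208,
         "Status": {"Health": "OK"}},
    ]
}
print(flag_suspect_psus(sample))
```

On a live system you would pass the result of `fetch_power(...)` to `flag_suspect_psus` instead of the sample dictionary. A reading like this only tells you what iDRAC sees right now; the Lifecycle Controller log is still where you find out when it started.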
Thanks. The alert has been on for over a week and comes back after resetting to green or rebooting. I've checked the console logs, but they don't show any errors for the power supply. That didn't completely surprise me since the diagnostics came back clean and the server itself doesn't show any errors.
I just checked the Lifecycle Controller logs (iDRAC), and they show an error for the power supply on July 17. No errors since then, even with reboots and ESXi updates. It's like vCenter has cached this alert and keeps referring back to it.
ETA: OK, I see now in the LC hardware inventory that PS1 is getting zero volts. I switched the PS cables and the same PS still has the error, so it's definitely the PS and not the PDU.
Getting ready to see if I can update from the LC. VLAN/Proxy makes it tricky.
Well, a "zero volt" reading is the same as a "loss of input power", which in itself does not necessarily qualify as a hardware malfunction. But since the anomalous reading stayed on the same PSU after you swapped the power cords, that PSU should be replaced without a second thought.
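The reasoning behind the cord swap can be written down as a tiny decision function. This is only a sketch of the logic discussed in this thread, assuming the two cords were fully exchanged between the PSUs so that each PSU is now fed from the other's outlet; the function name `localize_fault` and its return strings are invented for illustration.

```python
def localize_fault(bad_psu_before, bad_psu_after):
    """Decide where the fault sits after swapping the two power cords.

    Each argument is the PSU slot reporting zero volts ("PS1" or "PS2"),
    or None if no slot reports a fault.
    """
    if bad_psu_before is None:
        return "no fault to localize"
    if bad_psu_after is None:
        return "cleared: suspect a loose cord or a transient"
    if bad_psu_after == bad_psu_before:
        return "psu"   # fault stayed with the slot: the PSU itself
    return "feed"      # fault followed the cord: cable, outlet, or PDU


print(localize_fault("PS1", "PS1"))  # the case in this thread, prints "psu"
```

If the zero-volt reading had moved to PS2 after the swap, the same function would return "feed", and that would be the evidence to take to the datacenter managers rather than to Dell.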
The important thing is that you have accurately identified the problem; everything else is philosophy.