VMware Cloud Community
gdmict
Contributor

ESXi server shuts down at random

Dear all,

Our ESXi server (4.0) is shutting down at random. It worked fine for about 2 years, but recently it has begun shutting down on its own: sometimes several times a day, sometimes several times a week.

We are working on this issue with HP support. We upgraded the BIOS to HP System BIOS P56 2011-05-02. After the upgrade, the server ran for about 3-4 days without a restart, but yesterday it restarted several times again.

We also moved the majority of our VMs off this server, though 4 servers are still running on it.

Some system info:

VMWare ESX 4.0.0 build-181792 2009-07-30

VMWare, Inc. VMware ESXi Alternate Boot Bank 4.0.0-0.6.181792

bnx2 driver 1.6.9

HP Smart Array controller 4.12

bnx2 device firmware 1.9.6

=====

While writing this post, I noticed something strange in the VMware vSphere console for the ESXi host. Under Configuration -> Health Status -> Power, I see the following entries:

System Board 1 Power Meter - Device enabled,Normal,240 Watts

Power Supply 1: Running/Full Power-Enabled,Normal,0 Watts

Power Supply 2: Running/Full Power-Enabled,Normal,0 Watts

Power Supply 3 Power Supplies - Fully redundant,Normal,

Power Supply 2 Power Supply 2: Failure detected - Deassert, Normal

Power Supply 1 Power Supply 1: Failure detected - Deassert, Normal

Every status is displayed as 'Normal', although I see a 'Failure detected' message there. When I access the server over the ILO2 interface, the health status says:

Fans: Ok; Fully Redundant

Temperatures: Ok

VRMs: Ok

Power Supplies: Ok; Fully Redundant

In the ILO2 log I see the following entries:

Server power restored.

Power-On signal sent to host server by: Administrator.

Browser login: Administrator - 10.50.12.109(DNS name not found).

Browser logout: Administrator - 10.50.12.109(DNS name not found).

Server power removed.

Server reset.

Server power restored.

Power-On signal sent to host server by: Administrator.

Browser login: Administrator - 10.50.12.109(DNS name not found).

Any thoughts on what could be wrong? Do you think it may be the power supply? If so, how can I gather more details about the power, so that I can be sure?

Your help is highly appreciated. Thank you in advance for your time,

Juraj

1 Reply
gdmict
Contributor

The server rebooted again. Any idea what the log below means?

Power Supply 2 Power Supply 2: Failure detected - Deassert, Normal

Power Supply 1 Power Supply 1: Failure detected - Deassert, Normal
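
For anyone comparing against the same output, here is a minimal Python sketch of how the Health Status lines quoted in this thread could be parsed and screened. It assumes the comma-separated layout shown above (description, status, optional reading); the function names are hypothetical and not part of any VMware or HP tool. As far as I understand IPMI-style event wording, 'Deassert' means the named condition is currently not asserted, which would be consistent with the 'Normal' status, but that is worth confirming with HP support.

```python
# Health Status lines copied from the vSphere console output in this thread.
HEALTH_LINES = [
    "System Board 1 Power Meter - Device enabled,Normal,240 Watts",
    "Power Supply 1: Running/Full Power-Enabled,Normal,0 Watts",
    "Power Supply 2: Running/Full Power-Enabled,Normal,0 Watts",
    "Power Supply 3 Power Supplies - Fully redundant,Normal,",
    "Power Supply 2 Power Supply 2: Failure detected - Deassert, Normal",
    "Power Supply 1 Power Supply 1: Failure detected - Deassert, Normal",
]


def parse_sensor_line(line):
    """Split one line into description, status, and optional reading.

    Assumption: fields are comma-separated, as in the output above.
    """
    parts = [p.strip() for p in line.split(",")]
    return {
        "description": parts[0],
        "status": parts[1] if len(parts) > 1 else "",
        "reading": parts[2] if len(parts) > 2 else "",
    }


def failure_deasserted(entry):
    """True if the description names a failure condition marked 'Deassert'."""
    desc = entry["description"].lower()
    return "failure detected" in desc and desc.endswith("deassert")


def watts(entry):
    """Return the reading in watts as an int, or None if there is no such reading."""
    fields = entry["reading"].split()
    if len(fields) == 2 and fields[1].lower() == "watts" and fields[0].isdigit():
        return int(fields[0])
    return None


entries = [parse_sensor_line(line) for line in HEALTH_LINES]

for entry in entries:
    notes = []
    if failure_deasserted(entry):
        notes.append("failure condition listed as deasserted")
    if watts(entry) == 0:
        notes.append("reports 0 Watts")
    if notes:
        print(entry["description"], "->", "; ".join(notes))
```

On the lines from this thread, the sketch lists both power supplies twice: once for the deasserted 'Failure detected' entry and once for the 0 Watts reading, while the board power meter reads 240 Watts. The second point may be the more useful detail to raise with support.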
