sc_2111
Enthusiast

ESX Crash

An ESX 3.0.1 server crashed. Where can I find information about what happened (logs, dump) so I can try to understand what the cause was?

thanks

9 Replies
frankdenneman
Expert

ESX places the log files in the /var/log directory.

Try looking at:

/var/log/vmkernel

/var/log/vmkwarning

Search the forum for the error codes found in those logs.

To read the logs, open an SSH session to the ESX server with a tool such as PuTTY.
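
As a rough way to sift the two logs named above, the following Python sketch prints the lines that contain crash-related keywords. The keyword list and the `scan_log` helper are assumptions for illustration, not a VMware tool, and the actual wording in the logs may differ. It is written for Python 3, so you may need to run it against a copy of the logs on another machine rather than on the host itself.

```python
import os

# Hypothetical keyword list; adjust it to whatever your logs actually contain.
KEYWORDS = ("exception", "panic")

# The two log files named in this reply.
LOG_FILES = ("/var/log/vmkernel", "/var/log/vmkwarning")


def scan_log(path, keywords=KEYWORDS):
    """Return (line_number, line) pairs whose text contains any keyword."""
    hits = []
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(path, "r", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            lowered = line.lower()
            if any(word in lowered for word in keywords):
                hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    for log_path in LOG_FILES:
        if not os.path.exists(log_path):
            print("%s: not found" % log_path)
            continue
        for number, line in scan_log(log_path):
            print("%s:%d: %s" % (log_path, number, line))
```

The match is a plain case-insensitive substring test, so it will also catch unrelated lines that happen to contain a keyword; treat the output as a starting point for reading the surrounding log entries.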

Blogging: frankdenneman.nl Twitter: @frankdenneman Co-author: vSphere 4.1 HA and DRS technical Deepdive, vSphere 5x Clustering Deepdive series
sc_2111
Enthusiast

It happened again, and the following error was on the screen:

exception type 14 in world 1031

CWedge
Enthusiast

I've had many ESX 3.0.1 crashes, and each time it has been hardware related. If the server crashes continually, there is a 99.9% chance of a hardware failure, usually memory.

In my cases it was system boards and CPUs.

I had HP DL580 G3s with dual-core CPUs.

One of those times, none of the HP diagnostic software gave any indication that anything had failed.

My only saving grace with HP was that I have 11 IDENTICAL servers in a farm, all set up the same way, so I said, "Why is this one crashing?"

They wanted me to upgrade the BIOS and firmware, and I kept telling them, "NO, the other 10 at the same BIOS level are FINE and have been for 1 year."

It ended up being one of the cores on the processor, but the fault only showed up when something like ESX accessed the fourth core. It took them about two weeks, and replacing every part in the system, before they figured that out.

Depending on the system, the VMware crash analyst team can tell you which piece of hardware failed.

CWedge
Enthusiast

> It happened again and on the screen there was the following error
>
> exception type 14 in world 1031

I just looked at my old PSOD (pink screen of death) snapshot.

I had Exception Type 14 in World 1211, and it was a hardware fault.

I'm almost 100% certain you have a hardware fault. The last one I had was a system board/E1000 problem. One killed the other, and I don't know which killed which, but I had to replace them both.
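
For context on the number in that message: assuming the "exception type" on the pink screen is the x86 CPU exception vector, 14 is the page-fault vector. A page fault in the kernel does not by itself say whether hardware or software is to blame, so the number is only a starting point. The lookup below is a small Python sketch with a partial table of the architectural x86 vectors. The `describe_exception` helper and the message pattern are hypothetical, based only on the wording quoted in this thread.

```python
import re

# Partial table of architecturally defined x86 exception vectors.
X86_EXCEPTIONS = {
    0: "Divide Error (#DE)",
    6: "Invalid Opcode (#UD)",
    8: "Double Fault (#DF)",
    12: "Stack-Segment Fault (#SS)",
    13: "General Protection (#GP)",
    14: "Page Fault (#PF)",
    18: "Machine Check (#MC)",
}

# Pattern assumed from the message wording quoted in this thread.
PSOD_PATTERN = re.compile(r"exception type (\d+) in world (\d+)", re.IGNORECASE)


def describe_exception(message):
    """Return (vector, world, name) parsed from a PSOD line, or None if no match."""
    match = PSOD_PATTERN.search(message)
    if match is None:
        return None
    vector = int(match.group(1))
    world = int(match.group(2))
    name = X86_EXCEPTIONS.get(vector, "not in this partial table")
    return vector, world, name


if __name__ == "__main__":
    print(describe_exception("Exception Type 14 in World 1211"))
```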

trinc4me
Contributor

Are you running the HP Insight Manager agent for ESX? We have HP DL585s with four dual-core CPUs. I was just wondering whether you were running the agent and, if so, whether it logged anything.

rubensluque
Enthusiast

Update the BIOS and firmware of the server and of all other devices, such as HBAs, SCSI controllers, and network cards, to the latest versions.

CWedge
Enthusiast

> Update the BIOS and firmware of the server and of all other devices, such as HBAs, SCSI controllers, and network cards, to the latest versions.

You sound like the HP guy.

Don't do this. I'm going through a lot of pain because of this now.

One server is upgraded, and it is NOT COMPATIBLE with the others.

I can't VMotion anymore: the new HP BIOS has Intel Virtualization Technology and the old one doesn't.

So now I'm stuck HIDING the NX flag for 200+ servers.

SYS ADMIN 101 - IT IS NOT A GOOD IDEA TO UPGRADE BIOS/FIRMWARE WITHOUT TESTING.
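
One way to test before rolling an upgrade out is to compare the CPU feature flags that an upgraded host exposes against those of a host still on the old BIOS. This is a minimal Python sketch, assuming you have saved Linux-style `/proc/cpuinfo` text from each host; how you collect it is up to you. `cpu_flags` and `flag_differences` are hypothetical helper names, and the sample flag lists are made up rather than captured from real servers.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def flag_differences(old_text, new_text):
    """Return (only_in_old, only_in_new) as sorted lists of flag names."""
    old_flags = cpu_flags(old_text)
    new_flags = cpu_flags(new_text)
    return sorted(old_flags - new_flags), sorted(new_flags - old_flags)


if __name__ == "__main__":
    # Made-up sample text, not output captured from a real host.
    old_host = "processor : 0\nflags : fpu vme pae sse sse2\n"
    new_host = "processor : 0\nflags : fpu vme pae sse sse2 nx vmx\n"
    print(flag_differences(old_host, new_host))
```

Any flag that appears on only one side is a candidate for the kind of mismatch that blocks VMotion between the two hosts.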

grasshopper
Virtuoso

Review the vmware.log of the VM that caused the fault. There may be more verbose information there. There may also be a crash dump (of the VM, not the vmkernel) in the config file folder of the offending VM.

The virtual machine dump can be provided to VMware or reviewed with something such as Microsoft Visual Studio.

Message was edited by: grasshopper

Ooops... meant to reply to the OP

CWedge
Enthusiast

> Review the vmware.log of the VM that caused the fault. There may be more verbose information there. Also, there may be a crash dump (of the VM, not the vmkernel) in the config file folder of the offending VM.
>
> The virtual machine dump can be provided to VMware or reviewed with something such as Microsoft Visual Studio.

There is a command to review the crash dump from within ESX.

I'll see if I can dig it up.
