An ESX 3.0.1 server crashed. Where can I look for information about what happened (logs, dumps) so I can try to work out the cause?
Thanks
ESX places its log files in the /var/log directory.
Try looking at:
/var/log/vmkernel
/var/log/vmkwarning
Search the forum for the error codes you find in those logs.
To read them, open an SSH session to the ESX server with a tool such as PuTTY.
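If you copy those two logs off the host (for example with an SCP client), a short script can pull out the suspicious lines before you search the forum. This is a minimal sketch assuming Python 3 on the machine that holds the logs. The keyword pattern is a hypothetical starting point, not a list of known ESX error strings, so adjust it to the codes you are chasing.

```python
import re
from pathlib import Path

# The two logs named in the reply above (service console paths on the ESX host).
LOG_FILES = ["/var/log/vmkernel", "/var/log/vmkwarning"]

# Hypothetical search terms; replace with the error codes you are investigating.
DEFAULT_PATTERN = re.compile(r"exception|error|fail", re.IGNORECASE)


def matching_lines(lines, pattern=DEFAULT_PATTERN):
    """Yield (line_number, text) for every line that matches the pattern."""
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            yield number, line.rstrip("\n")


def scan_logs(paths=LOG_FILES, pattern=DEFAULT_PATTERN):
    """Return {path: [(line_number, text), ...]} for each log file that exists."""
    results = {}
    for path in map(Path, paths):
        if not path.is_file():
            continue  # skip logs that are missing on this machine
        with path.open(errors="replace") as handle:
            results[str(path)] = list(matching_lines(handle, pattern))
    return results


if __name__ == "__main__":
    for path, hits in scan_logs().items():
        for number, text in hits:
            print(f"{path}:{number}: {text}")
```

Pass the paths of your local copies to `scan_logs()` if they are not stored under /var/log.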
It happened again, and the screen showed the following error:
exception type 14 in world 1031
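A minimal sketch for decoding that message, assuming (this is not confirmed in the thread) that the number after "exception type" is the standard x86 exception vector. Under that assumption, 14 is a page fault (#PF). A page fault inside the vmkernel tells you where the crash happened rather than why: faulty memory or a faulty CPU can cause one, but so can a software bug. By contrast, vector 18 (#MC, machine check) is the CPU itself reporting a hardware error. The helper names below are hypothetical.

```python
import re

# Architectural x86 exception vectors (Intel SDM Vol. 3, interrupt and exception reference).
X86_EXCEPTIONS = {
    0: "#DE Divide Error",
    1: "#DB Debug Exception",
    2: "NMI Interrupt",
    3: "#BP Breakpoint",
    4: "#OF Overflow",
    5: "#BR BOUND Range Exceeded",
    6: "#UD Invalid Opcode",
    7: "#NM Device Not Available",
    8: "#DF Double Fault",
    9: "Coprocessor Segment Overrun",
    10: "#TS Invalid TSS",
    11: "#NP Segment Not Present",
    12: "#SS Stack-Segment Fault",
    13: "#GP General Protection",
    14: "#PF Page Fault",
    16: "#MF x87 FPU Floating-Point Error",
    17: "#AC Alignment Check",
    18: "#MC Machine Check",
    19: "#XM SIMD Floating-Point Exception",
}

# Matches the wording quoted in this thread, e.g. "exception type 14 in world 1031".
PSOD_PATTERN = re.compile(r"exception type (\d+) in world (\d+)", re.IGNORECASE)


def parse_psod_line(text):
    """Return (vector, world_id) from a PSOD line, or None if it doesn't match."""
    match = PSOD_PATTERN.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def describe_exception(vector):
    """Return the x86 name for an exception vector, or a fallback description."""
    return X86_EXCEPTIONS.get(vector, f"vector {vector}: reserved or not an architectural exception")


print(describe_exception(14))  # -> #PF Page Fault
```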
I've had many ESX 3.0.1 crashes, and each time it has been hardware related. If it keeps crashing, there's a 99.9% chance of a hardware failure, usually memory.
In my cases it was system boards and CPUs.
I had HP DL580 G3s with dual-core CPUs.
One of the times, none of the HP diagnostic software indicated that anything had failed.
My only saving grace with HP was that I have 11 IDENTICAL servers in a farm, all set up the same way. So I said, "Why is this one crashing?"
They wanted me to upgrade the BIOS firmware, and I kept telling them, "NO, the other 10 at the same BIOS level are FINE and have been for 1 YEAR."
It ended up being one of the cores on the processor, but the fault only showed up when something like ESX accessed the 4th core. It took them about 2 weeks, and replacing every part in the system, before they figured that out.
Depending on the system, the VMware crash analyst team can tell you which piece of hardware failed.
I just looked at my OLD PSOD (purple screen of death) snapshot.
I had Exception Type 14 in World 1211, and it was a hardware fault.
I'm almost 100% certain you have a hardware fault. The last one I had was a system board/E1000 problem: one of them killed the other, though I don't know which killed which, and I had to replace them both.
Are you running the HP Insight Manager agent for ESX? We have HP DL585s with four dual-core CPUs. I was just wondering whether you were running the agent and, if so, whether it logged anything.
Update the BIOS and firmware of the server and of all other devices, such as HBAs, SCSI controllers, and network cards, to the latest versions.
You sound like the HP guy.
Don't do this... I'm going through a lot of pain because of it right now.
One server was upgraded, and it is NOT COMPATIBLE with the others.
I can't VMotion anymore: the new HP BIOS has Intel Virtualization Technology and the old one doesn't.
So now I'm stuck HIDING the NX flag for 200+ servers.
SYS ADMIN 101: NOT A GOOD IDEA TO UPGRADE BIOS/FIRMWARE WITHOUT TESTING.
Review the vmware.log of the VM that caused the fault; there may be more verbose information there. There may also be a crash dump (of the VM, not the vmkernel) in the offending VM's configuration folder.
The virtual machine dump can be sent to VMware or reviewed with a tool such as Microsoft Visual Studio.
Oops, meant to reply to the OP.
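Both files described above live in the directory that holds the VM's .vmx configuration file. A minimal sketch for listing them, assuming Python 3 and access to (or a copy of) that folder: logs, including rotated copies, are matched with the `vmware*.log` pattern, and dumps are only guessed from the hypothetical name fragments "core" and "dump", since the exact dump file name isn't given here.

```python
from fnmatch import fnmatch
from pathlib import Path

# Hypothetical fragments for spotting dump files; check what your VM folder actually contains.
DUMP_HINTS = ("core", "dump")


def is_vm_log(name):
    """True for vmware.log and rotated copies such as vmware-1.log."""
    return fnmatch(name, "vmware*.log")


def looks_like_dump(name):
    """Heuristic only: True if the file name contains one of DUMP_HINTS."""
    lowered = name.lower()
    return any(hint in lowered for hint in DUMP_HINTS)


def vm_diagnostic_files(vm_dir):
    """Return (logs, possible_dumps) as sorted lists of file names in vm_dir."""
    files = sorted(p.name for p in Path(vm_dir).iterdir() if p.is_file())
    logs = [name for name in files if is_vm_log(name)]
    dumps = [name for name in files if looks_like_dump(name)]
    return logs, dumps


def tail(path, count=50):
    """Return the last `count` lines of a text file, where the fault is usually logged."""
    with open(path, errors="replace") as handle:
        return handle.read().splitlines()[-count:]
```

Calling `vm_diagnostic_files()` on the VM's folder returns the two name lists; running `tail()` on vmware.log then shows the entries written just before the fault.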
There is a command for reviewing the crash dump within ESX.
I'll see if I can dig it up.