I built a whitebox that consists of:
i7 920
MSI X58 Platinum
2xIntel NICs
1 Hard drive
And the setup was working fine for 2 days with 2 active VMs running. Then I created a CentOS 5 VM and started cloning VMs using FOG (fogproject.org). When I had about 5 VMs running, I got ESXi's version of a blue screen of death (the purple diagnostic screen).
I didn't type up the whole thing, but I believe I got the important lines:
#PF Exception(14) in world 16827:vmm0:Windows ip 0x418015522084 addr 0xffffffffffffffc6
0xfffffffffc083c98:[0xfffffffffc21aab8]Unknown stack: 0x0
VMK checksum BAD: 0xa812399487df2a8f 0x14cf5f51412fcc6f
FSbase (0x0) GSbase (0x0) kernelGSbase (0x7fffffac000)
initialization for tpm_tis failed with -19.
Starting coredump to disk
using slot 1 of 1... 98766666666543210 DiskDump Successful.
Unfortunately I have no performance graphs after the reboot. I would like to know if it was a hard drive I/O problem (pretty much all the VMs were idle or booting up, except for a FOG backup).
After the first crash, I noticed from the error message that I didn't have execute bit protection (Execute Disable) turned on, so I enabled it in the BIOS.
The second crash happened in much the same way.
Before the 3rd crash, I had 6 VMs running idle successfully until it crashed 1/2 hours later. The VMs were CentOS 5 64-bit and Windows Server 2003 64-bit, if that makes any difference.
I tried rebooting last night and it didn't appear to boot up properly; it was late, so I gave up on it.
Any thoughts on what caused this? Is it ESXi 4 (being new), I/O (one hard disk), or something else?
Also, if it makes any difference, I built a second machine that has been running 3 VMs for 2 weeks now without a problem. The differences between these boxes are:
2 hard drives -- some VMs are on one drive, and some on the other.
Memory is set to the default 1066 MHz frequency, while the problem box is set to 1600 MHz.
Thanks,
Tim
What's the hard drive in your box? Is it officially supported?
I believe the sata controller on the motherboard is supported or compatible.
It actually wasn't broken; it did boot into the ESXi hypervisor when I got home.
Since then I've only had 3 VMs running at one time, and I haven't had any crashes. The rest of the system's specs should support more than that; it's just the one hard drive in the system that I think is the bottleneck.
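One way to test the single-drive-bottleneck theory before the next crash is to capture esxtop in batch mode during load (e.g. `esxtop -b -n 60 > stats.csv` from the console) and look for sustained high device latency. A toy sketch of the kind of post-processing involved, using made-up column names and sample data (real esxtop batch headers are much longer than this):

```python
import csv
import io

# Hypothetical excerpt of esxtop batch (CSV) output -- column names
# and values here are invented for illustration only.
SAMPLE = """\
time,disk_reads_per_sec,disk_writes_per_sec,davg_ms
00:00,120,340,8.2
00:01,115,900,41.7
00:02,130,880,55.3
"""

def high_latency_samples(csv_text, threshold_ms=25.0):
    """Return rows where average device latency (DAVG) exceeds threshold.

    Sustained DAVG above roughly 25 ms usually means the disk is saturated.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if float(row["davg_ms"]) > threshold_ms]

slow = high_latency_samples(SAMPLE)
print(len(slow))  # → 2 saturated intervals in the sample data
```

If most intervals during a FOG clone show latency like that, the single disk really is the choke point; if the numbers stay low right up to the crash, the PSOD probably isn't an I/O problem.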
You might want to run Memtest86 (http://www.memtest86.com) on your box for six complete run-throughs of all your memory or overnight, whichever is longer. You can download the ISO, burn it to a CD, boot your box off the CD and run Memtest86 to test your system memory.
Datto
Have you overcommitted resources? How much RAM do you have?
Start fewer VMs. Start one and see what happens. Make everything as simple as possible and test for a period of time. Starting all the machines and hitting the same crash over and over isn't a way to isolate the problem.
I'm having a similar problem. Post is here:
http://communities.vmware.com/thread/215646
I'm not sure what my problem is yet for certain, but my current suspicion is that it crashes whenever ESX itself attempts to access virtual memory.
Anyway, I'm not here to hijack the thread so much as I suspect we have the same problem. Happy hunting!
You NEED to check whether your equipment is on the HCL. Just wanting it to run isn't enough. ESXi, and especially version 4, is very dependent on the hardware and makes great demands of it. It is a server operating system. Real servers usually come as a package, with all components well tested and certified to run the VMware product.
As was posted earlier, check whether your exact hardware is listed and whether there are any workarounds for marginally supported parts.
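In practice the HCL cross-check boils down to a set difference: gather your PCI vendor:device IDs (e.g. with `lspci -n` from a Linux live CD) and compare them against the IDs the HCL lists for your components. A toy sketch of that comparison -- every ID and name below is invented, not real HCL data:

```python
# Hypothetical set of vendor:device IDs found on the HCL.
hcl_supported = {
    "8086:10c9",   # made-up Intel NIC entry
    "8086:3a22",   # made-up ICH10 SATA controller entry
}

# Hypothetical devices reported by your box.
my_devices = {
    "8086:10c9": "Intel NIC",
    "8086:2e20": "onboard graphics",
}

unsupported = {dev: name for dev, name in my_devices.items()
               if dev not in hcl_supported}

for dev, name in sorted(unsupported.items()):
    print(f"NOT on HCL: {name} ({dev})")
```

Anything that comes out "NOT on HCL" is where to focus: either find a community-tested workaround for it or swap the part.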