VMware Cloud Community
Herb0ne
Enthusiast

ESXi 6.0 keeps randomly crashing

Hello, my ESXi server keeps randomly crashing. At first I thought it was caused by a faulty GPU that I passed through, but after swapping it the server still keeps crashing. I took a look at the vmkernel log and saw some errors with the datastore, but the log continued until suddenly a dump file was created: "/var/core/hostd-worker-zdump.001"

Unfortunately, I can't read this file with cat over SSH. Can anyone help me with this server crash issue?

Kind Regards

Herb


Accepted Solutions
Herb0ne
Enthusiast

It was the onboard USB 3 controller. After some research I bought a PCIe controller card with a Texas Instruments chip on it, and I can fully recommend a controller with a TI chip; no problems with Windows 8.1.

So no more hang-ups and things like that. Time to put that baby (server) on the network ;)

7 Replies
Linjo
Leadership

What server hardware and GPU are you using?

Did it ever work ok?

// Linjo

Herb0ne
Enthusiast

Hi Linjo, thank you for the quick reply!

Hardware:

CPU: FX830
MB: Asus 990x Evo R2
RAM: Ares 16 GB DDR3 1600MHz
GPU: Nvidia K5000 or AMD HD6670 -> crashing with both
RAID Controller: FastTrak TX2300 -> only passed through to a VM, ESXi is not installed on RAID
NIC1: Intel 82574L
NIC2: Realtek 8168 -> Driver injected

Yes, the hardware worked perfectly until it suddenly crashed. I don't think the server reboots, because I have to hit the reset button to bring it back up again.

It's more like a freeze: I can't reach the host through the vSphere Client or SSH, but the server is still running. Maybe it's sitting at a purple screen and doesn't reboot. Because I've passed my GPU through to a virtual machine, I can't see the purple screen (if there is one). That's also the reason why I'm trying to debug using the server logs. Do you have any suggestions for which other logs I should check?
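
For sifting through a log copied off the host, a small filter can help narrow things down. This is only a minimal sketch: the keyword list and the helper names `filter_log_lines` and `scan_file` are made up for illustration, and the script assumes the log has already been copied to a machine with Python installed.

```python
import re

# Hypothetical keyword list (an assumption); adjust it to what the log contains.
KEYWORDS = ("error", "warning", "failed", "timeout")


def filter_log_lines(lines, keywords=KEYWORDS):
    """Return (line_number, text) pairs for lines containing any keyword.

    Matching is case-insensitive; line numbers start at 1.
    """
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    hits = []
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_file(path, keywords=KEYWORDS):
    """Print matching lines from a log file copied off the host."""
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(path, errors="replace") as handle:
        for number, line in filter_log_lines(handle, keywords):
            print(f"{number}: {line}")
```

Calling `scan_file("vmkernel.log")` on a local copy prints only the matching lines with their line numbers, which makes it easier to see what was logged shortly before a freeze.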

I really don't think the hardware is crashing due to overheating or something like that, because then the server would just reboot and I would be able to connect to the host again through the vSphere Client.

CoolRam
Expert

I would suggest checking the VMware hardware compatibility page to confirm that all the devices you are using are supported.

cykVM
Expert

Just to rule it out, I would run a memtest for a while to see whether one of the memory modules has gone bad.

Herb0ne
Enthusiast

Thank you all for your replies!

It has now been running for a whole day without a crash. I think the problem could be an onboard USB 3 controller that I passed through to the VM. Sometimes it isn't initialized correctly (after the VM crashes), and I'm not able to use it in the VM until I do a clean reboot of the host server. So I just let the server and VMs settle with that incorrectly initialized USB controller, and see: no more crashes. At least for 24 hours now.

I will leave the system like that for as long as possible, and if there is no crash after, say, a week, I'll report back. Maybe I'll go for a PCIe USB controller card that I can pass through. Every non-onboard device I passed through worked like a charm, but every time I try to pass through onboard devices things go mad :D

Kind regards

Herb

cgb1912
Contributor

I was experiencing the same symptoms, but in my case it was unrelated to VM passthrough. I thought I'd post in case anyone else benefits from my pain (I've lost a few hours trying to work out what was going on).

In my case, I had brand-new Lenovo x3650s with ESX 6.0 installed, and testing was going great. Then, after a reboot, some of the ESX servers could not stay connected. I tracked it down to IBM/Lenovo's IMM management interface. No virtual media was mounted, but virtual media was separately 'activated' in the virtual media menu. While that was the case, the IMM was passing through a USB media device 2GB in size. ESX was tripping on that, and hostd/vpxa was crashing. Argh!

Hope this helps somebody else.
