VMware Cloud Community
AndrewAdvnetsol
Contributor

ESX Host Freezes Randomly

I have a standalone ESXi 6.5.0 Update 2 (Build 8294253) host that randomly locks up.  I know it is locked up because my running VM isn't accessible, I cannot connect to the host using the web GUI, and when I go to the console I cannot log in.  I cannot even get a response from the keyboard by pressing the Num Lock key.  My only option is to power off the physical server and then turn it on again.

I am new to dealing with VMware logs.  Is there anything I need to turn on or set up to better log what is happening?  What should I be looking for, and in which log file should I be looking?

I generated a support bundle from the last 2 times the server locked up so I can post logs if anyone would like to look at them.

I appreciate any help anyone has to offer.

Thank you.

15 Replies
daphnissov
Immortal

What is the hardware on which you are running this ESXi host?

AndrewAdvnetsol
Contributor

It is a brand X box that was lying around unused.  It has a Xeon E3-1220 V2 CPU and 16 GB of RAM.  I did put in another NIC because I thought the onboard NIC might be an issue.  Beyond that I don't know what exactly is in it.

daphnissov
Immortal

So totally unsupported hardware to begin with then.

AndrewAdvnetsol
Contributor

Sorry it took me so long to get back to you.  Yes, it is unsupported hardware, though I have 4 or 5 other servers running on unsupported hardware with no issues.

nachogonzalez
Commander

Hi AndrewAdvnetsol,
Do you have remote console access to the server to check whether there is a PSOD or to check the logs?
Do you have syslog configured?

Are you using external storage (SAN, NAS, iSCSI, etc.)?

vmkernel.log would be a good place to look for hints.

But based on what you said about the unsupported hardware, my guess is that it is causing the issues.

Looking forward to hearing from you

Regards

AndrewAdvnetsol
Contributor

Yes I have console access.  When it locks up there is no PSOD.  It is just sitting at the console screen, but I don't get any response out of the keyboard so I am unable to login.

I do have logs, but I do not know what I am looking for in the logs.

There is no external storage.

Attached are the logs.

tayfundeger
Hot Shot

This kind of ESXi host lockup is usually caused by the storage connection. In conditions such as APD or PDL, an ESXi host can lock up, and you cannot log in at the console or through the web GUI. Check the Tasks and Events section. Do you see warnings such as "Lost access to volume" there?

Also, what is the hardware brand and model? Did you use a custom ESXi ISO?

--
Blog: https://www.tayfundeger.com
Twitter: https://www.twitter.com/tayfundeger

vBlogger, vExpert, Cisco Champions

If this helped with your problem, please mark it "Helpful"; if it solved your problem, mark it "Correct Answer".
tayfundeger
Hot Shot

You are having a problem accessing the disks. This may be a driver or firmware problem, or a defective part. What is the hardware's brand and model? Did you install with a custom ESXi ISO?

nachogonzalez
Commander

Hey bud, can you please provide the timeframe in which the error occurred in that log bundle?

warm regards

AndrewAdvnetsol
Contributor

I don't know the exact time but I think somewhere right around 10:20 pm on March 21st based on failed backups.
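
One way to narrow down the time when it is not known exactly: a hard freeze usually shows up as a gap in the log timestamps, since nothing is written until the host is powered back on. Below is a minimal Python sketch of that idea, assuming the log lines start with an ISO timestamp ending in Z as in the excerpts in this thread. The function names and the 30-minute threshold are my own choices for illustration, not anything from VMware. Note that the Z means the timestamps are in UTC, so a local time such as 10:20 pm has to be converted before comparing.

```python
from datetime import datetime, timedelta

def parse_ts(line):
    """Return the leading UTC timestamp of a log line, or None if there is none."""
    stamp = line.split(" ", 1)[0]
    try:
        return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    except ValueError:
        return None

def find_gaps(lines, min_gap=timedelta(minutes=30)):
    """List (last_seen, next_seen) pairs where the log was silent for min_gap or longer."""
    gaps = []
    prev = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # skip lines without a parseable timestamp
        if prev is not None and ts - prev >= min_gap:
            gaps.append((prev, ts))
        prev = ts
    return gaps

if __name__ == "__main__":
    # Assumed file name; point this at the vmkernel.log from the support bundle.
    with open("vmkernel.log", errors="replace") as fh:
        for last_seen, next_seen in find_gaps(fh):
            print("silent from", last_seen, "until", next_seen)
```

The first timestamp of each pair is the last thing the host logged before going quiet, which is a reasonable place to start reading backwards. A quiet host can also produce long gaps on its own, so the threshold may need adjusting.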

AndrewAdvnetsol
Contributor

How did you determine I was having trouble accessing the disks?  I don't know what brand it is offhand.  All I know offhand is that it has a Xeon E3-1220 V2 CPU on the Sandy Bridge platform and 16 GB of RAM.  I did put in another NIC because I thought the onboard NIC might be an issue.  Beyond that I don't know what exactly is in it.

AndrewAdvnetsol
Contributor

Sorry I forgot to tell you that it is not a custom ISO for ESXi.  It is the standard one downloaded from the VMware website.

nachogonzalez
Commander

I see two interesting things:

1. The vmkernel.log you provided is filled with these entries:

2020-03-19T15:14:18.686Z cpu3:69553)MemSched: 14635: Admission failure in path: hostd/python.69553/uw.69553

2020-03-19T15:14:18.686Z cpu3:69553)MemSched: 14642: uw.69553 (22449) extraMin/extraFromParent: 64/64, hostd (705) childEmin/eMinLimit: 82684/82688

Please check
VMware Knowledge Base
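
When a log is flooded like this, it can help to count how often each distinct message occurs so the rarer entries are not buried. Here is a small Python sketch, assuming every line starts with a timestamp followed by a `cpuN:world)` prefix as in the excerpt above. The function name and the regular expression are illustrative assumptions of mine.

```python
import re
from collections import Counter

# Matches the leading "2020-03-19T15:14:18.686Z cpu3:69553)" part of a line.
PREFIX_RE = re.compile(r"^\S+Z cpu\d+:\d+\)")

def summarize(lines):
    """Count log messages after stripping the timestamp and cpu/world prefix."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        counts[PREFIX_RE.sub("", line)] += 1
    return counts.most_common()

if __name__ == "__main__":
    # Assumed file name; point this at the vmkernel.log from the support bundle.
    with open("vmkernel.log", errors="replace") as fh:
        for message, count in summarize(fh)[:20]:
            print(count, message)
```

Messages that embed changing values (command addresses, serial numbers) will still count as distinct, so this is only a rough first pass.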

2. There are lots of SCSI errors

2020-03-21T14:35:06.033Z cpu0:66064)ScsiDeviceIO: 2954: Cmd(0x4395008c9b80) 0x1a, CmdSN 0x75cc from world 0 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-21T14:59:09.876Z cpu0:66064)ScsiDeviceIO: 2954: Cmd(0x4395008ecf00) 0x1a, CmdSN 0x762c from world 0 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-21T15:19:10.114Z cpu2:66064)ScsiDeviceIO: 2954: Cmd(0x4395009f8b00) 0x1a, CmdSN 0x768b from world 0 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-21T15:35:38.121Z cpu3:66064)ScsiDeviceIO: 2954: Cmd(0x439500935580) 0x1a, CmdSN 0x76cb from world 0 to dev "naa.600605b006eb6730220acb13229f15cf" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-21T15:41:08.184Z cpu1:66064)ScsiDeviceIO: 2954: Cmd(0x439500954300) 0x1a, CmdSN 0x76ea from world 0 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-21T16:05:38.200Z cpu3:66064)ScsiDeviceIO: 2954: Cmd(0x439500976c00) 0x1a, CmdSN 0x31e from world 67393 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-21T16:35:08.873Z cpu1:66064)ScsiDeviceIO: 2954: Cmd(0x4395009d4d00) 0x1a, CmdSN 0x77b4 from world 0 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-21T16:59:11.345Z cpu1:66064)ScsiDeviceIO: 2954: Cmd(0x4395009d1c00) 0x1a, CmdSN 0x7814 from world 0 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-21T17:19:11.541Z cpu2:66064)ScsiDeviceIO: 2954: Cmd(0x43950098c300) 0x1a, CmdSN 0x7873 from world 0 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-21T17:35:38.481Z cpu0:66064)ScsiDeviceIO: 2954: Cmd(0x43950098f780) 0x1a, CmdSN 0x78ae from world 0 to dev "naa.600605b006eb6730220acb13229f15cf" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-21T17:41:09.790Z cpu0:66064)ScsiDeviceIO: 2954: Cmd(0x439500991000) 0x1a, CmdSN 0x78d2 from world 0 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-21T18:05:38.562Z cpu1:66064)ScsiDeviceIO: 2954: Cmd(0x439500906180) 0x1a, CmdSN 0x33e from world 67393 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-21T18:35:11.591Z cpu1:66064)ScsiDeviceIO: 2954: Cmd(0x4395009cd980) 0x1a, CmdSN 0x799c from world 0 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-21T18:59:12.665Z cpu0:66064)ScsiDeviceIO: 2954: Cmd(0x439500927580) 0x1a, CmdSN 0x79fc from world 0 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-21T19:05:38.724Z cpu1:66064)ScsiDeviceIO: 2954: Cmd(0x43950094f600) 0x4d, CmdSN 0x349 from world 67393 to dev "naa.600605b006eb6730220acb13229f15cf" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

2020-03-21T19:19:12.895Z cpu2:66064)ScsiDeviceIO: 2954: Cmd(0x439500948d00) 0x1a, CmdSN 0x7a5b from world 0 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-21T19:41:11.327Z cpu1:66064)ScsiDeviceIO: 2954: Cmd(0x43950094d680) 0x1a, CmdSN 0x7aba from world 0 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-21T20:05:38.908Z cpu0:66064)ScsiDeviceIO: 2954: Cmd(0x439500993680) 0x1a, CmdSN 0x35e from world 67393 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-21T20:35:14.279Z cpu0:66064)ScsiDeviceIO: 2954: Cmd(0x43950097e300) 0x1a, CmdSN 0x7b84 from world 0 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-21T20:35:38.984Z cpu0:66064)ScsiDeviceIO: 2954: Cmd(0x4395008d7800) 0x1a, CmdSN 0x7b8f from world 0 to dev "naa.600605b006eb6730220acb13229f15cf" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-21T20:59:14.628Z cpu1:66064)ScsiDeviceIO: 2954: Cmd(0x4395009d8500) 0x1a, CmdSN 0x7be4 from world 0 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-21T21:19:14.991Z cpu1:66064)ScsiDeviceIO: 2954: Cmd(0x4395008cde00) 0x1a, CmdSN 0x7c43 from world 0 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-21T21:41:12.823Z cpu1:66064)ScsiDeviceIO: 2954: Cmd(0x4395008e8c80) 0x1a, CmdSN 0x7ca2 from world 0 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-21T22:05:39.279Z cpu1:66064)ScsiDeviceIO: 2954: Cmd(0x4395009b6300) 0x1a, CmdSN 0x37e from world 67393 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-21T22:35:17.392Z cpu1:66064)ScsiDeviceIO: 2954: Cmd(0x43950099f700) 0x1a, CmdSN 0x7d6c from world 0 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-21T22:35:39.355Z cpu3:66064)ScsiDeviceIO: 2954: Cmd(0x43950091f080) 0x1a, CmdSN 0x7d72 from world 0 to dev "naa.600605b006eb6730220acb13229f15cf" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-21T22:59:16.899Z cpu1:66064)ScsiDeviceIO: 2954: Cmd(0x439500919900) 0x1a, CmdSN 0x7dcc from world 0 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-21T23:19:17.279Z cpu1:66064)ScsiDeviceIO: 2954: Cmd(0x4395008dc500) 0x1a, CmdSN 0x7e2b from world 0 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-21T23:41:14.644Z cpu2:66064)ScsiDeviceIO: 2954: Cmd(0x43950098d800) 0x1a, CmdSN 0x7e8a from world 0 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-22T00:05:39.623Z cpu3:66064)ScsiDeviceIO: 2954: Cmd(0x4395008f5400) 0x4d, CmdSN 0x399 from world 67393 to dev "naa.600605b006eb6730220acb13229f15cf" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

2020-03-22T00:05:39.633Z cpu3:66064)ScsiDeviceIO: 2954: Cmd(0x439500933d00) 0x1a, CmdSN 0x39e from world 67393 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-22T00:35:19.837Z cpu1:66064)ScsiDeviceIO: 2954: Cmd(0x4395009b1600) 0x1a, CmdSN 0x7f54 from world 0 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-22T00:59:18.861Z cpu2:66064)ScsiDeviceIO: 2954: Cmd(0x4395009b7100) 0x1a, CmdSN 0x7fb4 from world 0 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-22T01:19:19.224Z cpu0:66064)ScsiDeviceIO: 2954: Cmd(0x4395009c4300) 0x1a, CmdSN 0x8013 from world 0 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-22T01:35:39.898Z cpu1:66064)ScsiDeviceIO: 2954: Cmd(0x4395009ece00) 0x1a, CmdSN 0x8053 from world 0 to dev "naa.600605b006eb6730220acb13229f15cf" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-22T01:41:16.556Z cpu0:66064)ScsiDeviceIO: 2954: Cmd(0x4395009f1e80) 0x1a, CmdSN 0x8072 from world 0 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2020-03-22T02:05:39.995Z cpu0:66064)ScsiDeviceIO: 2954: Cmd(0x4395008b9500) 0x1a, CmdSN 0x3be from world 67393 to dev "naa.600605b006eb6730220acb13229ebeae" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 

This might indicate that you are having a driver issue. Please update to the latest driver and report back.
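
For anyone new to reading these lines, here is a minimal Python sketch that decodes the fields, assuming the line layout shown in the entries above. The lookup tables cover only the values that appear in this thread, with names taken from the standard SCSI code tables; the function name and the regular expression are my own illustration.

```python
import re

# Only the values seen in this thread; anything else is shown as raw hex.
OPCODES = {0x1A: "MODE SENSE(6)", 0x4D: "LOG SENSE"}
DEVICE_STATUS = {0x0: "GOOD", 0x2: "CHECK CONDITION"}
SENSE_KEYS = {0x5: "ILLEGAL REQUEST"}
ADDITIONAL_SENSE = {
    (0x24, 0x00): "INVALID FIELD IN CDB",
    (0x20, 0x00): "INVALID COMMAND OPERATION CODE",
}

LINE_RE = re.compile(
    r'Cmd\(0x[0-9a-f]+\) (?P<op>0x[0-9a-f]+), .*?'
    r'to dev "(?P<dev>[^"]+)" failed '
    r'H:(?P<h>0x[0-9a-f]+) D:(?P<d>0x[0-9a-f]+) P:(?P<p>0x[0-9a-f]+) '
    r'Valid sense data: (?P<key>0x[0-9a-f]+) (?P<asc>0x[0-9a-f]+) (?P<ascq>0x[0-9a-f]+)'
)

def decode_scsi_line(line):
    """Decode one ScsiDeviceIO failure line; return None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    op = int(m["op"], 16)
    status = int(m["d"], 16)
    key = int(m["key"], 16)
    asc = (int(m["asc"], 16), int(m["ascq"], 16))
    return {
        "device": m["dev"],
        "command": OPCODES.get(op, hex(op)),
        "host_status": int(m["h"], 16),
        "device_status": DEVICE_STATUS.get(status, hex(status)),
        "plugin_status": int(m["p"], 16),
        "sense_key": SENSE_KEYS.get(key, hex(key)),
        "additional_sense": ADDITIONAL_SENSE.get(asc, "%#x/%#x" % asc),
    }
```

Run against the entries above, the 0x1a lines decode to a MODE SENSE(6) command that the device answered with CHECK CONDITION, ILLEGAL REQUEST, INVALID FIELD IN CDB, and the 0x4d lines to a LOG SENSE command answered with INVALID COMMAND OPERATION CODE. In other words, the device is rejecting those particular queries rather than reporting a failed read or write. Whether that has anything to do with the freezes is not something these lines show on their own.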

Warm regards

AndrewAdvnetsol
Contributor

Thank you very much for finding that.  I have never updated drivers in VMware before.  Is this done on the host through the web GUI, under Manage -> Packages -> Install Updates? It looks like it is my MegaRAID driver that needs to be updated.  The 2 devices listed are my Local LSI Disks, which list a model of MR9271-8i.  When I look at Storage -> Adapters I see one that is using the lsi_mr3 driver.  I assume that is the driver I want to update.

Correct me if I am wrong on any of this.

Thank you again for all your help.

nachogonzalez
Commander

You can use this KB
VMware Knowledge Base

You will need to connect to the ESXi host via SSH and find the proper driver.
As you said, your hardware is a clone and might not be supported, so you might not find the latest drivers, or drivers for a compatible version.

If you found this helpful, please rate the answer as helpful.
Warm regards
