VMware Cloud Community
mediatr
Contributor

ESXi and all VMs stop responding

Hi,

My ESXi 5.5 host repeatedly and suddenly stops responding, and none of the VMs can be pinged.

I can't find a workaround for this. Please help.

This is what I found in my hostd.log:

2018-07-18T21:25:00.022Z [6C2E1B70 verbose 'Statssvc.vim.PerformanceManager'] HostCtl Exception in stats collection: Sysinfo error on operation returned status : Not initialized. Please see the VMkernel log for detailed error information

2018-07-18T21:25:00.022Z [6C2E1B70 verbose 'Statssvc.vim.PerformanceManager'] HostCtl Exception in stats collection.  Turn on 'trivia' log for details

2018-07-18T21:25:01.283Z [FFBCFB70 verbose 'Hostsvc.ResourcePool ha-root-pool'] Root pool capacity changed from 17500MHz/60725MB to 17500MHz/60734MB

2018-07-18T21:25:02.000Z [6C2E1B70 verbose 'SoapAdapter'] Responded to service state request

2018-07-18T21:25:08.283Z [6C8C1B70 info 'Solo.Vmomi' opID=hostd-d353 user=root] Activation [N5Vmomi10ActivationE:0x6c752a88] : Invoke done [waitForUpdates] on [vmodl.query.PropertyCollector:ha-property-collector]

2018-07-18T21:25:08.283Z [6C8C1B70 verbose 'Solo.Vmomi' opID=hostd-d353 user=root] Arg version:

--> "297"

2018-07-18T21:25:08.283Z [6C8C1B70 info 'Solo.Vmomi' opID=hostd-d353 user=root] Throw vmodl.fault.RequestCanceled

2018-07-18T21:25:08.283Z [6C8C1B70 info 'Solo.Vmomi' opID=hostd-d353 user=root] Result:

--> (vmodl.fault.RequestCanceled) {

-->    dynamicType = <unset>,

-->    faultCause = (vmodl.MethodFault) null,

-->    msg = "",

--> }

2018-07-18T21:25:08.284Z [FFBCFB70 error 'SoapAdapter.HTTPService.HttpConnection'] Failed to read header on stream <io_obj p:0x6c706508, h:84, <TCP '0.0.0.0:0'>, <TCP '0.0.0.0:0'>>: N7Vmacore15SystemExceptionE(Connection reset by peer)

2018-07-18T21:25:14.561Z [6CDC2B70 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root

2018-07-18T21:25:20.021Z [6C2E1B70 verbose 'Statssvc.vim.PerformanceManager'] HostCtl Exception in stats collection: Sysinfo error on operation returned status : Not initialized. Please see the VMkernel log for detailed error information

2018-07-18T21:25:20.021Z [6C2E1B70 verbose 'Statssvc.vim.PerformanceManager'] HostCtl Exception in stats collection.  Turn on 'trivia' log for details

2018-07-18T21:25:40.021Z [6C2E1B70 verbose 'Statssvc.vim.PerformanceManager'] HostCtl Exception in stats collection: Sysinfo error on operation returned status : Not initialized. Please see the VMkernel log for detailed error information

2018-07-18T21:25:40.021Z [6C2E1B70 verbose 'Statssvc.vim.PerformanceManager'] HostCtl Exception in stats collection.  Turn on 'trivia' log for details

2018-07-18T21:26:00.021Z [6C2E1B70 verbose 'Statssvc.vim.PerformanceManager'] HostCtl Exception in stats collection: Sysinfo error on operation returned status : Not initialized. Please see the VMkernel log for detailed error information

2018-07-18T21:26:00.021Z [6C2E1B70 verbose 'Statssvc.vim.PerformanceManager'] HostCtl Exception in stats collection.  Turn on 'trivia' log for details

2018-07-18T21:26:01.284Z [6C8C1B70 verbose 'Hostsvc.ResourcePool ha-root-pool'] Root pool capacity changed from 17500MHz/60734MB to 17500MHz/60733MB

2018-07-18T21:26:20.022Z [6C860B70 verbose 'Statssvc.vim.PerformanceManager'] HostCtl Exception in stats collection: Sysinfo error on operation returned status : Not initialized. Please see the VMkernel log for detailed error information

2018-07-18T21:26:20.022Z [6C860B70 verbose 'Statssvc.vim.PerformanceManager'] HostCtl Exception in stats collection.  Turn on 'trivia' log for details

27 Replies
Devi94
Hot Shot

These are warning messages and are not related to your issue. Can you share the vmkernel log from the time the host went down?
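One way to separate this kind of noise from real errors is to group the hostd.log entries by level and component. The following is a minimal Python sketch, assuming the line layout seen in the excerpt above; the function name `summarize` and the `SAMPLE` list are made up for illustration and are not part of any VMware tool.

```python
import re
from collections import Counter

# Header layout assumed from the excerpt in this thread:
#   <timestamp> [<thread id> <level> '<component>' <optional fields>] <message>
HOSTD_LINE = re.compile(
    r"^(?P<ts>\S+) \[(?P<thread>[0-9A-F]+) (?P<level>\w+) "
    r"'(?P<component>[^']+)'[^\]]*\] (?P<msg>.*)$"
)

# Levels treated as routine noise in this sketch (an assumption, adjust as needed).
ROUTINE_LEVELS = {"trivia", "verbose", "info"}


def summarize(lines):
    """Count entries per (level, component) and collect the non-routine ones."""
    counts = Counter()
    notable = []
    for line in lines:
        m = HOSTD_LINE.match(line.strip())
        if m is None:
            continue  # continuation lines such as '--> ...' have no header
        counts[(m["level"], m["component"])] += 1
        if m["level"] not in ROUTINE_LEVELS:
            notable.append((m["ts"], m["level"], m["msg"]))
    return counts, notable


# A few lines copied from the hostd.log excerpt in this thread.
SAMPLE = [
    "2018-07-18T21:25:00.022Z [6C2E1B70 verbose 'Statssvc.vim.PerformanceManager'] HostCtl Exception in stats collection: Sysinfo error on operation returned status : Not initialized. Please see the VMkernel log for detailed error information",
    "2018-07-18T21:25:00.022Z [6C2E1B70 verbose 'Statssvc.vim.PerformanceManager'] HostCtl Exception in stats collection.  Turn on 'trivia' log for details",
    "2018-07-18T21:25:08.283Z [6C8C1B70 info 'Solo.Vmomi' opID=hostd-d353 user=root] Throw vmodl.fault.RequestCanceled",
    '--> "297"',
    "2018-07-18T21:25:08.284Z [FFBCFB70 error 'SoapAdapter.HTTPService.HttpConnection'] Failed to read header on stream <io_obj p:0x6c706508, h:84, <TCP '0.0.0.0:0'>, <TCP '0.0.0.0:0'>>: N7Vmacore15SystemExceptionE(Connection reset by peer)",
]

if __name__ == "__main__":
    counts, notable = summarize(SAMPLE)
    for (level, component), n in counts.most_common():
        print(f"{n:4d}  {level:8s} {component}")
    for ts, level, msg in notable:
        print(f"{ts} {level}: {msg}")
```

On the sample, the only entry above info level is the SoapAdapter "Connection reset by peer" line; the repeated stats-collection messages are all logged at verbose level.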

mediatr
Contributor

Thank you for your reply.

This is what I have in vmkernel.log:

2018-07-18T18:19:45.260Z cpu5:36204)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x85 (0x412e84906980, 34471) to dev "naa.600508e000000000d50a48286be63004" on path "vmhba1:C1:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2018-07-18T18:19:45.260Z cpu5:36204)ScsiDeviceIO: 2337: Cmd(0x412e84906980) 0x85, CmdSN 0x3f0 from world 34471 to dev "naa.600508e000000000d50a48286be63004" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

2018-07-18T18:19:45.261Z cpu5:36204)ScsiDeviceIO: 2337: Cmd(0x412e84906980) 0x4d, CmdSN 0x3f1 from world 34471 to dev "naa.600508e000000000d50a48286be63004" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

2018-07-18T18:19:45.261Z cpu5:36204)ScsiDeviceIO: 2337: Cmd(0x412e84906980) 0x1a, CmdSN 0x3f2 from world 34471 to dev "naa.600508e000000000d50a48286be63004" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2018-07-18T18:33:47.727Z cpu4:33423)MigrateNet: vm 33423: 2096: Accepted connection from <::ffff:46.72.53.0>

2018-07-18T18:33:47.727Z cpu4:33423)MigrateNet: vm 33423: 2138: data socket size 0 is less than config option 562140

2018-07-18T18:33:47.727Z cpu4:33423)MigrateNet: vm 33423: 2166: dataSocket 0x4109bc137cd0 receive buffer size is 562140

2018-07-18T18:33:52.727Z cpu4:33423)Migrate: 208: Error reading from pending connection: Connection closed by remote host, possibly due to timeout

2018-07-18T18:49:45.238Z cpu9:36119)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x85 (0x412e807ba640, 34471) to dev "naa.600508e000000000d50a48286be63004" on path "vmhba1:C1:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2018-07-18T18:49:45.238Z cpu9:36119)ScsiDeviceIO: 2337: Cmd(0x412e807ba640) 0x85, CmdSN 0x3f3 from world 34471 to dev "naa.600508e000000000d50a48286be63004" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

2018-07-18T18:49:45.238Z cpu9:36119)ScsiDeviceIO: 2337: Cmd(0x412e807ba640) 0x4d, CmdSN 0x3f4 from world 34471 to dev "naa.600508e000000000d50a48286be63004" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

2018-07-18T18:49:45.238Z cpu9:36119)ScsiDeviceIO: 2337: Cmd(0x412e807ba640) 0x1a, CmdSN 0x3f5 from world 34471 to dev "naa.600508e000000000d50a48286be63004" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2018-07-18T19:19:45.214Z cpu9:36252)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x85 (0x412e808196c0, 34471) to dev "naa.600508e000000000d50a48286be63004" on path "vmhba1:C1:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2018-07-18T19:19:45.214Z cpu9:36252)ScsiDeviceIO: 2337: Cmd(0x412e808196c0) 0x85, CmdSN 0x3f6 from world 34471 to dev "naa.600508e000000000d50a48286be63004" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

2018-07-18T19:19:45.214Z cpu9:36252)ScsiDeviceIO: 2337: Cmd(0x412e808196c0) 0x4d, CmdSN 0x3f7 from world 34471 to dev "naa.600508e000000000d50a48286be63004" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

2018-07-18T19:19:45.214Z cpu9:36252)ScsiDeviceIO: 2337: Cmd(0x412e808196c0) 0x1a, CmdSN 0x3f8 from world 34471 to dev "naa.600508e000000000d50a48286be63004" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2018-07-18T19:49:45.194Z cpu3:34203)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x85 (0x412e807be100, 34471) to dev "naa.600508e000000000d50a48286be63004" on path "vmhba1:C1:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2018-07-18T19:49:45.194Z cpu3:34203)ScsiDeviceIO: 2337: Cmd(0x412e807be100) 0x85, CmdSN 0x3f9 from world 34471 to dev "naa.600508e000000000d50a48286be63004" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

2018-07-18T19:49:45.194Z cpu3:34203)ScsiDeviceIO: 2337: Cmd(0x412e807be100) 0x4d, CmdSN 0x3fa from world 34471 to dev "naa.600508e000000000d50a48286be63004" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

2018-07-18T19:49:45.194Z cpu3:34203)ScsiDeviceIO: 2337: Cmd(0x412e807be100) 0x1a, CmdSN 0x3fb from world 34471 to dev "naa.600508e000000000d50a48286be63004" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2018-07-18T20:19:45.182Z cpu7:36118)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x85 (0x412e8077c7c0, 34471) to dev "naa.600508e000000000d50a48286be63004" on path "vmhba1:C1:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2018-07-18T20:19:45.182Z cpu7:36118)ScsiDeviceIO: 2337: Cmd(0x412e8077c7c0) 0x85, CmdSN 0x3fc from world 34471 to dev "naa.600508e000000000d50a48286be63004" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

2018-07-18T20:19:45.182Z cpu7:36118)ScsiDeviceIO: 2337: Cmd(0x412e8077c7c0) 0x4d, CmdSN 0x3fd from world 34471 to dev "naa.600508e000000000d50a48286be63004" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

2018-07-18T20:19:45.182Z cpu7:36118)ScsiDeviceIO: 2337: Cmd(0x412e8077c7c0) 0x1a, CmdSN 0x3fe from world 34471 to dev "naa.600508e000000000d50a48286be63004" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2018-07-18T20:49:45.160Z cpu7:161170)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x85 (0x412e807f9080, 34471) to dev "naa.600508e000000000d50a48286be63004" on path "vmhba1:C1:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2018-07-18T20:49:45.160Z cpu7:161170)ScsiDeviceIO: 2337: Cmd(0x412e807f9080) 0x85, CmdSN 0x3ff from world 34471 to dev "naa.600508e000000000d50a48286be63004" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

2018-07-18T20:49:45.160Z cpu7:161170)ScsiDeviceIO: 2337: Cmd(0x412e807f9080) 0x4d, CmdSN 0x400 from world 34471 to dev "naa.600508e000000000d50a48286be63004" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

2018-07-18T20:49:45.160Z cpu7:161170)ScsiDeviceIO: 2337: Cmd(0x412e807f9080) 0x1a, CmdSN 0x401 from world 34471 to dev "naa.600508e000000000d50a48286be63004" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2018-07-18T21:19:45.129Z cpu3:32859)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x85 (0x412e848f0180, 34471) to dev "naa.600508e000000000d50a48286be63004" on path "vmhba1:C1:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2018-07-18T21:19:45.129Z cpu3:32859)ScsiDeviceIO: 2337: Cmd(0x412e848f0180) 0x85, CmdSN 0x402 from world 34471 to dev "naa.600508e000000000d50a48286be63004" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

2018-07-18T21:19:45.129Z cpu3:32859)ScsiDeviceIO: 2337: Cmd(0x412e848f0180) 0x4d, CmdSN 0x403 from world 34471 to dev "naa.600508e000000000d50a48286be63004" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

2018-07-18T21:19:45.129Z cpu3:32859)ScsiDeviceIO: 2337: Cmd(0x412e848f0180) 0x1a, CmdSN 0x404 from world 34471 to dev "naa.600508e000000000d50a48286be63004" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
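As a reading aid for the ScsiDeviceIO entries above, here is a minimal Python sketch that decodes the command opcode, the device status and the sense data. The lookup tables contain only the standard SCSI codes that occur in this excerpt, and the function name `decode_scsi_failure` is made up for illustration.

```python
import re

# Standard SCSI codes; only the values that appear in this thread are listed.
OPCODES = {0x85: "ATA PASS-THROUGH(16)", 0x4D: "LOG SENSE", 0x1A: "MODE SENSE(6)"}
DEVICE_STATUS = {0x0: "GOOD", 0x2: "CHECK CONDITION"}
SENSE_KEYS = {0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {
    (0x20, 0x0): "INVALID COMMAND OPERATION CODE",
    (0x24, 0x0): "INVALID FIELD IN CDB",
}

# Matches the ScsiDeviceIO line layout seen above. The NMP lines use a
# different layout and are intentionally not handled by this sketch.
SCSI_FAILURE = re.compile(
    r"Cmd\(0x[0-9a-f]+\) (?P<op>0x[0-9a-f]+),.*?"
    r'to dev "(?P<dev>[^"]+)" failed '
    r"H:(?P<h>0x[0-9a-f]+) D:(?P<d>0x[0-9a-f]+) P:(?P<p>0x[0-9a-f]+) "
    r"Valid sense data: (?P<key>0x[0-9a-f]+) (?P<asc>0x[0-9a-f]+) (?P<ascq>0x[0-9a-f]+)",
    re.IGNORECASE,
)


def decode_scsi_failure(line):
    """Return a dict describing one ScsiDeviceIO failure line, or None if it does not match."""
    m = SCSI_FAILURE.search(line)
    if m is None:
        return None
    op, h, d, key, asc, ascq = (
        int(m[name], 16) for name in ("op", "h", "d", "key", "asc", "ascq")
    )
    return {
        "device": m["dev"],
        "command": OPCODES.get(op, hex(op)),
        "host_status": h,  # 0 means the host adapter reported no error
        "device_status": DEVICE_STATUS.get(d, hex(d)),
        "sense_key": SENSE_KEYS.get(key, hex(key)),
        "additional_sense": ASC_ASCQ.get((asc, ascq), f"{asc:#x}/{ascq:#x}"),
    }


# Two lines copied from the vmkernel.log excerpt in this thread.
SAMPLE = [
    '2018-07-18T18:19:45.260Z cpu5:36204)ScsiDeviceIO: 2337: Cmd(0x412e84906980) 0x85, CmdSN 0x3f0 from world 34471 to dev "naa.600508e000000000d50a48286be63004" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.',
    '2018-07-18T18:19:45.261Z cpu5:36204)ScsiDeviceIO: 2337: Cmd(0x412e84906980) 0x1a, CmdSN 0x3f2 from world 34471 to dev "naa.600508e000000000d50a48286be63004" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.',
]

if __name__ == "__main__":
    for entry in SAMPLE:
        print(decode_scsi_failure(entry))
```

Read this way, each entry is a command that the device answered with CHECK CONDITION and the ILLEGAL REQUEST sense key, which is what a device returns when it does not accept the command as sent.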

Devi94
Hot Shot

What is the LUN naa.600508e000000000d50a48286be63004 that is throwing errors? Is it in use? I suspect you might be in an APD/PDL situation.

mediatr
Contributor

What is the LUN naa.600508e000000000d50a48286be63004 that is throwing errors? Is it in use?

Frankly, I'm not a vSphere expert, and I don't know how to answer that question.

I suspect you might be in an APD/PDL situation.

What can I do in this case?

Devi94
Hot Shot

Can you go to the datastore view and check whether any of your datastores is associated with that NAA ID?

Is your host still down?
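If the datastore view is not at hand, the same lookup can be done from the ESXi shell, where `esxcli storage vmfs extent list` prints each VMFS volume together with its backing device. The Python sketch below parses such output; the sample text is illustrative only (the datastore name, UUID and partition number are made up), and the parser assumes the usual five-column layout of that command.

```python
import re

# Illustrative output only: the volume name, UUID and partition are invented.
SAMPLE_OUTPUT = """\
Volume Name  VMFS UUID                            Extent Number  Device Name                           Partition
-----------  -----------------------------------  -------------  ------------------------------------  ---------
datastore1   5b4f2a10-1c2d3e4f-a1b2-0cc47a000001              0  naa.600508e000000000d50a48286be63004          3
"""

# One data row: volume name, VMFS UUID (8-8-4-12 hex digits), extent number,
# device name, partition number. Header and separator rows do not match.
EXTENT_ROW = re.compile(
    r"^(?P<volume>.+?)\s+"
    r"(?P<uuid>[0-9a-f]{8}-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{12})\s+"
    r"(?P<extent>\d+)\s+(?P<device>\S+)\s+(?P<partition>\d+)\s*$"
)


def datastores_on_device(extent_list_output, device_name):
    """Return (volume name, partition) pairs whose extent lives on device_name."""
    matches = []
    for line in extent_list_output.splitlines():
        row = EXTENT_ROW.match(line)
        if row and row["device"] == device_name:
            matches.append((row["volume"], int(row["partition"])))
    return matches


if __name__ == "__main__":
    naa_id = "naa.600508e000000000d50a48286be63004"  # device from the vmkernel log
    print(datastores_on_device(SAMPLE_OUTPUT, naa_id))
```

An empty result would mean that no VMFS datastore is backed by that device.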

mediatr
Contributor

Yes, I found it. This is my primary HDD.

mediatr
Contributor

No, my host is up now. I asked for a reboot and started all the VMs.

Devi94
Hot Shot

Is your host down at this point?

mediatr
Contributor

No, my host is up now. I asked for a reboot and started all the VMs.

Devi94
Hot Shot

If you are looking for the reason it went down, we need more information about your issue.

Was the host in PSOD or just hung?

Is it happening frequently?

What is your hardware?

Is this a standalone host?

mediatr
Contributor

It is a dedicated server hosted at ikoula (a hosting service provider).

Attached you will find two images.

Reboot: the most recent downtimes. As you can see, it is frequent but does not follow a regular interval; it can happen within a day or only once a month.

Serveur Physique: the general information about the server.

Please let me know if you need any other information.

Devi94
Hot Shot

What is the status of the host when you reboot it?

What is your hardware model (HP, Lenovo, Dell, Cisco)?

mediatr
Contributor

It's an Intel.

Intel® Xeon® Processor E5-1650 v2 (12M Cache, 3.50 GHz) Product Specifications

Hard drive: 2 TB S-ATA

Memory: 4 x 16 GB DDR3 / DDR4

RAID card: 2308 card Gen-3, w/o PBSRAM - R0, 1, 10

Connection: 1 Gbps

RAID: RAID 1

Processor: E5-1650v2/v3 6C HT 3.5 GHz 12 MB cache

Additional hard drives (slot 2): 2 TB S-ATA

Devi94
Hot Shot

Please share a screenshot of the summary page of your ESXi host, and also the status of the ESXi host before you rebooted it.

mediatr
Contributor

Please find attached the screenshot of the summary page of my ESXi host.

When the host crashes and goes down, I can't access it with the vSphere Client, and SSH and all the virtual machines become unreachable as well.

Devi94
Hot Shot

Your hardware is not compatible with ESXi 5.5. When you run on unsupported hardware, this kind of issue is to be expected.

mediatr
Contributor

Thank you for your quick response. Could you please advise how I can find out whether my hardware is compatible with any version of ESXi?

Devi94
Hot Shot

You can follow the VMware Compatibility Guide below.

VMware Compatibility Guide - System Search

mediatr
Contributor

Hi Devi94,

Yesterday my server went down, but I was able to connect with the vSphere Client before it became unreachable, and I found an interesting event that says:

Bootbank cannot be found at path '/bootbank'

Could this be the cause of the issue?
