VMware Cloud Community
kopper27
Hot Shot

ESXi 5.5 Not Responding

Hi guys,

A few hours ago one of my ESXi 5.5 servers showed as "not responding" in vCenter. I tried reconnecting it, but the issue persisted.

I ran services.sh restart, but that did not help either, and customers started reporting connectivity issues.

So I had to restart the server. First, no VM got restarted even though HA was enabled. Second, I checked the logs (vmkernel.log, hostd.log, vmkwarning.log) and really did not find much.

I only found the entries below, which do not say much. Any ideas, guys? The server's firmware was updated about 45 days ago.

The event occurred between about 20:30 and 21:10. I am attaching vmkernel.log.

Thanks a lot

This is from vmkwarning.log

2014-03-20T21:38:49.534Z cpu13:8252012)WARNING: UserEpoll: 542: UNSUPPORTED events 0x40

2014-03-20T21:38:50.327Z cpu13:8252012)WARNING: LinuxSocket: 1854: UNKNOWN/UNSUPPORTED socketcall op (whichCall=0x12, args@0xffa09d3c)

2014-03-20T21:46:44.152Z cpu18:8255698)WARNING: UserEpoll: 542: UNSUPPORTED events 0x40

2014-03-20T21:46:45.007Z cpu18:8255698)WARNING: LinuxSocket: 1854: UNKNOWN/UNSUPPORTED socketcall op (whichCall=0x12, args@0xff9c5d3c)

2014-03-20T21:53:56.903Z cpu16:33499)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60050768028103c76800000000000007" state in doubt; requested fast path state update...

2014-03-20T21:54:23.197Z cpu6:8257781)WARNING: Tcpip_Vmk: 794: Failed to set default gateway (51): Network unreachable

0:00:00:05.752 cpu0:32768)WARNING: PCI: 994: 0000:00:08.0 bind to PCI bus driver failed with Bad parameter.

0:00:00:05.755 cpu0:32768)WARNING: PCI: 994: 0000:00:1c.0 bind to PCI bus driver failed with Bad parameter.

0:00:00:05.756 cpu0:32768)WARNING: PCI: 994: 0000:00:1e.0 bind to PCI bus driver failed with Bad parameter.

2014-03-20T22:07:14.671Z cpu11:33396)WARNING: APEI: 247: Could not initialize HEST

2014-03-20T22:07:18.727Z cpu6:33437)WARNING: LinuxSignal: 538: ignored unexpected signal flags 0x2 (sig 17)

2014-03-20T22:07:19.312Z cpu6:33437)WARNING: VMK_PCI: 846: BAR not a valid resource for mapping

2014-03-20T22:07:19.312Z cpu6:33437)WARNING: qlnativefc: Invalid(15:0.0): region #3 not enabled status = bad0017

2014-03-20T22:07:19.312Z cpu6:33437)WARNING: qlnativefc: Invalid(15:0.0): num_rsp_queues = 1, num_req_queues = 1

2014-03-20T22:07:19.312Z cpu6:33437)WARNING: qlnativefc: Invalid(15:0.0): MSI-X: Enabled (0x2, 0x0).

2014-03-20T22:07:19.368Z cpu6:33437)WARNING: ROM firmware has finished initialization and is ready to process mailbox commands

2014-03-20T22:07:19.548Z cpu6:33437)WARNING: VMK_PCI: 846: BAR not a valid resource for mapping

2014-03-20T22:07:19.548Z cpu6:33437)WARNING: qlnativefc: Invalid(15:0.1): region #3 not enabled status = bad0017

2014-03-20T22:07:19.548Z cpu6:33437)WARNING: qlnativefc: Invalid(15:0.1): num_rsp_queues = 1, num_req_queues = 1

2014-03-20T22:07:19.548Z cpu6:33437)WARNING: qlnativefc: Invalid(15:0.1): MSI-X: Enabled (0x2, 0x0).

2014-03-20T22:07:19.839Z cpu6:33437)WARNING: VMK_PCI: 846: BAR not a valid resource for mapping

2014-03-20T22:07:19.839Z cpu6:33437)WARNING: qlnativefc: Invalid(24:0.0): region #3 not enabled status = bad0017

2014-03-20T22:07:19.840Z cpu6:33437)WARNING: qlnativefc: Invalid(24:0.0): num_rsp_queues = 1, num_req_queues = 1

2014-03-20T22:07:19.840Z cpu6:33437)WARNING: qlnativefc: Invalid(24:0.0): MSI-X: Enabled (0x2, 0x0).

2014-03-20T22:07:19.849Z cpu6:33437)WARNING: ROM firmware has finished initialization and is ready to process mailbox commands

2014-03-20T22:07:19.999Z cpu6:33437)WARNING: VMK_PCI: 846: BAR not a valid resource for mapping

2014-03-20T22:07:19.999Z cpu6:33437)WARNING: qlnativefc: Invalid(24:0.1): region #3 not enabled status = bad0017

2014-03-20T22:07:20.000Z cpu6:33437)WARNING: qlnativefc: Invalid(24:0.1): num_rsp_queues = 1, num_req_queues = 1

2014-03-20T22:07:20.000Z cpu6:33437)WARNING: qlnativefc: Invalid(24:0.1): MSI-X: Enabled (0x2, 0x0).

2014-03-20T22:07:20.383Z cpu7:33315)WARNING: Team.etherswitch: TeamES_Activate:668: Failed to initialize beaconing on portset 'pps': Not implemented.

2014-03-20T22:07:22.952Z cpu12:33484)WARNING: LinScsiLLD: scsi_add_host:573: vmkAdapter (usb-storage) sgMaxEntries rounded to 255. Reported size was 65535

2014-03-20T22:07:23.450Z cpu16:33463)WARNING: LinNet: LinNet_CreateDMAEngine:3807: vusb0, failed to get device properties with error Not supported

2014-03-20T22:07:23.450Z cpu16:33463)WARNING: LinNet: LinNet_ConnectUplink:10927: vusb0: Failed to create DMA engine with error Not supported, it maybe a pseudo device

2014-03-20T22:07:47.014Z cpu5:33315)WARNING: NMP: nmp_PspSet:539: Switching to claimrule PSP "VMW_PSP_RR" for device Unregistered Device.

2014-03-20T22:07:47.014Z cpu5:33315)WARNING: NMP: nmp_PspSet:539: Switching to claimrule PSP "VMW_PSP_RR" for device Unregistered Device.

2014-03-20T22:07:51.081Z cpu5:33315)WARNING: NMP: nmp_VsiPspManagesSet:2228: Device [naa.60050768028103c76800000000000007] is already managed by PSP "VMW_PSP_RR"

2014-03-20T22:07:51.081Z cpu5:33315)WARNING: NMP: nmp_VsiPspManagesSet:2228: Device [naa.60050768028103c76800000000000052] is already managed by PSP "VMW_PSP_RR"

2014-03-20T22:07:51.164Z cpu5:33315)WARNING: NMP: nmp_VsiPspManagesSet:2228: Device [naa.60050768028103c76800000000000007] is already managed by PSP "VMW_PSP_RR"

2014-03-20T22:07:51.164Z cpu5:33315)WARNING: NMP: nmp_VsiPspManagesSet:2228: Device [naa.60050768028103c76800000000000052] is already managed by PSP "VMW_PSP_RR"

2014-03-20T22:07:51.365Z cpu9:33597)WARNING: RDT: RDTModInit:841: Kernel is not configured for IPv6

2014-03-20T22:07:51.385Z cpu9:33597)WARNING: Created heap vsanutil (prealloc 0), maxsize 6 MB

2014-03-20T22:07:51.398Z cpu9:33597)WARNING: Created heap LSOM (prealloc 0), maxsize 1024 MB

2014-03-20T22:07:51.407Z cpu9:33597)WARNING: Created slab SSDLOG_LogBlkDescSlab (prealloc 0), 8192 entities of size 4578, total 35 MB, numheaps 1

2014-03-20T22:07:51.407Z cpu9:33597)WARNING: Created slab SSDLOG_AllocMapSlab (prealloc 0), 8192 entities of size 50, total 0 MB, numheaps 1

2014-03-20T22:07:51.407Z cpu9:33597)WARNING: Created slab SSDLOG_CBContextSlab (prealloc 0), 8192 entities of size 98, total 0 MB, numheaps 1

2014-03-20T22:07:51.723Z cpu19:33705)WARNING: Supported VMs 281, Max VSAN VMs 100, SystemMemoryInGB 48

2014-03-20T22:07:51.723Z cpu19:33705)WARNING: MaxFileHandles: 3000, Prealloc 1, Prealloc limit: 32 GB, Host scaling factor: 1

2014-03-20T22:07:51.733Z cpu19:33705)WARNING: DOM memory will be preallocated.

2014-03-20T22:07:51.733Z cpu19:33705)WARNING: Created heap DOM-MODULE (prealloc 1), maxsize 1 MB

2014-03-20T22:08:12.016Z cpu5:34650)WARNING: UserEpoll: 542: UNSUPPORTED events 0x40

2014-03-20T22:08:12.139Z cpu5:34650)WARNING: LinuxSocket: 1854: UNKNOWN/UNSUPPORTED socketcall op (whichCall=0x12, args@0xffc32d8c)

0:00:00:05.733 cpu0:32768)WARNING: PCI: 994: 0000:00:08.0 bind to PCI bus driver failed with Bad parameter.

0:00:00:05.736 cpu0:32768)WARNING: PCI: 994: 0000:00:1c.0 bind to PCI bus driver failed with Bad parameter.

0:00:00:05.737 cpu0:32768)WARNING: PCI: 994: 0000:00:1e.0 bind to PCI bus driver failed with Bad parameter.

9 Replies
0v3rc10ck3d
Enthusiast

This looks like an issue with storage or storage networking. Is it local storage or shared storage? It could also be a problem with the device ESXi is installed on: SD card? USB disk? These are the relevant log entries:

2014-03-20T21:53:56.903Z cpu16:33499)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60050768028103c76800000000000007" state in doubt; requested fast path state update...?

2014-03-20T21:54:23.197Z cpu6:8257781)WARNING: Tcpip_Vmk: 794: Failed to set default gateway (51): Network unreachable

2014-03-20T22:07:22.952Z cpu12:33484)WARNING: LinScsiLLD: scsi_add_host:573: vmkAdapter (usb-storage) sgMaxEntries rounded to 255. Reported size was 65535

2014-03-20T22:07:47.014Z cpu5:33315)WARNING: NMP: nmp_PspSet:539: Switching to claimrule PSP "VMW_PSP_RR" for device Unregistered Device.

Can you post the vobd.log file?
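For anyone triaging a similar vmkwarning.log, a rough filter can pull out just the storage-related entries like the ones quoted above. This is only a minimal Python sketch, not a VMware tool: the regex is inferred from the line format visible in this thread, and the set of components treated as storage-related (NMP, ScsiDeviceIO, LVM) is my own assumption and may need extending for your environment.

```python
import re
from collections import Counter

# Matches lines shaped like the excerpts in this thread, e.g.
#   2014-03-20T21:53:56.903Z cpu16:33499)WARNING: NMP: nmp_...:237: message
# The optional "opID=..." part appears on some lines (see the LVM entries).
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+cpu\d+:\d+(?:\s+opID=\w+)?\)"
    r"(?P<level>WARNING|ALERT):\s+(?P<component>[\w.]+):"
)

# Assumption: components worth a first look when storage is suspected.
STORAGE_COMPONENTS = {"NMP", "ScsiDeviceIO", "LVM"}


def storage_warnings(lines):
    """Return (timestamp, component, line) for storage-related entries."""
    hits = []
    for raw in lines:
        line = raw.strip()
        match = LINE_RE.match(line)
        if match and match.group("component") in STORAGE_COMPONENTS:
            hits.append((match.group("ts"), match.group("component"), line))
    return hits


def count_by_component(lines):
    """Count storage-related entries per component tag."""
    return Counter(component for _, component, _ in storage_warnings(lines))
```

Feeding it the lines of a log file (for example `count_by_component(open("vmkwarning.log"))`, where the file name is just a placeholder) gives a quick per-component tally. Entries without a `Component:` tag right after the level, such as the "Created heap ..." lines, are skipped.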

VCIX6 - NV | VCAP5 - DCA / DCD / CID | vExpert 2014,2015,2016 | http://www.vcrumbs.com - My Virtualization Blog!
kopper27
Hot Shot

Now that you mention it, I think it could be a storage issue related to where ESXi is installed, since I've seen this message

(2014-03-20T21:53:56.903Z cpu16:33499)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60050768028103c76800000000000007" state in doubt; requested fast path state update...?)

on other ESXi hosts (not in this environment), and there it did not cause the host to disconnect.

In fact, some VMs kept communicating, so I would say the SAN was not the culprit.

We opened the console on the ESXi host: it could not ping the gateway, and SSH was down.

I forgot to mention that the servers boot ESXi from USB storage.

OK, attaching the requested log.

0v3rc10ck3d
Enthusiast

I didn't see what I was looking for in the log, but it has probably already passed.

There are a bunch of dead path errors that seem to come and go, and I have a strong hunch that this is storage related. When ESXi loses its connection to storage or has trouble with it, dealing with that takes top priority: the host will use all available resources to hold onto the storage or reconnect to it. That causes all sorts of issues: lost connectivity, getting kicked out of vCenter, lockups, lag, crashes and so on.

Do you have VMware support for this host? I would open a support request; it would be easier to read through all the logs live.

Which storage array vendor do you use? And is ESXi on a flash drive?

kopper27
Hot Shot

We use an IBM V7000. And yes, this ESXi server runs from USB storage with 2 GB capacity.

Yes, we have support. But the issue passed a few hours ago, so they will see the same logs you have already seen.

continuum
Immortal

The log is full of early warnings that predicted a future crash.
The connection to two LUNs was unreliable throughout the past week. Didn't you notice errors in your backup tool's logs over the last few days?

When you noticed the disconnected host, running services.sh restart probably made it even worse.
Next time you run into a situation like this, check whether hostd is running at 100%. If it is, try to stop all VMs that run on that host with localcli and then reboot the host.

Then the question is not whether the VMs will autostart at the next boot; the question is rather whether the two affected LUNs survive the incident without serious damage.

Before you use that host in production again, make sure you have no dead LUNs and no errors in the host's network config.
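The "stop all VMs with localcli" step above could be prepared roughly as follows. This is only a hedged sketch: it assumes localcli accepts the same `vm process list` and `vm process kill --type=soft --world-id=<id>` commands as esxcli and prints the same block layout (an unindented VM name followed by indented `World ID:` lines). Please verify both on your build first. The helper names and the sample VM names are hypothetical, and the script only builds the command strings; it does not run anything.

```python
def parse_vm_process_list(text):
    """Parse 'vm process list' style output into {vm name: world id}.

    Assumed layout (as printed by esxcli; localcli is assumed to match):
        web01
           World ID: 35001
           ...
    """
    vms = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            name = None                 # a blank line ends the current VM block
        elif not raw[0].isspace():
            name = line                 # an unindented line starts a new VM block
        elif name and line.startswith("World ID:"):
            vms[name] = int(line.split(":", 1)[1])
    return vms


def soft_kill_commands(vms):
    """Build one soft-kill command per VM, ordered by VM name."""
    return [
        f"localcli vm process kill --type=soft --world-id={world_id}"
        for _, world_id in sorted(vms.items())
    ]
```

Printing the commands instead of executing them leaves room to review the list on a host that is already struggling, and to escalate from a soft kill only if a VM does not stop.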


________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

kopper27
Hot Shot

Thanks a lot, continuum.

Well, we tried to shut down the server using F12 in the DCUI, but it never powered off.

Backups have been working fine for the last two weeks.

aakalan
Enthusiast

Did you solve this problem?

martinkamke
Contributor

We have exactly the same error with our current server (Supermicro). I updated everything and changed the RAID controller (Adaptec 6805) too, but we still get the same error and the host still freezes.

2014-07-07T10:11:25.666Z cpu5:33462)WARNING: Created heap vsanutil (prealloc 0), maxsize 6 MB

2014-07-07T10:11:25.679Z cpu5:33462)WARNING: Created heap LSOM (prealloc 0), maxsize 1024 MB

2014-07-07T10:11:25.688Z cpu5:33462)WARNING: Created slab SSDLOG_LogBlkDescSlab (prealloc 0), 8192 entities of size 4578, total 35 MB, numheaps 1

2014-07-07T10:11:25.688Z cpu5:33462)WARNING: Created slab SSDLOG_AllocMapSlab (prealloc 0), 8192 entities of size 50, total 0 MB, numheaps 1

2014-07-07T10:11:25.688Z cpu5:33462)WARNING: Created slab SSDLOG_CBContextSlab (prealloc 0), 8192 entities of size 98, total 0 MB, numheaps 1

2014-07-07T10:11:26.073Z cpu2:33570)WARNING: Supported VMs 256, Max VSAN VMs 100, SystemMemoryInGB 96

2014-07-07T10:11:26.073Z cpu2:33570)WARNING: MaxFileHandles: 3000, Prealloc 1, Prealloc limit: 32 GB, Host scaling factor: 1

2014-07-07T10:11:26.084Z cpu2:33570)WARNING: DOM memory will be preallocated.

2014-07-07T10:11:26.084Z cpu2:33570)WARNING: Created heap DOM-MODULE (prealloc 1), maxsize 1 MB

2014-07-07T10:12:00.131Z cpu6:34777)WARNING: UserEpoll: 542: UNSUPPORTED events 0x40

2014-07-07T10:12:00.964Z cpu6:34777)WARNING: LinuxSocket: 1854: UNKNOWN/UNSUPPORTED socketcall op (whichCall=0x12, args@0xff914d8c)

2014-07-07T10:41:59.043Z cpu5:34737)WARNING: ScsiDeviceIO: 6998: IEC page to device "mpx.vmhba33:C0:T0:L0" has bad pagecode: 0x0

2014-07-07T11:11:59.085Z cpu0:34737)WARNING: ScsiDeviceIO: 6998: IEC page to device "mpx.vmhba33:C0:T0:L0" has bad pagecode: 0x0

2014-07-07T11:41:59.136Z cpu0:34737)WARNING: ScsiDeviceIO: 6998: IEC page to device "mpx.vmhba33:C0:T0:L0" has bad pagecode: 0x0

martinkamke
Contributor

We still have the same error with our Supermicro. I updated everything and created a new RAID with a new LSI controller, but the same error showed up again today.

2014-08-26T10:38:03.336Z cpu0:33206)WARNING: NetDVS: 489: portAlias is NULL

2014-08-26T10:38:03.617Z cpu3:33461)WARNING: Tcpip: 827: Failed to unset the ip address (error = 49)

2014-08-26T10:38:12.212Z cpu1:33474)WARNING: RDT: RDTModInit:906: Kernel is not configured for IPv6

2014-08-26T10:38:12.725Z cpu7:33587)WARNING: Supported VMs 256, Max VSAN VMs 100, SystemMemoryInGB 96

2014-08-26T10:38:12.725Z cpu7:33587)WARNING: MaxFileHandles: 3000, Prealloc 1, Prealloc limit: 32 GB, Host scaling factor: 1

2014-08-26T10:38:12.735Z cpu7:33587)WARNING: DOM memory will be preallocated.

2014-08-26T10:38:14.308Z cpu3:33616)ALERT: Logs are stored on non-persistent storage.  Consult product documentation to configure a syslog server or a scratch partition.

2014-08-26T10:38:43.364Z cpu6:34220)WARNING: UserEpoll: 542: UNSUPPORTED events 0x40

2014-08-26T10:38:44.195Z cpu6:34220)WARNING: LinuxSocket: 1854: UNKNOWN/UNSUPPORTED socketcall op (whichCall=0x12, args@0xff9bcd8c)

2014-08-26T11:08:44.764Z cpu4:34263)WARNING: ScsiDeviceIO: 7005: IEC page to device "mpx.vmhba33:C0:T0:L0" has bad pagecode: 0x0

2014-08-26T11:38:44.929Z cpu7:34263)WARNING: ScsiDeviceIO: 7005: IEC page to device "mpx.vmhba33:C0:T0:L0" has bad pagecode: 0x0

2014-08-26T11:46:09.064Z cpu3:33993 opID=a756f83)WARNING: LVM: 2900: [naa.600050e0f7031d008d9e000015f20000:1] Device shrank (actual size 1728446431 blocks, stored size 1755291982 blocks)

2014-08-26T11:46:09.489Z cpu3:33993 opID=a756f83)WARNING: LVM: 2900: [naa.600050e0f7031d008d9e000015f20000:1] Device shrank (actual size 1728446431 blocks, stored size 1755291982 blocks)

2014-08-26T12:08:45.090Z cpu6:34263)WARNING: ScsiDeviceIO: 7005: IEC page to device "mpx.vmhba33:C0:T0:L0" has bad pagecode: 0x0

2014-08-26T12:15:09.952Z cpu2:32791)WARNING: ScsiDeviceIO: 1223: Device naa.600050e0f7032700b12e00003c960000 performance has deteriorated. I/O latency increased from average value of 241 microseconds to 11678 microseconds.

2014-08-26T12:38:45.259Z cpu6:34263)WARNING: ScsiDeviceIO: 7005: IEC page to device "mpx.vmhba33:C0:T0:L0" has bad pagecode: 0x0

2014-08-26T13:08:45.425Z cpu6:34263)WARNING: ScsiDeviceIO: 7005: IEC page to device "mpx.vmhba33:C0:T0:L0" has bad pagecode: 0x0

2014-08-26T13:38:45.590Z cpu6:34263)WARNING: ScsiDeviceIO: 7005: IEC page to device "mpx.vmhba33:C0:T0:L0" has bad pagecode: 0x0

2014-08-26T14:08:45.743Z cpu6:34263)WARNING: ScsiDeviceIO: 7005: IEC page to device "mpx.vmhba33:C0:T0:L0" has bad pagecode: 0x0
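Two of the entries above carry numbers that are easier to judge once converted: the LVM "Device shrank" line and the ScsiDeviceIO latency line. The Python sketch below extracts them. It assumes 512-byte blocks, which the log itself does not state, and the function names are made up for illustration. Under that assumption the device is about 12.8 GiB smaller than the size LVM has stored, and the reported latency jump (241 to 11678 microseconds) is roughly a 48-fold increase.

```python
import re

SHRANK_RE = re.compile(
    r"Device shrank \(actual size (\d+) blocks, stored size (\d+) blocks\)"
)
LATENCY_RE = re.compile(
    r"increased from average value of (\d+) microseconds to (\d+) microseconds"
)

BLOCK_BYTES = 512  # assumption: the log does not state the block size


def shrink_bytes(line):
    """Bytes missing versus the stored size, or None if the line does not match."""
    match = SHRANK_RE.search(line)
    if not match:
        return None
    actual, stored = map(int, match.groups())
    return (stored - actual) * BLOCK_BYTES


def latency_factor(line):
    """Ratio of new to average latency, or None if the line does not match."""
    match = LATENCY_RE.search(line)
    if not match:
        return None
    before, after = map(int, match.groups())
    if before == 0:
        return None  # avoid dividing by zero on a malformed entry
    return after / before
```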
