9 Replies Latest reply on Sep 12, 2014 2:19 AM by martinkamke

    ESXi 5.5 Not Responding

    kopper27 Expert

      hi guys

       

      A few hours ago an ESXi 5.5 server started showing as "not responding" in vCenter. I tried reconnecting it, but got the same issue.

      I ran services.sh restart, with the same result, and customers started reporting connectivity issues.

       

      So I had to restart the server. First, no VM was restarted even though HA was enabled. Second, I checked the logs (vmkernel.log, hostd.log, vmkwarning.log) and really did not find much.

       

      I found only this, which does not say much. Any ideas, guys? This server's firmware was updated about 45 days ago.

       

      The event occurred between about 20:30 and 21:10. I am attaching vmkernel.log.

      Thanks a lot.

       

      This is from vmkwarning.log

      2014-03-20T21:38:49.534Z cpu13:8252012)WARNING: UserEpoll: 542: UNSUPPORTED events 0x40

      2014-03-20T21:38:50.327Z cpu13:8252012)WARNING: LinuxSocket: 1854: UNKNOWN/UNSUPPORTED socketcall op (whichCall=0x12, args@0xffa09d3c)

      2014-03-20T21:46:44.152Z cpu18:8255698)WARNING: UserEpoll: 542: UNSUPPORTED events 0x40

      2014-03-20T21:46:45.007Z cpu18:8255698)WARNING: LinuxSocket: 1854: UNKNOWN/UNSUPPORTED socketcall op (whichCall=0x12, args@0xff9c5d3c)

      2014-03-20T21:53:56.903Z cpu16:33499)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60050768028103c76800000000000007" state in doubt; requested fast path state update...

      2014-03-20T21:54:23.197Z cpu6:8257781)WARNING: Tcpip_Vmk: 794: Failed to set default gateway (51): Network unreachable

      0:00:00:05.752 cpu0:32768)WARNING: PCI: 994: 0000:00:08.0 bind to PCI bus driver failed with Bad parameter.

      0:00:00:05.755 cpu0:32768)WARNING: PCI: 994: 0000:00:1c.0 bind to PCI bus driver failed with Bad parameter.

      0:00:00:05.756 cpu0:32768)WARNING: PCI: 994: 0000:00:1e.0 bind to PCI bus driver failed with Bad parameter.

      2014-03-20T22:07:14.671Z cpu11:33396)WARNING: APEI: 247: Could not initialize HEST

      2014-03-20T22:07:18.727Z cpu6:33437)WARNING: LinuxSignal: 538: ignored unexpected signal flags 0x2 (sig 17)

      2014-03-20T22:07:19.312Z cpu6:33437)WARNING: VMK_PCI: 846: BAR not a valid resource for mapping

      2014-03-20T22:07:19.312Z cpu6:33437)WARNING: qlnativefc: Invalid(15:0.0): region #3 not enabled status = bad0017

      2014-03-20T22:07:19.312Z cpu6:33437)WARNING: qlnativefc: Invalid(15:0.0): num_rsp_queues = 1, num_req_queues = 1

      2014-03-20T22:07:19.312Z cpu6:33437)WARNING: qlnativefc: Invalid(15:0.0): MSI-X: Enabled (0x2, 0x0).

      2014-03-20T22:07:19.368Z cpu6:33437)WARNING: ROM firmware has finished initialization and is ready to process mailbox commands

      2014-03-20T22:07:19.548Z cpu6:33437)WARNING: VMK_PCI: 846: BAR not a valid resource for mapping

      2014-03-20T22:07:19.548Z cpu6:33437)WARNING: qlnativefc: Invalid(15:0.1): region #3 not enabled status = bad0017

      2014-03-20T22:07:19.548Z cpu6:33437)WARNING: qlnativefc: Invalid(15:0.1): num_rsp_queues = 1, num_req_queues = 1

      2014-03-20T22:07:19.548Z cpu6:33437)WARNING: qlnativefc: Invalid(15:0.1): MSI-X: Enabled (0x2, 0x0).

      2014-03-20T22:07:19.839Z cpu6:33437)WARNING: VMK_PCI: 846: BAR not a valid resource for mapping

      2014-03-20T22:07:19.839Z cpu6:33437)WARNING: qlnativefc: Invalid(24:0.0): region #3 not enabled status = bad0017

      2014-03-20T22:07:19.840Z cpu6:33437)WARNING: qlnativefc: Invalid(24:0.0): num_rsp_queues = 1, num_req_queues = 1

      2014-03-20T22:07:19.840Z cpu6:33437)WARNING: qlnativefc: Invalid(24:0.0): MSI-X: Enabled (0x2, 0x0).

      2014-03-20T22:07:19.849Z cpu6:33437)WARNING: ROM firmware has finished initialization and is ready to process mailbox commands

      2014-03-20T22:07:19.999Z cpu6:33437)WARNING: VMK_PCI: 846: BAR not a valid resource for mapping

      2014-03-20T22:07:19.999Z cpu6:33437)WARNING: qlnativefc: Invalid(24:0.1): region #3 not enabled status = bad0017

      2014-03-20T22:07:20.000Z cpu6:33437)WARNING: qlnativefc: Invalid(24:0.1): num_rsp_queues = 1, num_req_queues = 1

      2014-03-20T22:07:20.000Z cpu6:33437)WARNING: qlnativefc: Invalid(24:0.1): MSI-X: Enabled (0x2, 0x0).

      2014-03-20T22:07:20.383Z cpu7:33315)WARNING: Team.etherswitch: TeamES_Activate:668: Failed to initialize beaconing on portset 'pps': Not implemented.

      2014-03-20T22:07:22.952Z cpu12:33484)WARNING: LinScsiLLD: scsi_add_host:573: vmkAdapter (usb-storage) sgMaxEntries rounded to 255. Reported size was 65535

      2014-03-20T22:07:23.450Z cpu16:33463)WARNING: LinNet: LinNet_CreateDMAEngine:3807: vusb0, failed to get device properties with error Not supported

      2014-03-20T22:07:23.450Z cpu16:33463)WARNING: LinNet: LinNet_ConnectUplink:10927: vusb0: Failed to create DMA engine with error Not supported, it maybe a pseudo device

      2014-03-20T22:07:47.014Z cpu5:33315)WARNING: NMP: nmp_PspSet:539: Switching to claimrule PSP "VMW_PSP_RR" for device Unregistered Device.

      2014-03-20T22:07:47.014Z cpu5:33315)WARNING: NMP: nmp_PspSet:539: Switching to claimrule PSP "VMW_PSP_RR" for device Unregistered Device.

      2014-03-20T22:07:51.081Z cpu5:33315)WARNING: NMP: nmp_VsiPspManagesSet:2228: Device [naa.60050768028103c76800000000000007] is already managed by PSP "VMW_PSP_RR"

      2014-03-20T22:07:51.081Z cpu5:33315)WARNING: NMP: nmp_VsiPspManagesSet:2228: Device [naa.60050768028103c76800000000000052] is already managed by PSP "VMW_PSP_RR"

      2014-03-20T22:07:51.164Z cpu5:33315)WARNING: NMP: nmp_VsiPspManagesSet:2228: Device [naa.60050768028103c76800000000000007] is already managed by PSP "VMW_PSP_RR"

      2014-03-20T22:07:51.164Z cpu5:33315)WARNING: NMP: nmp_VsiPspManagesSet:2228: Device [naa.60050768028103c76800000000000052] is already managed by PSP "VMW_PSP_RR"

      2014-03-20T22:07:51.365Z cpu9:33597)WARNING: RDT: RDTModInit:841: Kernel is not configured for IPv6

      2014-03-20T22:07:51.385Z cpu9:33597)WARNING: Created heap vsanutil (prealloc 0), maxsize 6 MB

      2014-03-20T22:07:51.398Z cpu9:33597)WARNING: Created heap LSOM (prealloc 0), maxsize 1024 MB

      2014-03-20T22:07:51.407Z cpu9:33597)WARNING: Created slab SSDLOG_LogBlkDescSlab (prealloc 0), 8192 entities of size 4578, total 35 MB, numheaps 1

      2014-03-20T22:07:51.407Z cpu9:33597)WARNING: Created slab SSDLOG_AllocMapSlab (prealloc 0), 8192 entities of size 50, total 0 MB, numheaps 1

      2014-03-20T22:07:51.407Z cpu9:33597)WARNING: Created slab SSDLOG_CBContextSlab (prealloc 0), 8192 entities of size 98, total 0 MB, numheaps 1

      2014-03-20T22:07:51.723Z cpu19:33705)WARNING: Supported VMs 281, Max VSAN VMs 100, SystemMemoryInGB 48

      2014-03-20T22:07:51.723Z cpu19:33705)WARNING: MaxFileHandles: 3000, Prealloc 1, Prealloc limit: 32 GB, Host scaling factor: 1

      2014-03-20T22:07:51.733Z cpu19:33705)WARNING: DOM memory will be preallocated.

      2014-03-20T22:07:51.733Z cpu19:33705)WARNING: Created heap DOM-MODULE (prealloc 1), maxsize 1 MB

      2014-03-20T22:08:12.016Z cpu5:34650)WARNING: UserEpoll: 542: UNSUPPORTED events 0x40

      2014-03-20T22:08:12.139Z cpu5:34650)WARNING: LinuxSocket: 1854: UNKNOWN/UNSUPPORTED socketcall op (whichCall=0x12, args@0xffc32d8c)

      0:00:00:05.733 cpu0:32768)WARNING: PCI: 994: 0000:00:08.0 bind to PCI bus driver failed with Bad parameter.

      0:00:00:05.736 cpu0:32768)WARNING: PCI: 994: 0000:00:1c.0 bind to PCI bus driver failed with Bad parameter.

      0:00:00:05.737 cpu0:32768)WARNING: PCI: 994: 0000:00:1e.0 bind to PCI bus driver failed with Bad parameter.
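A note on reading the excerpt above: the lines stamped with an uptime counter (`0:00:00:05.752`) rather than a date appear to be written early in boot, so each run of them marks a restart. The following is a minimal Python sketch, assuming only the line format shown in this post, that splits the warnings into per-boot sessions. `split_sessions` and `last_wall_clock` are illustrative names, not part of any VMware tool.

```python
import re

# Dated line, e.g. "2014-03-20T21:38:49.534Z cpu13:8252012)WARNING: ..."
WALL = re.compile(r"^(\d{4}-\d{2}-\d{2}T[\d:.]+Z) cpu\d+:\d+[^)]*\)(\w+): (.*)$")
# Uptime-stamped line, e.g. "0:00:00:05.752 cpu0:32768)WARNING: ..."
UPTIME = re.compile(r"^\d+:\d{2}:\d{2}:\d{2}\.\d+ cpu\d+:\d+\)")


def split_sessions(lines):
    """Group log lines into sessions; a run of uptime-stamped lines starts a new one."""
    sessions = [[]]
    in_boot = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if UPTIME.match(line):
            if not in_boot:
                sessions.append([])
                in_boot = True
        else:
            in_boot = False
        sessions[-1].append(line)
    return [s for s in sessions if s]


def last_wall_clock(session):
    """Return the last dated timestamp in a session, or None if there is none."""
    stamps = [m.group(1) for m in map(WALL.match, session) if m]
    return stamps[-1] if stamps else None


# A few lines copied from the excerpt above.
sample = """\
2014-03-20T21:53:56.903Z cpu16:33499)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60050768028103c76800000000000007" state in doubt; requested fast path state update...
2014-03-20T21:54:23.197Z cpu6:8257781)WARNING: Tcpip_Vmk: 794: Failed to set default gateway (51): Network unreachable
0:00:00:05.752 cpu0:32768)WARNING: PCI: 994: 0000:00:08.0 bind to PCI bus driver failed with Bad parameter.
0:00:00:05.755 cpu0:32768)WARNING: PCI: 994: 0000:00:1c.0 bind to PCI bus driver failed with Bad parameter.
2014-03-20T22:07:14.671Z cpu11:33396)WARNING: APEI: 247: Could not initialize HEST
""".splitlines()

sessions = split_sessions(sample)
for number, session in enumerate(sessions, start=1):
    print(number, len(session), last_wall_clock(session))
```

Applied to the full excerpt, this puts the last warning before the first boot marker at 21:54:23Z (the failed default gateway) and the first dated line after it at 22:07:14Z. The excerpt also contains a second run of boot markers after 22:08:12Z.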

        • 1. Re: ESXi 5.5 Not Responding
          Josh Enthusiast

          It looks like an issue with storage or storage networking. Is it local or shared storage? It could also be an issue with the device ESXi is installed on: an SD card? A USB disk? These are the important entries:

           

          2014-03-20T21:53:56.903Z cpu16:33499)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60050768028103c76800000000000007" state in doubt; requested fast path state update...

           

          2014-03-20T21:54:23.197Z cpu6:8257781)WARNING: Tcpip_Vmk: 794: Failed to set default gateway (51): Network unreachable

           

          2014-03-20T22:07:22.952Z cpu12:33484)WARNING: LinScsiLLD: scsi_add_host:573: vmkAdapter (usb-storage) sgMaxEntries rounded to 255. Reported size was 65535

           

          2014-03-20T22:07:47.014Z cpu5:33315)WARNING: NMP: nmp_PspSet:539: Switching to claimrule PSP "VMW_PSP_RR" for device Unregistered Device.

           

          Can you post the vobd.log file?

          VCIX6 - NV | VCAP5 - DCA / DCD / CID | vExpert 2014,2015,2016 | http://www.vcrumbs.com - My Virtualization Blog!
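The kind of triage done in this reply (picking the storage and network lines out of a long warning list) can be sketched in Python. This is only an illustration under stated assumptions: the `WATCH` set simply mirrors the subsystems of the lines quoted in the reply, and `tag_of` and `triage` are hypothetical helper names.

```python
import re
from collections import Counter

# Captures the subsystem tag after the severity, e.g. "NMP" or "Tcpip_Vmk".
# Lines without a "Tag:" right after the severity yield no tag.
TAG = re.compile(r"\)(?:WARNING|ALERT): ([A-Za-z_.]+):")

# Assumption: this watchlist only mirrors the lines quoted in the reply above.
WATCH = {"NMP", "Tcpip_Vmk", "LinScsiLLD"}


def tag_of(line):
    """Return the subsystem tag of a log line, or None if it has none."""
    match = TAG.search(line)
    return match.group(1) if match else None


def triage(lines):
    """Count warnings per subsystem and return the lines on the watchlist."""
    counts = Counter(tag for tag in map(tag_of, lines) if tag)
    flagged = [line for line in lines if tag_of(line) in WATCH]
    return counts, flagged


# A few lines copied from the original post.
sample = [
    '2014-03-20T21:38:49.534Z cpu13:8252012)WARNING: UserEpoll: 542: UNSUPPORTED events 0x40',
    '2014-03-20T21:53:56.903Z cpu16:33499)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60050768028103c76800000000000007" state in doubt; requested fast path state update...',
    '2014-03-20T21:54:23.197Z cpu6:8257781)WARNING: Tcpip_Vmk: 794: Failed to set default gateway (51): Network unreachable',
    '2014-03-20T22:07:47.014Z cpu5:33315)WARNING: NMP: nmp_PspSet:539: Switching to claimrule PSP "VMW_PSP_RR" for device Unregistered Device.',
]

counts, flagged = triage(sample)
print(counts.most_common())
for line in flagged:
    print(line)
```

A per-subsystem count makes it easier to see which warnings dominate. Which tags deserve a place on the watchlist is a judgment call, not something the log itself decides.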
          • 2. Re: ESXi 5.5 Not Responding
            kopper27 Expert

            Now that you mention it, I think it could be a storage issue related to where ESXi is installed, since I've seen this message

             

            (2014-03-20T21:53:56.903Z cpu16:33499)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60050768028103c76800000000000007" state in doubt; requested fast path state update...)

             

            on other ESXi hosts outside this environment, and there it did not cause a host to disconnect.

             

            In fact, some VMs kept communicating, so I would say the SAN was not the culprit.

             

            We opened the ESXi console: the host could not ping its gateway, and SSH was down.

             

            I forgot to mention that the servers boot ESXi from USB storage.

             

            OK, I am attaching the requested log.

            • 3. Re: ESXi 5.5 Not Responding
              Josh Enthusiast

              I didn't see what I was looking for in the log, but it has probably already passed.

              There are a bunch of dead path errors that seem to come and go, and I really have a hunch that this is storage related. When ESXi loses its connection to storage or has issues with it, that is treated as the #1 priority: the host will use all available resources to try to hold on to the storage or reconnect to it. This causes all sorts of issues: lost connectivity, getting kicked out of vCenter, lockups, lag, crashes, etc.

               

              Do you have VMware support for this host? I would open a support request; it would be easier to read through all the logs live.

               

              Which storage array vendor do you use? And ESXi is on a flash drive?

              • 4. Re: ESXi 5.5 Not Responding
                kopper27 Expert

                We use an IBM V7000, and yes, this ESXi server runs from USB storage with 2 GB capacity.

                 

                Yes, we have support, but the issue passed a few hours ago, so they will see the same logs you already saw.

                • 5. Re: ESXi 5.5 Not Responding
                  continuum Guru
                  Community Warriors | User Moderators | vExpert

                  There are early warnings all over the log that predicted a future crash.
                  The connection to two LUNs was unreliable during the whole of last week. Didn't you notice errors in the logs of your backup tool over the past few days?

                   

                  When you noticed the disconnected host, running services.sh restart probably made it even worse.
                  Next time you run into a situation like this, check whether hostd is running at 100%. If it is, try to stop all VMs running on that host with localcli, then reboot the host.

                   

                  Then the question is not whether the VMs will autostart at the next boot. The question is rather whether the two affected LUNs survive the incident without serious damage.

                   

                  Before you use that host in production again, make sure you have no dead LUNs and no errors in the host's network config.

                  • 6. Re: ESXi 5.5 Not Responding
                    kopper27 Expert

                    Thanks a lot, continuum.

                     

                    Well, we tried to shut down the server using F12 in the DCUI, but it never powered off.

                     

                    Backups have been working fine for the last two weeks.

                    • 7. Re: ESXi 5.5 Not Responding
                      John Snow Novice
                      vExpert

                      Did you solve this problem?

                      • 8. Re: ESXi 5.5 Not Responding
                        martinkamke Lurker

                        We have exactly the same error with our current server (Supermicro). I updated everything and changed the RAID controller (Adaptec 6805) too, but we still get the same error and the freezing.

                         

                        2014-07-07T10:11:25.666Z cpu5:33462)WARNING: Created heap vsanutil (prealloc 0), maxsize 6 MB

                        2014-07-07T10:11:25.679Z cpu5:33462)WARNING: Created heap LSOM (prealloc 0), maxsize 1024 MB

                        2014-07-07T10:11:25.688Z cpu5:33462)WARNING: Created slab SSDLOG_LogBlkDescSlab (prealloc 0), 8192 entities of size 4578, total 35 MB, numheaps 1

                        2014-07-07T10:11:25.688Z cpu5:33462)WARNING: Created slab SSDLOG_AllocMapSlab (prealloc 0), 8192 entities of size 50, total 0 MB, numheaps 1

                        2014-07-07T10:11:25.688Z cpu5:33462)WARNING: Created slab SSDLOG_CBContextSlab (prealloc 0), 8192 entities of size 98, total 0 MB, numheaps 1

                        2014-07-07T10:11:26.073Z cpu2:33570)WARNING: Supported VMs 256, Max VSAN VMs 100, SystemMemoryInGB 96

                        2014-07-07T10:11:26.073Z cpu2:33570)WARNING: MaxFileHandles: 3000, Prealloc 1, Prealloc limit: 32 GB, Host scaling factor: 1

                        2014-07-07T10:11:26.084Z cpu2:33570)WARNING: DOM memory will be preallocated.

                        2014-07-07T10:11:26.084Z cpu2:33570)WARNING: Created heap DOM-MODULE (prealloc 1), maxsize 1 MB

                        2014-07-07T10:12:00.131Z cpu6:34777)WARNING: UserEpoll: 542: UNSUPPORTED events 0x40

                        2014-07-07T10:12:00.964Z cpu6:34777)WARNING: LinuxSocket: 1854: UNKNOWN/UNSUPPORTED socketcall op (whichCall=0x12, args@0xff914d8c)

                        2014-07-07T10:41:59.043Z cpu5:34737)WARNING: ScsiDeviceIO: 6998: IEC page to device "mpx.vmhba33:C0:T0:L0" has bad pagecode: 0x0

                        2014-07-07T11:11:59.085Z cpu0:34737)WARNING: ScsiDeviceIO: 6998: IEC page to device "mpx.vmhba33:C0:T0:L0" has bad pagecode: 0x0

                        2014-07-07T11:41:59.136Z cpu0:34737)WARNING: ScsiDeviceIO: 6998: IEC page to device "mpx.vmhba33:C0:T0:L0" has bad pagecode: 0x0
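One quick check on the repeated `IEC page` warnings above is whether they arrive at a fixed interval. Below is a minimal Python sketch, assuming the timestamp format shown in these log lines; `stamp` and `intervals_seconds` are illustrative names.

```python
from datetime import datetime


def stamp(line):
    """Parse the leading timestamp of a dated log line."""
    return datetime.strptime(line.split(" ", 1)[0], "%Y-%m-%dT%H:%M:%S.%fZ")


def intervals_seconds(lines, needle):
    """Return the gaps in seconds between consecutive lines containing needle."""
    times = [stamp(line) for line in lines if needle in line]
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


# Lines copied from the post above.
sample = [
    '2014-07-07T10:12:00.131Z cpu6:34777)WARNING: UserEpoll: 542: UNSUPPORTED events 0x40',
    '2014-07-07T10:41:59.043Z cpu5:34737)WARNING: ScsiDeviceIO: 6998: IEC page to device "mpx.vmhba33:C0:T0:L0" has bad pagecode: 0x0',
    '2014-07-07T11:11:59.085Z cpu0:34737)WARNING: ScsiDeviceIO: 6998: IEC page to device "mpx.vmhba33:C0:T0:L0" has bad pagecode: 0x0',
    '2014-07-07T11:41:59.136Z cpu0:34737)WARNING: ScsiDeviceIO: 6998: IEC page to device "mpx.vmhba33:C0:T0:L0" has bad pagecode: 0x0',
]

gaps = intervals_seconds(sample, "IEC page")
print([round(gap / 60, 2) for gap in gaps])  # gaps in minutes
```

In these three lines the warnings are spaced almost exactly 30 minutes apart. A fixed spacing like that looks more like a periodic poll than a burst tied to the freezes, but that reading is an interpretation, not something this thread confirms.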

                        • 9. Re: ESXi 5.5 Not Responding
                          martinkamke Lurker

                          We have the same error with our Supermicro. I updated everything and created a new RAID with a new LSI controller, but we still got the same error today.

                           

                          2014-08-26T10:38:03.336Z cpu0:33206)WARNING: NetDVS: 489: portAlias is NULL

                          2014-08-26T10:38:03.617Z cpu3:33461)WARNING: Tcpip: 827: Failed to unset the ip address (error = 49)

                          2014-08-26T10:38:12.212Z cpu1:33474)WARNING: RDT: RDTModInit:906: Kernel is not configured for IPv6

                          2014-08-26T10:38:12.725Z cpu7:33587)WARNING: Supported VMs 256, Max VSAN VMs 100, SystemMemoryInGB 96

                          2014-08-26T10:38:12.725Z cpu7:33587)WARNING: MaxFileHandles: 3000, Prealloc 1, Prealloc limit: 32 GB, Host scaling factor: 1

                          2014-08-26T10:38:12.735Z cpu7:33587)WARNING: DOM memory will be preallocated.

                          2014-08-26T10:38:14.308Z cpu3:33616)ALERT: Logs are stored on non-persistent storage.  Consult product documentation to configure a syslog server or a scratch partition.

                          2014-08-26T10:38:43.364Z cpu6:34220)WARNING: UserEpoll: 542: UNSUPPORTED events 0x40

                          2014-08-26T10:38:44.195Z cpu6:34220)WARNING: LinuxSocket: 1854: UNKNOWN/UNSUPPORTED socketcall op (whichCall=0x12, args@0xff9bcd8c)

                          2014-08-26T11:08:44.764Z cpu4:34263)WARNING: ScsiDeviceIO: 7005: IEC page to device "mpx.vmhba33:C0:T0:L0" has bad pagecode: 0x0

                          2014-08-26T11:38:44.929Z cpu7:34263)WARNING: ScsiDeviceIO: 7005: IEC page to device "mpx.vmhba33:C0:T0:L0" has bad pagecode: 0x0

                          2014-08-26T11:46:09.064Z cpu3:33993 opID=a756f83)WARNING: LVM: 2900: [naa.600050e0f7031d008d9e000015f20000:1] Device shrank (actual size 1728446431 blocks, stored size 1755291982 blocks)

                          2014-08-26T11:46:09.489Z cpu3:33993 opID=a756f83)WARNING: LVM: 2900: [naa.600050e0f7031d008d9e000015f20000:1] Device shrank (actual size 1728446431 blocks, stored size 1755291982 blocks)

                          2014-08-26T12:08:45.090Z cpu6:34263)WARNING: ScsiDeviceIO: 7005: IEC page to device "mpx.vmhba33:C0:T0:L0" has bad pagecode: 0x0

                          2014-08-26T12:15:09.952Z cpu2:32791)WARNING: ScsiDeviceIO: 1223: Device naa.600050e0f7032700b12e00003c960000 performance has deteriorated. I/O latency increased from average value of 241 microseconds to 11678 microseconds.

                          2014-08-26T12:38:45.259Z cpu6:34263)WARNING: ScsiDeviceIO: 7005: IEC page to device "mpx.vmhba33:C0:T0:L0" has bad pagecode: 0x0

                          2014-08-26T13:08:45.425Z cpu6:34263)WARNING: ScsiDeviceIO: 7005: IEC page to device "mpx.vmhba33:C0:T0:L0" has bad pagecode: 0x0

                          2014-08-26T13:38:45.590Z cpu6:34263)WARNING: ScsiDeviceIO: 7005: IEC page to device "mpx.vmhba33:C0:T0:L0" has bad pagecode: 0x0

                          2014-08-26T14:08:45.743Z cpu6:34263)WARNING: ScsiDeviceIO: 7005: IEC page to device "mpx.vmhba33:C0:T0:L0" has bad pagecode: 0x0
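Two of the warnings above carry numbers worth working out: the latency jump and the `Device shrank` sizes. The Python sketch below extracts them from the log text. The 512-byte block size is an assumption (the log line does not state it), and `latency_ratio` and `missing_blocks` are illustrative names.

```python
import re

LATENCY = re.compile(r"average value of (\d+) microseconds to (\d+) microseconds")
SHRANK = re.compile(r"actual size (\d+) blocks, stored size (\d+) blocks")

# Assumption: 512-byte blocks. The log line itself does not give the block size.
BLOCK_BYTES = 512


def latency_ratio(line):
    """Return new latency divided by the previous average, or None if absent."""
    match = LATENCY.search(line)
    if match is None:
        return None
    average, current = map(int, match.groups())
    return current / average


def missing_blocks(line):
    """Return stored size minus actual size in blocks, or None if absent."""
    match = SHRANK.search(line)
    if match is None:
        return None
    actual, stored = map(int, match.groups())
    return stored - actual


# Lines copied from the post above.
latency_line = (
    "2014-08-26T12:15:09.952Z cpu2:32791)WARNING: ScsiDeviceIO: 1223: Device "
    "naa.600050e0f7032700b12e00003c960000 performance has deteriorated. I/O latency "
    "increased from average value of 241 microseconds to 11678 microseconds."
)
shrank_line = (
    "2014-08-26T11:46:09.064Z cpu3:33993 opID=a756f83)WARNING: LVM: 2900: "
    "[naa.600050e0f7031d008d9e000015f20000:1] Device shrank (actual size 1728446431 "
    "blocks, stored size 1755291982 blocks)"
)

ratio = latency_ratio(latency_line)
blocks = missing_blocks(shrank_line)
print(f"latency increased {ratio:.1f}x")
print(f"{blocks} blocks missing, {blocks * BLOCK_BYTES / 2**30:.1f} GiB at {BLOCK_BYTES}-byte blocks")
```

The latency figure is roughly a 48-fold increase, and the device is reported about 26.8 million blocks smaller than the size stored in the LVM metadata. Note that the two warnings refer to different `naa.` device IDs.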