7 Replies Latest reply on Aug 29, 2019 8:34 AM by monderick

    dead I/O on igb-nic (ESXi 6.7)

    BaumMeister Lurker

      Hi,

       

      I'm running a homelab with ESXi 6.7 (build 13006603). The host has three NICs: two are onboard and one is an Intel ET 82576 dual-port PCIe card. All NICs are assigned to the same vSwitch; currently only one is connected to the (physical) switch.

      When I use one of the 82576 ports and put heavy load on it (like backing up VMs via Nakivo B&R), the NIC stops working after a while and no longer responds. Only a reboot of the host or (much easier) physically reconnecting the NIC (cable out, cable in) solves the problem.

       

      I suspected a driver issue, so I updated to the latest driver from Intel:

       

       

      [root@esxi:~] /usr/sbin/esxcfg-nics -l

      Name    PCI          Driver      Link Speed      Duplex MAC Address       MTU    Description

      vmnic0  0000:04:00.0 ne1000      Down 0Mbps      Half   00:25:90:a7:65:dc 1500   Intel Corporation 82574L Gigabit Network Connection

      vmnic1  0000:00:19.0 ne1000      Up   1000Mbps   Full   00:25:90:a7:65:dd 1500   Intel Corporation 82579LM Gigabit Network Connection

      vmnic2  0000:01:00.0 igb         Down 0Mbps      Half   90:e2:ba:1e:4d:c6 1500   Intel Corporation 82576 Gigabit Network Connection

      vmnic3  0000:01:00.1 igb         Down 0Mbps      Half   90:e2:ba:1e:4d:c7 1500   Intel Corporation 82576 Gigabit Network Connection

      [root@esxi:~] esxcli software vib list|grep igb

      net-igb                        5.2.5-1OEM.550.0.0.1331820            Intel   VMwareCertified   2019-06-16

      igbn                           0.1.1.0-4vmw.670.2.48.13006603        VMW     VMwareCertified   2019-06-07

       

      Unfortunately this didn't solve the problem.

       

      However, this behaviour doesn't occur when I use one of the NICs that run on the ne1000 driver.

       

      Any idea how to solve the issue?

      (... or at least dig down to its root?)

       

      Thanks a lot in advance.

       

      Regards

      Chris

       

      PS: I found another thread which might be connected to my problem: "Stopping I/O on vmnic0". Same system behaviour, same driver.

        • 1. Re: dead I/O on igb-nic (ESXi 6.7)
          Sureshkumar M Expert
          vExpert

          What does vmkernel.log say? Can you post the vmkernel logs here?

          • 2. Re: dead I/O on igb-nic (ESXi 6.7)
            anvanster Enthusiast

            The igb driver 5.2.5 that you are using was released in 2014 and is quite old.

            Unfortunately, your card is not supported by the newer "igbn" drivers.

            • 3. Re: dead I/O on igb-nic (ESXi 6.7)
              BaumMeister Lurker

              You're right about the newer igbn driver not supporting the NIC anymore.

              However, the NIC and driver I'm using are on VMware's HCL:

              VMware Compatibility Guide - I/O Device Search

              • 4. Re: dead I/O on igb-nic (ESXi 6.7)
                BaumMeister Lurker

                Sure.

                Here's the log output in the relevant timeslot.

                I marked the line that shows when the 82576 NIC (vmnic3) went down. vmnic1 is running with the ne1000 driver.

                2019-06-17T12:20:44.190Z cpu4:2097707)DVFilter: 5964: Checking disconnected filters for timeouts

                2019-06-17T12:23:04.707Z cpu3:2097182)vmw_ahci[0000001f]: AHCI_EdgeIntrHandler:new interrupts coming, IS= 0x2, no repeat

                2019-06-17T12:30:44.190Z cpu0:2097707)DVFilter: 5964: Checking disconnected filters for timeouts

                2019-06-17T12:35:42.190Z cpu0:2098034)StorageApdHandler: 1203: APD start for 0x430c44ee76d0 [3a5eb32c-7141e730]

                2019-06-17T12:35:42.190Z cpu0:2098034)StorageApdHandler: 1203: APD start for 0x430c44ee95d0 [a16fe90b-d7095fcc]

                2019-06-17T12:35:42.190Z cpu3:2097369)StorageApdHandler: 419: APD start event for 0x430c44ee76d0 [3a5eb32c-7141e730]

                2019-06-17T12:35:42.190Z cpu0:2098034)StorageApdHandler: 1203: APD start for 0x430c44eeb4c0 [37c6519b-ec9783e7]

                2019-06-17T12:35:42.190Z cpu3:2097369)StorageApdHandlerEv: 110: Device or filesystem with identifier [3a5eb32c-7141e730] has entered the All Paths Down state.

                2019-06-17T12:35:42.190Z cpu3:2097369)StorageApdHandler: 419: APD start event for 0x430c44ee95d0 [a16fe90b-d7095fcc]

                2019-06-17T12:35:42.190Z cpu3:2097369)StorageApdHandlerEv: 110: Device or filesystem with identifier [a16fe90b-d7095fcc] has entered the All Paths Down state.

                2019-06-17T12:35:42.190Z cpu3:2097369)StorageApdHandler: 419: APD start event for 0x430c44eeb4c0 [37c6519b-ec9783e7]

                2019-06-17T12:35:42.190Z cpu3:2097369)StorageApdHandlerEv: 110: Device or filesystem with identifier [37c6519b-ec9783e7] has entered the All Paths Down state.

                2019-06-17T12:37:06.190Z cpu7:2098034)WARNING: NFS: 337: Lost connection to the server 10.0.0.199 mount point /volume1/VMs, mounted as 3a5eb32c-7141e730-0000-000000000000 ("VMs@Fuchur")

                2019-06-17T12:37:06.190Z cpu7:2098034)WARNING: NFS: 337: Lost connection to the server 10.0.0.199 mount point /volume1/VM_Backups/, mounted as a16fe90b-d7095fcc-0000-000000000000 ("VM_Backups@Fuchur")

                2019-06-17T12:37:06.190Z cpu7:2098034)WARNING: NFS: 337: Lost connection to the server 10.0.0.199 mount point /volume1/Media, mounted as 37c6519b-ec9783e7-0000-000000000000 ("Media@Fuchur")

                2019-06-17T12:38:02.191Z cpu0:2097369)StorageApdHandler: 609: APD timeout event for 0x430c44ee76d0 [3a5eb32c-7141e730]

                2019-06-17T12:38:02.191Z cpu0:2097369)StorageApdHandlerEv: 126: Device or filesystem with identifier [3a5eb32c-7141e730] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.

                2019-06-17T12:38:02.191Z cpu0:2097369)StorageApdHandler: 609: APD timeout event for 0x430c44ee95d0 [a16fe90b-d7095fcc]

                2019-06-17T12:38:02.191Z cpu0:2097369)StorageApdHandlerEv: 126: Device or filesystem with identifier [a16fe90b-d7095fcc] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.

                2019-06-17T12:38:02.191Z cpu0:2097369)StorageApdHandler: 609: APD timeout event for 0x430c44eeb4c0 [37c6519b-ec9783e7]

                2019-06-17T12:38:02.191Z cpu0:2097369)StorageApdHandlerEv: 126: Device or filesystem with identifier [37c6519b-ec9783e7] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.

                2019-06-17T12:40:44.190Z cpu0:2097707)DVFilter: 5964: Checking disconnected filters for timeouts

                2019-06-17T12:45:39.351Z cpu3:2097615)<6>igb: vmnic3 NIC Link is Down

                2019-06-17T12:45:42.732Z cpu7:2097615)<6>igb: vmnic3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None

                2019-06-17T12:45:43.190Z cpu4:2097220)NetqueueBal: 5032: vmnic3: device Up notification, reset logical space needed

                2019-06-17T12:45:43.190Z cpu4:2097220)NetPort: 1580: disabled port 0x2000004

                2019-06-17T12:45:43.190Z cpu2:2097770)NetSched: 654: vmnic3-0-tx: worldID = 2097770 exits

                2019-06-17T12:45:43.190Z cpu4:2097220)Uplink: 11689: enabled port 0x2000004 with mac 90:e2:ba:1e:4d:c7

                2019-06-17T12:45:43.190Z cpu4:2097220)NetPort: 1580: disabled port 0x2000004

                2019-06-17T12:45:43.190Z cpu4:2097220)Uplink: 11689: enabled port 0x2000004 with mac 90:e2:ba:1e:4d:c7

                2019-06-17T12:45:43.191Z cpu5:2097296)CpuSched: 699: user latency of 2102301 vmnic3-0-tx 0 changed by 2097296 NetSchedHelper -6

                2019-06-17T12:45:43.191Z cpu2:2102301)NetSched: 654: vmnic3-0-tx: worldID = 2102301 exits

                2019-06-17T12:45:43.191Z cpu5:2097296)CpuSched: 699: user latency of 2102302 vmnic3-0-tx 0 changed by 2097296 NetSchedHelper -6

                2019-06-17T12:45:48.941Z cpu3:2098034)NFS: 346: Restored connection to the server 10.0.0.199 mount point /volume1/Media, mounted as 37c6519b-ec9783e7-0000-000000000000 ("Media@Fuchur")

                2019-06-17T12:45:48.941Z cpu4:2097369)StorageApdHandler: 507: APD exit event for 0x430c44eeb4c0 [37c6519b-ec9783e7]

                2019-06-17T12:45:48.941Z cpu3:2098034)NFS: 346: Restored connection to the server 10.0.0.199 mount point /volume1/VMs, mounted as 3a5eb32c-7141e730-0000-000000000000 ("VMs@Fuchur")

                2019-06-17T12:45:48.941Z cpu4:2097369)StorageApdHandlerEv: 117: Device or filesystem with identifier [37c6519b-ec9783e7] has exited the All Paths Down state.

                2019-06-17T12:45:48.941Z cpu4:2097369)StorageApdHandler: 507: APD exit event for 0x430c44ee76d0 [3a5eb32c-7141e730]

                2019-06-17T12:45:48.941Z cpu4:2097369)StorageApdHandlerEv: 117: Device or filesystem with identifier [3a5eb32c-7141e730] has exited the All Paths Down state.

                2019-06-17T12:45:49.613Z cpu3:2098034)NFS: 346: Restored connection to the server 10.0.0.199 mount point /volume1/VM_Backups/, mounted as a16fe90b-d7095fcc-0000-000000000000 ("VM_Backups@Fuchur")

                2019-06-17T12:45:49.613Z cpu4:2097369)StorageApdHandler: 507: APD exit event for 0x430c44ee95d0 [a16fe90b-d7095fcc]

                2019-06-17T12:45:49.613Z cpu4:2097369)StorageApdHandlerEv: 117: Device or filesystem with identifier [a16fe90b-d7095fcc] has exited the All Paths Down state.

                2019-06-17T12:49:19.476Z cpu3:2097615)<6>igb: vmnic3 NIC Link is Down

                2019-06-17T12:49:29.190Z cpu6:2098637 opID=f97c863c)World: 11943: VC opID sps-Main-767271-893-94-37-bba6 maps to vmkernel opID f97c863c

                2019-06-17T12:49:29.190Z cpu6:2098637 opID=f97c863c)SunRPC: 3303: Synchronous RPC abort for client 0x4304520bfb90 IP 10.0.0.199.8.1 proc 1 xid 0x76d7dd9e attempt 1 of 3

                2019-06-17T12:49:39.190Z cpu6:2098637 opID=f97c863c)SunRPC: 3303: Synchronous RPC abort for client 0x4304520bfb90 IP 10.0.0.199.8.1 proc 1 xid 0x76d7dda2 attempt 2 of 3

                2019-06-17T12:49:49.190Z cpu6:2098637 opID=f97c863c)SunRPC: 3303: Synchronous RPC abort for client 0x4304520bfb90 IP 10.0.0.199.8.1 proc 1 xid 0x76d7dda6 attempt 3 of 3

                2019-06-17T12:49:49.190Z cpu6:2098637 opID=f97c863c)WARNING: NFS: 2335: Failed to get attributes (I/O error)

                2019-06-17T12:49:49.190Z cpu6:2098637 opID=f97c863c)NFS: 2444: [Repeated 1 times] Failed to get object (0x451a1b49b3ce) 36 3a5eb32c 7141e730 70001 686a001 0 829c3d42 976c7782 0 0 0 0 0 :No connection

                2019-06-17T12:49:49.190Z cpu6:2098637 opID=f97c863c)NFS: 2449: Failed to get object (0x451a1751b16e) 36 37c6519b ec9783e7 70001 48001 0 829c3d42 976c7782 0 0 0 0 0 :I/O error

                2019-06-17T12:49:51.673Z cpu5:2099927)DEBUG (ne1000): checking link for adapter vmnic1

                2019-06-17T12:49:52.679Z cpu3:2097566)INFO (ne1000): vmnic1: Link is Up

                2019-06-17T12:49:52.679Z cpu3:2097566)DEBUG (ne1000): Reporting uplink 0x43044d090250 status

                2019-06-17T12:49:53.190Z cpu3:2097220)NetqueueBal: 4967: vmnic1: new netq module, reset logical space needed

                2019-06-17T12:49:53.190Z cpu3:2097220)NetqueueBal: 4996: vmnic1: plugins to call differs, reset logical space

                2019-06-17T12:49:53.190Z cpu3:2097220)NetqueueBal: 5032: vmnic1: device Up notification, reset logical space needed

                2019-06-17T12:49:53.190Z cpu3:2097220)Uplink: 537: Driver claims supporting 0 RX queues, and 0 queues are accepted.

                2019-06-17T12:49:53.190Z cpu3:2097220)Uplink: 533: Driver claims supporting 0 TX queues, and 0 queues are accepted.

                2019-06-17T12:49:53.190Z cpu3:2097220)NetPort: 1580: disabled port 0x2000008

                2019-06-17T12:49:53.190Z cpu1:2097761)NetSched: 654: vmnic1-0-tx: worldID = 2097761 exits

                2019-06-17T12:49:53.190Z cpu3:2097220)Uplink: 11689: enabled port 0x2000008 with mac 00:25:90:a7:65:dd

                2019-06-17T12:49:53.190Z cpu5:2097296)CpuSched: 699: user latency of 2102444 vmnic1-0-tx 0 changed by 2097296 NetSchedHelper -6

                2019-06-17T12:49:53.190Z cpu3:2097220)INFO (ne1000): vmnic1: Disabled 'Capable To Xmit Scatter-Gathered Data'

                2019-06-17T12:49:53.190Z cpu3:2097220)INFO (ne1000): vmnic1: Disabled 'Capable To Offload Checksum for IPv4'

                2019-06-17T12:49:53.190Z cpu3:2097220)INFO (ne1000): vmnic1: Disabled 'Capable To Offload TCP Segmentation for IPv4'

                2019-06-17T12:49:53.190Z cpu3:2097220)INFO (ne1000): vmnic1: Disabled 'Capable To Insert VLAN Tag'

                2019-06-17T12:49:53.190Z cpu3:2097220)DEBUG (ne1000): writing uplink config

                2019-06-17T12:49:53.190Z cpu3:2097220)DEBUG (ne1000): writing adapter config

                2019-06-17T12:49:53.190Z cpu3:2097220)INFO (ne1000): vmnic1: Disabled 'Capable To Strip VLAN Tag'

                2019-06-17T12:49:53.190Z cpu3:2097220)DEBUG (ne1000): writing uplink config

                2019-06-17T12:49:53.190Z cpu3:2097220)DEBUG (ne1000): writing adapter config

                2019-06-17T12:49:53.190Z cpu3:2097220)INFO (ne1000): vmnic1: Disabled 'Capable To Xmit Scatter-Gathered Across Multiple Pages'

                2019-06-17T12:49:53.190Z cpu3:2097220)INFO (ne1000): vmnic1: Disabled 'Capable To Offload Checksum for IPv6'

                2019-06-17T12:49:53.190Z cpu3:2097220)INFO (ne1000): vmnic1: Disabled 'Capable To Offload TCP Segmentation for IPv6'

                2019-06-17T12:49:53.190Z cpu3:2097220)INFO (ne1000): vmnic1: Enabled 'Capable To Xmit Scatter-Gathered Data'

                2019-06-17T12:49:53.190Z cpu3:2097220)INFO (ne1000): vmnic1: Enabled 'Capable To Offload Checksum for IPv4'

                2019-06-17T12:49:53.190Z cpu3:2097220)INFO (ne1000): vmnic1: Enabled 'Capable To Offload TCP Segmentation for IPv4'

                2019-06-17T12:49:53.190Z cpu3:2097220)INFO (ne1000): vmnic1: Enabled 'Capable To Insert VLAN Tag'

                2019-06-17T12:49:53.190Z cpu3:2097220)DEBUG (ne1000): writing uplink config

                2019-06-17T12:49:53.190Z cpu3:2097220)DEBUG (ne1000): writing adapter config

                2019-06-17T12:49:53.190Z cpu3:2097220)INFO (ne1000): vmnic1: Enabled 'Capable To Strip VLAN Tag'

                2019-06-17T12:49:53.190Z cpu3:2097220)DEBUG (ne1000): writing uplink config

                2019-06-17T12:49:53.190Z cpu3:2097220)DEBUG (ne1000): writing adapter config

                2019-06-17T12:49:53.190Z cpu3:2097220)INFO (ne1000): vmnic1: Enabled 'Capable To Xmit Scatter-Gathered Across Multiple Pages'

                2019-06-17T12:49:53.190Z cpu3:2097220)INFO (ne1000): vmnic1: Enabled 'Capable To Offload Checksum for IPv6'

                2019-06-17T12:49:53.190Z cpu3:2097220)INFO (ne1000): vmnic1: Enabled 'Capable To Offload TCP Segmentation for IPv6'

                2019-06-17T12:49:53.190Z cpu3:2097220)INFO (ne1000): vmnic1: Disabled 'Driver Requires No Packet Scheduling'

                2019-06-17T12:49:54.190Z cpu6:2098034)StorageApdHandler: 1203: APD start for 0x430c44ee76d0 [3a5eb32c-7141e730]

                2019-06-17T12:49:54.190Z cpu6:2098034)StorageApdHandler: 1203: APD start for 0x430c44ee95d0 [a16fe90b-d7095fcc]

                2019-06-17T12:49:54.190Z cpu6:2098034)StorageApdHandler: 1203: APD start for 0x430c44eeb4c0 [37c6519b-ec9783e7]

                2019-06-17T12:49:54.190Z cpu4:2097369)StorageApdHandler: 419: APD start event for 0x430c44ee76d0 [3a5eb32c-7141e730]

                2019-06-17T12:49:54.190Z cpu4:2097369)StorageApdHandlerEv: 110: Device or filesystem with identifier [3a5eb32c-7141e730] has entered the All Paths Down state.

                2019-06-17T12:49:54.190Z cpu4:2097369)StorageApdHandler: 419: APD start event for 0x430c44ee95d0 [a16fe90b-d7095fcc]

                2019-06-17T12:49:54.190Z cpu4:2097369)StorageApdHandlerEv: 110: Device or filesystem with identifier [a16fe90b-d7095fcc] has entered the All Paths Down state.

                2019-06-17T12:49:54.190Z cpu4:2097369)StorageApdHandler: 419: APD start event for 0x430c44eeb4c0 [37c6519b-ec9783e7]

                2019-06-17T12:49:54.190Z cpu4:2097369)StorageApdHandlerEv: 110: Device or filesystem with identifier [37c6519b-ec9783e7] has entered the All Paths Down state.

                2019-06-17T12:50:00.969Z cpu2:2098034)StorageApdHandler: 1315: APD exit for 0x430c44eeb4c0 [37c6519b-ec9783e7]

                2019-06-17T12:50:00.969Z cpu4:2097369)StorageApdHandler: 507: APD exit event for 0x430c44eeb4c0 [37c6519b-ec9783e7]

                2019-06-17T12:50:00.969Z cpu2:2098034)StorageApdHandler: 1315: APD exit for 0x430c44ee76d0 [3a5eb32c-7141e730]

                2019-06-17T12:50:00.969Z cpu4:2097369)StorageApdHandlerEv: 117: Device or filesystem with identifier [37c6519b-ec9783e7] has exited the All Paths Down state.

                2019-06-17T12:50:00.969Z cpu2:2098034)StorageApdHandler: 1315: APD exit for 0x430c44ee95d0 [a16fe90b-d7095fcc]

                2019-06-17T12:50:00.969Z cpu4:2097369)StorageApdHandler: 507: APD exit event for 0x430c44ee76d0 [3a5eb32c-7141e730]

                2019-06-17T12:50:00.969Z cpu4:2097369)StorageApdHandlerEv: 117: Device or filesystem with identifier [3a5eb32c-7141e730] has exited the All Paths Down state.

                2019-06-17T12:50:00.969Z cpu4:2097369)StorageApdHandler: 507: APD exit event for 0x430c44ee95d0 [a16fe90b-d7095fcc]

                2019-06-17T12:50:00.969Z cpu4:2097369)StorageApdHandlerEv: 117: Device or filesystem with identifier [a16fe90b-d7095fcc] has exited the All Paths Down state.

                2019-06-17T12:50:32.325Z cpu6:2099723)VSCSI: 6602: handle 8209(vscsi0:0):Destroying Device for world 2099687 (pendCom 0)

                2019-06-17T12:50:32.327Z cpu3:2099715)VSCSI: 6602: handle 8208(vscsi0:0):Destroying Device for world 2099688 (pendCom 0)

                2019-06-17T12:50:32.327Z cpu2:2099723)CBT: 723: Disconnecting the cbt device 2f0796-cbt with filehandle 3082134

                2019-06-17T12:50:32.328Z cpu3:2099715)CBT: 723: Disconnecting the cbt device 31072d-cbt with filehandle 3213101

                2019-06-17T12:50:32.342Z cpu1:2099723)CBT: 1352: Created device 41078e-cbt for cbt driver with filehandle 4261774

                2019-06-17T12:50:32.342Z cpu3:2099715)CBT: 1352: Created device 320792-cbt for cbt driver with filehandle 3278738

                2019-06-17T12:50:32.345Z cpu1:2099723)CBT: 1352: Created device 5107a4-cbt for cbt driver with filehandle 5310372

                2019-06-17T12:50:32.346Z cpu1:2099723)CBT: 723: Disconnecting the cbt device 41078e-cbt with filehandle 4261774

                2019-06-17T12:50:32.346Z cpu3:2099715)CBT: 1352: Created device 2807a7-cbt for cbt driver with filehandle 2623399

                2019-06-17T12:50:32.346Z cpu3:2099715)CBT: 723: Disconnecting the cbt device 320792-cbt with filehandle 3278738

                2019-06-17T12:50:32.346Z cpu1:2099723)CBT: 723: Disconnecting the cbt device 5107a4-cbt with filehandle 5310372

                2019-06-17T12:50:32.346Z cpu3:2099715)CBT: 723: Disconnecting the cbt device 2807a7-cbt with filehandle 2623399

                2019-06-17T12:50:32.347Z cpu3:2099715)CBT: 1352: Created device 2a07a7-cbt for cbt driver with filehandle 2754471

                2019-06-17T12:50:32.348Z cpu1:2099723)CBT: 1352: Created device 5307a4-cbt for cbt driver with filehandle 5441444

                2019-06-17T12:50:32.348Z cpu3:2099715)SVM: 5032: SkipZero 0, dstFsBlockSize -1, preallocateBlocks 0, vmfsOptimizations 0, useBitmapCopy 1, skipPlugGrain 1, destination disk grainSize 0

                2019-06-17T12:50:32.349Z cpu3:2099715)SVM: 5126: SVM_MakeDev.5126: Creating device 2a07a7-3407aa-svmmirror: Success

                2019-06-17T12:50:32.349Z cpu3:2099715)SVM: 5175: Created device 2a07a7-3407aa-svmmirror, primary 2a07a7, secondary 3407aa

                2019-06-17T12:50:32.349Z cpu3:2099715)VSCSI: 3782: handle 8212(vscsi0:0):Using sync mode due to sparse disks

                2019-06-17T12:50:32.349Z cpu3:2099715)VSCSI: 3810: handle 8212(vscsi0:0):Creating Virtual Device for world 2099688 (FSS handle 4327310) numBlocks=41943040 (bs=512)

                2019-06-17T12:50:32.349Z cpu3:2099715)VSCSI: 273: handle 8212(vscsi0:0):Input values: res=0 limit=-2 bw=-1 Shares=1000

                2019-06-17T12:50:32.349Z cpu3:2099715)Vmxnet3: 18569: indLROPktToGuest: 1, vcd->umkShared->vrrsSelected: 3 port 0x200000b

                2019-06-17T12:50:32.349Z cpu3:2099715)Vmxnet3: 18810: Using default queue delivery for vmxnet3 for port 0x200000b

                2019-06-17T12:50:32.349Z cpu1:2099723)SVM: 5032: SkipZero 0, dstFsBlockSize -1, preallocateBlocks 0, vmfsOptimizations 0, useBitmapCopy 1, skipPlugGrain 1, destination disk grainSize 0

                2019-06-17T12:50:32.349Z cpu1:2099723)SVM: 5126: SVM_MakeDev.5126: Creating device 5307a4-3b07ad-svmmirror: Success

                2019-06-17T12:50:32.349Z cpu1:2099723)SVM: 5175: Created device 5307a4-3b07ad-svmmirror, primary 5307a4, secondary 3b07ad

                2019-06-17T12:50:32.349Z cpu1:2099723)VSCSI: 3782: handle 8213(vscsi0:0):Using sync mode due to sparse disks

                2019-06-17T12:50:32.349Z cpu1:2099723)VSCSI: 3810: handle 8213(vscsi0:0):Creating Virtual Device for world 2099687 (FSS handle 3606440) numBlocks=62914560 (bs=512)

                2019-06-17T12:50:32.349Z cpu1:2099723)VSCSI: 273: handle 8213(vscsi0:0):Input values: res=0 limit=-2 bw=-1 Shares=1000

                2019-06-17T12:50:32.350Z cpu1:2099723)Vmxnet3: 18569: indLROPktToGuest: 1, vcd->umkShared->vrrsSelected: 3 port 0x200000d

                2019-06-17T12:50:32.350Z cpu1:2099723)Vmxnet3: 18810: Using default queue delivery for vmxnet3 for port 0x200000d

                2019-06-17T12:50:33.185Z cpu2:2102534)SVM: 2847: scsi0:0 Completed copy in 821 ms. vmmLeaderID = 2099688.

                2019-06-17T12:50:33.223Z cpu0:2102533)SVM: 2847: scsi0:0 Completed copy in 858 ms. vmmLeaderID = 2099687.

                2019-06-17T12:50:33.275Z cpu0:2099715)VSCSI: 6602: handle 8212(vscsi0:0):Destroying Device for world 2099688 (pendCom 0)

                2019-06-17T12:50:33.276Z cpu0:2099715)SVM: 2548: SVM Mirrored mode IO stats for device: 2a07a7-3407aa-svmmirror

                2019-06-17T12:50:33.276Z cpu0:2099715)SVM: 2552: Total # IOs mirrored: 0, Total # IOs sent only to source: 0, Total # IO deferred by lock: 0

                2019-06-17T12:50:33.276Z cpu0:2099715)SVM: 2556: Deferred IO stats - Max: 0, Total: 0, Avg: 1 (msec)

                2019-06-17T12:50:33.276Z cpu0:2099715)SVM: 2570: Destroyed device 2a07a7-3407aa-svmmirror

                2019-06-17T12:50:33.281Z cpu3:2099723)VSCSI: 6602: handle 8213(vscsi0:0):Destroying Device for world 2099687 (pendCom 0)

                2019-06-17T12:50:33.282Z cpu7:2099723)SVM: 2548: SVM Mirrored mode IO stats for device: 5307a4-3b07ad-svmmirror

                2019-06-17T12:50:33.282Z cpu7:2099723)SVM: 2552: Total # IOs mirrored: 0, Total # IOs sent only to source: 0, Total # IO deferred by lock: 0

                2019-06-17T12:50:33.282Z cpu7:2099723)SVM: 2556: Deferred IO stats - Max: 0, Total: 0, Avg: 1 (msec)

                2019-06-17T12:50:33.282Z cpu7:2099723)SVM: 2570: Destroyed device 5307a4-3b07ad-svmmirror

                2019-06-17T12:50:33.335Z cpu1:2099715)CBT: 723: Disconnecting the cbt device 2a07a7-cbt with filehandle 2754471

                2019-06-17T12:50:33.341Z cpu6:2099723)CBT: 723: Disconnecting the cbt device 5307a4-cbt with filehandle 5441444

                2019-06-17T12:50:33.350Z cpu3:2099715)CBT: 1352: Created device 6d09cd-cbt for cbt driver with filehandle 7145933

                2019-06-17T12:50:33.350Z cpu3:2099715)VSCSI: 3782: handle 8214(vscsi0:0):Using sync mode due to sparse disks

                2019-06-17T12:50:33.350Z cpu3:2099715)VSCSI: 3810: handle 8214(vscsi0:0):Creating Virtual Device for world 2099688 (FSS handle 12388969) numBlocks=41943040 (bs=512)

                2019-06-17T12:50:33.350Z cpu3:2099715)VSCSI: 273: handle 8214(vscsi0:0):Input values: res=0 limit=-2 bw=-1 Shares=1000

                2019-06-17T12:50:33.351Z cpu3:2099715)Vmxnet3: 18569: indLROPktToGuest: 1, vcd->umkShared->vrrsSelected: 3 port 0x200000b

                2019-06-17T12:50:33.351Z cpu3:2099715)Vmxnet3: 18810: Using default queue delivery for vmxnet3 for port 0x200000b

                2019-06-17T12:50:33.357Z cpu4:2099723)CBT: 1352: Created device 220ba5-cbt for cbt driver with filehandle 2231205

                2019-06-17T12:50:33.357Z cpu4:2099723)VSCSI: 3782: handle 8215(vscsi0:0):Using sync mode due to sparse disks

                2019-06-17T12:50:33.357Z cpu4:2099723)VSCSI: 3810: handle 8215(vscsi0:0):Creating Virtual Device for world 2099687 (FSS handle 1706919) numBlocks=62914560 (bs=512)

                2019-06-17T12:50:33.357Z cpu4:2099723)VSCSI: 273: handle 8215(vscsi0:0):Input values: res=0 limit=-2 bw=-1 Shares=1000

                2019-06-17T12:50:33.357Z cpu4:2099723)Vmxnet3: 18569: indLROPktToGuest: 1, vcd->umkShared->vrrsSelected: 3 port 0x200000d

                2019-06-17T12:50:33.357Z cpu4:2099723)Vmxnet3: 18810: Using default queue delivery for vmxnet3 for port 0x200000d
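                For anyone who wants to pull only the link-state lines out of a log like this one, here is a minimal Python sketch. The function names (link_events, outages) and the regular expression are my own illustration, not part of any VMware tool, and the pattern is based only on the two message formats visible in this excerpt (igb and ne1000); other drivers may log differently. Note that in the log above the APD state starts at 12:35:42 without any link-down message before it, so link events alone may not show the moment the NIC actually stalled.

```python
import re
from datetime import datetime

# Matches the two link-state formats seen in this thread, e.g.
#   ...)<6>igb: vmnic3 NIC Link is Down
#   ...)INFO (ne1000): vmnic1: Link is Up
# (assumption: other drivers may use a different wording)
LINK_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\s.*?"
    r"(?P<nic>vmnic\d+):?\s+(?:NIC )?Link is (?P<state>Up|Down)"
)

# A few lines copied from the log excerpt above.
SAMPLE = [
    "2019-06-17T12:45:39.351Z cpu3:2097615)<6>igb: vmnic3 NIC Link is Down",
    "2019-06-17T12:45:42.732Z cpu7:2097615)<6>igb: vmnic3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None",
    "2019-06-17T12:45:43.190Z cpu4:2097220)NetqueueBal: 5032: vmnic3: device Up notification, reset logical space needed",
    "2019-06-17T12:49:19.476Z cpu3:2097615)<6>igb: vmnic3 NIC Link is Down",
    "2019-06-17T12:49:51.673Z cpu5:2099927)DEBUG (ne1000): checking link for adapter vmnic1",
    "2019-06-17T12:49:52.679Z cpu3:2097566)INFO (ne1000): vmnic1: Link is Up",
]


def link_events(lines):
    """Return (timestamp, nic, state) for every link up/down line."""
    events = []
    for line in lines:
        match = LINK_RE.search(line.strip())
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%f")
            events.append((ts, match.group("nic"), match.group("state")))
    return events


def outages(events):
    """Pair each Down with the next Up on the same NIC.

    Returns (nic, down_timestamp, seconds_down). A Down without a later Up,
    or an Up without an earlier Down, is not reported.
    """
    down_at = {}
    result = []
    for ts, nic, state in events:
        if state == "Down":
            down_at.setdefault(nic, ts)
        elif nic in down_at:
            start = down_at.pop(nic)
            result.append((nic, start, (ts - start).total_seconds()))
    return result


if __name__ == "__main__":
    for nic, start, seconds in outages(link_events(SAMPLE)):
        print(nic, start.isoformat(), "down for", seconds, "s")
```

                To run it against a real file, pass the lines of the log instead of SAMPLE, for example link_events(open(path)) with path pointing at a copy of vmkernel.log.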

                • 5. Re: dead I/O on igb-nic (ESXi 6.7)
                  Sureshkumar M Expert
                  vExpert

                  Sorry for the late response.

                   

                  The log above does not tell us why the NIC went down. We would have to enable debug logging for the driver to find out what made the NIC go down at that time. However, if this turns out to be a driver issue, we can't do much apart from updating the driver/firmware, which you have already done. Only the NIC vendor can help us.

                   

                  Alternatively, if you see no issues with ne1000, you may use that driver instead of igb.

                  • 6. Re: dead I/O on igb-nic (ESXi 6.7)
                    nague Lurker

                    Exact same behavior here with ESXi 6.5 U3 and an Intel 82576 NIC. Everything was fine in ESXi 6.5 U2.

                    I've updated the igb driver from 5.0.5 to 5.2.5 (the last officially supported version). Let's say it's a "little" better: it now takes two weeks (instead of 2 days) before the NIC stops passing traffic. Unplugging and replugging the Ethernet cable, or remotely taking the port down and up on the switch, solves the issue.

                     

                    Did you find any solution to this issue? Using the ne1000 driver with this NIC is possible, right? How do I switch drivers?
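                    As a possible stopgap in place of re-plugging the cable, the uplink could be taken down and brought back up from the host side. The sketch below only wraps the esxcli network nic down / up commands in Python; the helper names (bounce_commands, bounce) and the pause value are hypothetical. Whether a host-side down/up clears the stall the same way a cable re-plug or a switch-port bounce does is an untested assumption, and it does not address the underlying driver problem.

```python
import subprocess
import time


def bounce_commands(nic):
    """Build the two esxcli calls that take an uplink down and bring it up again."""
    if not (nic.startswith("vmnic") and nic[5:].isdigit()):
        raise ValueError("expected a name like vmnic3, got %r" % nic)
    return [
        ["esxcli", "network", "nic", "down", "-n", nic],
        ["esxcli", "network", "nic", "up", "-n", nic],
    ]


def bounce(nic, pause=5.0, dry_run=True):
    """Bounce the uplink; with dry_run=True only return the commands.

    The 5 second pause is an arbitrary illustrative value.
    """
    down_cmd, up_cmd = bounce_commands(nic)
    if not dry_run:
        subprocess.run(down_cmd, check=True)
        time.sleep(pause)
        subprocess.run(up_cmd, check=True)
    return [down_cmd, up_cmd]


if __name__ == "__main__":
    # Dry run: print what would be executed on the ESXi host.
    for cmd in bounce("vmnic3"):
        print(" ".join(cmd))
```

                    The dry run is the default on purpose: taking down the only connected uplink of a vSwitch from a remote session would cut off that session, so the commands should only be run for real from the local console or over a different uplink.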

                    • 7. Re: dead I/O on igb-nic (ESXi 6.7)
                      monderick Enthusiast

                      We're having the same random issue with Intel Corporation 82576 Gigabit Network Connection QP NICs on our vSphere 6.5 hosts. We opened a support ticket and, of course, the suggestion is to upgrade to the 5.2.5 driver. We're going to proceed, but this thread doesn't make me confident.