I just started using Veeam Backup, and it breaks the network on one of my host servers. I have been having this same issue intermittently for months now, but Veeam Backup triggers it reliably on this host.
This is not a case of the host's resources being overtaxed so that the guests lose network access intermittently during the backup.
The only things on this host are the VCSA VM and my cameras system VM.
Here's what happens: I run a Veeam backup of my camera system VM. As soon as Veeam starts transferring data to my NAS, the VCSA drops off the network, then the camera VM starts to lose connectivity to the cameras, and then it drops off the network too. I've let it sit for a few hours and the guests never regain network access. Rebooting the host brings the network back up.
I did notice a bunch of "VMXNET3 TX HANG" messages on my camera VM. After some googling I switched that VM's NIC to the e1000 vNIC, which had no effect. After some more googling I tried disabling TSO/LRO, first on the guests, then on the host, then on both. None of this had any effect.
I also tried uninstalling open-vm-tools and installing the VMware Tools provided by vSphere, no luck.
This doesn't seem to affect my other VMware host, just the one running my camera system (Ubuntu 16.04) and the VCSA VM (VMware Photon).
Output of the VMkernel log follows.
2019-11-26T06:25:43.536Z cpu22:2111250)WARNING: UserSocketInet: 2266: vsanmgmtd: waiters list not empty!
2019-11-26T06:25:43.537Z cpu22:2111250)WARNING: UserSocketInet: 2266: vsanmgmtd: waiters list not empty!
2019-11-26T06:25:44.609Z cpu10:2097865)DVFilter: 5963: Checking disconnected filters for timeouts
2019-11-26T06:35:44.597Z cpu7:2097865)DVFilter: 5963: Checking disconnected filters for timeouts
2019-11-26T06:45:44.586Z cpu0:2097865)DVFilter: 5963: Checking disconnected filters for timeouts
2019-11-26T06:50:44.655Z cpu11:2099882)VSCSI: 6602: handle 8205(vscsi0:0):Destroying Device for world 2099858 (pendCom 0)
2019-11-26T06:50:44.655Z cpu11:2099882)VSCSI: 6602: handle 8206(vscsi0:1):Destroying Device for world 2099858 (pendCom 0)
2019-11-26T06:50:44.656Z cpu11:2099882)cswitch: VSwitchDisablePTIOChainRemoveCB:1149: [nsx@6876 comp="nsx-esx" subcomp="vswitch"]Remove ptDisable IOChain Handle for port 0x2000008
2019-11-26T06:50:44.656Z cpu11:2099882)NetPort: 1580: disabled port 0x2000008
2019-11-26T06:50:44.671Z cpu11:2099882)CBT: 723: Disconnecting the cbt device 70d63-cbt with filehandle 462179
2019-11-26T06:50:44.773Z cpu11:2099882)CBT: 723: Disconnecting the cbt device 70d62-cbt with filehandle 462178
2019-11-26T06:50:45.568Z cpu11:2099882)CBT: 1352: Created device 80d67-cbt for cbt driver with filehandle 527719
2019-11-26T06:50:45.569Z cpu11:2099882)VSCSI: 3782: handle 8207(vscsi0:0):Using sync mode due to sparse disks
2019-11-26T06:50:45.569Z cpu11:2099882)VSCSI: 3810: handle 8207(vscsi0:0):Creating Virtual Device for world 2099858 (FSS handle 920937) numBlocks=335544320 (bs=512)
2019-11-26T06:50:45.569Z cpu11:2099882)VSCSI: 273: handle 8207(vscsi0:0):Input values: res=0 limit=-2 bw=-1 Shares=1000
2019-11-26T06:50:45.572Z cpu11:2099882)FDS: 617: Enabling IO coalescing on driver 'deltadisks' device '2a0d6b-Ubuntu Zoneminder_1-000001-sesparse.vmdk'
2019-11-26T06:50:45.573Z cpu11:2099882)CBT: 1352: Created device 120d6d-cbt for cbt driver with filehandle 1183085
2019-11-26T06:50:45.573Z cpu11:2099882)VSCSI: 3810: handle 8208(vscsi0:1):Creating Virtual Device for world 2099858 (FSS handle 1314159) numBlocks=8589934592 (bs=512)
2019-11-26T06:50:45.573Z cpu11:2099882)VSCSI: 273: handle 8208(vscsi0:1):Input values: res=0 limit=-2 bw=-1 Shares=1000
2019-11-26T06:50:45.574Z cpu11:2099882)NetPort: 1359: enabled port 0x2000008 with mac 00:0c:29:36:01:fb
2019-11-26T06:50:45.574Z cpu11:2099882)cswitch: VSwitchDisablePTIOChainRemoveCB:1149: [nsx@6876 comp="nsx-esx" subcomp="vswitch"]Remove ptDisable IOChain Handle for port 0x2000008
2019-11-26T06:55:44.574Z cpu7:2097865)DVFilter: 5963: Checking disconnected filters for timeouts
2019-11-26T07:00:44.937Z cpu20:2099643)Vmxnet3: 24930: VMware vCenter Server Appliance,00:0c:29:35:dd:b6, portID(33554439): Hang detected,numHangQ: 2, enableGen: 2
2019-11-26T07:00:44.937Z cpu20:2099643)Vmxnet3: 24939: portID:33554439, QID: 0, next2TX: 2842, next2Comp: 1472, lastNext2TX: 1474, next2Write:1893, ringSize: 4096 inFlight: 4, delay(ms): 2656,txStopped: 0
2019-11-26T07:00:44.937Z cpu20:2099643)Vmxnet3: 24943: portID: 33554439, sop: 1472 eop: 1472 enableGen: 0 qid: 2, pkt: 0x45a242feb680
2019-11-26T07:00:44.937Z cpu20:2099643)Vmxnet3: 24939: portID:33554439, QID: 1, next2TX: 3066, next2Comp: 3068, lastNext2TX: 3070, next2Write:3629, ringSize: 4096 inFlight: 21, delay(ms): 26237,txStopped: 0
2019-11-26T07:00:44.937Z cpu20:2099643)Vmxnet3: 24943: portID: 33554439, sop: 3068 eop: 3069 enableGen: 1 qid: 2, pkt: 0x45a25a276780
2019-11-26T07:00:44.938Z cpu20:2099643)NetSched: 717: 0x2000004: received a force quiesce for port 0x2000007, dropped 239 pkts
2019-11-26T07:00:44.945Z cpu20:2099643)cswitch: VSwitchDisablePTIOChainRemoveCB:1149: [nsx@6876 comp="nsx-esx" subcomp="vswitch"]Remove ptDisable IOChain Handle for port 0x2000007
2019-11-26T07:00:44.946Z cpu20:2099643)NetPort: 1580: disabled port 0x2000007
2019-11-26T07:00:44.953Z cpu20:2099643)Vmxnet3: 18576: indLROPktToGuest: 0, vcd->umkShared->vrrsSelected: 1 port 0x2000007
2019-11-26T07:00:44.953Z cpu20:2099643)Vmxnet3: 18817: Using default queue delivery for vmxnet3 for port 0x2000007
2019-11-26T07:00:44.953Z cpu20:2099643)NetPort: 1359: enabled port 0x2000007 with mac 00:0c:29:35:dd:b6
2019-11-26T07:00:44.953Z cpu20:2099643)cswitch: VSwitchDisablePTIOChainRemoveCB:1149: [nsx@6876 comp="nsx-esx" subcomp="vswitch"]Remove ptDisable IOChain Handle for port 0x2000007
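For anyone reading these logs later: the interesting lines are the per-queue "Vmxnet3: 24939" reports, where `next2Comp` lagging far behind `next2TX` and a large `delay(ms)` mean TX completions have stalled (queue 1 above had been stuck for roughly 26 seconds). A quick way to pull those fields out of the log, as a sketch based only on the line format shown above:

```python
import re

# Sample input: the two per-queue hang reports from the vmkernel log above.
log_lines = [
    "2019-11-26T07:00:44.937Z cpu20:2099643)Vmxnet3: 24939: portID:33554439, "
    "QID: 0, next2TX: 2842, next2Comp: 1472, lastNext2TX: 1474, next2Write:1893, "
    "ringSize: 4096 inFlight: 4, delay(ms): 2656,txStopped: 0",
    "2019-11-26T07:00:44.937Z cpu20:2099643)Vmxnet3: 24939: portID:33554439, "
    "QID: 1, next2TX: 3066, next2Comp: 3068, lastNext2TX: 3070, next2Write:3629, "
    "ringSize: 4096 inFlight: 21, delay(ms): 26237,txStopped: 0",
]

# Matches only the fields we care about; everything between next2Comp and
# inFlight is skipped non-greedily.
PATTERN = re.compile(
    r"QID: (?P<qid>\d+), next2TX: (?P<tx>\d+), next2Comp: (?P<comp>\d+)"
    r".*?inFlight: (?P<inflight>\d+), delay\(ms\): (?P<delay>\d+)"
)

def parse_hangs(lines):
    """Extract per-queue TX state from 'Vmxnet3: 24939' hang reports."""
    rows = []
    for line in lines:
        m = PATTERN.search(line)
        if m:
            rows.append({k: int(v) for k, v in m.groupdict().items()})
    return rows

for q in parse_hangs(log_lines):
    print(f"queue {q['qid']}: {q['inflight']} descriptors in flight, "
          f"oldest stuck for {q['delay'] / 1000:.1f} s")
```

Running this against a full vmkernel.log (e.g. after `grep 'Vmxnet3: 24939'`) makes it easy to see which queues hang and for how long, which is useful evidence when the problem turns out to be below the vNIC layer.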
Which version of ESXi are you using? Build number?
Can you also specify the version of VMware Tools on the virtual machine? Do I understand correctly that the vCenter Server loses connectivity when the camera server VM is being backed up?
ESXI Version: 6.7.0 Update 3 (Build 15018017)
VCSA Version: 6.7.0.41000 (Build: 14836122)
Camera server VMware Tools Version (Open-VM-Tools, installed from APT): version:10304
VCSA VMware Tools Version: version:10309
What is happening is that the guest VMs on this host (camera server and VCSA) are losing network access when Veeam begins transferring data to the NAS.
Yesterday I set up a test VM (Ubuntu 16.04) with the latest version of VMware Tools installed and an e1000 vNIC. This VM also loses network access when the backup starts transferring to the NAS.
Hi,
do you use NBD transport mode with Veeam?
I'm not sure; I only installed Veeam yesterday, so it's using whatever the defaults are. The backup target is an SMB share, so maybe.
I've been having this issue with the guests on this host losing network access intermittently for the past few months, so there is an underlying issue that Veeam is reliably triggering.
Is your Veeam Backup Server a VM? If yes, chances are high that you are using Virtual Appliance mode.
I think I'm using Virtual Appliance mode; the Veeam server is a VM on my other ESXi host, and that Veeam VM is also the backup proxy.
Does a Windows VM face the same issues?
I was able to back up my Windows PRTG VM on my other host without issues. I read that the Veeam VM should be excluded from backing up itself, so I haven't tried to back up that VM.
My vSphere license doesn't include vMotion, so I can't migrate the PRTG VM to the other host to see whether backing it up crashes the networking or not.
You can try switching to NBD transport mode. What does your vSwitch / VMkernel port setup look like?
Unfortunately I don't have a spare physical Windows machine to make NBD mode work; we are an all-Mac shop except for the servers/VMs, and all the Macs are currently in use.
I have been running into this issue on occasion for months, and I only installed Veeam yesterday. Veeam is just triggering whatever the underlying issue is.
Screenshots of the vSwitch and VMkernel NIC are attached.
The configuration of the EtherChannel on my switch follows:
interface Port-channel3
switchport trunk encapsulation dot1q
switchport mode trunk
switchport nonegotiate
!
interface GigabitEthernet1/0/5
switchport trunk encapsulation dot1q
switchport mode trunk
switchport nonegotiate
channel-group 3 mode on
!
interface GigabitEthernet1/0/6
switchport trunk encapsulation dot1q
switchport mode trunk
switchport nonegotiate
channel-group 3 mode on
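One thing worth double-checking with this config: `channel-group 3 mode on` creates a static EtherChannel, and VMware requires the vSwitch teaming policy to be "Route based on IP hash" when the uplinks are bundled this way; any other policy can cause exactly this kind of connectivity flapping. The idea behind IP hash is that each source/destination IP pair deterministically maps to one uplink. As a rough illustration only (a simplified model, not ESXi's actual implementation):

```python
import ipaddress

def ip_hash_uplink(src_ip: str, dst_ip: str, num_uplinks: int) -> int:
    """Simplified model of IP-hash uplink selection: XOR the two
    addresses and take the result modulo the number of active uplinks.
    The real ESXi algorithm differs in detail, but the key property is
    the same: a given IP pair always lands on the same uplink."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % num_uplinks

# The same flow always maps to the same uplink, so per-flow ordering is
# preserved; different flows can spread across the two bundled NICs.
print(ip_hash_uplink("192.168.1.10", "192.168.1.50", 2))  # prints 0
print(ip_hash_uplink("192.168.1.10", "192.168.1.51", 2))  # prints 1
```

If the teaming policy on the vSwitch is not IP hash while the switch side runs a static port-channel, the switch can deliver frames on a link the host doesn't expect, which shows up as MAC flapping and dropped guest traffic.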
Please switch to NBD and test the backup.
Well, I think I'm going to have to move the VCSA to the other ESXi host and rebuild the host that's having network issues as a physical Ubuntu box for the camera system, unless someone knows what the log messages mean.
Anyone have an idea what's going on here?
I tried unplugging the cable from the second NIC and then running the backup. The host itself became inaccessible this time. I was able to get the network back up, for the host as well as the VMs, by disconnecting and reconnecting the cable.
Issue resolved.
I had to update the driver for the Intel 82576 NICs in my server from the igb version 5.0.5.1.1-5vmw driver that came bundled with ESXi to the igb version 5.2.5 driver supplied by Intel.
The updated driver can be downloaded from here: VMware vSphere 5: Private Cloud Computing, Server and Data Center Virtualization
Instructions for installing the driver are in the readme file that comes bundled with the driver.
Thank you for your update and the solution to your problem!