VMware Cloud Community
jvvmware
Contributor

network issue after upgrade to ESXi 7.0 Update 3l

After I upgraded to ESXi 7.0 Update 3l, I ran into issues with guests not having network connectivity. All the settings were still correct, and I also checked the physical switch the host was connected to. I had upgraded from ESXi 7.0 Update 3k using the update workflow in vCenter. I tried rebooting, which didn't help, so I ended up rolling back, and it's working again. Just curious whether anyone has seen the same issue and, if so, what the resolution was. Thanks in advance.

30 Replies
jvvmware
Contributor

I'm running only Intel NICs, so if I'm correct I'm not even using the ntg3 driver.
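If you want to double-check which driver your NICs are actually bound to, the ESXi shell can list it (the Driver column shows ntg3, igbn, ixgben, and so on):

esxcli network nic list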

KenWirz
Contributor

I received an update on my case to try this: https://kb.vmware.com/s/article/88875

I will try it on Wednesday. If anyone experiencing the issue tries it before then, please let me know how it goes.

mezzads
Contributor

I had the same issue: my port groups' load-balancing policy reverted to Route Based on Originating Virtual Port instead of Route Based on IP Hash.

I ran the following in PowerCLI to fix the issue:

#Change vcenter.domain.net to the FQDN of your vCenter Server.
Connect-VIServer -Server vcenter.domain.net

#Change hostname.domain.net to the FQDN of the host that you are updating.
$vmhost = 'hostname.domain.net'

#Change vSwitch0 to the vSwitch that is experiencing the issue.
$portgroups = Get-VirtualPortGroup -VirtualSwitch vSwitch0 -VMHost $vmhost | Select-Object -ExpandProperty Name | Sort-Object

#Loop through each port group on the vSwitch.
foreach ( $portgroup in $portgroups ) {
    #Output the name of the current port group.
    $portgroup
    Write-Output ""

    #Get the teaming policy and place it in the $policy1 variable.
    $policy1 = Get-VirtualPortGroup -VirtualSwitch vSwitch0 -VMHost $vmhost -Name $portgroup | Get-NicTeamingPolicy

    #List the current policy, mainly for review.
    Write-Output "Load Balancing Policy: "
    $policy1.LoadBalancingPolicy

    #Set the policy to Route Based on IP Hash.
    $policy1 | Set-NicTeamingPolicy -LoadBalancingPolicy LoadBalanceIP -WhatIf #Remove the -WhatIf when you are ready to run it.
    Write-Output ""
    Write-Output ""
}
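If vCenter itself is unreachable because of the outage, the same change should be possible straight from the ESXi shell. A minimal sketch, assuming a standard vSwitch named vSwitch0 and a port group named "VM Network" (substitute your own names):

esxcli network vswitch standard policy failover set --vswitch-name=vSwitch0 --load-balancing=iphash
esxcli network vswitch standard portgroup policy failover set --portgroup-name="VM Network" --load-balancing=iphash

The first command resets the switch-level default; the second overrides it for a single port group.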

PatrickDLong
Enthusiast

This issue (as well as the ntg3 driver issue) is fixed in ESXi 7.0 Update 3m, build 21686933, released TODAY, 2023-05-03.

From:
https://docs.vmware.com/en/VMware-vSphere/7.0/rn/vsphere-esxi-70u3m-release-notes.html

PR 3164897: After an upgrade to ESXi 7.0 Update 3l, some ESXi hosts and virtual machines connected to virtual switches might lose network connectivity

After an upgrade to ESXi 7.0 Update 3l, some ESXi hosts, their VMs, and other VMkernel ports, such as ports used by vSAN and vSphere Replication, which are connected to virtual switches, might lose connectivity due to an unexpected change in the NIC teaming policy. For example, the teaming policy on a portgroup might change to Route Based on Originating Virtual Port from Route Based on IP Hash. As a result, such a portgroup might lose network connectivity and some ESXi hosts and their VMs become inaccessible.  

AND

PR 3182870: After upgrading the ntg3 driver to version 4.1.9.0-4vmw, Broadcom NICs with fiber physical connectivity might lose network connectivity

Changes in the ntg3 driver version 4.1.9.0-4vmw might cause link issues at the fiber physical layer, and connectivity on some NICs, such as Broadcom 1Gb, fails to come up.

This issue is resolved in this release. ESXi 7.0 Update 3m provides ntg3 driver version 4.1.9.0-5vmw. The fix also adds a module parameter, fifoElastic, which you can enable in case of jumbo frame drops in certain Dell switches. To enable the parameter, use the following command:
esxcli system module parameters set -p 'fifoElastic=1' -m ntg3
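To verify the parameter afterwards, you can list the module's current parameter values:

esxcli system module parameters list -m ntg3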

If you are already experiencing this problem with Teaming and Failover load balancing after applying 7.0 U3l, I do not believe that applying the new 7.0 U3m release will revert any now-incorrect teaming policy back to what it was before you applied 7.0 U3l; you will likely need to do that manually. But honestly, I have not yet applied U3m, so I cannot say for sure.
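If you want a quick read-only audit of where your teaming policies stand before and after patching, a small PowerCLI sketch (assuming an existing Connect-VIServer session; 'hostname.domain.net' is a placeholder) could be:

#List every port group on the host with its current load-balancing policy.
$vmhost = Get-VMHost -Name 'hostname.domain.net'
Get-VirtualPortGroup -VMHost $vmhost |
    ForEach-Object { "{0}: {1}" -f $_.Name, ($_ | Get-NicTeamingPolicy).LoadBalancingPolicy }

Comparing the output from before and after the upgrade makes any silently changed policy easy to spot.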

KenWirz
Contributor

Thank you for the post. I put one of my hosts back on build 3l a second time so that I could work with VMware on a solution. The experience the second time was different from the first. During the first upgrade, VMs were unreachable on the upgraded host. After the second upgrade, my dedicated vMotion NIC couldn't communicate with the other host (we moved the vMotion service to a working NIC, and VMs were usable on the upgraded host). After hours of troubleshooting with VMware network support, I was told to work with my internal network team, as everything appeared correct on the VMware side of things. I was calling it a day when I saw your post. Today I reverted my upgraded host, upgraded it to 3m, and everything worked as expected. I migrated all of my VMs to the upgraded host, updated the other host to 3m, and rebalanced the load. All good now.

Hyman-Y
Contributor

We faced the same issue on an HPE DL380 Gen10 after upgrading to 7.0 U3l last weekend. We have two VMs on the same host and the same vSwitch: one Windows 10 and one Server 2019. After the upgrade, the Windows 10 VM worked well; it could communicate both with the Server 2019 VM and with servers outside the ESXi host. But the Server 2019 VM could only communicate with the Windows 10 VM; all communication with servers outside the ESXi host was lost. We finally solved the problem by disconnecting the virtual NIC on the Server 2019 VM and then reconnecting it. Some details:

1. The server's network adapter is an I350, which does not use the ntg3 driver.
2. We tried to ping the gateway and it timed out (ping after arp -d). But the ARP entry was created with the correct MAC address, which means the request was sent and the ARP reply was received successfully. Further communication still failed, however.
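For anyone repeating that check, the commands from an elevated prompt inside the guest (assuming 192.168.1.1 as the gateway; substitute your own) are roughly:

arp -d *
ping 192.168.1.1
arp -a

If arp -a shows the gateway with the correct MAC while the pings still time out, the ARP exchange is completing but the follow-on traffic is being dropped, which matches what we saw.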
berniewell
Contributor

We installed update 7.0 U3l on a Nutanix platform yesterday and ran into an issue too. Yes, some port groups on a few hosts had their teaming settings overridden, but in our case they were overwritten with the same setting, so no issue there. What happened is that a few port groups on some hosts had their VLAN ID removed, and some port groups even had their VLAN ID replaced by a different one. We put back the correct VLAN IDs and everything is OK now.
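To audit that quickly, a read-only PowerCLI sketch (assuming an existing vCenter connection; 'hostname.domain.net' is a placeholder) that lists each port group's VLAN ID on a host:

#List every port group on the host with its configured VLAN ID.
Get-VirtualPortGroup -VMHost (Get-VMHost -Name 'hostname.domain.net') |
    Select-Object Name, VLanId |
    Sort-Object Name

Comparing this against a known-good host (or a pre-upgrade export) shows removed or changed VLAN IDs at a glance.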

I opened a case. They're redirecting me to the KB, but I think we encountered another bug.

Hope this helped! 

nlies99
Contributor

Confirming PatrickDLong's post: U3m appears to fix the teaming issues introduced in U3l, tested on an HP ProLiant DL380 Gen9.

jvvmware
Contributor

I upgraded to ESXi 7.0 Update 3m yesterday with no problems as well.

bguerzize
Contributor

I'm facing the same problem with the same hardware as yours!

So if I understand correctly, the only solution for now is to revert back to the previous ESXi version?

bguerzize
Contributor

Now it's working after patching the server.
