VMware Cloud Community
WindSupport
Contributor

Networking issues after upgrade ESXi 7.0.3 to 21930508

Hi,

We run a cluster of eight ESXi hosts plus a number of standalone hosts, all connected to one vCenter. We upgraded all machines to the 21930508 patch and are seeing a number of issues since the upgrade. Our hosts are Dell servers; the affected machines span several models: R7515, T350, T340.

VMs suddenly lose their network connections, and services that depend on the network hang inside the VMs. The guest OS doesn't matter: we see the issue with Windows Server 2016, 2019 and 2022 as well as Ubuntu Linux.

Sometimes the VM itself hangs so that a shutdown via VMware Tools is no longer possible. Machines that generate network load through many connections seem to fail more often. With some VMs this happened more than once a day.

We have now reverted the systems to 21686933 and the problems disappeared immediately and have not appeared again. Is this a known issue? Are there any solutions?

BR, Ingo

17 Replies
Noobus
Contributor

What is the latest Dell image installed on the ESXi hosts?

Sachchidanand
Expert

What is your vCenter version?

Regards,

Sachchidanand

WindSupport
Contributor

The machines were installed at different times, so they originally had different images.

The affected machines in our cluster started with DEL-ESXi-703_19193900-A02, as did some of the standalone hosts. Others were originally on newer images.

WindSupport
Contributor

vCenter is on 7.0.3 Build: 21784236.

Sachchidanand
Expert

Upgrade your vCenter to build 21958406, then upgrade your ESXi hosts and check whether that solves your issue.

Regards,

Sachchidanand

WindSupport
Contributor

Are you sure that this could be the source of the problem? As far as I can see, the only change is something related to Kubernetes, which is not part of our license anyway. According to the release notes there are no other changes, which is why I haven't installed this update yet. I can try it.

grax88
Contributor

We have the same problem: a cluster of Dell R7525 hosts. After upgrading to the latest build we see connectivity failures in various VMs, especially the most heavily loaded ones. The network adapter in Windows still shows as up, but nothing is reachable; after disabling the adapter it cannot be re-enabled, and on reboot the machine ends up in a BSOD with a "DRIVER POWER STATE FAILURE" message. So far (hopefully) this is solved by rolling back (reinstalling) to an older build. The server firmware was updated at the same time as the VMware upgrade. If anyone has more information, please share it!

grax88
Contributor

One note: we are using a dynamic switch on our hosts.

WindSupport
Contributor

I upgraded the vCenter and did not expect any change. And indeed: no change. The issue is still there.

grax88
Contributor

Hello, I opened a case with VMware but have no response yet. May I ask:

- are you using a dynamic or a static switch?
- after the downgrade, is everything OK?

I suspect some Dell firmware, because we updated it at the same time, but the fact that you see the issue on many different machines has (so far) discouraged me from opening a ticket with Dell as well.

Kinnison
Commander

Hello,


Pardon the question, what do you mean by the term "dynamic switch" ?


Regards,
Ferdinando

grax88
Contributor

Sorry, distributed 🙂

Kinnison
Commander

Hello,


Now it's a little clearer. From my very personal point of view, I would first check which driver governs your network cards: the one released by the vendor, or the native one included with the generic ESXi distribution. Check the firmware level too; something classified as "recommended" has been released over the last few months, and while we're at it, it seems to me that a BIOS update for your systems was released a few days ago classified as "urgent".


Apart from that, in addition to waiting for a response to your ticket, some more details could help narrow down the problem; this is a forum after all, and it is always difficult to diagnose remotely. I would also take a look at the logs of your network devices and check their firmware level; I have seen "strange" things caused by defects in them spreading through the entire network.
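A minimal sketch of those checks from the ESXi shell, assuming standard esxcli commands (vmnic0 is just an example NIC name; adjust to your hosts):

```shell
# List all physical NICs together with the driver each one uses
esxcli network nic list

# Show driver and firmware versions for one NIC (vmnic0 is an example)
esxcli network nic get -n vmnic0

# List installed driver VIBs to see vendor (OEM) vs. inbox versions
esxcli software vib list
```

Comparing the "Driver Info" output against the VMware HCL tells you whether the inbox or the vendor async driver is active.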


Regards,
Ferdinando

grax88
Contributor

We updated the server firmware together with the VMware upgrade. I have to say that since reverting the vSphere build we have (hopefully) had no problems so far. I noticed that the update loaded a new network card driver:

from:
bnxtnet 223.0.152.0-1OEM.700.1.0.15843807
To:
bnxtnet 225.0.131.0-1OEM.700.1.0.15843807

I thought about opening the problem with Dell as well, but I was discouraged by the different server types in the initial post, which presumably means different network card drivers.

Kinnison
Commander

Hello,


Different models and generations of systems do not preclude using network cards from the same manufacturer, which consequently require the same driver. Apart from this, I would not at all exclude that the source of the problem is somehow a consequence of the different driver version introduced by the update, nor would I exclude that the very recent BIOS update, besides correcting some "vulnerabilities", carries "something else" that is not documented.


And yes, IMHO it would also be worth exchanging words with Dell, because according to the HCL there is an even more updated driver, which however I don't see included in the custom ISO / vendor add-on (maybe I'm wrong).


Regards,
Ferdinando

aklausen
Contributor

We had this issue after upgrading to 7.0.3n on our Dell servers. We tried upgrading to 8.0.1c with a Dell image, but the issue remained. After working with VMware support for a while, we found a known issue with the Broadcom bnxtnet driver. We were at a 225.x.x.x version of the driver, upgraded to 226.0.145.4-1, and the issue is gone.
Command to check the driver on an ESXi host: esxcli network nic get -n vmnic0


Link to solution: Windows VMs experience full or partial loss of network connectivity on ESXi hosts using certain vers...

Link on VMware Compatibility Guide for Broadcom drivers: VMware Compatibility Guide - I/O Device Search
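For anyone landing here later, a rough sketch of that driver update, assuming the newer bnxtnet driver has been downloaded as an offline bundle (the /tmp path below is a placeholder, not the real file name):

```shell
# Enter maintenance mode before touching drivers
esxcli system maintenanceMode set --enable true

# Install the updated bnxtnet driver from the downloaded offline bundle
# (/tmp/bnxtnet-offline-bundle.zip is a placeholder path)
esxcli software vib install -d /tmp/bnxtnet-offline-bundle.zip

# Reboot the host, then verify the driver version that is actually loaded
esxcli network nic get -n vmnic0 | grep -i version
```

A reboot is required before the new driver loads; check every vmnicX that uses bnxtnet, not just vmnic0.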

kacyk
Contributor

Same symptoms in a Nutanix NX-8150-G8 environment, however with different NIC adapters: Mellanox Technologies ConnectX-5 EN NIC; 10/25GbE; dual-port SFP28; PCIe3.0 x8 (MCX512A-ACU).

We had to upgrade to mitigate a CPU microcode issue, and now the environment starts failing under load. This causes major issues, as Nutanix presents software-defined storage over NFS to the hypervisor, so we see lots of storage performance degradation and vMotion timeouts. Ticket open with VMware.
