vSphere 7.0 - whole port group lost connectivity suddenly
My DellEMC servers were previously running VMware vSphere 6.7 u3, and everything was working as expected.
In December 2021, I updated the servers to VMware vSphere 7.0 u2, and preliminary testing suggested that everything was still working as expected.
However, starting around May-June 2022 (I don't know the exact date), one particular port group configured on vSphere suddenly lost all connectivity. None of the VMs in that port group can reach the gateway (a firewall/router) or any other VM in the same port group (e.g. the 'ip neighbour' command shows no ARP neighbours).
Essentially, no network traffic is leaving the individual VMs ('tcpdump' shows no output, and the firewall logs show no traffic from these VMs reaching the gateway). There were no firewall rule changes that would affect these VMs in any way.
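For anyone trying to reproduce the 'ip neighbour' check above consistently across several guests, a small parser can summarise which neighbour entries actually resolved to a MAC address. This is a minimal sketch, assuming a Linux guest with iproute2's `ip` command and Python 3.9+; the function names and the set of states treated as "resolved" are illustrative choices, not part of any VMware or iproute2 tooling. On a VM in the failing port group you would expect every entry (if there are any) to be INCOMPLETE or FAILED, whereas a VM on a working port group should show at least the gateway as REACHABLE or STALE.

```python
import subprocess

# Neighbour states in which the kernel holds a usable MAC address.
# INCOMPLETE and FAILED mean the ARP/ND requests went unanswered.
# (Which states count as "resolved" is an assumption of this sketch.)
RESOLVED_STATES = {"PERMANENT", "NOARP", "REACHABLE", "STALE", "DELAY", "PROBE"}


def parse_ip_neigh(output: str) -> list[dict]:
    """Parse `ip neighbour show` output into dicts with address, dev, lladdr, state."""
    entries = []
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The state (REACHABLE, FAILED, ...) is printed as the last field.
        entry = {"address": fields[0], "dev": None, "lladdr": None, "state": fields[-1]}
        for key in ("dev", "lladdr"):
            if key in fields:
                i = fields.index(key)
                if i + 1 < len(fields):
                    entry[key] = fields[i + 1]
        entries.append(entry)
    return entries


def resolved_neighbours(entries: list[dict]) -> list[dict]:
    """Keep only entries that have a MAC address and a non-failed state."""
    return [e for e in entries if e["lladdr"] and e["state"] in RESOLVED_STATES]


def main() -> None:
    try:
        out = subprocess.run(
            ["ip", "neighbour", "show"], capture_output=True, text=True, check=True
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"could not run 'ip neighbour show': {exc}")
        return
    entries = parse_ip_neigh(out)
    ok = resolved_neighbours(entries)
    print(f"{len(ok)} of {len(entries)} neighbour entries resolved")
    for e in entries:
        print(f"  {e['address']:<40} {e['state']:<12} {e['lladdr'] or '-'}")


if __name__ == "__main__":
    main()
```

Running the same script on one VM in the broken port group and one in a working port group gives a quick side-by-side view of whether ARP resolution fails only in the affected port group.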
I suspect the issue is at the vSphere level, since other physical machines and network devices in the same subnet have no issues, and all other port groups that run on the same physical NIC also have no issues.
Here is a list of things I have tried in an attempt to fix the issue:
Unplugging the network cable and plugging it back in
Rebooting the whole vSphere host
Monitoring the firewall for rules that might be blocking the traffic (conclusion: no incoming traffic means there is nothing to block)
None of the above worked. The only workaround that allowed the VMs to send traffic out was moving them to another, known-working port group.
None of the vSphere logs show any anomalies, as far as I can tell.
Does anyone in the community have an idea what might have caused this issue and, if possible, how it can be resolved? (I'm trying to avoid a full OS downgrade where possible...)