VMware Cloud Community
GeorgeSyps
Contributor

Added additional ESXi 5.5 hosts, now have intermittent network problems with VMs

Environment:

Ten of the latest ESXi 5.5 hosts (build 7967571) running on UCS blades, imaged from the newer install ISO and updated through vCenter.

Ten more ESXi 5.5 hosts (build 7967571) running on UCS blades, imaged from an older install ISO (from 2015, I believe), recently updated through vCenter and added to the same cluster as the first ten.

After migrating some VMs to the new hosts, we had intermittent network connectivity problems as follows:

Some VMs could not be contacted by the monitoring systems outside their vSwitch.

Some VMs could ping other VMs inside their subnet, but only the ones with lower IP addresses (they could ping .26 but not .211, for instance).

Some VMs could not map their NFS shares on a different vSwitch on a different subnet, but could still reach the DNS servers.

Sometimes a soft reboot from inside a VM would fix the problem with that VM.

Sometimes a power cycle through vCenter would fix the problem with that VM.

Sometimes migrating the VM back to the old ESXi hosts would fix the problem with that VM.
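To narrow down the .26-vs-.211 symptom above, it helps to sweep the subnet from an affected VM and record exactly which peers answer, so you can see whether the cutoff is consistent. A minimal sketch (the 10.0.0.x subnet and target addresses are placeholders; substitute the affected VM's subnet):

```python
import shutil
import subprocess

def reachable(hosts, pinger=None):
    """Return {host: True/False} for each address in hosts.

    pinger is injectable so the logic can be tested without a network;
    the default sends one ICMP echo via the system `ping` (Linux flags).
    """
    if pinger is None:
        def pinger(host):
            return subprocess.run(
                ["ping", "-c", "1", "-W", "1", host],
                stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
            ).returncode == 0
    return {host: pinger(host) for host in hosts}

if __name__ == "__main__" and shutil.which("ping"):
    # Placeholder subnet; widen the range to sweep the whole /24.
    targets = ["10.0.0." + str(i) for i in (26, 211)]
    for host, ok in reachable(targets).items():
        print(host, "OK" if ok else "UNREACHABLE")
```

Running the same sweep before and after a soft reboot or vMotion back to an old host would show whether the reachable set actually changes with the fix.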

The UCS firmware was at 2.2(3f) on the new, problematic UCS blades. We changed them all to 2.2(2c) to be consistent with the others.

The "enic" device firmware on them showed 4.0(1f) after the update (it was 4.0(2f) before the downgrade), while the firmware on the other, non-problematic hosts is 2.2(2c) and their enic driver is 2.1.2.71. Since yesterday, the enic device firmware on the new hosts has been showing as 2.2(2c); I think this was caused by the SA rebooting all the new hosts after updating the firmware on the chassis.

The "enic" driver on the new ESXi hosts is 1.4.2.15c after the downgrade; it was 2.1.2.71 before.
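To keep the driver/firmware comparison across twenty hosts straight, the versions above can be read from `esxcli network nic get -n vmnicX` on each host. A rough parsing sketch; vmnic0 is just an example uplink name (run `esxcli network nic list` first), and the field names are per the stock esxcli output, so verify against your hosts:

```python
import re
import shutil

def parse_enic_info(esxcli_output):
    """Extract the Driver, driver Version, and Firmware Version fields
    from the output of `esxcli network nic get -n vmnicX`."""
    info = {}
    for line in esxcli_output.splitlines():
        m = re.match(r"\s*(Driver|Firmware Version|Version):\s*(\S+)", line)
        if m:
            info[m.group(1)] = m.group(2)
    return info

if __name__ == "__main__" and shutil.which("esxcli"):
    import subprocess
    # Only runs on a host where esxcli is present.
    out = subprocess.run(["esxcli", "network", "nic", "get", "-n", "vmnic0"],
                         capture_output=True, text=True).stdout
    print(parse_enic_info(out))
```

Collecting this from every host into one table makes it obvious which hosts still differ after the SA's reboots.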

The downgrade on the new hosts seems to have stabilized the environment, but we are worried there may be an interoperability problem somewhere. We are hesitant to update all the hosts to the new firmware, because we don't know whether the problem was due to interoperability or to a problem with the newer firmware, 2.2(3f), that came with the new installer ISO for the hosts.

Does anyone know how to troubleshoot intermittent network problems caused by adding new ESXi hosts to a cluster? Where can we review logs that might indicate other problems that aren't having a direct impact at the moment?

Thanks,

George

3 Replies
RajeevVCP4
Expert

Are you using a Vblock?

Are you using the N1K or the VMware DVS?

What UCS type?

Rajeev Chauhan
VCIX-DCV6.5/VSAN/VXRAIL
Please mark as helpful or correct if my answer is useful for you
IT_pilot
Expert

Did you update the virtual hardware version and VMware Tools?

Maybe the network adapter type is different?

http://it-pilot.ru
dconradSAP
Contributor

The system is using VDS, and the UCS firmware on the 440 blades was updated before the ESX servers were brought into vCenter and then updated. The firmware update was pushed through a profile for the blades. The UCS 5108 chassis had been neglected for some time; the firmware version we reverted to had already been installed years earlier. Since reverting, the problems have not reappeared.
