FYI: our experience with ESX 3.5 U4 required a rollback (5 x PE2900, EMC Clariion, Emulex, Intel NICs). We had VM connectivity issues across all hosts: some users could connect to (random) VMs, others couldn't.

We have three VMware environments: Dell PE R900 x 2, Dell PE 2950 x 2, and Dell PE 2900 x 5. All three use EMC Clariion back ends, Emulex HBAs, Intel multi-port NICs (PRO/1000 PT, 82546GB, both supported), and Cisco 65XX switches using large EtherChannels. The EtherChannels have been in place since Nov 2008. The 2900s are maxed out with RAM and NICs (5 x 4-port, 16 ports in use, onboard NICs not being used), plus an Emulex HBA.

After updating the PE2900s to U4 we experienced network connectivity issues on (random) VMs. One user could reach a VM, another couldn't. Moving the VM to another host would fix it, yet another VM on that same host wouldn't work. Every host was having random problems with VM network connectivity; VMs would work fine for some users but not others. We could move VMs around to fix the issue, but in some cases moving the VM would fix it for one person and break it for another. We also tried changing the subnet mask on VMs that people couldn't connect to; that didn't help either.
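For anyone trying to pin down a similar "works for one user, not for another" pattern, here is a rough sketch of how the reachability could be recorded from each affected workstation. This is not what we ran, just an illustration. It only tests whether a TCP port answers (it does not ping), and the function name, addresses, and ports are made up for the example.

```python
import socket


def check_reachability(targets, timeout=2.0):
    """Attempt a TCP connection to each (host, port) pair.

    Returns a dict mapping (host, port) to True if the connection
    succeeded and False if it was refused, timed out, or otherwise failed.
    """
    results = {}
    for host, port in targets:
        try:
            # create_connection raises OSError (including timeouts) on failure
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = True
        except OSError:
            results[(host, port)] = False
    return results
```

Calling something like check_reachability([("192.0.2.10", 3389), ("192.0.2.11", 22)]) (placeholder addresses) from several client machines and comparing the results side by side should show quickly whether the failures depend on the client/VM pair rather than on a single VM or host.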

We get our VMware support through Dell; they reviewed our support logs. We removed individual NICs from the vSwitch in ESX, but that didn't fix the issue. The only thing we didn't do was remove the EtherChannel from the switch: we figured that would probably work, but it wouldn't solve the underlying problem.

Next step was to rebuild a single 2900 host from scratch using a U4 install disk. We did this and were immediately able to duplicate the problem with the first VM that we moved to the rebuilt host.

Next step was to install the post-U4 patches and try again. No luck; we were able to duplicate the connectivity problem again.

Final step was to rebuild from scratch with a U3 install disk. This fixed the problem with connectivity.

We rebuilt the entire PE2900 farm on U3 and the problem is now fixed. Our other two farms appear OK; the issue appears specific to the PE2900s.

I just wanted to describe our issues here in case others come across the same thing. Since the connectivity problems differed from one user to the next, it took us a while to determine that the problem was indeed with ALL of our PE2900s. Perhaps this issue will be fixed some day, but for now U4 is a no-go for us.

If someone from VMware wants our support logs, let me know and I will provide them along with any other information they might want.

