VMware Cloud Community
bigdee
Enthusiast

Strange Network issue

Hi

We have an ESX farm with 3 ESX hosts (HP DL580 G3). The hosts have two onboard NICs (Broadcom) and four dual-port Intel NICs (HP NC7170).

The hosts run on ESX 2.5.3 without any problems.

Now we wanted to migrate to ESX 3.0.1. We moved all VMs off one host, did a fresh install of ESX 3.0.1 on the host, installed all patches, provided new LUNs, configured networking and tried to migrate some VMs to the ESX 3.0.1 host. The migration itself was fine, but the VMs have strange network issues. The VMs can only ping some IPs in their own subnet: some addresses respond and some don't. That's strange! We already tried different NICs with different virtual switches.

Also, when building a new VM, there is the same issue.

I already did another migration the same way without any problems.

Does anybody have an idea?

Thanks

CU

I also removed the virtual NIC from the VM and re-added it. Same problem.

6 Replies
Erik_Zandboer
Expert

Hi, check the settings of your NICs. If you have gigabit, make sure all interfaces are fixed at 1 Gbps (or that they all negotiate correctly). Check this on the ESX side, but also on the pswitch side. Also, check for any errors on the pswitch ports.

Sometimes when you migrate, the old NICs remain somewhere in the hardware list, which shows a warning when you (re)set the same IP address on the VM. Did you see this warning?

Finally, make sure your gateway and subnets match.
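A quick way to sanity-check that last point is a small script. This is only a sketch: the addresses and the mask below are made-up examples, not values from the affected hosts, and `same_subnet` is a hypothetical helper name.

```python
import ipaddress

def same_subnet(ip_a: str, ip_b: str, netmask: str) -> bool:
    """Return True if both addresses fall into the same network for the given mask."""
    # strict=False lets us pass a host address instead of the network address.
    net_a = ipaddress.ip_network(f"{ip_a}/{netmask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{netmask}", strict=False)
    return net_a == net_b

# Example values only (assumptions): a VM, its gateway, and a /24 mask.
vm_ip = "192.168.10.50"
gateway_ip = "192.168.10.1"
mask = "255.255.255.0"

if same_subnet(vm_ip, gateway_ip, mask):
    print("VM and gateway are in the same subnet")
else:
    print("Mismatch: check the VM's IP, mask and gateway")
```

If the VM and its gateway do not land in the same network, traffic leaving the subnet will fail no matter how the vswitch is configured.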

Visit my blog at http://www.vmdamentals.com
esiebert7625
Immortal

What is your vswitch configuration? Are you using 802.1Q VLAN tagging? With gigabit NICs it is better to set them to auto/auto at the NIC and on the physical switch they are plugged into. Post the output of the following commands: "esxcfg-nics -l" and "esxcfg-vswitch -l".

http://www.vmware.com/community/thread.jspa?messageID=568366&#568366

bigdee
Enthusiast

Unfortunately I had to go back to ESX 2.5.3 so I cannot post the output of the commands at the moment.

I will have to wait till our next maintenance window.

I have the default vswitch configuration, no 802.1Q VLAN tagging. I tried the NICs with auto and fixed settings. I also tried assigning only one NIC to a vswitch to eliminate problems with the teaming.

The VMs were migrated both by VMotion with storage reallocation and by cold migration. The problem exists with both. The VMs are all in the same subnet (mask 255.255.255.0).

I have never seen such behavior before.

CU

rDale
Enthusiast

It sounds like you're using IP hash but don't have the switch set up with EtherChannels correctly.

I would check your configuration: make sure that your switches are not set to use EtherChannels, and set the vswitch to use port ID.

If you need the port aggregation, then you'll need to set up EtherChannel groups for the ports, set the switch to use "port-channel load-balance src-dst-ip", and set the vswitch to use IP hash.

bigdee
Enthusiast

Thanks.

I also thought about the teaming configuration, so I tried the different settings (route based on port ID, IP hash, and explicit failover) with one or more NICs connected. Same behavior. I think I will have to go through this with our network team next week.

The strange thing: the same server with the same network connections and ESX 2.5.3 is working fine. To get the same vswitch teaming setting in ESX 3 as in ESX 2.5.3, I have to set the vswitch to "Route based on the originating virtual port ID", right? (That is the default setting.)

Thanks

CU

rDale
Enthusiast

If your switches are set up with the defaults and no EtherChannels, then your best bet is to use Port ID; it also puts less stress on the host and the configuration. I have used IP hash and EtherChannels because I want load balancing in both directions; with IP hash everything has to match or it won't work correctly. If you use Port ID, then you need to make sure the switches are at defaults, with no EtherChannels etc. I'm sure other combinations work, but I have found this the simplest.

Oh, btw, regarding the error you had: I saw the same problem a long time ago. What I noticed was that the odd IP numbers went to one interface and the even ones went to the other, so I could only ping the IPs that landed on the primary interface, while the other interface just dropped the packets. The problem went away when I set up the EtherChannels.
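To illustrate why an odd/even split like that can appear, here is a simplified model of an IP-hash style uplink choice. It is only a sketch under assumptions: the real ESX hash may differ in detail, `pick_uplink` is a hypothetical helper, and the addresses are invented. The idea is that the source and destination IP are XORed and the result is taken modulo the number of uplinks, so with two uplinks the outcome depends on the parity of the two last octets.

```python
import ipaddress

def pick_uplink(src_ip: str, dst_ip: str, uplink_count: int) -> int:
    """Return the uplink index a simplified IP-hash policy would choose.

    Assumption: hash = (src XOR dst) mod number_of_uplinks.
    """
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % uplink_count

# Example values only (assumptions): one VM pinging its neighbours
# over a team of two uplinks.
vm_ip = "192.168.10.50"
for last_octet in range(1, 7):
    target = f"192.168.10.{last_octet}"
    print(target, "-> uplink", pick_uplink(vm_ip, target, 2))
```

In this model, targets with odd last octets map to one uplink and even ones to the other. If the physical switch is not configured to treat both ports as one channel, traffic on one of the two paths can be dropped, which would match the "some IPs answer, some don't" symptom.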