VMware Cloud Community
rayvd
Enthusiast

Wrong MAC address used for replies during vMotion (and more?)

Reference this thread.

We have a vSwitch backed by one physical NIC.  Apart from the VM network port groups, this vSwitch has two "ports" defined on it -- a Management Console port (with MAC address "A") and a vMotion port (with MAC address "B").  No VLAN tagging is being done; each port has its own IP address within the same subnet.

We've noticed that reply traffic originating from the vMotion port's IP is being sent with the Management Console port's source MAC address ("A").  Inbound traffic to the vMotion IP is addressed to the correct MAC ("B").

Since all of the ACKs/responses go out sourced from MAC "A", our Cisco switches eventually age MAC "B" out of their CAM tables and unicast flooding begins, until we manually inject an ARP request so the switches relearn the correct MAC ("B").
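For what it's worth, the manual fix is just forcing an ARP exchange from any Linux box on that subnet so the switches relearn MAC "B". A rough sketch (the interface and IP are placeholders for our setup):

# Send ARP requests for the vMotion IP; the ARP replies come back from MAC "B",
# which repopulates the switch CAM tables. Replace eth0 / 10.x.x.5 with the real
# interface and the vMotion vmknic's IP.
arping -I eth0 -c 3 10.x.x.5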

I understand that putting management console traffic along with vMotion traffic on the same physical subnet is not best practice.  But the behavior we're seeing (the wrong MAC address being used for the IP on reply packets) seems like a bug.

Anyone else run into this?

This is with ESXi 4.1.0 260247.

bulletprooffool
Champion

As you have said, putting management console traffic along with vMotion traffic on the same physical subnet is not best practice -- so I would start looking there.

You also have to be careful to note the actual IP settings / default gateways etc. for these ports -- you are using one physical NIC and, I am guessing, presenting both IPs on the same subnet (or do you have a router-on-a-stick setup?). The ESX host is simply managing IP traffic the way any other host in a similar scenario would.

Are you able to segregate the traffic for these two ports with VLAN tagging, by any chance?
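If you can, tagging the port groups is a quick change from the CLI -- something like the below (the port group names and VLAN IDs are just placeholders, so check them against your own config):

# Tag the management and vMotion port groups with separate VLANs (placeholder names/IDs).
esxcfg-vswitch -p "Management Network" -v 10 vSwitch0
esxcfg-vswitch -p "vMotion" -v 20 vSwitch0
# Confirm the VLAN column now shows the IDs:
esxcfg-vswitch -l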

Also, is this a production environment, or a lab?

Lastly, are you sure that the 'Management' port does not have VMotion enabled?

One day I will virtualise myself . . .
rayvd
Enthusiast

Thanks for the response.

We're identifying and changing the configuration on these "non-best-practice" setups, and yes, I believe this will resolve the issue.  I'll have to check on the default gateway setting.  Perhaps the source MAC chosen is always the one associated with the "port" that has the default gateway assigned.

My main thinking was: if traffic arrives for IP 10.x.x.5, responses sent out with source address 10.x.x.5 should use the source MAC address associated with the port group that owns that IP, not a port group with a different IP (even if that port group is the one with the default gateway and even if we're dealing with the same physical NIC).  This seems like buggy behavior to me.

May have to do a little testing on a few generic Linux hosts since I presume ESXi's networking stack is still based on the Linux kernel.
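Something along these lines is what I have in mind for that test (untested; the interface names and addresses are placeholders):

# On the Linux test box: two NICs with different IPs in the same subnet,
# roughly mimicking the management and vMotion vmknics.
ip addr add 10.0.0.5/24 dev eth0
ip addr add 10.0.0.6/24 dev eth1

# From a second host in the same subnet, ping each IP in turn (ping 10.0.0.5,
# then ping 10.0.0.6) and watch which source MAC the echo replies carry.
# Run the capture on that second host; eth0 here is that host's interface.
tcpdump -e -n -i eth0 icmp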

mcowger
Immortal

rayvd wrote:

May have to do a little testing on a few generic Linux hosts since I presume ESXi's networking stack is still based on the Linux kernel.

This would be an incorrect assumption that hasn't been true for quite a few years.

--Matt VCDX #52 blog.cowger.us
TMeissner
Contributor

We have confirmed that this is true using a Wireshark trace.  Even though putting these on the same physical subnet is not "best practice", in reality many people do it because a 1 Gb link provides more than enough bandwidth to do the job.  Basically, we consider this a bug that should be patched.
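If anyone else wants to confirm on their own setup, comparing the IP source against the Ethernet source in a capture makes the mismatch obvious -- roughly like this (the interface and vMotion IP are placeholders):

# Print the IP source vs. the Ethernet source MAC for traffic sourced from the vMotion IP.
# Replace eth0 and 10.x.x.5 with your capture interface and the vMotion vmknic's IP.
tshark -n -i eth0 -T fields -e ip.src -e eth.src src host 10.x.x.5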

rayvd
Enthusiast

Matt wrote:

rayvd wrote:

May have to do a little testing on a few generic Linux hosts since I presume ESXi's networking stack is still based on the Linux kernel.

This would be an incorrect assumption that hasn't been true for quite a few years.

Any idea what kernel or networking components ESXi is based on currently?  I've seen this sort of behavior on Solaris as well.

rayvd
Enthusiast

TMeissner wrote:

We have confirmed that this is true using a Wireshark trace.  Even though putting these on the same physical subnet is not "best practice", in reality many people do it because a 1 Gb link provides more than enough bandwidth to do the job.  Basically, we consider this a bug that should be patched.

Have you engaged VMware Support at all?  Since we're in a position to redo how we have things linked up anyway, I wasn't sure this was even worth pursuing -- their manuals do indicate not to wire things up in this manner (even though it's common and practical, as you mention).

mastrboy
Contributor

their manuals do indicate not to wire things up in this manner

Could you provide a link to that manual? (I have something like ~100 VMware PDF whitepapers on my drive.)

rayvd
Enthusiast

mastrboy wrote:

their manuals do indicate not to wire things up in this manner

Could you provide a link to that manual? (I have something like ~100 VMware PDF whitepapers on my drive.)

Apparently I didn't save the exact manual either... but try the Performance Best Practices for VMware vSphere 4.1.  Page 12 or so recommends separate logical networks at least for vmknic traffic.

However, this reads as a recommendation based more on capacity than on preventing the issue we're seeing.

The iSCSI SAN Configuration Guide for 4.1 may also have some information on this.

fletch00
Enthusiast

I just opened a case on this -- we cannot have network disruptions due to vMotions.

This did not exist in our environment prior to migrating from ESX to ESXi!

VCP5 VSP5 VTSP5 vExpert http://vmadmin.info
jbajba
Contributor

Hello,

I think I'm seeing a similar issue with ESXi 4.1 U1:

After a vMotion, the source ESXi host still claims the MAC address of the migrated VM.

The migrated VM is consequently unreachable from all VMs hosted on the former host.

The migrated VM is reachable from everywhere else on the network -- just not from its former ESXi host and the VMs attached to it.

Currently, the only two ways to get the former host (and its VMs) talking to the migrated VM again are:

- reboot the host

- vmkload_mod -u e1000e; sleep 5; vmkload_mod e1000e   (unload and then reload the e1000e NIC driver)

The NICs are Intel 82574L (e1000e).

Two VMkernel ports (vMotion and Management) on one VLAN (same subnet).

One VMkernel port (Management) on a routed LAN (different subnet).

The servers are hosted in a French datacenter (OVH), and the provider claims their CAM tables are not involved. There is no way for us to get switch monitoring.

I opened a case but VMware support was not able to reproduce this problem.

Does anyone have an idea?

Thank you.

JB

KReagan
Contributor

The VMware site does not mention unicast flooding in any officially published documentation, and per VMware Networking Support, it is not mentioned in internal documentation either.

The ESXi Server Configuration Guide, page 67, second bullet under Networking Best Practices:

"Keep the vMotion connection on a separate network devoted to vMotion"

And that implies not having the management network and the vMotion network on the same network, else it would not be a 'separate network devoted to vMotion'.

However, that recommendation reads as if it were made for security and speed, i.e., machines being sent across the wire are unencrypted, and if you want the best performance you should have a dedicated network.

It is a supported configuration (see the last paragraph of the solution):

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=101307...

And the unicast storm is not unexpected:

"all outgoing traffic to the IP subnet always transmit on the first vmknic"

And yet it is not a supported configuration (see the note under the solution):

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100698...

And finally, this thread gives the reasons why the physical switch resorts to unicast flooding:

http://communities.vmware.com/message/1732987
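As a quick sanity check on your own hosts (just a sketch, not taken from the KBs above), you can see which vmknic the stack treats as "first" for a given subnet by listing the VMkernel NICs and routes:

# List VMkernel NICs with their IP addresses, netmasks and MAC addresses.
esxcfg-vmknic -l
# Show the VMkernel routing table, which decides the outbound interface per subnet.
esxcfg-route -l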

kopper27
Hot Shot

Does anyone know if this affects ESXi 4.1 Update 2?

http://www.vmadmin.info/2011/04/vmotion-unicast-flood-esxi.html

According to this

Right now my Management and vMotion are like this (2 hosts)

Hosts 1

Management - 192.168.23.240

vMotion - 192.168.23.241

Gateway 192.168.23.1

Host 2

Management - 192.168.23.242

vMotion - 192.168.23.243

Gateway 192.168.23.1

So I should create a vMotion network on 10.10.10.x, for instance?

vMotion host 1 : 10.10.10.5

vMotion host 2 : 10.10.10.6

and same Gateway 192.168.23.1

Or would something like this be enough?

vMotion host 1 : 192.168.30.5

vMotion host 2 : 192.168.30.6

Let me know guys

thanks a lot
