david2009
Contributor

VMware ESXi 5.1 Update 1 with multicast: everyone beware

I am not a VMware expert; my main background is in networking, but I have used ESXi 5.1 Update 1 a fair amount. We have an ESXi 5.1 Update 1 environment that has been having an issue for almost six months now. We opened a TAC case with VMware about six months ago, and there is still no resolution in sight.

We are running ESXi 5.1 Update 1 on a Dell PowerEdge 2950 server with the latest firmware. The guest OS on VMware is Red Hat Linux 5.8 (64-bit), and a multicast application runs on that guest.

We are experiencing multicast packet loss. After careful analysis on the network side, we concluded that the multicast traffic actually reaches the physical interface of the ESX host but is not forwarded by the VMware distributed virtual switch. For the past six months we have spent countless hours and resources working with VMware TAC support to resolve this issue, and it is going nowhere. VMware TAC confirms that the packets are not being forwarded by the virtual switch but has no resolution for it. A lot of financial applications rely on multicast.

VMware is definitely not ready for prime time, especially for mission-critical applications like the one described above.

my 2c.

11 Replies
DavoudTeimouri
Virtuoso

Hi friend,

Read this blog post: "Multicast Performance on vSphere 5.0" (VMware VROOM! Blog)

-------------------------------------------------------------------------------------
Davoud Teimouri - https://www.teimouri.net - Twitter: @davoud_teimouri Facebook: https://www.facebook.com/teimouri.net/
david2009
Contributor

Hi friend,

I am VERY AWARE of the article you referred to. I actually brought it up with the VMware TAC engineer when we first opened the TAC case.

Well, sadly, six months later the issue is still there, and VMware has still not resolved it.

ChadAEG
Contributor

Sorry to hear you are having such trouble. We use a fair number of multicast applications and have never had issues with dropped packets. We don't have the VM density on our hosts to require splitRxMode; the default config has been fine so far. The only multicast issue I can remember is the spam caused by multi-NIC vMotions running longer than your MAC table timeouts, but that was fixed some time ago.

Just to be clear, is the VM in question the target of the multicast or one of the recipients?

You only mention the one server, a rather old Dell 2950. Have you tried moving the VM in question to more modern hardware? You can't really complain that ESXi isn't ready for mission-critical applications when you're running it on a server model Dell won't even warranty anymore :)

david2009
Contributor

You need to do proper research before saying things that are not true. My PowerEdge 2950-III is SUPPORTED by Dell until December 2014. Furthermore, ESXi 5.1 Update 1 is SUPPORTED on the Dell PE 2950-III.

The VM in question is one of the multicast recipients.
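For readers less familiar with what "recipient" means at the socket level, here is a minimal sketch of how a Linux guest typically joins a multicast group. The group address `239.1.1.1` and port `5000` are hypothetical placeholders, not the values used by the application in this thread, and the real application may join its groups differently.

```python
import socket

GROUP = "239.1.1.1"  # hypothetical multicast group, for illustration only
PORT = 5000          # hypothetical UDP port, for illustration only


def build_membership_request(group: str, interface_ip: str = "0.0.0.0") -> bytes:
    """Pack an ip_mreq structure: 4-byte group address, then 4-byte local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(interface_ip)


def open_receiver(group: str = GROUP, port: int = PORT) -> socket.socket:
    """Create a UDP socket bound to the port and ask the kernel to join the group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining the group makes the host send an IGMP membership report,
    # which is what tells the network to deliver this group's traffic here.
    sock.setsockopt(
        socket.IPPROTO_IP,
        socket.IP_ADD_MEMBERSHIP,
        build_membership_request(group),
    )
    return sock
```

Calling `open_receiver()` and then `recvfrom()` in a loop on the guest gives a simple stand-in receiver for testing, independent of the production application.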

At the moment, they are going to move the multicast applications from the VM over to the physical machines.

ChadAEG wrote:

We use a fair amount of multicast applications, and have never had issues with dropped packets.

How do you determine that you don't have issues with multicast packet loss? Do you capture the multicast traffic on both ends, decode it, and count the results? If you do, I am willing to bet that you will definitely see packet loss. I put sniffer traces on both ends, and that is how I confirmed that there is a lot of multicast packet loss. You can see the packets hitting the ESX NIC but getting lost somewhere along the VMware vSwitch.
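The comparison described here can be sketched roughly as follows. This assumes each multicast message carries an application-level sequence number that has already been decoded from both captures; the function names and the sequence-number field are hypothetical, not taken from the sniffer or application used in this thread.

```python
from typing import Iterable, List


def find_lost_sequences(seen_at_host_nic: Iterable[int],
                        seen_in_guest: Iterable[int]) -> List[int]:
    """Return sequence numbers captured at the host NIC but never seen in the guest."""
    delivered = set(seen_in_guest)
    return sorted(seq for seq in set(seen_at_host_nic) if seq not in delivered)


def loss_percent(seen_at_host_nic: Iterable[int],
                 seen_in_guest: Iterable[int]) -> float:
    """Percentage of distinct upstream packets that did not reach the guest."""
    upstream = set(seen_at_host_nic)
    if not upstream:
        return 0.0
    lost = find_lost_sequences(upstream, seen_in_guest)
    return len(lost) / len(upstream) * 100.0
```

For example, if the capture in front of the ESX NIC shows sequences 1 through 5 and the guest capture shows only 1, 2, and 4, the lost set is `[3, 5]` and the loss is 40%.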

ChadAEG
Contributor

Sorry, bad joke, but there is no way for me to do that research: Dell servers are warrantied for a maximum of 5 years from the date they ship from the factory, so I would need to know when you bought yours to be sure it's not past that. All of ours went out last year. If you have warranty coverage through the end of 2014, you must have bought the last one :)

My post was more to state that we aren't seeing the issues you are, so it may be a combination of variables in your particular environment rather than a general bug in ESXi's networking, but you probably already gathered that from the blog post.

I'm not sure why you would assume that because you're experiencing an issue in your environment, everyone else is as well. We don't have multicast packet loss, or any other type of packet loss, beyond what you would normally see for link congestion or other normal packet drop situations, unless we are dealing with faulty hardware or cabling. And yes, our NOC and network engineers know how to use packet sniffers. We are a financial institution, we run many of the applications you're referring to, and we are pretty sensitive to our applications not working as expected.

Without knowing what steps support has already had you try, it's hard to suggest what else to do. Moving the VM to another hardware combination is a quick way to rule out firmware and NIC driver issues, etc. Depending on how much effort you want to keep putting into chasing the issue versus reverting to a dedicated physical machine, you could replace VMware's generic DVS with the Nexus 1000v and see if it exhibits the same behavior. This would also give you more visibility to track the packet activity inside the virtual switch.

david2009
Contributor

ChadAEG wrote:

My post was more to state that we aren't seeing the issues you are, so it may be a combination of variables in your particular environment rather than a general bug in ESXi's networking, but you probably already gathered that from the blog post.

I'm not sure why you would assume that because you're experiencing an issue in your environment, everyone else is as well. We don't have multicast packet loss, or any other type of packet loss, beyond what you would normally see for link congestion or other normal packet drop situations, unless we are dealing with faulty hardware or cabling. And yes, our NOC and network engineers know how to use packet sniffers. We are a financial institution, we run many of the applications you're referring to, and we are pretty sensitive to our applications not working as expected.

There is no congestion in our environment. The ESX host has a 10 Gbps NIC carrying about 50 Mbps (0.5% utilization), so there is no congestion.
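For reference, utilization is simply traffic divided by link capacity. For the figures quoted (about 50 Mbps on a 10 Gbps NIC) that works out to 0.5%. A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
def link_utilization_percent(traffic_bps: float, link_capacity_bps: float) -> float:
    """Average link utilization as a percentage of capacity."""
    if link_capacity_bps <= 0:
        raise ValueError("link capacity must be positive")
    return traffic_bps / link_capacity_bps * 100.0


# 50 Mbps of traffic on a 10 Gbps link
utilization = link_utilization_percent(50e6, 10e9)
```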

Furthermore, if the sniffer says that the multicast traffic did get to the ESX host, I believe it, because I put the sniffer in-line with the ESX host. Even VMware TAC confirmed it, and they don't know why it behaves this way.

ChadAEG wrote:

Without knowing what steps support has already had you try, it's hard to suggest what else to do. Moving the VM to another hardware combination is a quick way to rule out firmware and NIC driver issues, etc.

VMware has already been looking at this issue for over six months. They have verified that the server's firmware and BIOS are up to date and that the hardware is certified.

By the way, we have only one virtual machine on that ESX host and we still see multicast packet loss. Moving the VM to another ESX host makes no difference.

VMware TAC has suggested many things over the past six months without any luck.

ChadAEG wrote:

you could replace VMware's generic DVS with the Nexus 1000v and see if it exhibits the same behavior.  This would also give you more visibility to track the packet activity inside the virtual switch.

What makes you think that replacing it with the Nexus 1000v will fix the issue? I was in the process of doing just that until I realized that I would then have to deal with two different vendors, Cisco and VMware, for issues like this, and you know what is going to happen, right? The vendors point fingers at each other and the customer (me) gets screwed.

Cisco is a networking company, mainly hardware. In the virtualization world, Cisco does not have an advantage over the VMware vSwitch or other vendors, IMHO.

david2009
Contributor

An update on this issue:

We replaced the VM guest with a physical machine using the same IP address and the same application. MULTICAST WORKS GREAT WITHOUT ANY PACKET LOSS.

Therefore, we can confirm that VMware is not ready for prime-time multicast.

FredPeterson
Expert

I am quite curious where you're at with this now, as it's been eight months or so since you abandoned VMware for this application. I don't fault you for the decision, given the problem at hand and the application not working, but you do seem to have a very "it's my problem, so it's everyone's problem!" attitude, and that is certainly not the case.

grasshopper
Virtuoso


Interesting thread that has been revived here. I too would be curious how things are going. I agree that there are inherent problems with multicast in some situations, where it just flat-out dies at the vSwitch. The advice given by ChadAEG about trying the Cisco Nexus 1000v is great advice. I am a huge fan and long-time user of the 1000v. Any multicast issues I have faced were easily resolved when using the 1000v. Further, the support integration between VMware and Cisco is amazing.

If all else fails, the best way to defeat a multicast issue is PCIe NIC passthrough to the VM. Give that a try, as it avoids the vSwitch completely.

srinivasanmit
Contributor

Hi David, do you have traffic shaping enabled in your environment?

Environments with traffic shaping enabled have been seeing packet drops for multicast traffic.

gmunetworkguy
Contributor

I have to say that I also have this multicast issue, and the VMware TAC engineers are not very helpful either. The TAC engineers confirm that the VMware distributed switch is not passing multicast traffic.

After multiple attempts at upgrading to the latest patches and fixes from VMware, the issue is still there. We're running ESXi 5.1 Update 2 with the latest patches, BIOS, and firmware on a Dell R720.

Long story short, we replaced these VMware servers with physical servers and there are no more multicast packet drops :)

VMware is nothing but trouble.
