VMware Cloud Community
esnmb
Enthusiast

vMotion Drops Many Pings?

We have two UCS chassis with three blades in each, and I am running the Nexus 1000V on all six hosts. When vMotioning, some migrations drop the usual single ping, but other times I've seen as many as 14 dropped packets.
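For what it's worth, here is roughly how I'm counting the drops: a small Python sketch that fires one ping per second at the guest while the vMotion runs and tallies the misses. The target IP and duration are placeholders for our environment, and it uses Linux ping syntax.

import subprocess
import time

TARGET = "10.0.0.50"   # IP of the VM being vMotioned (placeholder)
DURATION = 120         # seconds to watch the migration

sent = lost = 0
start = time.time()
while time.time() - start < DURATION:
    # "-c 1 -W 1" = one echo request with a one-second timeout (Linux ping)
    result = subprocess.run(["ping", "-c", "1", "-W", "1", TARGET],
                            stdout=subprocess.DEVNULL)
    sent += 1
    if result.returncode != 0:
        lost += 1
        print(f"missed reply #{lost} at {time.time() - start:.0f}s")
    time.sleep(1)

print(f"{lost} of {sent} pings lost")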

Is this an issue with the Nexus 6120s and routing, or something else? I don't see this at our other site, where the 1000V connects to regular old Catalyst switches.

Thanks all.

7 Replies
lwatta
Hot Shot

Pretty sure this is a known issue.

Let me see if I can dig up the info that explains why.

What UCS and N1KV code are you running?

louis

esnmb
Enthusiast

Our Nexus 1000V is running version 4.2.1.SV1.4a.

The UCS is running 1.4(2b).

See the attached CSV.

esnmb
Enthusiast

I just moved all the vMotion networks back to the standard switches and moved that VM back as well. I vMotioned it and still dropped multiple packets. It wasn't the 14-15 I lost the other way, but it's still not the same as our non-UCS systems...

bleibold
Contributor

esnmb,

Did you ever get this figured out? We just ordered two UCS chassis to run VMware, and I am confused about how to vMotion between blades in two different chassis if their vMotion NICs are in different fabrics. I don't want to route that traffic out of the FIs for performance and security reasons, but I'm not sure how else to do it.

Today we have a set of dedicated switches that all our VMware rack-mount servers plug into for vMotion, with a non-routable VLAN to keep vMotion traffic off the rest of the network, since that has always been considered a security risk. If I have to vMotion between chassis, and the blade in chassis 1 has its vMotion NIC in fabric A while the blade in chassis 2 has its vMotion NIC in fabric B, I don't see any way to do it other than using a routable VLAN and incurring the performance penalty and security issue.

I am completely new to UCS, so I may be looking at this all wrong. It just sounds like you have the same scenario I want to run, so I thought I would ask.

Thanks,

Bob

esnmb
Enthusiast

We think we have it mostly figured out. We are basically going to put one NIC in standby while keeping the active vMotion NIC on each blade on the same 6120, so the traffic shouldn't have to leave the 6120s. The other issue we learned is that the consultants who designed this didn't put Nexus 5Ks upstream of the 6120s; instead we have Cisco 3560s, so we drop from 10G down to 1G of throughput. vMotion will of course use 10G if you give it 10G.
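In case it helps anyone later, below is a rough pyVmomi sketch of the standby change we're planning. It's untested; the vCenter address, credentials, host name, port group name, and vmnic names are all placeholders, and it assumes the vMotion port group sits on a standard vSwitch.

import ssl

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

VCENTER = "vcenter.example.local"   # placeholder
ESX_HOST = "esx01.example.local"    # placeholder
VMOTION_PG = "vMotion"              # placeholder port group name
ACTIVE_NIC = ["vmnic2"]             # uplink pinned to fabric A (6120-A)
STANDBY_NIC = ["vmnic3"]            # uplink on fabric B, standby only

ctx = ssl._create_unverified_context()
si = SmartConnect(host=VCENTER, user="administrator", pwd="password",
                  sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    for host in view.view:
        if host.name != ESX_HOST:
            continue
        net_sys = host.configManager.networkSystem
        for pg in net_sys.networkInfo.portgroup:
            if pg.spec.name != VMOTION_PG:
                continue
            spec = pg.spec
            if spec.policy.nicTeaming is None:
                spec.policy.nicTeaming = vim.host.NetworkPolicy.NicTeamingPolicy()
            # Override the teaming order for this port group only:
            # one active uplink (fabric A) and one standby uplink (fabric B).
            spec.policy.nicTeaming.nicOrder = vim.host.NetworkPolicy.NicOrderPolicy(
                activeNic=ACTIVE_NIC, standbyNic=STANDBY_NIC)
            net_sys.UpdatePortGroup(pgName=VMOTION_PG, portgrp=spec)
finally:
    Disconnect(si)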

lwatta
Hot Shot

Valid concerns. As esnmb's post explains, you want to tie all the vMotion NICs to the same fabric. This is pretty easy to do, especially if you are using the VIC/Palo card. If you mix the vMotion NICs between fabrics, then yes, the traffic has to go upstream from the UCS and come back down to the other FI.

louis

bleibold
Contributor

Makes sense. My next question would be: if you tie all your vMotion NICs to the same fabric, do you set up redundancy on your vMotion network? The two scenarios I am thinking of are:

1. No vMotion redundancy. Tie all the vMotion NICs (one per server) to one fabric; everything works well, and vMotion traffic never leaves the fabric interconnect. Good performance and good security (in the past we were told to isolate vMotion traffic because it is sent in clear text), and the vMotion VLAN doesn't even need to exist on the northbound switches. The downside is that if you lose a fabric, a chassis loses an IOM, or a blade loses a NIC, vMotions will be impacted; the scope of the impact depends on the type of outage.

2. Implement vMotion redundancy. Still tie all the active vMotion NICs to one fabric, but have a standby NIC in the other fabric. Use a non-routable layer 2 VLAN dedicated to vMotion and make it available to both fabric interconnects and the upstream switches. 99% of the time, all vMotion traffic stays on the one fabric interconnect. In a failover event (one fabric down, a lost IOM, a blade losing a NIC, etc.), vMotion will still work, but in some cases the traffic will have to traverse your northbound links to reach the other fabric if the outage has left some of the blades involved on different fabrics. Is it less secure, since the clear-text vMotion traffic now has to leave the fabric interconnects and cross your upstream switches? My guess is not much, but maybe slightly.

Thoughts?

Thanks,

Bob
