LFC
Enthusiast

Nexus 1000v VSM Issues

I have a vSphere 5.x environment consisting of 10 ESXi hosts. We have successfully deployed a Cisco Nexus 1000V L3 implementation using 2 x VSM virtual appliances configured for HA failover. The ESXi hosts are connected to 2 x Cisco Nexus 2000 Fabric Extenders (using 2 x 10G adapters), each of which is connected to 2 x Nexus 5000 switches.

We only have 2 x 10G adapters in each host, and both are uplinks to our Nexus dvSwitch (i.e. no Standard vSwitches).

We also run a virtual vCenter Server which uses the dvSwitch. This was all checked for supportability and compatibility with Cisco at the design stage of the project.

We are currently doing intrusive testing of the solution prior to handing it over to the customer, and have come across a potential issue whereby, if both of our Nexus 1000V VSM appliances are shut down, they cannot connect to the network when they are restarted. To make things worse, if a vMotion is performed in our cluster while the VSMs are down, the VM loses network connectivity on the receiving host, and moving it back to the source host does not fix this (no vEthernet port exists).

The only way to fix this is to take one of the 10G adapters out of the dvSwitch, create a Standard vSwitch, connect the 10G adapter to it, create a Standard port group on the same VLAN as the vCenter Server, and change the Primary VSM to use this port group. Hey presto, everything springs into life again.

Once we start the Secondary VSM (still connected to the dvSwitch), the two VSMs successfully form an HA pair, at which point we can move the Primary VSM back to the dvSwitch.

To minimise the risk of both of our VSMs being down at the same time, we have implemented DRS host groups and rules so that VSM1 runs on hosts 1-5 and VSM2 on hosts 6-10. However, I need to deliver a fix for this issue!

Thanks in advance

lwatta
Hot Shot

I would suspect that the reason your VSMs cannot reconnect to the network after a shutdown/reboot is a configuration issue. Make sure the uplink port-profile and the veth port-profiles to which the control, management, and packet network interfaces are assigned are all configured with a system VLAN. What you have configured is a supported configuration and does work; it's probably just the system VLAN issue. Feel free to paste your running config and we can take a look.
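
For illustration, a minimal veth port-profile with a system VLAN looks something like this (the profile name and VLAN number here are placeholders, not taken from your environment):

port-profile type vethernet VSM_Management
  vmware port-group
  switchport mode access
  switchport access vlan 74
  system vlan 74            ! brings the port up even before the VEM is programmed by the VSM
  no shutdown
  state enabled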

As far as vMotion when the VSMs are down, that is expected. Keep in mind that the VSM is the control plane of the N1KV solution, and when a VM moves from one ESXi host to another it requires changes on the control plane. If the VSM is down, the VEMs cannot get the programming they need from the VSM to allow the VM access to the network. We are working on a solution to this problem, but I couldn't give you a timeline on when it will be available.

You should always use anti-affinity rules so that the primary and secondary VSMs are on different ESXi hosts. We also recommend that you take the VSMs out of DRS control. We've seen issues where aggressive DRS can cause the VSMs to drop heartbeats. We recommend manual vMotion only for the VSMs.

louis

LFC
Enthusiast

> I would suspect that the reason your VSMs cannot reconnect to the network after a shutdown/reboot is a configuration issue. Make sure the uplink port-profile and the veth port-profiles to which the control, management, and packet network interfaces are assigned are all configured with a system VLAN.

We have configured system VLANs for a number of VLANs: those used by vCenter and the VSM (VLAN 74), ESXi management and VEM (VLAN 75), vMotion (VLAN 76), and FT (VLAN 77).

> What you have configured is a supported configuration and does work; it's probably just the system VLAN issue. Feel free to paste your running config and we can take a look.

I could send you this privately, as my client would not allow me to post it in the public domain.

> As far as vMotion when the VSMs are down, that is expected. Keep in mind that the VSM is the control plane of the N1KV solution, and when a VM moves from one ESXi host to another it requires changes on the control plane. If the VSM is down, the VEMs cannot get the programming they need from the VSM to allow the VM access to the network. We are working on a solution to this problem, but I couldn't give you a timeline on when it will be available.

From the Nexus 1000V documentation: "While the VSM is down, the VEMs continue to forward traffic using the last known configuration. Any new virtual machines that are started on those VEMs will not have connectivity because the VSM will not be available to set up the port configurations. When the virtual machine is migrated, the virtual Ethernet (vEth) ports will not be configured on the new host because the VSM is not there."

When the VSM is itself a VM, does the above not also apply to the VSM?

> You should always use anti-affinity rules so that the primary and secondary VSMs are on different ESXi hosts. We also recommend that you take the VSMs out of DRS control. We've seen issues where aggressive DRS can cause the VSMs to drop heartbeats. We recommend manual vMotion only for the VSMs.

Anti-affinity rules are already in place and tested as working correctly.

lwatta
Hot Shot

If you wouldn't mind sending the running config, that would really help. You can private-message it to me or send it to me via email (lwatta@cisco.com).

When the VSM is a VM, you get around control-plane issues by using system VLANs. The system VLANs get the VSM on the network and talking even when the VEM can't get programmed.

louis

LFC
Enthusiast

Louis

The config should be in your mailbox.

Regards

Sean

LFC
Enthusiast

Louis,

Your advice to change the "Capability L3Control: yes" statement to "no" on port-profile VLAN_74_VDC_Management has worked! The VSM can now connect to the network after having been shut down.
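
For anyone else hitting this, the change boils down to the following on the VSM (a sketch only; the rest of the port-profile configuration is left as it was):

port-profile type vethernet VLAN_74_VDC_Management
  no capability l3control   ! previously: capability l3control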

Many thanks for your perseverance, it's very much appreciated!

Regards,

Sean

LFC
Enthusiast

Louis,

Our network team is asking whether we need to have the system VLANs 74, 75, 76, and 77 defined on the uplink port-profile in addition to having them defined on the vethernet port-profiles themselves.

They seem to think this is not required. Can you offer any advice?

Thanks again


Sean

lwatta
Hot Shot

They need to be on the uplink port-profile as well. Think of the VEM as a switch, with veths coming into the switch and eths going out (upstream). If you put system VLANs on only the veth port-profiles, the VMs and vmk interfaces can talk into the VEM and to other veths, but without system VLANs on the eth ports, the traffic can never leave or enter the VEM via the upstream switch. A sketch of the matching uplink configuration is below.
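
Something like this on the eth side (the profile name, trunk mode, and teaming line are assumptions based on your description, so adjust to your design):

port-profile type ethernet n1kv-uplink
  vmware port-group
  switchport mode trunk
  switchport trunk allowed vlan 74-77
  system vlan 74-77                        ! must match the system VLANs on the veth port-profiles
  channel-group auto mode on mac-pinning   ! a typical choice for dual 10G uplinks
  no shutdown
  state enabled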

I'll send you a few slides that help explain the concept: why we need system VLANs and where to put them.

louis
