VMware Cloud Community
GreyhoundHH
Enthusiast

Cisco Nexus 1000v / Two hosts shown with VDS Status warning

Hi,

Two of our ESXi hosts are being shown with "VDS status Warning" in our Cisco Nexus 1000v dvSwitch. It's just "Warning", not "Out of sync"...

VMs on these hosts seem to run properly, but we're not able to vMotion VMs onto these hosts. An error is reported that the networks in use are not accessible.

What can I do to identify the problem? Neither the vSphere Client nor the Web Client provides any further information.

We're running this combination:

VEM Version: 4.2.1.2.2.3.0-3.1.1

VSM Version: 4.2(1)SV2(2.3)

System Version: VMware ESXi 5.1.0 Releasebuild-2191751

Any help appreciated.

Kind regards

2 Replies
BenLiebowitz
Expert

We ran the 1000v in our 4.1 environment but when we migrated to 5.x, we switched from the 1000v to a regular vDS switch. 

Have you tried rebooting the hosts?  What about restarting vCenter itself?

Ben Liebowitz, VCP, vExpert 2015, 2016, & 2017. If you found my post helpful, please mark it as helpful or answered to award points.
grasshopper
Virtuoso

Hi GreyhoundHH,

Please perform the following:

0.  Gather a vm-support bundle from an affected host (a command sketch covering steps 0-2 follows this list).

1.  Generate a VEM support bundle (vem-support -t /var/tmp all)

     Note:  The vm-support already generates a vem-support, but I find it handy to have the VEM logs easily accessible to share with TAC.

     Tip:  The '-t' chooses the target location to place the logs.  Here we place them in the same location where the vm-support logs land by default (/var/tmp).

2.  Use WinSCP in SCP Mode to download the logs from the ESXi host to your desktop (share with support as appropriate)

3.  Determine the Primary VSM MAC Address:

          vemcmd show card | grep 'Primary VSM MAC'

          Tip:  Also review the full output of the above (without the grep) so you can see overall status (i.e. headless mode, etc.)

4.  Confirm VEM-to-VSM health/connectivity by running:

          vem-health check <Primary VSM MAC from step #3 above>

5.  Confirm that there are no Invalid or Orphan VMs (KB1003742).  A host-side check sketch covering steps 5-8 follows this list.

6.  Building on #5 above, confirm the VM inventory looks healthy when logged directly into the ESXi host via the vSphere Client as root (Virtual Machines tab, sort by State).

7.  Review the Summary Page of the ESXi host.  Do you see the expected port groups listed?

8.  Check for APDs (KB2004684).  If there was a storage blip affecting the 1000v and/or the ESXi host, you could have ghosted dvPorts.
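
If you prefer working straight at the ESXi shell, a minimal sketch of steps 0-2 might look like this (the host name is a placeholder, and vm-support option syntax can vary slightly between ESXi builds):

     # On the affected ESXi host (SSH enabled):
     vm-support -w /var/tmp            # full host support bundle, written to /var/tmp
     vem-support -t /var/tmp all       # VEM logs placed alongside it

     # From your desktop, pull the bundles down (WinSCP in SCP mode works too):
     scp root@esx01.example.local:/var/tmp/*.tgz .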
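
For steps 5 through 8, these host-side checks can help surface invalid VMs, missing port groups, and APD events; a sketch against a stock ESXi 5.1 log layout:

     # Steps 5/6: list registered VMs; invalid/orphaned VMs stand out here
     vim-cmd vmsvc/getallvms

     # Step 7: confirm the expected 1000v port groups are present on the host
     esxcfg-vswitch -l

     # Step 8: look for All Paths Down events around the time of the issue
     grep -i APD /var/log/vmkernel.log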

The advice about restarting the affected host and vCenter is a good choice as well.  Give the 1000v a full 12 minutes to come up fully after any restarts (desirable, not required).

If the problem persists, you may need to iterate through some trial and error (i.e. disconnect/reconnect host, remove host from 1000v + revert to vSS and rejoin 1000v, etc.).  When I find myself in these situations, I like to document all VMs on the affected host with PowerCLI (see the sketch below).  This includes gathering any Tag information, blue folder membership, etc. in case I need to re-register VMs (unlikely, but sometimes required if facing orphan/invalid issues).
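
As a rough example (vCenter and host names are placeholders, and tag cmdlets like Get-TagAssignment depend on your PowerCLI version), a PowerCLI inventory pass could look like:

     Connect-VIServer vcenter.example.local
     Get-VMHost 'esx01.example.local' | Get-VM |
       Select-Object Name, PowerState,
         @{N='Folder';E={$_.Folder.Name}},
         @{N='PortGroups';E={($_ | Get-NetworkAdapter | Select-Object -ExpandProperty NetworkName) -join ';'}} |
       Export-Csv C:\temp\esx01-vms.csv -NoTypeInformation

That CSV gives you enough to rebuild network assignments and folder placement if you ever have to re-register a VM.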

Those are the things I would normally start with as a server guy.  If you haven't already, engage your network team so they can check their logs (our VEM logs are only line-card level; their logs are more like the switch itself).  For example, they may want to run a 'show system redundancy status' and perhaps perform a failover from one VSM to the other during off-hours (i.e. make the other one primary); a rough command sketch follows.
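
The VSM-side commands they would likely start with look roughly like this (NX-OS on the 1000v; 'system switchover' triggers the supervisor failover, so save it for a maintenance window):

     show module                      (both VSMs and every VEM should be listed as modules)
     show system redundancy status    (active/standby state of the VSM pair)
     system switchover                (fail over to the standby VSM)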

Personally, I would start by evacuating the affected hosts.  Typically you can still vMotion off the host; you just can't target it for new workloads (i.e. existing dvPorts still work headless, net-new dvPorts fail).  If you can't evacuate a host, check for DRS rules (Cluster > Edit Settings).  A PowerCLI sketch for both follows.
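
A hedged PowerCLI sketch for both checks (cluster and host names are placeholders, and Move-VM assumes the target host can actually reach the needed networks):

     # Look for DRS rules that could pin VMs to the affected host
     Get-DrsRule -Cluster (Get-Cluster 'Prod') | Select-Object Name, Enabled, KeepTogether

     # Drain the affected host onto a healthy one
     Get-VMHost 'esx01.example.local' | Get-VM |
       Move-VM -Destination (Get-VMHost 'esx02.example.local')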

Take no action unless the host is in maintenance mode.
