At the moment I have a single ESXi host with 2 NICs in a standard vSwitch, plus 2 more serving as uplinks for the Nexus module.
I'm not entirely sure what happened, but after shutting the ESX server down and bringing it back up in a new location, connectivity was lost completely. The Service Console was a port group on the Nexus switch, so I think that must be what failed. As the VSM was hosted on the same ESX host, it would not have been up at the time, but I thought the VEM could run without the VSM?
Anyway, I removed the VEM from the ESX server and reconfigured a few of the networks on the standard vSwitch to get it up and running. No problems there, but it would have been a nightmare on a production system. I re-added the ESX server to vCenter and everything looked OK, EXCEPT that vCenter thought the VEM was still installed. It wasn't. I tried to remove the distributed switch, but this failed with a meaningless error message. I ended up removing the ESX server from vCenter and re-adding it. Now the distributed switch doesn't think it's installed, but I now can't add the host to the Nexus switch!!
I get this error message:
Cannot complete a Distributed Virtual Switch operation for one or more host members.
(I don't have the detailed text to hand.)
Is there any way to do a cleanup so that it is completely uninstalled? This is becoming a bit of a nightmare!
Rob,
If you migrated the Service Console to the VEM and used the "system vlan" directive on both the Service Console port-profile and the uplink port-profile, then I believe it should have worked. If you still have the VSM configuration, please post it so we can take a look; it would help us work out what caused the issue. In production environments we recommend that customers use HA mode, which provides two VSMs: a primary and a standby. We recommend keeping these instances on different ESX hosts to prevent an outage like the one you experienced.
When you say you removed the VEM, how did you remove it? Did you use vem-remove -d? If you ran that command, the VEM module should have been unloaded and all parts of the VEM removed.
If the VEM parts are still on the ESX host (i.e. vemcmd still works), then do the following:
root@cae-cali-172--#cd /usr/lib/ext/cisco/nexus/vem-v100/sbin
root@cae-cali-172--#hotswap.sh -u
root@cae-cali-172--#vem-remove -d -v
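For anyone scripting this, here is a minimal sketch of the same clean-up as a shell function. It assumes a POSIX shell on the host console, treats "vemcmd is on the PATH and the sbin directory exists" as the test for "the VEM parts are still there", and takes the VEM_SBIN path from the commands above (other VEM builds may install elsewhere). DRY_RUN and the run helper are hypothetical additions for this sketch so you can preview what would be executed; hotswap.sh is called as ./hotswap.sh because the function changes into its directory first.

```shell
#!/bin/sh
# VEM_SBIN defaults to the path quoted in this thread; override it if your
# VEM build installs somewhere else.
VEM_SBIN="${VEM_SBIN:-/usr/lib/ext/cisco/nexus/vem-v100/sbin}"

# Hypothetical helper: with DRY_RUN=1, print each command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Unload and remove the VEM, but only if its files are still on the host.
# Returns 2 when the VEM files are missing (fall back to the net-dvs steps).
remove_vem() {
  if [ ! -d "$VEM_SBIN" ] || ! command -v vemcmd >/dev/null 2>&1; then
    echo "VEM files not found; clear the DVS entry with net-dvs instead" >&2
    return 2
  fi
  # Run from the sbin directory in a subshell so the caller's cwd is untouched.
  (cd "$VEM_SBIN" && run ./hotswap.sh -u && run vem-remove -d -v)
}
```

Source the file, run `DRY_RUN=1 remove_vem` to preview, then `remove_vem` for real. A return code of 2 means the VEM files are already gone, which is the case where the net-dvs steps below apply.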
Then try to add your host back. If the above fails, or if /usr/lib/ext/cisco... does not exist, try the following to clean the DVS info out of ESX:
1. Note the first line of the output of "/usr/lib/vmware/bin/net-dvs -l". It will look like "switch af 34 3c ...". The hex sequence is the switch UUID in ASCII.
2. Run net-dvs -d -n "switch-uuid-in-ascii", for example:
[root@cmhlab-vm4 ~]# /usr/lib/vmware/bin/net-dvs -d -n "d0 97 06 50 59 a2 f2 52-78 26 2f 5f ff 15 d4 2b"
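Steps 1 and 2 can be combined in a small shell sketch. The only thing it relies on about the output format is what is described above: the first line starts with "switch " followed by the UUID. Any trailing " (...)" suffix is assumed to be a label and is dropped; that is an assumption about the output, not something stated in this thread. dvs_uuid_from_line and clear_stale_dvs are hypothetical helper names, and NET_DVS defaults to the path used above.

```shell
#!/bin/sh
# Path as used in this thread; override NET_DVS if your build differs.
NET_DVS="${NET_DVS:-/usr/lib/vmware/bin/net-dvs}"

# Extract the switch UUID from the first line of `net-dvs -l`.
# Assumption: the line looks like "switch d0 97 ... d4 2b", possibly
# followed by a " (...)" suffix, which is stripped here.
dvs_uuid_from_line() {
  line=$1
  line=${line#"switch "}
  line=${line%%" ("*}
  printf '%s\n' "$line"
}

# Delete the DVS entry named on the first line of `net-dvs -l`.
# Returns 1 if that line does not describe a switch.
clear_stale_dvs() {
  first=$("$NET_DVS" -l | head -n 1)
  case $first in
    "switch "*) ;;
    *) echo "no DVS entry found on the first line" >&2; return 1 ;;
  esac
  uuid=$(dvs_uuid_from_line "$first")
  echo "deleting DVS $uuid" >&2
  "$NET_DVS" -d -n "$uuid"
}
```

Running `/usr/lib/vmware/bin/net-dvs -l | head -n 1` by hand first and comparing it with what dvs_uuid_from_line prints is a cheap way to confirm the assumption before deleting anything.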
Hopefully that will get you back up and running.
louis
Hi Louis, thanks for the prompt reply!
I've run through those commands, and it seems that the VEM isn't installed on the ESXi host anyway. To prove this, I built an extra ESXi server and I get the same issue. It looks like it is a problem with the vCenter server.
I have removed the extensions from vCenter and re-added them, as well as removing the DVS.
I have also redeployed the VSM and given it a different name to try to bypass this issue, but still no luck!

This is the error message I get when I try to add a host (even a freshly installed one!).
Here is the VSM config as it is at the moment:
NGNX-Nexus-VSM1(config-port-prof)# sh run
version 4.0(4)SV1(1)
telnet server enable
banner motd # NOTICE TO USERS=============================================================================This
=============================#
ssh key rsa 2048
ip domain-lookup
ip host NGNX-Nexus-VSM1 10.100.5.101
kernel core target 0.0.0.0
kernel core limit 1
system default switchport
vrf context management
  ip route 0.0.0.0/0 10.100.5.254
switchname NGNX-Nexus-VSM1
vlan 1
vlan 100
  name M100-Private-Management
vlan 101
  name M101-Database-Servers
vlan 102
  name M102-Customer-Facing
vlan 103
  name M103-Internet-Facing
vlan 104
  name M104-ESX-Service-Console
vlan 111
  name M111-Nexus-Control
vlan 112
  name M112-Nexus-Packet
vdc NGNX-Nexus-VSM1 id 1
  limit-resource vlan minimum 16 maximum 513
  limit-resource monitor-session minimum 0 maximum 64
  limit-resource vrf minimum 16 maximum 8192
  limit-resource port-channel minimum 0 maximum 256
  limit-resource u4route-mem minimum 32 maximum 80
  limit-resource u6route-mem minimum 16 maximum 48
port-profile system-uplink
  capability uplink
  vmware port-group
  switchport mode trunk
  switchport trunk allowed vlan 2-998
  no shutdown
  system vlan 111-112
  state enabled
interface mgmt0
  ip address 10.100.5.101/24
interface control0
boot kickstart bootflash:/nexus-1000v-kickstart-mz.4.0.4.SV1.1.bin sup-1
boot system bootflash:/nexus-1000v-mz.4.0.4.SV1.1.bin sup-1
boot kickstart bootflash:/nexus-1000v-kickstart-mz.4.0.4.SV1.1.bin sup-2
boot system bootflash:/nexus-1000v-mz.4.0.4.SV1.1.bin sup-2
svs-domain
  domain id 2
  control vlan 111
  packet vlan 112
  svs mode L2
NGNX-Nexus-VSM1(config-port-prof)#
NGNX-Nexus-VSM1(config-port-prof)# exit
NGNX-Nexus-VSM1(config)# svs connection VC
NGNX-Nexus-VSM1(config-svs-conn)# vmware dvs datacenter-name Nottingham
NGNX-Nexus-VSM1(config-svs-conn)# protocol vmware-vim
NGNX-Nexus-VSM1(config-svs-conn)# remote ip address 10.100.1.10
NGNX-Nexus-VSM1(config-svs-conn)# connect
Note: Command execution in progress..please wait
NGNX-Nexus-VSM1(config-svs-conn)# exit
Rob,
We suspect that your VEM module might not be installed correctly. How did you install the VEM module on the ESXi hosts? Did you use VUM, or did you install it manually with the RCLI?
louis
Hi Louis,
I was using VUM. Looking at the logs, it seems the install failed because of a dodgy firewall. I've fixed that issue now and applied the VEM patch. I now get a different error.
I'm now just trying to remove all the extensions from the ESXi host and will try again.
P.S. Is there any way of resetting the admin password on the VSM? For some reason I now can't log into it!
Thanks again,
Rob
Rob,
To reset the password on the VSM you can use this guide
The error message you are getting now can also mean that the VEM module is not loaded. Make sure that VUM is going through all the steps to install the VEM: we have seen the VUM process die and cause errors like the one you are seeing, so check that the VUM processes are still running.
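A quick way to check the "VEM not loaded" case from the host console, sketched as a shell function. It assumes that vemcmd being absent means the VEM files are not installed, and that `vemcmd show card` exits non-zero when the VEM is not loaded; that exit-status behaviour is an assumption, so read the command's output as well. vem_loaded is a hypothetical helper name. It does not check the VUM processes, which run on the VUM server rather than on the ESXi host.

```shell
#!/bin/sh
# Report whether the VEM looks installed and loaded on this host.
# Returns 0 if `vemcmd show card` succeeds, 1 otherwise.
vem_loaded() {
  if ! command -v vemcmd >/dev/null 2>&1; then
    echo "vemcmd not found: the VEM files do not appear to be installed" >&2
    return 1
  fi
  # Assumption: vemcmd exits non-zero when the VEM module is not loaded.
  if vemcmd show card >/dev/null 2>&1; then
    echo "VEM appears to be loaded"
    return 0
  fi
  echo "vemcmd is present but 'vemcmd show card' failed: VEM may not be loaded" >&2
  return 1
}
```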
louis
Hi Louis,
It looks to me like the VEM is being installed properly; I can't see anything strange in the logs. VUM shows the VEM patch as installed OK.
Have you got any other ideas?
Rob,
I'm out of ideas at this point. I can escalate to engineering if you have not found a workaround.
louis
Hi Louis, I did get this working in the end. It was a combination of firewall ports between the update server and the ESX hosts, and a seemingly corrupted ESXi install.
After starting afresh with the correct ports open and a clean install of ESXi, it worked correctly. It's still not perfect every time, but at least when it fails now I can rebuild the host and re-add it.
Thanks for your help,
Rob
