VMware Cloud Community
virtualkannan
Contributor

Error while adding a host to Cisco Nexus 1000v vDS

Hi All,

We have ESX hosts running version 4.1.0, build 260247.  We are trying to add these hosts to the Cisco Nexus 1000v vDS.  We did the following:

(1)  My network team gave me an XML file for the Cisco Nexus 1000v switch

(2) They asked me to just add this under the Plug-ins section

(3) It is in the Available Plug-ins column but not downloaded and installed (the network team says the "download and install" option is not required). A couple of other switches which are already in use are also in that column only

(4) I right-clicked the newly created switch, selected the "Add Host" option, selected the required host, and then selected 4 vmnics: 2 for the production uplink and 2 for the system uplink

(5) Then I just moved through the wizard by clicking "Next" and then "Finish"

(6) In the Tasks section I can see that several tasks ran, and then it gave the following error message:

Cannot complete a vNetwork Distributed Switch operation for one or more host members

"vDS operation failed on host xxxxxxx.domainname,  got (vmodl.fault.SystemError) exception"

When I click the "Submit Report" the following information is available;

Event Details:

Type: info

Time: 12/16/2010 8:45:19 PM

Task Details:

Name: ReconfigureDvs_Task

Description: Reconfigure vNetwork Distributed Switch

Error Stack:

vDS operation failed on host xxxxxxxx.domainname, got (vmodl.fault.SystemError) exception

Additional Event Details:

Type ID: Info

DataCenter Object Type: Datacenter

DVS Object Type: DistributedVirtualSwitch

ChainID: 70374

Additional Task Details:

VCBuild: 258902

Task ID: Task

Cancelable: False

Cancelled: False

Description ID: DistributedVirtualSwitch.reconfigure

Event Chain ID: 70374

But the same switch is working fine on other ESX hosts with the same version of ESX.

Any ideas or guidance on this error message?  If I need to explain it in more detail, please tell me.

Any help in this would be greatly appreciated.

Thank You.

20 Replies
lwatta
Hot Shot

Most likely the problem is VUM. Since you say it worked fine with other hosts at the same version, there must be something about this host that VUM does not like.

When you add an ESX host to a Nexus 1000V, vCenter talks to the VSM and to VUM. VUM pulls the Nexus 1000V code for your ESX host from the VSM and then installs it on the ESX host. The error you are getting is a generic error from VUM.

The first thing to check is the VUM logs on the server where you installed VUM. The logs are in C:\Documents and Settings\All Users\Application Data\VMware\VMware Update Manager\Logs

Take a look at the log and it should point you to the problem.

Also make sure you can still reach the web server on the Nexus 1000V and that VUM is still running on your server. If you simply want to get the host up and running, you can manually install the bits using esxupdate.
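For reference, a manual install from the ESX service console looks roughly like the sketch below. This is only an outline: the VIB file name is just an example for SV1.3b, so substitute the VEM version that matches your ESX build, downloaded from the VSM's web server.

# Copy the VEM VIB to the host (e.g. with scp), then from the service console:
esxupdate --bundle=cross_cisco-vem-v122-esx_4.0.4.1.3.2.0-2.0.1.vib update

# Afterwards, confirm the VEM agent is running and attached to the DVS:
vem status -v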

louis

DSeaman
Enthusiast

Did you verify the VEM is installed on the host you are trying to add to the vDS? 

Derek Seaman
RBurns-WIS
Enthusiast

One thing to note: VUM will not allow you to push the VEM bits and add the host that is running the vCenter VM to the DVS.  If this is the case, just vMotion the VC VM to another host, and then re-add the problem host to the DVS.

What build is the "problem host"?

What is the build of a known "successfully added" host?

Regards,

Robert

virtualkannan
Contributor

Hi Iwatta,


Thanks for the information.

I checked the Update Manager logs.  I'm able to patch other ESX servers without any issues; the problem only comes up with the hosts in one particular cluster (a cluster with 3 ESX hosts).

Yes, I saw a VMware KB link with steps to install it manually.

I'm still checking on it.

Please tell me if I need to give any more information or any specific information on this.

Thanks for the help.

virtualkannan
Contributor

Hi Dseaman,

Yes, this is another important piece of information which I got from an internet link.  The link walked through the steps to install/configure all of the required settings.

But unfortunately I lost that link and am searching for it again.  If possible, could you please post any steps or links to check/configure this?

Thanks for the help and support.

Thank You.

virtualkannan
Contributor

Hi Robert,

No, the VC is not running on the host which we are trying to connect to the vDS.

The build of the "Problem Host" is --  ESX, 4.1.0, 320092 (3 hosts in a cluster are in this same build) (The cluster is HA & DRS Enabled)

The build of the known "Successful Hosts" are -- ESX, 4.0.0, 208167 (3 Hosts in a cluster are in this same build) (The cluster is HA & DRS Enabled)

                                                                    -- ESX, 4.1.0, 260247 (2 Hosts in a cluster are in this same build) (This cluster is HA & DRS Enabled)  

Thanks for the help.

Please tell me if I need to give any more information or any specific information to help me with.

Thank You.

RBurns-WIS
Enthusiast

Are you able to install the 4.1 VEM software manually and then add the host successfully?

Just to confirm, you're trying to install version SV1.3 or later of the 1000v, correct?

Robert

gogogo5
Hot Shot

Hello

I'd like to chime in on this, as we are experiencing identical issues to yours (even down to the error message on the Submit Report dialog box).  I have logged an SR with VMware and they are currently investigating.
However to date I can share my observations:
1.  We run ESXi 4.1 on HP ProLiant BL460c blades and are using SV1.3b of the N1K.
2.  We have seen this issue on builds 260247 and 320137.  I don't believe this issue is related to the build of ESXi.
3.  In reply to RBurns-WIS - yes, we are able to manually install the VEM via baseline (having added the correct VEM as part of the baseline) then added the host, all works fine.
4.  Virtualkannan - I have found that the VEM is successfully copied to the ESXi host, but then the "vDS operation failed on host xxxxxxx.domainname, got (vmodl.fault.SystemError) exception" pop-up box appears.
You can check that the VEM binary has successfully copied by enabling Local or Remote TSM.  On one of your hosts that is having this issue, at the CLI type the following:
cd /tmp/updatecache
then type:
ls -ltrhs
You should see a file called cross_cisco-vem-v122-esx_4.0.4.1.3.2.0-2.0.1.vib (remember I am using SV1.3b).
5.  I have also seen this issue when using SV1.3a in our lab too.
So it seems that the VEM binary file is being copied to the host ready for install, but the initiation of the install itself is the problem.
Please let me know if the VEM binary is copied to your problem ESXi host.
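If the VIB is there but nothing got installed, a rough way to tell from the same TSM session is to check whether the VEM tools exist on the host (a sketch only; the vem command is present only once the VEM VIB has actually been installed):

# Reports the VEM agent state and the DVS it is attached to; "command not found" here
# means the VIB was staged to /tmp/updatecache but never actually installed.
vem status -v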
Cheers
-gogogo5
RBurns-WIS
Enthusiast

Thanks for your input, gogogo5.

We see this problem often.  Usually it's a VUM issue where the installer is not invoked correctly.  Manual install obviously works, and the VEM VIB does get copied.

Let us know if VMware comes back with any explanation.

Robert

virtualkannan
Contributor

Hello gogogo5,

Thank you very much for the input.  I'll check what you mentioned and will update you.

I'll also post details of the server hardware that hosts the ESX.

Again thank you all much for your support and help.

virtualkannan
Contributor

Hi All,

This issue is resolved.  The actual problem was with Update Manager, and it was corrected by reconfiguring it.

I take this opportunity to thank you all for your valuable assistance and insight on this.

Thank You.

gogogo5
Hot Shot

Hello virtualkannan - when you say it was fixed by reconfiguring VUM, can you be more specific about what you did?

Thanks.

funkyjunky
Contributor

I've got this same problem.  I turned off HA on the ESX cluster, and the VEM installation using VUM was successful.

gogogo5
Hot Shot

Good one - I just tried that (disabling HA) and the VUM/VEM installation was successful.

Thanks for sharing your tip!  I'll relay this to the SR I have open.

-gogogo5

funky_junky
Contributor

I don't think this is a bug, because disabling HA, DRS, FT and DPM is a prerequisite for installing the VEM:

"When performing any VUM operation on hosts which are a part of a cluster, ensure that VMWare High Availability (HA), VMware Fault Tolerance (FT), and VMware Distributed Power Management (DPM) features are disabled for the entire cluster. Otherwise, VUM will fail to upgrade the hosts in the cluster", page 5, document "Cisco Nexus 1000V Virtual Ethernet Module Software Installation and Upgrade Guide, Release 4.0(4) SV1(3)"

andrew_axon
Enthusiast

We are using ESXi 4.1.0, 348481 and had the same issue adding a host to a vDS. We are using Nexus 1010.

There appears to be a bug with VUM adding the host to the vDS in this way.

I can see from the logs that VUM downloads the correct VIB from the VSM and stages it to the host; however, it fails with the error "vDS operation failed on host <hostname>, got (vmodl.fault.SystemError) exception".

Our fix was to create a baseline within VUM with the correct Nexus VEM patch and remediate hosts using VUM.

We were then able to add the host to the vDS successfully.

todd_shawcross
Contributor

I have the same issue, and I can get it to work after disabling HA too; however, this isn't ideal.  Did you ever get a fix for this from VMware for your open SR?

LunThrasher
Enthusiast

I was just reading up on this thread as I was having a similar problem and just wanted to share what I did to fix the issue:

- I tried to remediate manually using VUM but got an error that the host is in an HA cluster with HA and DRS enabled.

- We have a lot of vApps created, so I couldn't disable HA on the cluster because of the vApps.

- I put the host into maintenance mode, removed it from the cluster, and added it back into vCenter as a standalone server (i.e. not in any cluster).

- Went into Networking and added the host to the VSM with all physical adapters, migrating the management network as well. Successful!

- Re-added the host to the cluster and took it out of maintenance mode.

Tutorials for System Admins www.sysadmintutorials.com
simon_wright
Enthusiast

Thanks, this worked perfectly, but why it does not work when the host is in a cluster is beyond me.
