VMware Cloud Community
FreddyFredFred
Hot Shot

strange error on reconfigure vm task

Since migrating (fresh installs) to vCenter 6 and vRO appliance 6.0.3, I seem to be getting more and more random failures in elements that have never failed before. I've posted previously about random failures in accessing the host property of a datastore (still happening) and now, two days in a row, I get the strange error below.

The first time this happened I was using the Orchestrator Java client while working on a new workflow. I ran my workflow and, while setting the inputs, clicked the box to select a VC:VirtualMachine. On the right side of the VM selection dialog, flashing text showed an error similar to the one below.

The next day an overnight workflow was running. The error happened while running a config spec workflow (the same configspec section from the clone/sysprep windows workflow).

The error I got was:

Attribute xsi:nil not allowed on element spec, which is not nillable.

while parsing call information for method ReconfigVM_Task

at line 3, column 2

while parsing SOAP body

at line 2, column 1

while parsing SOAP envelope

at line 1, column 38

while parsing HTTP request for method reconfigure

on object of type vim.VirtualMachine

at line 1, column 0

Are there any logs I can check on the vCenter side to see what might be causing these random failures? Most of the time, waiting a few seconds (at worst a minute) and trying again seems to work. The only problem is that all the automation grinds to a halt when this happens, so I find myself adding more and more checks and loops to make things work properly for items that never used to fail.
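
Since waiting and trying again usually works, the "checks and loops" can be kept in one place with a small retry wrapper. The following is only a minimal sketch of that pattern, written in Python rather than vRO's JavaScript; the name `retry` and all of its parameters are hypothetical and are not part of vRO or the vCenter plugin. It masks the symptom and does not address the root cause.

```python
import time

def retry(action, attempts=5, initial_delay=2.0, backoff=2.0, sleep=time.sleep):
    """Call action() until it succeeds or the attempts run out.

    Waits initial_delay seconds after the first failure and multiplies
    the wait by backoff after each further failure. The last error is
    re-raised unchanged when every attempt has failed.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay *= backoff
```

With the defaults, a call that keeps failing is retried after 2, 4, 8 and 16 seconds, which covers the "at worst a minute" window described above before giving up.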

7 Replies
dvatov
VMware Employee

Can you attach server.log? My guess is that one of the variables you use to initialize the config spec is occasionally left uninitialized. The vCenter log to look at is vpxd-<max num>.log.
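
Picking out vpxd-<max num>.log by hand is easy to get wrong when the directory holds many rotated files. A minimal sketch in Python follows; the helper name `latest_vpxd_log` is hypothetical, the optional `.gz` suffix is an assumption about how older logs may be compressed, and the log directory is deliberately left to the caller because it differs between vCenter installations.

```python
import re

# Matches vpxd-<N>.log, optionally gzip-compressed (the .gz part is an assumption).
VPXD_LOG = re.compile(r"^vpxd-(\d+)\.log(\.gz)?$")

def latest_vpxd_log(names):
    """Return the file name with the highest vpxd-<N>.log number, or None."""
    best = None
    best_num = -1
    for name in names:
        match = VPXD_LOG.match(name)
        if match and int(match.group(1)) > best_num:
            best = name
            best_num = int(match.group(1))
    return best
```

Typical use would be `latest_vpxd_log(os.listdir(log_dir))` with `log_dir` pointing at the vpxd log directory. Comparing the numbers as integers matters because a plain string sort would place vpxd-9.log after vpxd-10.log.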

FreddyFredFred
Hot Shot

I've attached two sets of logs.

The '1' version is for the error I posted above.

The '2' version is for an error I posted about previously (where I can't access the host property of a storage object I'm checking). The logs make it look like a similar error and, by chance, it happened as I was preparing the first set of logs. About 40 seconds after the failure, the workflow was run again by another request and it was fine.

I did notice there's an exception a little earlier in the logs but I don't know if that's related.

Looking at the times in the vRO client (I hope the clocks are in sync between vRO and vCenter, so give or take a second or two):

for log set 1:

Workflow started at 2016-02-14 00:03:26.964

Error occurred at 2016-02-14 00:03:26.964

for log set 2:

workflow started: 2016-02-15 11:11:22.174

Error occurred at 2016-02-15 11:11:42.082
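
To line these times up with the attached logs, it can help to pull out only the log lines within a few seconds of the error. The sketch below is in Python; the names `parse_stamp` and `lines_near` are hypothetical, and it assumes each log line carries a timestamp of the form `YYYY-MM-DD HH:MM:SS.mmm` (with either a space or a `T` between date and time). It ignores time zones, so if one log is written in UTC and the other in local time the target time has to be shifted by hand first. Continuation lines without a timestamp, such as stack trace lines, are dropped.

```python
import re
from datetime import datetime, timedelta

# Date, then 'T' or a space, then time with optional fractional seconds.
STAMP = re.compile(r"(\d{4}-\d{2}-\d{2})[T ](\d{2}:\d{2}:\d{2})(?:\.(\d{1,6}))?")

def parse_stamp(text):
    """Return the first timestamp found in text as a datetime, or None."""
    match = STAMP.search(text)
    if not match:
        return None
    stamp = datetime.strptime(match.group(1) + " " + match.group(2),
                              "%Y-%m-%d %H:%M:%S")
    if match.group(3):
        # Pad the fraction to microseconds: "964" becomes 964000.
        stamp += timedelta(microseconds=int(match.group(3).ljust(6, "0")))
    return stamp

def lines_near(lines, when, seconds=5.0):
    """Return the lines whose timestamp is within `seconds` of `when`."""
    target = parse_stamp(when)
    if target is None:
        raise ValueError("no timestamp found in %r" % when)
    window = timedelta(seconds=seconds)
    selected = []
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is not None and abs(stamp - target) <= window:
            selected.append(line)
    return selected
```

For log set 2, for example, `lines_near(open(path), "2016-02-15 11:11:42.082", seconds=2)` would keep only the lines stamped within two seconds of the reported error, where `path` is whichever log file is being checked.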

dvatov
VMware Employee

Yes, both errors seem to have a common root cause. Which version of vCO and the vCenter plugin are you using? Just saw you are on 6.0.3.

FreddyFredFred
Hot Shot

Any idea what the problem is or how I can fix it (or whether it's a vRO or a vCenter problem)?

dvatov
VMware Employee

The properties that are not initialized were not introduced recently, and there are no known issues in this part of the plugin. Is it possible that requests are being cut off on the wire so that they arrive at vCenter incomplete? Just guessing. You can enable 'verbose' logging on vCenter to print the SOAP requests arriving at it. On the vRO side you can change the client-config.wsdd file located in the o11nplugin-vsphere.dar/o11nplugin-vsphere-core-XXX.jar/lib/ folder (the dar is also in zip format). This will print the Axis requests to the {VRO_HOME}/bin/axis-log.log file. This file is not rotated, so disable this logging once it is no longer needed.
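
Because the dar and the jar inside it are both zip archives, the current client-config.wsdd can be inspected without unpacking anything to disk. The following is a minimal read-only sketch in Python; the function name `read_wsdd_from_dar` and its parameters are hypothetical. It searches for the jar and the wsdd entry by name instead of hard-coding the folder layout, so it makes no assumption about exactly where inside the archives the file sits. It does not modify the plugin.

```python
import io
import zipfile

def read_wsdd_from_dar(dar_file,
                       jar_marker="o11nplugin-vsphere-core-",
                       wsdd_name="client-config.wsdd"):
    """Return (jar_entry, wsdd_entry, text) for the first wsdd found, else None.

    dar_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(dar_file) as dar:
        for jar_entry in dar.namelist():
            if jar_marker not in jar_entry or not jar_entry.endswith(".jar"):
                continue
            # The nested jar is itself a zip archive; open it from memory.
            with zipfile.ZipFile(io.BytesIO(dar.read(jar_entry))) as jar:
                for entry in jar.namelist():
                    if entry.endswith(wsdd_name):
                        text = jar.read(entry).decode("utf-8")
                        return jar_entry, entry, text
    return None
```

Run against a copy of the dar, the returned entry names show where the file actually lives, which is useful to know before repacking an edited version. Working on a copy keeps the original available as the backup.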

FreddyFredFred
Hot Shot

So if I'm understanding you correctly, I should grab this file off my vRO appliance: /usr/lib/vco/app-server/plugins/o11nplugin-vsphere.dar and replace the file in there with the one attached.

Take a backup of the original on the vRO server, replace with the new .dar file and then restart the vRO service/server?

At the same time, change vCenter to verbose logging, then get a workflow to fail, and revert all the changes once I have logs that include a failure?

Thanks

dvatov
VMware Employee

You can do it one at a time. Maybe the problem will be clear after the first step.
