Since migrating (fresh installs) to vCenter 6 and vRO appliance 6.0.3, I seem to be getting more and more random failures in elements that have never failed before. I've posted previously about random failures in accessing the host property of a datastore (still happening) and now, two days in a row, I get the strange error below.
The first time this happened I was using the Orchestrator Java client while working on a new workflow. I ran my workflow and, while setting the inputs, I clicked on the box to select a VC:VirtualMachine. On the right side of the dialog where you select a VM, text was flashing with an error that looked similar to the error below.
The next day an overnight workflow was running. The error happened while running a config spec workflow (the same configspec section from the clone/sysprep windows workflow).
The error I got was:
Attribute xsi:nil not allowed on element spec, which is not nillable.
while parsing call information for method ReconfigVM_Task
at line 3, column 2
while parsing SOAP body
at line 2, column 1
while parsing SOAP envelope
at line 1, column 38
while parsing HTTP request for method reconfigure
on object of type vim.VirtualMachine
at line 1, column 0
Are there any logs I can check on the vCenter side to see what might be causing these random failures? Most of the time, waiting a few seconds (at worst a minute) and trying again seems to work. The only problem is that all the automation grinds to a halt when this happens, so I find myself putting in more and more checks and loops to make things work properly for items that never used to fail.
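The "wait a few seconds and try again" workaround described above can be kept in one place rather than repeated in every workflow. The following is only a minimal sketch of that idea in Python (vRO scripting itself is JavaScript, so this would need porting); the function name `retry` and all of its parameters are made up for illustration and are not part of any vRO or vCenter API.

```python
import time


def retry(operation, attempts=5, initial_delay=2.0, backoff=2.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    All names and defaults here are illustrative assumptions.
    The delay grows by `backoff` after every failed attempt.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as error:
            if attempt == attempts:
                # Out of attempts: let the last error reach the caller.
                raise
            print(f"attempt {attempt} failed: {error}; retrying in {delay:.0f}s")
            sleep(delay)
            delay *= backoff
```

With the defaults, a call that keeps failing is given up on after roughly 30 seconds of waiting (2 + 4 + 8 + 16), which is in line with the "at worst a minute" observation. This only hides the symptom; it does not explain why the request arrives incomplete.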
Can you attach server.log? I would guess that some of the variables you use to initialize the config spec are not initialized from time to time. The vCenter log to look at is vpxd-<max num>.log.
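One way to test the "variable not initialized" guess is to check the inputs just before the reconfigure call and log which ones are empty. The sketch below shows the idea in Python with plain dictionaries standing in for the workflow variables; `find_unset` and the sample keys are hypothetical, and in a real workflow the same check would be written in vRO's JavaScript against the actual attributes.

```python
def find_unset(values, prefix=""):
    """Return the dotted names of every entry whose value is None.

    `values` is a plain dict standing in for workflow variables;
    nested dicts are checked recursively.
    """
    missing = []
    for name, value in values.items():
        path = f"{prefix}{name}"
        if value is None:
            missing.append(path)
        elif isinstance(value, dict):
            missing.extend(find_unset(value, prefix=path + "."))
    return missing


# Hypothetical inputs: the spec was never built, so it is reported.
inputs = {"vm": "some-vm", "spec": None}
unset = find_unset(inputs)
if unset:
    print("not calling reconfigure, unset inputs:", ", ".join(unset))
```

If such a check fires at the moment of a failure, the problem is on the workflow side; if it never fires and vCenter still reports xsi:nil on spec, the value is being lost somewhere after the script.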
I've attached two sets of logs.
The '1' version is for the error I posted above.
The '2' version is for an error I posted about previously (where I can't access the host property of a storage object I'm checking). The logs make it seem like it's a similar error and by chance, it happened as I was preparing the first set of logs. About 40 seconds after the failure the workflow was run again by another request and it was fine.
I did notice there's an exception a little earlier in the logs but I don't know if that's related.
Looking at the times in the vRO client (I hope all the times are in sync between vRO and vCenter so give or take a second or two perhaps):
for log set 1:
Workflow started at 2016-02-14 00:03:26.964
Error occurred at 2016-02-14 00:03:26.964
for log set 2:
workflow started: 2016-02-15 11:11:22.174
Error occurred at 2016-02-15 11:11:42.082
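To line these times up against a large log file, it can help to pull out only the lines within a few seconds of the error. The sketch below assumes each log line of interest starts with a timestamp of the form `YYYY-MM-DD HH:MM:SS.mmm` or `YYYY-MM-DDTHH:MM:SS.mmm`; that format is an assumption and should be checked against the actual server.log and vpxd log. It also ignores time zones, so if one log is in UTC and the other in local time the target time has to be shifted by hand. The name `lines_near` is made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumed line prefix: date, then 'T' or a space, then time with milliseconds.
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2})[T ](\d{2}:\d{2}:\d{2}\.\d{3})")
FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def lines_near(lines, when, window_seconds=5):
    """Return the lines whose leading timestamp is within the window of `when`.

    `when` uses the format shown in the vRO client, e.g. "2016-02-15 11:11:42.082".
    Lines without a recognizable timestamp are skipped.
    """
    center = datetime.strptime(when, FORMAT)
    window = timedelta(seconds=window_seconds)
    selected = []
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue
        stamp = datetime.strptime(f"{match.group(1)} {match.group(2)}", FORMAT)
        if abs(stamp - center) <= window:
            selected.append(line)
    return selected
```

Passing an open file object as `lines` works, since iterating a file yields one line at a time. A window of a few seconds covers the "give or take a second or two" clock difference mentioned above.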
Yes, both errors seem to have a common root cause. Which version of vCO and the vCenter plugin are you using? Just saw you are on 6.0.3.
Any idea what the problem is or how I can fix it (or whether it's a vRO or a vCenter problem)?
The properties that are not initialized were not recently introduced, and there are no known issues in this part of the plugin. Is it possible that requests are being cut off on the wire so that they arrive at vCenter incomplete? Just guessing. You may enable 'verbose' logging for vCenter to print the SOAP requests arriving at it. On the vRO side you can change the client-config.wsdd file located in the o11nplugin-vsphere.dar/o11nplugin-vsphere-core-XXX.jar/lib/ folder (the dar is also in zip format). This will print the Axis requests to the {VRO_HOME}/bin/axis-log.log file. This file is not rotated, so disable this logging once it is no longer needed.
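Because both the .dar and the .jar inside it are zip archives, the exact location of client-config.wsdd can be confirmed before changing anything. The following Python sketch searches an archive, and any .jar nested inside it, for an entry with a given file name. It makes no assumption about the internal layout; the function name `find_in_archive` is made up for illustration, and the .dar path in the usage note is the one mentioned in this thread.

```python
import io
import zipfile


def find_in_archive(archive, target, trail=()):
    """Yield tuples describing where entries named `target` are found.

    `archive` is a path or a file-like object holding a zip archive.
    Nested .jar entries are opened in memory and searched as well, so a
    result such as ("x.jar", "client-config.wsdd") means the file sits
    inside x.jar, which itself sits inside the outer archive.
    """
    with zipfile.ZipFile(archive) as bundle:
        for name in bundle.namelist():
            if name.rsplit("/", 1)[-1] == target:
                yield trail + (name,)
            elif name.endswith(".jar"):
                nested = io.BytesIO(bundle.read(name))
                yield from find_in_archive(nested, target, trail + (name,))
```

Run against a copy of the plugin, for example `list(find_in_archive("o11nplugin-vsphere.dar", "client-config.wsdd"))`, this lists every place the file occurs. It only reads the archive, so it is safe to run before taking the backup and making the change.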
So if I'm understanding you correctly, I should grab this file off my vRO appliance: /usr/lib/vco/app-server/plugins/o11nplugin-vsphere.dar and replace the file in there with the one attached.
Take a backup of the original on the vRO server, replace with the new .dar file and then restart the vRO service/server?
At the same time, change vCenter to verbose logging, then get a workflow to fail, and revert all the changes once I have logs that include a failure?
Thanks
You can do it one step at a time. Maybe the problem will be clear after the first step.