VMware {code} Community
schnarff
Contributor

Throwing vmodl.fault.RequestCanceled without actually having cancelled a request

I'm running VMware Server 2.0.0, Build 122956, and Vix 1.6.2, Build 127388, on an Ubuntu 8.04.03-LTS system. The setup works reasonably well most of the time - I've been able to do VM management via the Perl portion of the API, and generally speaking, it's good to go.

The reason I'm here is that I'm having an intermittent failure of the system that makes no sense (as best I can tell). The system is primarily used to manage a pair of virtual machines that I'm using as a virus testing sandbox, with my Perl script running a process that boils down to the steps below (a stripped-down sketch of the loop follows the list):

  • Grab infected file, copy it over to clean virtual machine A

  • Start tcpdump on the host OS (which requires me to fork off a child process so it doesn't block)

  • Run the infected file on virtual machine A (clean virtual machine B is just sitting there so that it can be infected, and I can capture the traffic going over the network)

  • Wait 5 minutes

  • Kill the tcpdump process (no further fork, just a syscall)

  • Revert both virtual machines to their clean, pre-infected state

  • Loop back to the start with another file.
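
For context, here's roughly what one iteration looks like with the VIX Perl bindings (VMware::Vix::Simple). This isn't the real script verbatim: the host URL, credentials, .vmx paths, guest paths, capture interface, and snapshot choice are all placeholders, and most of the error handling is trimmed.

#!/usr/bin/perl
# Stripped-down sketch of one iteration -- placeholder paths and credentials.
use strict;
use warnings;
use VMware::Vix::Simple;
use VMware::Vix::API::Constants;

my @samples = @ARGV;   # infected files queued for processing

# Connect once up front and reuse the host handle for every iteration.
my ($err, $host) = HostConnect(VIX_API_VERSION,
                               VIX_SERVICEPROVIDER_VMWARE_VI_SERVER,
                               "https://localhost:8333/sdk", 0,
                               "akirk", "secret", 0, VIX_INVALID_HANDLE);
die "HostConnect: " . GetErrorText($err) . "\n" if $err != VIX_OK;

my ($errA, $vmA) = VMOpen($host, "[standard] vmA/vmA.vmx");   # gets infected
my ($errB, $vmB) = VMOpen($host, "[standard] vmB/vmB.vmx");   # sits as a target

foreach my $sample (@samples) {
    # Copy the infected file into clean VM A.
    VMLoginInGuest($vmA, "guestuser", "guestpass", 0);
    VMCopyFileFromHostToGuest($vmA, $sample, "C:\\samples\\sample.exe",
                              0, VIX_INVALID_HANDLE);

    # Start tcpdump on the host in a forked child so it doesn't block the loop.
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        exec("tcpdump", "-i", "eth0", "-w", "/captures/capture.pcap")
            or die "exec tcpdump: $!";
    }

    # Kick off the sample in VM A and give it 5 minutes to do its thing.
    VMRunProgramInGuest($vmA, "C:\\samples\\sample.exe", "",
                        VIX_RUNPROGRAM_RETURN_IMMEDIATELY, VIX_INVALID_HANDLE);
    sleep(300);

    # Kill the tcpdump process -- no further fork, just a signal.
    kill('TERM', $pid);
    waitpid($pid, 0);

    # Revert both VMs to their clean snapshot (assumed to be a running state).
    for my $vm ($vmA, $vmB) {
        my ($serr, $snap) = VMGetRootSnapshot($vm, 0);
        VMRevertToSnapshot($vm, $snap, 0, VIX_INVALID_HANDLE);
        ReleaseHandle($snap);
    }
}

ReleaseHandle($vmA);
ReleaseHandle($vmB);
HostDisconnect($host);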

It works well for about 2-3 days before my Perl script hangs while reverting the second virtual machine. At that point, my hostd.log shows the following:

Activation : Invoke done on
Throw vmodl.fault.RequestCanceled
Result:
(vmodl.fault.RequestCanceled) {
dynamicType = ,
msg = ""
}
Activation : Invoke done on
Throw vmodl.fault.RequestCanceled
Result:
(vmodl.fault.RequestCanceled) {
dynamicType = ,
msg = ""
}
Activation : Invoke done on
Throw vmodl.fault.RequestCanceled
Result:
(vmodl.fault.RequestCanceled) {
dynamicType = ,
msg = ""
}
RecordOp REMOVE: sessionList, ha-sessionmgr
Event 3653 : User akirk logged out
RecordOp ASSIGN: latestEvent, ha-eventmgr

Obviously, I'm not manually issuing a cancellation request of any kind, since the goal is to just let my script run as long as it has files to process (which is essentially forever, given the amount of data I'm trying to look at). Is there any other reason such a request might be issued? Do I need to do a host disconnect and reconnect on each iteration of my loop? Or can VMware Server and/or VIX just not handle a long-running loop like that well?

6 Replies
lemke
VMware Employee

The hostd logs suggest that it's decided to kill the property collector used by the Perl script, which could hang the script since it'll never see answers.

The next-to-last line with the 'logout' is curious, though -- that says that user 'akirk' disconnected from the server. Was that the user running the script?

There's no reason why what you're doing shouldn't work, and it's possible you're just hitting a bug in Server 2.0. Doing the connect/disconnect inside the loop may make things better, but we'd have to know why it cancelled the PropertyCollector to know for sure.

schnarff
Contributor

First off, thanks for such a prompt and relevant response. I have to admit I figured no one would have a clue, so it's nice to see that someone is responding. :)

Yes, "akirk" was in fact the user running the script - that's the name of both the Linux user account the Perl script is being run under, and the VMware Server user that I'm connecting as.

I've already gone ahead and put the connect/disconnect inside of the loop, and the script is currently running. Unfortunately, it'll probably be a couple of days before this happens again, based on my experience to date.
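
For reference, the change just wraps each iteration in its own session, roughly like this (same placeholder host URL and credentials as in the sketch in my first post; the middle of the loop is unchanged):

use strict;
use warnings;
use VMware::Vix::Simple;
use VMware::Vix::API::Constants;

my @samples = @ARGV;   # files still to process

foreach my $sample (@samples) {
    # Fresh session per iteration instead of one long-lived host handle.
    my ($err, $host) = HostConnect(VIX_API_VERSION,
                                   VIX_SERVICEPROVIDER_VMWARE_VI_SERVER,
                                   "https://localhost:8333/sdk", 0,
                                   "akirk", "secret", 0, VIX_INVALID_HANDLE);
    die "HostConnect: " . GetErrorText($err) . "\n" if $err != VIX_OK;

    # ... open both VMs, copy and run the sample, capture, revert snapshots ...

    HostDisconnect($host);   # drop the session before moving to the next file
}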

In the meantime, is there anything I could supply (other logs, config info, etc.) that would help you determine what's going on here? I'm happy to try different things out, provide relevant info, etc. to get this resolved.

lemke
VMware Employee

The user being logged out suggests (unless the script somehow did it) that the network connection may have been dropped, which would explain the cancellations. Is there anything suspect before the RequestCanceled faults?

The only thing actually impacted is the script, correct? hostd keeps running happily without any restarts?

schnarff
Contributor

Given that the Perl script is running on the host operating system, I doubt it's a network connectivity issue - if lo0 is going down, I've got bigger problems. ;)

As for other suspect stuff in the logs, I have logging turned up to verbose, so there's a ton to go through. Here's everything from the ~40 minutes before the failure:

RecordOp ASSIGN: info, haTask-96-vim.VirtualMachine.acquireMksTicket-55838455
RecordOp ASSIGN: info, haTask--vim.ServiceInstance.currentTime-55838456
RecordOp ASSIGN: info, haTask--vim.ServiceInstance.currentTime-55838457
SSL Handshake on client connection failed: SSL Exception:
SSL Handshake on client connection failed: SSL Exception:
SSL Handshake on client connection failed: SSL Exception:
SSL Handshake on client connection failed: SSL Exception:
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838458
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838459
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838460
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838461
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838462
RecordOp ASSIGN: info, haTask--vim.view.View.destroy-55838463
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838464
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838465
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838466
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838467
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838468
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838469
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838470
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838471
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838472
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838473
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838474
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838475
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838476
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838477
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838478
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838479
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838480
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838481
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838482

...though scrolling back a bit, I see:

An error occurred while loading configuration "/usr/lib/vmware/settings",not all entries are being read. It is strongly encouraged that you manually inspect the file and fix any corruptions.

...as well as a whole slew of messages like this that occurred about 2 hours before my final failure:

Activation : Invoke done on
Throw vmodl.fault.RequestCanceled
Result:
(vmodl.fault.RequestCanceled) {
dynamicType = ,
msg = ""
}

FWIW, all of these messages have different memory addresses in the "N5Vmomi10ActivationE..." bit.

As far as what is impacted, I've had this happen several times, and the impact seems to vary. Before upgrading to Ubuntu 8.04 from 7.x (whatever the default version was there, I think 7.10), hostd would go down altogether, and I'd just have to reboot the box. When I saw it again this morning (it's been somewhat low-priority for a bit, though it's now becoming more important), hostd was still up, and I could use a third virtual machine that was already powered on on the host; I just couldn't manually force a revert to snapshot on any VM on the box (they all hung at 95% in the web GUI). Those attempts generated some more interesting errors:

RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55839255
RecordOp ASSIGN: info, haTask-80-vim.EnvironmentBrowser.queryConfigOption-55839254
RecordOp ASSIGN: info, haTask-80-vim.EnvironmentBrowser.queryConfigTarget-55839256
RecordOp ASSIGN: info, haTask--vim.event.EventManager.createCollector-55839257
AdapterServer caught unexpected exception: vector::reserve
RecordOp ASSIGN: info, haTask--vim.HistoryCollector.setLatestPageSize-55839258
Activation : Invoke done on [vim.event.EventHistoryCollector:session[5269c103-d497-d344-50cd-66d2dc70effe]5223df77-adad-cc8d-ceb8-7fc5ba372807]
Throw vmodl.fault.SystemError
Result:
(vmodl.fault.SystemError) {
dynamicType = ,
reason = "",
msg = ""
}

...

Connection to server localhost:8308 failed with error Connection refused. Retrying...
Connection to server localhost:8308 failed with error Connection refused. Retrying...
Exception while processing request: Connection refused
RecordOp ASSIGN: info, haTask--vim.ServiceInstance.retrieveContent-55839304
RecordOp ASSIGN: info, haTask--vim.SessionManager.acquireLocalTicket-55839305
: User root
Event 3658 : Failed to login user root@127.0.0.1: No permission
RecordOp ASSIGN: latestEvent, ha-eventmgr
RecordOp ASSIGN: info, haTask--vim.SessionManager.login-55839306
Activation : Invoke done on
Throw vim.fault.NoPermission
Result:
(vim.fault.NoPermission) {
dynamicType = ,
object = 'vim.Folder:ha-folder-root',
privilegeId = "System.View",
msg = ""
}
Activation : Invoke done on
Throw vim.fault.NotAuthenticated
Result:
(vim.fault.NotAuthenticated) {
dynamicType = ,
object = 'hostd.ServiceInstance:ha-internalsvc-service-instance',
privilegeId = "System.Read",
msg = ""
}

Any thoughts?

lemke
VMware Employee

I didn't expect an actual networking (hardware-level) issue, but rather that some process ran out of fd's and got rude.

This is looking more like hostd itself is getting sickly. The fact that it's failing to talk to itself on 8308 looks really bad.

Any live VMs should happily keep running -- they're in their own process, and hostd won't interfere.

I think you can ignore the whine about the 'settings' file; it's probably empty anyway.

If you haven't already, try the Server forums and see if anyone there has some ideas -- I'm on the Vix side of things.

schnarff
Contributor

Probably not a case of running out of fd's - this box is pretty much dedicated to this task.

I'll go ahead and post to the Server forums, but I reserve the right to say "he told me to!" if I get flamed for cross-posting. ;)
