I'm running VMware Server 2.0.0, Build 122956, and Vix 1.6.2, Build 127388, on an Ubuntu 8.04.03-LTS system. The setup works reasonably well most of the time - I've been able to do VM management via the Perl portion of the API, and generally speaking, it's good to go.
The reason I'm here is that I'm having an intermittent failure of the system that makes no sense (as best I can tell). The system is primarily used to manage a pair of virtual machines that I'm using to run a virus testing sandbox, with my Perl script running a process that boils down to:
Grab infected file, copy it over to clean virtual machine A
Start tcpdump on the host OS (which requires me to fork off a child process so it doesn't block)
Run the infected file on virtual machine A (clean virtual machine B is just sitting there so that it can be infected, and I can capture the traffic going over the network)
Wait 5 minutes
Kill the tcpdump process (no further fork, just a syscall)
Revert both virtual machines to their clean, pre-infected state
Loop back to the start with another file.
It works well for about 2-3 days, before my Perl script hangs during the process of reverting the second virtual machine. At that point, my hostd.log shows the following:
Throw vmodl.fault.RequestCanceled
(vmodl.fault.RequestCanceled) {
dynamicType = ,
msg = ""
}
Throw vmodl.fault.RequestCanceled
(vmodl.fault.RequestCanceled) {
dynamicType = ,
msg = ""
}
Throw vmodl.fault.RequestCanceled
(vmodl.fault.RequestCanceled) {
dynamicType = ,
msg = ""
}
RecordOp REMOVE: sessionList, ha-sessionmgr
Event 3653 : User akirk logged out
RecordOp ASSIGN: latestEvent, ha-eventmgr
Obviously, I'm not manually issuing a cancellation request of any kind, since the goal is to just let my script run as long as it has files to process (which is essentially infinitely, given the amount of data I'm trying to look at). Is there any other reason such a request might be issued? Do I need to do a host disconnect and reconnect on each iteration of my loop? Or can VMware Server and/or VIX just not handle a long-running loop like that well?
The hostd logs suggest that its decided to kill the property collector used by the perl script, which could hang it since it'll never see answers.
The next-to-last line wiht the 'logout' is curious though -- that says that user 'akirk' disconnected from the server. Was that the user running the script?
There's no reason what you're doing shouldn't work, and its possible you're just hitting a bug in Server 2.0. Doing the connect/disconnect inside the loop may make it better, but we'd have to know why it cancelled the propertyCollector to know for sure.
First off, thanks for such a prompt and relevant response. I have to admit I figured no one would have a clue, so it's nice to see that someone is responding.
Yes, "akirk" was in fact the user running the script - that's the name of both the Linux user account the Perl script is being run under, and the VMware Server user that I'm connecting as.
I've already gone ahead and put the connect/disconnect inside of the loop, and the script is currently running. Unfortunately, it'll probably be a couple of days before this happens again, based on my experience to date.
In the meantime, is there anything I could supply (other logs, config info, etc.) that would help you determine what's going on here? I'm happy to try different things out, provide relevant info, etc. to get this resolved.
The user being logged out suggests that (unless the script somehow did it) that the network connection may have been dropped, which would explain the cancellations. Is there anything suspect before the RequestCancelled faults?
The only thing actually impacted is the script, correct? hostd keeps running happily without any restarts?
Given that the Perl script is running on the host operating system, I doubt it's a network connectivity issue - if lo0 is going down, I've got bigger problems.
As for other suspect stuff in the logs, I have them turned to verbose, so there's a ton to go through. Here's everything for the ~40 minutes before the failure:
RecordOp ASSIGN: info, haTask-96-vim.VirtualMachine.acquireMksTicket-55838455
RecordOp ASSIGN: info, haTask--vim.ServiceInstance.currentTime-55838456
RecordOp ASSIGN: info, haTask--vim.ServiceInstance.currentTime-55838457
SSL Handshake on client connection failed: SSL Exception:
SSL Handshake on client connection failed: SSL Exception:
SSL Handshake on client connection failed: SSL Exception:
SSL Handshake on client connection failed: SSL Exception:
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838458
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838459
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838460
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838461
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838462
RecordOp ASSIGN: info, haTask--vim.view.View.destroy-55838463
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838464
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838465
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838466
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838467
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838468
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838469
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838470
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838471
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838472
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838473
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838474
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838475
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838476
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838477
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838478
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838479
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838480
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838481
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55838482
...though scrolling back a bit, I see:
An error occurred while loading configuration "/usr/lib/vmware/settings",not all entries are being read. It is strongly encouraged that you manually inspect the file and fix any corruptions.
...as well as a whole slew of messages like this that occurred about 2 hours before my final failure:
Throw vmodl.fault.RequestCanceled
(vmodl.fault.RequestCanceled) {
dynamicType = ,
msg = ""
}
FWIW, all of these messages have different memory addresses in the "[http://N5Vmomi10ActivationE...|http://N5Vmomi10ActivationE...]" bit.
As far as what is impacted, I've had this happen several times, and the impact seems to vary. Before upgrading to Ubuntu 8.04 from 7.x (whatever the default version was there, I think 7.10), hostd would go down altogether, and I'd just have to reboot the box. When I saw this this morning (it's been somewhat low-priority for a bit, though it's now becoming more important), hostd was still up, and I could use a third virtual machine that was already powered on on the host; I just couldn't manually force a reversion to snapshot on any VM on the box (they all hung at 95% in the web GUI). Those attempts generated some more interesting errors:
RecordOp ASSIGN: info, haTask--vmodl.query.PropertyCollector.Filter.destroy-55839255
RecordOp ASSIGN: info, haTask-80-vim.EnvironmentBrowser.queryConfigOption-55839254
RecordOp ASSIGN: info, haTask-80-vim.EnvironmentBrowser.queryConfigTarget-55839256
RecordOp ASSIGN: info, haTask--vim.event.EventManager.createCollector-55839257
AdapterServer caught unexpected exception: vector::reserve
RecordOp ASSIGN: info, haTask--vim.HistoryCollector.setLatestPageSize-55839258
Activation : Invoke done on [vim.event.EventHistoryCollector:session[5269c103-d497-d344-50cd-66d2dc70effe]5223df77-adad-cc8d-ceb8-7fc5ba372807]
(vmodl.fault.SystemError) {
dynamicType = ,
reason = "",
msg = ""
}
...
Connection to server localhost:8308 failed with error Connection refused. Retrying...
Connection to server localhost:8308 failed with error Connection refused. Retrying...
Exception while processing request: Connection refused
RecordOp ASSIGN: info, haTask--vim.ServiceInstance.retrieveContent-55839304
RecordOp ASSIGN: info, haTask--vim.SessionManager.acquireLocalTicket-55839305
Event 3658 : Failed to login user root@127.0.0.1: No permission
RecordOp ASSIGN: latestEvent, ha-eventmgr
RecordOp ASSIGN: info, haTask--vim.SessionManager.login-55839306
(vim.fault.NoPermission) {
dynamicType = ,
object = 'vim.Folder:ha-folder-root',
privilegeId = "System.View",
msg = ""
}
Throw vim.fault.NotAuthenticated
(vim.fault.NotAuthenticated) {
dynamicType = ,
object = 'hostd.ServiceInstance:ha-internalsvc-service-instance',
privilegeId = "System.Read",
msg = ""
}
Any thoughts?
I didn't expect an actual networking (at hardware) issue, but rather that somebody ran out of fd's and got rude.
This is looking more like maybe hostd is getting sickly. It failing to talk to itself on 8308 looks really bad.
Any live VMs should happily keep running -- they're in their own process, and hostd won't intefere.
I think you can ignore the whine about the 'settings' file, its probably empty anyways.
If you haven't already, try the serve forums and see if anyone there has some ideas -- I'm on the Vix side of things.
Probably not so much on running out of fd's - this box is pretty much restricted to this task.
I'll go ahead and post to the server forums, but I reserve the right to say "he told me to!" if I get flamed for cross-posting.