Hello,
I'm currently facing a rather annoying problem.
I've installed the SDK for Perl in order to monitor ESX server resources. The provided .pl scripts (in the apps directory) work correctly. However, they take an amazingly long time to execute (at least 40-60 seconds!). After checking the log files on the ESX server, it seems there is a significant delay between the start of "Util::connect" and the "connected/authenticated" message in the ESX syslog.
I've tried the SDK for Perl 1.6 and the vSphere SDK for Perl 4, with no difference.
Has anyone already faced this performance problem?
Any clue or recommendation on what to investigate would be much appreciated.
Yannick
Environment: ESX 3.5, Perl scripts launched on a Debian server (Lenny)
Below is an example of the script used:
/var/tmp/testsimple.pl --entity HostSystem --server 192.168.1.200 --username xxx --password xxx
Lenny$ cat /var/tmp/testsimple.pl
#!/usr/bin/perl
use strict;
use warnings;
use VMware::VIRuntime;

my %opts = (
    entity => {
        type     => "=s",
        variable => "VI_ENTITY",
        help     => "ManagedEntity type: HostSystem, etc",
        required => 1,
    },
);
Opts::add_options(%opts);
Opts::parse();
Opts::validate();
Util::connect();

# Obtain all inventory objects of the specified type
my $entity_type = Opts::get_option('entity');
my $entity_views = Vim::find_entity_views(
    view_type => $entity_type);

# Process the findings and output to the console
foreach my $entity_view (@$entity_views) {
    my $entity_name = $entity_view->name;
    Util::trace(0, "Found $entity_type:
$entity_name\n");
}

# Disconnect from the server
Util::disconnect();
There might be something worth investigating in the authentication delay. However, I ran your script against my current environment (~100 hosts, no VMs yet), and the script execution was painfully slow. I run into this problem with large environments quite frequently, but there is a fix.
By default, the find_entity_views() call fetches all properties of an entity. This can be quite expensive in terms of script run time. It's particularly bad when you have a _view call nested in a loop.
You can probably speed up your script dramatically by changing it to the following:
#!/usr/bin/perl
use strict;
use warnings;
use VMware::VIRuntime;

my %opts = (
    entity => {
        type     => "=s",
        variable => "VI_ENTITY",
        help     => "ManagedEntity type: HostSystem, etc",
        required => 1,
    },
);
Opts::add_options(%opts);
Opts::parse();
Opts::validate();
Util::connect();

# Obtain all inventory objects of the specified type,
# fetching only the 'name' property of each one
my $entity_type = Opts::get_option('entity');
my $entity_views = Vim::find_entity_views(
    view_type  => $entity_type,
    properties => [ 'name' ]);

# Process the findings and output to the console
my $entity_cnt = @{$entity_views};   # number of entities found
foreach my $entity_view (@$entity_views) {
    my $entity_name = $entity_view->name;
    Util::trace(0, "Found $entity_type: $entity_name\n");
}

# Disconnect from the server
Util::disconnect();
Notice the properties argument to the find_entity_views() call: it fetches only the name property. You can of course add additional properties to the array reference. Restricting the fetch this way dramatically reduces script execution time. In my environment, the script went from taking several minutes to executing in under 10s.
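To illustrate adding more properties to the array reference, here is a minimal sketch that asks for two top-level properties of each host. The choice of 'summary' and the memory figure it prints are illustrative assumptions, not part of the original script; connection options (--server, --username, --password) are taken from the command line as before.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use VMware::VIRuntime;

Opts::parse();
Opts::validate();
Util::connect();

# Fetch only the two properties this script reads, not the whole object.
# ('summary' is an illustrative choice; list whatever your script uses.)
my $hosts = Vim::find_entity_views(
    view_type  => 'HostSystem',
    properties => [ 'name', 'summary' ],
);

foreach my $host (@$hosts) {
    # summary.hardware is optional in the API, so guard against it being unset.
    my $hw     = $host->summary->hardware;
    my $mem_mb = $hw ? int($hw->memorySize / (1024 * 1024)) : 'n/a';
    Util::trace(0, "Found HostSystem: " . $host->name . " (memory: $mem_mb MB)\n");
}

Util::disconnect();
```

Any property that is not listed is simply not populated on the returned view, so reading one you did not request is the usual symptom of over-pruning the list.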
Thanks for the answer. I will give it a try and come back with feedback soon.
Yannick
Hi stumpr,
The slowness had two causes.
First of all, a heavy kernel-mode CPU load on the machine was slowing down the whole script during kernel calls. Below is a debug trace showing the time spent in each step of the script.
Lenny:/var/tmp# ./test-20090610.pl --entity HostSystem --server 192.168.1.200 --username xxx --password xxx
Script start
10-06-09 00:27:59 - Options defined
10-06-09 00:27:59 - Options added
10-06-09 00:27:59 - Options parsed
10-06-09 00:27:59 - Options validated
10-06-09 00:28:32 - Connected <= 33"
10-06-09 00:28:32 - Opts::get_option finished
10-06-09 00:29:02 - Vim::find_entity_views finished <= 30"
10-06-09 00:29:02 - Found HostSystem:
localhost.localdomain
10-06-09 00:29:02 - Name extraction finished
10-06-09 00:29:02 - Disconnected
After reducing the kernel-mode load average:
Lenny:/var/tmp# ./test-20090610.pl --entity HostSystem --server 192.168.1.200 --username xxx --password xxx
Script start
10-06-09 00:31:06 - Options defined
10-06-09 00:31:06 - Options added
10-06-09 00:31:06 - Options parsed
10-06-09 00:31:06 - Options validated
10-06-09 00:31:09 - Connected <= 3"
10-06-09 00:31:09 - Opts::get_option finished
10-06-09 00:31:26 - Vim::find_entity_views finished <= 17"
10-06-09 00:31:26 - Found HostSystem:
localhost.localdomain
10-06-09 00:31:26 - Name extraction finished
10-06-09 00:31:26 - Disconnected
Then I applied the changes you proposed:
Lenny:/var/tmp# ./test2-20090610.pl --entity HostSystem --server 192.168.1.200 --username xxx --password xxx
Script start
10-06-09 01:02:45 - Options defined
10-06-09 01:02:45 - Options added
10-06-09 01:02:45 - Options parsed
10-06-09 01:02:45 - Options validated
10-06-09 01:02:47 - Connected
10-06-09 01:02:47 - Opts::get_option finished
10-06-09 01:02:48 - Vim::find_entity_views finished
10-06-09 01:02:48 - Found HostSystem: localhost.localdomain
10-06-09 01:02:48 - Name extraction finished
10-06-09 01:02:48 - Disconnected
Wonderful: the script execution time went from 1'03'' to 3''.
Thanks, stumpr.
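For anyone who wants to produce the same kind of timestamped trace, here is a minimal sketch in plain Perl. The helper names stamp and timed_step are hypothetical (the original test-20090610.pl was not posted), and the date format is inferred from the traces above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: build a "dd-mm-yy HH:MM:SS - label" line,
# matching the format of the traces in this thread.
sub stamp {
    my ($label, $epoch) = @_;
    $epoch = time unless defined $epoch;
    return strftime('%d-%m-%y %H:%M:%S', localtime($epoch)) . " - $label";
}

# Hypothetical helper: run one step, then print a timestamped line
# with the whole seconds that step took.
sub timed_step {
    my ($label, $code) = @_;
    my $start  = time;
    my @result = $code->();
    my $end    = time;
    printf "%s (%ds)\n", stamp($label, $end), $end - $start;
    # Hand back the step's result in whichever context the caller used.
    return wantarray ? @result : $result[0];
}

print "Script start\n";
my $sum = timed_step('Sum computed', sub {
    my $s = 0;
    $s += $_ for 1 .. 1000;
    return $s;
});
print "Result: $sum\n";
```

In a script like the one above you would wrap the SDK calls the same way, for example timed_step('Connected', sub { Util::connect() }), to see which step the time goes to.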
I have the same kind of problem when using the Nagios check_esx Perl script.
I have to wait 33 seconds for a result when I ask for CPU usage, for instance.
I tried toolkit 1.5, 1.6 and vSphere, with the same result.
Any idea?
Regards
It's probably the same general issue. By default the VI Perl API fetches all properties of a managed entity (host, VM, etc.). This is obviously a large set of data. If you place this call in a for loop (or even worse, a nested for loop), the script will be slow. However, if you ask only for the properties you require, it will speed up significantly (by orders of magnitude). It obviously makes a large difference if you're iterating over a large set of VMs.
There is a named parameter, properties, that can be passed to the view call to speed up the script. You of course have to know which properties you'll be using for each VM, but that isn't too hard to determine from a script.
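As a rough sketch of the nested case, the loop below looks up the host of each VM while limiting both the outer and the inner view call to the properties it reads. It assumes Vim::get_view accepts the same properties argument as find_entity_views, as the vSphere SDK for Perl documentation describes; the property choices are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use VMware::VIRuntime;

Opts::parse();
Opts::validate();
Util::connect();

# Outer call: only the VM name and its runtime info (which holds the host ref).
my $vms = Vim::find_entity_views(
    view_type  => 'VirtualMachine',
    properties => [ 'name', 'runtime' ],
);

foreach my $vm (@$vms) {
    # runtime.host is optional in the API, so skip VMs without one.
    my $host_ref = $vm->runtime->host;
    next unless $host_ref;

    # Inner call, executed once per VM: without 'properties' this would
    # pull the complete HostSystem object on every iteration.
    my $host = Vim::get_view(
        mo_ref     => $host_ref,
        properties => [ 'name' ],
    );
    Util::trace(0, $vm->name . " runs on " . $host->name . "\n");
}

Util::disconnect();
```

The inner call is where the savings add up, since its cost is paid once per VM rather than once per script run.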
So I did a quick pass over a script I found called check_vmware3.pl. It doesn't use this properties named argument in its view calls. I'm not surprised, because until the vSphere SDK for Perl there was no mention of it in the documentation. I think I found it after some discussions here on the forums, while looking over the Perl SDK source.
The answer is likely to contact the maintainer of the plugin code and have them update the plugin. You can have them PM me here if they need more details on it. You could also update the script yourself with the properties parameter where appropriate. Of course, you'll have to keep making those changes every time a new revision is released.
That looks almost identical to the script I saw. It's a good, well-written script. It just doesn't make use of the properties named argument, but that's not the fault of the script author(s). The parameter I'm talking about wasn't documented and, for a lot of VI Perl scripters, isn't a critical issue. I only really find it necessary when I work with large environments, though I can see it being more of an issue on your side with your monitoring tools (especially if you're working within a set polling period).
It would not be hard to update the script, and it would likely result in a significant performance increase (lower script execution time). However, not being the author of this script, it would take me a couple of hours to read through the code to see which view calls I need to update with the properties named argument. Also, I don't have a Nagios environment in which to test the script, so I'd have to hack the script a bit to run outside Nagios to do the work myself.
I think the best path is to open a dialogue with the script author, which from the comments seems to be:
If you like, you can direct him to this thread and let him know to PM me on the forums here. I'll be happy to point him in the right direction. I suspect he can probably pick it up just by reading this thread and quickly update his script appropriately.
I already contacted the author and am waiting for his reply.
It seems that my mail went to his trash.
I'll keep you informed.
Well, if he's unresponsive, it's still possible to fix it. I just don't have the environment for it. If you like, I can probably make the edits which I think will speed it up, but you'll have to run it and report back any errors.
Hi stumpr, dodgers,
I'm also facing the slowness of check_esx3. In fact, the original purpose of this post was to get some clues about what was slowing down scripts like check_esx3 (and especially check_esx3). Unfortunately, I haven't had the time to apply stumpr's advice to check_esx3 yet.
So, if Kostyantyn doesn't answer and you go ahead with the edit, I'm happy to give you a hand with the tests and error reports, if any.
Yannick
Hi guys,
Feel free, I'm waiting for your solution!
Regards
I'll see what time I can spend on it this weekend. No promises, sometimes I do something besides tech ;).
It's not hard, just a bit tedious. Have to go and review all the property calls for each view and update the view call.
I too suffer from this slow execution time of check_esx3. I wish I knew how to write some Perl. Anyhow, it sounds promising that it might be possible to fix. Looking forward to it!
I caught your other thread before this. I had some firefighting today (and conveniently used my weekend to not work, sorry!).
I'll take a look at the script in the next few days, time permitting, and put up a first draft of some changes. I don't have the environment to test it (I'll see if I can build one), but it seems that enough of you would be willing to try it and report back errors, which I can use to fix any typos or any place where I over-pruned the properties and broke the script.
Hehe, don't apologize for not working on weekends.
That would be much appreciated. I would of course help with testing.
I haven't forgotten about this, it's just been a hectic week. I'll try to get you guys an updated version of the script posted previously by this weekend so you can test it. Hopefully we can turn it around and see if it provides enough of a speed boost to solve any issues you are having.
OK, I made a first pass on this. I was a little conservative, so there might be some additional optimizations to be made. I also have one question to explore on my side about update_view_data(): I want to be sure it's not fetching all properties.
Finally, I have noticed on a script I was running vs ESX 3.5u4 that there was some very, very slow performance when using the services object. I notice this script uses it as well and it might be a part of the issue.
But try this new version and see if it provides any performance increase. Bear in mind I couldn't run it since I don't have a Nagios setup, but just copy and paste any errors back to the forum here and I can tidy it up. It'll likely just be a typo or another property I missed when trying to optimize it.
It seems there's no change from the previous script (running ESX 3.5.0 153865).
First
time ./check_esx3-forum1.pl -H esxserver -f ./auth -l CPU -s usage
CHECK_ESX3-FORUM1.PL OK - cpu usage=68.00 MHz(0.32%) | cpu_usagemhz=68.00Mhz;; cpu_usage=0.32%;;
real 0m31.882s
user 0m7.988s
sys 0m0.200s
Second
time ./check_esx3-forum1.pl -H esxserver -f ./esx_auth -l CPU
CHECK_ESX3-FORUM1.PL OK - cpu usage=238.10 MHz(1.15%) | cpu_usagemhz=238.10Mhz;; cpu_usage=1.15%;;
real 0m11.624s
user 0m6.216s
sys 0m0.156s
Yeah, it may be the update_view calls. I'll take a quick look at my environment to validate. That would overwrite everything I did. The script author probably didn't need to make so many update view calls to be honest, some are immediately after the initial get view calls.