Hi there,
We are using the VI Perl Toolkit to run scripts for monitoring and graphing our ESX as well as our ESXi hosts, some directly and others via Virtual Center. We are doing this on a Centos 5.3 host with toolkit 1.6. For some reason the toolkit is however consuming quite a lot of CPU, which means we cannot run the scripts on as many hosts as we would like, even though the monitor host is a dual quad core machine. I am not much of a developer myself, but I have ran Dprof and dprofpp on the script we're using, check_esx3.pl by op5, and also the sample script provided by vmware, datacenterlisting.pl, to find anything similar. Both of them seem to be making hundreds of thousands of calls to libXML, which is what's consuming all that CPU time. I have also tried doing this on the new vMA platform, which is running toolkit 4.0 with the same result.
So what I would like to know is; how can we make our scripts execute faster when the toolkit itself is so slow?
Output from dprofpp running datacenterlisting.pl:
dprofpp tmon.out
Total Elapsed Time = 12.25789 Seconds
User+System Time = 6.547893 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
30.0 1.969 8.716 25888 0.0001 0.0003 ComplexType::deserialize
28.0 1.838 5.688 148014 0.0000 0.0000 XML::LibXML::Element::getChildrenByTagName
27.9 1.828 1.828 111179 0.0000 0.0000 XML::LibXML::Node::nodeName
19.4 1.272 1.272 148014 0.0000 0.0000 XML::LibXML::Node::_childNodes
7.70 0.504 1.776 148014 0.0000 0.0000 XML::LibXML::Node::childNodes
6.23 0.408 0.408 121601 0.0000 0.0000 XML::LibXML::Node::textContent
5.99 0.392 0.392 112826 0.0000 0.0000 XML::LibXML::Node::DESTROY
4.87 0.319 1.041 2 0.1595 0.5205 Vim::login
4.29 0.281 0.281 69100 0.0000 0.0000 XML::LibXML::Element::getAttributeNodeNS
3.34 0.219 0.550 2625 0.0001 0.0002 Class::MethodMaker::Engine::create_methods
2.52 0.165 0.202 2625 0.0001 0.0001 Class::MethodMaker::scalar::scal0000
2.28 0.149 0.149 1161 0.0001 0.0001 Crypt::SSLeay::Conn::read
1.99 0.130 0.130 10 0.0130 0.0130 XML::LibXML::_parse_string
1.70 0.111 0.137 24388 0.0000 0.0000 DynamicData::get_property_list
0.92 0.060 0.118 22 0.0027 0.0054 main::BEGIN
Output from running dprofpp on check_esx3.pl:
dprofpp tmon.out
Total Elapsed Time = 6.755634 Seconds
User+System Time = 2.915634 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
28.6 0.834 1.881 33053 0.0000 0.0001 XML::LibXML::Element::getChildrenByTagName
21.8 0.636 0.636 247000 0.0000 0.0000 XML::LibXML::Node::nodeName
15.4 0.449 2.543 5611 0.0001 0.0005 ComplexType::deserialize
9.23 0.269 1.086 2 0.1347 0.5428 Vim::login
8.40 0.245 0.575 2625 0.0001 0.0002 Class::MethodMaker::Engine::create_methods
6.28 0.183 0.183 33053 0.0000 0.0000 XML::LibXML::Node::_childNodes
5.38 0.157 0.206 2625 0.0001 0.0001 Class::MethodMaker::scalar::scal0000
4.63 0.135 0.135 249895 0.0000 0.0000 XML::LibXML::Node::DESTROY
4.01 0.117 0.300 33053 0.0000 0.0000 XML::LibXML::Node::childNodes
2.85 0.083 0.083 14278 0.0000 0.0000 XML::LibXML::Element::getAttributeNodeNS
2.33 0.068 0.115 863 0.0001 0.0001 Net::SSL::read
2.16 0.063 0.098 2625 0.0000 0.0000 Class::MethodMaker::Engine::install_methods
1.92 0.056 0.056 27375 0.0000 0.0000 XML::LibXML::Node::textContent
1.85 0.054 0.062 5641 0.0000 0.0000 DynamicData::get_property_list
1.68 0.049 0.049 2625 0.0000 0.0000 Class::MethodMaker::Engine::check_opts
Any hints/clues much appreciated.
I probably would have had some time today to work on the script everyone is having issues with, but unfortunately I had a near meltdown of my VMware environment due to some SAN misconfiguration.
I'll swing back to it. You will see a lot of XML calls since the scripts are parsing soap (XML). Not much you can do about that. The real core of the issue is using a trick to reduce the size of the XML data returned by only asking for a subset of properties (just those you need). That check_esx.pl script essentially gets all properties of objects so it will be thick, and will only get slower the larger the environment.
Can you explain it a bit more to me? I don't have the environment needed to really see the problem, is the script taking too long to complete in a timely polling interval with your monitoring application?
Thanks for your answer. Indeed the script is taking too long to execute, and while it is running it is using 100% cpu. It doesn't make sense to me that a script that only gathers a few values uses so much resources, but it is understandable if the feature to get precise properties was undocumented. The script only requres a few perl modules and an esx or esxi to connect to to make it work. Help would be much appreciated anyhow, and I would gladly assist with testing. Cheers. ![]()
