esxcfg-perf.pl (Compare performance metrics across multiple ESX(i) hosts)

Version 1

    Table of Contents

    • Author

    • Description

    • Category

    • Requirements

    • Version Support

    • Usage

    • Configurations

    • Sample Execution

    • Notes

     

    Author

    William Lam

     

    Description

     

    This script lets you compare performance metrics across multiple ESX(i) hosts, which may be useful when one particular host is showing an issue that the other systems are not. Collecting performance metrics is not the easiest thing to do, and if you don't have vCenter to aggregate the stats, you would need to log in to each host to collect the ones you're interested in. With this script, you can pull out specific metrics and compare them across a specific set of hosts.

     

    Category

    • Troubleshooting

     

    Requirements

     

    Version Support

    • Supports ESX and ESXi

     

    Usage

    [vi-admin@scofield perf]$ ./esxcfg-perf.pl
    Required command option 'hostlist' not specified.
    Required command option 'metriclist' not specified.
    
    Synopsis: ./esxcfg-perf.pl OPTIONS
    
    
    Command-specific options:
       --aggregate (default 'no')
          Only display aggregated statistics - N/A to all metrics [yes|no]
       --end_date
          End Date YYYY-MM-DD
       --hostlist (required)
          List of ESX(i) host to perform operations on
       --metriclist (required)
          List of ESX(i) host metrics to collect
       --start_date
          Start Date YYYY-MM-DD
    
    Common VI options:
       --config (variable VI_CONFIG)
          Location of the VI Perl configuration file
       --credstore (variable VI_CREDSTORE)
          Name of the credential store file defaults to <HOME>/.vmware/credstore/vicredentials.xml on Linux and <APPDATA>/VMware/credstore/vicredentials.xml on Windows
       --encoding (variable VI_ENCODING, default 'utf8')
          Encoding: utf8, cp936 (Simplified Chinese), iso-8859-1 (German), shiftjis (Japanese)
       --help
          Display usage information for the script
       --passthroughauth (variable VI_PASSTHROUGHAUTH)
          Attempt to use pass-through authentication
       --passthroughauthpackage (variable VI_PASSTHROUGHAUTHPACKAGE, default 'Negotiate')
          Pass-through authentication negotiation package
       --password (variable VI_PASSWORD)
          Password
       --portnumber (variable VI_PORTNUMBER)
          Port used to connect to server
       --protocol (variable VI_PROTOCOL, default 'https')
          Protocol used to connect to server
       --savesessionfile (variable VI_SAVESESSIONFILE)
          File to save session ID/cookie to utilize
       --server (variable VI_SERVER, default 'localhost')
          VI server to connect to. Required if url is not present
       --servicepath (variable VI_SERVICEPATH, default '/sdk/webService')
          Service path used to connect to server
       --sessionfile (variable VI_SESSIONFILE)
          File containing session ID/cookie to utilize
       --url (variable VI_URL)
          VI SDK URL to connect to. Required if server is not present
       --username (variable VI_USERNAME)
          Username
       --verbose (variable VI_VERBOSE)
          Display additional debugging information
       --version
          Display version information for the script
    

     

    Configurations

     

    1. Download esxcfg-perf.pl and upload it to your vMA 4.0 host

     

    2. Give the script execute permission:

     

    [vi-admin@scofield perf]$ chmod +x esxcfg-perf.pl
    

     

    3. Ensure that every ESX(i) host you would like to use with the script is managed by VMware vMA 4.0

     

    4. To list the servers currently managed by vMA, run the following command:

     

    [vi-admin@scofield perf]$ sudo vifp listservers
    himalaya.primp-industries.com   ESX
    reflex.primp-industries.com     vCenter
    esxi4-1.primp-industries.com    ESXi
    

     

    5. To add a new server to vMA management, run the following command:

     

    [vi-admin@scofield perf]$ sudo vifp addserver esxi4-2.primp-industries.com
    root@esxi4-2.primp-industries.com's password:
    

     

    You should now be able to list all the servers, including any newly added hosts, and you're ready to use the script.

     

    The script has two basic requirements:

     

    • 1) A host file containing the hostnames of all the systems you would like to run the script against

    • 2) A metrics file containing the metrics that you would like to extract and compare against all hosts

     

    Here is an example of a hosts file:

     

    [vi-admin@scofield perf]$ cat hosts
    himalaya.primp-industries.com
    esxi4-1.primp-industries.com
    esxi4-2.primp-industries.com
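
    Before running the script, a hosts file like this one can be checked against the vMA listing from step 4. The following is only a rough sketch: it assumes the output of `vifp listservers` has been saved to a file with the host name in the first column (as in the listing above), and `unmanaged_hosts` is a hypothetical helper name, not part of vMA or esxcfg-perf.pl.

```shell
#!/bin/sh
# unmanaged_hosts SERVERS_FILE HOSTS_FILE
# Prints every host in HOSTS_FILE that is missing from the first column of
# SERVERS_FILE (a saved copy of `vifp listservers` output).
# Hypothetical helper for illustration only.
unmanaged_hosts() {
    awk 'FILENAME == ARGV[1] { managed[$1] = 1; next }   # first file: managed servers
         NF && !($1 in managed) { print $1 }             # second file: hosts to check
        ' "$1" "$2"
}

# Demo with the listings shown in this document (before esxi4-2 was added).
demo=$(mktemp -d)
printf '%s\n' \
    'himalaya.primp-industries.com   ESX' \
    'reflex.primp-industries.com     vCenter' \
    'esxi4-1.primp-industries.com    ESXi' > "$demo/servers"
printf '%s\n' \
    'himalaya.primp-industries.com' \
    'esxi4-1.primp-industries.com' \
    'esxi4-2.primp-industries.com' > "$demo/hosts"
unmanaged_hosts "$demo/servers" "$demo/hosts"   # prints esxi4-2.primp-industries.com
rm -rf "$demo"
```

    Any host printed by the sketch would still need to be added with `vifp addserver` as in step 5.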
    

     

    Here is an example of a metrics file:

     

    [vi-admin@scofield perf]$ cat metrics
    disk.deviceWriteLatency.average
    disk.deviceReadLatency.average
    

     

    Before we get started, I'll walk you through how to figure out the performance metric names that go in the metrics file.

     

    Let's say we would like to extract CPU average (%) and average (mhz) for a set of hosts.

     

    1. Log in to one of your hosts with the vSphere Client

     

    2. Go to the performance tab and click on the specific chart options (CPU,DISK,NETWORK,MEMORY,SYSTEM,MANAGEMENT)

     

     

     

    3. From here you'll build the unique performance identifier used by the script, in the format X.Y.Z where

    X = Counter Type

    Y = Internal Name

    Z = Rollup Type

     

    e.g.

     

    CPU Average (%) = cpu.usage.average

    CPU Average (mhz) = cpu.usagemhz.average

     

    Note: The Internal Name is case sensitive; type it exactly as it appears in the vSphere Client. The rest should be all lower case.
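
    A metrics file can be sanity-checked against this X.Y.Z shape before it is passed to --metriclist. The sketch below is an assumption-laden illustration: `valid_metric` and `check_metrics_file` are hypothetical helper names, and the check only looks at the shape of each name (three parts, lower-case counter type and rollup type), not at whether a host actually offers the counter.

```shell
#!/bin/sh
# valid_metric NAME
# Succeeds when NAME has three dot-separated parts, with a lower-case counter
# type (X) and rollup type (Z). The internal name (Y) may be mixed case because
# it is case sensitive. Hypothetical helper; shape check only.
valid_metric() {
    printf '%s\n' "$1" | grep -Eq '^[a-z]+\.[A-Za-z0-9_]+\.[a-z]+$'
}

# check_metrics_file FILE
# Reports every non-empty line of FILE that fails valid_metric on stderr and
# returns non-zero if any line failed.
check_metrics_file() {
    bad=0
    while IFS= read -r line || [ -n "$line" ]; do
        [ -n "$line" ] || continue
        if ! valid_metric "$line"; then
            echo "suspicious metric: $line" >&2
            bad=1
        fi
    done < "$1"
    return "$bad"
}

valid_metric cpu.usagemhz.average && echo "ok"
valid_metric cpu.usagemhz || echo "rejected: needs three parts"
```

    A typical use would be `check_metrics_file metrics && ./esxcfg-perf.pl --hostlist hosts --metriclist metrics`, so that an obviously malformed name is caught before any host is queried.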

     

    You can also get the list of all available performance metrics from the vSphere API Performance Manager documentation. The chart options correspond to these counter types:

     

    CPU = cpu

    DISK = disk

    NETWORK = net

    MEMORY = mem

    SYSTEM = sys
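
    The mapping above can be combined with the X.Y.Z format to build metric names mechanically. This is a minimal sketch, assuming only the five counter types listed here; `counter_prefix` and `metric_id` are hypothetical helper names, and any chart option not in the list (such as MANAGEMENT) is rejected rather than guessed.

```shell
#!/bin/sh
# counter_prefix CHART_OPTION
# Maps a chart option from the vSphere Client to the counter type listed above.
# Hypothetical helper; unknown options fail instead of being guessed.
counter_prefix() {
    case "$1" in
        CPU)     echo cpu ;;
        DISK)    echo disk ;;
        NETWORK) echo net ;;
        MEMORY)  echo mem ;;
        SYSTEM)  echo sys ;;
        *)       echo "unknown chart option: $1" >&2; return 1 ;;
    esac
}

# metric_id CHART_OPTION INTERNAL_NAME ROLLUP
# Prints the X.Y.Z identifier; INTERNAL_NAME is passed through unchanged
# because it is case sensitive.
metric_id() {
    prefix=$(counter_prefix "$1") || return 1
    printf '%s.%s.%s\n' "$prefix" "$2" "$3"
}

metric_id CPU usagemhz average            # prints cpu.usagemhz.average
metric_id DISK deviceReadLatency average  # prints disk.deviceReadLatency.average
```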

     

    Sample Execution

     

    Here is an example collecting CPU stats for a single host:

     

    hostlist:

    [vi-admin@scofield perf]$ cat hosts1
    himalaya.primp-industries.com
    

     

    metriclist:

    [vi-admin@scofield perf]$ cat metrics1
    cpu.usage.average
    cpu.usagemhz.average
    

     

    [vi-admin@scofield perf]$ ./esxcfg-perf.pl --hostlist hosts1 --metriclist metrics1
    Processing performance statistics ...
    
    Start Date: realtime
    End   Date: realtime
    
    HOSTNAME                   | OBJECT                                    | METRIC                    | VALUE          | UNITS
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | 1                                         | cpu.usage.average         | 3.94           | Percent
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | 10                                        | cpu.usage.average         | 8.43           | Percent
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | 11                                        | cpu.usage.average         | 7.97           | Percent
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | 12                                        | cpu.usage.average         | 8.28           | Percent
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | 13                                        | cpu.usage.average         | 7.91           | Percent
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | 14                                        | cpu.usage.average         | 8.38           | Percent
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | 15                                        | cpu.usage.average         | 8              | Percent
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | 2                                         | cpu.usage.average         | 4.46           | Percent
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | 3                                         | cpu.usage.average         | 3.88           | Percent
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | 4                                         | cpu.usage.average         | 4.87           | Percent
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | 5                                         | cpu.usage.average         | 3.96           | Percent
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | 6                                         | cpu.usage.average         | 4.15           | Percent
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | 7                                         | cpu.usage.average         | 4.17           | Percent
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | 8                                         | cpu.usage.average         | 8.54           | Percent
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | 9                                         | cpu.usage.average         | 8.16           | Percent
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | TOTAL                                     | cpu.usage.average         | 13.82          | Percent
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | TOTAL                                     | cpu.usagemhz.average      | 2359           | MHz
    -----------------------------------------------------------------------------------------------------------------------------------
    

     

    Notice that the OBJECT column identifies the specific instance of the counter, which in this case is each of the CPU cores. TOTAL is a keyword that marks an aggregation of the metric values. Not all performance metrics have an aggregated value, but those that do are marked with this keyword. Also note that the statistics extracted here are real-time statistics.
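
    If the output has been redirected to a file, the TOTAL rows can be pulled out with a short awk filter. The sketch below assumes the table layout shown above (five columns separated by `|`); `total_rows` is a hypothetical helper name and not an option of the script.

```shell
#!/bin/sh
# total_rows FILE
# Prints "host metric value units" for every table row whose OBJECT column is
# the TOTAL keyword. Separator lines and the header are skipped because they
# either lack five columns or have a different OBJECT value.
total_rows() {
    awk -F'|' 'NF == 5 {
        # Trim the padding around each column.
        for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
        if ($2 == "TOTAL") print $1, $3, $4, $5
    }' "$1"
}

# Demo with three rows taken from the run above.
demo=$(mktemp)
printf '%s\n' \
    'himalaya | 1     | cpu.usage.average    | 3.94  | Percent' \
    'himalaya | TOTAL | cpu.usage.average    | 13.82 | Percent' \
    'himalaya | TOTAL | cpu.usagemhz.average | 2359  | MHz' > "$demo"
total_rows "$demo"
rm -f "$demo"
```

    For the three sample rows this prints the two TOTAL lines, `himalaya cpu.usage.average 13.82 Percent` and `himalaya cpu.usagemhz.average 2359 MHz`.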

     

    The next example extracts the same values across multiple hosts. This time we're only interested in the aggregated statistics, and we'll extract both the average and maximum CPU usage (%/mhz).

     

    hostlist:

    [vi-admin@scofield perf]$ cat hosts2
    himalaya.primp-industries.com
    esxi4-1.primp-industries.com
    esxi4-2.primp-industries.com
    

     

    metriclist:

    [vi-admin@scofield perf]$ cat metrics2
    cpu.usage.maximum
    cpu.usagemhz.maximum
    cpu.usage.average
    cpu.usagemhz.average
    

     

    [vi-admin@scofield perf]$ ./esxcfg-perf.pl --hostlist hosts2 --metriclist metrics2 --aggregate yes
    Processing performance statistics ...
    
    Start Date: realtime
    End   Date: realtime
    
    HOSTNAME                   | OBJECT                                    | METRIC                    | VALUE          | UNITS
    -----------------------------------------------------------------------------------------------------------------------------------
    esxi4-1                    | TOTAL                                     | cpu.usage.average         | 3.29           | Percent
    -----------------------------------------------------------------------------------------------------------------------------------
    esxi4-2                    | TOTAL                                     | cpu.usage.average         | 2.95           | Percent
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | TOTAL                                     | cpu.usage.average         | 13.42          | Percent
    -----------------------------------------------------------------------------------------------------------------------------------
    esxi4-1                    | TOTAL                                     | cpu.usage.maximum         | 3.29           | Percent
    -----------------------------------------------------------------------------------------------------------------------------------
    esxi4-2                    | TOTAL                                     | cpu.usage.maximum         | 2.95           | Percent
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | TOTAL                                     | cpu.usage.maximum         | 16.65          | Percent
    -----------------------------------------------------------------------------------------------------------------------------------
    esxi4-1                    | TOTAL                                     | cpu.usagemhz.average      | 141            | MHz
    -----------------------------------------------------------------------------------------------------------------------------------
    esxi4-2                    | TOTAL                                     | cpu.usagemhz.average      | 126            | MHz
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | TOTAL                                     | cpu.usagemhz.average      | 2290           | MHz
    -----------------------------------------------------------------------------------------------------------------------------------
    esxi4-1                    | TOTAL                                     | cpu.usagemhz.maximum      | 141            | MHz
    -----------------------------------------------------------------------------------------------------------------------------------
    esxi4-2                    | TOTAL                                     | cpu.usagemhz.maximum      | 126            | MHz
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | TOTAL                                     | cpu.usagemhz.maximum      | 2841           | MHz
    -----------------------------------------------------------------------------------------------------------------------------------
    

     

    Notice that in this run we specified the flag --aggregate yes to display only the aggregated values instead of the individual instances of the counter. By default this is set to 'no'.
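
    Since the point of the aggregated run is to compare hosts, one way to read saved output is to pick the host with the highest TOTAL value for each metric. This is only a sketch under the assumption that the output was redirected to a file in the layout shown above; `max_per_metric` is a hypothetical helper name.

```shell
#!/bin/sh
# max_per_metric FILE
# For each metric in the saved output, prints "metric host value" for the host
# with the highest TOTAL value. Rows for individual counter instances are ignored.
max_per_metric() {
    awk -F'|' 'NF == 5 {
        for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
        if ($2 != "TOTAL") next
        # Compare numerically; keep the first host seen on a tie.
        if (!($3 in best) || $4 + 0 > best[$3] + 0) { best[$3] = $4; host[$3] = $1 }
    }
    END { for (m in best) print m, host[m], best[m] }' "$1" | LC_ALL=C sort
}

# Demo with the cpu.usage.average rows from the run above.
demo=$(mktemp)
printf '%s\n' \
    'esxi4-1  | TOTAL | cpu.usage.average | 3.29  | Percent' \
    'esxi4-2  | TOTAL | cpu.usage.average | 2.95  | Percent' \
    'himalaya | TOTAL | cpu.usage.average | 13.42 | Percent' > "$demo"
max_per_metric "$demo"   # prints cpu.usage.average himalaya 13.42
rm -f "$demo"
```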

     

    In the next example we'll specify a time interval to query. Data shows up for a host only if performance data is available for that interval. You can verify this by checking the interval of interest in the vSphere Client on each of your hosts as a point of reference.

     

    hostlist:

    [vi-admin@scofield perf]$ cat hosts2
    himalaya.primp-industries.com
    esxi4-1.primp-industries.com
    esxi4-2.primp-industries.com
    

     

    metriclist:

    [vi-admin@scofield perf]$ cat metrics3
    disk.deviceWriteLatency.average
    disk.deviceReadLatency.average
    

     

    [vi-admin@scofield perf]$ ./esxcfg-perf.pl --hostlist hosts2 --metriclist metrics3 --start_date 2010-02-04 --end_date 2010-02-05
    Processing performance statistics ...
    
    Start Date: 2010-02-04T00:00:00
    End   Date: 2010-02-05T00:00:00
    
    HOSTNAME                   | OBJECT                                    | METRIC                    | VALUE          | UNITS
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | mpx.vmhba1:C0:T0:L0                       | disk.deviceReadLatency.ave| 11             | Millisecond
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | mpx.vmhba1:C0:T1:L0                       | disk.deviceReadLatency.ave| 0              | Millisecond
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | t10.945445000000000000000000100000003D0000| disk.deviceReadLatency.ave| 0              | Millisecond
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | t10.ATA_____WDC_WD2002FYPS2D01U1B0________| disk.deviceReadLatency.ave| 2              | Millisecond
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | mpx.vmhba1:C0:T0:L0                       | disk.deviceWriteLatency.av| 0              | Millisecond
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | mpx.vmhba1:C0:T1:L0                       | disk.deviceWriteLatency.av| 0              | Millisecond
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | t10.945445000000000000000000100000003D0000| disk.deviceWriteLatency.av| 0              | Millisecond
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | t10.ATA_____WDC_WD2002FYPS2D01U1B0________| disk.deviceWriteLatency.av| 0              | Millisecond
    -----------------------------------------------------------------------------------------------------------------------------------
    

     

    Here you'll see that results were returned only for himalaya, as the other hosts were recently rebuilt and don't have performance stats for this time period.

     

    If we run the same script without --start_date and --end_date, we get:

     

    [vi-admin@scofield perf]$ ./esxcfg-perf.pl --hostlist hosts2 --metriclist metrics3
    Processing performance statistics ...
    
    Start Date: realtime
    End   Date: realtime
    
    HOSTNAME                   | OBJECT                                    | METRIC                    | VALUE          | UNITS
    -----------------------------------------------------------------------------------------------------------------------------------
    esxi4-1                    | mpx.vmhba1:C0:T0:L0                       | disk.deviceReadLatency.ave| 0              | Millisecond
    -----------------------------------------------------------------------------------------------------------------------------------
    esxi4-2                    | mpx.vmhba1:C0:T0:L0                       | disk.deviceReadLatency.ave| 3              | Millisecond
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | mpx.vmhba1:C0:T0:L0                       | disk.deviceReadLatency.ave| 4              | Millisecond
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | mpx.vmhba1:C0:T1:L0                       | disk.deviceReadLatency.ave| 1              | Millisecond
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | t10.945445000000000000000000100000003D0000| disk.deviceReadLatency.ave| 0              | Millisecond
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | t10.ATA_____WDC_WD2002FYPS2D01U1B0________| disk.deviceReadLatency.ave| 5              | Millisecond
    -----------------------------------------------------------------------------------------------------------------------------------
    esxi4-1                    | mpx.vmhba1:C0:T0:L0                       | disk.deviceWriteLatency.av| 0              | Millisecond
    -----------------------------------------------------------------------------------------------------------------------------------
    esxi4-2                    | mpx.vmhba1:C0:T0:L0                       | disk.deviceWriteLatency.av| 0              | Millisecond
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | mpx.vmhba1:C0:T0:L0                       | disk.deviceWriteLatency.av| 0              | Millisecond
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | mpx.vmhba1:C0:T1:L0                       | disk.deviceWriteLatency.av| 0              | Millisecond
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | t10.945445000000000000000000100000003D0000| disk.deviceWriteLatency.av| 0              | Millisecond
    -----------------------------------------------------------------------------------------------------------------------------------
    himalaya                   | t10.ATA_____WDC_WD2002FYPS2D01U1B0________| disk.deviceWriteLatency.av| 0              | Millisecond
    -----------------------------------------------------------------------------------------------------------------------------------
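
    To spot hosts that returned no rows for a queried interval, saved output can be compared against the hosts file. The sketch below assumes, as in the sample runs, that the HOSTNAME column holds the short name (the part of the host name before the first dot); `hosts_without_data` is a hypothetical helper name.

```shell
#!/bin/sh
# hosts_without_data HOSTS_FILE OUTPUT_FILE
# Prints every entry of HOSTS_FILE whose short name never appears in the
# HOSTNAME column of OUTPUT_FILE (saved esxcfg-perf.pl output).
hosts_without_data() {
    awk '
        FILENAME == ARGV[1] {
            # Table rows have five |-separated columns; column 1 is the host.
            if (split($0, col, "|") == 5) {
                h = col[1]
                gsub(/^[ \t]+|[ \t]+$/, "", h)
                seen[h] = 1
            }
            next
        }
        NF {
            short = $1
            sub(/\..*$/, "", short)      # keep the part before the first dot
            if (!(short in seen)) print $1
        }' "$2" "$1"
}

# Demo: only himalaya returned data for the date range queried earlier.
demo=$(mktemp -d)
printf '%s\n' \
    'himalaya.primp-industries.com' \
    'esxi4-1.primp-industries.com' \
    'esxi4-2.primp-industries.com' > "$demo/hosts2"
printf '%s\n' \
    'himalaya | mpx.vmhba1:C0:T0:L0 | disk.deviceReadLatency.ave| 11 | Millisecond' > "$demo/out"
hosts_without_data "$demo/hosts2" "$demo/out"
rm -rf "$demo"
```

    For the demo data this lists esxi4-1.primp-industries.com and esxi4-2.primp-industries.com, the two hosts with no rows in the date-range run.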
    

     

    Notes

     

    • When running this script against a time interval, make sure the dates you pass match the date and time configured on your hosts, and that data is available for that interval. You can always double-check by connecting the vSphere Client directly to one of your hosts

     

    • By default aggregation is set to no

     

    • Start and End date are in the form of YYYY-MM-DD

     

    • This script is meant to be executed directly against ESX(i) hosts and not vCenter
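
    The date format from the notes above can be checked before the values are handed to --start_date and --end_date. This is a minimal sketch: `valid_date` and `date_range_ok` are hypothetical helper names, the check does not know month lengths (2010-02-31 passes), and treating an equal start and end date as acceptable is an assumption rather than documented behaviour of the script.

```shell
#!/bin/sh
# valid_date STRING
# Succeeds when STRING looks like YYYY-MM-DD with month 01-12 and day 01-31.
# Hypothetical helper; it does not check the number of days in each month.
valid_date() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$'
}

# date_range_ok START END
# Succeeds when both dates are well formed and START is not after END.
# With the dashes removed, YYYYMMDD values compare correctly as integers.
date_range_ok() {
    valid_date "$1" && valid_date "$2" || return 1
    start=$(printf '%s' "$1" | tr -d '-')
    end=$(printf '%s' "$2" | tr -d '-')
    [ "$start" -le "$end" ]
}

if date_range_ok 2010-02-04 2010-02-05; then
    echo "range looks fine"
fi
```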