VMware Cloud Community
cesprov
Enthusiast

Fixed, it is a 6.0x bug - WAS: PowerCLI, vSphere 6.0 bug? Missing "disk.used.latest" StatType for interval 7200

I currently have a ticket open for this with VMware support and they keep giving me the runaround on this so I am hoping to enlist the members of this community to determine whether this is a bug or not.

We were on a Windows install of vCenter Server 5.5 U2e with ESXi 5.5U2 hosts and this was working fine.  We upgraded our vCenter first to 6.0.0a, then 6.0.0b, then 6.0U1, where we are at now.  The vCenter versions are the relevant versions here, not the ESXi host installs.  This broke somewhere in this vCenter upgrade path and I'm not sure which version broke it.


There are four intervals (in seconds) of stat performance data that are collected and rolled over from table to table:  300, 1800, 7200, 86400.  I won't go into how these are rolled into each other or the statistics logging levels, as we believe the data within the tables is actually irrelevant to the bug at hand.  From within PowerCLI, you can see the StatTypes for each interval by executing the following commands:


Get-StatType -Interval 300 -Entity <name of VM> | sort

Get-StatType -Interval 1800 -Entity <name of VM> | sort

Get-StatType -Interval 7200 -Entity <name of VM> | sort

Get-StatType -Interval 86400 -Entity <name of VM>  | sort

where <name of VM> is of course the name of any VM within your vCenter.  This will output something like:

cpu.ready.summation

cpu.usage.average

cpu.usagemhz.average

disk.maxTotalLatency.latest

disk.provisioned.latest

disk.unshared.latest

disk.usage.average

disk.used.latest

...

Each one of these is a stat type for which data is collected.  If you compare the output of all 4 commands above, you should notice that the stat type "disk.used.latest" is not listed in the output of the 7200 command.  It's just gone.
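
The comparison described above can also be scripted instead of eyeballed. The following is only a minimal sketch and not part of the original post: Find-MissingStatType is a hypothetical helper name, and the sample lists are a shortened, illustrative stand-in for real output. In a connected PowerCLI session the hashtable would be filled from Get-StatType instead, as noted in the comment.

```powershell
# Hypothetical helper: report which intervals lack a given stat type.
function Find-MissingStatType {
    param(
        [Parameter(Mandatory)] [hashtable] $StatTypesByInterval,  # interval in seconds -> list of stat types
        [Parameter(Mandatory)] [string]    $StatType
    )
    $StatTypesByInterval.Keys |
        Sort-Object |
        Where-Object { $StatTypesByInterval[$_] -notcontains $StatType }
}

# Sample data (illustrative only). In a live PowerCLI session, build it with something like:
#   foreach ($i in 300,1800,7200,86400) { $byInterval[$i] = Get-StatType -Entity $vm -Interval $i }
$byInterval = @{
    300   = @('disk.usage.average', 'disk.used.latest')
    1800  = @('disk.usage.average', 'disk.used.latest')
    7200  = @('disk.usage.average')
    86400 = @('disk.usage.average', 'disk.used.latest')
}

$missing = @(Find-MissingStatType -StatTypesByInterval $byInterval -StatType 'disk.used.latest')
"disk.used.latest is missing from interval(s): $($missing -join ', ')"
```

A plain hashtable is used on purpose: an [ordered] dictionary with integer keys would treat $byInterval[7200] as a positional index rather than a key lookup.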


You can then see the actual data for each one of these stat types by executing the following command:

Get-Stat -Entity <name of VM> -Stat <StatType> -IntervalSecs <Interval in seconds>

The issue, and what we believe to be a bug, is that "disk.used.latest" is now missing from 7200 as mentioned above.  If I execute the following commands:

Get-Stat -Entity <name of VM> -Stat disk.used.latest -IntervalSecs 300

Get-Stat -Entity <name of VM> -Stat disk.used.latest -IntervalSecs 1800

Get-Stat -Entity <name of VM> -Stat disk.used.latest -IntervalSecs 7200

Get-Stat -Entity <name of VM> -Stat disk.used.latest -IntervalSecs 86400

the 300, 1800, and 86400 intervals output the data for "disk.used.latest" just fine.  However, the 7200 interval kicks out the following error:

"Get-Stat : 6/8/2016 10:20:24 AM    Get-Stat        The metric counter "disk.used.latest" doesn't exist for entity"

The issue is that since this metric is missing in 7200, the stats for it fail to roll up into the 86400 interval.  Currently the newest data I have for disk.used.latest in 86400 is from 7/11/2015, which is when we upgraded to 6.0.0a.  The oldest data for 86400 is from 6/9/2015.  This is because 7200 has failed to roll up this data into 86400 since we upgraded, yet data older than one year is still being purged daily.  On 7/11/2016, all my data for "disk.used.latest" in 86400 will be gone.
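
The 7/11/2016 date follows directly from the retention arithmetic. As a quick worked check, assuming the one-year purge for the 86400 interval described above (the variable names are purely illustrative):

```powershell
# Assumption: daily (86400 s) samples are purged once they are one year old.
$culture      = [System.Globalization.CultureInfo]::InvariantCulture
$newestSample = [datetime]::ParseExact('07/11/2015', 'MM/dd/yyyy', $culture)  # last sample that rolled up
$postDate     = [datetime]::ParseExact('06/08/2016', 'MM/dd/yyyy', $culture)  # date of the error shown above

$lastPurgeDate = $newestSample.AddYears(1)          # day the newest sample ages out
$daysLeft      = ($lastPurgeDate - $postDate).Days  # whole days of history remaining

"All disk.used.latest data in 86400 is gone on {0:yyyy-MM-dd}, {1} days after {2:yyyy-MM-dd}" -f $lastPurgeDate, $daysLeft, $postDate
```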

VMware support keeps going off on a tangent related to how it rolls up data, completely glossing over the fact that the 7200 interval isn't collecting this disk.used.latest metric because it simply doesn't exist and therefore there is nothing to roll up into 86400.

So...  I could use some help proving me right or wrong on this.  I only have access to a 6.0U1 vCenter, where this is broken.  What I would like is for someone, or multiple people, to execute the commands below in a PowerCLI session on the following vCenter versions: 6.0 GA, 6.0.0a, 6.0.0b, 6.0U1, 6.0U1b, 6.0U2.

Get-StatType -Interval 300 -Entity <name of VM> | sort

Get-StatType -Interval 1800 -Entity <name of VM> | sort

Get-StatType -Interval 7200 -Entity <name of VM> | sort

Get-StatType -Interval 86400 -Entity <name of VM>  | sort

And simply check to see if "disk.used.latest" is in the list of stat types for all intervals.  If I am right and this is a bug, it should be missing for the 7200 interval, depending on the vCenter version.  Checking this on each 6.0 vCenter version should tell us on what version it broke and if future versions have fixed it.  It would probably help if the vCenter was upgraded from a previous 5.x version, not sure if this issue will be present in new installs. And I'm not sure if VCSA would show this problem or not.


Report back here your findings, if you too are missing the "disk.used.latest" stat type in any of the intervals or if you have it present in all intervals, and what version/build number of vCenter you checked against.
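
To keep reports comparable, the version, build, and result can be put on one line. This is a rough sketch only: Format-StatTypeReport is a hypothetical helper, and the server object below is a stand-in with placeholder values. In a connected PowerCLI session, $global:DefaultVIServer should expose the same Version and Build properties and could be passed instead.

```powershell
# Hypothetical helper: build a one-line report for posting back to the thread.
function Format-StatTypeReport {
    param(
        [Parameter(Mandatory)] $Server,             # object exposing Version and Build
        [Parameter(Mandatory)] [string] $StatType,
        [int[]] $MissingIntervals = @()
    )
    $result = if ($MissingIntervals.Count -gt 0) {
        "missing from interval(s) $($MissingIntervals -join ', ')"
    } else {
        'present in all intervals'
    }
    "vCenter $($Server.Version) build $($Server.Build): $StatType $result"
}

# Stand-in object with placeholder values, for illustration only.
$fakeServer = [pscustomobject]@{ Name = 'vcenter.example.local'; Version = '6.0.0'; Build = '0000000' }
$report = Format-StatTypeReport -Server $fakeServer -StatType 'disk.used.latest' -MissingIntervals 7200
$report
```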


Thanks in advance!


Accepted Solutions
cesprov
Enthusiast

I have now pinpointed the exact bug in vCenter that causes this issue and it appears to affect every install of 6.0b through 6.0U2 (I didn't test 6.0 RTM or 6.0a).  On v5.5U2 vCenter databases, the VPX_STAT_COUNTER table has the following indexes:

IX_VPX_SC_ENTITY_ID

VPX_STAT_COUNTER_M1

VPX_STAT_COUNTER_M2

VPX_STAT_COUNTER_P1

VPX_STAT_COUNTER_U1

After upgrading to 6.0b through 6.0U2, or even on a brand-new 6.0U2 install (i.e. no upgrade), the VPX_STAT_COUNTER table has only the following two indexes:

PK_VPX_STAT_COUNTER

VPX_STAT_COUNTER_P1

The problem is that the stats_rollup2_proc stored procedure (the SQL Agent task "Past Week stats rollup" calls l_stats_rollup2_proc which in turn calls stats_rollup2_proc) contains a reference to one of these missing indexes:

SET @sqlCommand_rt3 = 'INSERT INTO ' + ... + ' VPX_STAT_COUNTER SC  WITH(INDEX(VPX_STAT_COUNTER_M1) ...

This causes the execution of the SELECT statement within that INSERT to return:


Msg 308, Level 16, State 1, Line 1

Index 'VPX_STAT_COUNTER_M1' on table 'VPX_STAT_COUNTER' (specified in the FROM clause) does not exist.

The effect of this is that the SELECT statement within @sqlCommand_rt3's INSERT statement, which is supposed to read the VPX_STAT_DEF.ROLLUP_TYPE = 3 counters from the HIST2 tables, returns 0 rows because of the above error.  As a result, the VPX_STAT_DEF.ROLLUP_TYPE = 3 counters are never rolled up from the HIST2 tables into the HIST3 tables.  To make matters worse, the error handling in this stored procedure is not catching this error, or at least not reporting it up to the SQL Agent, so the job history shows this task as running successfully when it clearly is not.


The answer that VMware support seems to be implying is that the VPX_STAT_COUNTER indexes are erroneously missing from 6.0 installs and should be there.  Either that or those indexes were removed on purpose and someone forgot to update the stored procedures to use one of the remaining indexes (my testing indicates that changing the stored procedure to use VPX_STAT_COUNTER_P1 instead may also fix the issue).  Either way, based on my testing, this affects every 6.0 vCenter install, upgrade or new install doesn't matter.  Most people probably don't notice the issue as most .latest counters don't seem to be exposed through the GUIs.  Unless you are specifically looking for .latest counters in interval 7200 using PowerCLI, or have a third party program that looks for them as was my case, you probably don't even notice you're missing this performance data from the 7200 interval and beyond.


The fix is simple: recreate the missing indexes on the VPX_STAT_COUNTER table, and the .latest counters roll up to 7200/HIST3 properly again the next time the task executes.  By default the Weekly and Monthly tasks run at 2:15 AM, so you need to wait until day 2 to see the .latest counters appear for the 86400/HIST4 interval.
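
Before recreating anything, it may help to confirm which indexes are actually absent. The sketch below only compares the two index lists quoted in this post. In practice the "actual" list would have to come from your own vCenter database (for example by querying the sys.indexes catalog view on SQL Server), and the column definitions needed for the CREATE INDEX statements are not given in this thread, so they would need to be taken from a 5.5U2 database. Variable names are illustrative.

```powershell
# Index names as listed above for a 5.5U2 database (expected) and a 6.0 database (actual).
$expected = 'IX_VPX_SC_ENTITY_ID', 'VPX_STAT_COUNTER_M1', 'VPX_STAT_COUNTER_M2',
            'VPX_STAT_COUNTER_P1', 'VPX_STAT_COUNTER_U1'
$actual   = 'PK_VPX_STAT_COUNTER', 'VPX_STAT_COUNTER_P1'

# Indexes present in 5.5U2 but absent in 6.0; these are the candidates to recreate.
$missingIndexes = @($expected | Where-Object { $actual -notcontains $_ })

# The rollup procedure hints VPX_STAT_COUNTER_M1, so that one is the critical check.
$rollupHintUnresolved = $missingIndexes -contains 'VPX_STAT_COUNTER_M1'

"Missing indexes: $($missingIndexes -join ', ')"
"Rollup index hint unresolved: $rollupHintUnresolved"
```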

7 Replies
LucD
Leadership

Can you check what this returns for a specific VM ?

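$vmName = 'MyVM'

# Query disk.used.latest across all four intervals and group the samples by instance.
# Note: sorting descending and taking the first entry returns the most recent
# timestamp per instance, even though the column is labelled 'Oldest'.
Get-VM -Name $vmName | ForEach-Object {
    Get-Stat -Entity $_ -Stat disk.used.latest -IntervalSecs 300,1800,7200,86400 |
    Group-Object -Property Instance |
    Select-Object @{N='VM';E={$_.Group[0].Entity.Name}},
        @{N='Instance';E={$_.Group[0].Instance}},
        @{N='Oldest';E={$_.Group | Sort-Object -Property Timestamp -Descending | Select-Object -First 1 -ExpandProperty Timestamp}}
}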


Blog: lucd.info  Twitter: @LucD22  Co-author PowerCLI Reference

cesprov
Enthusiast

The output for that script results in:

Get-Stat : 6/16/2016 11:35:59 AM    Get-Stat        The metric counter "disk.used.latest" doesn't exist for entity

"MyVM".

At line:2 char:5

+     Get-Stat -Entity $_ -Stat disk.used.latest -IntervalSecs 300,1800 ...

+     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    + CategoryInfo          : ResourceUnavailable: (disk.used.latest:String) [Get-Stat], VimException

    + FullyQualifiedErrorId : Client20_RuntimeDataServiceImpl_CheckUserMetrics_MetricDoesntExist,VMware.VimAutomation.

   ViCore.Cmdlets.Commands.GetViStats

VM      Instance  Oldest

--      --------  ------

MyVM           6/16/2016 11:30:00 AM

MyVM DELTAFILE 6/16/2016 11:30:00 AM

MyVM DISKFILE  6/16/2016 11:30:00 AM

MyVM OTHERFILE 6/16/2016 11:30:00 AM

MyVM SWAPFILE  6/16/2016 11:30:00 AM

Since "disk.used.latest" doesn't show as a stat type for the 7200 interval, your script resulted in the same error I see when running:

Get-Stat -Entity MyVM -Stat disk.used.latest -IntervalSecs 7200

LucD
Leadership

Can you see values for that interval under the Performance tab ?


cesprov
Enthusiast

I don't know if "disk.used.latest" is or ever was exposed through Performance in either the C# or web clients.  When you look at the stat types in PowerCLI, you can see that each stat type follows the format:


category.internalname.rollup


So "disk.used.latest" can be read as Category = Disk, Internal Name = Used, Rollup = Latest.
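
As a small illustration of that naming scheme, a stat type can be split on its dots. This is just a sketch: Split-StatType is a hypothetical helper, not a PowerCLI cmdlet, and it assumes every stat type has exactly three dot-separated parts, as in the examples in this thread.

```powershell
# Hypothetical helper: split a stat type into its three dot-separated parts.
function Split-StatType {
    param([Parameter(Mandatory)] [string] $StatType)
    $parts = $StatType -split '\.'
    if ($parts.Count -ne 3) {
        throw "Expected category.internalname.rollup, got '$StatType'"
    }
    [pscustomobject]@{
        Category     = $parts[0]
        InternalName = $parts[1]
        Rollup       = $parts[2]
    }
}

$parsed = Split-StatType -StatType 'disk.maxTotalLatency.latest'
$parsed
```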


When you look at Performance for Disk in either client, there is only one item that shows up as Rollup = Latest and it is Highest Latency.  When viewed through PowerCLI, you can see that Highest Latency shows as "disk.maxTotalLatency.latest".  There are other ".latest" Rollup stat types that don't show up in Performance either, such as "disk.provisioned.latest" and "disk.unshared.latest", presumably because they are not exposed in the GUI.


And now, as I am documenting this, I found the problem is much larger than just the "disk.used.latest" stat type being missing: my 7200 interval is missing all the ".latest" stat types.


When I set all 4 intervals to stat level 3 (please note that I do not normally run stat collection at level 3, I just raised all to 3 for this test), these are the exposed stat types present for Category = Disk for each interval :


300:

disk.busResets.summation

disk.commands.summation

disk.commandsAborted.summation

disk.commandsAveraged.average

disk.maxTotalLatency.latest

disk.numberRead.summation

disk.numberReadAveraged.average

disk.numberWrite.summation

disk.numberWriteAveraged.average

disk.provisioned.latest

disk.read.average

disk.scsiReservationConflicts.summation

disk.unshared.latest

disk.usage.average

disk.used.latest

disk.write.average

1800:

disk.maxTotalLatency.latest

disk.provisioned.latest

disk.read.average

disk.scsiReservationConflicts.summation

disk.unshared.latest

disk.usage.average

disk.used.latest

disk.write.average

7200:

disk.read.average

disk.usage.average

disk.write.average

86400:

disk.maxTotalLatency.latest

disk.provisioned.latest

disk.unshared.latest

disk.usage.average

disk.used.latest

Note that the 7200 interval shows no ".latest" stat types but yet 86400 does.  Soooo...is the bug here that 7200 is missing all the ".latest" or is it that 86400 includes the ".latest" when it shouldn't?  I can tell you for a fact that before upgrading to 6.0, my 5.5U2 vCenter collected data for these ".latest" stat types as I have data in 86400 that rolled up from 7200 that is slowly being purged.  Was a change made in 6.0 to no longer collect these ".latest" stat types for 7200 and 86400 but they forgot to remove the counters in 86400?  Doubtful.  Still think the bug is that 7200 is missing these stat types.
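
The pattern is easier to see when the lists above are reduced to just their ".latest" entries. The sketch below uses the disk stat types quoted above for the 1800, 7200 and 86400 intervals (300 is left out for brevity). In a live session the lists would come from Get-StatType rather than being typed in, and the variable names are illustrative.

```powershell
# Disk stat types per interval, copied from the lists above (300 omitted for brevity).
$diskStats = @{
    1800  = @('disk.maxTotalLatency.latest', 'disk.provisioned.latest', 'disk.read.average',
              'disk.scsiReservationConflicts.summation', 'disk.unshared.latest',
              'disk.usage.average', 'disk.used.latest', 'disk.write.average')
    7200  = @('disk.read.average', 'disk.usage.average', 'disk.write.average')
    86400 = @('disk.maxTotalLatency.latest', 'disk.provisioned.latest', 'disk.unshared.latest',
              'disk.usage.average', 'disk.used.latest')
}

# For each interval, keep only the stat types whose rollup suffix is .latest.
$summary = foreach ($interval in ($diskStats.Keys | Sort-Object)) {
    $latest = @($diskStats[$interval] | Where-Object { $_ -like '*.latest' })
    [pscustomobject]@{
        Interval    = $interval
        LatestCount = $latest.Count
        Latest      = $latest -join ', '
    }
}

$summary | Format-Table -AutoSize
```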

Circling back to your question, "Can you see values for that interval under the Performance tab?", I can't see "disk.used.latest" in either client because "disk.used.latest" is not exposed to either client.  However, "disk.maxTotalLatency.latest" is exposed in the GUI and shows up as Highest Latency.  When I view Highest Latency through the clients' GUI, it graphs data for 300, 1800 and 86400 but 7200 shows nothing.  86400 chops off on 07/11/2015, same day "disk.used.latest" does, which happens to be when we upgraded vCenter to 6.0.



[Attached screenshot: MissingLatest.jpg]

cesprov
Enthusiast

So this issue was brought to us by a third party company that uses the performance data to analyze overall usage of VMware environments including SANs, networking, etc.  They are now stating that they are seeing this problem with our customers and that it appears to be related to vCenter 5.5 upgrades to 6.0.  I am 99.9999% confident this is a bug at this point.  I have a case open with VMware, but getting them to actually test the upgrade scenarios needed to determine how and when this occurs is like pulling teeth.

cesprov
Enthusiast

Updating this in case anyone else finds it, as I now know this is a much wider issue than just us.  The vendor of the third party monitoring software has told us that they are now seeing this with other clients' 6.0 vCenters, so it's not unique to our vCenter install.  I can also now reproduce the problem by restoring our pre-upgrade 5.5U2 database (where all .latest counters are present in all 4 intervals) to a new vCenter install and then upgrading it to 6.0b.  After the upgrade, I am missing the .latest counters from both the 7200 and 86400 intervals, which is worse than in my production system, where I am only missing the .latest counters from the 7200 interval.  So I think it's more than obvious that this is some sort of bug in the 6.0b install.  Not sure yet if the same issue will occur upgrading from 5.5U2 to any other 6.0 version, as I haven't tested anything other than 6.0b.  I am trying to get this escalated within VMware support to determine the actual cause and fix instead of reinstalling a new vCenter as support advised.  Like I am really going to nuke my entire vCenter because 4 performance counters are missing /rolleyes.
