Hello VMmark team,
I am trying to benchmark a vSAN cluster. I started with 4 tiles, and the run is being marked as non-compliant, with quite a few exceptions reported from the Weathervane application. I have two VMware clusters (for the SUT and the client systems) and an external SAN storage system with SSDs providing the shared iSCSI datastores for infrastructure operations. All systems are connected via two 10 Gbps Ethernet switches. On the SUT, the vSAN traffic has two dedicated 10 Gbps ports, and two more 10 Gbps ports are dedicated to vMotion and the iSCSI datastores. I do not see any dropped packets, retransmits, etc., on the switches, so I am not sure whether these exceptions are due to the network. Can you please advise what else could be causing these issues? Thanks.
Warnings Messages::
p0 : WeathervaneAuction0 Exceptions : 5
p0 : WeathervaneElastic0 Exceptions : 1
p1 : WeathervaneAuction0 Exceptions : 2
p1 : WeathervaneElastic0 Exceptions : 231
p2 : WeathervaneAuction0 Exceptions : 5
p2 : WeathervaneElastic0 Exceptions : 446
rampdown : WeathervaneAuction0 Exceptions : 4
rampdown : WeathervaneElastic0 Exceptions : 377
p0 : WeathervaneAuction1 Exceptions : 3
p0 : WeathervaneElastic1 Exceptions : 1
p1 : WeathervaneAuction1 Exceptions : 1
p1 : WeathervaneElastic1 Exceptions : 1
p2 : WeathervaneAuction1 Exceptions : 4
p2 : WeathervaneElastic1 Exceptions : 2
rampdown : WeathervaneAuction1 Exceptions : 3
rampdown : WeathervaneElastic1 Exceptions : 1
p0 : WeathervaneAuction2 Exceptions : 5
p0 : WeathervaneElastic2 Exceptions : 149
p1 : WeathervaneAuction2 Exceptions : 6
p1 : WeathervaneElastic2 Exceptions : 373
p2 : WeathervaneAuction2 Exceptions : 5
p2 : WeathervaneElastic2 Exceptions : 348
rampdown : WeathervaneAuction2 Exceptions : 1
rampdown : WeathervaneElastic2 Exceptions : 228
p0 : WeathervaneAuction3 Exceptions : 4
p0 : WeathervaneElastic3 Exceptions : 108
p1 : WeathervaneAuction3 Exceptions : 5
p1 : WeathervaneElastic3 Exceptions : 184
p2 : WeathervaneAuction3 Exceptions : 5
p2 : WeathervaneElastic3 Exceptions : 312
rampdown : WeathervaneAuction3 Exceptions : 2
rampdown : WeathervaneElastic3 Exceptions : 218
Summary ::
Run_Is_NOT_Compliant
Turbo_Setting : 0
Number_of_Workloads_Missing : 0
Number_of_Compliance_Issues (identified by '*' or '+') : 6
Issues Found :
Tile0-weathervaneelastic-p0
Tile2-weathervaneelastic-p0
Tile2-weathervaneauction-p1
Tile2-weathervaneauction-p2
Tile3-weathervaneauction-p2
Tile3-weathervaneelastic-p2
Median_Phase : p2
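To see at a glance which tiles account for most of the exceptions, the "Warnings Messages" lines above can be summed per tile and workload. The following is a minimal sketch that assumes the line layout shown in this post ("phase : WorkloadN Exceptions : count"); the function name tally_exceptions is hypothetical and not part of VMmark.

```python
import re
from collections import defaultdict

# Matches lines such as "p1 : WeathervaneElastic0 Exceptions : 231".
# The layout is assumed from the score-file excerpt in this thread.
WARNING_RE = re.compile(
    r"^\s*(p\d+|rampdown)\s*:\s*(\w+?)(\d+)\s+Exceptions\s*:\s*(\d+)\s*$"
)


def tally_exceptions(lines):
    """Sum exception counts across phases, keyed by (tile, workload)."""
    totals = defaultdict(int)
    for line in lines:
        match = WARNING_RE.match(line)
        if match:
            _phase, workload, tile, count = match.groups()
            totals[(int(tile), workload)] += int(count)
    return dict(totals)


if __name__ == "__main__":
    sample = [
        "p0 : WeathervaneElastic0 Exceptions : 1",
        "p1 : WeathervaneElastic0 Exceptions : 231",
        "p2 : WeathervaneElastic0 Exceptions : 446",
        "rampdown : WeathervaneElastic0 Exceptions : 377",
    ]
    for (tile, workload), count in sorted(tally_exceptions(sample).items()):
        print(f"tile {tile} {workload}: {count}")
```

Applied to the full list above, the WeathervaneElastic counts sum to 1055 for tile 0 but only 5 for tile 1, so the exceptions are not spread evenly across tiles.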
I looked in the Weathervane wrf files, and I see the following sample of exceptions:
19:26:28.029 [pool-3-thread-91] WARN c.v.w.w.common.core.Operation - Operation:run Execution Failed for GetNextBid for behavior UUID 41ad0ea8-7390-44d3-99f7-96b5ee58113d Failure Reason = com.vmware.weathervane.workloadDriver.common.exceptions.OperationFailedException: Incomplete response received when retrieving current bid for auction 30294
19:26:28.124 [pool-3-thread-91] WARN c.v.w.w.common.core.Operation - Operation:run restarting userId = 5033, operation = GetNextBid, behavior UUID 41ad0ea8-7390-44d3-99f7-96b5ee58113d Failure Reason = com.vmware.weathervane.workloadDriver.common.exceptions.OperationFailedException: Incomplete response received when retrieving current bid for auction 30294
..
19:28:07.232 [epollEventLoopGroup-3-42] WARN i.n.channel.DefaultChannelPipeline - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
io.netty.handler.timeout.ReadTimeoutException: null
| 200| 3733.09| 0.021| 37484| 0| 0|GetNextBid:15313/0(60/0.000/0.000), GetUserProfile:354/0(2/0.014/0.000), AddImageForItem:69/0(5/0.037/0.000), AddItem:523/0(2/0.015/0.000), Login:359/0(3/0.015/0.000), GetPurchaseHistory:339/0(2/0.039/0.000), GetImageForItem:120/0(5/0.023/0.000), GetActiveAuctions:5448/0(2/0.011/0.000), UpdateUserProfile:185/0(2/0.017/0.000), PlaceBid:1153/0(2/0.010/0.000), GetBidHistory:164/0(2/0.014/0.000), JoinAuction:2656/0(3/0.042/0.000), HomePage:344/0(2/0.021/0.000), GetItemDetail:2800/0(2/0.025/0.000), Register:0/0(2/0.000/0.000), NoOperation:0/0(9999999/0.000/0.000), Logout:345/0(3/0.014/0.000), GetCurrentItem:2520/0(3/0.015/0.000), GetAuctionDetail:2786/0(2/0.032/0.000), GetAttendanceHistory:164/0(2/0.012/0.000), LeaveAuction:1842/0(2/0.011/0.000), | Sep 28,2019 19:28:23 EDT
| Time| TP| Avg RT| Ops| Ops| Ops|Per Operation: Operation:Total/FailedRT(RT-Limit/AvgRT/AvgFailingRT)| Timestamp
| (sec)| (ops/s)| (sec)| Total| Failed| Fail RT|
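The per-operation column in the summary row above follows the pattern given in its header, Operation:Total/FailedRT(RT-Limit/AvgRT/AvgFailingRT). A rough sketch of pulling those fields out of a row, so that operations with response-time failures can be filtered, might look like this. The function names and dictionary keys are hypothetical; only the field order is taken from the header line.

```python
import re

# One entry looks like "GetNextBid:15313/0(60/0.000/0.000)".
# Field order follows the header: Total/FailedRT(RT-Limit/AvgRT/AvgFailingRT).
OP_RE = re.compile(r"(\w+):(\d+)/(\d+)\(([\d.]+)/([\d.]+)/([\d.]+)\)")


def parse_operations(row):
    """Return {operation: stats} for one Weathervane summary row."""
    ops = {}
    for name, total, failed_rt, limit, avg_rt, avg_failing in OP_RE.findall(row):
        ops[name] = {
            "total": int(total),
            "failed_rt": int(failed_rt),
            "rt_limit": float(limit),
            "avg_rt": float(avg_rt),
            "avg_failing_rt": float(avg_failing),
        }
    return ops


def operations_failing_rt(ops):
    """Names of operations that recorded at least one response-time failure."""
    return sorted(name for name, stats in ops.items() if stats["failed_rt"] > 0)
```

Running parse_operations over every summary row in a log and checking operations_failing_rt for each interval shows when, and for which operation, response times started to exceed their limits.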
Hi,
Thank you for the response. I enabled esxtop collection and ran with 2 tiles to reproduce the exceptions. Looking at the logs through the visualizer tool, I didn't see any obvious CPU, memory, or disk issues. I also looked at the Weathervane log files. There are indeed quite a few Java exceptions in the logs; however, it's not clear to me that they are caused by any infrastructure-related issues.
Also, in the score file, the compliance issues are marked with either * or +, but how do I use those markers to link back to the exceptions? The compliance issues seem to be deviations from the expected numbers. How do I go about finding what those deviations are and what causes them?
VMmark 3.1 TileScore : v1.2 12202018
Computing_results_for_test_with_tile_count: 0 ...
Turbo Mode Enabled
Tiles = 2 : Enabled Workloads : WV DS3Web WV_Elastic Standby (6)
Esxtop Power Mode Enabled
First Sample: 1570204440 Fri Oct 4 11:54:00 2019
Info: 2 : 1570204560 : 2 : 6
Calculating Turbo Timing & Scoring : Run_Is_NOT_Compliant
Run_start 1570204440 : Fri Oct 4 11:54:00 2019
Start_time 1570204560 : Fri Oct 4 11:56:00 2019
End_time 1570206180 : Fri Oct 4 12:23:00 2019
Run_end 1570206300 : Fri Oct 4 12:25:00 2019
Duration_in_minutes : 27.00
Steady_state_start 1570204860 : Fri Oct 4 12:01:00 2019
Steady_state_end 1570205760 : Fri Oct 4 12:16:00 2019
Phase_0_begin 1570204860 : Fri Oct 4 12:01:00 2019
Phase_1_begin 1570205160 : Fri Oct 4 12:06:00 2019
Phase_2_begin 1570205460 : Fri Oct 4 12:11:00 2019
TILE_0_Scores: WeathervaneAuction WeathervaneElastic DVDStoreA DVDStoreB DVDStoreC Standby
p0 3594.93 570.61 1000.60 748.80 534.00 1.00
p1 3583.22 575.16 1038.60 599.40 365.40 1.00
p2 3589.78 574.47 1043.60 771.40 564.80 1.00
TILE_0_Ratios: WeathervaneAuction WeathervaneElastic DVDStoreA DVDStoreB DVDStoreC Standby Geo.Mean
p0 1.00 1.00 1.36 1.50 1.54 1.00 1.26
p1 1.00 1.01 1.41 1.20 1.05 1.00 1.12
p2 1.00 1.00 1.42 1.54 1.63 1.00 1.29
TILE_0_QoS: WeathervaneAuction% WeathervaneElastic% DVDStoreA DVDStoreB DVDStoreC
p0 0.69 | 0.01 1.25 | 1.20* 685.47 803.00 895.39
p1 0.61 | 0.01 1.07 | 0.64 603.70 702.33 824.17
p2 0.51 | 0.00 0.92 | 0.50 583.07 714.58 805.06
TILE_1_Scores: WeathervaneAuction WeathervaneElastic DVDStoreA DVDStoreB DVDStoreC Standby
p0 3590.95 581.78 968.60 721.80 514.60 1.00
p1 3601.37 571.88 985.20 560.00 345.80 1.00
p2 3591.02 581.17 974.60 733.40 521.60 1.00
TILE_1_Ratios: WeathervaneAuction WeathervaneElastic DVDStoreA DVDStoreB DVDStoreC Standby Geo.Mean
p0 3594.93 570.61 1000.60 748.80 534.00 1.00
p1 3583.22 575.16 1038.60 599.40 365.40 1.00
p2 3589.78 574.47 1043.60 771.40 564.80 1.00
TILE_1_QoS: WeathervaneAuction% WeathervaneElastic% DVDStoreA DVDStoreB DVDStoreC
p0 0.94 | 0.37 1.77 | 1.76* 757.03 917.33 1047.67
p1 0.46 | 0.00 0.54 | 0.26 743.37 857.50 1021.33
p2 0.57 | 0.00 1.01 | 1.01* 755.17 887.25 1006.39
p0_score = 2.49
p1_score = 2.21
p2_score = 2.53
Infrastructure_Operations_Scores: vMotion SVMotion XVMotion Deploy
Completed_Ops_PerHour 3.50 2.00 2.00 1.00
Avg_Seconds_To_Complete 5.11 78.22 111.85 288.65
Failures 0.00 0.00 0.00 0.00
Ratio 0.13 0.11 0.11 0.12
Number_Of_Threads 1 1 1 1
EsxtopPower_Results:
p0 Avg_Watts Target
284.77 10.0.0.35
352.80 10.0.0.36
286.00 10.0.0.37
278.00 10.0.0.38
p1 Avg_Watts Target
283.40 10.0.0.35
356.33 10.0.0.36
286.00 10.0.0.37
278.00 10.0.0.38
p2 Avg_Watts Target
287.10 10.0.0.35
350.37 10.0.0.36
286.00 10.0.0.37
278.00 10.0.0.38
Warnings Messages::
p1 : WeathervaneAuction0 Exceptions : 1
p2 : WeathervaneAuction0 Exceptions : 1
rampdown : WeathervaneAuction0 Exceptions : 1
Summary ::
Run_Is_NOT_Compliant
Turbo_Setting : 1
Number_of_Workloads_Missing : 0
Number_of_Compliance_Issues (identified by '*' or '+') : 3
Issues Found :
Tile0-weathervaneelastic-p0
Tile1-weathervaneelastic-p0
Tile1-weathervaneelastic-p2
Median_Phase : p0
Unreviewed_VMmark3_EsxtopPower_Avg_Watts : 1201.57
Unreviewed_VMmark3_Applications_Score : 2.49
Unreviewed_VMmark3_Infrastructure_Score : 0.12
Unreviewed_VMmark3_Score : 2.02
Unreviewed_VMmark3_Power_Efficiency* : 1.6781
thanks,
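One way to locate the flagged entries is to scan the TILE_n_QoS sections of the score file for values carrying a * or + marker and report the tile, phase, and workload column they belong to. The sketch below assumes the layout shown in the excerpt above, where each Weathervane QoS column holds a pair of numbers separated by "|". The function name flagged_qos is hypothetical, and the script only locates the markers; it does not explain what the flagged numbers mean or what caused them.

```python
import re

QOS_HEADER = re.compile(r"\s*TILE_(\d+)_QoS:\s*(.*)")
PHASE_ROW = re.compile(r"\s*(p\d+)\s+(.*)")


def flagged_qos(lines):
    """Return (tile, phase, workload, value) for each QoS cell marked * or +."""
    flagged = []
    tile, columns = None, []
    for line in lines:
        header = QOS_HEADER.match(line)
        if header:
            tile = int(header.group(1))
            columns = header.group(2).split()
            continue
        row = PHASE_ROW.match(line)
        if tile is None or row is None:
            tile = None  # any other line ends the current QoS section
            continue
        # Join "a | b" pairs into one token so values line up with columns.
        values = re.sub(r"\s*\|\s*", "|", row.group(2)).split()
        for column, value in zip(columns, values):
            if "*" in value or "+" in value:
                flagged.append((tile, row.group(1), column.rstrip("%"), value))
    return flagged
```

On the score file above this yields tile 0 p0, tile 1 p0, and tile 1 p2, all in the WeathervaneElastic column, which matches the three entries under "Issues Found".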
Historically, these types of exceptions are a result of storage bottlenecks and/or under-provisioned clients. My suggestion would be to do a baseline run with just 1 tile while enabling esxtop collection. See the section "How to Enable and Analyze esxtop Performance Data" in the VMmark User's Guide for details. Afterwards, you can review the details and continue to add tiles until you start seeing non-compliant results.
Beyond this, you can also review the SAR data for Weathervane. This is found within the "workloadfiles" directory of a VMmark results folder (e.g., /root/VMmark3/results/Results_20190924155520-1tile-run1/workloadfiles). Unzip the corresponding tile's weathervane-outputN.zip, where N is the tile number, and you'll see a large number of additional Weathervane-specific log files that you can review. Execute "man sar" to see the options for sar and for viewing its output.
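The unzip step described above can be scripted when there are several tiles to go through. This is a minimal sketch using only the Python standard library. The archive location follows the path pattern given above; the function names are hypothetical, and filtering on the keyword "sar" is an assumption, since the file names inside the archive are not listed in this thread.

```python
import zipfile
from pathlib import Path


def extract_weathervane_output(results_dir, tile, dest=None):
    """Unzip workloadfiles/weathervane-output<tile>.zip and list its files."""
    archive = Path(results_dir) / "workloadfiles" / f"weathervane-output{tile}.zip"
    # By default, extract next to the archive into a folder of the same name.
    target = Path(dest) if dest else archive.with_suffix("")
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return sorted(p for p in target.rglob("*") if p.is_file())


def find_by_keyword(files, keyword="sar"):
    """Assumption: files of interest contain the keyword in their name."""
    return [p for p in files if keyword.lower() in p.name.lower()]
```

For example, calling extract_weathervane_output with the results folder and tile 0, then passing the returned list to find_by_keyword, narrows the extracted files to candidates worth opening first.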