Foyman1973
Enthusiast

What's the maximum duration of a VMmark test?

I was hoping to use VMmark not only as a typical benchmark for comparing test configurations but also as a long-term load test.  Currently I have 4 tiles that run my cluster at 75-80% for the normal 3-hour run, but when I attempt to run for 24 hours or longer, it only loads the cluster to about 30%.  Is it possible to run a test for more than 3 hours and maintain the normal processing load?

 

dmorse
VMware Employee

Can you confirm that you're only changing the RunTimeSeconds value from 10800 to 86400 (or some other larger value), and that alone lowers the cluster utilization from 75-80% to 30%?

You should be able to increase the run time and the SUT utilization should be the same.
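For reference, a minimal sketch of making that single change on a copy of the properties file. The function name and file names are hypothetical, and the sketch assumes the setting is stored as a `RunTimeSeconds = <value>` line, so check the actual file before relying on it.

```shell
#!/bin/sh
# Sketch: copy a properties file and change only RunTimeSeconds in the copy.
# Assumption: the setting appears as a line like "RunTimeSeconds = 10800".
# The function and file names are placeholders, not confirmed in this thread.

set_run_time() {
    src=$1      # existing properties file (left unmodified)
    dst=$2      # copy to create
    seconds=$3  # new run time in seconds

    # Refuse anything that is not a plain string of digits.
    case $seconds in
        ''|*[!0-9]*) echo "run time must be a whole number of seconds" >&2; return 1 ;;
    esac

    # Rewrite the RunTimeSeconds line and pass every other line through as-is.
    sed "s/^\([[:space:]]*RunTimeSeconds[[:space:]]*=\).*/\1 $seconds/" "$src" > "$dst"
}

# Example (hypothetical file names):
# set_run_time VMmark3-4tile.properties VMmark3-4tile-24h.properties 86400
```

Running `diff` on the two files afterwards is a quick way to confirm that the run time is the only difference.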

Foyman1973
Enthusiast

Correct, I made a copy of my 4-tile properties file and modified only the run time.  Below is a screen capture of today's runs: a preliminary "turbo" run to test the file, then the second attempt set for 24 hours.  Looking at the individual VMs, there is a lot of CPU activity on the DS3DB boxes but not much on the rest.

[Screenshot: Foyman1973_1-1625692333416.png]

Example of a tile running over the same period: everything is busy during the turbo run, but mostly just the DS3DB during the long attempt. (Hopefully it's readable once published.)

[Screenshot: Foyman1973_2-1625692489832.png]

dmorse
VMware Employee

Interesting.  Can I ask why you'd want to run one VMmark run for 24 hours?  This is not something we designed the benchmark to do.

One option you could try is to run the standard 3-hour test in a loop.  Would that satisfy what you're trying to accomplish?

Foyman1973
Enthusiast

I do realize that I am going outside the design of the platform.  One thing I have noticed is that a bug introduced by a driver or firmware update does not necessarily show up in light or short-term testing.  We've had a few occasions over the years where short-term testing was fine, and then the new configuration developed an issue after we moved it into production and it had been under load for 2-3 weeks.  So the idea here is that since this tool has a great workload simulation, it would also make a great long-term burn-in test for new ESXi, driver and firmware combinations, hopefully catching a configuration or software issue before it makes it into our heavily loaded clusters.

Having said that, if looping 3-hour tests is the solution here, I am willing to investigate that option.  It wouldn't be as continuous a load test as I was planning, but it could be enough.

dmorse
VMware Employee

To make VMmark3 run in a continuous loop, you could edit the /root/VMmark3/tools/VMmark3-STAX.sh script, so instead of

source /usr/local/staf/STAFEnv.sh
java -jar ~/VMmark3/tools/VMmarkWindow.jar
java -Xms512m -Xmx1024m -jar /usr/local/staf/services/stax/STAXMon.jar

change it to:

source /usr/local/staf/STAFEnv.sh
java -jar ~/VMmark3/tools/VMmarkWindow.jar
while :
do
    java -Xms512m -Xmx1024m -jar /usr/local/staf/services/stax/STAXMon.jar
done

I attached it as well (rename the embedded .txt file to .sh).  Note: I have not tested this myself, but it should work.

Foyman1973
Enthusiast

Seems pretty straightforward.  If I read that correctly, it is an endless loop, so I would have to use the 'canceltest.sh' script to end the test?

dmorse
VMware Employee

Yes, just press Ctrl+C to terminate the benchmark script itself, then run the canceltest script to clean up.

Foyman1973
Enthusiast

I discovered two things about the loop.  First, it does not work from the UI job submission.  Second, when launched from the CLI it ran the test two times, then exited normally.

Something else I noticed from previous attempts at running extremely long single tests is that they basically filled up the drives on these 4 nodes, forcing me to redeploy all 4 in each tile.  Maybe they just need bigger disks to handle long runs?

DS3WebA0
DS3WebB0
AuctionNoSQL0
ElasticLB0
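A minimal sketch of one way to watch for this between runs, so a nearly full disk shows up before tests start failing. It assumes the VMs accept non-interactive ssh logins as root and that the root filesystem is the one filling up; neither is confirmed here, and the function names are illustrative.

```shell
#!/bin/sh
# Sketch: warn when a VM's root filesystem is nearly full.
# Assumptions: non-interactive ssh as root works, and "/" is the filesystem
# that fills up. Adjust both to match the real environment.

# Read `df -P <path>` output on stdin and print the use percentage as digits.
used_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print a warning for each host whose usage is at or above the threshold.
check_hosts() {
    threshold=$1
    shift
    for host in "$@"; do
        pct=$(ssh -o BatchMode=yes "root@$host" df -P / | used_percent)
        if [ -n "$pct" ] && [ "$pct" -ge "$threshold" ]; then
            echo "WARNING: $host root filesystem is ${pct}% full"
        fi
    done
}

# Example, using the VM names listed above and a 90% threshold:
# check_hosts 90 DS3WebA0 DS3WebB0 AuctionNoSQL0 ElasticLB0
```

Hosts that cannot be reached produce no warning in this sketch, so an empty result only means that no reachable host crossed the threshold.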

dmorse
VMware Employee

Yes, I did learn from talking to a coworker that disk space is definitely a problem if you try to run these workloads for longer than the designed 3-hour duration.

I'm not sure why the loop had issues.  What if you try this from the prime client shell:

while true; do /root/VMmark3/tools/VMmark3-STAX.sh; sleep 60; done

Foyman1973
Enthusiast

Looks like all that does is open 2 monitor windows that don't show any running jobs.  I looked at the results folder and it does not have any new or updated results folders either.  It did not exit the loop, so I assume it is still running that command every 60 seconds, but there does not appear to be any visible result after the first 2 'STAX 3 Job Monitor' windows were opened.

Foyman1973
Enthusiast

Interesting.  Additionally, I left the command running in the terminal and forgot about it until now.  I pressed Ctrl+C, closed the terminal window, and then closed the job monitor windows.  Every time I close one, another pops up, so there are probably a lot of these "queued" up to open.  Only 2 will open at a time, so that might explain why only 2 runs executed before.

dmorse
VMware Employee

My apologies, please try this instead:

while true; do /root/VMmark3/tools/VMmark3-STAX-console.sh; sleep 60; done 
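If an endless loop is more than is wanted, a bounded variant along these lines might be easier to manage. This is an untested sketch: the console script path is the one given above, while the function name, stop file, and log file are illustrative choices.

```shell
#!/bin/sh
# Sketch: run a command a limited number of times with a pause between runs.
# The stop-file and log-file names are illustrative, not part of VMmark.

run_loop() {
    cmd=$1        # command to run on each iteration
    max_runs=$2   # stop after this many runs
    pause=$3      # seconds to sleep between runs
    stop_file=$4  # create this file to stop before the next run starts
    log=$5        # file that receives one line per completed run

    i=1
    while [ "$i" -le "$max_runs" ] && [ ! -e "$stop_file" ]; do
        status=0
        "$cmd" || status=$?
        echo "run $i finished with exit status $status at $(date)" >> "$log"
        i=$((i + 1))
        # Skip the pause after the final run or once a stop was requested.
        if [ "$i" -le "$max_runs" ] && [ ! -e "$stop_file" ]; then
            sleep "$pause"
        fi
    done
}

# Example: at most 8 runs, 60 seconds apart.
# run_loop /root/VMmark3/tools/VMmark3-STAX-console.sh 8 60 /root/stop-vmmark-loop /root/vmmark-loop.log
```

Creating the stop file lets the current run finish normally instead of being interrupted with Ctrl+C.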

 

Foyman1973
Enthusiast

Sorry for the delay in replying.  I was able to get it to loop using the console script.  Unfortunately, that only worked for about 6 runs before the ElasticLB on Tile3 filled its disk and started failing tests over the weekend.  I appreciate the help, but it doesn't seem like this platform is going to work for actual load and burn-in testing, since it cannot run for very long without breaking something.  Back to the drawing board, I guess.

If there is a feature request option, I would like to put this use case on the list though.  I still think it could be very useful for long-term testing as well as the current short-term benchmark, if the storage issue could be resolved.

Foyman1973
Enthusiast

A little update on this research project.  I found that I can loop the turbo tests as long as I have at least a 15-minute pause between tests.  So, using your example, I just modified the sleep interval:

while true; do /root/VMmark3/tools/VMmark3-STAX-console.sh; sleep 900; done 

It does not achieve a very high duty cycle overall, since the sleep combined with the normal test setup period adds up to more idle time than load time, but it at least keeps putting workload on the cluster with vMotions, svMotions, etc., so it is definitely better than nothing.

I'm wondering whether, if I keep increasing the sleep on a normal 3-hour test, I can find a long enough pause for whatever background post-test process to finish up and stop filling up the ElasticLB nodes.

If I find that good value I will try to update this thread if it doesn't get locked out by the moderators for lack of activity.

Foyman1973
Enthusiast

Another quick update.  So far, after 7 days, the 3-hour test seems to be in a stable loop with a 45-minute pause between runs.

while true; do /root/VMmark3/tools/VMmark3-STAX-console.sh; sleep 2700; done 
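As a rough check on how much of each cycle is spent under load, the run time and pause can be plugged into a small calculation. This sketch ignores test setup and teardown time, which is not quantified in this thread, so the real figure will be somewhat lower; the function name is illustrative.

```shell
#!/bin/sh
# Sketch: approximate duty cycle of a run-then-sleep loop, in whole percent.
# Ignores setup and teardown time, so it overstates the real load fraction.

duty_cycle() {
    run_seconds=$1
    pause_seconds=$2
    echo $(( run_seconds * 100 / (run_seconds + pause_seconds) ))
}

# 3-hour run (10800 s) with a 45-minute pause (2700 s):
duty_cycle 10800 2700   # prints 80
```

By this estimate the cluster is under load for at most about 80% of each cycle.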
dmorse
VMware Employee

@Foyman1973 That's great to hear, thanks for sharing!
