After a lot of work (mainly struggling with Windows since I'm a UNIX guy: Exchange 2007 failing to start the hub transport because IPv6 was disabled, STAF refusing connections on W2K8 because of the Windows Firewall, and another problem I'll post about separately regarding the DB restores on the OlioDB, OlioWeb, DS2Web* and DS2DB systems, where I had to edit the XML file to cd to the root dir), I finally managed to run a number of tests on 1, 2 and 3 tiles.
However I'm seeing some weird behavior with the DS2DB system.
First, the pre-built template does NOT contain a database... you have to follow the guide to create it (page 121). I lost at least a day diagnosing a non-compliant run that showed 0's for all the DS2 metrics, because I assumed there was a database (there is for Olio) but there isn't, and the scripts just carry on anyway (even though the DS2DB restore specifically shows an error stating that the DS2 database does not exist). It would have been nice to have that in the documentation, unless I'm missing something (please tell me if I am).
Second, I've had both compliant and non-compliant results due to the QoS numbers on the DS2DB. I started debugging with MySQL to see what the problem could be, and it seems to be related to the CUST_HIST table within DS2. Sometimes the reload of the DS2DB at the beginning of the test (running ConfigServer_DS2db.sh) takes 2 minutes; other times it takes 20-30 minutes. Usually when the reload is quick, the results are compliant... when it takes a long time, I get non-compliant scores due to the QoS metrics (slightly over 500).
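To make the fast-vs-slow pattern easier to track, the reload can be wrapped in a small timing script. This is just a sketch: ConfigServer_DS2db.sh is the kit's restore script, but the path and the 5-minute threshold here are my own assumptions.

```shell
#!/bin/sh
# Time the DS2 DB reload and flag slow ones.
# RESTORE_CMD path and the 300s threshold are assumptions, not from the kit.
RESTORE_CMD=${RESTORE_CMD:-./ConfigServer_DS2db.sh}
start=$(date +%s)
$RESTORE_CMD
elapsed=$(( $(date +%s) - start ))
echo "restore took ${elapsed}s"
if [ "$elapsed" -gt 300 ]; then
    echo "WARNING: slow restore -- DB was probably dirty going in"
fi
```

Logging the elapsed time for each run would show whether slow restores really line up with the non-compliant QoS results.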
I've tried re-cloning the systems and moving them to different storage, but I can't find a pattern at this point.
Is anyone else having this issue?
Thanks.
pg 55, step 4. You need to create the database after setting up the template. The VM is too big to distribute with a prebuilt DB.
The restore will take about 20-30 minutes if you have a dirty DB from a full 3-hour run (the bulk of that time is in the CUST_HIST table). I have spent a LOT of time trying to speed this up without success; the only solution seems to be MySQL 5.1 with partitions, which is planned for a future release. It will take ~2 minutes if the DB is already clean.
What is your storage for the DS2 DB? It is fairly disk intensive.
What do you get if you run just a single web frontend manually?
ds2webdriver --target=<DS2webA> --n_threads=10 --db_size=20GB --think_time=0.085
Also check to make sure you haven't run out of disk space on the DB VM. This can happen if you run for a long time or do multiple runs without restoring.
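A quick way to do that check: /var/lib/mysql is the usual MySQL datadir, and the 90% threshold below is just a rule of thumb, not a kit requirement.

```shell
#!/bin/sh
# Warn when the MySQL data volume is nearly full.
# MOUNT default and the 90% threshold are assumptions.
MOUNT=${MOUNT:-/var/lib/mysql}
[ -d "$MOUNT" ] || MOUNT=/     # fall back to root so the check still runs
pct=$(df -P "$MOUNT" | awk 'NR==2 {gsub(/%/,""); print $5}')
echo "$MOUNT is ${pct}% full"
if [ "$pct" -ge 90 ]; then
    echo "WARNING: restore the DB or grow the volume before the next run"
fi
```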
Doh, another bonehead move on my part... totally missed step 4.
Understood, and yes, from a dirty DB that's what I do see... I had thought the clean process might somehow be related to the non-compliant runs that miss the QoS.
The DB VM mount point /var/lib/mysql is getting close:
Filesystem     1K-blocks     Used  Available Use%  Mounted on
/dev/sda2        8981488  1323400    7201844  16%  /
udev             1960636      104    1960532   1%  /dev
/dev/sdb1       36116556 31076004    3205932  91%  /var/lib/mysql
/dev/sdc1       20635700    61956   19525508   1%  /mnt
And I just use the standard scripts with the test, which I believe do the restore every time at the beginning of the test.
Back-end storage is NetApp over NFS using dedicated 10GbE on both the filer and ESX hosts. I think jumbo frames are turned on but need to verify that.
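For the jumbo-frames question, the guest-side MTU is quick to check (the vSwitch, vmkernel port and filer MTUs have to be verified separately; the interface name below is a guess):

```shell
#!/bin/sh
# Verify jumbo frames on the NIC carrying NFS traffic.
# IFACE is a guess -- substitute the actual 10GbE interface.
IFACE=${IFACE:-eth0}
[ -e /sys/class/net/"$IFACE"/mtu ] || IFACE=lo   # fall back so the check runs
mtu=$(cat /sys/class/net/"$IFACE"/mtu 2>/dev/null || echo 1500)
echo "$IFACE mtu=$mtu"
if [ "$mtu" -ge 9000 ]; then
    echo "jumbo frames on"
else
    echo "standard frames (no jumbo)"
fi
```

Jumbo frames only help if every hop agrees on the MTU, so a mismatch anywhere between the ESX hosts and the filer would show up as odd latency on the NFS datastore.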
It just seemed a little weird that it was sporadic: one run will be compliant and the next run, not so much.
Thanks for the insight and help; I will try a single web front end manually to see how it performs.
Brian
Your disk space looks fine. The DB doesn't actually grow much if you restore after every run.
We haven't used NFS for VMmark 2, so it is entirely possible that it will require some additional tuning to achieve acceptable QoS.
Thanks again for the information. I think we have tuned it as much as possible, but I'll go through any additional best practices to make sure we didn't miss anything that could be causing the QoS problems. We're also using Akorri BalancePoint to monitor and report, and it doesn't look like there are any bottlenecks at this point. I'm also trying to see whether it's related to the additional tiles.