vmark/fileserver workload

zghauri · 08-12-2007

I can't seem to run the fileserver workload alongside any other workload without it failing on startup, even with a single tile. I'm running one tile, and all the other workloads work fine for any duration of execution. However, when I add fileserver into the mix, it fails on startup and leaves the tbench_srv processes running on the client. Any help would be greatly appreciated...

kimono · 08-12-2007

Any errors coming back from the STAX Monitor?

/kimono/

zghauri · 08-12-2007

Just the following:

20070812-20:36:41 Info Info: Process FileServer on fileserver0 = shell /home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt -p 1066 -l 1000 45 client0

20070812-20:36:44 Info Process: Tile 0: FileServer failed to start/complete. Returned: RC = 1, STAFResult = None

However, I can run it independently with no issues.
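When a run fails, the STAX log can be scanned for failure messages like the one quoted above. A minimal Python sketch, assuming every failure is reported on a single line in exactly that format; `find_failures` is a hypothetical helper, not part of the harness:

```python
import re

# Modeled on the failure message quoted above (assumed format):
# <timestamp> Info Process: Tile <n>: <Workload> failed to start/complete. Returned: RC = <rc>, ...
FAILURE = re.compile(
    r"^(?P<stamp>\d{8}-\d{2}:\d{2}:\d{2}) .*?Tile (?P<tile>\d+): "
    r"(?P<workload>\w+) failed to start/complete\. Returned: RC = (?P<rc>-?\d+)"
)


def find_failures(lines):
    """Yield (timestamp, tile, workload, rc) for each failure message."""
    for line in lines:
        match = FAILURE.match(line)
        if match:
            yield (
                match["stamp"],
                int(match["tile"]),
                match["workload"],
                int(match["rc"]),
            )


if __name__ == "__main__":
    log = [
        "20070812-20:36:41 Info Info: Process FileServer on fileserver0 = shell "
        "/home/fileserver/dbench/src/dbench -c /home/fileserver/dbench/src/client_plain.txt "
        "-p 1066 -l 1000 45 client0",
        "20070812-20:36:44 Info Process: Tile 0: FileServer failed to start/complete. "
        "Returned: RC = 1, STAFResult = None",
    ]
    # Only the second line is a failure message; the first is ignored.
    for failure in find_failures(log):
        print(failure)
```

The regex is deliberately narrow so that informational lines, such as the process launch line, are skipped.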

kimono · 08-12-2007

You might need to wait for the experts. I haven't seen that error, and it isn't in the troubleshooting info either. I'd go back and check the fundamentals: networking and the hosts files between your client and VMs. Also double-check your vmmark.config and staf.cfg files...

/kimono/

bherndon · 08-12-2007

Since you can run the fileserver as a standalone workload via the harness, I'll assume that the network configuration and harness setup are probably OK. I would add workloads alongside the fileserver, one at a time, until I find the one that causes the error. Try adding them in the following order: Standby, java, mail, database, and finally web. (You could likely combine standby and java.) Do a short run of about 15 minutes after adding each workload. I have a few other questions:

- What type of system are you running on? Is it maxed out on CPU with a full tile?

- Are you running on a private network to the client? If not, SPECweb generates a fair bit of traffic and might be choking the network before fileserver starts.

- Have you tried tuning the workload delay parameters to start fileserver up first?
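The one-at-a-time approach in the post above can be written down as a run plan before starting. A minimal Python sketch; the workload names are the ones used in the DELAYTIME settings quoted further down this thread, and `incremental_runs` is a hypothetical helper rather than anything the harness provides:

```python
# Workload names as they appear in the harness DELAYTIME settings.
# Order follows the suggestion above; Standby and JavaServer could be
# merged into a single step to save one run.
SUGGESTED_ORDER = ("Standby", "JavaServer", "MailServer", "Database", "WebServer")


def incremental_runs(base="FileServer", order=SUGGESTED_ORDER):
    """Return the workload set for each short test run.

    Each run keeps everything from the previous run and adds one workload,
    so the first failing run points at the workload most recently added.
    """
    runs = []
    current = [base]
    for workload in order:
        current = current + [workload]  # new list, so earlier runs stay unchanged
        runs.append(current)
    return runs


if __name__ == "__main__":
    for number, workloads in enumerate(incremental_runs(), start=1):
        print(f"run {number} (~15 min): {', '.join(workloads)}")
```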

zghauri · 08-12-2007

I'll try the suggestions. Actually, I have been doing much of that, but in "larger quantities". I know the webserver and fileserver "don't like each other", and I know that running everything except the webserver doesn't make fileserver happy either. However, I'll try to be more methodical in the testing, as you suggested, and report back.

As for your questions:

- I'm running all the Tile 1 workload VMs on a Dell 2950 with nothing else on it, so resources should not be an issue. The client is all alone on a 2950 as well.

- The network between the two is private gigabit.

- I have not tried tuning the workload delays so fileserver gets a "head start", but will do so.

Thanks

kimono · 08-12-2007

Makes me think something is duplicated in the staf.cfg. Perhaps MACHINENICKNAME is incorrect?

/kimono/

bherndon · 08-12-2007

Two more questions:

- How much memory is on the ESX box?

- Are you running the client natively or in a VM?

kimono · 08-13-2007

Yes, I'd be interested to know the resource specification of the host too. Ironically, I got the same error today on webserver0. I know the whole tile is configured correctly, as it's been used successfully to run VMmark on new hosts.

Out of interest, I ran a single tile on an old DL380G4 without enough RAM, and I got this error. So I think it's a response time / timeout issue: that VM is so sluggish it takes 10 minutes for the logon box to appear!

/kimono/

zghauri · 08-13-2007

Checked all the STAF.cfg files and everything looks fine. No duplicate MACHINENICKNAME entries.

zghauri · 08-13-2007

32 GB on both the workload ESX host and the client box. The "alone" comment in my previous post means the client runs on bare metal with no other services offered or running on the box. You're not allowed to run clients inside a VM for a compliant run, right?

I'd be surprised if there are any physical resource constraints causing this issue. But I do tend to get surprised more often than I'd like...;-)

zghauri · 08-13-2007

Problem solved. It was an order-of-execution issue. Once I changed the fileserver delay parameter so that it starts first and reordered the other workloads accordingly, everything worked fine! Now the question is: does changing the order of execution "invalidate" the results? I wouldn't think so, but I'd like to be sure. Thanks for the suggestions.

psmith2006 · 08-13-2007

Make sure that you have set the registry parameters on the client as indicated on page 55 of the benchmarking guide. This will make sure that the workload processes can get the port numbers they need.

Changing the /DELAYTIME values is OK.

kimono · 08-13-2007

What can cause the RC=1 error? I've tried the suggested workaround. For example, the following causes the error to occur on the fileserver:

MailServer/DELAYTIME="250"

JavaServer/DELAYTIME="540"

Standby/DELAYTIME="170"

WebServer/DELAYTIME="5"

Database/DELAYTIME="360"

FileServer/DELAYTIME="180"

and the following causes it on the webserver:

MailServer/DELAYTIME="250"

JavaServer/DELAYTIME="540"

Standby/DELAYTIME="170"

WebServer/DELAYTIME="180"

Database/DELAYTIME="360"

FileServer/DELAYTIME="5"
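Since the DELAYTIME values decide the start order, it can help to list each configuration from first to last workload. A minimal Python sketch, assuming a smaller DELAYTIME means an earlier start (as in the posts above, where 5 is used for the workload meant to start first) and that each setting sits on its own line in the form shown; `start_order` is a hypothetical helper:

```python
import re

# Matches one harness setting per line, e.g. FileServer/DELAYTIME="180"
DELAY_LINE = re.compile(r'^\s*(\w+)/DELAYTIME="(\d+)"\s*$')


def start_order(config_text):
    """Return (workload, delay) pairs, earliest start first.

    Assumes a smaller DELAYTIME means the workload starts earlier.
    Lines that are not DELAYTIME settings are ignored.
    """
    delays = []
    for line in config_text.splitlines():
        match = DELAY_LINE.match(line)
        if match:
            delays.append((match.group(1), int(match.group(2))))
    # sorted() is stable, so workloads with equal delays keep file order
    return sorted(delays, key=lambda pair: pair[1])


# First configuration from the post above (error reported on FileServer)
CONFIG_A = '''
MailServer/DELAYTIME="250"
JavaServer/DELAYTIME="540"
Standby/DELAYTIME="170"
WebServer/DELAYTIME="5"
Database/DELAYTIME="360"
FileServer/DELAYTIME="180"
'''

if __name__ == "__main__":
    for workload, delay in start_order(CONFIG_A):
        print(f"{delay:>4}  {workload}")
```

For the first configuration this lists WebServer first and FileServer third. In both configurations above, the workload reported as failing is the later-starting one of the WebServer/FileServer pair.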

In my case I think the host is just too slow (a 2-year-old single-core box with slightly overcommitted RAM; a non-compliant test, I know!), because the exact same tiles work fine on our over-specced 585s.

/kimono/
