VMware Communities
rbphilip
Contributor

Workstation 6.5.2 still slower than 6.0.5

I have a Workstation 6.0 VM in which I do Xilinx FPGA compiles. There's not a lot of disk I/O, but enough processor usage that a single-CPU VM is pegged at 100% and a dual-processor VM runs at about 60%.

It always seemed a little slower after I upgraded to 6.5.1, but I never bothered to quantify it. Today, after upgrading to 6.5.2, I did.

The host machine is a 2.4 GHz machine running XPx64. The guest VMs are XPx32 Professional.

Under Workstation 6.0.5 a complete build of my FPGA takes 17 minutes and 45 seconds with one processor configured, 16 minutes with two processors.

Under Workstation 6.5.2 a complete build of my FPGA takes 19 minutes and 30 seconds with one processor configured, 18 minutes ten seconds with two processors.

Again, this is the very same VM, compiling the same FPGA. There is very little disk access.

Perhaps there are some interesting new tuning parameters for 6.5.2 that could help?

Thoughts, anyone?

26 Replies
newbie93
Hot Shot

Probably not much help but,

I think that there is an undiscovered (yet to be named) law:

"The relative version number of a software package is proportional to the amount of disk space and memory it needs and is inversely proportional to its performance".

We can call it "Moore's inverse law". (No offense to Mr. Moore).

Microsoft's products seem to be following it quite nicely as well.

This probably explains why it takes the same amount of time to perform a task on a computer as it did 10 years ago, even though processor speeds have increased dramatically. Moore's law and Moore's inverse law cancel each other out and yield a net coefficient of 1.

rbphilip
Contributor

"The relative version number of a software package is proportional to the amount of disk space and memory it needs and is inversely proportional to its performance".

Not actually true. I also tried Workstation 5.5.8 and it was the slowest of all at 21:30 for the single-processor version. I didn't bother with the dual-processor run...

And I tried running the compile on "bare" hardware.....

So.. the numbers are:

On bare metal (Q6600 @ 2.4 GHz) running XPx32, a complete build of the FPGA takes 14 minutes and 30 seconds.

Under Workstation 5.5.8 a complete build of my FPGA takes 21 minutes and 30 seconds with one processor.

Under Workstation 6.0.5 a complete build of my FPGA takes 17 minutes and 45 seconds with one processor configured, 16 minutes with two processors.

Under Workstation 6.5.2 a complete build of my FPGA takes 19 minutes and 30 seconds with one processor configured, 18 minutes ten seconds with two processors.

Obviously the "bare hardware" version is fastest.

WS 6.0.5 comes in at 81.7% of bare metal speed.

WS 6.5.2 comes in at 74.4% of bare metal speed.

WS 5.5.8 comes in at 67.4% of bare metal speed.

There was clearly a significant speed improvement from 5.5.8 -> 6.0.5, but then a decrease in speed from 6.0.5 -> 6.5.2.

I keep thinking there must be some sort of tuning parameter that would bring 6.5.2 up to the speed of 6.0.5
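
For anyone who wants to check the percentages, here is a minimal Python sketch that recomputes them from the single-processor times above. The dictionary and function names are my own, just for illustration, and the figures are rounded to one decimal place.

```python
# Single-processor build times quoted in this thread, as (minutes, seconds).
BUILD_TIMES = {
    "bare metal": (14, 30),
    "WS 5.5.8": (21, 30),
    "WS 6.0.5": (17, 45),
    "WS 6.5.2": (19, 30),
}


def to_seconds(minutes, seconds):
    """Convert a (minutes, seconds) pair to total seconds."""
    return minutes * 60 + seconds


def percent_of_bare_metal(times, baseline="bare metal"):
    """Return each configuration's speed as a percentage of the baseline.

    Speed is the inverse of build time, so the ratio is baseline / time.
    """
    base = to_seconds(*times[baseline])
    return {
        name: round(100.0 * base / to_seconds(*t), 1)
        for name, t in times.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, pct in percent_of_bare_metal(BUILD_TIMES).items():
        print(f"{name}: {pct}% of bare metal speed")
```

The same function works for the two-processor times if you swap them into the dictionary.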

vern4
Contributor

What newbie93 said, or as I have often said ...

"Intel giveth and Microsoft taketh away."

I too have noticed that using my first PC of my very own at work (a 12 MHz 486 with 256 MB RAM, if I recall correctly) I often waited impatiently for Windows (3.something) to do stuff. Now, with a 2.8 GHz quad-core and 4 GB RAM, I'm still waiting.

Grrr.

I wish that Microsoft would find the engineers that wrote file handling (why it takes 10s of seconds to delete a file or populate a "save as" dialog box I'll never know) and "System Idle Processes" (which often seems to consume 90+% of my CPU), shoot them, and then get a couple wizards to rewrite this code. Of course, that will never happen because those engineers are no doubt VPs by now.

Double Grrr!

rbphilip
Contributor

"I wish that Microsoft would find the engineers that wrote file handling (why it takes 10s of seconds to delete a file or populate a "save as" dialog box I'll never know) and "System Idle Processes" (which often seems to consume 90+% of my CPU), shoot them, and then get a couple wizards to rewrite this code."

I'm assuming you're actually joking here, or have virus/malware problems. My Windows machines (even VMs) populate their "save as" dialogs pretty much as fast as they can display the graphic.

And you do realize that "system idle processes" only run when nothing else is running on your computer, right?

admin
Immortal

You could try experimenting with execution modes, if your host supports AMD-V or VT-x. See .

joe1600
Enthusiast

You're right, there has been a reduction in performance from version 6.0.5 to 6.5.2. Hopefully the next major release, i.e. Workstation 7.0, will overcome this. I also suggest that the VMware team change the website from HTTP to HTTPS.

Regards,

Joe

Joe Joseph. Thanks in advance. If you find my reply useful, feel free to mark it as Helpful or Correct.
joe1600
Enthusiast

Hi, there is a reduction in performance from 6.0 (major version) to 6.5.2; I felt it too. I hope it will be fixed in the next major release.

Regards,

Joe

rbphilip
Contributor

I spent a bit of time playing with VT-x in 6.5.2, and the VM is actually much slower when using hardware virtualization: 13% slower, by my measurement.

Fortunately, 6.0.5 has all the features I need, so I'll stick with it and its better performance...

I sure wish I knew what it was about 6.5.2 that made it slower.

admin
Immortal

13% slower than Workstation 6.0.5 using VT-x, or 13% slower than Workstation 6.5.2 using binary translation?

Can you share your workload?

rbphilip
Contributor

"13% slower than Workstation 6.0.5 using VT-x, or 13% slower than Workstation 6.5.2 using binary translation?"

VT is 13% slower than 6.5.2 using binary translation.

6.5.2 using binary translation is already about 13% slower than 6.0.5 using binary translation.

I have a reasonably large FPGA compile (Xilinx EDK 10.1) that is a mix of processor and I/O.

Times on a 2.4 GHz Q6600 are posted at the beginning of the thread.
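
As a rough cross-check of the 6.0.5 vs 6.5.2 gap, here is a small Python sketch using only the times from the first post. The helper name `extra_time_percent` is made up for illustration. By this arithmetic the "about 13%" figure lines up with the two-processor runs, while the one-processor gap comes out closer to 10%.

```python
def extra_time_percent(slow_seconds, fast_seconds):
    """How much longer the slow run takes, as a percentage of the fast run."""
    return 100.0 * (slow_seconds - fast_seconds) / fast_seconds


# Build times from the first post, converted to seconds.
WS_605 = {"one CPU": 17 * 60 + 45, "two CPUs": 16 * 60}
WS_652 = {"one CPU": 19 * 60 + 30, "two CPUs": 18 * 60 + 10}

if __name__ == "__main__":
    for config in WS_605:
        pct = extra_time_percent(WS_652[config], WS_605[config])
        print(f"{config}: 6.5.2 takes {pct:.1f}% longer than 6.0.5")
```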

admin
Immortal

"VT is 13% slower than 6.5.2 using binary translation."

That's a bit unexpected for a CPU-bound workload, but not surprising for a workload with a good deal of I/O.

"6.5.2 using binary translation is already about 13% slower than 6.0.5 using binary translation."

That's disturbing. If you wouldn't mind collecting some statistics for me, send me a PM, and I'll give you instructions.

rbphilip
Contributor

"That's disturbing. If you wouldn't mind collecting some statistics for me, send me a PM, and I'll give you instructions."

Ah. If only I knew how to send you a PM :)

How about we go offline to e-mail. You can get me at Rob.Philip@leksak.org

It's pretty easy for me to switch back and forth between 6.0.5 and 6.5.2

Rob

rbphilip
Contributor

I have found and resolved my problem.

Workstation 6.0.5 had experimental support for VT-x, but defaulted to an execution mode of "Binary Translation".

Workstation 6.5.2 has a dropdown that allows you to select "automatic", "binary translation" or "VT-x".

It turns out that (for me) if I leave the execution mode as "automatic", that inspires VT-x to be "automatically" selected, or perhaps a mixture of BT and VT-x. If I explicitly set the machine to use "Binary Translation" I regain my 6.0.5 speed. In fact, it's a few percent faster than 6.0.5 was.

Apparently the logic used to decide between Binary Translation and VT-x is complex, and possibly flawed. It is a good idea to try your virtual machine both ways, and pick the one that is appropriate.
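
If you want to try both modes on your own VM, one way to compare them is to time the same build a few times under each setting. The sketch below is a hypothetical Python harness, not anything VMware ships: it assumes you switch the execution mode by hand in the VM settings and then run the script inside the guest. The placeholder command and the label are assumptions, so replace them with your real build command and whichever mode you are testing.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=3):
    """Run cmd `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raises if the build fails
        durations.append(time.perf_counter() - start)
    return durations


def summarize(label, durations):
    """Format the median duration for one execution mode."""
    median = statistics.median(durations)
    return f"{label}: median {median:.2f} s over {len(durations)} runs"


if __name__ == "__main__":
    # Placeholder workload; swap in your actual build command.
    cmd = [sys.executable, "-c", "sum(range(10**6))"]
    # Label the result with the mode currently selected in the VM settings.
    print(summarize("binary translation", time_command(cmd)))
```

Using the median of a few runs helps keep one slow run (a cold disk cache, say) from skewing the comparison.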

Solved!

Rob

ksc
VMware Employee

"It turns out that (for me) if I leave the execution mode as "automatic", that inspires VT-x to be "automatically" selected, or perhaps a mixture of BT and VT-x. If I explicitly set the machine to use "Binary Translation" I regain my 6.0.5 speed. In fact, it's a few percent faster than 6.0.5 was.

Apparently the logic used to decide between Binary Translation and VT-x is complex, and possibly flawed. It is a good idea to try your virtual machine both ways, and pick the one that is appropriate."

It is incredibly complex (which is why we generally won't discuss how each version decides), but it's not flawed. The default provides better performance in general - your case happens to be one of the exceptions. I very much agree that if you need that last 10% of performance, it is necessary to try both modes - I wish this were as easy to communicate to everyone trying to run benchmarks!

In general, compilation workloads would do slightly better under Binary Translation - they are I/O and MMU heavy - and computation workloads or benchmarks would do slightly better under VT - they are syscall-heavy. WS6.0 had very little tuning, and WS6.5 had a lot more tuning of defaults. (Based on nuances like which Windows version contains a HAL that interacts well or poorly with VT).

We were pleased when the option arrived in the UI to choose amongst the different modes. No competitor has such options - they all run in VT mode only.

rbphilip
Contributor

I stand corrected regarding the potential flaw!

In my case (an FPGA compile that takes 20-ish minutes) that last 10-15% really feels important. For anything interactive I never noticed a difference.

Now that I understand what is going on, I'm more than happy. I just always get grumpy when things don't work as I believe they should, and in particular don't seem to work consistently.

I really like having the option to pick one's execution mode. I'm looking forward to the day I build an i7 or i5 machine that supports the virtual MMU to go along with the VT-x.

I am intrigued by your comment about nuances of Windows HAL interacting well or poorly with VT. I'm running XPx64 so I can use all my 8G of memory and have large VMs. Is there a "better" choice from the standpoint of VMware performance?

oznet
Contributor

"I am intrigued by your comment about nuances of Windows HAL interacting well or poorly with VT. I'm running XPx64 so I can use all my 8G of memory and have large VMs. Is there a "better" choice from the standpoint of VMware performance?"

I personally would be interested to see what kind of performance you get on a Linux host.

rbphilip
Contributor

"I personally would be interested to see what kind of performance you get on a Linux host."

I'm betting this sort of question can be answered by the VMware guy "ksc" that commented on the nuances of the Windows HAL. I can't imagine they haven't done this sort of experiment.

I can see myself doing this test "someday", where someday isn't soon. This is a development machine for me, with ICH9R RAID drives that are pretty full. I'd have to do a complete backup in a way that Linux will be able to read the data, install a supported Linux host, bring back the data, install Workstation for Linux and then run my test.

Perhaps the next machine I build I'll put a Linux system on it rather than XPx64.

As much as I like the idea of a Linux host, XPx64 does the job pretty much flawlessly. Now, if someone could convince me that a VM running on top of a Linux host would run at 95% of bare metal speed rather than the 82% I get now, then it might be worth the effort...

Perhaps Jim or "ksc" knows if anyone has done a comparison of XP and Linux hosts.....

admin
Immortal

k> It is incredibly complex (which is why we generally won't discuss how each version decides), but it's not flawed.

Actually, it is flawed. When we decided not to use VT-x for 32-bit guests on 65nm Core 2, I missed the "modern Windows" branch of the decision tree.

Oops,

--jim

admin
Immortal

"I am intrigued by your comment about nuances of Windows HAL interacting well or poorly with VT."

ksc was referring to the guest HAL. For processors without Intel FlexPriority, the ACPI HAL performs much worse than the Standard PC HAL, because of frequent accesses to the TPR register in the local APIC. I believe that your host does have FlexPriority, so you shouldn't see much of a difference.
