NT4, Citrix Metaframe 1.8, and SMP under VMware - CPU Hogging Solution

Overview: Windows NT 4.0 has two Hardware Abstraction Layers (HALs) that can be used under VMware. The uniprocessor APIC HAL works under VMware but won't recognize multiple processors. A single CPU in the guest greatly hampers our users' experience on NT4, so the uniprocessor HAL doesn't work for us in our environment. The second available HAL is the MPS version, which is multiprocessor-aware. The guest runs great with this HAL and shows multiple CPUs available for use. The problem with the MPS HAL is that the VMware host sees the guest consuming 100% of a processor for each virtual processor assigned to it. If you have a 4-CPU host and a 4-CPU guest, VMware becomes very sluggish to respond to any requests. Cutting the guest down to 2 processors alleviates some of that congestion, but those two processors still can't do any meaningful work for other guests on the host.

This post describes my solution to the problem: adding the HLT instruction to the idle routine directly inside the HAL, which yields idle processor time back to the host.

-


We run an ancient Citrix Metaframe farm (6 servers) to support our legacy ERP system. Since we moved to VMware virtualization, we have been irritated by host-side CPU usage being pegged at 100% when NT is configured with multiprocessor support. Switching over to the APIC HAL lets NT behave properly in the VM world, at the cost of running on a single CPU. We tried the CPUIDLE fix suggested by DoctorNet (http://communities.vmware.com/thread/102648) to keep the host happy, but that pegged the CPU inside the guest and generally made the guest less responsive than usual. Since we can't replace our ERP system in the near future, I thought I'd take a crack at fixing this problem a different way.

The many articles on the Web about similar problems all come down to the HLT instruction. I opened HALAPIC.DLL in my favorite disassembler and found the following export and its assembly:

HalProcessorIdle proc near
	sti      ; Set Interrupt Flag
	hlt      ; Enter Halt State
	retn     ; Return from Near Procedure
HalProcessorIdle endp

OK, so it executes the HLT instruction as we would expect. Opening HALMPS.DLL, I found the following:

HalProcessorIdle proc near
	sti      ; Set Interrupt Flag
	retn     ; Return from Near Procedure
HalProcessorIdle endp
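The two listings differ by a single instruction. In x86 machine code, `sti` is the byte 0xFB, `hlt` is 0xF4 and a near `retn` is 0xC3, so the MPS routine is the two bytes FB C3 while the APIC routine is FB F4 C3. The Python sketch below is an illustration under stated assumptions, not the hex change used to build HALVMX.DLL. It assumes you already have the routine's RVA from your disassembler (the address shown minus the DLL's ImageBase; `idle_rva` is a hypothetical input), converts it to a file offset through the PE section table, and only writes if the byte after `retn` is padding (0xCC or 0x90, an assumption about how the routine is laid out). If no padding byte is free, the routine would have to be relocated, which this sketch does not attempt.

```python
import struct

STI, HLT, RETN = 0xFB, 0xF4, 0xC3
PADDING = (0xCC, 0x90)  # ASSUMPTION: int3 / nop filler follows the routine


def rva_to_offset(image: bytes, rva: int) -> int:
    """Map a relative virtual address to a file offset via the PE section table."""
    e_lfanew = struct.unpack_from("<I", image, 0x3C)[0]
    n_sections = struct.unpack_from("<H", image, e_lfanew + 6)[0]   # COFF NumberOfSections
    opt_size = struct.unpack_from("<H", image, e_lfanew + 20)[0]    # COFF SizeOfOptionalHeader
    table = e_lfanew + 24 + opt_size                                # first section header
    for i in range(n_sections):
        # VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData
        _, va, raw_size, raw_ptr = struct.unpack_from("<IIII", image, table + 40 * i + 8)
        if va <= rva < va + raw_size:
            return raw_ptr + (rva - va)
    raise ValueError(f"RVA {rva:#x} is not backed by file data in any section")


def add_hlt(image: bytes, idle_rva: int) -> bytes:
    """Rewrite 'sti; retn' (FB C3) at idle_rva as 'sti; hlt; retn' (FB F4 C3)."""
    off = rva_to_offset(image, idle_rva)
    if image[off:off + 2] != bytes([STI, RETN]):
        raise ValueError("expected 'sti; retn' (FB C3) at this address")
    if off + 2 >= len(image) or image[off + 2] not in PADDING:
        raise ValueError("no padding byte after retn; the routine would need relocating")
    patched = bytearray(image)
    patched[off:off + 3] = bytes([STI, HLT, RETN])
    return bytes(patched)
```

The patched image still carries HALMPS.DLL's original header checksum, which no longer matches its contents.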

Sure enough, no HLT instruction. Working on the assumption that the fix might be as simple as adding the HLT instruction to the HalProcessorIdle routine in HALMPS.DLL, I built a new HALVMX.DLL based on HALMPS.DLL. I would post a hex diff of the changes needed, but I also had to update the signature of the DLL so that NT would load it without complaining about a "missing or corrupt" HAL.DLL.
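Assuming the "signature" here is the CheckSum field in the PE optional header (a patched DLL with a stale checksum would plausibly explain the "missing or corrupt" error), it can be recomputed after patching. This Python sketch uses the standard PE checksum algorithm: sum the file as 16-bit little-endian words with the CheckSum field treated as zero, fold each carry back into the low 16 bits, then add the file length. The function names are hypothetical.

```python
import struct


def checksum_offset(image: bytes) -> int:
    """File offset of OptionalHeader.CheckSum (same position for PE32 and PE32+)."""
    e_lfanew = struct.unpack_from("<I", image, 0x3C)[0]
    # 'PE\0\0' signature + 20-byte COFF file header + 64 bytes into the optional header
    return e_lfanew + 4 + 20 + 64


def pe_checksum(image: bytes) -> int:
    """Compute the PE image checksum, ignoring whatever CheckSum is stored."""
    data = bytearray(image)
    off = checksum_offset(image)
    data[off:off + 4] = bytes(4)          # treat the CheckSum field as zero
    if len(data) % 2:
        data.append(0)                    # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += data[i] | (data[i + 1] << 8)
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return (total + len(image)) & 0xFFFFFFFF


def fix_checksum(image: bytes) -> bytes:
    """Return a copy of image with OptionalHeader.CheckSum recomputed."""
    patched = bytearray(image)
    struct.pack_into("<I", patched, checksum_offset(image), pe_checksum(image))
    return bytes(patched)
```

In practice this would run on the already-patched image, just before writing it out as HALVMX.DLL.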

I then updated our BOOT.INI to reflect the updated HAL:

[boot loader]
timeout=4
default=multi(0)disk(0)rdisk(0)partition(1)\WTSRV
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WTSRV="Windows Terminal Server Version 4.00 VMware" /HAL=HALVMX.DLL
multi(0)disk(0)rdisk(0)partition(1)\WTSRV="Windows Terminal Server Version 4.00" 
multi(0)disk(0)rdisk(0)partition(1)\WTSRV="Windows Terminal Server Version 4.00 [VGA mode]" /basevideo /sos

We put the file in place on our Terminal Server farm and are bringing the servers online with the new HAL as users log out. So far the results are very promising: the load on our hosts has dropped SIGNIFICANTLY. There isn't any difference in performance in the NT4 guest, as it still thinks everything is fine.

For all of you TCO fanboys out there, run some power-saving numbers for the extra CPU cycles you aren't burning anymore, and for the host savings from consolidating more old SMP NT4 servers onto less hardware.

Installation Instructions:

1) Copy the attached HALVMX.DLL into the Windows NT System32 folder.

2) Duplicate the first BOOT.INI entry (typically the one with no switches) so you have a way to fall back in case this fails.

3) Add "VMware" to the description and the /HAL=HALVMX.DLL switch to the new line, as shown above.

4) Reboot your guest.
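Steps 2 and 3 can also be scripted. This is a minimal Python sketch, assuming BOOT.INI has already been read into a string; `add_vmware_entry` is a hypothetical helper. It copies the first [operating systems] entry that has no switches, appends "VMware" to its description and /HAL=HALVMX.DLL to the line, and places the copy first in the section, as in the example BOOT.INI above. Because every entry shares the same ARC path, my understanding is that NTLDR treats the first matching entry as the default. BOOT.INI is normally marked system, hidden and read-only, so those attributes have to be cleared (for example with `attrib -s -h -r`) before saving changes.

```python
def add_vmware_entry(boot_ini: str, hal: str = "HALVMX.DLL") -> str:
    """Duplicate the first switch-less OS entry as a '... VMware' entry using /HAL=hal."""
    lines = boot_ini.splitlines()
    header = next((i for i, line in enumerate(lines)
                   if line.strip().lower() == "[operating systems]"), None)
    if header is None:
        raise ValueError("no [operating systems] section")
    for line in lines[header + 1:]:
        entry = line.strip()
        if entry.startswith("["):
            break                                  # reached the next section
        if "=" not in entry:
            continue
        arc_path, description = entry.split("=", 1)
        description = description.rstrip()
        if description.endswith('"'):              # nothing after the closing quote: no switches
            new_entry = f'{arc_path}={description[:-1]} VMware" /HAL={hal}'
            lines.insert(header + 1, new_entry)    # make it the first OS entry
            return "\r\n".join(lines) + "\r\n"     # keep Windows CRLF line endings
    raise ValueError("no entry without switches found under [operating systems]")
```

The original entries are left untouched, so the plain MPS HAL entry stays available as the fallback from step 2.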

As usual, this file comes with no warranty, express or implied.

Eric

-


UPDATED March 2, 2010

In the past two months, we have migrated to vSphere, and my HALVMX.DLL is working swimmingly. Here's a screenshot of CPU usage for one of my 6 boxes over the past month:

Each of these machines was cloned from a template I installed manually from the source (CD) media. I used the NT4 BusLogic SCSI drivers, as those were the most up-to-date drivers I could find that would work inside VMware (VMware doesn't seem to provide its own NT4 SCSI drivers). The VMware drivers for video (VMware SVGA II driver version 11.6.0.4) and mouse work great. I'm using the AMD PCNET PCI Ethernet Adapter driver version 3.11 to stay compatible with VMware's hardware emulation.

Here's the hardware configuration from vCenter's perspective. Network Adapter 1 in the picture is a "Flexible" adapter, to maintain compatibility with the AMD PCNET driver in the guest.

On the Options tab, I also changed the CPU/MMU Virtualization to Software instead of Automatic since NT4 is completely unaware of virtualization hardware.

Comments

Eric,

Excellent work. How has it worked out after 2 months in production? Does this solution depend on the number of processors (1 or 2 CPUs)? Did you perform a fresh install or use a P2V app (and if so, which one)? Which version of VMware are you using?

Thanks,

Skip

Hi Skip,

Hopefully this latest update will answer your questions. If not, please don't hesitate to ask, and I'll try to answer anything else you throw at me.

Thanks,

Eric

Eric,

I know this is an old post, but now we have the same problem on Windows 2003. Unfortunately, we cannot use halvmx.dll (I think because it is for Windows NT); Windows won't start with it.

Can you explain in more detail how to edit the DLL file?

Thanks in advance!

tpalav,

Windows kernels after NT4 already have the HLT instruction inside the kernel's idle loop. If you're seeing high CPU usage on the host but not in the guest, I would investigate kernel-mode drivers (think SCSI drivers, video drivers, etc.) first. Boot your system in safe mode to see if the problem goes away. If it does, then you're definitely looking at a kernel driver issue.

Also, while your host is showing the high CPU usage, open Task Manager in the guest, go to the Performance tab, and click View > Show Kernel Times. You might see a lot of kernel CPU activity you weren't seeing before.

If you *really* want to dig into the Windows kernel, get yourself a disassembler of some sort (Hex-Rays IDA comes to mind) and disassemble the kernel image, ntoskrnl.exe. From there, start looking for entry points that have the word Cpu in them.

Happy hacking,

Eric

Version history: revision 1 of 1, last updated 12-21-2009 03:33 PM