Overview: Windows NT 4.0 has two Hardware Abstraction Layers (HALs) that can be used under VMware. The uniprocessor APIC HAL works under VMware but won't recognize multiple processors. Our user experience on NT4 is greatly hampered by a single CPU in the guest, so the uniprocessor HAL isn't an option in our environment. The second HAL is the MPS version, which is multiprocessor aware. The guest runs great with this HAL and shows multiple CPUs available for use. The problem with the MPS HAL is that the VMware host sees the guest consuming 100% of every processor assigned to it. With a 4-CPU host and a 4-CPU guest, VMware becomes very sluggish to respond to any requests. Cutting the guest down to 2 processors alleviates some of that congestion, but those two processors still can't do any meaningful work for other guests on the host.
This post describes my solution to the problem by adding the HLT instruction to the idle loop directly inside the HAL, effectively yielding processor time back to the host.
We run an ancient Citrix MetaFrame farm (6 servers) to support our legacy ERP system. Since we moved to VMware virtualization, we have been irritated by host CPU usage being pegged at 100% whenever NT is configured with multiprocessor support. Switching over to the uniprocessor APIC HAL makes NT behave properly in the VM world, but at the cost of the extra processors. We tried the CPUIDLE fix suggested by DoctorNet (http://communities.vmware.com/thread/102648) to keep the host happy, but that pegged the CPU inside the guest and generally made the guest less responsive than usual. Since we aren't able to replace our ERP system in the near future, I thought I'd take a crack at fixing this problem in a different way.
The multitude of articles that are available on the Web talking about similar problems all come down to the HLT instruction. I opened up HALAPIC.DLL in my favorite disassembler and found the following export and assembly:
HalProcessorIdle proc near
    sti     ; Set Interrupt Flag
    hlt     ; Enter Halt State
    retn    ; Return from Near Procedure
HalProcessorIdle endp
Ok, so it does the HLT instruction like we would expect. Opening up the HALMPS.DLL, I found the following:
HalProcessorIdle proc near
    sti     ; Set Interrupt Flag
    retn    ; Return from Near Procedure
HalProcessorIdle endp
Sure enough, no HLT instruction. Working on the assumption that the fix might be as simple as adding the HLT instruction to the HalProcessorIdle routine in HALMPS.DLL, I built a new HALVMX.DLL based on HALMPS.DLL. I would post a hex diff of the changes, but I also had to update the signature of the DLL so that NT would load it and not complain about a "missing or corrupt" HAL.DLL.
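To make the difference between the two listings concrete, here is a minimal Python sketch of the byte-level view. The opcodes are the standard one-byte x86 encodings (STI = FB, HLT = F4, near RET = C3). The byte strings are hand-assembled from the disassembly above rather than dumped from the real DLLs, and the helper name `idle_routine_halts` is made up for illustration.

```python
# Standard single-byte x86 opcodes for the instructions in the listings.
STI = 0xFB   # set interrupt flag
HLT = 0xF4   # halt until the next interrupt arrives
RETN = 0xC3  # near return

# Hand-assembled from the disassembly above (assumption: not read from the DLLs).
HALAPIC_IDLE = bytes([STI, HLT, RETN])
HALMPS_IDLE = bytes([STI, RETN])


def idle_routine_halts(code: bytes) -> bool:
    """Return True if HLT appears before the first near return.

    This is a naive scan, valid only for tiny routines like these in
    which every instruction is exactly one byte long.
    """
    for opcode in code:
        if opcode == HLT:
            return True
        if opcode == RETN:
            return False
    return False


if __name__ == "__main__":
    print("HALAPIC idle routine halts:", idle_routine_halts(HALAPIC_IDLE))
    print("HALMPS idle routine halts: ", idle_routine_halts(HALMPS_IDLE))
```

Note that the halting version is one byte longer than the MPS original, so the change is not a same-size overwrite of existing bytes.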
I then updated our BOOT.INI to reflect the updated HAL:
[boot loader]
timeout=4
default=multi(0)disk(0)rdisk(0)partition(1)\WTSRV
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WTSRV="Windows Terminal Server Version 4.00 VMware" /HAL=HALVMX.DLL
multi(0)disk(0)rdisk(0)partition(1)\WTSRV="Windows Terminal Server Version 4.00"
multi(0)disk(0)rdisk(0)partition(1)\WTSRV="Windows Terminal Server Version 4.00 [VGA mode]" /basevideo /sos
We put the file in place on our Terminal Server farm and are bringing the servers online with the new HAL as users log out. So far the results are very promising. The load on our hosts has dropped significantly. There isn't any difference in performance inside the NT4 guest, as it still thinks everything is fine.
For all of you TCO fanboys out there, run some power-saving numbers for the extra CPU cycles you aren't burning up anymore, and for the host savings in being able to consolidate more old SMP NT4 servers onto less hardware.
1) Copy the attached HALVMX.DLL into the Windows NT System32 folder.
2) Duplicate the first BOOT.INI option (typically the one with no switches) so there is a way to fall back in case this fails.
3) Add "VMware" and the /HAL=HALVMX.DLL switch to the new line as shown above.
4) Reboot your guest.
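Steps 2 and 3 can also be scripted. The following is a rough Python sketch, not a tested deployment tool: it works on the BOOT.INI text only, and the function name `add_hal_entry` and its parameters are my own invention for illustration. It copies the first entry under [operating systems], appends "VMware" to the description, adds the /HAL= switch, and puts the copy above the original so the untouched entry remains as the fallback.

```python
SAMPLE = r"""[boot loader]
timeout=4
default=multi(0)disk(0)rdisk(0)partition(1)\WTSRV
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WTSRV="Windows Terminal Server Version 4.00"
multi(0)disk(0)rdisk(0)partition(1)\WTSRV="Windows Terminal Server Version 4.00 [VGA mode]" /basevideo /sos
"""


def add_hal_entry(boot_ini: str, hal: str = "HALVMX.DLL", tag: str = "VMware") -> str:
    """Duplicate the first [operating systems] entry with a /HAL= switch.

    The copy is inserted above the original entry. Line endings are
    normalised to "\n" in the returned text.
    """
    lines = boot_ini.splitlines()
    in_os_section = False
    for index, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("["):
            # Track which section we are in; only entries under
            # [operating systems] are candidates for duplication.
            in_os_section = stripped.lower() == "[operating systems]"
            continue
        if in_os_section and '="' in stripped:
            path, _, rest = stripped.partition('="')
            description, _, switches = rest.partition('"')
            # Existing switches (if any) are kept after the new /HAL= switch.
            new_entry = f'{path}="{description} {tag}" /HAL={hal}{switches}'
            lines.insert(index, new_entry)
            return "\n".join(lines) + "\n"
    raise ValueError("no entry found under [operating systems]")


if __name__ == "__main__":
    print(add_hal_entry(SAMPLE))
```

The default= line is left alone. In the BOOT.INI shown earlier it names only the \WTSRV path, which the new entry shares with the original.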
As usual, this file comes with no warranty, express or implied.
UPDATED March 2, 2010
In the past two months, we have migrated to vSphere and my halvmx.dll is working swimmingly. Here's a screenshot of the CPU usage on one of my 6 boxes over the past month:
Each of these machines was cloned from a template I manually installed from the source (CD) media. I used the NT4 BusLogic SCSI drivers as those were the most up-to-date drivers I could find that would work inside VMware. (VMware's drivers seem to be non-existent.) The VMware drivers for the video (VMware SVGA II driver version 220.127.116.11) and mouse work great. I'm using the AMD PCNET PCI Ethernet Adapter version 3.11 to stay compatible with VMware's hardware emulation.
Here's the hardware configuration from vCenter's perspective. The Network Adapter 1 in the picture is a "Flexible" adapter, to maintain compatibility with the AMD PCNET driver in the guest.
On the Options tab, I also changed the CPU/MMU Virtualization to Software instead of Automatic since NT4 is completely unaware of virtualization hardware.