sreenathmv
Contributor

VMs Freeze in ESX 3.0.1 with Black Screen

I am having a problem with the VMs in my VMware ESX 3.0.1 environment. The VMs freeze with a black screen, and when this happens CPU utilization is high. The only way I can bring a VM back is by resetting it.

This is happening frequently in my production environment. VMware support has been working on it for the past month and has not yet found the cause.

Environment: HP DL580 G4, guest VMs running Windows 2003 STD SP1 with Citrix Presentation Server

Please find the vmware.log excerpt below:

Apr 03 01:27:58.948: vcpu-0| PIIX4: PMAccessPM got ACPI S1 request

Apr 03 01:27:58.948: vcpu-0| Msg_Hint: msg.piix4pm.guestInS1 (not shown)

Apr 03 01:29:08.226: vcpu-0| PIIX4: PM Resuming... (from S1 (0x4))

Apr 03 01:29:08.346: vcpu-1| SVGA: Unregistering IOSpace at 0x1060 (0x1060)

Apr 03 01:29:08.346: vcpu-1| SVGA: Unregistering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)

Apr 03 01:29:08.347: vcpu-1| SVGA: Registering IOSpace at 0x1060 (0x0)

Apr 03 01:29:08.347: vcpu-1| SVGA: Registering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)

Apr 03 01:29:08.494: mks| SVGA: Using extended FIFO: Caps 0x00000007, Flags 0x00000000

Apr 03 01:29:08.495: mks| HostOps hideCursor before defineCursor!

Apr 03 01:29:08.543: mks| MKS remote display status changed, enabling remoteoptimizations

Apr 03 01:30:08.481: vcpu-0| PIIX4: PMAccessPM got ACPI S1 request

Apr 03 01:30:08.481: vcpu-0| Msg_Hint: msg.piix4pm.guestInS1 (not shown)

Apr 03 01:30:25.981: vcpu-0| PIIX4: PM Resuming... (from S1 (0x4))

Apr 03 01:30:26.056: vcpu-1| SVGA: Unregistering IOSpace at 0x1060 (0x1060)

Apr 03 01:30:26.056: vcpu-1| SVGA: Unregistering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)

Apr 03 01:30:26.057: vcpu-1| SVGA: Registering IOSpace at 0x1060 (0x0)

Apr 03 01:30:26.057: vcpu-1| SVGA: Registering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)

Apr 03 01:30:26.178: mks| SVGA: Using extended FIFO: Caps 0x00000007, Flags 0x00000000

Apr 03 01:30:26.179: mks| HostOps hideCursor before defineCursor!

Apr 03 01:30:26.244: mks| MKS remote display status changed, enabling remoteoptimizations

Apr 03 01:31:26.112: vcpu-0| PIIX4: PMAccessPM got ACPI S1 request

Apr 03 01:31:26.112: vcpu-0| Msg_Hint: msg.piix4pm.guestInS1 (not shown)

Apr 03 01:33:08.106: vcpu-1| PIIX4: PM Resuming... (from S1 (0x4))

Apr 03 01:33:08.271: vcpu-1| SVGA: Unregistering IOSpace at 0x1060 (0x1060)

Apr 03 01:33:08.271: vcpu-1| SVGA: Unregistering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)

Apr 03 01:33:08.273: vcpu-1| SVGA: Registering IOSpace at 0x1060 (0x0)

Apr 03 01:33:08.273: vcpu-1| SVGA: Registering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)

Apr 03 01:33:08.464: mks| SVGA: Using extended FIFO: Caps 0x00000007, Flags 0x00000000

Apr 03 01:33:08.464: mks| HostOps hideCursor before defineCursor!

Apr 03 01:33:08.494: mks| MKS remote display status changed, enabling remoteoptimizations

Apr 03 01:34:08.391: vcpu-0| PIIX4: PMAccessPM got ACPI S1 request

Apr 03 01:34:08.391: vcpu-0| Msg_Hint: msg.piix4pm.guestInS1 (not shown)

Apr 03 01:35:08.413: vcpu-0| PIIX4: PM Resuming... (from S1 (0x4))

Apr 03 01:35:08.642: vcpu-1| SVGA: Unregistering IOSpace at 0x1060 (0x1060)

Apr 03 01:35:08.642: vcpu-1| SVGA: Unregistering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)

Apr 03 01:35:08.643: vcpu-1| SVGA: Registering IOSpace at 0x1060 (0x0)

Apr 03 01:35:08.643: vcpu-1| SVGA: Registering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)

Apr 03 01:35:08.802: mks| SVGA: Using extended FIFO: Caps 0x00000007, Flags 0x00000000

Apr 03 01:35:08.803: mks| HostOps hideCursor before defineCursor!

Apr 03 01:35:08.812: mks| MKS remote display status changed, enabling remoteoptimizations

Apr 03 09:16:21.025: mks| Ignoring update request in VGA_Expose (mode change pending).

Apr 03 09:20:04.683: mks| SOCKET 7 recv error 5: Input/output error

Apr 03 09:20:04.683: mks| SOCKET 7 destroying VNC backend on socket error: 5

14 Replies
soleblazer
Hot Shot

Is there anything going on in the service console, such as high CPU or memory usage? If the service console gets hosed, it could affect the VMs.

It's kind of hard to say. Is it all VMs, and all OSes within those VMs? Are you using a lot of resources on the host?

GotToBeStrong
Contributor

I have had this issue too, but only on one specific guest OS. I have not yet updated to 3.0.1, though. I am running 4 servers, all Win2k3 SP2. The one in question is 32-bit. The only resource-intensive service running on it is the SQL engine for my Veritas Backup Exec; aside from that, it is used as a file server. I am looking into moving the Backup Exec database off that box and removing the SQL Server service.

I cannot isolate a specific trigger process from within Windows, and the symptom is so rare that it is hard to catch and impossible to recreate. VMware support suggested turning off Hyper-Threading and reducing the number of vCPUs on that machine, but my other guest OSes run with the full functionality of my processors and do not have the problem, so I hesitate to take such drastic measures.

I hope I can be of some assistance in finding the cause of this anomaly. I am glad to see that this is not isolated to my one VM.

ramram77
Contributor

CPU and memory utilization is low on both the service console and the host.

I have checked resource usage and everything is normal. All the VMs are running Windows 2K3 SP1, and this is happening on most of them.

sreenathmv
Contributor

CPU and memory utilization is low on both the service console and the host.

I have checked resource usage and everything is normal. All the VMs are running Windows 2K3 SP1, and this is happening on most of them.

dheerajms
Enthusiast

A few things come to mind regarding your problem:

There could be many reasons for this, such as LUN locking or VMware Tools not being updated after applying patches. How is your storage designed? Is zoning in place if you have a SAN switch? I hope you have not overcommitted resources. When you say CPU utilization is high, whose CPU utilization do you mean: the VM's or the ESX host's? Is the server and storage firmware up to date? Check the best practices for the BIOS, virtualization, and 64-bit settings for your CPU type.

Paul_Lalonde
Commander

Looks like power management... (I know, strange). Are you running a screensaver or anything like that? If so, turn it off. I'd even go as far as to check the BIOS of the VM itself and make sure it's not doing anything with APM or ACPI to shut down devices.
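One rough way to check this against the vmware.log from the original post is to pair each "got ACPI S1 request" line with the next "PM Resuming" line and see how long the guest sat in S1. The following is only a sketch in Python: the function names are made up for illustration, the year is a placeholder because the log timestamps do not include one, and it assumes every log line follows the same "timestamp: thread| message" layout as the excerpt.

```python
import re
from datetime import datetime

# Matches lines shaped like the excerpt in the original post, e.g.
#   Apr 03 01:27:58.948: vcpu-0| PIIX4: PMAccessPM got ACPI S1 request
LINE_RE = re.compile(r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}\.\d{3}): [^|]+\| (.*)$")

# Assumption: the log has no year, so a placeholder year is prepended.
# Only differences between timestamps are used, so the value is arbitrary.
PLACEHOLDER_YEAR = "2000"


def parse_time(stamp):
    """Parse a 'Mon DD HH:MM:SS.mmm' timestamp from vmware.log."""
    return datetime.strptime(PLACEHOLDER_YEAR + " " + stamp, "%Y %b %d %H:%M:%S.%f")


def s1_intervals(lines):
    """Return (entered, resumed, seconds) for each S1 request/resume pair."""
    intervals = []
    entered = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # blank lines or lines in another layout are skipped
        stamp, message = match.groups()
        if "got ACPI S1 request" in message:
            entered = parse_time(stamp)
        elif "PM Resuming" in message and entered is not None:
            resumed = parse_time(stamp)
            seconds = (resumed - entered).total_seconds()
            intervals.append((entered, resumed, seconds))
            entered = None
    return intervals


if __name__ == "__main__":
    # Hypothetical usage: point this at a copy of the VM's vmware.log.
    with open("vmware.log") as handle:
        for entered, resumed, seconds in s1_intervals(handle):
            print(entered.time(), "->", resumed.time(), f"{seconds:.3f}s in S1")
```

If the guest keeps requesting S1 shortly after every resume, as the posted excerpt suggests, that would point at something inside the guest repeatedly asking for standby rather than at the host.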

Paul

sreenathmv
Contributor

Hi,

Thanks for the reply.

I am not using any external storage; everything is local.

CPU behaviour when the system hangs with a black screen:

1. I have assigned 2 vCPUs per VM.

2. When the system hangs, vCPU0 is at 100% utilization and vCPU1 is at around 5 to 10%.

I have already tried the following without any success:

1. Disabled the memory balloon driver

2. Set memory and CPU reservations

3. Disabled the screensaver

4. Set standby to Never in power management

Sreenath

Kemsies
Enthusiast

Hi,

I had this error twice a while ago. Both VMs were running W2K3 SP1. Before the error happened on the first VM, I had added one CPU, and afterwards I could not get the VM running again: always a black screen with the first CPU at 100%. I found out that there was a problem with the HAL.

So I did the following:

- Boot the VM from a W2K3 SP1 CD

- Copy the HAL DLL file you need to your system32 folder

- Edit boot.ini to reference your HAL DLL file, e.g. /HAL=halmps.dll

- Reboot the VM

sreenathmv
Contributor

Hi,

In my case, I have not upgraded from uniprocessor to multiprocessor.

All the VMs were built with 2 vCPUs.

Regards

Sreenath

dheerajms
Enthusiast

Use 2 vCPUs only if required, that is, if the application running on that VM really makes use of them.

Otherwise, 1 should be sufficient.

-Dheeraj.

Freitag
Enthusiast

There should be a specific reason to assign 2 vCPUs; otherwise, 1 should be selected.

TiBoReR
Enthusiast

I have the same problem. A VM freezes and I need to reset it to get it working again.

ESX 3.0.1, fully patched, all local disks. HP DL360 G5.

2 VMs are running on it (W2k3 R2 32-bit). Only 1 is experiencing this issue. That one is a DC, file server, Symantec AntiVirus server, and backup server running Backup Exec 11D, with the tape drive on the ESX host mapped to that VM.

Todaysits
Contributor

If you are using 2 vCPUs and a uniprocessor kernel, that could be the problem. A single vCPU and multiprocessor kernel is supported, but not the other way around. And why not either take away a vCPU, or upgrade to multi?

The only black screen lockups I get are due to virtual memory running out on the guest due to a leak in McAfee Virusscan.

Scott

sreenathmv
Contributor

Hi,

The VM freeze in my case was due to standby.

After changing the registry to disable standby, my VM is running without any issues.

The script below disables standby.

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ACPI\Parameters]

"AMLIMaxCTObjs"=hex:04,00,00,00

"Attributes"=dword:00000070

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ACPI\Parameters\WakeUp]

"FixedEventMask"=hex:20,05

"FixedEventStatus"=hex:00,84

"GenericEventMask"=hex:18,50,00,10

"GenericEventStatus"=hex:10,00,ff,00

Let me know if this resolves your issue.

Regards

Sreenath
