VMware Cloud Community
wapiti10
Enthusiast

Windows Server 2003 R2 64-bit - 16 GB RAM - 4 vCPUs - Extremely Long Boot Times

ESX 3.5 Host with 16 Cores and 64 GB RAM

Guest in question: Windows Server 2003 R2 64-bit, 16 GB RAM, 4 vCPUs

Started with 1 GB RAM, and the server booted fine. I configured my server with updates and shut down.

I added 15 GB of RAM and the server began taking up to 30 minutes to boot all the way up, hanging on the Windows splash screen with the scrolling graphic for the majority of that time.

Since then, here is what I have done (checking each time that the server sees the correct amount of RAM and the correct number of CPUs; it does):

-Scoured the community.

-Checked for services that didn't start; nothing glaring.

-Checked the event log (the event log service didn't start until Windows finally came up).

-Checked limits and reservations: no reservations, and "Unlimited" is checked.

-No ballooning or swapping taking place on the host.

-Shut down the server.

-Removed 14 GB of RAM from the guest (2 GB total now) = boot in 2 minutes

-Shut down, added 2 GB of RAM (4 GB total now) = boot in 2 minutes

-Shut down, added 4 GB of RAM (8 GB total now) = boot in 2 minutes

-Shut down, added 4 GB of RAM (12 GB total now) = boot in 2 minutes

-Shut down, added 4 GB of RAM (back to 16 GB now) = boot in 2 minutes

-Let the server stand, then selected "Restart" from the shutdown menu = 30 minutes to come back to Windows

-Doubled the page file to 10 GB. (I know that ideally I want my page file to equal the amount of memory in the OS, but my OS partition is too small. Could this be my problem? It wouldn't explain why the server booted in 2 minutes with 16 GB of RAM when I stepped it up, though.)

-Selected "Restart" from the shutdown menu = 30 minutes to come back to Windows.

OK, so you see my issue. I am looking for some help, and I have some additional questions:

1. Could it be the page file?

2. Could I have a bad pair of memory modules in my host? Is there a log or a memory test in ESX that I could look at to find out?

3. Any other suggestions?

Thanks,

Dallas

58 Replies
RParker
Immortal

Sounds right. Physical servers take longer to boot with more memory; it's probably doing a memory test.

Have you tried going into the BIOS of the VM and turning memory testing off?

aguacero
Hot Shot

Any particular reason why you gave the VM 4 vCPUs? I would recommend always starting with 1 vCPU and adding more only if necessary. Check your drive for free space and check the page file location. Make sure the latest VMware Tools are installed and the latest patches are applied to the ESX host. If you had bad memory, you would probably get "panic attacks" on the ESX host. You can download Veeam's monitoring product with an eval license from http://www.veeam.com/vmware-esx-monitoring.html to help you work out whether the problem is outside of the box or inside.

If you found this information useful, please consider awarding points for "Correct" or "Helpful". Thanks!!!

wapiti10
Enthusiast

I am in the BIOS and I don't see where to disable memory testing. A co-worker also suggested this before I posted today.

Can you give me any direction?

Dallas

Randy_B
Enthusiast

I've seen this more with 64-bit systems and have found that setting the page file to "system managed" in Windows helps some. We also saw this when using an older, slower SAN for testing. I think the delay is the wait for Windows to create the page file; I believe "system managed" allows it to start out smaller and grow if needed.

kjb007
Immortal

The option you're looking for will usually be Boot Time Diagnostic Screen. It's disabled by default. You should be able to check the vmware.log in the same folder as your VM for more info. Also, do you have "clear pagefile on shutdown" enabled in Windows? That would also cause problems. Usually, when a boot takes a long time on the Windows splash screen, it could be memory, but it could also be network connections and/or mapped drives that are timing out before Windows comes up.

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
madcult
Enthusiast

We have seen similar behaviour in our system when we increase the number of vCPUs in a virtual machine.

With 4 vCPUs assigned, a VM took a long time to boot. With only 2 vCPUs it was a lot faster. The fastest boot was with only 1 vCPU.

AWo
Immortal

We're facing the same issue with ESX 3.5 and some W2K3 (SE/EE) guests. Sometimes a guest comes up normally; sometimes it takes up to 15 minutes to boot, even if the guest is the only one on the host.

VMware Tools are installed.

I don't know why this sometimes happens and sometimes doesn't.

We also have W2K3 systems in the same cluster where we have never seen such behavior.

AWo

vExpert 2009/10/11 [:o]===[o:] [: ]o=o[ :] = Save forests! rent firewood! =
kjb007
Immortal

Are you sure you're using the correct HAL on those VM's?

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
madcult
Enthusiast

Since it is Microsoft Windows, I hope so. If I change the hardware, e.g. increase the CPUs from 2 to 4 in a VM, and then start it, a balloon tip is shown within the VM saying that new hardware has been detected and installed correctly. If I look in Device Manager (I don't know the correct English term) in the Control Panel of this VM, I see 4 CPUs. So in my opinion it should be the correct HAL. Am I wrong?

ncarde
Enthusiast

We're seeing this issue also (it happened when we increased the vRAM from 1 GB to 12 GB). I suspect the extra time is due to:

-Windows and its pagefile mechanism,

-ESX creating the .vswp file during boot (we set a memory reservation of 10 GB, which lowered the .vswp to 2 GB, but it still takes quite a bit of time to boot).
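
For reference, the 2 GB figure follows from the usual rule that the per-VM .vswp file is sized to the configured memory minus the memory reservation. A minimal sketch of that arithmetic (the function name is made up for illustration, and it ignores any overhead memory):

```python
def vswp_size_gb(configured_gb, reservation_gb):
    """Approximate .vswp size: configured memory minus reserved memory.

    Hypothetical helper for illustration only; sizes are in GB.
    """
    if reservation_gb < 0 or reservation_gb > configured_gb:
        raise ValueError("reservation must be between 0 and the configured memory")
    return configured_gb - reservation_gb


# 12 GB configured with a 10 GB reservation, as described in this post
print(vswp_size_gb(12, 10))

# 16 GB configured with no reservation, as in the original post
print(vswp_size_gb(16, 0))
```

With no reservation, the swap file is as large as the VM's configured memory, which is why a reservation shrinks it.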

kjb007
Immortal

Seeing 4 CPUs does not always mean you're using the correct HAL. Check Device Manager and, under the Computer branch, see if it says ACPI Multiprocessor. Also, check c:\windows\repair\setup.log and look for the string "hal"; it should have an "m" in there (halmacpi.dll, NOT halaacpi.dll).
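
If you would rather script the setup.log check than eyeball it, something like the following could work. It is only a sketch: the function names are made up, the sample lines are hypothetical stand-ins rather than the real setup.log format, and it only looks for the two file names mentioned above.

```python
def find_hal_entries(lines):
    """Return the lines that mention 'hal' (case-insensitive)."""
    return [line.strip() for line in lines if "hal" in line.lower()]


def classify_hal(lines):
    """Classify the HAL from setup.log-style lines.

    halmacpi.dll -> multiprocessor, halaacpi.dll -> uniprocessor.
    Anything else is reported as 'unknown'.
    """
    text = " ".join(find_hal_entries(lines)).lower()
    if "halmacpi.dll" in text:
        return "multiprocessor"
    if "halaacpi.dll" in text:
        return "uniprocessor"
    return "unknown"


# Hypothetical sample lines, NOT copied from a real setup.log
sample_mp = ['hal.dll = "halmacpi.dll"', 'ntoskrnl.exe = "ntkrnlmp.exe"']
sample_up = ['hal.dll = "halaacpi.dll"']

print(classify_hal(sample_mp))
print(classify_hal(sample_up))

# On the guest itself you would read the real file instead, e.g.:
# with open(r"c:\windows\repair\setup.log", errors="replace") as f:
#     print(classify_hal(f))
```

An "unknown" result just means neither name was found, so fall back to the Device Manager check in that case.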

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
kjb007
Immortal

The pagefile is created/processed at boot time; if you have the clear-pagefile-on-exit policy set, it will be cleared out at shutdown. I have very rarely seen a scenario where a pagefile that large is beneficial. From what I remember, a pagefile over 4 GB is actually bad, since a core dump will sometimes not process correctly in a pagefile over 4 GB. Try lowering the size of your pagefile, though I'm not sure that will help here. I'm not sure how long .vswp file creation would take, but ESX takes hardly any time to create a disk.

Still, if it helped speed up your boot, all the better, but now you have a reservation to deal with as well. I would check out the pagefile a little more.

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
ncarde
Enthusiast

With you 100% on the dubiousness of large pagefiles being beneficial -- unfortunately SAP doesn't feel that way. :)

You alluded to "....now you have a reservation to deal with..." -- do you see reservations as a negative? For our SAP scenario we have a limited # of VMs per physical host, so I'm not seeing reservations as an issue (especially since they cut down on 12 GB .vswp files consuming local ESX VMFS disk space for MSCS configs), but perhaps I'm missing something...

TomHowarth
Leadership

What VMware ESX version are you running? ESX 3.5 had an issue with 4-vCPU guests running slowly; there was a patch that solved this. I cannot find it right now, but Update 1 should fix it. However, a 4-vCPU guest with a large amount of memory will take significantly longer to boot than a single-vCPU guest with less memory; this is expected behaviour. Windows does a memory check of all its available memory on boot, writing to each sector to verify.

Tom Howarth

VMware Communities User Moderator

Tom Howarth VCP / VCAP / vExpert
Blog: http://www.planetvm.net
Contributing author on VMware vSphere and Virtual Infrastructure Security: Securing ESX and the Virtual Environment
Contributing author on VCP VMware Certified Professional on VSphere 4 Study Guide: Exam VCP-410
kjb007
Immortal

Yes, I try not to use reservations unless I absolutely have to. They add more overhead to the DRS calculations, and are usually forgotten about until problems arise and questions get posted on this board. Not that I have an issue with answering questions. It's good to reduce the size of the .vswp file, but that file isn't really used unless the guests are starved for memory and have to swap to disk. If you have disk issues, this is good, but if changing the default isn't helping you, then I don't suggest changing it. It helped marginally here, so it's good, and if the app requires it, it's good, but that isn't so here.

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
ncarde
Enthusiast

DRS = Not supported for MSCS config so a non-issue w/reservations & adding complexity but point taken, thank you.

I would love to figure out exactly what is causing these long reboot times though.

It doesn't appear that the vswp is getting created each time the Guest OS (Windows 2K3 Enterprise, 64-bit) is rebooted so I am definitely suspecting something within Windows.

The original poster is seeing long reboot times with 4 vCPU's -- in my case it is similar although we have 2 vCPU's (running on ESX 3.5 Update 1)

kjb007
Immortal

Maybe you can post the vmware.log after a reboot, so we can see if something shows up in the log? Hopefully we can come to a resolution, or at least an understanding.

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
ncarde
Enthusiast

Below is a vmware.log file.

VM was shut down at 6:03 PM

VM came back online at 6:17 PM

This gap in time is represented by these two lines in the vmware.log file:

May 06 18:03:41.096: vcpu-0| SVGA: Registering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)

May 06 18:17:33.227: vcpu-0| SVGA: Unregistering IOSpace at 0x1060

vmware.log contents during reboot:

May 06 18:02:50.221: vcpu-1| Guest: toolbox: Got a logoff event.

May 06 18:02:50.247: vcpu-1| GuestRpc: Channel 1 reinitialized.

May 06 18:02:51.353: vcpu-1| Guest: toolbox: Got a logoff event.

May 06 18:02:52.802: vcpu-1| Guest: toolbox: VMware Tools Service Shutdown.

May 06 18:02:52.803: vcpu-1| Guest: toolbox: VMware Tools Service Stopping.

May 06 18:02:52.838: vcpu-0| TOOLS autoupgrade protocol version 0

May 06 18:02:52.840: vcpu-0| TOOLS ToolsCapabilityGuestTempDirectory received 0

May 06 18:02:52.841: vcpu-0| GuestRpc: Channel 0 reinitialized.

May 06 18:02:52.845: vcpu-1| Guest: toolbox: Service: waiting for GuestInfoServer thread.

May 06 18:02:52.845: vcpu-0| Guest: toolbox: GuestInfoServer received quit event.

May 06 18:02:52.846: vcpu-0| Guest: toolbox: GuestInfoServer exiting.

May 06 18:02:52.846: vcpu-1| Guest: toolbox: Service: GuestInfoServer thread exited.

May 06 18:03:17.933: vcpu-0| VMMouse: CMD Disable

May 06 18:03:17.933: vcpu-0| VMMouse: Disabling VMMouse mode

May 06 18:03:17.933: vcpu-0| MKS switching absolute mouse on

May 06 18:03:17.953: vcpu-1| CPU reset: soft

May 06 18:03:17.953: vcpu-0| CPU reset: soft

May 06 18:03:18.098: mks| VNCENCODE 6 encoding mode change: (640x480x16depth,16bpp)

May 06 18:03:18.103: mks| VNCENCODE 7 encoding mode change: (640x480x16depth,16bpp)

May 06 18:03:18.120: vcpu-1| CPU reset: soft

May 06 18:03:18.131: vcpu-0| SVGA: Unregistering IOSpace at 0x1060

May 06 18:03:18.131: vcpu-0| SVGA: Unregistering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)

May 06 18:03:18.266: vcpu-0| SVGA: Registering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)

May 06 18:03:18.287: vcpu-0| SVGA: Unregistering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)

May 06 18:03:18.607: vcpu-0| SVGA: Registering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)

May 06 18:03:18.634: vcpu-0| SVGA: Unregistering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)

May 06 18:03:18.637: vcpu-0| SVGA: Registering IOSpace at 0x1060

May 06 18:03:18.637: vcpu-0| SVGA: Registering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)

May 06 18:03:18.664: vcpu-1| CPU reset: soft

May 06 18:03:18.688: mks| VNCENCODE 7 encoding mode change: (720x400x16depth,16bpp)

May 06 18:03:18.688: mks| VNCENCODE 6 encoding mode change: (720x400x16depth,16bpp)

May 06 18:03:18.809: mks| VNCENCODE 6 encoding mode change: (640x480x16depth,16bpp)

May 06 18:03:18.818: mks| VNCENCODE 7 encoding mode change: (640x480x16depth,16bpp)

May 06 18:03:18.821: vcpu-0| SIO: Skipping bogus enable for COM1

May 06 18:03:18.822: vcpu-0| SIO: Skipping bogus enable for COM2

May 06 18:03:18.887: vcpu-0| DISKUTIL: scsi0:0 : geometry=5221/255/63

May 06 18:03:18.917: vcpu-0| DISKUTIL: scsi1:0 : geometry=525/255/63

May 06 18:03:18.917: vcpu-0| DISKUTIL: scsi1:1 : geometry=7314/255/63

May 06 18:03:19.591: vcpu-1| CPU reset: soft

May 06 18:03:19.609: vcpu-0| BIOS-UUID is 50 04 09 e1 33 cf 5c 85-6b 21 7c b9 f6 c2 0e b7

May 06 18:03:20.024: vcpu-0| DISKUTIL: scsi1:1 : toolsVersion = 7300

May 06 18:03:20.024: vcpu-0| DISKUTIL: scsi1:0 : toolsVersion = 7300

May 06 18:03:20.024: vcpu-0| DISKUTIL: scsi0:0 : toolsVersion = 7300

May 06 18:03:20.024: vcpu-0| DISKUTIL: scsi1:1 : toolsVersion = 7300

May 06 18:03:20.024: vcpu-0| DISKUTIL: scsi1:0 : toolsVersion = 7300

May 06 18:03:20.024: vcpu-0| DISKUTIL: scsi0:0 : toolsVersion = 7300

May 06 18:03:20.030: mks| VNCENCODE 7 encoding mode change: (720x400x16depth,16bpp)

May 06 18:03:20.030: mks| VNCENCODE 6 encoding mode change: (720x400x16depth,16bpp)

May 06 18:03:20.303: vcpu-0| Unknown int 10h func 0x2000

May 06 18:03:38.792: mks| VNCENCODE 7 encoding mode change: (640x480x16depth,16bpp)

May 06 18:03:38.792: mks| VNCENCODE 6 encoding mode change: (640x480x16depth,16bpp)

May 06 18:03:39.164: vcpu-1| CPU reset: soft

May 06 18:03:41.090: vcpu-0| SVGA: Unregistering IOSpace at 0x1060

May 06 18:03:41.090: vcpu-0| SVGA: Unregistering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)

May 06 18:03:41.095: vcpu-0| SVGA: Registering IOSpace at 0x1060

May 06 18:03:41.096: vcpu-0| SVGA: Registering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)

May 06 18:17:33.227: vcpu-0| SVGA: Unregistering IOSpace at 0x1060

May 06 18:17:33.228: vcpu-0| SVGA: Unregistering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)

May 06 18:17:33.230: vcpu-0| SVGA: Registering IOSpace at 0x1060

May 06 18:17:33.231: vcpu-0| SVGA: Registering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)

May 06 18:17:36.401: mks| VNCENCODE 7 encoding mode change: (800x600x16depth,16bpp)

May 06 18:17:36.401: mks| VNCENCODE 6 encoding mode change: (800x600x16depth,16bpp)

May 06 18:17:38.273: vcpu-0| Balloon: Start: vmmemctl reset balloon

May 06 18:17:38.273: vcpu-0| Balloon: Reset (n=2 pages=0)

May 06 18:17:38.273: vcpu-0| Balloon: Reset: nUnlocked=0 (size=0)

May 06 18:17:41.952: mks| MKS remote display status changed, enabling remote optimizations

May 06 18:17:42.778: vcpu-0| GuestRpc: Channel 0, registration number 1, guest application toolbox.

May 06 18:17:42.779: vcpu-0| DISKUTIL: scsi1:1 : toolsVersion = 7300

May 06 18:17:42.779: vcpu-0| DISKUTIL: scsi1:0 : toolsVersion = 7300

May 06 18:17:42.779: vcpu-0| DISKUTIL: scsi0:0 : toolsVersion = 7300

May 06 18:17:42.784: vcpu-0| TOOLS autoupgrade protocol version 2

May 06 18:17:42.800: vcpu-0| TOOLS ToolsCapabilityGuestTempDirectory received 1 C:\WINNT\TEMP

May 06 18:17:42.800: vcpu-0| TOOLS setting the tools version to '7300'

May 06 18:17:42.848: vcpu-0| TOOLS soft reset detected.

May 06 18:17:42.848: vcpu-0| DISKUTIL: scsi1:1 : toolsVersion = 7300

May 06 18:17:42.848: vcpu-0| DISKUTIL: scsi1:0 : toolsVersion = 7300

May 06 18:17:42.848: vcpu-0| DISKUTIL: scsi0:0 : toolsVersion = 7300

May 06 18:17:42.848: vcpu-0| TOOLS installed version 7300, available version 7300

May 06 18:17:42.848: vcpu-0| TOOLS don't need to be upgraded.

May 06 18:17:42.942: vcpu-0| Guest: toolbox: Version: build-82663

May 06 18:17:42.943: vcpu-0| TOOLS unified loop capability requested by 'toolbox'; now sending options via TCLO

May 06 18:19:07.004: vcpu-0| VMMouse: CMD Read ID

May 06 18:19:07.004: vcpu-0| MKS switching absolute mouse on

May 06 18:19:16.329: vcpu-0| TOOLS unified loop capability requested by 'toolbox-dnd'; now sending options via TCLO

May 06 18:19:16.329: vcpu-0| GuestRpc: Channel 1, registration number 1, guest application toolbox-dnd.

May 06 18:19:16.329: vcpu-0| DISKUTIL: scsi1:1 : toolsVersion = 7300

May 06 18:19:16.329: vcpu-0| DISKUTIL: scsi1:0 : toolsVersion = 7300

May 06 18:19:16.329: vcpu-0| DISKUTIL: scsi0:0 : toolsVersion = 7300

May 06 18:50:51.004: vcpu-1| TOOLS unified loop capability requested by 'toolbox-ui'; now sending options via TCLO

May 06 18:50:51.004: vcpu-1| GuestRpc: Channel 3, registration number 1, guest application toolbox-ui.

May 06 18:50:51.004: vcpu-1| DISKUTIL: scsi1:1 : toolsVersion = 7300

May 06 18:50:51.004: vcpu-1| DISKUTIL: scsi1:0 : toolsVersion = 7300

May 06 18:50:51.004: vcpu-1| DISKUTIL: scsi0:0 : toolsVersion = 7300

May 06 18:50:51.011: vcpu-1| TOOLS unified loop capability requested by 'toolbox-ui'; now sending options via TCLO

May 06 18:50:51.012: vcpu-1| GuestRpc: Channel 2, conflict: guest application toolbox-ui tried to register, but it is still registered on channel 3

May 06 18:50:51.012: vcpu-1| GuestRpc: Channel 2 reinitialized.

May 06 18:50:51.012: vcpu-1| GuestRpc: Channel 2 reinitialized.

May 06 18:50:54.419: vcpu-1| GuestRpc: Channel 3 reinitialized.

May 06 19:29:01.900: mks| SOCKET 7 recv error 110: Connection timed out

May 06 19:29:01.900: mks| SOCKET 7 destroying VNC backend on socket error: 110

May 06 19:29:02.208: mks| SOCKET 6 recv error 110: Connection timed out

May 06 19:29:02.208: mks| SOCKET 6 destroying VNC backend on socket error: 110

May 06 19:54:09.792: vcpu-1| Guest: toolbox: Got a logoff event.

May 06 19:54:09.821: vcpu-1| GuestRpc: Channel 1 reinitialized.

May 06 19:54:10.858: vcpu-0| Guest: toolbox: Got a logoff event.

May 06 19:54:13.296: mks| SOCKET 8 recv error 5: Input/output error
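
For anyone who wants to find this kind of gap in their own vmware.log without reading every line, here is a rough sketch that lists every silence longer than a threshold. It assumes each line starts with a timestamp in the same "May 06 18:03:41.096:" form as the log above; the function names and the fixed year are assumptions, since the log itself carries no year.

```python
import re
from datetime import datetime

# Matches the leading timestamp of lines like
# "May 06 18:03:41.096: vcpu-0| SVGA: Registering IOSpace at 0x1060"
STAMP = re.compile(r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}\.\d{3}): ")
ASSUMED_YEAR = 2000  # assumption: the log has no year, so any fixed year will do


def parse_stamp(line):
    """Return the line's timestamp, or None if the line has no leading timestamp."""
    match = STAMP.match(line)
    if match is None:
        return None
    return datetime.strptime(f"{ASSUMED_YEAR} {match.group(1)}", "%Y %b %d %H:%M:%S.%f")


def gaps_longer_than(lines, min_seconds):
    """Return (seconds, line_before, line_after) for each gap above min_seconds."""
    gaps = []
    prev_time, prev_line = None, None
    for raw in lines:
        line = raw.strip()
        stamp = parse_stamp(line)
        if stamp is None:
            continue  # skip blank or non-timestamped lines
        if prev_time is not None:
            seconds = (stamp - prev_time).total_seconds()
            if seconds > min_seconds:
                gaps.append((seconds, prev_line, line))
        prev_time, prev_line = stamp, line
    return gaps


# Three consecutive lines taken from the log posted above
sample = [
    "May 06 18:03:41.095: vcpu-0| SVGA: Registering IOSpace at 0x1060",
    "May 06 18:03:41.096: vcpu-0| SVGA: Registering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)",
    "May 06 18:17:33.227: vcpu-0| SVGA: Unregistering IOSpace at 0x1060",
]

for seconds, before, after in gaps_longer_than(sample, 60):
    print(f"{seconds:.3f} s of silence")
    print("  before:", before)
    print("  after: ", after)
```

Run against a whole log, this also reports ordinary idle periods after the guest is up, so the gap of interest is the one that follows the "CPU reset: soft" lines.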

ncarde
Enthusiast

FYI

I have opened an SR for this and will let everyone know what we hear back...
