VMware Cloud Community
wapiti10
Enthusiast

Windows Server 2003 R2 64-bit - 16 GB RAM - 4 vCPUs - Extremely Long Boot Times

ESX 3.5 Host with 16 Cores and 64 GB RAM

Guest in question: Windows Server 2003 R2 64-bit, 16 GB RAM, 4 vCPUs

Started with 1 GB RAM, and the server booted fine. I configured my server with updates and shut down.

I added 15 GB of RAM and the server began taking up to 30 minutes to boot all the way up, hanging on the Windows splash screen with the scrolling graphic for the majority of that time.

Since then, here is what I have done (checking each time that the server sees the full amount of RAM and the correct number of CPUs; it does):

-scoured the community

-checked for services that didn't start; nothing glaring

-checked the event log (the event log service didn't start until Windows finally came up)

-checked limits and reservations = no reservations, and unlimited is checked

-no ballooning or swapping taking place on host

-shut down the server

-removed 14 GB of RAM from the guest (2 total now) = boot in 2 minutes

-shut down, added 2 GB of RAM (4 total now) = boot in 2 minutes

-shut down, added 4 GB of RAM (8 total now) = boot in 2 minutes

-shut down, added 4 GB of RAM (12 total now) = boot in 2 minutes

-shut down, added 4 GB of RAM (back to 16 now) = boot in 2 minutes

-let the server stand, selected "restart" from the shutdown menu = 30 minutes to come back to Windows

-doubled the page file to 10 GB (I know that ideally I want my page file to equal the amount of memory in the OS, but my OS partition is too small. COULD THIS BE MY PROBLEM? Though it wouldn't explain why it booted in 2 minutes with 16 GB of RAM when I stepped the server up...)

-selected "restart" from the shutdown menu = 30 minutes to come back to windows.

OK, so you see my issue. I am looking for some help, and I have some additional questions:

1. Could it be the page file?

2. Could I have a bad pair of memory modules in my host? Is there a log or a memory test in ESX that I could look at to find out?

3. Any other suggestions?

thanks,

Dallas

58 Replies
RolandK
Contributor

Robert

The VMware KB article is 1006016.

Kind regards

Roland

robertl30
Contributor

Roland-

Thank you. That is the article I'd found also. The article states that you need to lie to VMware and tell it you are using Enterprise Edition when you are in fact using Standard Edition. This does fix the glitch where you get the "This memory setting is larger than VMwares recommendation for this guest OS" warning, but it does not solve the slow reboot and performance issues.

I wonder if we actually need to install Enterprise Edition of Windows 2003.

Where you got this to work correctly, are you using Windows 2003 Enterprise or Standard edition as the OS? I'm differentiating between what you tell VMware you are installing and what you are actually installing (as these do not need to be the same). Thanks.

RolandK
Contributor

Robert

We have installed Windows 2003 R2 64-bit SP2 Standard Edition as the OS. The VM is configured as Windows 2003 Enterprise Edition 64-bit: 4 vCPUs, 16 GB memory, one 20 GB vDisk and one 10 GB vDisk, LSI controller. It takes 5 minutes to reboot.

Roland

robertl30
Contributor

@Roland

Seems our VMs are configured similarly (I have a lot more vDisks configured -- but I've tried deleting them and that doesn't fix the slowness issue). Going back to your original message, I see your hosts are AMD. That could be a significant difference. I'm on Intel. I wonder if anyone experiencing the long reboot time/slow IO issue is running on AMD?

I heard back from VMware. Their advice: move the VM to another host and try again. I have 3 hosts in this cluster but they're all identical hardware. I've already tried it on all 3 and there was no difference. I do have some AMD machines in another datacenter. I'll try testing on that environment today to see if there's any difference.

I really appreciate the feedback on other environments. Very helpful.

APlatt
Contributor

Per my post above, I am running an AMD environment, and with 4 in the same cluster, moving them between the physical hosts had no influence. Only reducing the memory and lowering to 2 procs seems to affect the boot time.

rrgavin
Contributor

Follow up from VMware support....

Per our VMware TAM, this was going to be fixed in 3.5 U4; now they are saying this WILL NOT be fixed until VI4. We have left our ticket open with VMware and asked that it be escalated. Our BCS support is telling us that many people have opened cases for this.

Thx

APlatt
Contributor

@rrgavin, wait, you're saying this will not be resolved until the next upgrade of ESX... version 4.0?

That's a pretty long wait for a fix, in my opinion.

jesse_gardner
Enthusiast

I'm guessing from the hype that VI4 is going to be announced in the webcast in 30 minutes! :)

But I'm also not satisfied with that answer.

robertl30
Contributor

Well, looks like VI4 is being announced in about 30 minutes. But I'm not going first!

APlatt
Contributor

Sooooo... is vSphere 4 ESX 4? Is this what is supposed to make all our troubles go away?

I'm sooo not impressed.

sigh

robertl30
Contributor

I'm getting so confused. I just built a 4 vCPU 16GB W2K3R2x64 Std SP2 system on AMD hosts. It works great. Then, to make sure I can really reproduce this issue, I built another identical VM on the original Intel host giving me the trouble. Of course: no issue now. So, apparently it's something about this particular VM that causes the long reboot. But who knows what.

I'm going to play with cloning it and trying to move the VMDKs around to see if I can determine where the problem lies. Or, failing that, I may just abandon the troublesome VM and reinstall my app on this new working VM. Crazy.

@APlatt. Yes, vSphere is the new name for VI. It's their new "OS for the Cloud" paradigm. Virtual Center is gone; it's now vCenter, etc.

robertl30
Contributor

So I finally got this issue, well, not "fixed", but I have a workaround. Following the guidance in VMware KB 1004901, I disabled Page Sharing by setting the Mem.ShareScanGHz option to 0 on one of my ESX 3.5u3 hosts. I had two VMs exhibiting the long reboot issue on different servers in a 3-node cluster. At this point I can't swear to which one was where. One of the VMs was fixed once I migrated it to the host with Page Sharing off. The other VM continued to show the problem no matter what I did.

I eventually took some drastic action to get this working. On the theory that there was nothing actually wrong with the software (Windows Server) or any of my other 8 vDisks, I deleted the VM (leaving the VMDKs alone). I then created a new VM and attached the original VMDKs to it. Crossed my fingers and powered on the VM. Somewhat to my amazement, the server came up fine and could see the drives. I then clicked Restart the Computer within Windows and the server correctly restarted and I had a login prompt in 2 minutes.

So, the workaround does work. But apparently, in some cases, you really have to fight to get it to take. Here's hoping they fixed this in vSphere. I'm now faced with the quandary of either removing one of my hosts from our production DRS/HA cluster or turning Page Sharing off on all my hosts. Not sure what that will do for our capacity planning yet, but it can't be good.

BTW, I opened a case via HP for VMware support and it's been basically useless. The only advice I got from VMware was to try migrating the machine to a different host and see if it works any better.

A humble suggestion for VMware... since apparently there are many cases of this issue reported, why not, I don't know, fix the problem? It's a bug. Hunt it down and kill it. I lost about 40 hours on this project because of this mess.

robertl30
Contributor

Good news. VMware came back with some good info on how to better work around the issue where we see large VMs (over 4GB RAM) that have very long restart times. At first they had us turn off Page Sharing at the ESX host. But it turns out we can disable this feature on a per-VM basis.

I added the advanced VM setting "sched.mem.pshare.enable" and set it to False (Edit Settings, Options, Advanced, General, Configuration Parameters, Add Row). I then moved the VMs to production hosts that did not have the Page Sharing feature disabled. I restarted each server twice and the performance was normal.
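
For anyone who would rather script the per-VM setting than click through Edit Settings, here is a minimal Python sketch that adds or updates a key in the text of a .vmx file. It assumes the usual key = "value" line format of .vmx files and that the VM is powered off while the file is changed. The function name set_vmx_option and the sample text are made up for illustration; this is not a VMware-supplied tool, and the Configuration Parameters dialog described above remains the route I actually used.

```python
import re


def set_vmx_option(vmx_text, key, value):
    """Return vmx_text with `key = "value"` set, replacing an existing entry if present."""
    new_line = f'{key} = "{value}"'
    # Match an existing assignment for exactly this key (case-insensitive)
    pattern = re.compile(r'^\s*' + re.escape(key) + r'\s*=.*$', re.IGNORECASE)
    lines = vmx_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if pattern.match(line):
            lines[i] = new_line
            replaced = True
    if not replaced:
        # Key not present yet, so append it as a new row
        lines.append(new_line)
    return "\n".join(lines) + "\n"


# Hypothetical .vmx fragment, for illustration only
sample = 'memsize = "16384"\nnumvcpus = "4"\n'
updated = set_vmx_option(sample, "sched.mem.pshare.enable", "FALSE")
print(updated)
```

Working on the file text rather than the file itself keeps the sketch easy to try on a copy first; reading and writing the real .vmx is left to the reader.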

VMware also provided guidance on how to determine the effectiveness of Page Sharing in general. The tool to use is in the ESX CLI and is called "esxtop". Press m to bring up the memory screen and observe the data on the PSHARE row.
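
To make the PSHARE numbers easier to compare across hosts, here is a rough Python sketch that pulls the values out of a copied PSHARE line. The sample line is made up to resemble what the esxtop memory screen shows (shared, common and saving, in MB); the exact layout on your build may differ, so treat the regular expression and the name parse_pshare as assumptions.

```python
import re


def parse_pshare(line):
    """Extract the shared/common/saving values (MB) from a PSHARE line."""
    values = {}
    # Each number is followed by its label, e.g. "2152  shared"
    for number, label in re.findall(r'(\d+)\s+(shared|common|saving)', line):
        values[label] = int(number)
    return values


# Illustrative line only; not captured from a real host
sample = "PSHARE/MB:   2152  shared,    236  common:   1916 saving"
stats = parse_pshare(sample)
print(stats)
```

The "saving" figure is the one to watch: if it is small compared with the memory configured on the host, turning Page Sharing off for a few large VMs should cost little capacity.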

VMware is also stating that they are still looking at engineering a fix for ESX 3.5, but that the problem is already fully resolved in vSphere 4.

timUSMC
Contributor

We just implemented a new environment with 3.5 Update 4 using DL580 G5s. We are experiencing the long boot times as well.

I was irritated to see the only fix was going to 4.0. We disabled the page sharing for now. Has any escalation been useful to try and get them to fix this in 3.5??

robertl30
Contributor

@tim

I tried. Best I could get was vSphere4, turn it off at the ESX server level, or (finally) a method for turning it off on a per-VM basis. (see above in this thread for details). My guess is it's not going to get patched in 3.5.

timUSMC
Contributor

Thanks for the tip and feedback. I just can't stop my implementation project and upgrade to 4.0, since we are on a tight schedule, but we need to solve this problem. All of my VMs in this environment are 4GB so the per-VM option wouldn't do much. This is not good. I feel pressured to move to 4.0 now and I am wondering if this might be VMware's desire. Did anyone do a rush upgrade to 4.0 to fix this and end up with more issues? I just haven't had the time to explore the upgrade path/risks.

Zahni
Contributor

I had some discussions with VMware Support about this issue. This problem can't be fixed in 3.5.

I personally think this has to do with the design of the VMkernel in 3.5. The VMkernel in 3.5 is still 32-bit code. Something goes wrong when scanning memory in 64-bit VMs.

In ESX 4.0, the VMkernel is now native 64-bit. This can't be changed in 3.5 ;)

whinshaw
Enthusiast

I am actually running vSphere and I had this problem happen to me. I created a server from a template with 2 GB of RAM and it booted fine. After I changed the memory to 8 GB, the VM became extremely slow to boot: close to 10 minutes.

Here is what I did to fix the issue (works in vSphere and 3.5):

Since the server wasn't in production yet, I shut it down, set the memory back to 2 GB (the original value), and applied the change in vCenter. Then I moved it back to 8 GB, which is what the application needs. BAM, fixed. It reboots just as fast as the other servers in our production environment. I have seen this bug before in 3.5, and changing the memory always fixes it.

If you find this information useful, please award points for "correct" or "helpful".

Wes Hinshaw

www.myvmland.com

arminmacx
Contributor

Hi Everyone,

I had a server running Windows 2003 with 1 GB of RAM and 2 CPUs with 2 cores each, and I converted it with VMware Converter to a new ESXi 5 Update 1 host.

At first it booted fast and worked fine, but after 2 days we had a problem with our UPS and power, and my server went down. After power came back I started my VM, and it took too long to boot. Once the VM is up, when my users try to log in to my Active Directory, which runs on the VM, it takes some time to log in, and after that some users can see my mapped drives and some cannot.

I tried showing hidden drives so I could delete unwanted ones, but there isn't any drive that is grayed out or unused after the conversion. I also tried removing the HP software, but still nothing.

Sometimes I can log in to the server via RDP and sometimes not. It's the same for the console view from the vSphere Client.

VM system spec:

2 vCPUs with 2 cores

2 GB of RAM

the exact HDD from the server that I converted

1 NIC (E1000)

I have tried everything I can think of to fix it, but no luck.

Please tell me what I should do.

Best Regards

-Armin-

VCA-DCV / VCA-Cloud / VCA-WM / MCSE / MCP / MCTs https://ir.linkedin.com/in/armin-lavaee-73162652