I have a new ESXi 5.1.0 install that worked great for the first couple of days. It is not yet in any kind of production situation, as I am still just testing various configurations to determine how we could best implement virtualization in our production environment. I am new to VMware and am starting out with ESXi in standalone mode for my testing, but I am planning to move toward the full clustered setup to get HA once I get to that stage. At this point, I am trying to learn how VMware even works...
All was well with the 16 GB of memory it had when I started. I then powered it down to add memory, up to the 32 GB limit. I think I may have accidentally forgotten to put it into 'Maintenance Mode' before I told it to shut down (I did not have any VMs powered on at the time), but at any rate, that is when all the problems began.
After upping it to 32 GB and powering back on, it took forever to boot (close to half an hour). When I was finally able to log in with the vSphere Client, every one of my VMs powered on and booted very slowly. For example, in Windows you see the splash screen for half an hour before it even loads the desktop, and even screen drawing is very slow: when the startup menu showing 'Start Windows Normally' appears, it draws line by line.
I cannot fathom what caused this. I even tried removing the newly installed memory so I was back down to 16 GB, and that did not resolve the issue. I have NOT implemented my iSCSI yet, so all those VMware KBs talking about slow boot with iSCSI don't seem to apply. In my initial testing, I am just trying to get it working on local storage. Once all is well, I will connect my iSCSI SAN, and eventually the datastores will be stored and run from there.
I am also seeing these messages in the logs, at a very high rate:
4972: no FS driver claimed device 'control' : Not Supported
692: Couldn't read volume header from control : Not Supported
692: Couldn't read volume header from control : Not Supported
4972: no FS driver claimed device 'mpx.vmhba32:C0:T0:L0' : Not Supported
4972: no FS driver claimed device 'mpx.vmhba33:C0:T0:L0' : Not Supported
4972: no FS driver claimed device 'mpx.vmhba34:C0:T0:L0' : Not Supported
4972: no FS driver claimed device 'mpx.vmhba35:C0:T0:L0' : Not Supported
4972: no FS driver claimed device 'mpx.vmhba0:C0:T0:L0' : Not Supported
FYI, vmhba32, vmhba33, vmhba34, and vmhba35 are virtual USB adapters presented by my Intel RMM (remote management module), which does not give the user any way to remove those devices, and vmhba0 is a physical IDE CD-ROM.
I have tried physically removing all of these devices (the RMM and the IDE CD-ROM drive) for testing, and this does not help the slow ESXi boot, nor does it help the slow VMs. It does remove some of the error messages from the ESXi log, but not the "from 'control' : Not Supported" error shown above.
The server I am using for testing is an SR1560SF (dual quad-core Xeon at 3 GHz, with 32 GB of memory), which had previously been listed as a VMware recipe on ESX 4.x, and as I said, I seemed to have it working just fine on 5.1 for the first couple of days with very nice results.
I have also blown away and reformatted the datastore disk that had my test VMs on it and re-copied them all over again, and they still run slowly. I then reinstalled ESXi 5.1.0 fresh (I did not reformat the main hard disk but did select a clean install), and to my surprise this behavior did not change at all. ESXi still boots slowly, and the VMs operate painfully slowly.
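To see which devices account for these messages, the log lines can be tallied per device. This is only a minimal sketch: the sample text is the excerpt pasted above, and the function name `count_not_supported` and both regular expressions are my own assumptions about the message layout, not anything provided by ESXi.

```python
import re
from collections import Counter

# Sample lines copied from the log excerpt in this post; a real run would
# read lines from a copy of the host's log file instead.
SAMPLE_LOG = """\
4972: no FS driver claimed device 'control' : Not Supported
692: Couldn't read volume header from control : Not Supported
692: Couldn't read volume header from control : Not Supported
4972: no FS driver claimed device 'mpx.vmhba32:C0:T0:L0' : Not Supported
4972: no FS driver claimed device 'mpx.vmhba33:C0:T0:L0' : Not Supported
4972: no FS driver claimed device 'mpx.vmhba34:C0:T0:L0' : Not Supported
4972: no FS driver claimed device 'mpx.vmhba35:C0:T0:L0' : Not Supported
4972: no FS driver claimed device 'mpx.vmhba0:C0:T0:L0' : Not Supported
"""

# Assumed message layouts, based only on the lines shown above.
CLAIM_RE = re.compile(r"no FS driver claimed device '([^']+)'")
HEADER_RE = re.compile(r"Couldn't read volume header from (\S+)")


def count_not_supported(lines):
    """Count 'Not Supported' messages per device name."""
    counts = Counter()
    for line in lines:
        if "Not Supported" not in line:
            continue
        match = CLAIM_RE.search(line) or HEADER_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    tally = count_not_supported(SAMPLE_LOG.splitlines())
    for device, n in tally.most_common():
        print(f"{n:3d}  {device}")
```

Run against a longer log, a tally like this would show whether 'control' keeps appearing after the RMM and CD-ROM are removed.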
So my questions are as follows:
-My installed memory shows 32761.96 in the vSphere Client and the host does boot and run, so it would be safe to say this issue is not related to possibly going slightly over 32 GB of memory, correct? And is there anything in ESXi that a memory size change can mess up?
-The above errors show 'Not Supported', so is there any way I can find the appropriate drivers to slipstream into the installation media, or does ESXi have a 'Device Manager' where I can disable those devices so they are not seen or used by ESXi? I do see all of them listed under 'Storage Adapters' in the vSphere Client...
-Do I really even need to be concerned about those 'Not Supported' messages in the first place? Those error messages, along with ones related to those vmhbas showing nmp_ThrottleLogForDevice:2319, are polluting the logs at high frequency...
-Is there a way to troubleshoot and fix this, or should I just completely format the drive where ESXi is installed and start over? I see there are several partitions on that disk after the initial ESXi install. Will blowing all of that away make it forget everything, even though doing a fresh install without formatting didn't seem to fix this?
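On the first question, the arithmetic can be checked quickly. This sketch assumes the figure shown in the vSphere Client is in MB with 1 GB = 1024 MB, and takes the 32 GB limit from this post. The function name `headroom_mb` is just for illustration.

```python
MB_PER_GB = 1024  # assumption: the client reports binary megabytes


def headroom_mb(reported_mb, limit_gb=32):
    """Return how many MB the reported memory sits below the limit.

    A negative result would mean the host reports more than the limit.
    """
    return limit_gb * MB_PER_GB - reported_mb


if __name__ == "__main__":
    reported = 32761.96  # value shown in the vSphere Client
    margin = headroom_mb(reported)
    status = "under" if margin >= 0 else "over"
    print(f"{reported} MB is {abs(margin):.2f} MB {status} the 32 GB limit")
```

Under those assumptions, the reported value sits a few MB below 32 GB rather than above it.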
I'm really excited about using VMware to virtualize all of our equipment, and I was very excited when I first set this all up with a few test VMs. But after upgrading the memory, possibly without entering 'Maintenance Mode' properly before shutdown, I am a bit concerned about why this environment is so FRAGILE. I am hesitant to deploy anything like this in production, because it obviously could not function as it sits right now, and if this had happened in production, the results would have been catastrophic.
Four days into testing, I still don't know for sure what caused this, and I am still not able to recover from it. I don't know if the corruption is in the installation where ESXi sits (on one of those 5 partitions), on a mounted datastore, in a VM, or somewhere else. Even if I run just a single VM at a time, chosen at random, they all boot so slowly that you can actually see the lines drawing from left to right on their consoles. On day one, before this problem, I was pretty impressed with how well the VMs were operating.
Since this is a test environment, there is really no load on the ESXi host. It has 8 CPU cores clocked at 3 GHz and registers all 32 GB of RAM, so why does it act like it is running on a slow platform, affecting both the ESXi host and all VMs?
Anybody have any insight on how I can troubleshoot this? Any help or knowledge for a newbie would be greatly appreciated.
If you have reinstalled it already, then it sounds like a hardware problem to me. How new is the server you are testing on? You say that it's not on the HCL. Granted, that does not mean it will not work, but you won't be able to use that hardware platform in production if you need any kind of support from VMware or external companies; they'll just say it's not supported. If this really is a test box, can you blow away ESXi, install Windows directly on it, and see if you get any errors or performance issues? That might be a good way to see whether it's hardware or not.
Thanks for the insight. This is a brand new server too by the way...
So I found the cause of all this, but it is still not totally clear why I am seeing this behavior.
For some reason, this hardware platform with ESXi does not like the 2GB DIMMs.
I originally had 16 x 1GB DIMMs when all was working fine. I switched them all to 16 x 2GB, and the system BIOS and ESXi both registered 32GB just fine, but that's when the problems started.
As a previous test, I took out half those new DIMMs so I had 2GB x8 installed (16GB again) but problems still persisted.
So, as per Dales123's suggestion here, I also tried installing a Windows OS, and it worked fine using the full 32GB installed. This is what puzzled me most. No ECC errors or anything.
So as a last ditch effort, last night, I took out all of the memory and put the 1GB sticks back in (16GB total) and problems are resolved.
Also note that the 1GB and 2GB sticks are all the same brand (Hynix) and type (PC2-5300F), etc. This motherboard is supposed to support up to 128GB of memory, and the free ESXi is supposed to support up to 32GB, so I am still not sure why it did not work properly. For the moment I will continue my testing with 16GB of memory, but I may try some other brand or type of memory to rule out possible bad memory (still not sure why Windows worked fine with that same 32GB, though).
At any rate, on with my testing...
Thanks again for your suggestions..