We had a fine working ESX 4.0 U1 farm with 4 servers (IBM HS21XM, SAN boot and datastores on an IBM DS4700). The farm runs in EVC mode "Intel Xeon Core 2". Now I installed the next server into our farm (IBM HS22V, 2x X5670 CPU, 48 GB RAM). At the same time I upgraded the farm to ESX 4.0 U2.
Since then I have problems starting VMs on our new server. One VM (XP, 1 vCPU, 2 GB RAM) needs between 3-5 minutes to start. The server has no workload; it is the first VM started on it. The same VM needs 1 minute on an old server under heavy workload. When I move the VM with vMotion after power-on from the new server to any older server, the start speeds up.
Does anybody have an idea?
Run esxtop on the host where you start the VM and look for high CPU ready times, disk latencies for storage, or memory limits. Or check these values on the Performance tab in vCenter - I prefer esxtop; it samples 4x as often by default and you can adjust the sample rate. Check the VM's properties in the vSphere Client as well as its .vmx file to make sure you do not have any limits set for this VM.
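If you want a capture you can look at afterwards instead of watching live, esxtop has a batch mode. A minimal sketch, assuming a service-console shell on the host; the output path is just an example, and the guard only exists so the snippet does nothing harmful elsewhere:

```shell
# Hedged sketch: capture 30 esxtop samples at a 2-second interval while the
# VM boots, for later review. -b = batch mode, -d = delay, -n = iterations.
# The guard lets this run harmlessly off-host; on the ESX host, drop it.
if command -v esxtop >/dev/null 2>&1; then
    esxtop -b -d 2 -n 30 > /tmp/esxtop-boot.csv
    echo "captured to /tmp/esxtop-boot.csv"
else
    echo "esxtop not found - run this on the ESX service console"
fi
```

The CSV can then be opened in a spreadsheet or replayed to look at %RDY and device latency columns for the boot window.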
Are these hosts all part of the same cluster? Do you use resource pools?
Have you checked the logs on your host (/var/log/vmkernel, /var/log/vmkwarning, as well as the vmware.log in the VM's directory)?
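A quick way to scan those logs for storage trouble is a simple grep. The snippet below is illustrative only: it writes one sample vmkernel-style line to a temp file and greps it, exactly as you would grep /var/log/vmkernel* on the host (the sample line and patterns are assumptions, not your actual log):

```shell
# Illustrative only: reproduce a vmkernel-style line in a temp file, then
# grep it for storage errors the same way you would on the host, e.g.:
#   grep -E 'NMP|SCSI|failed' /var/log/vmkernel* /var/log/vmkwarning*
log=$(mktemp)
cat > "$log" <<'EOF'
Jun 30 16:12:03 server05 vmkernel: 0:02:25:05.848 cpu14:4110)NMP: Command 0x28 failed on physical path "vmhba1:C0:T1:L11"
EOF
grep -E 'NMP|SCSI|failed' "$log"
rm -f "$log"
```

Repeated "failed on physical path" lines at VM power-on time are a strong hint the delay is on the storage side rather than CPU or memory.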
What about the VM's swap file? If the creation of that is slow, the VM may take a long time to boot as well. Check where it is stored.
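The swap file (.vswp) is created at power-on in the VM's working directory by default, so you can simply list it there. A sketch, assuming a service-console shell; the datastore and VM directory names are placeholders you would replace with your own:

```shell
# Hedged sketch: check the size and location of the VM's .vswp file.
# The path below is an assumption - substitute your datastore and VM name.
vmdir="/vmfs/volumes/datastore1/myvm"
if [ -d "$vmdir" ]; then
    ls -lh "$vmdir"/*.vswp
else
    echo "VM directory not found - adjust vmdir to your datastore path"
fi
```

If the .vswp lands on a slow or contended LUN, power-on can stall while it is created.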
Compare all of this VM's settings to another VM that is not experiencing the problem.
Does this happen with only one VM, a specific group of VMs, or all of them? Does the slow boot vary across different VM types (CPUs, memory, storage location, OS)?
How many disks does this VM have? To power on the VM, the host has to write a lock entry on each LUN the VM has disks on. Normally this takes 1-2 ms, but if there is contention for LUN metadata updates (vMotions, VM power-ons, VM creation, growing snapshots), this could delay the boot as well.
What is the VM hardware version? VMware Tools up to date? Does the HAL match the number of CPUs?
Thank you for your fast answer.
I found the problem :-). In /var/log/vmkernel I found many lines with the following error message:
Jun 30 16:12:03 server05 vmkernel: 0:02:25:05.848 cpu14:4110)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x41000805d040) to NMP device "naa.600a0b8000320a84000010d24b17394e" failed on physical path "vmhba1:C0:T1:L11" H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Then I checked my SAN configuration in VMware and found that this server was using a different path to the LUN. After changing the path, everything works fine.
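For anyone hitting the same symptom: on ESX 4.x you can also inspect the multipathing state from the service console with the `esxcli nmp` namespace. A sketch, assuming that namespace is present on your build (the device ID is the one from the log above; exact output format may vary, and the guard just keeps the snippet harmless off-host):

```shell
# Hedged sketch: show the NMP device and its paths on ESX 4.x.
# The device ID is taken from the vmkernel log line in this thread.
dev="naa.600a0b8000320a84000010d24b17394e"
if command -v esxcli >/dev/null 2>&1; then
    esxcli nmp device list | grep -A 5 "$dev"   # device state and path selection policy
    esxcli nmp path list                         # which physical path each device uses
else
    echo "esxcli not found - run this on the ESX service console"
fi
```

Comparing the active path on the new host against the older hosts would have shown the mismatch without digging through the logs.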