Hi Everyone,
I have a very weird issue on 2x HP DL380 G5 boxes, each running ESX 3.5i U3.
The setup:
DL380 G5
2x Quad core
2x 72GB Raid 1+0
6x 146GB Raid5
32 GB RAM
I installed ESX 3.5i U3 as normal and created VMs. But when I copy data from a network folder to the VM, the CPU spikes and the VM becomes unresponsive. What makes it worse is that if I copy a large file from C: to another folder on C:, the VM is also unresponsive: CPU high and ping replies at about 200 ms. The OS VMDKs are all on the 72GB RAID 1+0 datastore. When I add a data drive from the RAID 5 datastore, the same happens: if I copy from C: to that drive, the VM becomes unresponsive and pings hover at about 200 ms.
Any ideas?
I have no clue why this is happening. I've deployed the same setup elsewhere with a bunch of VMs and had no problems.
I don't think it's network related, since you said you have the same problem copying from c: to c: on the same machine.
It looks like the machine is waiting for I/O to complete. Check whether I/O is stalled somewhere.
Are there any messages in the Windows system log indicating disk problems?
Check your RAID controller. Is it on the HCL?
Is your write cache configured properly?
-Arnim van Lieshout
-
If you find this information useful, please award points for "correct" or "helpful".
VMware Tools installed?
I would try copying one vmdk file in the VI Client datastore browser (with the VM powered off). What disk speed do you see?
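For a rough in-guest number to set against the datastore browser copy, a timed sequential write is one option. This is a minimal sketch, not something anyone in the thread ran: the function name `time_sequential_write` and its parameters are made up for illustration, it assumes Python 3 is available in the guest, and it only measures writes of zero-filled blocks to one directory.

```python
import os
import tempfile
import time


def time_sequential_write(directory, total_mb=64, block_kb=1024):
    """Write total_mb of zeros to a temp file in `directory` and return MB/s.

    Hypothetical helper for illustration; sizes are in binary units (MiB/KiB).
    """
    block = b"\0" * (block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            # Ask the OS to push the data to disk so caching does not hide the cost.
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    # Guard against a zero interval on very small writes.
    return total_mb / max(elapsed, 1e-9)


if __name__ == "__main__":
    rate = time_sequential_write(tempfile.gettempdir())
    print(f"sequential write: {rate:.1f} MB/s")
```

Running it once against a folder on each virtual disk gives numbers that can be compared between the two datastores and between hosts.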
What is your vSwitch configuration? More to the point, does it match your network switch configuration? I'd also check for IP address conflicts (possibly hidden) since it's a new installation...
--Collin C. MacMillan
SOLORI - Solution Oriented, LLC
Hi Collin,
vSwitch config is simple - a management network with a pNIC connected and a VM network with a pNIC connected. What I find strange is that even if I just copy from one folder to another within the VM, the VM becomes unresponsive and the CPU spikes.
I copied 2 GB of files from one datastore to another and it took about 3 minutes.
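To put that copy time into throughput terms, the arithmetic is sketched below. It assumes "2 GB" means 2048 MB and "about 3 minutes" means 180 seconds; the function name is only for illustration.

```python
def throughput_mb_per_s(size_gb, seconds):
    """Average throughput in MB/s, treating 1 GB as 1024 MB."""
    return size_gb * 1024 / seconds


rate = throughput_mb_per_s(2, 3 * 60)
print(f"{rate:.1f} MB/s")  # prints "11.4 MB/s"
```

So the datastore-to-datastore copy averaged roughly 11 MB/s, given those rounded figures.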
Hi g1xx3rb0y,
Welcome to the forums. What VMs (Linux/Windows) are these? And what is the virtual scsi controller type (LSILogic/BusLogic) in the VM?
--sanjana
Hi Sanjana,
The VMs are all Windows 2003 R2 SP2. I've got x64 and x86 VMs running. I'm currently copying 3 GB of data within one of the x64 VMs to another disk - different datastore - and the CPU is at 100%.
Controller - LSI Logic
Are other VMs on this host exhibiting similar behavior? I would reinstall the VMware Tools.
-KjB
VMware vExpert 2009
2 different ESX hosts - same result on all the VMs
Which NIC type are you using? Flexible/Enhanced? Did you install the vmware tools during creation/migration or after the vm was up and running? I would reinstall the tools before troubleshooting something else. Which process is pegging your CPU at 100%?
-KjB
NIC type on the x64 VM - e1000; on the x86 VM - flexible
I've reinstalled the VMware Tools on the VM, with the same result
Processes hogging the CPU: SYSTEM and explorer.exe
The VMs are vanilla - no AV, nothing, just the latest patches.
Just some more troubleshooting- How much cpu/RAM are these VMs configured with? And do they have any reservations set?
Hi Sanjana,
I have no issue with troubleshooting... I am pulling at strings here. I haven't seen this before, and I've deployed a couple of VM solutions.
Anyway,
the VMs are configured as follows:
starting off with a single vCPU
I assigned my x64 server 4096 MB RAM (will probably go up to 6 GB) - Exchange 2007 MBX server
I haven't set any reservations whatsoever - default settings after VM creation
the host is running 4 VMs now - the other 3 VMs are all running 1 vCPU with 512 MB RAM
I'd use the enhanced vmxnet driver instead of the e1000. It's a more tuned driver that may help in overall performance as well.
-KjB
Hi KjB,
I've set the network adapter type to enhanced... still the same problem. In the meantime I have also reset the other ESX server's BIOS to defaults and just enabled VT, as a long shot. I tested the one x86 VM and it looks much better... CPU hovering at about 13%. Unfortunately I cannot do the same for the other host, as the iLO isn't plugged in or working remotely, but I am busy building another VM on the host now to see if the BIOS defaults could have fixed the issue. I will then compare all the settings between the two and see which setting is different... I am holding thumbs now!!
Spoke too soon... seems to be intermittent now.
What's going on in /var/log/messages while this is happening? Sounds like unsupported/uninitialized hardware blocking an I/O process. Try the same with a totally internal vSwitch for comparison (not married to pNIC)...
--Collin C. MacMillan
SOLORI - Solution Oriented, LLC
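One way to go through /var/log/messages after reproducing the stall is to filter a copy of the log for suspicious lines. The sketch below is only an illustration: the keyword list is a guess, not a list of known ESX message strings, and the function names are hypothetical. It assumes the log has been copied off the host to a machine with Python 3.

```python
def find_suspect_lines(lines, keywords=("warning", "error", "timeout", "abort")):
    """Return lines containing any keyword, case-insensitively.

    The default keywords are assumptions chosen for illustration.
    """
    lowered = [k.lower() for k in keywords]
    return [
        line.rstrip("\n")
        for line in lines
        if any(k in line.lower() for k in lowered)
    ]


def scan_log(path, keywords=("warning", "error", "timeout", "abort")):
    """Read a log file copy and return the matching lines."""
    with open(path, errors="replace") as f:
        return find_suspect_lines(f, keywords)


if __name__ == "__main__":
    import sys

    # Usage: python scan_log.py <path-to-copied-log>
    for hit in scan_log(sys.argv[1]):
        print(hit)
```

Comparing the hits from a run during a large copy with a run while the host is idle helps separate messages tied to the stall from routine noise.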
Hi guys,
We've tested the vSwitch theory: we completely removed the vNIC from one of the VMs and only used the console to manage it. The same problem occurs when copying from c: to c:. We have, however, noticed that the vmkernel messages are showing warnings about the CPUs; I have attached a screenshot. The CPUs are 2x Intel Xeon quad-core E5440. I know they are supported, as I've deployed ESX 3.5i on an ML350 G5 which runs these models.
The RAID controller is an LSI SAS 1078-C2 - as far as I could see it is supported. We are going to try a firmware update for the controller.
Hi Everyone,
I "think" the issue has been resolved. We changed everything on one server - CPU, memory, RAID controller, but not the disks - and still had the same issue. We also saw that the RAID controllers on both servers did not have that black battery attached. So we updated the BIOS revision and attached some spare batteries on both servers... we haven't had an issue since. Copies are fine. The CPU does spike every now and then, but not to 100%; it hovers between 13% and 30% on very large copies (5 GB and up), and the VMs aren't unresponsive anymore.
Personally, it doesn't make a lot of sense... I am still monitoring the situation.
Will provide feedback later.