I'm the lucky admin of a small VI3 cluster of 4 nodes based on HP BL25 and DL385. Each has two dual-core AMD Opterons and 16 GB of memory, balanced among the nodes. The BL25s use SW iSCSI and the DL385s use HW iSCSI. They are all connected to 2 EqualLogic arrays (seen as one device). All VMs are Windows, the majority Windows 2003. Each host is running about 15-20 VMs (mostly lightweight workloads).
My problem is that performance and response are sluggish (for lack of a better term). Each host is using about 50% of physical memory, the CPUs are 20-30% loaded, and the NICs and storage are not 100% loaded (based on receive/transmit rates and queues in esxtop). But, for example, a SQL Server VM performs badly (2 databases, 2 GB memory, 1 CPU), and WSS (on another VM) seems to have been swapped out of memory. I've tested storage performance with Iometer on both iSCSI and FC (HP EVA 5000) for the SQL VM and it didn't make any difference.
If anyone has any input on how to increase performance, I'm all ears.
CPU is the most likely cause, since you are using dual-core. SQL also isn't a good candidate for virtualization, so that may not be a good test.
Putting 20 VMs on a dual-core host is overloading it a bit: that's a 5:1 ratio, and dual-core is usually around 2:1 or 3:1. That's probably why the VMs seem sluggish.
If the CPUs are the bottleneck, shouldn't they be closer to 100% load instead of 20-30%?
I know that SQL Server isn't the best candidate for virtualization, but with the AMD quad-core on the horizon with NPT, and VI3.5 with large memory page support, it seems within reach for our size (250 employees and lightly loaded SQL servers). Memory has been the worst enemy so far when it comes to SQL; I've never maxed out a SQL server on CPU.
I agree that 20 VMs per host might be pushing it, but half of those are for our test environment and spend most of their existence idle with 1-3 users. We are currently waiting for AMD Barcelona to expand our VI3 hardware.
Well, the CPUs manage not only the VMs but also the host itself. So maybe the VMs are each only at 20-30% load, but the overall system has to manage 20 VMs, and with only 4 cores that's a LOT of overhead.
They are constantly switching back and forth to give each VM a time slice, and that's where you are seeing the "sluggish" performance. esxtop from the service console may be a good place to start. If you have high %RDY times, that could be an indication that the VMs are waiting to be serviced by the CPU.
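If it helps, here's a sketch of how I'd capture those %RDY numbers for review (run from the ESX service console; the output path is just an example):

```shell
# Run esxtop in batch mode so the counters can be reviewed offline:
#   -b  batch mode (CSV output)
#   -d 5   sample every 5 seconds
#   -n 60  collect 60 samples (~5 minutes)
esxtop -b -d 5 -n 60 > /tmp/esxtop-capture.csv

# Or interactively: start esxtop, press 'c' for the CPU view, and watch
# the %RDY column per VM. As a rule of thumb, sustained %RDY above
# roughly 10% per vCPU suggests the VM is queuing for CPU time.
```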
Sluggishness is definitely a sign of the CPUs dragging because they can't give equal time to every VM; there just isn't enough processing power to go around.
If these are low-priority VMs, I suggest you shut down as many of them as you can, then test whether the remaining VMs are still sluggish. If that clears up the apparent "sluggish" performance, you have your answer.
You say your ESX servers are at about 50% RAM load. With 15 to 20 VMs per ESX host with 16 GB of RAM, your VMs average out at about 400 MB each.
Could it be that your VMs are equipped with too little memory and are doing a lot of swapping within the guest OS?
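Back-of-envelope, using the numbers from the thread (16 GB hosts at ~50% used, up to 20 VMs each):

```python
# Rough average memory per VM, given the host stats posted above.
host_ram_mb = 16 * 1024   # 16 GB per host
used_fraction = 0.5       # ~50% of physical memory in use
vm_count = 20             # upper end of the 15-20 VMs per host

avg_mb_per_vm = host_ram_mb * used_fraction / vm_count
print(round(avg_mb_per_vm))  # -> 410 (MB per VM)
```

And that's before subtracting service console and virtualization overhead from the 8 GB in use, so the actual guest allocations are even smaller, which would fit the in-guest swapping theory.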
I'd go for Memory rather than CPU.
We have 4 x dual core DL385s running around 70 virtual guest servers. But we have around 32GB of memory per host.
If an application is being swapped out of memory it would indicate the amount of memory for the guest is too low.
You can check the page read I/O figures (e.g. the Memory\Pages Input/sec counter in Windows Performance Monitor) to see if pages are being read back into memory from disk.
We use an HP FC based SAN. We had some issues with SAN performance. A number of issues were resolved when we upgraded the EVA8000 to XCS 6.110.
Sorry, I awarded the wrong guy with a "correct answer" credit (stupid of me, checking stuff like this after yesterday). I'm going to check the paging on the 7th and return with my findings (that's when I'm also going to upgrade all hosts to 3.5).