VMware Cloud Community
tarpit
Contributor

Bad Performance with ESX 3.5 on HP ProLiant DL380 G5 connected via Fibre Channel to HP MSA 2012fc SAN

We have recently set up a complete VMware ESX 3.5 VI environment.

Overall performance has been bad. Clone and migration operations slow VMs to the point of unusability.

Hardware Configuration:

HP MSA 2012fc, single controller, connected directly to the ESX server.

5x 1 TB volumes created on a single RAID 6 array.

Most of the VMs run from two of the volumes.

VMware ESX 3.5 Update 2 on:

HP ProLiant DL380 G5, 2x quad-core Xeon E5440, 32 GB of RAM.

FC Card:

Emulex LPe1150 4 Gb

All the VMs are stored on the HP MSA 2000 SAN.

esxtop reports less than 20 MB/s read/written. Disk latency is over 1000 milliseconds when running a copy operation with a few VMs running.

CPU and RAM seem to be freely available.
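
In case it helps, here is how I have been capturing the numbers from the service console (a quick batch-mode run; the 5-second/60-sample window is just what I happened to pick):

# esxtop batch mode: 5-second samples, 60 iterations, dumped to CSV
esxtop -b -d 5 -n 60 > /tmp/esxtop_disk.csv

In interactive esxtop, the disk screen's DAVG/cmd column is latency at the device and KAVG/cmd is time spent queued in the VMkernel; as I understand it, sustained DAVG in the hundreds of milliseconds points at the array rather than the host.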

Should I increase the maximum queue depth on the FC HBA?

The topology is set to Loop on the SAN; should this be set to Point-to-Point?
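
For reference, this is roughly the change I have in mind for the queue depth, pieced together from the VMware docs (I am assuming the ESX 3.5 Emulex module is named lpfc_740 and using 64 purely as an example value; please correct me if the module name is wrong):

# set the per-LUN queue depth on the Emulex HBA (module name assumed; verify with vmkload_mod -l)
esxcfg-module -s "lpfc0_lun_queue_depth=64" lpfc_740
esxcfg-boot -b    # rebuild the boot config so the option survives a reboot
# keep the VMkernel's outstanding requests per LUN in step with the HBA queue depth
esxcfg-advcfg -s 64 /Disk/SchedNumReqOutstanding
# then reboot the host

Does that look right, and is 64 a sane starting point for a single host on a single controller?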

6 Replies
chrisfmss
Enthusiast

I think your problem is your disks and RAID level. Five disks don't give you many IOPS, and RAID 6 is slow. You could add another disk and use RAID 10. The SATA drives themselves may also be a factor.
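
A rough back-of-envelope, with assumed figures (about 80 random IOPS per 7.2k SATA spindle, and the usual write penalties of 6 for RAID 6 and 2 for RAID 10):

# quick estimate only; the per-spindle figure and penalties are assumptions
DISKS=5; PER_DISK=80
echo "raw read IOPS:            $(( DISKS * PER_DISK ))"     # ~400
echo "RAID 6 random write IOPS: $(( DISKS * PER_DISK / 6 ))" # ~66
echo "RAID 10 on 6 disks:       $(( 6 * PER_DISK / 2 ))"     # ~240

Around 66 random writes per second is easy to saturate with one copy job plus a few running VMs.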

chrisfmss
Enthusiast

To give you an example on the RAID side: we have an MSA 1500, and my Domino server uses a 600 GB LUN that was on RAID 6. The backup took 4-5 hours. After I changed to RAID 10, the backup took 2-2.5 hours.

khughes
Virtuoso

The RAID setup could be a definite issue. I would also check your SAN to make sure the write cache is actually enabled. When we did our conversion, our previous setup had redundant paths for RDMs to physical servers, and with that type of setup the write cache was disabled. I would definitely check that.
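
I don't have a 2012fc in front of me, but if I remember right the MSA2000-series CLI can show it; something like this (command name from memory, so double-check it against the CLI reference for your firmware):

# log in to the MSA controller management interface (telnet/ssh), then:
show cache-parameters    # confirm write-back cache is enabled, not write-through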

  • Kyle

-- Kyle "RParker wrote: I guess I was wrong, everything CAN be virtualized "
neil_murphy
Enthusiast

RAID 6 is generally considered a very bad choice. Although it allows up to two drives in the RAID set to fail, it requires more parity information to be generated, leading to much lower I/O than RAID 5. You can achieve the same level of redundancy using a RAID 5 set plus an online spare.

zman442
Contributor

I would normally agree with that statement; however, the high failure rates of SAS and SATA drives need to be considered when planning a production environment, and the pros and cons weighed. In a RAID 6 array you can have two complete drive failures and still have a functional, albeit slow, array. With a RAID 5 + hot spare configuration, if you experience a drive failure and then a second drive fails while the hot spare is rebuilding, you are out of luck and have just lost the entire array. We have seen it happen several times with SCSI arrays on G4 and early G5 HP servers, which is why we only use RAID 6 for production-critical virtual environments. Unless you are running more than 30-40 VMs off a single LUN in highly transactional situations, you probably won't see a big difference in performance. Degraded performance is better than downtime, in my experience.

One of the main problems with these systems is that the RAID ASICs in the controllers perform very poorly compared to other RAID ASICs. HP's SAS/SATA controller is one of the slowest-performing ASICs available, and in its PCIe form in the G5 servers it only supports 1.5 Gb/s throughput for SATA drives. I cannot imagine HP went with anything faster in the MSA-series controllers, and if they did, it doesn't show in the frequent complaints about this model. On the MSA 1000/1500 series there is a known performance issue when creating LUNs with more than a certain number of disks, but I was under the impression this was resolved in the MSA2xxx series.

I can say this, though: we have an older MSA 1500 SAN running in production, only 2 Gbit on a single controller with multiple LUNs, and we get acceptable performance with VMotion, HA, and storage migration in our small production environment. Restoring snapshots is a bit on the slow side, however, and feels on par with my VMware Workstation performance, which is a bit disappointing, so anything faster than what we have now will be welcome.

NTShad0w
Enthusiast

Hello tarpit,

I implemented almost the same configuration 2 months ago; it looks like a mirror of your setup. Smiley Happy

I had:

3x HP DL380 G5, 2x quad-core 2.8 GHz CPUs, 32 GB RAM, 4x local 146 GB 10k SAS HDDs in RAID 10 (currently used only for the ESX boot; plans may change)

1x HP MSA 2012fc, dual controller, 2x 1 GB read/write cache + enclosure + 18x 15k SAS HDDs in RAID 10 (16 HDDs in the RAID + 2 hot spares), with a 64 KB stripe and controller optimization set to standard

3x Emulex 4 Gbps HBAs on PCIe (same as yours)

1x Brocade 4 Gbps FC SAN switch with 8 ports licensed

I tested it for over a month in different configurations and firmware versions (the array currently has the newest firmware available as of 2008-10-28; I don't remember which one). I tested RAID 5, RAID 50, and RAID 10 on 8, 11, and 16 HDDs, with different LUN sizes, etc.

In my (and others') opinion, the best choice for performance is RAID 10 in one volume on 16 HDDs (if you need to maximize performance for moderate simultaneous array usage by VMs). The problem is that a single RAID volume cannot be served by both controllers at the same time (that is unsupported on the HP MSA 2012fc). If you want to run a lot of VMs with less IOPS per machine but more overall IOPS, use 2x RAID 10 (8 HDDs per RAID volume); that uses both controllers, but performance within a single virtual machine will be lower than on one big RAID 10.

With one big RAID 5 on 16 HDDs, streaming performance is even better (+30%) than on the big RAID 10, but real-life usage is lower (-40%).

I didn't test RAID 6, but from what I know it is SLOW; write speed is generally 2x slower than RAID 5, so it is very IOPS-expensive, something like 4 disk operations for every 1 write.

Examples of IOPS test performance from my configuration (RAID 10 on 16x 300 GB SAS drives on the MSA 2012fc):

32 KB streaming I/O - up to 190 MB/s

8 KB real-life (50/50 streaming/random) I/O - up to 20 MB/s

maximum I/O ever seen - up to 6000 IOPS per controller; I don't remember at which I/O size

An example with RAID 5 on 16x 15k SAS HDDs:

32 KB streaming I/O - up to 270 MB/s

8 KB real-life (50/50 streaming/random) I/O - up to 12 MB/s

maximum I/O ever seen - up to 6000 IOPS per controller; as I remember, the same as on RAID 10

I don't remember my RAID 50 performance. :((

IN YOUR CONFIGURATION THERE ARE (MOST PROBABLY) 3 PROBLEMS:

- slow SATA drives (in RAID 5, for example, they are in my opinion up to 3x slower than 15k SAS drives; on RAID 6 probably 3-4x slower; on RAID 10 only 2x slower!!)

- a very low number of drives (for RAID 6 it is terribly low)!!

- a VERY SLOW RAID type; as others say, RAID 6 is a very slow solution, and I would recommend using it only with a minimum of 12 HDDs, and only on 15k drives, for good performance

So my recommendations for you; there are some options (rough numbers in the sketch after this list):

- change to RAID 10: a 2-3x performance improvement

- add 6 of the same HDDs and create an 11-HDD RAID 6 + 1 hot spare; it will still be slow in my opinion, about a 60-75% performance improvement

- add 5 of the same HDDs and use 10 HDDs in RAID 10 + 1 hot spare; it will increase your performance 3-4x

- just change to RAID 5 + 1 hot spare; as others say, it is not a very safe solution, but it is free and probably a 30-40% performance improvement

- add 6 of the same HDDs and build a RAID 50 on 10 HDDs + 2 hot spares; that configuration has quite good performance, probably up to 2x your current one, but it is still not very safe
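
Here is the sketch I mentioned, putting very rough numbers on those options (assumptions: about 80 random IOPS per SATA spindle; write penalties of 2 for RAID 10, 4 for RAID 5/50, 6 for RAID 6; hot spares not counted as active spindles):

# hypothetical random-write IOPS estimates for the options above
PER_DISK=80
echo "today: RAID 6 on 5 HDDs:            $(( 5 * PER_DISK / 6 ))"   # ~66
echo "opt 1: RAID 10 on 4 HDDs (+1 sp):   $(( 4 * PER_DISK / 2 ))"   # ~160
echo "opt 2: RAID 6 on 11 HDDs (+1 sp):   $(( 11 * PER_DISK / 6 ))"  # ~146
echo "opt 3: RAID 10 on 10 HDDs (+1 sp):  $(( 10 * PER_DISK / 2 ))"  # ~400
echo "opt 4: RAID 5 on 4 HDDs (+1 sp):    $(( 4 * PER_DISK / 4 ))"   # ~80
echo "opt 5: RAID 50 on 10 HDDs (+2 sp):  $(( 10 * PER_DISK / 4 ))"  # ~200

Real results depend on cache, stripe size, and workload mix, so treat these only as a relative ordering, not predictions.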

Kind regards,

NTSMHunter

VM Solutions Architect
