Re: Performance not brilliant - Bad disk config?? ...

Westy · ‎11-15-2007

Have been six months live with VI3/SAN installation. Total novice when started and had to rely on "consultants" advice but can hopefully now drop the "Total" bit myself and start to address some issues.

Performance is not what we thought it would be but I am sure this is down to bad disk config having read a lot of threads and VMWare stuff. Want to know really, how bad it is (if it is at all) and try to get a bit of advice to remedy.

We currently have:

HP EVA4000 SAN.

10 x 300GB FC configured as one disk group which consists of the following vmfs formatted luns presented to five ESX3 hosts:

"OS_1" - 250GB raid1 - All 22 VM's "C: system drives" are on here spread across the 5 hosts.

"OS_2" - 150GB raid1 which has been added as an extent to "OS_1" (on consultants advice early doors)

"DB" - 300GB raid5 for Exchange & SQL databases, and Domain Controller Active Directory partitions.

"Logs" - 150GB raid1 for Exchange & SQL Logs

The whole disk set is replicated to another EVA3000 SAN in our DR site.

Exchange & SQL performance have been the main problems. All data drives for the servers are on a separate FATA disk set configured as one big VMFS volume shared between the VMs, and that has been 'ok'ish.

Am I right in thinking we have a bad disk setup?

gorto · ‎11-15-2007

Could be this: "The whole disk set is replicated to another EVA3000 SAN in our DR site. "

The replication could be throttling back your front-end disk activity because it can't keep pace. Have a look at your SP logs to see if there are any signs of throttling.

Texiwill · ‎11-16-2007

Hello,

"OS_1" - 250GB raid1 - All 22 VM's "C: system drives" are on here spread across the 5 hosts.
"OS_2" - 150GB raid1 which has been added as an extent to "OS_1" (on consultants advice early doors)

Never been a big fan of extents; if possible I would combine these into one 400GB LUN. Extents require locking of all LUNs involved during a metadata update. This would not affect running systems, but it would come into play whenever you used vMotion, deployed a VM, did backups, etc.

"DB" - 300GB raid5 for Exchange & SQL databases, and Domain Controller Active Directory partitions.

Well, this could be a problem. Exchange and SQL on the same LUN? These are generally write-intensive items. At the very least I would split them off to two different LUNs. I would also consider using a minimum of a 100GB RDM for Exchange as well as for SQL. However, that is your choice. Generally you want to keep high-volume write items on separate VMDKs/RDMs.

"Logs" - 150GB raid1 for Exchange & SQL Logs

Generally I do not like this split of disks. It is extremely old school and can increase the number of SCSI reservations when you perform actions on a LUN. If it were me, I would rather use RDMs for Exchange and SQL data if they are > 150GB. The minimum size for an RDM is really up to you, but that is my general cut-off. I would also keep the VMDKs for every VM together, so OS and log VMDKs would reside together. This is to reduce SCSI reservation issues. If necessary, those with high writes would go to another LUN.

Best regards,

Edward L. Haletky, author of the forthcoming 'VMWare ESX Server in the Enterprise: Planning and Securing Virtualization Servers', publishing January 2008, (c) 2008 Pearson Education. Available on Rough Cuts at http://safari.informit.com/9780132302074

--
Edward L. Haletky
vExpert XIV: 2009-2023,
VMTN Community Moderator
vSphere Upgrade Saga: https://www.astroarch.com/blogs
GitHub Repo: https://github.com/Texiwill

christianZ · ‎11-16-2007

Have you measured the performance in your VM (perfmon: disk queue length, response times, etc.)?

It will be interesting to know the values.

My guess is that the 10 disks (incl. 1 hot spare?) are not enough.
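
As a rough cross-check on those perfmon values, Little's law ties them together: average queue length is approximately throughput multiplied by response time. The sketch below is a minimal illustration in Python. The function name and the sample figures are hypothetical, not measurements from this thread.

```python
def expected_queue_length(transfers_per_sec, avg_latency_ms):
    """Little's law: items in the system = arrival rate x time in system."""
    return transfers_per_sec * avg_latency_ms / 1000.0

# Hypothetical sample: 400 transfers/sec at 25 ms average response time.
print(expected_queue_length(400, 25))
```

A commonly quoted rule of thumb is that a queue length staying well above the number of spindles behind the LUN suggests the disks cannot keep up.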

spex · ‎11-18-2007

I'm sure you do not have enough disks in your setup.

You can easily move db and log of one of your problematic sql servers to a dedicated lun (best would be rdm, but not necessary) to see what you can expect.

Also synchronous replication could be an issue. Since you have to wait until your second site commits the write.

Regards Spex

FredPeterson · ‎11-18-2007

Why are you using RAID mirrors in a SAN? That seemingly defeats the purpose of a SAN in my opinion - which is high speed redundant remote disks. That high speed comes from data spread across spindles. A RAID 1 requires the data to be written to both drives and the SCSI bus waits for that acknowledgement before returning the disk to the OS.

A consultant said to use an extent? heh.

Just because the methods you might use to set up a physical server (RAID 1 for OS, RAID 5 for data, etc.) are technically available on a SAN doesn't mean they are still a good idea.

Westy · ‎11-19-2007

Perfmon indications suggest the "DB" LUN was really struggling; the other LUNs did not appear to be anywhere near as bad. Have moved SQL to dedicated LUNs and see a good improvement already. Am going to add extra disks as people have suggested and move Exchange to dedicated RDM LUNs. Is there a methodology for measuring how many disks/spindles you need? Am going to eventually try to get rid of the extent as well and group each VM's VMDKs together.

"Also synchronous replication could be an issue. Since you have to wait until your second site commits the write."

Running synchronous replication because the new and old SANs are not capable of asynchronous replication together, although I would have assumed they would be, with both being HP EVAs. Turned it off and Exchange response is much better, which makes sense. Have got the OK to upgrade the old SAN post-Xmas, so should eventually get everything OK.

Thanks for all the advice. Much appreciated.

Texiwill · ‎11-19-2007

Hello,

Estimating disks/spindles is always a difficult proposition. If you are using RDMs, apply what you would normally apply for a general physical server; as for VMFS, the more the better. Splitting across LUNs also improves performance. Since you are using EVAs, are you going to upgrade to Continuous Access (CA)? CA is a real issue with EVAs, or has been in the past. I would contact HP Storage support and discuss possibilities with them. They may have a solution to keep it from impacting ESX.
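
One common rule-of-thumb approach, sketched here in Python, is to convert the host workload into back-end disk IOPS using the RAID write penalty (2 back-end writes per host write for RAID 1, 4 for RAID 5) and divide by what one disk can sustain. The function name, the sample workload, and the 150 IOPS-per-disk figure are assumptions for illustration. The model ignores array cache and EVA virtualization, so treat the result as a starting point only.

```python
import math

# Back-end I/Os generated per host write (standard RAID write penalties).
RAID_WRITE_PENALTY = {"raid1": 2, "raid5": 4}

def spindles_needed(host_iops, write_fraction, raid_level, iops_per_disk):
    """Estimate how many disks are needed to absorb a host workload."""
    reads = host_iops * (1.0 - write_fraction)
    writes = host_iops * write_fraction
    backend_iops = reads + writes * RAID_WRITE_PENALTY[raid_level]
    return math.ceil(backend_iops / iops_per_disk)

# Hypothetical workload: 1000 host IOPS, 50% writes, 150 IOPS per disk (assumed).
print(spindles_needed(1000, 0.5, "raid5", 150))
print(spindles_needed(1000, 0.5, "raid1", 150))
```

Feeding in the peak transfers/sec and read/write split measured with perfmon, rather than guesses, gives a more useful estimate.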

Best regards,


BUGCHK · ‎11-21-2007

> "Also synchronous replication could be an issue.

> Since you have to wait until your second site commits the write."

That is not how it was explained to me by HP:

- host sends data to source EVA

- source EVA stores data in writeback cache and sends a copy to destination EVA

- destination EVA stores data in writeback cache and sends confirmation to source EVA

- source EVA receives destination confirmation and sends confirmation to host

- both EVAs will decide when to flush the data to the disk drives
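
Taking the sequence above at face value, the host still waits for the destination's confirmation before its write is acknowledged, even though neither array waits for its disks. A minimal Python sketch of that latency model follows. The function names and the millisecond figures are hypothetical, chosen only to show where the inter-site round trip enters.

```python
def sync_write_latency_ms(local_cache_ms, link_rtt_ms, remote_cache_ms):
    """Host-visible latency when the source waits for the destination's ack."""
    return local_cache_ms + link_rtt_ms + remote_cache_ms

def unreplicated_write_latency_ms(local_cache_ms):
    """Host-visible latency when the source acks straight from its own cache."""
    return local_cache_ms

# Hypothetical figures: 0.5 ms per cache write, 2 ms inter-site round trip.
print(sync_write_latency_ms(0.5, 2.0, 0.5))
print(unreplicated_write_latency_ms(0.5))
```

Under these assumed numbers every replicated write costs the host several times what an unreplicated write does, and the gap grows with link distance or congestion.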

gorto · ‎11-21-2007

It's still a 2-phase commit, whatever HP calls it.

Prove it by removing the replication on say, 1 LUN - performance will lift on that LUN

The Laws of Physics still remain ......

Westy · ‎11-21-2007

Thanks for all the advice. Have marked a couple of the answers as helpful but will close the question and get to work resolving my issues.
