I've been arguing with myself over how to configure LUNs for the new disks we purchased for the SAN. Any advice would be helpful!
Our initial deployment has an existing LUN built from five 300GB disks, plus one hot spare. Two servers (HP DL380 G4) are currently connected to this LUN.
Our upgrade brings online two new IBM x3650 servers and eight new drives for the MSA30.
I seem to remember the MSA1500 has performance issues when many servers access only a few LUNs, but maybe we don't have enough servers to worry about that yet. Our resource requirements are fairly low: the existing servers run everything we have now very well, and we only need the new servers for new projects.
It seems like my two choices are:
Option 1:
- Keep the existing 5-disk LUN
- Keep the 1 hot spare
- Create a new 8-disk LUN
Option 2:
- Keep the existing 5-disk LUN
- Keep the hot spare
- Create two new 4-disk LUNs
You're right... most of our servers are Win2k3 VMs, about 8GB each to start. Our two file servers are by far the largest, about 200-300GB each. I was planning on splitting them across different LUNs once I get the new LUNs set up.
Definitely split the LUNs -- is your MSA active/active or active/passive?
Keep in mind that ESX only uses a single path per LUN for I/O and will use alternate paths for failover (no load balancing per se).
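If you want to verify which path each LUN is actually using, the service console can show it. A sketch, assuming an ESX 3.x service console; flags may differ on other releases:

```shell
# List every LUN with its available paths; the active path is flagged (ESX 3.x)
esxcfg-mpath -l

# Note: for an active/passive array like the MSA, VMware generally recommends
# the MRU (Most Recently Used) path policy rather than Fixed, to avoid
# path thrashing between controllers.
```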
A while back I recall a post indicating that about 4 ESX hosts was the limit for an MSA (it may have been the 1000 rather than the 1500, but they're pretty close) before performance started to take a dive.
Active/Passive currently. The possible performance gains of active/active are exciting, but we've been so stable so far I don't know if I want to risk changing it unless there is a compelling reason now that we will have 4 servers total.
I was just reading about block sizes. I believe the existing VMFS volume was created at a 2MB block size because one of our file servers is slightly larger than 256GB. Is there anything to worry about using a block size larger than the 1MB default?
I've seen a few hints about performance decreasing above 3 hosts connected to a LUN, but it's just hearsay from 2005. (http://www.vmware.com/community/message.jspa?messageID=300221) Newer firmware could have fixed that already. What utilities and statistics do I look at to diagnose storage performance issues?
I'm with you: A/P for an MSA is stable and good.
I seem to recall some issues with A/A and LUNs disappearing from ESX 3.x hosts (I believe this was corrected by an ESX patch somewhere along the line)
As for block sizes, the only time I've witnessed anything like an issue was the VMFS-2 to VMFS-3 upgrade when the 16MB block size went unsupported and an in-place upgrade of that filesystem was not possible.
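For reference, the practical difference between VMFS-3 block sizes is just the maximum file size: as I recall, the cap is the block size times 256K file blocks, so block size in MB times 256 gives the limit in GB (worth verifying against the VMware docs for your release). A quick sketch of that arithmetic:

```shell
# VMFS-3 maximum file size per block size (recollection: block size MB * 256 = GB cap)
for bs in 1 2 4 8; do
  echo "${bs}MB block -> $(( bs * 256 ))GB max file"
done
```

So a 2MB block size caps individual files at 512GB, which comfortably covers a 300GB file server disk.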
Performance issues should be visible via esxtop in the service console (SC), or in the Disk section of the Performance charts in the VI Client (VIC).
While there are some differences, a lot of the ESX 2.x concepts still apply: http://www.vmware.com/pdf/esx2_using_esxtop.pdf
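In practice, a capture from the service console looks something like this (a sketch; the batch-mode flags are from my memory of ESX 3.x esxtop and may differ on older releases):

```shell
# Interactive: run esxtop, then press 'd' to switch to the disk view
esxtop

# Batch mode: record 60 samples to a file for offline review
esxtop -b -n 60 > /tmp/esxtop.csv
```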
Looking for disk timeouts in the logs would be another way to look for potential issues. I have a customer who boots 8 blade servers from an MSA 1000 (I didn't build it). They can only boot two of the machines at a time or the boot disk times out. Once the two have booted, though, they can bring two more online... and so on.
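For the log-based check, something along these lines on each host (paths assume the ESX 3.x service console):

```shell
# Look for SCSI aborts, timeouts, and path failovers in the VMkernel log
grep -iE "abort|timeout|failover" /var/log/vmkernel

# Check rotated logs as well
grep -iE "abort|timeout|failover" /var/log/vmkernel.*
```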
I believe the issues you will see would present themselves in a situation in which you were powering on many VMs at once.
After poking around the Array Configuration Utility, it seems that the MSA1500 has no concept of global hot spares. I can only assign spares to logical volumes, meaning that each LUN would need a spare for itself. That seems like too many 'wasted' disks to me! I will poke around the CLI configuration documentation and see if it's possible to configure a global hot spare there.
I don't have much experience with the ACU and am not familiar with the latest MSA firmware, but on the MSA1000 it was possible to assign a single spare drive to multiple logical disks residing in different disk drive arrays.
CLI> show disks
Disk List: (box,bay) (bus,ID) Size Units
Disk101 (1,01) (0,00) 36.4GB 0, 1
Disk102 (1,02) (0,01) 36.4GB 3
Disk103 (1,03) (0,02) 36.4GB 5
Disk108 (1,08) (1,00) 36.4GB 2, 9
Disk109 (1,09) (1,01) 36.4GB 4
Disk110 (1,10) (1,02) 36.4GB 6, 7
Disk113 (1,13) (1,05) 36.4GB 8
Disk201 (2,01) (2,00) 36.4GB 0, 1
Disk202 (2,02) (2,01) 36.4GB 3
Disk203 (2,03) (2,02) 36.4GB 5
Disk208 (2,08) (3,00) 36.4GB 2, 9
Disk209 (2,09) (3,01) 36.4GB 4
Disk210 (2,10) (3,02) 36.4GB 6, 7
Disk213 (2,13) (3,05) 36.4GB 8
Disk214 (2,14) (3,08) 36.4GB 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 (spare)
You are right! It seems a bit backwards to me, but you have to select each volume separately and manually set a drive to be your hot spare. It allows you to use the same drive multiple times. Perfect.
Thanks for all the advice everyone, I implemented our storage upgrade today and will be moving forward with our new servers tomorrow. I hope I remember how to configure the FC switch!
Just an FYI on MSA 1500 performance.
We currently have two fully loaded MSA 1500s... 4 shelves and the whole bit. They run Active/Passive. Each has 3 ESX servers attached. They are currently running about 35 VM guests per MSA. Workloads range from small web servers, to SQL servers, to our 3 big file servers, each with five 300GB RDMs.
Anyway, at one point we started getting performance issues with one of the file servers that generated many user complaints. We were able to monitor some of the high-I/O VMs, but nothing stood out as the big culprit. We found the best way to troubleshoot the issue was to use the CLI on the MSA to measure the stats. Doing a "show perf" showed us that we had maxed out the MSA: it was using 99% of its CPU. We put in a call to HP and they looked over our logs and said that we didn't have a config issue. I asked about the performance gains from switching to Active/Active and they said we wouldn't see any.
So, after further monitoring, we moved the one suspect heavy-I/O VM (a crappy Ingres database) off to our VI3 farm with faster storage, and it relieved our MSA CPU by 20%. Later on, we also decided to move our terminal server profile share off the one file server on the MSA to its own physical box. That relieved the CPU by another 10%.
1. 3-4 ESX hosts per MSA will be plenty. The MSA will tire before your ESX servers.
2. MSA will max out before one path from ESX host to MSA will...if using 3-4 ESX hosts.
3. For some reason, file shares with absurd numbers of files (TS/NT profiles... 1 million+ files) aren't the best fit for ESX/MSA.
Our ESX hosts are DL385 G1s: 2 processors, 16GB RAM, dual 2Gb QLogic cards.
Hope this helps