After being burned horribly by an HP XP 12000 array with ESX 3.5, and now in recovery mode, we are migrating a portion of our environment to a new location using HP's EVA 8100 series frame. I'm looking for anyone else out there using higher-end HP hardware (DL580G4/5 or DL585G2+) with an HP EVA 8100.
We've suffered through various issues surrounding SCSI reservation conflicts on the XP, and one of the recommendations from VMware was to limit the number of hosts that can access a LUN to four. For management purposes, we restricted our clusters to 4 hosts, thereby meeting that goal. They also recommended that we limit the number of VMDKs per LUN; we achieved this by keeping average LUN sizes small and simply presenting more LUNs.
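For what it's worth, the arithmetic behind "more, smaller LUNs" is just ceiling division. A quick sketch - the cap of 8 VMDKs per LUN and the VM count are made-up numbers for illustration, not an HP or VMware figure:

```shell
# How many LUNs to present so no LUN exceeds a given VMDK cap.
# Both numbers below are illustrative assumptions, not vendor guidance.
total_vmdks=60   # total VMDKs the cluster will host
max_per_lun=8    # self-imposed cap on VMDKs per LUN
luns=$(( (total_vmdks + max_per_lun - 1) / max_per_lun ))  # ceiling division
echo "$luns LUNs needed"   # prints "8 LUNs needed"
```

Then just size each LUN to comfortably hold that many average-sized VMDKs plus swap and snapshot overhead.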
Anyone else running ESX on the EVA 8100: how big are your clusters? How many LUNs have you presented? Any SCSI reservation or queue depth issues?
We're running a few HP c7000 blade enclosures with BL685c blades connected to 2 HP EVA 8100s. We have 4 clusters - 2 with host and 2 with 5 hosts. The largest cluster has 6 LUNs presented to it. For better or worse, we use 1TB LUNs.
We're still pretty lightly loaded, but no reservation or queue depth issues so far.
The SCSI reservation conflicts are caused by the Fibre Channel agent (cmahost) that is part of "HP Insight Manager for ESX".
Disable the Fibre Channel agent (cmahost) and test this configuration for 24 hours.
Then test version 8.0.0.a of the HP agents.
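A sketch of how you might verify the agent is actually down before starting the 24-hour test; note that the service name (hpasm) is an assumption based on typical HP agent bundles, so check what's installed on your own console OS first:

```shell
# Report whether the HP Fibre Channel agent (cmahost) is currently running.
# The bracketed pattern keeps grep from matching its own process entry.
status=$(ps -e | grep '[c]mahost' >/dev/null && echo running || echo stopped)
echo "cmahost is $status"
# To stop it for the 24-hour test, on the ESX console OS you would run
# something like the following ("hpasm" is an assumed service name):
#   service hpasm stop
#   chkconfig hpasm off   # keep it disabled across reboots
```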
We are not using the HP storage or management agents in our environment yet, for that reason. We've heard too many horror stories; we'd like to see the agents mature a bit more.
Our reservation conflict issues are related to the XP12000 and how we utilize EVA arrays (some A/A, some A/P) as external storage. The XP12000 struggles to properly manage locks on LUNs as requested by connected hosts, regardless of whether the storage is on the XP itself or in one of the attached arrays. Given that, VMware has provided a number of KB articles and best practices to help mitigate the issues...
<[http://kb.vmware.com/kb/1005009]> - Troubleshooting SCSI Reservation failures on Virtual Infrastructure 3 and 3.5
<[http://kb.vmware.com/kb/1105011]> - SCSI Reservation Failures on HP XP Storage Arrays
<[http://kb.vmware.com/kb/8411304]> VMotion Failure of Virtual Machines Located on LUSE LUNs on HP XP 10000 and XP 12000
<[http://www.vmware.com/files/pdf/scalable_storage_performance.pdf]> - Storage Performance Guide
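Per the first KB above, the conflicts show up in the vmkernel log as "RESERVATION CONFLICT" entries, so a quick count before and after any change tells you whether it helped. On a real host you'd grep /var/log/vmkernel; the sample log below is fabricated for illustration:

```shell
# Count SCSI reservation conflicts in a vmkernel log.
# The sample entries here are made up; point the grep at /var/log/vmkernel
# (and its rotated copies) on an actual ESX host.
cat > /tmp/vmkernel.sample <<'EOF'
vmkernel: 0:12:01:33.456 cpu2 SCSI: vmhba1:0:3 status = 0/0 0x0 0x0 0x0
vmkernel: 0:12:01:33.457 cpu2 SCSI: RESERVATION CONFLICT vmhba1:0:3
vmkernel: 0:12:01:35.102 cpu0 SCSI: vmhba1:0:5 status = 0/0 0x0 0x0 0x0
EOF
grep -c 'RESERVATION CONFLICT' /tmp/vmkernel.sample   # prints 1 for this sample
```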
So, given that... when we move to the new location, we'll be on a fully active/active EVA 8100. However, many of us are nervous about exceeding the 4-hosts-per-LUN (per cluster) limit and the minimized VMDKs per LUN. We'd ideally like to expand to 8- or 10-way clusters, but we're nervous about it.
The current consensus is to start slow: build a 5-node cluster, then add nodes one at a time, maybe one per week, measuring load and performance along the way.
We have 4 VMware clusters: 2 with 2 hosts and 2 with 5 hosts, but all 14 of these hosts (in 2 different datacenters) have visibility to all LUNs on our two EVA 8000s (synchronized between them with PPRC).
The difference is that we work with IBM x3950 servers and ESX 3.0.2. We have defined more than 30 LUNs of 300GB and 500GB.
Our production infrastructure has been stable since January 2007, with no problems (touching wood).
We have a 6-node - soon to be 8-node - BL460c G1 ESX 3.5u1 cluster connected to EVA 8000 arrays. I have seen SCSI reservation issues, but they came down to two main things: misconfiguration on the array (one host was accidentally set as Windows rather than VMware on one of the arrays) and failing hardware - specifically a broken HBA. We're currently using 200GB LUNs, which gives us somewhere between 4 and 7 VMs per LUN, depending on their size. We are using the HP management software, and I haven't disabled any parts of it. I also found the VMware and HP StorageWorks best practice guide (http://h71019.www7.hp.com/ActiveAnswers/downloads/VMware3_StorageWorks_BestPractice.pdf) useful, although it doesn't mention EVA 8100 arrays. I'll be watching our ESX cluster closely from now on for any more SCSI reservation issues, but hopefully we've made progress by solving the two issues mentioned above.
Senior Systems Engineer
Central Queensland University
We're running DL385s using the slightly older EVA8000. As part of this we use Continuous Access to replicate the data to an EVA6000 at our DR site.
We saw some SCSI reservation issues during periods of heavy load. Much of that went away when we upgraded to XCS 6.1. But as you're running the 8100, I think you'll already be at XCS 6.110 (6.200 is now out).
We also believed some of the issues were caused by HP VMM which we also have installed. This would be particularly problematic during VMotion activity.
That does seem to have become less troublesome with the updated VMM 3.5.
A few posts came in while I was writing this. On our EVA 8000 we have a 6 node and a 3 node cluster.
The 6-node cluster is currently still running ESX 3.0.2 and has around 12 LUNs presented to it (not at work at the mo, so I can't be precise). They're a mixture from two EVA disk groups: one of Fibre Channel disks, one of FATA.
We are also seeing some strange issues with VMotion and high CPU utilization. Our configuration is 2 c7000 BladeSystems with 8 BL480c blades using Emulex LPe1105 HBAs with firmware version 2.72A2 (one is using 2.70A5, per HP's recommendation when using the HP XP 12000), and we are using an HP XP 12000 as the storage device. We have 33 LUNs connecting to 8 ESX hosts, and 23 of those LUNs are configured as RDMs in virtual compatibility mode. Can you be more specific about the nature of the problems you were experiencing on the HP XP 12000? Thanks.