(2) IBM x3650 hosts running ESXi 5.1, each with 2x SAS HBA connections to an IBM DS3200 dual-controller SAN.
I have 2 arrays, both set to multi-host access for VMware.
With a single host powered on and running all VMs, disk performance seems normal.
As soon as I power on the second host, latency on host-1 soars (as high as 500ms!), even though there are *no VMs* on host-2. If I power down host-2, latency on host-1 returns to acceptable levels. Again, there should be *no* disk activity caused by host-2, since it has no VMs and boots from internal storage.
I've vMotioned all VMs to host-2 and the symptoms are the same - disk latency is fine with all VMs on host-2 and host-1 powered off. If I power host-1 back on, the VMs running on host-2 grind to a near-halt due to latency.
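For anyone else chasing a similar problem, one way to pin down where the latency lives is esxtop's disk views (standard ESXi tooling, nothing specific to my setup):

```shell
# On the ESXi host (SSH or local shell):
esxtop            # interactive; press 'd' for the HBA view, 'u' for the device/LUN view
# Watch DAVG/cmd (latency at the device/array), KAVG/cmd (kernel/queuing latency),
# and GAVG/cmd (total guest-visible latency, roughly DAVG + KAVG).

# Batch mode for capturing a spike over time (here: 180 samples, 10s apart):
esxtop -b -d 10 -n 180 > /tmp/esxtop-latency.csv
```

High DAVG points at the array/path side; high KAVG points at queuing on the host.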
The only oddity I've noticed is that the path for one of the LUNs is different from what I'd expect:
hba1 - runtime name: hba1:c0:t0:L1
hba2 - runtime name: hba1:c0:t0:L1   <-- note: shows the same runtime name as hba1
manage path: hba1:c0:t0:L1

hba1 - runtime name: hba1:c0:t0:L1
hba2 - runtime name: hba1:c0:t0:L1   <-- same again
manage path: hba2:c0:t0:L1   <-- notice this says hba2, not hba1 (hba1 is "standby")
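The same path information can be dumped from the CLI, which makes it easier to compare the two hosts side by side (standard ESXi 5.x esxcli namespaces; the naa.* device ID below is a placeholder, not from my setup):

```shell
# List every path with its runtime name, adapter, and state (active/standby):
esxcli storage core path list

# Per-device view: shows the owning path selection policy (e.g. VMW_PSP_MRU)
# and the current working path for each LUN:
esxcli storage nmp device list

# Narrow to a single LUN once you know its naa.* identifier (placeholder shown):
esxcli storage core path list -d naa.600a0b80001234560000000000000000
```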
One additional note:
Before the second ESXi 5.1 host was added, there was a Windows bare-metal host connected to the SAN, with the ESXi host accessing one array and the Windows host accessing the other. In that configuration, disk latency was also normal. The Windows host has since been removed and disconnected, and I also removed its host group in the Storage Manager.
Is there anything new in your case? Did you manage to find a solution?
I'm very interested in how your environment is doing, as I would like to migrate to vSphere 5.1, but unfortunately the DS3200 dual SAS is not currently on the VMware compatibility matrix for vSphere 5.1.
My environment is very similar to yours: 2x IBM x3650 M3 in HA, connected to a DS3200 dual SAS.
I would be more than happy if you could share your experience.
This weekend I am going to assign one host to one array/LUN and the other host to the other array/LUN.
I am also going to check whether I have the IBM-specific build of ESXi on both hosts. I have run into issues with the VMware build on IBM hardware - not issues so much as missing drivers for some IBM devices: NICs and controllers.
I will post back with the results.
I do have 2 other sites with multiple IBM hosts connected by SAS HBAs to IBM DS3400 and DS3500 units running in production without this issue, so I am confident that it can work.
Thank you for your response.
I would expect the DS3500 to work without any issue, as it is LSI 2600-based - the same as the Dell MD3200, which is on the supported storage list for vSphere 5.1.
It's good to hear that the DS3400 works without any issue, though.
I will wait for your further results with DS3200.
Yeah, but VMware was saying that you can't have "shared access" using SAS. Which seems strange - shared access is the whole point of VMFS.
VMware was telling me it has to go through some sort of switching like iSCSI or FC.
That's odd. I admit I've never heard of that restriction, but I can say that I have been successfully running ESXi 4.1 on two IBM x3650 M3 hosts in HA, both connected directly to the same DS3200 dual SAS, for over two years without any problems.
That is good to hear. I had not heard that as a restriction either, and this is the only installation giving me this issue.
How do you have your paths set up in Storage Adapters --> Manage paths?
The one difference I see between this system and the working systems I have deployed is that the "non-working" system shows its 2 paths to the SAN as Active/Standby, while the working systems are Active/Active.
Sorry for the delay - I was mostly out of the office lately and totally forgot to check that.
Anyway, I finally did, and I can confirm that in my environment the connection from hosts to LUNs is Active/Standby.
This is my only environment with a DS3200 dual SAS connected to VMware hosts.
How did your tests go?
I thought I had a breakthrough on this issue, but it seems to be only an improvement, not a fix.
I installed the IBM-specific ESXi 5.1 build on both hosts. In initial testing, latency on the SAN (DS3200) seemed much better.
After 2 days, I have started seeing latency "mountains" (i.e., an increase that goes up to 1000ms or more) lasting an extended period - 15-30 minutes (on the chart it looks like a mountain against the otherwise "flat" latency). During the spike, several servers become extremely sluggish.
The one event that occurs each day when the latency increases is the Symantec AntiVirus clients getting their definition updates. There are fewer than 50 clients total, several across WAN links, so the update traffic alone should not be enough of a hit on the network to explain the spike in latency.
In fact, after I installed the IBM build, I was copying 500MB files from VM to VM across hosts and LUNs - the file was literally "appearing" on the destination desktop with no Windows "copy progress" bar. Copying a 500MB folder of several hundred files was slightly slower (as expected compared to a single 500MB file), but still under 30 seconds.
I can't imagine the total size of all the definition files would be more resource-demanding than that.
Bottom line - I don't think this is fixed.
That's bad news. If even the IBM-specific build has the problem, it could mean there are compatibility issues with the DS3200 - though hopefully some workaround can be found. It could also be a configuration- or environment-specific issue, so I'm staying positive on this.
Unfortunately, I will have to wait before updating the host in my environment - it's a production system at a remote location, so if I have to roll back for some reason it will take a while for me to get there and fix the whole thing.
Best regards,
- Downgraded to the VMware build of ESXi 5.0.0 (i.e., not the IBM-specific build).
- Updated firmware on the DS3200 controllers.
- Updated firmware on all disks.
- Same issue - latency on the first boot after applying the updates was so bad (2000+ ms) that one of the 10 VMs spread across the 2 hosts took 20 minutes from Ctrl-Alt-Del to get to a desktop.
VMware wants to get me, VMware support, and IBM on a conference call all at the same time.
Will update again as events warrant.
Can you give me more details on the below?
- ESXi multipathing policy
- RAID level
- Is there any RAID rebuild going on on the storage?
Multipathing - VMware MRU (I've tried changing it as well).
RAID level 5 - I have 2 arrays, both RAID5:
ARRAY1 - 4x 500GB SATA (no VMs currently on this array)
ARRAY2 - 3x 750GB SATA
No activity going on with the arrays - everything is clean and "green", no warnings.
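For reference, the path selection policy can be inspected and changed per device from the CLI (standard ESXi 5.x esxcli syntax; the naa.* device ID and path name are placeholders, and whether Fixed is actually appropriate depends on the array's failover mode - check with the storage vendor first):

```shell
# Show each device's current path selection policy (e.g. VMW_PSP_MRU)
# and its working paths:
esxcli storage nmp device list

# Change one device from MRU to Fixed (placeholder device ID):
esxcli storage nmp device set -d naa.600a0b80001234560000000000000000 -P VMW_PSP_FIXED

# Optionally pin the preferred path when using Fixed (placeholder path name):
esxcli storage nmp psp fixed deviceconfig set -d naa.600a0b80001234560000000000000000 -p vmhba1:C0:T0:L1
```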
Originally, I had 1 Windows (bare-metal) host and 1 ESXi host, each assigned to one of the arrays.
I have since removed the Windows host, added a 2nd ESXi host, changed both arrays to shared access, and reprovisioned ARRAY1 as a VMFS datastore.
As of yet, I've not moved any VMs from ARRAY2 to ARRAY1, and won't until I get the latency issue under control.
I realize SATA in RAID5 is not the best configuration, but I've got other systems running 3-disk RAID5 datastores with no issues (latency peaks under 100ms on those systems - again, not great, but not a show-stopper like this).
Hi folks - I had the same problem with a local datastore on a brand-new Dell server.
I found an article from Eric Zandboer and changed Disk.DiskMaxIOSize from the default of 32767 KB to 128 KB - and that was it!
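For anyone who wants to make the same change from the command line instead of the vSphere Client's advanced settings dialog, this is the esxcli form of that option (ESXi 5.x syntax):

```shell
# Check the current value (default is 32767 KB):
esxcli system settings advanced list -o /Disk/DiskMaxIOSize

# Lower the maximum I/O size ESXi will issue to the device to 128 KB;
# larger guest I/Os get split before they reach the HBA/array:
esxcli system settings advanced set -o /Disk/DiskMaxIOSize -i 128
```

The change takes effect immediately, with no reboot required.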