Here's my config:
- VI 3
- HP p-Class blades connected to an HP EVA6000
- latest HP firmware on the EVA6000
- hosts set up as VMware type (per HP best practices)
- LUNs presented for VMFS volumes (no trouble here)
- LUNs presented to all blades and set up as raw device mappings to a virtual Microsoft SQL Server cluster (using MS Clustering Services)
- set up per VMware's best practices for MS clusters in VMs (unless I missed a critical step, which I won't discount)
Here's the problem. There are approximately a half-dozen LUNs presented this way. When adding new storage (VMFS volumes), it takes anywhere from 6-7 minutes to get past the SCSI reservation timeouts on these LUNs (I confirmed this is the issue by tailing /var/log/vmkernel).
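For reference, confirming this from the service console looks roughly like the following; /var/log/vmkernel is the default log path on ESX 3.x, and the grep pattern is just a broad filter, not an exact match for any one message format:

```shell
# Watch the VMkernel log for SCSI reservation messages while the
# "Add Storage" wizard runs in another session. Case-insensitive so it
# catches both "reservation conflict" and "SCSI Reservation" variants.
tail -f /var/log/vmkernel | grep -i 'reservation'
```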
No other LUNs have problems... just the ones presented to the cluster. We have other raw device mappings to Windows VMs that work fine (and don't have the reservation-conflict issue).
I was originally getting timeouts in the client; increasing the client timeout to 10 minutes lets me add storage, but it takes quite a long time (and this has to be done on 10 blades).
The problem occurs whether I access the configuration through VC or directly on the host itself.
Here's my question... is there any way to make ESX not touch those LUNs when performing tasks like adding new storage, or is this something I have to live with? Also, am I doing this all wrong, and is there some configuration that avoids this happening?
We have no performance issues, problems with failovers, or anything else. The only problems come when trying to work with the storage.
Any thoughts or suggestions are much appreciated.
--Brad Watson
Gentlemen, this is to be expected. MSCS clustering uses SCSI-3 persistent reservations, while ESX uses SCSI-2 non-persistent reservations. Any rescan, add-storage operation, etc. will encounter a SCSI reservation on the LUNs in question, as ESX cannot read the metadata while it is locked. The logs will then report the reservation conflict and fail the I/O to the LUN.
This is a known limitation of MSCS clustering on ESX; the workaround is to take the cluster offline during the storage operations.
I am facing the same issue on my cluster. I have a 7-node cluster and, along with other VMs, I have configured 5 MSCS clusters.
I see a lot of SCSI reservation error messages. I am not sure whether this is a cause for concern or something that can be ignored.
My ESX is 3.5 U4.
I have already set SCSI.ConflictRetries to 10.
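In case it helps anyone, the setting can also be checked and changed from the service console with esxcfg-advcfg; the /Scsi/ConflictRetries option path here is my assumption based on the setting's name in the VI client:

```shell
# Show the current value of the advanced option (default is 80 on ESX 3.5).
esxcfg-advcfg -g /Scsi/ConflictRetries

# Lower it to 10, as VMware suggested for adding new LUNs.
esxcfg-advcfg -s 10 /Scsi/ConflictRetries
```

These commands only exist on an ESX host, so treat this as a sketch rather than something to paste blindly.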
Any ideas/suggestions are most welcome. Will keeping all the MSCS nodes in a separate dedicated cluster help?
Thanks
Hi,
check the following VMware knowledge base article; it explains the behavior you are seeing.
Even though the title says it is ESX 4 related, the behavior is no different on ESX 3.x.
Hope this helps a bit.
Greetings from Germany. (CET)
You are right, but that still doesn't help my problem. In ESX 3.5.x the default for SCSI.ConflictRetries is 80, and VMware has asked for it to be changed to 10 while adding new LUNs.
I would like to know whether SCSI reservation errors will cause problems down the road. I want to be proactive, not reactive.
Thanks