VMware Cloud Community
archITech
Contributor

SCSI Reservation Conflicts, Raw Device Mappings, and Windows Clustering.

Here's my config:

- VI 3

- HP pClass blades connected to an HP EVA6000

- Latest HP firmware on the 6000

- Hosts set up as VMware (per HP best practices)

- LUNs presented for VMFS volumes (no troubles here)

- LUNs presented to all blades and set up as raw device mappings to a virtual Microsoft SQL Server cluster (using MS Clustering Services)

- Set up per VMware's best practices for MS clusters in VMs (unless I missed a critical step, which I won't discount)

Here's the problem. About half a dozen LUNs are presented this way. When I add new storage (VMFS volumes), the operation takes 6-7 minutes to finish because it has to wait out SCSI reservation timeouts on these LUNs (I confirmed this by tailing /var/log/vmkernel).
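
For anyone who wants to check which LUNs are responsible, here is a minimal Python sketch that counts conflict messages per device in a copy of the vmkernel log. It rests on assumptions that are not in the post: that conflict lines contain the phrase "reservation conflict", and that the device shows up on the same line as a vmhbaA:T:L path. The function name `count_conflicts` is made up for illustration. Adjust both patterns to whatever your log actually prints.

```python
import re
from collections import Counter

# Assumption: conflict lines contain "reservation conflict" (any case).
# The post does not show the exact vmkernel message format.
CONFLICT = re.compile(r"reservation conflict", re.IGNORECASE)

# Assumption: the affected device appears on the same line as vmhbaA:T:L.
DEVICE = re.compile(r"vmhba\d+:\d+:\d+")


def count_conflicts(lines):
    """Return a Counter mapping device path (or 'unknown') to conflict count."""
    counts = Counter()
    for line in lines:
        if CONFLICT.search(line):
            match = DEVICE.search(line)
            counts[match.group(0) if match else "unknown"] += 1
    return counts
```

To use it, copy the log off the host and pass the open file to the function, for example `count_conflicts(open("vmkernel.copy", errors="replace"))` (the file name is a placeholder), then print `most_common()` on the result. If only the cluster RDM LUNs appear in the output, that matches what is described here.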

No other LUNs have problems, just the ones presented to the cluster. We have other raw device mappings to Windows VMs that work fine (and don't have the reservation conflict issue).

I was originally getting timeouts in the client. Increasing the client timeout to 10 minutes lets me add storage, but it still takes a long time (and this has to be done on 10 blades).

The problem occurs when accessing the configuration either through VC or directly on the host itself.

Here's my question: is there any way to make ESX not touch those LUNs when performing tasks like adding new storage, or is this something I have to live with? Also, am I doing it all wrong, and is there some configuration where this doesn't happen?

We have no performance issues, problems with fail-overs or anything else. The only problems come when trying to work with the storage.

Any thoughts or suggestions are much appreciated.

--Brad Watson

23 Replies
admin
Immortal

Gentlemen, this is to be expected. MSCS clustering uses a SCSI-3 persistent reservation; ESX uses SCSI-2 non-persistent reservations. Any rescan, add-storage operation, etc. will hit a SCSI reservation on the LUNs in question, because ESX cannot read the metadata while the LUN is locked. The logs will then report the reservation conflict and the I/O to the LUN will fail.

This is a known limitation of MSCS clustering in ESX - the workaround is to disable the cluster during the storage operations.

vcpguy
Expert

I am facing the same issue on my cluster. I have a 7-node cluster and, along with other VMs, I have configured 5 MSCS on it.

I see a lot of SCSI reservation error messages. I am not sure whether this is a cause for concern or something that can be ignored.

My ESX is 3.5 U4.

I have already set SCSI.ConflictRetries to 10.

Any ideas or suggestions are most welcome. Will keeping all the MSCS nodes on a separate, dedicated cluster help?

Thanks

----------------------------------------------------------------------------- Please don't forget to reward Points for helpful hints; answers; suggestions. My blog: http://vmwaredevotee.com
kastlr
Expert

Hi,

Check the following VMware knowledge base article; it explains the behavior you are seeing.

ESX and ESXi 4.0 Update 1 hosts hosting passive MSCS nodes with RDM LUNs may take a long time to boo...

Even though the title says it is about ESX 4, the behavior is the same on ESX 3.x.

Hope this helps a bit.

Greetings from Germany. (CET)


vcpguy
Expert

You are right, but that still does not solve my problem. In ESX 3.5.x the default for SCSI.ConflictRetries is 80, and VMware has asked for it to be changed to 10 while adding new LUNs.

I would like to know whether SCSI reservation errors will cause problems down the road. I want to be proactive, not reactive.
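
To make the effect of that setting concrete, here is a conceptual Python sketch of a bounded retry loop. It is not how ESX implements the retries. The function names are invented, and whether ESX counts the initial attempt separately is an assumption. It only shows why a lower retry limit gives up sooner on a LUN that stays reserved.

```python
def run_with_conflict_retries(operation, max_retries):
    """Call operation() until it succeeds or the retry budget is used up.

    operation returns True on success and False on a (simulated) conflict.
    Returns a tuple (succeeded, attempts_made).
    """
    attempts = 0
    # Assumption: one initial try plus up to max_retries retries.
    while attempts <= max_retries:
        attempts += 1
        if operation():
            return True, attempts
    return False, attempts


def always_conflicts():
    # Hypothetical stand-in for a LUN held by a reservation that never clears.
    return False
```

With `always_conflicts`, a limit of 80 makes 81 attempts before giving up, while a limit of 10 makes 11. If each attempt costs roughly the same time, the wait per reserved LUN shrinks accordingly. The conflicts themselves still occur; only the time spent retrying changes.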

Thanks
