VMware Cloud Community
jspilon
Enthusiast

SCSI reservation: where does it happen?

Hello,

After searching through the forums on SCSI reservation matters for LUN sizing, I have some questions I would like answered before going forward with our setup.

Understanding that a SCSI reservation occurs whenever VMFS metadata is updated, the following questions arise:

- Does a SCSI reservation affect the VMFS partition, the underlying LUN (in case the VMFS spans several LUNs), or the array hosting it?

So, excluding the number of VMs, the features used (snapshots, VMotion), RAID level, IOPS, I/O contention, and space capacity, which of the following would be better for avoiding SCSI reservation contention, and why?

1. span 1 VMFS over several smaller LUNs on 1 array

2. span 1 VMFS over several LUNs, with a 1:1 LUN-to-array ratio

3. create multiple VMFS volumes, with a 1:1:1 array-to-LUN-to-VMFS ratio

4. create multiple VMFS volumes, with a 1:1 LUN-to-VMFS ratio, on 1 array

12 Replies
Texiwill
Leadership

Hello,

SCSI reservations LOCK the LUN or LUNs the VMFS resides upon. They do not lock a partition on a LUN, but the entire LUN. SCSI reservations are locks that happen on the SAN/NAS device serving up the LUN.

Because of this, the only way to eliminate SCSI reservation conflicts is to balance your VMFS activity across multiple independent VMFS volumes. So:

1 VMFS covering multiple LUNs causes all the LUNs to effectively LOCK, although really only the LUN holding the metadata is locked. It is effectively all of them because no metadata update can occur until the lock is cleared.

1 VMFS covering one LUN causes a lock of the LUN.

If a VM spans more than one VMFS, then an action that causes a metadata update for that VM locks ALL the LUNs involved.


Best regards,

Edward L. Haletky

VMware Communities User Moderator

====

Author of the book 'VMware ESX Server in the Enterprise: Planning and Securing Virtualization Servers' (Copyright 2008 Pearson Education), as well as the Virtualization Wiki at http://www.astroarch.com/wiki/index.php/Virtualization

mcowger
Immortal

Just to go a little deeper than Edward's description (which is dead on):

Various arrays implement reservations differently. Good ones really do implement the reservation at the LUN level -- meaning if LUN A is locked, LUN B doesn't have to be. Some will lock LUN B along with LUN A if they share a parity/RAID group, even though the lock was requested against A only. Some really cheap arrays only support one lock at a time, so locking LUN A will lock A, B, C ... N :(

Another reason to make sure you are using a good array.

--Matt VCDX #52 blog.cowger.us
Texiwill
Leadership

Hello,

To get even a little deeper... Lock releases are based on when the array reports back that the lock is released. The release could have happened hundreds of microseconds ago, BUT the array does not report this to the VI3/ESX server, so as far as the VI3/ESX server is concerned the reservation is still in use even though the array is ready for another lock. This is generally a firmware problem in the array. At one time Hitachi arrays suffered from this (several years ago), so it is important to have the appropriate firmware levels on your arrays.

If you think this is happening to you, you will need to put a logic analyzer on the fabric, as that is unfortunately the only way to truly see such things.

Check out http://www.informit.com/articles/article.aspx?p=1156956 for detailed information. This is an excerpt from my book provided by Pearson Education.


Best regards,

Edward L. Haletky

VMware Communities User Moderator

====

Author of the book 'VMware ESX Server in the Enterprise: Planning and Securing Virtualization Servers' (Copyright 2008 Pearson Education), as well as the Virtualization Wiki at http://www.astroarch.com/wiki/index.php/Virtualization

DougBaer
Commander

Because Hitachi arrays were mentioned, I have to chime in here. ;)

When a LUSE is created in a Hitachi array (basically a concatenation of LDEVs to create a big LUN), it appears that a SCSI lock must lock each LDEV in succession within the LUSE before the array will report that the LUN has been locked (and the same for unlocking). I have seen as much as TWO SECONDS of latency when the LUSEs were made up of around 26 LDEVs. Needless to say, performance on those LUNs was TERRIBLE whenever metadata was being changed -- any time there was a VM with a snapshot and a reasonable amount of I/O, for example. LUSEs with fewer than 6 LDEVs do not seem to cause as much of an issue, although I suspect there is still some latency.

As was mentioned in this post, however, the locking seems to be array dependent. I have not seen such issues with concatenated metavolumes on EMC Symmetrix arrays, even when the metavolumes were composed of many LDEVs.

Doug Baer, Solution Architect, Advanced Services, Broadcom | VCDX #019, vExpert 2012-23
mcowger
Immortal

Indeed, we are seeing the same LUSE locking issues on our Hitachi arrays - not to mention their crappy performance. Whoever at HDS thought concat was a good idea for a LUSE needs to be slapped.

--Matt VCDX #52 blog.cowger.us
DougBaer
Commander

AMEN!

I like this one... from http://kb.vmware.com/kb/3408142

"SCSI reservation conflict warnings will still be present. This is normal and expected because LUNs continue to be shared in a multi-initiator environment. Host Mode Option 19 I/O improvement will enable the host to retry commands with a greater probability of success."

I love to run my production environments with a greater probability of success. :(

Doug Baer, Solution Architect, Advanced Services, Broadcom | VCDX #019, vExpert 2012-23
mcowger
Immortal

This crap is the reason I will be moving my VI environment off HDS (even though we love them for the bigger DB-style stuff). The only reason we have lasted this long on that, with the crappy performance, is the stupidly large amount of cache they have (128GB on our arrays) :)

--Matt VCDX #52 blog.cowger.us
DougBaer
Commander

It is unfortunate, because HDS does make a solid array. (Apologies to the original poster for hijacking the thread.)

Back on topic: I would definitely not span a VMFS across multiple LUNs. While some folks I have encountered would like to consider the spanning capability of ESX as similar to a volume manager, do not make that mistake. My understanding is that VMware still considers the use of extents to be "for emergency use only".

Even if this view has changed, keep in mind that extents give you capacity, not performance! This is concatenation technology -- one LUN has to fill up before the next one is used. Another consideration is that not many people use extents on the ESX side, so any issues you encounter with your implementation may be shiny new issues that nobody else has run across... so rather than a quick answer from support, you may wait a bit longer while logs are reviewed and analyzed.

With Storage VMotion available as of ESX 3.5, it is much easier to present a new, large LUN and migrate VMDKs to the new storage than to extend VMFS partitions with extents for long-term use.

As for sizing, I have heard a lot of concern expressed about the 2TB LUN limit -- keep in mind that VMware's recommendation for VMFS LUNs is to limit the VMDK population per LUN due to the potential for SCSI locking issues. I generally use 10-15 VMDKs per VMFS as a starting point and size the VMFS based on my expected VM sizes.

Doug Baer, Solution Architect, Advanced Services, Broadcom | VCDX #019, vExpert 2012-23
jspilon
Enthusiast

Thank you all for your answers,

I will definitely stay away from extents, since I don't like the idea of the space spanning over new LUNs. I understand there is no performance gain.

We have an IBM DS4700, if that helps anyone in guiding me.

So if I understand properly, the array as a whole should not be affected by a SCSI reservation? Let's say I have VMFS1 and VMFS2 on the same array, with each VMFS on 1 LUN; would a lock on VMFS1 affect VMFS2?

For example, with 10 disks I could set up 1 array of 10 disks and carve 2 LUNs from it for 2 VMFS volumes, or I could set up 2 arrays of 5 disks each and put each VMFS on separate spindles.

Would taking advantage of more spindles justify setting up the 2 VMFS volumes on the same array?

DougBaer
Commander

I do not have experience with the DS4700, but there is no reason that a SCSI lock on VMFS1 would affect VMFS2 in the configuration that you describe. The SCSI locks are generally per VMFS partition due to metadata changes.

If the DS4700 is active-active, you can manually load-balance the storage paths by making VMFS1 active on path A and passive on path B -- then doing the opposite for VMFS2.

As for whether to create 2 arrays or 1 array, perhaps someone with DS4700-specific experience could weigh in -- but having more spindles behind any LUN helps in general, and having one array vs. two generally helps with disk allocation (does the DS4700 configure spare disks per array or per chassis?).

Doug Baer, Solution Architect, Advanced Services, Broadcom | VCDX #019, vExpert 2012-23
0 Kudos
jspilon
Enthusiast
Enthusiast
Jump to solution

Thank you all for replying; this thread helped me understand the concept in a little more detail. I will opt for a more conservative option, since we want to use features that cause SCSI reservations, such as DRS and snapshots... and I do not have spare drives in the SAN at the moment to move the VMs to if the setup doesn't deliver as expected.

klich
Enthusiast

DougBaer,

Seems you have some experience troubleshooting the use of LUSEs on HDS arrays. Would you mind sharing how you were able to capture the latency times with the 26-way LUSEs?

We're seeing a similar issue with LUSEs (our largest is 16-way), but I need some way to prove the issue to my storage team so they will take action. I would like to implement the VMware/HDS recommendation of not using LUSEs, but I need backing to support this change.

Thank you.
