SCSI Reservation Conflicts, Raw Device Mappings, a...

archITech · ‎09-13-2007

Here's my config :

-VI 3

-HP pClass blades connected to an HP EVA6000

- latest HP firmware on the 6000

- hosts setup as VMWare (per HP best practices)

-LUNS presented for VMFS volumes (no troubles here)

-LUNS presented to all blades and setup as raw device mappings to a virtual Microsoft SQL Server Cluster (using MS Clustering Services)

- setup per VMWare's best practices on VM MS Clusters (unless I missed a critical setup, which I won't discount).

Here's the problem. There are approximately a half-dozen LUN's presented in this way. When going to add new storage (VMFS volumes), it is taking anywhere between 6-7 minutes to finish up with the SCSI reservation timeouts on these LUNs (confirmed that this is the issue by tailing /var/log/vmkernel).

No other LUNS have problems... just the ones that are presented to the cluster. We have other RAW device mappings to Windows VM's that work fine (and don't have the reservation conflict issue).

I was originally getting timeouts in the client, but increasing the timeout value to 10 minutes in the client makes it so I can add storage, but it is taking quite a long time (10 blades need to have this done).

The problem occurs when accessing the configuration either through VC or directly on the host itself.

Here's my question... is there any way to make ESX not touch those LUNs when performing tasks like adding new storage or this something I have to live with? Also-- am I doing it all wrong and is there some configuration that results in this not happening?

We have no performance issues, problems with fail-overs or anything else. The only problems come when trying to work with the storage.

Any thoughts or suggestions are much appreciated.

--Brad Watson

Texiwill · ‎09-13-2007

Hello,

SCSI reservation conflicts seen by the ESX Server are caused by doing two many simultaneous LUN activities on a single LUN. The metadata is getting locked constantly. Since you are seeing them within ESX, we must investigate that.

Can you give us the pertinent parts of the log file?

Actions that can cause a conflict:

open, close file/link to RAW/RDM

change size of file/link to RAW/RDM

create/delete file/link to RAW/RDM

updating access/mod/create times of file/link to RAW/RDM

So when you create a RAW, you are creating a link to the RAW within the metadata, updating access, modification, and create times, and setting an initial size.

Are you doing ANY other actions on the LUN when you do this?

EVA6000 should be able to handle up to 4 actions simultaneously.

Also, the # of blades that see the LUN has an impact. How many see the LUN? > 8?

Best regards,

Edward

--
Edward L. Haletky
vExpert XIV: 2009-2023,
VMTN Community Moderator
vSphere Upgrade Saga: https://www.astroarch.com/blogs
GitHub Repo: https://github.com/Texiwill

archITech · ‎09-13-2007

The LUNs in question are all specifically for the purpose of presenting disk to MSCS (Microsoft Clustering) nodes. They are not used for anything else. Sorry... I was reading through my post and realized that I'm probably not clear enough.

It has to do with re-scanning and trying to add (new) storage, not related to those LUNs, on any ESX host in the cluster. There aren't any errors related to any other LUNs showing in the logs, just the ones that are used as raw device mappings to MSCS VM's.

We have other Windows VM's with raw device mappings (to other LUNs) and we have no problems. I suspect that this problem is due to the fact that MSCS puts reservations on LUNs to tie that LUN to the box that has the cluster disk resource at the time. ESX is trying to do a rescan on those reserved LUNs and getting reservation errors because the underlying MSCS VM has the LUN reserved. Theory only...

My question revolves around whether or not this behavior is by design or if I have something messed up in the config. If the former is the case, then is there any workaround? If the latter, then does anyone have any suggestions on how I might fix it? I followed the MSCS best practices document from VMWare when setting these up (raw device mappings are physical).

In answer to your questions, I am doing nothing else but re-scanning to add new storage and it's causing the reservation conflicts and timeouts. The only thing that is happening is that the MSCS cluster is up and serving data (it's a QA SQL server cluster).

There are currently 10 blades that can see the LUNs in question.

This problem has been ongoing, but was not enough of an annoyance to warrant the time to figure it out (we add a VMFS volume every couple of months), but the last time I went to add a new VMFS volume, I couldn't do it without increasing the timeout value in the client to almost 10 minutes.

Many thanks for the quick response.

--Brad

J_Baio · ‎09-17-2007

I have the exact same issue with running mscs with a physical and virtual node. I currently have a case open with VMWare about it but not much has come of it yet. Beware that it may get to a point where a rescan could crash your ESX host, it happened to me thus alerting me to all the reservation errors. I had called VMWare previous to the crash because rescans got very slow, but all they did was change the timeout and tell me it was a result of all the LUNs connected to our hosts. Shame on me for not digging further and seeing all the reservation errors which are causing the slow rescans.

Jae_Ellers · ‎09-17-2007

What version of ESX are you on? 3.0.1 needs 2 Gb drivers.

3.0.1 http://kb.vmware.com/kb/1560391

3.0.2 http://kb.vmware.com/kb/1002304

-=-=-=-=-=-=-=-=-=-=-=-=-=-=- http://blog.mr-vm.com http://www.vmprofessional.com -=-=-=-=-=-=-=-=-=-=-=-=-=-=-

J_Baio · ‎09-17-2007

I run Emulex 4gb cards so I am currently in violation of this as I run 3.0.1. Fortunately, I am moving to a new EMC DMX SAN in the coming months, and at the same time upgrading to 3.0.2, which now supports MSCS with 4GB drivers as I read the other day (also shown in the last link of your post)

Could this possibly clear the reservation issues?

archITech · ‎09-17-2007

All of the blades are HP P Class (8 are G1, 2 are G2) and are connected to the switches at 2gbps.

archITech · ‎09-17-2007

Looking at it further, however, reveals that I'm using the 4gb drivers apparently.

\[root@enc1bl1 root]# vmkload_mod -l
Name R/O Addr Length R/W Addr Length ID Loaded
vmkapimod 0x7b5000 0x1000 0x1dff070 0x1000 1 Yes
vmklinux 0x7b6000 0x18000 0x1e8b610 0x3e000 2 Yes
cciss 0x7ce000 0x6000 0x1ed3ab8 0x2000 3 Yes
qla2300_707 0x7d4000 0x44000 0x1ed7ba0 0x72000 4 Yes
tg3 0x818000 0x12000 0x1f53c48 0x4000 5 Yes
tcpip 0x82a000 0x3b000 0x1f58670 0x1b000 6 Yes
cosShadow 0x865000 0x3b000 0x1f756b8 0x1b000 7 Yes
migration 0x8a0000 0xe000 0x1f926d0 0x1000 8 Yes
lvmdriver 0x8ae000 0xc000 0x1f93888 0x2000 9 Yes
nfsclient 0x8ba000 0x11000 0x1f968a8 0x1000 10 Yes
vmfs3 0x8cb000 0x23000 0x1f99bc0 0x1000 11 Yes
vmfs2 0x8ee000 0x11000 0x1f9c460 0x11000 12 Yes

I'll get patching coordinated and we'll see if getting to 3.0.2 (or at least getting the right drivers installed) fixes everything. I'll re-post the results, but it'll be a week or so.

Thanks for the info.

murreyaw · ‎09-17-2007

This is going to sound crazy, however rather than using RDM, if available direct mount iSCSI via the Windows iSCSI initiator. Tons faster than RDM. I couldn't tell if you were using fiber, or iSCSI. At VMWorld, Bluelock, showed that iSCSI initiator inside the VM, rather than using ESX's initiator for the data drives was significantly faster than RDM of a lun through the ESX intiator.

J_Baio · ‎09-17-2007

Please do post your results, I may not be able to get my tests done by then and would like to know what happens.

Anton_Kolomyeyt · ‎09-18-2007

This is 200% true. For now guest OS iSCSI initiator access is much faster then ESX own mapping. Unfortunately...

-a

archITech · ‎09-20-2007

Wow... what really sucks in that case is that we're using fibre, through-and-through. I don't have iSCSI in my environment at all (yet).

So far, upgrading a couple of the blades to 3.0.2 has not resolved the issue. I want to wait until they are all upgraded to 3.0.2 before I make any hard statements about it though (we're in the middle of upgrading today... and there's 10 blades to do).

The other bothersome thing for me is that we have 4gb HBAs (in Mezzanine) on the blades, connected to 4gb switches, running 4gb drivers, and I'm running at 2gb. The next thing I'm going to do is to down a blade, force it to 4gb on the switch and bring the blade back up. Thoughts on whether or not anything thinks this will help is appreciated. We are connecting to both an EVA6000 running at 2gb (2x2gb per switch) and an EVA8000 running at 4gb (4x4gb per switch).

Thanks for the additional info and I'll keep everyone posted on our results.

archITech · ‎10-09-2007

Just a quick update. We've upgraded to the latest and greatest on 9/10 blades. At this time, the problem is still going on. I'm going to get the other blade upgraded, just to be sure and then place a support call to VMWare. LUN rescans are still getting hung up on the LUNs presented for MSCS because they have reservations. Given all the LUN problems with presenting RAW to a VM, I think we're going to look at iSCSI as soon as can.

murreyaw · ‎10-09-2007

iscsi with the initiator in the VM has worked flawlessly for me everytime.

IB_IT · ‎02-28-2008

Hi archITech,

just curious of the status on this...were you ever able to get this resolved?

DROCK_IN_NC · ‎06-05-2008

Here is a way to speed up the actions you need to do while you have MSCS scsi reservation conflicts causing serious delays.

Advanced Configuration -> SCSI -> Scsi.ConflictRetries = 10 <--- SET THIS TO 10 OR LESS

This is set to 80 by default which means ESX will try to get a hold of that lun 80 times before giving up. This is much much much better than messing with the server timeout which is probably set to 10+ mins.

This is just some of the things I have done as I try to get rid of / deal with my MSCS Scsi reservation conflict issues.

A13xxx · ‎01-08-2009

Damn i have just encountered the exact same problem , now my server takes over 30 minutes to boot! i have reduced it down to 10 but it doesnt help much with the boot time. It would be nice if VMware could address this issue

Texiwill · ‎01-09-2009

Hello,

This ESX setting will force SCSI Reservation Conflicts to occur faster. THereby causing failures to be seen faster.

Advanced Configuration -> SCSI -> Scsi.ConflictRetries = 10 <--- SET THIS TO 10 OR LESS

This is set to 80 by default which means ESX will try to get a hold of that lun 80 times before giving up. This is much much much better than messing with the server timeout which is probably set to 10+ mins.

Actually it will send the SCSI Reservation Request 80 times before giving up.

During a boot, are you booting all blades at once or just one at a time? If you boot all blades at once you WILL get reservation conflicts as this is an intensive action.

I would make sure your EVA has the most recent firmware, as well as the switches, and the firmware on the HBAs is the most recent as well. I would also investigate any SAN logs and correlate them to the VMkernel logs. Perhaps there is a problem within the SAN as well. If the problem continues I would contact HP Support. It sounds like the SAN is not responding fast enough to the requests.

Best regards,

Edward L. Haletky

VMware Communities User Moderator

====

Blue Gears and SearchVMware Pro Blogs: http://www.astroarch.com/wiki/index.php/Blog_Roll

Top Virtualization Security Links: http://www.astroarch.com/wiki/index.php/Top_Virtualization_Security_Links

--
Edward L. Haletky
vExpert XIV: 2009-2023,
VMTN Community Moderator
vSphere Upgrade Saga: https://www.astroarch.com/blogs
GitHub Repo: https://github.com/Texiwill

smudger · ‎01-09-2009

Hi - interesting discussion and I've had similar in our labs.

One thing I haven't read is how many VMs you are running on your LUNs, or problem LUNs? imo the reservation conflicts always start appearing when the LUNs is saturated with too many VMs, utlimately the LUN starts timing out and disappears......

Thanks.

Texiwill · ‎01-09-2009

Hello,

An overloaded fabric would cause timeouts as well. This depends not as much on the number of VMs during boot, but the amount of overall fabric traffic which could be VMs, backups, etc. I would still consult the SAN logs....

Best regards,

Edward L. Haletky

VMware Communities User Moderator

====

Blue Gears and SearchVMware Pro Blogs: http://www.astroarch.com/wiki/index.php/Blog_Roll

Top Virtualization Security Links: http://www.astroarch.com/wiki/index.php/Top_Virtualization_Security_Links

--
Edward L. Haletky
vExpert XIV: 2009-2023,
VMTN Community Moderator
vSphere Upgrade Saga: https://www.astroarch.com/blogs
GitHub Repo: https://github.com/Texiwill

All

SCSI Reservation Conflicts, Raw Device Mappings, and Windows Clustering.