23 Replies. Latest reply on Mar 1, 2010 7:02 AM by vcpguy

    SCSI Reservation Conflicts, Raw Device Mappings, and Windows Clustering.

    archITech Novice

      Here's my config:

      - VI 3
      - HP pClass blades connected to an HP EVA6000
           - latest HP firmware on the 6000
           - hosts set up as VMware (per HP best practices)
      - LUNs presented for VMFS volumes (no troubles here)
      - LUNs presented to all blades and set up as raw device mappings to a virtual Microsoft SQL Server cluster (using MS Clustering Services)
           - set up per VMware's best practices on VM MS clusters (unless I missed a critical setup step, which I won't discount)

       

      Here's the problem.  There are approximately half a dozen LUNs presented in this way.  When I add new storage (VMFS volumes), the operation takes 6-7 minutes to finish because of SCSI reservation timeouts on these LUNs (confirmed by tailing /var/log/vmkernel).
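
      For anyone who wants to summarise the log instead of watching it scroll by, here is a minimal Python sketch that counts reservation-related lines per device. It rests on two assumptions that are not confirmed anywhere in this thread: that the relevant vmkernel lines contain the word "reservation", and that they name the device in the vmhbaA:T:L form (adapter:target:LUN). The function name count_conflicts is made up for illustration; adjust both patterns to whatever your log actually prints.

```python
import re
from collections import Counter

# Assumption: the device appears as vmhbaA:T:L (adapter:target:LUN).
DEVICE = re.compile(r"vmhba\d+:\d+:\d+")


def count_conflicts(lines):
    """Return a Counter mapping device path -> number of reservation lines."""
    counts = Counter()
    for line in lines:
        # Assumption: conflict messages contain the word "reservation".
        if "reservation" not in line.lower():
            continue
        match = DEVICE.search(line)
        # Lines without a recognisable device path are grouped as "unknown".
        counts[match.group(0) if match else "unknown"] += 1
    return counts
```

      On a host you could feed it the log directly, for example count_conflicts(open("/var/log/vmkernel")), and then compare the devices it reports against the LUNs mapped to the cluster.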

       

      No other LUNs have problems... just the ones that are presented to the cluster.  We have other raw device mappings to Windows VMs that work fine (and don't have the reservation conflict issue).

       

      I was originally getting timeouts in the client. Increasing the client timeout value to 10 minutes lets me add storage, but it still takes quite a long time (and 10 blades need to have this done).

       

      The problem occurs when accessing the configuration either through VC or directly on the host itself.

       

      Here's my question... is there any way to make ESX not touch those LUNs when performing tasks like adding new storage, or is this something I have to live with?  Also, am I doing it all wrong, and is there some configuration that avoids this?

       

      We have no performance issues, problems with fail-overs or anything else.  The only problems come when trying to work with the storage.

       

      Any thoughts or suggestions are much appreciated.

       

      --Brad Watson

        • 1. Re: SCSI Reservation Conflicts, Raw Device Mappings, and Windows Clustering
          Texiwill Guru
          vExpertUser Moderators

          Hello,

           

          SCSI reservation conflicts seen by the ESX Server are caused by doing too many simultaneous activities on a single LUN. The metadata is getting locked constantly. Since you are seeing them within ESX, we must investigate that.

           

          Can you give us the pertinent parts of the log file?

           

          Actions that can cause a conflict:

                   open, close file/link to RAW/RDM

                   change size of file/link to RAW/RDM

                   create/delete file/link to RAW/RDM

                   updating access/mod/create times of file/link to RAW/RDM

           

          So when you create a RAW, you are creating a link to the RAW within the metadata, updating access, modification, and create times, and setting an initial size.

           

          Are you doing ANY other actions on the LUN when you do this?

           

          EVA6000 should be able to handle up to 4 actions simultaneously.

           

          Also, the # of blades that see the LUN has an impact. How many see the LUN? > 8?

           

          Best regards,

          Edward

          • 2. Re: SCSI Reservation Conflicts, Raw Device Mappings, and Windows Clustering
            archITech Novice

            The LUNs in question are all specifically for the purpose of presenting disk to MSCS (Microsoft Clustering) nodes.  They are not used for anything else.  Sorry... I was reading through my post and realized that I'm probably not clear enough.

             

            It has to do with re-scanning and trying to add (new) storage, not related to those LUNs, on any ESX host in the cluster.  There aren't any errors related to any other LUNs showing in the logs, just the ones that are used as raw device mappings to MSCS VM's.

             

            We have other Windows VM's with raw device mappings (to other LUNs) and we have no problems.  I suspect that this problem is due to the fact that MSCS puts reservations on LUNs to tie that LUN to the box that has the cluster disk resource at the time.  ESX is trying to do a rescan on those reserved LUNs and getting reservation errors because the underlying MSCS VM has the LUN reserved.  Theory only...

             

            My question revolves around whether or not this behavior is by design or if I have something messed up in the config.  If the former is the case, then is there any workaround?  If the latter, then does anyone have any suggestions on how I might fix it?  I followed the MSCS best practices document from VMWare when setting these up (raw device mappings are physical).

             

            In answer to your questions, I am doing nothing else but re-scanning to add new storage and it's causing the reservation conflicts and timeouts.  The only thing that is happening is that the MSCS cluster is up and serving data (it's a QA SQL server cluster).

             

            There are currently 10 blades that can see the LUNs in question.

             

            This problem has been ongoing, but was not enough of an annoyance to warrant the time to figure it out (we add a VMFS volume every couple of months), but the last time I went to add a new VMFS volume, I couldn't do it without increasing the timeout value in the client to almost 10 minutes.  

             

            Many thanks for the quick response.

             

            --Brad

            • 3. Re: SCSI Reservation Conflicts, Raw Device Mappings, and Windows Clustering
              J Baio Novice

              I have the exact same issue running MSCS with a physical and a virtual node.  I currently have a case open with VMware about it, but not much has come of it yet.  Beware that it may get to a point where a rescan could crash your ESX host. That happened to me, and it is what alerted me to all the reservation errors.  I had called VMware before the crash because rescans had become very slow, but all they did was change the timeout and tell me it was a result of all the LUNs connected to our hosts.  Shame on me for not digging further and seeing all the reservation errors that are causing the slow rescans.

              • 4. Re: SCSI Reservation Conflicts, Raw Device Mappings, and Windows Clustering
                Jae Ellers Master

                What version of ESX are you on?  3.0.1 needs 2 Gb drivers.

                3.0.1 http://kb.vmware.com/kb/1560391

                 

                3.0.2 http://kb.vmware.com/kb/1002304

                • 5. Re: SCSI Reservation Conflicts, Raw Device Mappings, and Windows Clustering
                  J Baio Novice

                  I run Emulex 4gb cards, so I am currently in violation of this as I run 3.0.1.  Fortunately, I am moving to a new EMC DMX SAN in the coming months and upgrading to 3.0.2 at the same time. As I read the other day, 3.0.2 now supports MSCS with 4gb drivers (also shown in the last link of your post).

                   

                   

                  Could this possibly clear the reservation issues?

                  • 6. Re: SCSI Reservation Conflicts, Raw Device Mappings, and Windows Clustering
                    archITech Novice

                    All of the blades are HP P Class (8 are G1, 2 are G2) and are connected to the switches at 2gbps.

                    • 7. Re: SCSI Reservation Conflicts, Raw Device Mappings, and Windows Clustering
                      archITech Novice

                      Looking at it further, however, it appears that I'm using the 4gb drivers.

                       


                      [root@enc1bl1 root]# vmkload_mod -l
                      Name          R/O Addr   Length    R/W Addr    Length    ID   Loaded
                      vmkapimod     0x7b5000   0x1000    0x1dff070   0x1000    1    Yes
                      vmklinux      0x7b6000   0x18000   0x1e8b610   0x3e000   2    Yes
                      cciss         0x7ce000   0x6000    0x1ed3ab8   0x2000    3    Yes
                      qla2300_707   0x7d4000   0x44000   0x1ed7ba0   0x72000   4    Yes
                      tg3           0x818000   0x12000   0x1f53c48   0x4000    5    Yes
                      tcpip         0x82a000   0x3b000   0x1f58670   0x1b000   6    Yes
                      cosShadow     0x865000   0x3b000   0x1f756b8   0x1b000   7    Yes
                      migration     0x8a0000   0xe000    0x1f926d0   0x1000    8    Yes
                      lvmdriver     0x8ae000   0xc000    0x1f93888   0x2000    9    Yes
                      nfsclient     0x8ba000   0x11000   0x1f968a8   0x1000    10   Yes
                      vmfs3         0x8cb000   0x23000   0x1f99bc0   0x1000    11   Yes
                      vmfs2         0x8ee000   0x11000   0x1f9c460   0x11000   12   Yes
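
                      With 10 blades to check, it may be easier to script this than to read each listing by eye. The Python sketch below pulls the loaded module names out of a saved copy of the vmkload_mod -l output and filters for likely Fibre Channel HBA drivers. The function names are hypothetical, and the prefixes "qla" (QLogic) and "lpfc" (Emulex) are my assumption about driver naming, so extend them as needed. The parser relies only on the seven-column row layout shown in the listing above.

```python
def loaded_modules(listing):
    """Return names of modules whose Loaded column is "Yes"."""
    names = []
    for line in listing.splitlines():
        fields = line.split()
        # Data rows have 7 columns: name, R/O addr, length, R/W addr,
        # length, ID, loaded flag. The header and prompt lines do not.
        if len(fields) == 7 and fields[-1] == "Yes":
            names.append(fields[0])
    return names


def fc_hba_modules(names, prefixes=("qla", "lpfc")):
    """Keep module names starting with an assumed FC HBA driver prefix."""
    return [name for name in names if name.startswith(prefixes)]
```

                      Run against the listing above, the filter should leave only qla2300_707, which is the module name to compare against the driver versions named in the KB articles.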

                       

                      I'll get patching coordinated and we'll see if getting to 3.0.2 (or at least getting the right drivers installed) fixes everything.  I'll re-post the results, but it'll be a week or so.

                       

                      Thanks for the info.

                      • 8. Re: SCSI Reservation Conflicts, Raw Device Mappings, and Windows Clustering.
                        murreyaw Hot Shot
                        vExpert

                        This is going to sound crazy, but rather than using RDM, if it's available, direct-mount iSCSI via the Windows iSCSI initiator.  Tons faster than RDM.  I couldn't tell if you were using fiber or iSCSI.  At VMworld, Bluelock showed that using the iSCSI initiator inside the VM for the data drives, rather than ESX's initiator, was significantly faster than an RDM of a LUN through the ESX initiator.

                        • 9. Re: SCSI Reservation Conflicts, Raw Device Mappings, and Windows Clustering
                          J Baio Novice

                          Please do post your results, I may not be able to get my tests done by then and would like to know what happens.

                          • 10. Re: SCSI Reservation Conflicts, Raw Device Mappings, and Windows Clustering
                            Anton Kolomyeytsev Enthusiast

                            This is 200% true. For now, guest OS iSCSI initiator access is much faster than ESX's own mapping. Unfortunately...

                             

                            -a

                            • 11. Re: SCSI Reservation Conflicts, Raw Device Mappings, and Windows Clustering
                              archITech Novice

                              Wow... what really sucks in that case is that we're using fibre, through-and-through.  I don't have iSCSI in my environment at all (yet). 

                               

                              So far, upgrading a couple of the blades to 3.0.2 has not resolved the issue.  I want to wait until they are all upgraded to 3.0.2 before I make any hard statements about it though (we're in the middle of upgrading today... and there's 10 blades to do).

                               

                              The other bothersome thing for me is that we have 4gb HBAs (in Mezzanine) on the blades, connected to 4gb switches, running 4gb drivers, and I'm running at 2gb.  The next thing I'm going to do is to down a blade, force it to 4gb on the switch and bring the blade back up.  Thoughts on whether anyone thinks this will help are appreciated.  We are connecting to both an EVA6000 running at 2gb (2x2gb per switch) and an EVA8000 running at 4gb (4x4gb per switch).

                               

                              Thanks for the additional info and I'll keep everyone posted on our results.

                              • 12. Re: SCSI Reservation Conflicts, Raw Device Mappings, and Windows Clustering.
                                archITech Novice

                                Just a quick update.  We've upgraded to the latest and greatest on 9 of the 10 blades.  At this time, the problem is still going on.  I'm going to get the last blade upgraded, just to be sure, and then place a support call to VMware.  LUN rescans are still getting hung up on the LUNs presented for MSCS because they have reservations.  Given all the LUN problems with presenting raw devices to a VM, I think we're going to look at iSCSI as soon as we can.

                                • 13. Re: SCSI Reservation Conflicts, Raw Device Mappings, and Windows Clustering.
                                  murreyaw Hot Shot
                                  vExpert

                                  iSCSI with the initiator in the VM has worked flawlessly for me every time.

                                  • 14. Re: SCSI Reservation Conflicts, Raw Device Mappings, and Windows Clustering.
                                    IB_IT Expert

                                    Hi archITech,

                                    just curious of the status on this...were you ever able to get this resolved?
