bradyk87
Contributor

Issue with clustered RDMs and storage outages

Hi all

We have a number of clusters that each contain about 15 hosts. We also use RDMs quite heavily for Microsoft failover clusters in our environment, with up to 70 RDMs. Our SAN array is an EMC VNX 7500. All hosts within each cluster are defined in a storage group on the array.

ESXi hosts are Dell M620s, M630s and R730s, running ESXi 5.5 Update 3.

All works well on a day-to-day basis, however we have been having issues with random clusters experiencing a failure/failover whenever we add a new host to the host group on the SAN array. When the host is added to the storage group it automatically kicks off a storage rescan (I can see this because the datastores start appearing on the host automatically). Some time after the host is added to the storage group, anywhere from 15 minutes up to 5 hours, some of the clusters start failing because the physical disks they use become unavailable. Errors we are seeing in the event log:

Cluster resource 'INST01_Log' of type 'Physical Disk' in clustered role 'SQL Server (clustername\INST01)' failed.

Ownership of cluster disk 'INST02_Data' has been unexpectedly lost by this node. Run the Validate a Configuration wizard to check your storage configuration.

In most cases the cluster will successfully fail over to the passive node. In other instances I'll need to manually bring the disk resource back online if it hasn't automatically recovered.

The reason it takes such a long time before an issue is seen is that when the RDMs are scanned for the first time, there is a SCSI reservation on them which does not allow them to be read; the scan waits for each device to time out before moving on to the next one. As good practice we perennially reserve all of our cluster RDMs, however it's not possible to do this until the disk has been presented to the host for the first time. If we happen to reboot a host that hasn't had its disks perennially reserved yet, it can take up to 6 hours to start responding.
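For reference, this is how we flag the cluster RDMs as perennially reserved. It has to be done per device on every host, which is why it can't help until the LUN has been visible to the host at least once. The naa ID below is a placeholder; substitute your own device identifier:

```shell
# Mark the MSCS RDM LUN as perennially reserved on this host
# (naa.600601601234567890 is a placeholder device ID)
esxcli storage core device setconfig -d naa.600601601234567890 --perennially-reserved=true

# Verify the flag took effect: look for "Is Perennially Reserved: true"
esxcli storage core device list -d naa.600601601234567890 | grep -i "perennially"
```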

We logged a job with VMware, however they came back saying that the issue is being caused by the array and that we should contact EMC. I don't necessarily agree with this, as things normally operate fine; it's only when a host is added for the first time and a scan takes place that some sort of lock on the RDM prevents the MSCS cluster from being able to read/write to it. We have seen no issues with the VMFS datastores themselves.

Has anyone else seen this, or know what could be causing the issue? Should a host scanning an RDM that is in use by an MSCS cluster cause the cluster to fail?

Cheers
Brady

6 Replies
PaulLab3
Enthusiast

Are you using the EMC multipathing driver or the native one?

Some time ago I had a problem with MS cluster validation with RDMs from an HDS G200.

The solution for me was setting the Most Recently Used policy for multipathing (VMware native driver).

bradyk87
Contributor

We are using the native multipathing driver.

The LUNs are currently set to use Round Robin. I can set the LUNs to use MRU on the existing hosts, however I'm not sure if this will help when a new host is added to the storage group. I'll give it a shot in our test environment regardless and see how things go.
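For anyone following along, switching an existing device's path selection policy to MRU can be done per device from the CLI (the naa ID below is a placeholder):

```shell
# Change the RDM LUN's path selection policy from Round Robin to Most Recently Used
esxcli storage nmp device set --device naa.600601601234567890 --psp VMW_PSP_MRU

# Confirm the active policy for the device
esxcli storage nmp device list --device naa.600601601234567890
```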

We are also playing with the idea of disabling VAAI, as we have seen issues with it in the past. We think the ATS locking primitive could in fact be causing the issue, based on an article we found.
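If we do go down that route, the ATS locking primitive can be toggled per host through the advanced settings (setting the value back to 1 re-enables it); a sketch of what we're planning to test:

```shell
# Disable the VAAI ATS (hardware-assisted locking) primitive on this host
esxcli system settings advanced set --int-value 0 --option /VMFS3/HardwareAcceleratedLocking

# Check the current value of the setting
esxcli system settings advanced list --option /VMFS3/HardwareAcceleratedLocking
```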

RAJ_RAJ
Expert

Hi,

Try spreading the RDMs across different virtual SCSI controllers, two devices per controller; if a disk is larger than 200 GB, give it its own controller:

OS disk - SCSI 0:0

MSDTC and quorum - SCSI 1:0, SCSI 1:1

Other RDM - SCSI 2:0

Next - SCSI 3:0

Also check on the EMC side whether the owner of the RDM LUN is changing; try pinning each LUN to SPA or SPB. In some cases, when the load on a LUN increases it trespasses between SPA and SPB, and the disk resource can fail at that moment.

RAJESH RADHAKRISHNAN VCA -DCV/WM/Cloud,VCP 5 - DCV/DT/CLOUD, ,VCP6-DCV, EMCISA,EMCSA,MCTS,MCPS,BCFA https://ae.linkedin.com/in/rajesh-radhakrishnan-76269335 Mark my post as "helpful" or "correct" if I've helped resolve or answered your query!
MJNY
Contributor

Hi,

Have you found a solution to this? We are running into the exact same issue. As soon as we add a new ESXi host to the VMware storage group on the SAN, the MS clustered VMs lose access to their disks and the cluster fails.

Thank you,

Mike

gferreyra
Enthusiast

We have.

We experienced the same situation.

VMware told us our ESXi hosts were not up to date. That's all.

It was a bug, fixed in a later patch.

Now we have a cluster with a paravirtual SCSI controller and 5 TB of clustered disks.

100% functional.

Cheers!

anthonymaw
Contributor

We experienced this problem (NetApp, Cisco UCS, VMware 6.x). As someone else pointed out, the solution is to change the storage multipath setting from "Round Robin" to "Most Recently Used (VMware)".

The issue seems to be that the Windows Failover Clustering service running on each node periodically checks disk ownership by sending a SCSI-3 persistent reservation command; this is part of how its storage failure detection mechanism works. Normally the owning node receives a SCSI acknowledgement. With Round Robin, however, the reservation set/check command can go out one path while the reply comes back on another, so the owning node never sees a response and assumes its storage is down. The other nodes also send SCSI commands to check whether the LUN's persistent reservation is set, and may or may not receive a response, leaving none of the nodes sure whether any particular node has lost storage access. It's all documented in the Microsoft Failover Clustering storage management information.

This seems to be an issue mainly in virtualised environments like VMware. In a physical multi-server Windows failover cluster, where the Windows OS is installed on real servers with shared disks, one would install the Windows MPIO driver provided by the storage vendor to solve the problem of SCSI commands going out one path and replies coming back on another.
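If you want newly presented LUNs to pick up MRU automatically, rather than changing each device after the fact, you can change the default PSP at the SATP level. The SATP name below is only an example; check what your array's LUNs actually claim first:

```shell
# List the installed SATPs and their current default path selection policies
esxcli storage nmp satp list

# Example: make MRU the default for an ALUA CLARiiON/VNX SATP
# (VMW_SATP_ALUA_CX is an assumption; use the SATP shown for your LUNs)
esxcli storage nmp satp set --satp VMW_SATP_ALUA_CX --default-psp VMW_PSP_MRU
```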
