VMware Cloud Community
jmarti05
Contributor

Path Failover Causes Quorum Disk Failure

I've recently had two MSCS failures related to path changes. My environment consists of two-node Active/Passive Windows 2003 R2 clusters on ESX 3.5 build 199239. Our storage is fully redundant: 2 x QLogic HBAs in the hosts, 2 x McDATA Sphereon 4700 switches, and 2 x controllers on a Hitachi AMS 500. The OS disks are VMDKs on the SAN; the shared disks are RDMs. The two scenarios come from different environments, but the environments are identical.

In the first case, I was changing the path policy to Fixed, with the preferred path on a different controller, for an MSCS file cluster. The cluster immediately went down. I was able to get all of the cluster disk resources online with the /fixquorum switch, but the quorum drive was not recognised. As soon as I switched the path back to the original controller, the cluster and quorum drive came up without issue.

In the second case, we had a planned path failover on an MS-SQL 2005 cluster while updating controller firmware. The cluster failed immediately following the path failover. This time, however, reverting the path once the controller was back online did not fix the issue. The disk showed up in Windows Disk Management as unreadable, and I could not format or repartition it. I eventually had to present a new quorum LUN to my host, attach it to the two nodes, and do a cluster recovery of the quorum drive.

It seems to me that, for some reason, the path change on the quorum drive is not being "masked" by VMware. On our physical clusters, the Hitachi multipath software takes care of this, so the host doesn't know anything has changed. Has anyone run into this before? I wonder if we've got a configuration issue. I'm going to set up a small two-node cluster and test later this week, and any input would be appreciated.

Thanks,

-Jonathan
