VMware Cloud Community
bradyrandolph
Contributor

Migrated MS Cluster Failure

Hi-

I migrated our virtualized MS clusters from ESX 2.5 to ESX 3.0 last week, and we are now seeing major issues that appear to involve the shared disks. The event logs contain many errors such as "Reservation of cluster disk 'Disk *' has been lost" and "Delay write failed". We do have other clusters running fine in our VI3 environment, but those were built from scratch in VI3.

Has anyone run into similar issues?

Thanks!

6 Replies
rubensluque
Enthusiast

Is this an intra-box MS cluster connected to Fibre Channel storage? If you upgrade to ESX 3.0.1 or ESX 3.0.2, you need to install the 2 Gbps HBA drivers. VMware supports only the 2 Gbps drivers for MSCS clusters.

Check this KB to get more details:

http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&externalId=1560391

bradyrandolph
Contributor

We are using ESX 3.0.2, and the host was built from scratch. The VMs previously resided on a 2.5.3 host.

So we need to downgrade our HBA driver? What would be the reason for this?

bradyrandolph
Contributor

I changed the HBA driver to 2 Gb, but that didn't resolve the issue. Here is what I am seeing in the cluster logs:

00000f6c.00000f78::2007/09/10-20:22:03.416 ERR Physical Disk : IsAlive, error checking device, error 170.

I have disabled the cluster service on the passive node to eliminate any disk sharing between VMs. But error 170 means “The requested resource is in use”, so why is there a SCSI reservation that the system isn't accounting for?
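
For anyone who wants to count how often this error shows up, here is a minimal Python sketch that splits a cluster log line like the one above into fields and pulls out the trailing error code. The field names (pid, tid, source, and so on) and the function name are my own assumptions based on this single sample line, not a documented log format, and the lookup table holds only the one code quoted in this thread.

```python
import re

# Pattern inferred from the single sample line in this thread (an assumption,
# not a documented cluster log grammar).
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-fA-F]+)\.(?P<tid>[0-9a-fA-F]+)::"
    r"(?P<timestamp>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<source>.*?)\s*:\s*(?P<message>.*)$"
)

# Only the code mentioned in this thread; extend as needed.
KNOWN_CODES = {170: "The requested resource is in use"}


def parse_cluster_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Look for a trailing "error <number>" in the message text.
    code_match = re.search(r"error (\d+)\.?$", fields["message"])
    fields["error_code"] = int(code_match.group(1)) if code_match else None
    fields["error_text"] = KNOWN_CODES.get(fields["error_code"])
    return fields


if __name__ == "__main__":
    sample = (
        "00000f6c.00000f78::2007/09/10-20:22:03.416 ERR Physical Disk : "
        "IsAlive, error checking device, error 170."
    )
    print(parse_cluster_log_line(sample))
```

Running this over the whole log and grouping by source and error code would show whether every shared disk is affected or only some of them.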

Thanks.

Texiwill
Leadership

Hello,

Is the shared storage for the MSCS nodes on VMDKs or RDMs? Is the C: (boot) drive on local storage or remote storage?

The MSCS guide for ESX v3 (http://www.vmware.com/pdf/vi3_301_201_mscs.pdf) suggests that:

Boot drives go on local storage, NOT SAN/remote storage.

Shared drives become RDMs.

Best regards,

Edward

--
Edward L. Haletky
vExpert XIV: 2009-2023,
VMTN Community Moderator
vSphere Upgrade Saga: https://www.astroarch.com/blogs
GitHub Repo: https://github.com/Texiwill
bradyrandolph
Contributor

We currently have MSCS working on VI3 with the boot drive on SAN, which is why we decided to migrate our clusters from 2.5.3 to VI3.

We have been looking into moving our boot drives to local disk.

All of our disk files are VMDKs. What are RDMs?

bradyrandolph
Contributor

Here was the issue. We stacked all of our shared VMDKs on one LUN, hoping to save SAN space. When a shared disk is in use, a SCSI reservation lock is placed on that VMDK and on the whole LUN, which causes the "Write Failed" errors for the other disks on it. That makes perfect sense. We split our VMDKs out to separate LUNs and everything was back to normal.

FYI: the VMDKs must be of the "Thick" type.
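
The layout rule described above (no two shared VMDKs on the same LUN) can be checked with a short script. This is only a sketch: it assumes each datastore sits on exactly one LUN, that disk locations are written as "[datastore] folder/file.vmdk", and that you have already collected the list of shared disk paths by hand. The datastore and file names in the example are made up.

```python
import re
from collections import defaultdict

# Matches paths of the form "[datastore] folder/file.vmdk".
DATASTORE_PATH_RE = re.compile(r"^\[(?P<datastore>[^\]]+)\]\s*(?P<path>.+)$")


def group_by_datastore(disk_paths):
    """Group shared-disk paths by the datastore named in square brackets."""
    groups = defaultdict(list)
    for disk_path in disk_paths:
        match = DATASTORE_PATH_RE.match(disk_path.strip())
        if match is None:
            raise ValueError(f"Unrecognized datastore path: {disk_path!r}")
        groups[match.group("datastore")].append(match.group("path"))
    return dict(groups)


def crowded_datastores(disk_paths):
    """Return only the datastores that hold more than one shared disk."""
    groups = group_by_datastore(disk_paths)
    return {ds: paths for ds, paths in groups.items() if len(paths) > 1}


if __name__ == "__main__":
    # Hypothetical inventory: two shared disks stacked on one LUN, one alone.
    shared_disks = [
        "[san_lun_01] cluster1/quorum.vmdk",
        "[san_lun_01] cluster1/data.vmdk",
        "[san_lun_02] cluster2/quorum.vmdk",
    ]
    for datastore, paths in crowded_datastores(shared_disks).items():
        print(f"{datastore} holds {len(paths)} shared disks: {paths}")
```

An empty result means every shared disk has its own datastore, which matches the layout that fixed the problem here.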

Thanks,

Brady
