PhilSc1
Contributor

vSphere Data Protection (VDP) "Backup/Restore VM" causes a single-node Microsoft Cluster (MS Cluster) running in a single VM to lose its physical disks (Error 1038): any ideas? Supported config?

The environment is as follows:


A two-node physical MS Cluster was reduced to a single-node MS Cluster and then converted (P2V) to a VMware VM. This was done because it was believed that the DB2- and MQ Series-based application depended on MS Cluster and needed to keep running as one.

So we now have a single-node MS Cluster running in a single VM, with its VMDK disks under MS Cluster control.

Findings from troubleshooting through the Windows Event Log:

A "Create virtual machine snapshot" was executed and completed successfully.

A VDP snapshot via the proxy appliance was performed and "crashed". The log shows that a VDP "Backup / Restore VM" task (I am told it was only a backup, no restore) was executed and was immediately followed by MS Cluster Error 1038: loss of the physical disk resources.

Some 12 hours later a "Remove snapshot" task was performed, and at exactly that timestamp the "Backup / Restore VM" task showed Complete!

Questions:

1) Does VDP support a single-node MS Cluster with virtual disks (VMDK)? (I see no indication that it does, but the client felt it did.)

2) If the answer to 1) is yes, are any specific settings needed for it to work?

3) Any insights into why the VDP backup resulted in an immediate Error 1038, as described above?

Thanks in advance!

vNEX
Expert

Hello Phil,

I understand that your customer wants bulletproof information on this, so here is a bit about MSCS and snapshots:

With the bus-sharing mechanism in virtual mode, you share a single .vmdk across multiple VMs on one physical box.

When all the cluster nodes access the same disk, you must ensure during a backup operation that all participating nodes and cluster services are aware of that operation and take appropriate action to preserve data integrity and cluster service continuity.


Because VDP doesn't come with any VSS writers or in-guest agents for MSCS, a VDP snapshot operation manipulates the shared cluster resources (quorum and data disks) through one of the nodes. Since VDP and the cluster are not aware of each other, the snapshot operation (and quiescing) disrupts cluster operations. This results in a crash that can easily end in data loss, or in the errors you have already mentioned: Event ID 1038 (Cluster Storage Functionality).

This applies even to a single-node MSCS cluster!


Regarding your observation that the VDP snapshot via the proxy appliance "crashed", with the VDP "Backup / Restore VM" task (only a backup, no restore) immediately followed by MS Cluster Error 1038 (loss of physical disk resources):

This happens because VDP is configured by default to use the Hot-Add backup transport, in which the disks of the virtual machine being backed up are attached to the VDP appliance during the backup operation to speed up the whole process (this approach does not use the LAN for data transport); hence the "ownership error". For Hot-Add, you must have the proper host edition/license, and the VDP appliance's host must have local access to the VM's disks.
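As a rough illustration, the Hot-Add preconditions above can be modeled as a simple decision. This is a minimal Python sketch under stated assumptions: the class and field names are hypothetical, and the "network" fallback is an assumption about what happens when Hot-Add is unavailable. It models the reasoning in this answer, not VDP's actual transport-selection code.

```python
from dataclasses import dataclass


@dataclass
class ProxyContext:
    # Hypothetical field names mirroring the two Hot-Add preconditions above.
    host_license_allows_hotadd: bool  # proper host edition/license
    host_sees_vm_datastore: bool      # VDP's host has local access to the VM's disks


def expected_transport(ctx: ProxyContext) -> str:
    """Simplified model of which transport a VDP backup would use (not VDP's real logic)."""
    if ctx.host_license_allows_hotadd and ctx.host_sees_vm_datastore:
        # The protected VM's disks are attached to the VDP appliance; per this
        # answer, that attachment is what triggers the cluster's ownership error.
        return "hotadd"
    # Assumption: without Hot-Add, backup data travels over the LAN instead.
    return "network"


print(expected_transport(ProxyContext(True, True)))   # hotadd
print(expected_transport(ProxyContext(True, False)))  # network
```

In this model, removing either precondition (for example, datastore access) is enough to steer the backup away from Hot-Add.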

If you want to prevent the Hot-Add feature, you can migrate the VM to storage that is not accessible by the VDP appliance or its host, or disable the hot-plug device capability for the VM in its advanced configuration parameters (devices.hotplug set to false).
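The devices.hotplug parameter is normally set through the VM's advanced configuration parameters in the vSphere Client. For illustration only, here is a minimal Python sketch of what that change amounts to in a .vmx file's text. It assumes the usual one-entry-per-line `key = "value"` format and that the VM is powered off while the file is edited. The helper name `set_vmx_option` is hypothetical.

```python
import re


def set_vmx_option(vmx_text: str, key: str, value: str) -> str:
    """Return vmx_text with `key = "value"` set, replacing or appending the entry.

    Assumption: each .vmx entry sits on its own line as  key = "value".
    """
    # Match the whole line for this key (case-insensitive), without
    # swallowing neighbouring blank lines.
    pattern = re.compile(
        r'^[ \t]*' + re.escape(key) + r'[ \t]*=.*$',
        re.IGNORECASE | re.MULTILINE,
    )
    new_line = f'{key} = "{value}"'
    if pattern.search(vmx_text):
        # Replace existing entries; a lambda avoids backslash escaping in sub().
        return pattern.sub(lambda _m: new_line, vmx_text)
    # Otherwise append, terminating the previous last line if needed.
    if vmx_text and not vmx_text.endswith("\n"):
        vmx_text += "\n"
    return vmx_text + new_line + "\n"


vmx = 'displayName = "db2node"\n'
print(set_vmx_option(vmx, "devices.hotplug", "false"), end="")
# displayName = "db2node"
# devices.hotplug = "false"
```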

Because vSphere Data Protection depends on the vStorage APIs for Data Protection, which use VMware snapshot technology for online backups, you should not ask whether VDP supports MSCS, but whether snapshots are supported on MSCS:

Here are some of the official VMware statements:

VMware KB: Microsoft Clustering on VMware vSphere: Guidelines for supported configurations

Microsoft Clustering Services (MSCS) virtual machines use a shared Small Computer System Interface (SCSI) bus. Any virtual machine using a shared bus cannot make hot changes to virtual machine hardware as this will disrupt the heartbeat between the MSCS nodes. These activities are not supported and will cause MSCS node failover: 

  • Using snapshots
  • .....


VMware KB: Unable to use Snapshots or perform a backup on virtual machines configured with bus-shari...

Microsoft Clustering requires bus-sharing and therefore cannot be used in conjunction with Consolidated Backup, Data Recovery, or Snapshots.
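The support rule quoted from the KB articles can be expressed as a small check. This is a minimal Python sketch, not a VMware tool. The sharing-mode strings mirror vSphere's SCSI controller sharing modes (noSharing, virtualSharing, physicalSharing). The `runs_mscs` flag encodes this answer's position that an MSCS VM is affected even as a single node. The function name is hypothetical.

```python
# SCSI controller bus-sharing modes as named in vSphere.
SHARING_MODES = {"noSharing", "virtualSharing", "physicalSharing"}


def snapshot_backup_supported(controller_sharing: list, runs_mscs: bool) -> bool:
    """Return True only if snapshot-based backup is a supported configuration.

    controller_sharing: sharing mode of each SCSI controller in the VM.
    runs_mscs: whether the guest hosts MSCS-controlled disks (even single-node).
    """
    for mode in controller_sharing:
        if mode not in SHARING_MODES:
            raise ValueError(f"unknown sharing mode: {mode}")
    # Any shared bus rules out snapshots, and so does MSCS itself.
    return not runs_mscs and all(m == "noSharing" for m in controller_sharing)


print(snapshot_backup_supported(["noSharing"], runs_mscs=True))  # False
```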

So as a workaround, you may have to take the cluster node offline, take the backup, and bring the node back online, or use a third-party solution with array-based snapshots and in-guest backup agents, such as NetApp SnapManager.
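The offline-backup workaround is essentially "offline, back up, always bring back online". Here is a minimal Python sketch of that ordering. The three callables are hypothetical placeholders for whatever cluster and backup tooling is actually in use; nothing here is VDP- or MSCS-specific. The try/finally ensures the node is brought back online even if the backup step fails.

```python
from typing import Callable


def offline_backup(
    take_offline: Callable[[], None],
    run_backup: Callable[[], None],
    bring_online: Callable[[], None],
) -> None:
    """Run a backup while the cluster node is offline (hypothetical steps)."""
    take_offline()
    try:
        run_backup()
    finally:
        # Restore cluster service even when the backup raises.
        bring_online()


steps = []
offline_backup(
    lambda: steps.append("offline"),
    lambda: steps.append("backup"),
    lambda: steps.append("online"),
)
print(steps)  # ['offline', 'backup', 'online']
```

If `take_offline` itself fails, the sketch does not attempt the backup or the online step; a real procedure would need its own handling for a partially failed offline transition.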


At the time of writing, VDP Advanced offers in-guest backup agents only for Exchange and SQL Server.

Hope this helps.

