VMware Cloud Community
DeMichel93
Contributor

SRM 8.3 and vSphere HA - Recovery is stuck at "Delete file"

Hello,

I'm using SRM 8.3 with stretched storage from PureStorage (ActiveCluster), with vSphere HA enabled on both sides, and I've run into a problem. When I try to recover VMs via SRM with vMotion recovery enabled, the recovery plan gets stuck at step "19.1.2.1 Recover storage consistency group".

At this point the recovery plan (SRM) tries to remove files from the protected datastore. To be exact, it tries to remove the "vSphere-HA/FDM-stringoflettersandnumber-vcxx" folder, but it can't because vSphere HA is enabled on both sides. When I disable vSphere HA, SRM proceeds without a hitch and continues recovering the VMs. I tried selecting datastores that are not protected by SRM for heartbeating, but vSphere HA still creates those folders on the protected datastores and SRM still can't proceed with the recovery. I also tried disabling VM Monitoring in the vSphere HA settings, to no avail. SRM just loops, trying to delete the files/folder. Does anybody have any tips for this behaviour?

vCenter 6.5, SRM 8.3.02, PureStorage SRA 3.1
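
For anyone wanting to check whether HA has put its folders on a protected datastore, here is a minimal sketch. It assumes you already have a plain list of file paths relative to the datastore root (however you obtained it), and that the folders follow the "vSphere-HA/FDM-..." naming mentioned above, optionally with a leading dot. The function name `find_ha_folders` and the sample paths are made up for illustration; real folder and file names may differ.

```python
import re
from typing import Iterable, List

# Assumption: HA (FDM) folders look like "vSphere-HA/FDM-<id>" at the start of
# a path or right after a "/", optionally with a leading dot (".vSphere-HA").
HA_FOLDER_RE = re.compile(r"(^|/)\.?vSphere-HA/FDM-[^/]+", re.IGNORECASE)


def find_ha_folders(paths: Iterable[str]) -> List[str]:
    """Return the distinct HA folder prefixes found in a datastore file listing."""
    found: List[str] = []
    for path in paths:
        match = HA_FOLDER_RE.search(path)
        if match:
            # Keep everything up to and including the FDM-<id> folder name.
            folder = path[: match.end()]
            if folder not in found:
                found.append(folder)
    return found


# Made-up sample listing, relative to the datastore root.
sample_listing = [
    "vm01/vm01.vmx",
    ".vSphere-HA/FDM-EXAMPLE-vc01/some-file",
    ".vSphere-HA/FDM-EXAMPLE-vc01/another-file",
    "vSphere-HA/FDM-OTHER-vc02",
]
ha_folders = find_ha_folders(sample_listing)
```

If `ha_folders` is non-empty for a datastore in an SRM protection group, that datastore is a candidate for the stuck "Delete file" behaviour described in this post.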

4 Replies
ashilkrishnan
VMware Employee

Hi,

1. Do you get any errors when the plan eventually fails/times out? Please share a screenshot, if possible.

2. Any custom HA settings for the affected VM?

3. How many VMs are reporting this symptom?

4. Do you face similar issues when manually doing a cross vCenter vMotion of a non-protected VM?

DeMichel93
Contributor

Hello, thank you for answering.

1. Do you get any errors when the plan eventually fails/times out? Please share a screenshot, if possible.

I waited for about 20 minutes and there was no failure or timeout, just repeated tasks trying to delete files.

I disabled HA on the Protected Site, from which the VM is "recovered", and that helped: the Delete file task completed and the Recovery Plan changed to Recovery Complete status. Disabling HA on the Recovery Site does nothing at this point.

BTW, by the time SRM tries to delete the files, the "recovered" machine has already been vMotioned to the recovery site and is running.

2. Any custom HA settings for the affected VM?

No, HA is pretty much just enabled on the cluster; no custom settings are enabled on any specific VM. I tried changing the Heartbeat Datastores to ones that are not protected by SRM, but this does nothing.

3. How many VMs are reporting this symptom?

I've tested three separate protection groups (separate datastores) with their own recovery plans, and all of them have the same symptom: they are all stuck at step 19.1.2.1, where SRM is trying to delete a file from the datastore. It tries to delete the files repeatedly; when one delete task fails, it immediately tries again, and it goes on and on.

4. Do you face similar issues when manually doing a cross vCenter vMotion of a non-protected VM?

Cross vCenter vMotion works without an issue. When I initiate a migration that changes only the compute resource, the migration starts; halfway through, a new task/event, "Initiate vMotion receive operation", pops up and also completes without any issues. The VM is properly migrated to the other site.
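
The workaround from answer 1 (HA off on the Protected Site while the plan runs) could be wrapped roughly like this. This is only a sketch of the ordering: `set_ha_enabled` is a hypothetical callback that you would have to implement yourself against vCenter, and turning HA back on afterwards is my assumption, not something SRM or VMware Support prescribed.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def ha_temporarily_disabled(set_ha_enabled: Callable[[bool], None]) -> Iterator[None]:
    """Disable HA for the duration of the block, then enable it again.

    set_ha_enabled is a hypothetical callback; how it talks to vCenter is
    outside the scope of this sketch.
    """
    set_ha_enabled(False)
    try:
        yield
    finally:
        # Re-enable HA even if the recovery step raised an exception.
        set_ha_enabled(True)


# Usage with a stand-in callback that only records what it was asked to do.
log: List[object] = []
with ha_temporarily_disabled(log.append):
    log.append("run recovery plan")  # placeholder for the actual SRM run
```

After the block, `log` holds the HA-off call, the placeholder recovery step, and the HA-on call, in that order. The `finally` clause is there so a failed recovery does not leave the Protected Site cluster without HA.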

DeMichel93
Contributor

Okay, I got in touch with VMware Support, and after a month or so it was concluded that there's a problem with how SRM handles FDMs in 8.3. There will be a new SRM release (8.3.1) which should fix this problem. I will post here whether it actually does.

satish123REUGHE
Contributor

Ran into a similar issue with another customer using the NetApp SRA, and he is using the SRM appliance 8.3.1.

Is the bug addressed in this release? Is there any newer release where this bug is fixed?
