VMware Cloud Community
mcgreen1966
Contributor

SRM / SRA Nimble Storage issue

Hi, we had SRM replication fully running between our primary and secondary sites. We have done many test failovers and failbacks with reprotection with no issues at all.

On Monday we lost power at our main site, so the Nimble storage just powered off with no graceful shutdown.

We got everything back up, but now when I try to initiate a test SRM replication with a test VM I get this error in SRM:

"Failed to promote replica devices. Failed to promote replica consistency group 'NIM-SRM-TEST02'. Skipping failover operation for device group 'NIM-SRM-TEST02' as initiators were missing for one or more hosts."

I don't see any missing initiators on the 2 Nimbles.

I have deleted all the replicated volumes and rebuilt the replication partner setup, including in SRM, but it's still failing with this error.

I am a bit lost as to what's going on. We are using the latest version of SRM and the SRA storage adapter.

SRM versions are the same, but vCenter is running 7 at one site and 8 at the other, as we are in the process of upgrading.

Any suggestions appreciated!

 

19 Replies
vFouad
Leadership

Hi Mcgreen,

I hope you have managed to clear this issue up. If you haven't, have you reached out to Nimble to verify that everything is good on their end?

Have you checked the username and password for the SRA? When you do a rescan on the SRA, is the config shown as you expect it?

The "failed to promote" error usually suggests an array-to-SRA communication failure, unless a specific error is called out by the SRA and logged in the SRM logs.

Did you open an SR? If so, please feel free to DM that to me and I'll take a look.

Thanks,

KevinLeek
Contributor

Did you ever find a resolution? I'm experiencing this now and so far have no resolution from HPE.

vFouad
Leadership

Hi KevinLeek,

If you have an open SR with VMware, I'll happily take a look at it and any logs. There is usually an error code from the SRA that SRM will log, which can help point the SRA team in the right direction. As always, we are happy to collaborate with our partners to help solve issues for everyone.

Please reach out via direct message if you want to share your SR.

Best,

vFouad

mcgreen1966
Contributor

Hi, I have an open P1 ticket with HPE. They are saying their SRA isn't compatible with vCenter 8 and SRM 8.7, and they are in the process of writing a new and updated SRA. They asked if we could downgrade, but this isn't possible for us. Personally I think this is a red herring, and there is a bug in their SRA with the Photon version of SRM. I am 100% sure we were using a much earlier version of SRM when we had this issue; we upgraded to 8.7 to see if it resolved the issue we are seeing. In the meantime we are going to try to use vSphere Replication instead. Not ideal, but it's the only choice we have.

I would be interested in your setup and the issue you are seeing. Perhaps if you have a ticket number with HPE, the cases can be linked internally at HPE and help put pressure on them to resolve the issue.

mcgreen1966
Contributor

Hi, I do have the logs from SRM. This is what HPE replied to me:

Speaking to engineering, they believe that the SRM version in use is the issue. We currently don't support NVMe and they believe this is the issue.

2023-05-31T11:37:39.071+01:00 warning vmware-dr[01117] [SRM@6876 sub=Storage opID=6ef3a517-c4fd-44f8-9f2e-ef34f706de96-failover:c955:6e2e:7d54:50bf] Failed to obtain initiators for NVMe access group 'domain-c26-nvme'
2023-05-31T11:37:39.071+01:00 warning vmware-dr[01117] [SRM@6876 sub=Storage opID=6ef3a517-c4fd-44f8-9f2e-ef34f706de96-failover:c955:6e2e:7d54:50bf] Access group 'domain-c26-nvme' doesn't contain any initiators

What I don't understand is that we are not using NVMe, and I don't have any initiator groups or access groups with that name.
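
For anyone wanting to check their own SRM logs for the same two warnings, here is a minimal Python sketch. It assumes the log lines follow the same layout as the two lines quoted above (timestamp, level, process, a bracketed header, then the message); that layout is inferred from this excerpt only, and the function name `initiator_warnings` is just made up for this example.

```python
import re
from collections import defaultdict

# Sample lines copied from the SRM log excerpt quoted in this post.
SAMPLE_LOG = """\
2023-05-31T11:37:39.071+01:00 warning vmware-dr[01117] [SRM@6876 sub=Storage opID=6ef3a517-c4fd-44f8-9f2e-ef34f706de96-failover:c955:6e2e:7d54:50bf] Failed to obtain initiators for NVMe access group 'domain-c26-nvme'
2023-05-31T11:37:39.071+01:00 warning vmware-dr[01117] [SRM@6876 sub=Storage opID=6ef3a517-c4fd-44f8-9f2e-ef34f706de96-failover:c955:6e2e:7d54:50bf] Access group 'domain-c26-nvme' doesn't contain any initiators
"""

# Assumed layout: <timestamp> <level> <process> [<header>] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>\w+)\s+\S+\s+"
    r"\[(?P<header>[^\]]*)\]\s+(?P<msg>.*)$"
)
# Pulls the quoted name out of "access group '<name>'" (either case of "A").
GROUP_RE = re.compile(r"[Aa]ccess group '([^']+)'")


def initiator_warnings(log_text):
    """Return {access_group: [messages]} for warning lines mentioning initiators."""
    found = defaultdict(list)
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m or m.group("level") != "warning":
            continue
        msg = m.group("msg")
        if "initiator" not in msg.lower():
            continue
        g = GROUP_RE.search(msg)
        found[g.group(1) if g else "(no group named)"].append(msg)
    return dict(found)


if __name__ == "__main__":
    for group, msgs in initiator_warnings(SAMPLE_LOG).items():
        print(group)
        for msg in msgs:
            print("  ", msg)
```

Grouping by access group name makes it easy to see whether the warnings all point at a group you never created, as with 'domain-c26-nvme' here.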

regards

Mark

kageoman
Contributor

Hi,

Chiming in on this as we hit the same issue when trying to perform test failovers. We are utilizing Nimble as the storage solution also. We downgraded to SRM 8.6 after hearing this was a possible bug with 8.7 from VMware engineering. The plan ran smoothly afterwards.

I'm chasing a bug ID for reference but thought I would share.

mcgreen1966
Contributor

Hi, yes, we have done the same: downgraded to 8.6 and it all works fine now.

Although the VMware support matrix says it's not a supported combination, it all seems to work.

The HPE engineer said it's an issue with the HPE SRA, which they are aware of and are fixing. I still have my case open with HPE.

But no ETA.

As soon as the new SRA is available I will drop a note on here.

Cheers

Mark

ads7
Contributor

Hi, I am having the same issue using SRM 8.7 and Nimble. I have logged a ticket with Nimble support. Can you expand on how you downgraded to 8.6? Thanks

KevinLeek
Contributor

I just got the "must downgrade" answer from HPE as well. Did you have to wipe everything and reinstall from scratch, or was there a downgrade method? I'm not finding explicit info.

mcgreen1966
Contributor

Hi, you do not need to wipe anything. We just turned off the new 8.7 virtual servers, set up 2 new 8.6 SRM virtual servers, registered them with vCenter and installed the SRA. We only have 4 protection plans, so it was very quick to rebuild these. The virtual servers are still on the same replicated Nimble volumes, so they get picked up by SRM once you add the volumes/storage back into SRM.

Hope that answers your questions!

ads7
Contributor

Thanks! I will go down the route of switching off the SRM/SRA 8.7 servers and installing the SRM/SRA 8.6 servers.

vFouad
Leadership

Hey all,

I'm trying to get this chased down from the VMware side, and want to reach out to Nimble engineering to see if there is anything we can do to help them update their SRA code. If one or two of you could share your HPE Nimble SR numbers with me via direct message, I can get the process started on our side.

Thanks,

vFouad

ads7
Contributor

Hi all, just to add: Nimble support looked into my SRM 8.7 issue and, after running some commands on our Nimble HF40 arrays (which did not resolve the issue), advised me to "Reach out to VMware and ask about BUG PR 3246483".

mcgreen1966
Contributor

HPE told me it was their SRA that wasn't compatible with SRM 8.7, and that they were fixing it?

vFouad
Leadership

Hey all,

A small update here: VMware and HPE Nimble are both fully engaged and are collaborating on this. We will have an update when there is a fix for this issue. Please rest assured this is a high-priority issue, and we are working to solve the compatibility issue as soon as possible.

In the meantime, the workaround detailed in this thread seems to be the best option for right now.

Please feel free to reach out if you need any more information on this issue.

vFouad

ads7
Contributor

Hi, do you have a timeframe on this fix? Thanks

trevez
Contributor

We have a very similar issue: SRM is broken for the entire organisation. We downgraded SRM to our last version from a backup (8.5) and got the same initiator errors.

"Failed to promote replica devices. Failed to promote replica consistency group '*****'. Skipping failover operation for device group '******' as initiators were missing for one or more hosts."

What's weird is that the initial failover worked fine; we ran a failback and that's when we got the above error. That's how it's been since.

We have a case running with VMware now, who just seem to blame storage based on the SRM logs, but no changes have happened on the storage side to cause this.

mcgreen1966
Contributor

Hi, has this issue now been fixed in SRA version 9.0 from HPE?

Martin_Trustack
Contributor

The validated configuration matrix would suggest that it is. SRM 8.7 is listed as supported with SRA 9.0. Note that 8.8 isn't listed though, so stay away from that!
