VMware Cloud Community
andrebrownjm
Contributor

Test Failover Fails

I am configuring SRM 4.1, attempting to test a Recovery Plan, but it fails with the following errors:

2. Prepare Storage  	Error: Non-fatal error information reported during execution of SRA: testFailover Output:
"C:\Program Files (x86)\VMware\VMware vCenter Site Recovery Manager\/scripts/SAN/LeftHand Networks/jre/bin/java" -cp "C:\Program Files (x86)\VMware\VMware vCenter Site Recovery Manager\/scripts/SAN/LeftHand Networks/UI.jar" com.lefthandnetworks.commandline.Srm.Srm < "inputmE5L9R.xml""
UNEXPECTED: There was no writable space on snapshot named "Level2_Remote_Snapshot_Schedule_for_SRM_Rmt.1" to delete, continuing...
NOTE: Had this been a real failover the remote parent volume named "LEVEL2-VMS-DR" would have be changed to a primary volume, continuing...
ADDED: Added volume or snpashot "Level2_Remote_Snapshot_Schedule_for_SRM_Rmt.1" to the volume list named "SRM_VL_1", continuing...
ADDED: Created a server named "SRM_AG_2" with IQN "iqn.1998-01.com.vmware:localhost.ender.com:1094204767:33", continuing...
volume : Level2_Remote_Snapshot_Schedule_for_SRM_Rmt.1 is in volume list SRM_VL_1
volume : Level2_Remote_Snapshot_Schedule_for_SRM_Rmt.1 is in auth group SRM_AG_1
ERROR: command to address 192.168.206.9 failed because null

Error:

. 	00:00:05
    2.1. Attach Disks for Protection Group "Level 2" 	Error: Non-fatal error information reported during execution of SRA: testFailover Output:
"C:\Program Files (x86)\VMware\VMware vCenter Site Recovery Manager\/scripts/SAN/LeftHand Networks/jre/bin/java" -cp "C:\Program Files (x86)\VMware\VMware vCenter Site Recovery Manager\/scripts/SAN/LeftHand Networks/UI.jar" com.lefthandnetworks.commandline.Srm.Srm < "inputmE5L9R.xml""
UNEXPECTED: There was no writable space on snapshot named "Level2_Remote_Snapshot_Schedule_for_SRM_Rmt.1" to delete, continuing...
NOTE: Had this been a real failover the remote parent volume named "LEVEL2-VMS-DR" would have be changed to a primary volume, continuing...
ADDED: Added volume or snpashot "Level2_Remote_Snapshot_Schedule_for_SRM_Rmt.1" to the volume list named "SRM_VL_1", continuing...
ADDED: Created a server named "SRM_AG_2" with IQN "iqn.1998-01.com.vmware:localhost.ender.com:1094204767:33", continuing...
volume : Level2_Remote_Snapshot_Schedule_for_SRM_Rmt.1 is in volume list SRM_VL_1
volume : Level2_Remote_Snapshot_Schedule_for_SRM_Rmt.1 is in auth group SRM_AG_1
ERROR: command to address 192.168.206.9 failed because null

Error:

.

The SANs are LeftHand running SAN/iQ 8.5.

Anyone had any experience with an error like this?
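When the SRA output is this long, it can help to group the lines by their leading status word so that the single ERROR line stands out from the informational ones. The following is only a minimal sketch: the function name `classify_sra_output` and the prefix list are assumptions based on the output quoted above, not part of SRM or the LeftHand SRA.

```python
import re

# Status prefixes seen in the SRA output quoted in this thread (assumed, not exhaustive).
PREFIXES = ("UNEXPECTED", "NOTE", "ADDED", "ERROR")


def classify_sra_output(text):
    """Group SRA output lines by their leading status prefix.

    Lines without a known prefix (the java command line, "volume : ..." lines)
    are ignored.
    """
    groups = {prefix: [] for prefix in PREFIXES}
    for line in text.splitlines():
        match = re.match(r"\s*([A-Z]+):\s*(.*)", line)
        if match and match.group(1) in groups:
            groups[match.group(1)].append(match.group(2))
    return groups


# A few lines copied from the output in the post above.
sample = """\
UNEXPECTED: There was no writable space on snapshot named "Level2_Remote_Snapshot_Schedule_for_SRM_Rmt.1" to delete, continuing...
ADDED: Created a server named "SRM_AG_2" with IQN "iqn.1998-01.com.vmware:localhost.ender.com:1094204767:33", continuing...
volume : Level2_Remote_Snapshot_Schedule_for_SRM_Rmt.1 is in auth group SRM_AG_1
ERROR: command to address 192.168.206.9 failed because null
"""

groups = classify_sra_output(sample)
for message in groups["ERROR"]:
    print("ERROR ->", message)
```

Only the ERROR entry is printed here; the other groups remain available in `groups` if the surrounding context is needed.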

11 Replies
andrebrownjm
Contributor

OK, I figured it out. The error was due to a problem with the iSCSI initiators on the servers. I'm configuring HP DL380s with Broadcom NICs. These NICs have iSCSI offloading capabilities, so each of them shows up as an iSCSI adapter/initiator. However, they each had separate WWNs (iSCSI names). When the SAN added the entry for the iSCSI initiator to the access list for the LUN, it selected one of the network cards and used its WWN. However, the WWN that should be used is the one for the iSCSI software initiator. I changed the WWN on all the Broadcom iSCSI adapters/initiators to the WWN of the VMware iSCSI initiator and the test worked.

idle-jam
Immortal

I have the same problem. Can you please explain further how this can be solved?

andrebrownjm
Contributor

The problem in my case was that the VMware Software iSCSI adapter's unique identifier (the WWN) was different from the unique identifiers on the NICs, which have iSCSI offloading functionality. In my setup, I had configured multipathing, so both NICs were bonded to the VMware iSCSI software adapter (vmhba37 in my case). My guess is that having different unique identifiers disrupted communication since they were bonded. I'll try to post screenshots later.

idle-jam
Immortal

In this case, did you disable iSCSI multipathing to resolve the issue?

andrebrownjm
Contributor

In my case I wanted multipathing, so I configured each of the NICs to have the same iSCSI Name as the software adapter. See the attached screenshots.
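The comparison described here (every offload NIC's iSCSI name against the software adapter's) can be sketched in a few lines once the names have been copied out of Configuration -> Storage Adapters. Everything below is illustrative: `find_mismatched_adapters` is a hypothetical helper, `vmhba32`/`vmhba33` and all iSCSI names are made-up example values, and only `vmhba37` is taken from the thread.

```python
def find_mismatched_adapters(adapters, software_adapter):
    """Return the adapters whose iSCSI name differs from the software adapter's.

    adapters: dict mapping adapter name -> iSCSI name (copied by hand from the
    Storage Adapters view; this sketch does not query the host).
    """
    expected = adapters[software_adapter]
    return {
        name: iscsi_name
        for name, iscsi_name in adapters.items()
        if name != software_adapter and iscsi_name != expected
    }


# Hypothetical inventory: one software adapter and two offload-capable NICs.
adapters = {
    "vmhba37": "iqn.1998-01.com.vmware:esx-example-0a1b2c3d",  # software adapter
    "vmhba32": "iqn.2000-01.com.example:nic0",                 # differs
    "vmhba33": "iqn.1998-01.com.vmware:esx-example-0a1b2c3d",  # already matches
}

mismatched = find_mismatched_adapters(adapters, "vmhba37")
for name, iscsi_name in mismatched.items():
    print(f"{name} uses {iscsi_name}, expected {adapters['vmhba37']}")
```

Any adapter reported by the loop is one whose iSCSI name would need to be changed to match the software adapter, following the approach described above.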

idle-jam
Immortal

Both of my values are the same. I still get the same error. :(

andrebrownjm
Contributor

When you say both values are the same, did you compare the values on the NICs with the value on the software adapter? I forgot to point this out in my last post - the machine which I used for the screenshots didn't have NICs with iSCSI offload, so they don't show up in the Storage Adapters section. So compare the values I showed in the shots to the values for your hardware.

Are you sure you have iSCSI offloading? Just need to be sure you're having the same issue. Can you post your log?






idle-jam
Immortal

Here is my log. A screenshot of my iSCSI IQN is also attached.

Error: Non-fatal error information reported during execution of SRA: testFailover Output:
"C:\Program Files (x86)\VMware\VMware vCenter Site Recovery Manager\/scripts/SAN/LeftHand Networks/jre/bin/java" -cp "C:\Program Files (x86)\VMware\VMware vCenter Site Recovery Manager\/scripts/SAN/LeftHand Networks/UI.jar" com.lefthandnetworks.commandline.Srm.Srm < "inputg3bXqi.xml""
UNEXPECTED: There was no writable space on snapshot named "HQ_Sch_RS_1_Rmt.1" to delete, continuing...
NOTE: Had this been a real failover the remote parent volume named "DR" would have be changed to a primary volume, continuing...
ADDED: Added volume or snpashot "HQ_Sch_RS_1_Rmt.1" to the volume list named "1", continuing...
ADDED: Created a server named "SRM_AG_1" with IQN "21:00:00:1b:32:11:25:86", continuing...
volume : HQ_Sch_RS_1_Rmt.1 is in volume list 1
volume : HQ_Sch_RS_1_Rmt.1 is in auth group DR
ERROR: command to address 192.168.2.232 failed because null

Error:
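One detail in this log can be checked mechanically: the value reported as the IQN is eight colon-separated hex bytes, whereas the one in the first post's log starts with "iqn.". The sketch below classifies an initiator name by its shape. It is an illustration only: the function name is hypothetical, the two regular expressions are simplified approximations rather than full validators, and a "wwn-like" result is a hint about which adapter the SRA may have picked up, not a diagnosis.

```python
import re

# Simplified pattern for iqn.yyyy-mm.reversed-domain[:unique-part] (approximation).
IQN_RE = re.compile(r"^iqn\.\d{4}-\d{2}\.[a-z0-9.-]+(:.+)?$", re.IGNORECASE)

# Eight colon-separated hex bytes, the usual way a Fibre Channel WWN is written.
WWN_RE = re.compile(r"^([0-9a-f]{2}:){7}[0-9a-f]{2}$", re.IGNORECASE)


def classify_initiator_name(name):
    """Return 'iqn', 'wwn-like', or 'unknown' based on the shape of the name."""
    if IQN_RE.match(name):
        return "iqn"
    if WWN_RE.match(name):
        return "wwn-like"
    return "unknown"


# The two values quoted as "IQN" in the logs in this thread.
first_post = classify_initiator_name(
    "iqn.1998-01.com.vmware:localhost.ender.com:1094204767:33"
)
this_post = classify_initiator_name("21:00:00:1b:32:11:25:86")
print(first_post, this_post)
```

If the second value comes back as "wwn-like", it may be worth confirming which storage adapter on the recovery host that identifier belongs to.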

andrebrownjm
Contributor

What model NICs do you have? Can you check whether the NICs have any iSCSI settings (under Configuration -> Storage Adapters)?






Wibowo
VMware Employee

Have you checked whether the iSCSI target is listed under Dynamic Discovery (Send Targets) on the recovery host? If it is not, add it.

Another thing to check is the SRA installed on the SRM server. Try to get the latest one:

HP LeftHand P4000 Storage Replication Adapter

Version 9.0.0.3561 | Released 11/11/2010

Kind Regards,

Bo

atgjake
Contributor

FYI, I was having the identical issue both while testing and while performing a failover, using P4500G2 nodes running SAN/iQ 8.5 with the 8.5 version of the SRA. I upgraded ONLY the SRA to 9.0 and it solved the problem.

Side note: I called HP (LeftHand) support and opened a case regarding this issue.  After speaking with an engineer, they confirmed that the 8.5 version of the SRA had a problem with "testing", although I found that it had a problem with both testing and actual recovery.  The support engineer stated that they will NOT be releasing a patch to fix the issue.  Again, I found that installing 9.0 of the SRA fixed the problem (without the need to upgrade all the storage nodes to 9.0).

I hope this helps someone else. :)
