VMware Cloud Community
KrishnaR
Enthusiast

DS4000 SRM issues

I'm starting this thread to hear from users and the field about their experiences with DS4000 and SRM. I'm particularly interested in any issues or problems you have run into. I've been working with SRM and DS4000 since the beta and can try to help resolve any problems that come up. I'm also working on an SRM guide but can't give a date for it yet.

KrishnaR
Enthusiast

Another possible way to pursue a support question is through IBM's RPQ process.

KrishnaR
Enthusiast

Update - there is an ESX hotfix available for IBM SRA running on ESX 3.5U1 with SRM 1.0. This fix is already incorporated in U3 to support the next version of SRM.

Sarek
Hot Shot

I'm in the process of installing SRM on DS4800s (one at each site). I have the hotfix for Update 1, and it states:

"The VMkernel build number MUST BE equal to 82663, otherwise, the hot patch cannot be installed."

My output is:

# rpm -qa | grep vmkernel

VMware-esx-vmkernel-3.5.0-113339

This means I can't install the patch at the moment, unless I downgrade all my ESX servers to Update 1 (build 82663).
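The hotfix readme's rule is an exact match, not a minimum. A minimal Python 3 sketch of that check, assuming you feed it the package name printed by `rpm -qa | grep vmkernel`; the helper names are hypothetical and not part of any VMware tool:

```python
import re

# Build number the hotfix readme quoted above requires (exact match).
REQUIRED_BUILD = 82663


def vmkernel_build(rpm_line: str) -> int:
    """Extract the trailing build number from an rpm package name.

    Assumes the name ends in '<version>-<build>', as in the output above,
    e.g. 'VMware-esx-vmkernel-3.5.0-113339'.
    """
    match = re.search(r"-(\d+)\s*$", rpm_line)
    if match is None:
        raise ValueError(f"no build number found in {rpm_line!r}")
    return int(match.group(1))


def hotfix_installable(rpm_line: str) -> bool:
    # The readme says the build MUST BE equal, so use == rather than >=.
    return vmkernel_build(rpm_line) == REQUIRED_BUILD


print(hotfix_installable("VMware-esx-vmkernel-3.5.0-113339"))  # False
print(hotfix_installable("VMware-esx-vmkernel-3.5.0-82663"))   # True
```

Because the comparison is `==`, a newer build such as 113339 fails just as an older one would, which is why the choice in this post is between waiting and downgrading.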

The problem I have is that the Array Manager gives the error message: ERROR OCCURED.

When I look at the log, I see the following errors:

Command Line for discoverArrays: "D:\VMware\VMware Site Recovery Manager\external\perl-5.8.8\bin\perl.exe" "D:/VMware/VMware Site Recovery Manager/scripts/SAN/IBM/command.pl"

Input for discoverArrays: <?xml version="1.0" encoding="UTF-8"?>
<Command>
  <Name>discoverArrays</Name>
  <ConnectSpec>
    <Name>DS4800</Name>
    <Address>225.255.255.255</Address>
    <Address>225.255.255.255</Address>
  </ConnectSpec>
  <OutputFile>C:\WINDOWS\TEMP\vmware-SYSTEM\dr-sanprovider0</OutputFile>
  <LogLevel>trivia</LogLevel>
</Command>

Environment ALLUSERSPROFILE=C:\Documents and Settings\All Users will be set for the script

Environment ClusterLog=C:\WINDOWS\Cluster\cluster.log will be set for the script

Environment CommonProgramFiles=C:\Program Files\Common Files will be set for the script

Environment COMPUTERNAME=DZMVC001 will be set for the script

Environment ComSpec=C:\WINDOWS\system32\cmd.exe will be set for the script

Environment FP_NO_HOST_CHECK=NO will be set for the script

Environment NUMBER_OF_PROCESSORS=1 will be set for the script

Environment OS=Windows_NT will be set for the script

Environment Path=C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;D:\VMware\VMware Site Recovery Manager\scripts\SAN\IBM will be set for the script

Environment PATHEXT=.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH will be set for the script

Environment PROCESSOR_ARCHITECTURE=x86 will be set for the script

Environment PROCESSOR_IDENTIFIER=x86 Family 15 Model 4 Stepping 8, GenuineIntel will be set for the script

Environment PROCESSOR_LEVEL=15 will be set for the script

Environment PROCESSOR_REVISION=0408 will be set for the script

Environment ProgramFiles=C:\Program Files will be set for the script

Environment SystemDrive=C: will be set for the script

Environment SystemRoot=C:\WINDOWS will be set for the script

Environment TEMP=C:\WINDOWS\TEMP will be set for the script

Environment TMP=C:\WINDOWS\TEMP will be set for the script

Environment USERPROFILE=C:\Documents and Settings\Default User will be set for the script

Environment windir=C:\WINDOWS will be set for the script

Starting process: "D:\VMware\VMware Site Recovery Manager\external\perl-5.8.8\bin\perl.exe" "D:/VMware/VMware Site Recovery Manager/scripts/SAN/IBM/command.pl"

discoverArrays exited with exit code 0

discoverArrays's output:

D:/VMware/VMware Site Recovery Manager/scripts/SAN/IBM[2008-10-31:: 14:19:51]:INFO:discoverArray:call the discoverArray.pl file.....

:INFO:discoverArray:exit discoverArrays.....

discoverArrays's errors:

'perl' is not recognized as an internal or external command,

operable program or batch file.

'discoverArrays' returned

Failed to retrieve script results

Work function threw std::exception: XML document is empty

Fault:
(dr.fault.InternalError) {
   dynamicType = <unset>,
   reason = "XML document is empty",
   msg = ""
}

Error set to (dr.fault.InternalError) {
   dynamicType = <unset>,
   reason = "XML document is empty",
   msg = ""
}

Has anyone had the same issue, and is the solution to downgrade to ESX 3.5 Update 1?

Thx

Sarek

If you find this information useful, please award points for "correct" or "helpful".
Sarek
Hot Shot

I added the following paths to the PATH environment variable: "D:\VMware\VMware Site Recovery Manager\external\perl-5.8.8\bin";"D:\VMware\VMware Site Recovery Manager\scripts\SAN\IBM". Now I can add the storage arrays (local and remote).

Shouldn't these paths be added when VMware SRM is installed on the server?

Sarek
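A quick way to confirm this fix before re-running discoverArrays is to check which of the two directories are missing from a PATH string. A minimal Python 3 sketch, assuming the install location from this post and a Windows-style `;`-separated PATH; the function names are hypothetical:

```python
import os

# The two directories added above; adjust them to your own SRM install path.
REQUIRED_DIRS = [
    r"D:\VMware\VMware Site Recovery Manager\external\perl-5.8.8\bin",
    r"D:\VMware\VMware Site Recovery Manager\scripts\SAN\IBM",
]

# PATH as logged for discoverArrays in the previous post (no Perl directory).
LOGGED_PATH = (
    r"C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;"
    r"D:\VMware\VMware Site Recovery Manager\scripts\SAN\IBM"
)


def _normalize(entry: str) -> str:
    # Windows paths are case-insensitive; ignore surrounding quotes and a trailing backslash.
    return entry.strip().strip('"').rstrip("\\").lower()


def missing_from_path(path_value: str, required=REQUIRED_DIRS) -> list:
    """Return the required directories absent from a ';'-separated PATH string."""
    present = {_normalize(e) for e in path_value.split(";") if e.strip()}
    return [d for d in required if _normalize(d) not in present]


print(missing_from_path(LOGGED_PATH))  # only the perl-5.8.8\bin directory is listed
# Your own session's PATH; note it may differ from what the SRM service sees.
print(missing_from_path(os.environ.get("PATH", "")))
```

Against the logged PATH, only the Perl `bin` directory comes back, which matches the "'perl' is not recognized" error in the log.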

rwmiller
Contributor

Hello,

Your firmware is down-level and will not work with the SRA. One thing that can be confusing is that the IBM DS uses two different version numbers: one for the management software, which will be version 10.10.xx.xx, and one for the firmware in the array, which would be 7.10.xx.xx. You indicate that your array is running version 6.60.xx.xx firmware. That is down-level from the 7.10.xx.xx release, and in fact 7.10 was a major firmware upgrade with a lot of enhancements and changes, such as removing the 2 TB volume size limit. The upgraded firmware is available, but be aware that, unlike previous upgrades, you have to take the array offline for this one, and once you upgrade you cannot go back without restoring your data from backup.

Bob
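Because the management software (10.10.xx.xx) and the controller firmware (7.10.xx.xx) use different numbers, it is easy to compare the wrong one. A minimal sketch that checks only the controller firmware string against the 07.10 baseline from this thread; the threshold comes from Bob's post rather than an IBM support matrix, and the function name is hypothetical:

```python
# Minimum controller firmware discussed in this thread (07.10.xx.xx).
MIN_CONTROLLER_FW = (7, 10)


def controller_fw_supported(version: str) -> bool:
    """Compare a controller firmware string such as '06.60.xx.xx' to 07.10.

    Only the first two fields are parsed, so 'xx' placeholders are fine.
    Pass the controller firmware, NOT the Storage Manager client version
    (10.xx.xx.xx style): this function cannot tell them apart.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= MIN_CONTROLLER_FW


print(controller_fw_supported("06.60.xx.xx"))  # False: the down-level case
print(controller_fw_supported("07.36.12.00"))  # True
```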

KrishnaR
Enthusiast

Bob is correct. 06.60.xx.xx is unsupported SP (controller) firmware. The API used by the SRA is not fully compatible with this older firmware, so the SRA COULD work with 06.60, but it's not guaranteed to.

Sarek, the scripts path is registered when the SRA is installed. There was some question as to whether SRM should be the one to register Perl. We're resolving that now, and it should be fixed in the next SRM/SRA release. Until then, unfortunately, the Perl path has to be added manually to the Windows PATH variable.

KrishnaR
Enthusiast

And yes, SRM 1.0 only supports ESX 3.5U1 (and VC 2.5U1).

Iuridae
Contributor

Hi everyone!

Just the thread for me. We're about to set up the recovery plans and have a few questions about what to expect the next time we return to our datacenter.

We have four hosts divided between two sites, with a DS4700 storage subsystem at each site. We want the setup to be bi-directional. So far we have installed ESX 3.5 U3, VC 2.5 U3, SRM 1.0 (not U1) and the IBM Array Manager. We managed to see the LUN ID in the Array Manager after we added the path to the Perl folder to the environment variable, and things seem to work. This is how we left it, and we will return shortly.

But each host can only see the primary LUN on its own DS4700. Do the hosts need to see the secondary LUN mirrored from the other site, or how does this part work? I read earlier about activating FlashCopy. Is this a necessary step, and is there anything else we should think about?

Thanks in advance,

dex_1234
Contributor

From the storage configuration perspective, you'll want to ensure you have FlashCopy and ERM enabled on both the protected and recovery site DS4700 subsystems (assuming you want to test failback in both directions, or at least have FlashCopy enabled on the recovery-side subsystem). Go ahead and activate ERM, and use Storage Manager to configure the logical drives you want to mirror from the protected-site subsystem to the recovery-site subsystem. (I'm assuming the physical communication between the subsystems has already been configured; if you're working in a local test configuration, make sure the two subsystems communicate via a fabric through their mirroring ports.) You'll want the mirroring relationship established before proceeding with the rest of the SRM configuration. The logical drives in the mirror relationship on the recovery side should already be mapped as well. This is a good time to point out that you want your mapping and partitioning layout configured via Storage Manager for both the protected- and recovery-side subsystems.

I manually scanned for all of the LUNs I expected the recovery-side ESX hosts to register, to make sure all LUNs were seen correctly and that my multipathing design worked as intended with regard to the number of paths available to the recovery-side subsystem. Multipathing is often overlooked, so make sure you see the LUNs you expect, the number of paths you expect, and so forth. DS4000/5000 multipathing behavior under the current VMware failover code probably deserves its own thread, but multipathing design is a critical consideration in your BC/DR planning with SRM and DS4K/5K subsystems. Once you get further into the SRM configuration, it will become clearer what SRM sees from its perspective.

You need FlashCopy enabled on the subsystem in order to run a test of your recovery plan later. The SRM server on the recovery side, via its SRA, will initiate a FlashCopy of your mirrored logical drives, and it's the FlashCopy that is presented to the recovery-side ESX servers. Note that the recovery-side logical drives in the mirror relationship are read-only until they are promoted, but during a test of your recovery plan you want a read/write logical drive for ESX to register.

In short, configure mirroring, the mapping/partition layout, and the multipathing design on the DS4700 first, then proceed with the rest of the SRM/SRA pre- and post-configuration. If you administer both the ESX and storage layers, it's straightforward, but if you're in a large enterprise with a separate storage group and possibly a SAN/network group, you'll want to coordinate with them as well. From my own experience, a lack of communication can really cause problems in the design of your SRM strategy. Hope some of this helps.
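The ordering above amounts to a pre-flight checklist to run on each subsystem before touching the SRM/SRA configuration. A sketch of that checklist in Python, assuming you record the results of your own manual checks in Storage Manager and on the ESX hosts; the `SubsystemState` type, its fields, and the expected path count are hypothetical, and nothing here talks to the array:

```python
from dataclasses import dataclass, field


@dataclass
class SubsystemState:
    """Hypothetical record of checks made by hand in Storage Manager and ESX."""
    name: str
    flashcopy_enabled: bool
    erm_enabled: bool
    mirrors_established: bool
    mirrored_drives_mapped: bool
    paths_seen: dict = field(default_factory=dict)  # LUN -> paths seen by ESX hosts


def preflight_problems(state: SubsystemState, expected_paths: int) -> list:
    """List what still needs fixing before continuing with SRM/SRA configuration."""
    problems = []
    if not state.flashcopy_enabled:
        problems.append(f"{state.name}: FlashCopy not enabled (needed for test recovery)")
    if not state.erm_enabled:
        problems.append(f"{state.name}: ERM not enabled")
    if not state.mirrors_established:
        problems.append(f"{state.name}: mirror relationships not established")
    if not state.mirrored_drives_mapped:
        problems.append(f"{state.name}: mirrored logical drives not mapped")
    for lun, paths in sorted(state.paths_seen.items()):
        if paths != expected_paths:
            problems.append(f"{state.name}: LUN {lun} has {paths} paths, expected {expected_paths}")
    return problems


recovery = SubsystemState(
    name="recovery DS4700",
    flashcopy_enabled=False,
    erm_enabled=True,
    mirrors_established=True,
    mirrored_drives_mapped=True,
    paths_seen={0: 4, 1: 2},
)
for problem in preflight_problems(recovery, expected_paths=4):
    print(problem)
```

The path check compares against a single expected count per LUN; if your multipathing design deliberately gives different LUNs different path counts, a per-LUN expectation would fit better.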

Iuridae
Contributor

Hi Dex,

As I understand it, most of this has been done, except enabling FlashCopy on the storage subsystems. Communication and mirroring both work. At each site the ESX hosts can see both the primary LUNs and the mirrored, read-only secondary LUNs; however, only the primary LUNs have been added as ESX storage. I assume the read-only LUNs will be added once a test failover or failover occurs?

Very informative post, very helpful! Thank you!

Sarek
Hot Shot

>> Hi Dex,
>> I assume that the read-only LUNs will be added once a test failover/failover will occur?

Yes, these LUNs will be seen when the failover is completed.

Iuridae
Contributor

bindToController succesfull

Current CGN=508

:INFO:main:Done with failover

Error:

",

msg = "Message exceeds database maximum string length."

}

],

msg = "Message exceeds database maximum string length."

}

We have been struggling with this error and haven't found a solution. Our first impression is that some column in the SRM database is too short.

We are running SRM 1.0 Update 1 and SRA 1.00.35.07, and we receive that error when we run a test failover.

Any help would be appreciated.

admin
Immortal

The "Message exceeds database maximum string length." error is harmless. However, it might indicate that a storage operation failed. Details about the storage failure are in the SRM logs. Could you post those logs from the recovery site?

-Masha

KrishnaR
Enthusiast

This message indicates that you've run out of free disks during the testFailover process. See the note on this restriction in the 01.00.35.07 Readme. This restriction will be lifted in an upcoming SRA release.

Iuridae
Contributor

Yes, that's correct. When we left some disks unconfigured, the error disappeared.

For the snapshot to work, there need to be free disks that aren't part of a RAID set.
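Put another way, a test failover needs unassigned drives in addition to those already in RAID sets. A small sketch of that check over a hand-built drive inventory; the `Disk` type, the slot labels, and the `needed` count are hypothetical placeholders (the real requirement is described in the SRA 01.00.35.07 Readme):

```python
from typing import NamedTuple, Optional


class Disk(NamedTuple):
    slot: str                # hypothetical "tray,slot" label
    raid_set: Optional[str]  # None = unassigned, not part of any RAID set


def unassigned_disks(disks):
    """Return the slots of drives that do not belong to any RAID set."""
    return [d.slot for d in disks if d.raid_set is None]


def can_run_test_failover(disks, needed: int) -> bool:
    # 'needed' is a placeholder: take the real figure from the SRA Readme.
    return len(unassigned_disks(disks)) >= needed


inventory = [
    Disk("1,1", "Array1"),
    Disk("1,2", "Array1"),
    Disk("1,3", None),
]
print(unassigned_disks(inventory))          # ['1,3']
print(can_run_test_failover(inventory, 1))  # True
```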

iefke
Enthusiast

Config: 2 x IBM DS5100, firmware 07.36.12.00, synchronous replication

VMware ESX 3.5 U3 including all January 2009 updates

VMware SRM version 1.0.1 build 128004

Problem: We have problems configuring the SRA. The first problem was the following error on the first screen of the SRA:

LUN's with duplicate IDs or numbers..................

We had the following config:

Host A: LUN A mapped with LUN ID 0 (VMware cluster)

Host B: LUN B mapped with LUN ID 0 (Windows cluster host)

I changed all the LUN IDs to unique numbers, and the error went away.
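The duplicate-ID condition this SRA build complained about can be spotted in advance over a list of mappings. A minimal sketch, assuming you have copied host, logical drive, and LUN ID triples out of Storage Manager by hand; the data layout and function name are hypothetical:

```python
from collections import defaultdict

# (host, logical drive, LUN ID) triples, as in the configuration above.
mappings = [
    ("Host A", "LUN A", 0),  # VMware cluster
    ("Host B", "LUN B", 0),  # Windows cluster host
]


def duplicate_lun_ids(mappings):
    """Return {lun_id: [(host, drive), ...]} for every LUN ID used more than once."""
    by_id = defaultdict(list)
    for host, drive, lun_id in mappings:
        by_id[lun_id].append((host, drive))
    return {lun_id: users for lun_id, users in by_id.items() if len(users) > 1}


print(duplicate_lun_ids(mappings))  # LUN ID 0 is reported for both hosts
```

An empty result after renumbering corresponds to the state in which the error went away.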

The next problem is that the SRA does not see any replicated datastores on the last screen of the SRA. The LUNs are all in sync and replicated.

In the SRM logs I found the following errors:

Recomputing LUN groups for array pair '600a0b80004771f400000000493389ff' --> '600a0b80004770680000000049364b17'

Found 20 replicated LUN pairs

No lun groups created since there are no replicated datastores

Does anybody have a suggestion for this problem?

www.ivobeerens.nl

Blog: http://www.ivobeerens.nl
dex_1234
Contributor

There will be a new IBM SRA build out shortly that should resolve this error. Upgrade the SRA at both sites once it is released. I'll check to see how soon.

iefke
Enthusiast

I've got a new SRA from IBM (it's not on the VMware site) that solves the SRA problem but introduces another.

admin
Immortal

>> i've got a new SRA (it's not on the VMware site) from IBM that solves the SRA problem but introduces another.

Could you provide detail about the new problem?

-Masha

iefke
Enthusiast

When I run a recovery plan with one VM that has one disk, everything is okay. When I run the recovery plan with a VM that has two or more disks (on two or more LUNs), the test recovery fails at the Prepare Storage step.
