VMware Cloud Community
KrishnaR
Enthusiast

DS4000 SRM issues

I'm starting this thread to hear back from users and the field about their experiences with DS4000 and SRM. I'm particularly interested in any issues or problems encountered. I've been working with SRM and DS4000 since beta and can try to help resolve any problems that have come up. I'm also working on an SRM guide but can't give a date for it yet.

106 Replies
Michelle_Laveri
Virtuoso

Can you say more about what the guide will include?

Regards

Mike

http://www.rtfm-ed.co.uk/?p=584

Regards
Michelle Laverick
@m_laverick
http://www.michellelaverick.com
KrishnaR
Enthusiast

It's an install + best practices guide from a storage vendor perspective. I'll be discussing ERM (enhanced remote mirroring) and how the different scenarios with it could affect SRM. Probably will include my test environment details as well.

Michelle_Laveri
Virtuoso

Sounds good...

Do you work for IBM by chance - or a reseller?

Just curious... If you do make this guide/whitepaper, please tell me, and I will try to include a reference to it in my book on SRM...

mikelaverick AT rtfm-ed DOT co DOT uk

Regards

Mike

KrishnaR
Enthusiast

I work for an IBM OEM (can't say specifically, though you might be able to guess based on the storage I'm talking about, heh). I will definitely keep you informed as it gets closer to being done, and I'll update the forums as well.

TMeissner
Contributor

Hi - thanks for starting this thread ...

Through the sales channel, you received an error log from our eval. We were getting the following error: "Error: Invalid XML returned from storage array management script: Failed to get node ReturnCode." Your response to our logs was: "it looks like the LSI SRA is returning a misspelled element in its response to the testFailover/start command". You suggested P#8 would fix this issue. Currently, there is only one SRA for the DS4000 available for download. The version I am using is 1.00.35.03. Does this version of the SRA have the fix, or should I be looking someplace else?

thanks!!

ToddM

I attempted to correct the problem by reloading SRM. I was not surprised to see the same error. Attached is my latest log file.

KrishnaR
Enthusiast

ToddM, you have the correct adapter version. My suggestion was to reinstall the SRA and restart the SRM service. I suggested this because I saw the issue intermittently during testing but believed we had fixed all instances of it. Furthermore, I cannot reproduce the issue on my current setup using the same adapter version. Still, I've entered a CR in our system to have a developer check the code once more. Seems silly that a case error caused testFailover to fail, doesn't it!

I was incorrect about P#8 (this is for ESX, btw). It no longer contains the patch that fixes an issue where testFailover fails because ESX doesn't recognize a vendor-unique check condition returned from the array. The fix exists, and I'm talking to VMware to have it released asap. This is separate from your ReturnCode issue.

TMeissner
Contributor

I attended a User Group Meeting where another storage vendor demonstrated SRM. In their presentation, they implied that during their testing, a "clone" was made of the LUN to be able to isolate the VMFS volume during the test and still keep the replication going. I was wondering if the SRA for the DS4000 is doing something like this. If so, does that imply that the DS4000 on the recovery side needs to have a FlashCopy feature code? And if so, what error would I see if the feature code is missing?

Thanks!!

Michelle_Laveri
Virtuoso

I wouldn't be surprised. In my research most SRAs do this - the only one which doesn't appear to make a snapshot "on-the-fly" is LHN (Adam, is that right?). I'm not sure of the requirements for FlashCopy. What does the README or PDF file say the requirements are?

Does IBM have a Redbook on SRA/SRM?

Regards

Mike

KrishnaR
Enthusiast

DS4000 does use FlashCopy to create a snapshot of the mirrored LUN. So yes, this needs to be enabled on the recovery side (on both sides if you're configuring bidirectional protection). I'd have to dig into the dev docs to see what error code is returned, but suffice it to say that without the ability to create snapshots, SRM testFailover will fail. I'd like to see a retest after enabling FlashCopy. Still tracking 3 other issues -

1) ReturnCode error - Can't reproduce error. Does this go away when FlashCopy is enabled?

2) testFailover does not work if same LUN numbers are used on different hosts - Can I get further explanation on this?

3) DS4000 patch - I'm working with VMware to release a patch that affects DS4000 SRM. Will update thread when this is finalized.

Afaik, IBM does not have an SRM guide yet. I've made progress on my guide and should hopefully be ready to release it in the coming month.

TMeissner
Contributor

Thanks for the response....

As for the "ReturnCode" error: v1.00.35.03 of the SRA for the DS4700 has an error. In the command.pl and command.pm files there is a string value, $XML_RETURNCODE = "Returncode"; This needs to be changed to "ReturnCode" with a capital C. I made the change to the script and it fixed the XML error that I was getting. I am assuming that the SRA version will be updated on the VMware site.
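To illustrate why one lowercase letter is enough to break the exchange: XML element names are case-sensitive, so a consumer that looks up ReturnCode will not find a Returncode element. What follows is a minimal Python sketch, assuming a response shaped like the <Response> documents seen in the SRA logs. The get_return_code helper and the two sample strings are hypothetical and are not SRM's or the SRA's actual code.

```python
import xml.etree.ElementTree as ET

# Sample responses made up for illustration; only the element case differs.
GOOD = "<Response><ReturnCode>0</ReturnCode></Response>"
BAD = "<Response><Returncode>0</Returncode></Response>"  # misspelled element


def get_return_code(xml_text):
    """Return the ReturnCode value as an int, or None if the node is missing.

    Hypothetical helper; it only mimics a consumer that expects the
    exact element name "ReturnCode".
    """
    root = ET.fromstring(xml_text)
    node = root.find("ReturnCode")  # element lookup is case-sensitive
    return None if node is None else int(node.text)


print(get_return_code(GOOD))  # 0
print(get_return_code(BAD))   # None
```

A consumer that treats the missing node as fatal would report something along the lines of "Failed to get node ReturnCode", which matches the symptom described above.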

Now the next error in my logs is that there is an invalid snapshot of the volume. I am assuming that this is really a FlashCopy issue. We are working with IBM to get the proper FlashCopy feature code activated on the recovery-side DS4700.

KrishnaR
Enthusiast

That's why I couldn't recreate it - I'm running with a DS4800. Looks like the ReturnCode issue only pops up in a 4700-specific section. I'll update our database with that info. I'll wait to hear if 2) is a genuine issue after you've fixed FlashCopy.

TMeissner
Contributor

Item 2)

Using IBM DS4000/FastT Storage Manager, you can define Host Groups to quickly map the VMFS LUNs to a group of ESX servers at the same time. When a LUN is mapped to a Host Group, you can assign the LUN ID from 0 to xxx. If you map a LUN to an individual server, you can also assign the LUN ID from 0 to xxx. The problem comes when identical LUN IDs are used between a Host Group mapping and individual hosts. The Storage Manager application allows this and somehow can tell the difference between a Host Group and an individual host. I think the SRA requests the host mappings and sees two mappings that use the same LUN IDs. It is not aware that one of those IDs is part of a Host Group.

My workaround was to change the LUN IDs on the DS4700 to ensure they were unique. After making the change, we no longer received the duplicate IDs error.
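For anyone who wants to check their own mappings before running a test failover, the idea can be sketched in Python as below. This is only an illustration: the mapping list, the names in it, and the find_shared_lun_ids helper are all made up, and it neither queries Storage Manager nor reproduces whatever check the SRA really performs.

```python
from collections import defaultdict

# Hypothetical mappings: (target type, target name, LUN ID, volume name)
MAPPINGS = [
    ("host_group", "ESX_Cluster", 0, "VMFS_01"),
    ("host_group", "ESX_Cluster", 1, "VMFS_02"),
    ("host", "esx01", 0, "Boot_esx01"),
    ("host", "esx02", 2, "Scratch_esx02"),
]


def find_shared_lun_ids(mappings):
    """Return {lun_id: [targets]} for LUN IDs used by more than one target."""
    targets_by_lun = defaultdict(set)
    for target_type, target_name, lun_id, _volume in mappings:
        targets_by_lun[lun_id].add((target_type, target_name))
    return {
        lun_id: sorted(targets)
        for lun_id, targets in targets_by_lun.items()
        if len(targets) > 1
    }


print(find_shared_lun_ids(MAPPINGS))
# {0: [('host', 'esx01'), ('host_group', 'ESX_Cluster')]}
```

Any LUN ID reported here is used by both a Host Group and an individual host (or by several hosts), which is the situation the workaround avoids by renumbering.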

Thanks

Michelle_Laveri
Virtuoso

Can I say what an interesting thread this has been/is....

A couple of observations:

1. An SRA which has a case-sensitivity error in the Perl script is a pretty bad show...

2. I don't recall seeing an SRA for the DS4000 being available in the download - perhaps I wasn't observant enough.

3. I think some (not all) of the vendors have been a bit remiss in the documentation. It kind of reminds me of the early days of VCB, when we were very much left to our own skills to resolve problems. Frequently the PDFs available assume very good knowledge of the storage - which is not very helpful to the average VMware guy who is not a storage expert. I'm quite happy to put myself in this camp - I don't like to exaggerate my knowledge on these forums!

I think what we need is more "Getting Started with..." style documentation - written by people who are not pro-storage guys....

4. PDF guide to DS4000. I'm very close to releasing my book on SRM. I have a hard copy on the way to me. As long as it looks OK, I will be releasing it on LULU. We can also use LULU to distribute free PDFs. If anyone wants to share what they have learned from this, I would be happy to host these PDFs on my LULU account. There's a cost for me to do this, but I'm happy to do it, as it's helpful to the community and a helpful free supplement to my book....

Regards

Mike

TMeissner
Contributor

Our flashcopy on the recovery side is working using the default feature code that limits us to two flashcopies. I am just working with a single test LUN. I guess I'm stuck with the generic error:

#1] :INFO:failover:exit failover.....
Error:
",
msg = "Message exceeds database maximum string length."
}
],
msg = "Message exceeds database maximum string length."
}
Task destroyed

I've attached our log for the error. Any insight would be appreciated. If you think this is a bug in the SRA code, that's OK, just let me know. I'm getting tired of chasing this one. Is there any other information that would be helpful?

KrishnaR
Enthusiast

I'll take a look at these logs. I'm also willing to set up a WebEx or NetMeeting session to get a look at your environment. If you could email me, we can work something out.

admin
Immortal

Hi TMeissner,

This looks like an SRA problem during test failover. Looking at the SRA log, it seems that the operation was successful; however, the response indicated an error.

<?xml version="1.0" encoding="UTF-8"?>
<Response>
<ReturnCode>4</ReturnCode>
</Response>

...

(2008-10-07 10:45:06) ::TRIVIA::SMsra::TestFailover::performOp::Calling tfoStart
(2008-10-07 10:45:06) ::TRIVIA::SMsra::TestFailover::toStart::Entry
(2008-10-07 10:45:06) ::VERBOSE::SMsra::TestFailover::getVolumeWithWWN::Got Volume with WWN 600a0b800042359600000c5848ce720f
(2008-10-07 10:45:06) ::VERBOSE::SMsra::TestFailover::toStart::Got Volume
(2008-10-07 10:45:06) ::VERBOSE::SMsra::TestFailover::toStart::Volume status is Optimal
(2008-10-07 10:45:06) ::VERBOSE::SMsra::TestFailover::toStart::Snapshot for Volume not present, Creating Snapshot
(2008-10-07 10:45:06) ::VERBOSE::SMsra::Next Snapshot Count=0
bindToController succesfull
Current CGN=4115
:INFO:failover:exit failover.....

-Masha

beb
Contributor

I am not sure that this question relates directly to this post, but after reading this post it looks like there is a wealth of knowledge here.

I am new to SRM and I am going to be installing it soon. I have checked all the requirements and the . All is looking good to go except for my Storage.

I have a DS4800 and the matrix lists the following as the level required. Does anyone know whether these are the minimum levels or the only supported levels?

Storage Replication Adapter Compatibility List for IBM Arrays

Hardware Models    Firmware           IBM Storage Manager
DS4800             07.10 and 07.15    10.10 and 10.15

My current versions are:

Hardware Models    Firmware       IBM Storage Manager
DS4800             06.60.02.00    09.60.G5.04

Does anyone know if my current versions will work or not?

Does anyone have a link to the IBM SRA documentation that may supply this information?

This is my first post on the VMware community, so please let me know if I am posting incorrectly.

Any advice or assistance is much appreciated.

Brett

admin
Immortal

Hi Brett,

>> I have a DS4800 and the matrix lists the following as the level required. Does any one know if this is the minimum or supported version levels?

The safe assumption is that only the exact versions listed are supported. However, in most cases all higher versions are supported as well. It would be best to consult the vendor of the SRA about your particular version.

-Masha

KrishnaR
Enthusiast

SRM is only supported with 07.10+. No testing has been done with previous versions. I think it would take code changes to accommodate the previous firmware's API.
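As a quick way to sanity-check a firmware level against that 07.10 floor, here is a small Python sketch. It assumes the firmware string is dotted and all-numeric, like the 06.60.02.00 quoted earlier in the thread; the helper names are made up for illustration, and Storage Manager versions such as 09.60.G5.04 contain letters and would need different handling.

```python
MIN_FIRMWARE = "07.10"  # minimum firmware level stated in this thread


def parse_version(text):
    """Split a dotted, all-numeric version string into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(current, minimum=MIN_FIRMWARE):
    """Compare versions field by field, padding the shorter one with zeros."""
    cur, req = parse_version(current), parse_version(minimum)
    width = max(len(cur), len(req))
    cur += (0,) * (width - len(cur))
    req += (0,) * (width - len(req))
    return cur >= req


print(meets_minimum("06.60.02.00"))  # False
print(meets_minimum("07.15"))        # True
```

Comparing integer tuples rather than raw strings avoids surprises when fields have different digit counts.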
