Enthusiast

Error while executing 'DiscoverLuns' command. IBM N-series SAN

Hi All, we're just running through the implementation of SRM in our environment and have hit an error which I haven't seen before. Hopefully someone out there can help.

We have two Virtual Centers (vSphere) and two IBM N3300 SANs (basically NetApp filers). ESX is connected using FCP and we have 4 datastores/LUNs connected to the ESX server at the primary data centre. Each LUN is in its own SAN volume.

SnapMirror replication has been configured from the primary SAN to the DR SAN and all replication is proceeding fine. All volumes are in a snapmirrored state.

SRM installs fine and we have installed the latest SRA from IBM (1.4.3). The connection from the primary to the DR site is set up without problems, but when we try to configure the array managers we hit problems. At the point of LUN discovery we consistently get the same error at 23%. The error is: Error while executing 'DiscoverLuns' command

I've had a look round and found some articles which describe a very similar problem. The symptoms are identical except for one major difference: in the articles the SANs being used are IBM SVCs, not N-series. The solution for the SVC problem is to ensure that you put 'vmware' in the host mapping when configuring the SAN. However, there is no equivalent to this on the N-series/NetApp as far as I can tell. To cross this off the list anyway, I have renamed all LUNs and initiator groups to contain the phrase 'vmware'.

Any help appreciated.

Cheers

VMware Employee

Hi there,

I think you've hit a known issue.

If you have access to the NetApp NOW website, please check out this article.

https://now.netapp.com/Knowledgebase/solutionarea.asp?id=kb58002

HTH

Cormac

http://cormachogan.com
Enthusiast

Hi Cormac, thanks for the speedy response. However, we are not using any NFS datastores (the filer isn't licensed for NFS) and are using purely Fibre Channel HBAs on the ESX servers.

We currently have 4 datastores configured on the primary ESX server which are connected to the primary SAN using Fibre Channel, and consequently we already have an initiator group configured with an OS type of vmware.

A bit more information: the log file suggests the discoverLuns script is running and is finding the LUNs connected to the primary site and also the replicated copies at the DR SAN. It falls over somewhere after this:

Discover Luns Started

Collecting igroup information

igroup vmware-FRB has initiator ****** and type FC

igroup vmware-FRB has initiator ****** and type FC

igroup vmware-FRB has initiator ****** and type FC

igroup vmware-FRB has initiator ****** and type FC

igroup vmware-FRB has initiator ****** and type FC

igroup vmware-FRB has initiator ****** and type FC

igroup vmware-FRB has initiator ****** and type FC

igroup vmware-FRB has initiator ****** and type FC

igroup vmware-FRB has initiator ****** and type FC

Collecting list of replicated Luns

lun hex vaule for /vol/datastore4/vmfs4-vmware is ******

datastore4 has a volume replica /vol/datastore4_SMT/vmfs4-vmware

lun hex vaule for /vol/datastore2/VMFS2-vmware is ******

datastore2 has a volume replica /vol/datastore2_SMT/VMFS2-vmware

lun hex vaule for /vol/datastore3/VMFS3-vmware is ******

datastore3 has a volume replica /vol/datastore3_SMT/VMFS3-vmware

lun hex vaule for /vol/datastore1/VMFS1-vmware is ******

datastore1 has a volume replica /vol/datastore1_SMT/VMFS1-vmware

Collecting NFS export information

Api nfs-exportfs-list-rules-2 requires license for nfs or flexcache_nfs

Could not find any exported NFS

Collecting list of replicated exports

Could not discover any replicated exports

Discover Luns completed with errors

</Response>

discoverLuns exited with exit code 0

'discoverLuns' returned <?xml version="1.0" encoding="UTF-8"?>
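
For anyone sifting through a long SRA log for the same symptoms, a minimal Python sketch like the one below can flag the relevant lines. The pattern list is an assumption based only on the excerpt above and is not exhaustive, and `find_suspect_lines` is a hypothetical helper, not part of the SRA or SRM.

```python
import re

# Phrases taken from the log excerpt above; illustrative only, not a
# complete list of the messages the SRA can emit.
SUSPECT_PATTERNS = [
    re.compile(r"requires license", re.IGNORECASE),
    re.compile(r"completed with errors", re.IGNORECASE),
    re.compile(r"could not (find|discover)", re.IGNORECASE),
]


def find_suspect_lines(log_text):
    """Return (line_number, line) pairs for lines matching any suspect pattern."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SUSPECT_PATTERNS):
            hits.append((number, line.strip()))
    return hits


# Sample input copied from the excerpt above (blank lines removed).
SAMPLE = """Collecting NFS export information
Api nfs-exportfs-list-rules-2 requires license for nfs or flexcache_nfs
Could not find any exported NFS
Collecting list of replicated exports
Could not discover any replicated exports
Discover Luns completed with errors
"""

if __name__ == "__main__":
    for number, line in find_suspect_lines(SAMPLE):
        print(f"{number}: {line}")
```

To run it against a real log, read the file into a string and pass it to `find_suspect_lines` in place of `SAMPLE`.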

Thanks

VMware Employee

Do you have virtual machines on the replicated LUNs?

For LUNs to be discovered, they must be associated with VMs in some way.

Either there is a VMFS on the LUNs and a VM resides on the VMFS, or the VM has the LUNs mapped as RDMs.

Cormac

http://cormachogan.com
Enthusiast

Yes, we have one datastore with several running Windows VMs and I have created at least one VM in each of the other 3 datastores. These VMs don't have any operating system on them as yet, but this shouldn't matter, should it?

VMware Employee

Nope - blank VMs should be fine.

It will probably be worthwhile attaching the complete vmware-dr.log file for the discoverLuns operation - it may give us more of a clue.

Cormac

http://cormachogan.com
Enthusiast

Hi Cormac, no problem. The latest log file is attached.

Thanks again.

VMware Employee

Hey Cluey,

The crux of the issue is this failure here:

Return code for discoverLuns: 4

The scripts returned an error, leaving the temporary file 'C:\WINDOWS\TEMP\vmware-SYSTEM\dr-sanprovider4704-0'

Unknown error encountered by the script

RecordOp ASSIGN: status, array-4024

Status of array 'array-4024' is set to 'syncFailed'

Failed to re-sync with storage array: (dr.san.fault.ArrayUnknownFault) {

dynamicType = <unset>,

faultCause = (vmodl.MethodFault) null,

command = "discoverLuns",

msg = "",

}
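
If the vmware-dr.log is large, a small script can pull out just these failure markers. The following is a rough Python sketch, assuming the log lines look exactly like the two quoted above; `summarize_failures` and both regular expressions are hypothetical and written for illustration, not taken from any SRM tooling.

```python
import re

# Patterns modelled on the two log lines quoted above; other SRM versions
# may word these messages differently.
RETURN_CODE = re.compile(r"Return code for (\w+): (\d+)")
ARRAY_STATUS = re.compile(r"Status of array '([^']+)' is set to '([^']+)'")


def summarize_failures(log_text):
    """Collect non-zero script return codes and the last status seen per array."""
    summary = {"return_codes": [], "array_status": {}}
    for line in log_text.splitlines():
        match = RETURN_CODE.search(line)
        if match and int(match.group(2)) != 0:
            summary["return_codes"].append((match.group(1), int(match.group(2))))
        match = ARRAY_STATUS.search(line)
        if match:
            summary["array_status"][match.group(1)] = match.group(2)
    return summary


if __name__ == "__main__":
    sample = (
        "Return code for discoverLuns: 4\n"
        "Status of array 'array-4024' is set to 'syncFailed'\n"
    )
    print(summarize_failures(sample))
```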

As you've highlighted in your original post, I too have only seen this with the IBM SVC SRA where there is a specific requirement to make sure that the ESX servers that are registered in IBM SVC contain the label 'vmware'. Is there anything in the SRA release notes for the IBM N-series which specifies a particular naming convention? From working with NetApp, I thought the only thing that you needed was an OS-Type of vmware. Maybe IBM have additional requirements.

I'd recommend opening a service request with VMware for this issue. This could be an issue with the SRA, in which case VMware Technical Support can contact IBM for assistance.

Kind regards

Cormac

http://cormachogan.com
Enthusiast

Thanks Cormac, I've just tried something else which is very interesting and may lead to a full solution.

While I was looking through the logs earlier I noticed several references to NFS which seemed a little odd, specifically:

Our SAN has not been licensed for NFS and consequently, we didn't add an IP address for NFS when we ran the Array managers configuration within SRM. So why would it detect the SAN IP as type NFS?

Anyway, to test I installed an old NetApp SRA I had (1.0.1) which was FCP/iSCSI only with no NFS. When I ran the Array Managers configuration using this SRA it worked straight away. It ran straight through, detected all 4 replicated LUNs and looks fine (other than that I'm using a NetApp SRA with an IBM SAN, so it's probably not suitable for production).

In short, it looks very much like the known issue you mentioned at the beginning of this thread, but in reverse (if that makes sense). We are an FCP-only environment with absolutely no NFS (as opposed to an NFS-only environment with absolutely no iSCSI/FCP), so by taking the shared SRA out of the equation and using the FCP/iSCSI-only SRA it all works.

VMware Employee

Let me see if I can get more info on this SRA, Cluey. However, I'd still recommend opening a support ticket in the meantime. It will raise the visibility of the issue internally.

Cormac

http://cormachogan.com
Contributor

Hey Cluey and the rest of the guys, I was wondering what version of DRA you have installed? As far as I know there are two versions, the SAN and the NAS version. The error and return code you got cannot be found in the NAS version's documentation… are you sure you got the Disaster Recovery SAN Adapter 1.4 for VMware® vCenter Site Recovery Manager?

regards Albert Verhoeff

"Never send a human to do a machine's job."
Enthusiast

Hi Albert, the error was reported when using the latest version (1.4.3), which is the combined NAS & SAN version. Whenever I try one of the earlier SAN-only versions (so far I have tried 1.4.2 & 1.x) it works fine. The SRM logs do suggest that an attempt is being made to find an NFS datastore and this is causing an error, which results in the discovery failing (at least that is how it appears to me).

This looks like the known issue with NFS-only environments, except we have FCP only. I haven't managed to log a call with either IBM or VMware as yet, as the kit is new and we have some problems with registration keys, purchase orders etc. Plus our VMware support has to go through IBM (or at least that is what I have been told). So for now, as we have no intention of licensing NFS, the only option we have is to use an earlier version of the SRA, although IBM don't make it easy to download this (or anything else for that matter).

Contributor

Hey Cluey, neither NetApp nor IBM has officially released this new version yet. I think it is because of the bug mentioned earlier (KB58002). Though it might be available on the VMware site, it's not on IBM or NetApp :-S

Have you tried to register your N series yet for downloading software from IBM? Hope it works.

regards Albert

"Never send a human to do a machine's job."
Contributor

Cluey,

If you haven't already figured this out, I think I might have the answer for you. I just discovered it in an NFS only environment after implementing the NetApp workaround. If you used RBAC, did you modify the permissions on the filer for all of the functions that the unified 1.4.3 SRA needs? The required permissions were different for NAS and SAN, and therefore if you configured based on one or the other, the unified SRA cannot perform all of the tasks it needs to now perform just to do discovery. I just posted about this over at my blog site.

Andy

Contributor

For a NAS environment, I resolved this issue by using SRA version 1.4.1.

-- Igor Nemilostivy