VMware Cloud Community
habibalby
Hot Shot

RDM and LUN IDs Mismatch

Hi All,

Environment

  • ESX 4.1  Build 260247
  • vCenter 4.1.0  Build 345043
  • SAN Storage EMC iSCSI AX4-5i

I have a worrying issue with my Exchange Server VM, which has around 15 RDMs in virtual compatibility mode. I added all the RDMs on the first new host without problems and the VM came up. When I tried to migrate it, I got an error about the mapped direct-access LUN.

I investigated and found that the same LUNs are visible on the other hosts, but their LUN IDs differ from those presented on the first host. I cross-checked the WWN of each LUN and it is the same on all hosts; only the LUN IDs differ.

I put the affected host into Maintenance Mode, unassigned all the LUNs, rescanned, assigned them again and rescanned once more. No joy. I rebooted the host and assigned the LUNs back, still no joy.

Any help?

Best Regards, Hussain Al Sayed Consider awarding points for "correct" or "helpful".
9 Replies
Saturnous
Enthusiast

Navisphere (or Unisphere) must recognise these two hosts as a cluster. Can Navisphere reach them on port 443? They should also be registered in the same DNS root as Navisphere and sit on the same subnet together.

But as long as the LUNs are in use, they keep the wrong SCSI ID from the ESX host's perspective. So you may already have equal IDs assigned, but they do not show up on the host that is still legitimately accessing them.

1. Connect a single vSphere Client to both ESX hosts, edit the VM, remove the RDM and choose DELETE FROM DISK. This deletes the mapping (meta) file, not your data.

2. Move the VM to the other host.

3. In Navisphere, reassign the LUNs to the host that had the VM before.

4. NOW rescan there.

5. Attach them back and it will be fine. To check, go to the console and browse to the VM's directory on the datastore.

grep vmdk *.vmx

Pick your RDM according to its SCSI address in the VM (the scsi entries in the .vmx).

Run vmkfstools -q on them; you will see the vml.xxxxxxxxxxxxxxxxxxxxxxxx number that describes your LUN.

The sixth digit is the LUN ID, and a little further on the SCSI ID (naa.yyyyyyyyyyyyyyyyyy) is encoded in it too.

Now you can compare them to the esxcfg-mpath -l output.
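To make the comparison less error-prone, here is a minimal Python sketch that splits a vml identifier into its parts. The layout it assumes (a 10-hex-digit prefix with the LUN ID in hex digits 5-6, then the 32-digit NAA ID, then trailing ASCII text) is inferred from the sample values quoted in this thread, not taken from official documentation, and parse_vml is a hypothetical helper name.

```python
def parse_vml(vml: str) -> dict:
    """Split a vml.* identifier into LUN ID, NAA ID and trailing text.

    Assumed layout (inferred from the thread's samples, not an official spec):
      hex digits 0-9   : prefix, with the LUN ID in digits 4-5 (hexadecimal)
      hex digits 10-41 : the 32-digit NAA identifier
      remaining digits : ASCII text appended by the array
    """
    hexstr = vml.strip().lower()
    if hexstr.startswith("vml."):
        hexstr = hexstr[4:]
    if len(hexstr) < 42:
        raise ValueError("identifier too short for the assumed layout")
    lun_id = int(hexstr[4:6], 16)          # hexadecimal, so 0x18 means LUN 24
    naa = "naa." + hexstr[10:42]
    tail = bytes.fromhex(hexstr[42:]).decode("ascii", errors="replace")
    return {"lun_id": lun_id, "naa": naa, "tail": tail}


# Sample value quoted later in this thread for LUN Staff-DB1-H
info = parse_vml("vml.02000500006006016086f0250054536426c29ce011524149442035")
print(info)
```

Note that the LUN ID field is hexadecimal, so it should be converted before comparing it with the decimal L number in a path such as vmhba43:C0:T0:L24.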

A year ago, a customer with a bunch of ESX clusters, heavy use of RDMs and a SAN department that communicated with the VMware department by RFC 1149 (by pigeons) came up with the same symptoms. I wrote a script that pulled out all affected RDMs and showed the solution path in an Excel sheet. Unfortunately I lost it.

I think this will enlighten you a lot.

http://blog.laspina.ca/ubiquitous/understanding-vmfs-volumes

habibalby
Hot Shot

Thanks for your reply, I will check that out and let you know..

You brought my attention to the RDM pointers ("mappers"). I didn't delete the ones that were created on the older hosts. I simply shut down the VM, removed the RDM LUNs, attached the LUNs to the new ESX hosts, and browsed the datastore where the .vmx file is located to register the VM on the new host. Then I re-added the RDMs using the pointers that were previously created on the older hosts. Maybe that's the issue?

Regards,

habibalby
Hot Shot

Hello,

I carried out the following test on a test VM and a test RDM LUN, but it didn't help:

  • Created a test VM.
  • Created a test RDM LUN and attached it to both ESX hosts.
  • Both hosts see the LUN with a different LUN ID but the correct WWN.

ESX-Host02

  • DGC iSCSI Disk (naa.60060160daf125005e0895f314b5e011)
    naa.60060160daf125005e0895f314b5e011
    vmhba43:C0:T0:L8
    8 LUN ID
    disk
    iSCSI
    10.00 GB
    NMP
    Unknown

ESX-Host01

  • DGC iSCSI Disk (naa.60060160daf125005e0895f314b5e011)
    naa.60060160daf125005e0895f314b5e011
    vmhba43:C0:T0:L24
    24 LUN ID
    disk
    iSCSI
    10.00 GB
    NMP
    Unknown

I attached the LUN to both hosts at the same time and rescanned. It shows up on both hosts, but with different LUN IDs.
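With 15 or more RDMs, checking this by eye gets tedious. Below is a rough Python sketch that takes the runtime name each host reports for every NAA ID and lists the LUNs whose IDs disagree. The inventory dictionary is typed in by hand from listings like the two above; the function names are made up for illustration and nothing here talks to ESX or the array.

```python
import re
from collections import defaultdict

# Runtime names look like vmhba43:C0:T0:L8 in the listings above
RUNTIME_NAME = re.compile(r"^vmhba\d+:C\d+:T\d+:L(\d+)$")


def lun_from_runtime_name(name: str) -> int:
    """Return the LUN number at the end of a vmhbaN:Cx:Ty:Lz runtime name."""
    match = RUNTIME_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"unexpected runtime name: {name!r}")
    return int(match.group(1))


def find_lun_id_mismatches(hosts: dict) -> dict:
    """hosts maps host name -> {NAA ID: runtime name}.

    Returns {NAA ID: {host name: LUN ID}} for every device whose
    LUN ID is not the same on all hosts that see it.
    """
    seen = defaultdict(dict)
    for host, devices in hosts.items():
        for naa, runtime in devices.items():
            seen[naa][host] = lun_from_runtime_name(runtime)
    return {naa: per_host for naa, per_host in seen.items()
            if len(set(per_host.values())) > 1}


# Values copied from the two listings above
inventory = {
    "ESX-Host01": {"naa.60060160daf125005e0895f314b5e011": "vmhba43:C0:T0:L24"},
    "ESX-Host02": {"naa.60060160daf125005e0895f314b5e011": "vmhba43:C0:T0:L8"},
}
mismatches = find_lun_id_mismatches(inventory)
print(mismatches)
```

Keying on the NAA ID rather than the LUN ID is deliberate, since the NAA ID is the one value that stays the same on both hosts in this case.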

I attached it to ESX02 first as a virtual RDM and attempted a vMotion; I got the error that the mapped direct-access LUN is not accessible.

I removed the RDM as you suggested, choosing to remove it from the virtual machine and delete the files from disk.

I added it again, this time choosing a new RDM instead of an existing disk mapper. But no joy.

What else could be the issue? The ESX host? It seems to take the LUN ID from an incremental number rather than from the SAN LUN ID.

a_p_
Leadership

I'm not familiar with the storage system you use; however, LUN IDs are usually assigned at the storage level.

Maybe the hint about storage groups in "VMotion with VMFS volumes" at http://www.emc.com/collateral/hardware/white-papers/h1416-emc-clariion-intgtn-vmware-wp.pdf (page 35) will help.

André

Reply
0 Kudos
Saturnous
Enthusiast

As I mentioned, EMC is a bit strange here. On every other SAN storage you assign LUNs directly to an HBA WWN, or to a host you defined manually by adding all its HBA WWNs, and you declare the host LUN number manually on the storage side.

Navisphere tries to reflect the 'networking approach': you attach LUNs to hostnames, and an agent takes care of finding the corresponding host HBA WWNs and choosing the host LUN number.

Navisphere tries to talk to the ESX hosts over HTTPS and pull information from hostd and vpxa to find out whether hosts are grouped in clusters (which need to have the same host LUN IDs).

First, I think you have to start from a point where the mapping to both hosts is 100% identical, because I could imagine Navisphere simply counts up by 1 from the highest LUN number already assigned.

I would give EMC a call ...


habibalby
Hot Shot

Hello,

This type of SAN storage doesn't have any place to configure the LUN ID. It auto-generates it once the virtual disk is created.

habibalby
Hot Shot

Hello Saturnous,

Yeah, that's what worried me, actually. I was surprised when I added the same LUNs to another host that doesn't have any LUNs from the EMC AX4-5i: the LUN IDs started from 0 and went up to 16. On another host, which already has a few LUNs from the same SAN storage, the same LUNs get different LUN IDs.

What I understood from your last paragraph is to unmap all the LUNs from both hosts and then map all of them at once, so that both ESX hosts number the LUN IDs based on the order of LUN assignment. Correct me if I'm wrong!

Thanks,

habibalby
Hot Shot

It just came to my mind that I could create small LUNs and add them as RDMs or plain VMFS LUNs to ESX02, which currently doesn't have any VMs. Those small LUNs will take IDs 0 to 4, while the affected Exchange LUNs run from 5 to 24. So, after adding the small LUNs 0 to 4, I should be able to present LUNs 5 to 24 with the same LUN IDs they have on ESX01. I think this trick should resolve the issue.

I will come back with an update...

habibalby
Hot Shot

Hi,

Here’s an update for this issue.

Problem:

vMotion is not possible. When I attempt to vMotion a VM, I get the error:

"Virtual Disk 'hard disk 0' is a mapped direct access LUN and its not accessible"

This error is caused by a LUN ID mismatch and a vml.xxx LUN signature mismatch.

Even after I made the LUN IDs match on both hosts, I still got the error when I attempted to vMotion the VM.

What's wrong?

Both the LUN ID and the WWN match on both ESX hosts. The vml.xxx also matches for each LUN on each host. At this point at least a cold migration should work, but in my case it does not: when I attempt to cold-migrate the VM I again get "Virtual Disk 'hard disk 0' is a mapped direct access LUN and its not accessible".

ESX01

  • LUN Name: Staff-DB1-H
    LUN ID: 5
    naa.6006016086f0250054536426c29ce011
    vml.02000500006006016086f0250054536426c29ce011524149442035

ESX02

  • LUN Name: Staff-DB1-H
    LUN ID: 5
    naa.6006016086f0250054536426c29ce011
    vml.02000500006006016086f0250054536426c29ce011524149442035

I found that in VM Properties > Mapped Raw LUN > Physical LUN and Datastore Mapping File, the LUN signature (vml.xxxx) is wrong and does not refer to any of the presented LUNs.

Hmm, something strange is going on here!

The issue was caused by reusing an existing RDM mapper file. This Exchange VM was running on ESX 4.0 with all its LUNs mapped as RDMs in virtual compatibility mode. I upgraded one of the hosts in that cluster and added it to another cluster, which is managed by the latest build of vCenter 4.1.

The new host has access to all the LUNs accessed by the old host. I then shut down the VM, removed all the RDMs and removed the VM from the old vCenter inventory. After that, on the new host, I browsed the datastore where the .vmx file is located and added the VM to the inventory of the new vCenter 4.1. Once the VM came up normally, I started adding the RDMs again, but this time as existing disks from the datastore that holds the mapper files.

The problem lies in this part: adding an existing rdm.vmdk mapper file that points to a LUN as it was presented to another host. Here is the result that created all the hassle.

[Screenshot: MUB Mail - RDM Mapping File.jpg]

This is what the VM shows while running on the new host under the new vCenter01. But Physical LUN and Datastore Mapping File still references the old vml.xxx signature from when the VM was running on the ESX 4.0 host.

ESX01

  • LUN Name: Staff-DB1-H
    LUN ID: 5
    naa.6006016086f0250054536426c29ce011
    vml.02000500006006016086f0250054536426c29ce011524149442035

ESX02

  • LUN Name: Staff-DB1-H
    LUN ID: 5
    naa.6006016086f0250054536426c29ce011
    vml.02000500006006016086f0250054536426c29ce011524149442035

As you may have noticed above, the same LUN 6006016086f0250054536426c29ce011 maps to the new vml.02000500006006016086f0250054536426c29ce011524149442035, but the VM properties show a different vml.xxx ID, vml.0200000000060, which doesn't reference anything.

Bottom line, the solution varies:

  1. Matching the LUN IDs across all the hosts won't solve the problem if the vml.xx ID is different.

  2. Matching the LUN IDs across all the hosts along with the vml.xxx signature may or may not solve the problem. The datastore that holds the RDM mapper file should also have a matching LUN ID and vml.xx on the other hosts.

  3. The only solution that resolved this issue for me:

    1. Dismount the Exchange databases to avoid any unpredictable issue.

    2. Stop all Exchange services and disable them.

    3. Shut down the VM.

    4. Remove the RDM LUNs, choosing "Remove from virtual machine and delete files from disk". This step won't delete the actual data on the NTFS LUN.

    5. Boot the VM and make sure it can be vMotioned with only the VMDK that holds the OS.

    6. Re-add the RDMs to the VM and make sure the WWN and vml.xxx of each match on all the hosts in the cluster.

    7. Start the Exchange services.

    8. Mount the Exchange databases if they didn't mount by themselves.
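The check in step 6 can also be written down as a small Python sketch: given the vml shown in the VM properties for an RDM and the vml IDs each host presents, it lists the hosts where that vml is missing. All values have to be collected by hand (for example from vmkfstools -q and the host storage views); hosts_missing_vml is a hypothetical helper, not a VMware tool.

```python
def hosts_missing_vml(mapping_vml: str, host_vmls: dict) -> list:
    """Return the hosts that do not present the vml recorded for an RDM.

    mapping_vml: the vml.xxx shown under Physical LUN and Datastore Mapping File
    host_vmls:   host name -> iterable of vml IDs visible on that host
    An empty list means every host sees the LUN under that vml.
    """
    wanted = mapping_vml.strip().lower()
    return sorted(
        host for host, vmls in host_vmls.items()
        if wanted not in {v.strip().lower() for v in vmls}
    )


# vml presented by both hosts for Staff-DB1-H, as quoted in this thread
current = "vml.02000500006006016086f0250054536426c29ce011524149442035"
presented = {"ESX01": [current], "ESX02": [current]}

# The stale value from the VM properties (truncated as shown in the thread)
print(hosts_missing_vml("vml.0200000000060", presented))
# A freshly re-added RDM should reference the current vml
print(hosts_missing_vml(current, presented))
```

An RDM is only worth trying to vMotion once this returns an empty list for every host in the cluster.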

Result:

Migrate virtual machine  |  MUBMAIL001  |  Completed  |  Administrator  |  26/07/2011 20:51:36  |  26/07/2011 20:51:36  |  26/07/2011 20:53:13
