VMware Cloud Community
91gsixty
Contributor

SAN PATH : "The path is marked as 'busy' by the VMkernel"

[root@ESX2 ~]# esxcfg-mpath -s off -P vmhba2:C0:T0:L0

Unable to set path state. Error was: Unable to change path state, the path is marked as 'busy' by the VMkernel.

Here's the situation:

I had a 600 GB vdisk.

I added a volume to LUN 0 and set the paths to default on the SAN.
I realized I didn't like the size, so I removed the volume
and split the vdisk into two volumes:

500 GB and 99 GB.

I created both volumes and set them both to LUN 0.
The first volume was created and mapped. The second wouldn't map because LUN 0 was already taken (stupid me, I put it on the same LUN 0).

This is when the ESX alerts started.

Alarm Definition:
([...] OR [...] OR [Event alarm expression: Degraded Storage Path Redundancy])
Event details:
Lost connectivity to storage device naa.600c0ff000d7e9120000000000000000. Path vmhba1:C0:T1:L0 is down. Affected datastores: Unknown.

I tried to deactivate these paths but got:

[root@ESX2 ~]# esxcfg-mpath -s off -P vmhba2:C0:T0:L0

Unable to set path state. Error was: Unable to change path state, the path is marked as 'busy' by the VMkernel.

Update: I have since added the 500 GB and 99 GB volumes as LUN 3 and LUN 4 with no problems. ESX still sees LUN 0 even though nothing is mapped to it anymore. How can I remove these paths without interrupting my production environment?

thanks

jeff

fc.20000000c987cd55:10000000c987cd55-fc.208000c0ffd7eaad:217000c0ffd7eaad-naa.600c0ff000d7e9120000000000000000

Runtime Name: vmhba2:C0:T0:L0

Device: naa.600c0ff000d7e9120000000000000000

Device Display Name: HP Fibre Channel Enclosure Svc Dev (naa.600c0ff000d7e9120000000000000000)

Adapter: vmhba2 Channel: 0 Target: 0 LUN: 0

Adapter Identifier: fc.20000000c987cd55:10000000c987cd55

Target Identifier: fc.208000c0ffd7eaad:217000c0ffd7eaad

Plugin: NMP

State: active

Transport: fc

Adapter Transport Details: WWNN: 20:00:00:00:c9:87:cd:55 WWPN: 10:00:00:00:c9:87:cd:55

Target Transport Details: WWNN: 20:80:00:c0:ff:d7:ea:ad WWPN: 21:70:00:c0:ff:d7:ea:ad

OR

vmhba2:C0:T0:L0 state:active naa.600c0ff000d7e9120000000000000000 vmhba2 0 0 0 NMP active san fc.20000000c987cd55:10000000c987cd55 fc.208000c0ffd7eaad:217000c0ffd7eaad

vmhba1:C0:T1:L0 state:active naa.600c0ff000d7e9120000000000000000 vmhba1 0 1 0 NMP active san fc.20000000c987f29f:10000000c987f29f fc.208000c0ffd7eaad:207000c0ffd7eaad

vmhba1:C0:T0:L0 state:active naa.600c0ff000d7e8690000000000000000 vmhba1 0 0 0 NMP active san fc.20000000c987f29f:10000000c987f29f fc.208000c0ffd7eaad:247000c0ffd7eaad
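
For anyone reading along, here is a minimal sketch of how the brief path listing above can be grouped by device, which makes it easier to see which runtime names point at the same naa ID. The function names (`parse_runtime_name`, `paths_by_device`) are made up for illustration, and the field order is assumed from the three lines pasted above rather than from any documented format.

```python
import re
from collections import defaultdict

# The three brief path lines pasted above, unchanged.
PATH_LINES = """\
vmhba2:C0:T0:L0 state:active naa.600c0ff000d7e9120000000000000000 vmhba2 0 0 0 NMP active san fc.20000000c987cd55:10000000c987cd55 fc.208000c0ffd7eaad:217000c0ffd7eaad
vmhba1:C0:T1:L0 state:active naa.600c0ff000d7e9120000000000000000 vmhba1 0 1 0 NMP active san fc.20000000c987f29f:10000000c987f29f fc.208000c0ffd7eaad:207000c0ffd7eaad
vmhba1:C0:T0:L0 state:active naa.600c0ff000d7e8690000000000000000 vmhba1 0 0 0 NMP active san fc.20000000c987f29f:10000000c987f29f fc.208000c0ffd7eaad:247000c0ffd7eaad
"""

# Runtime names look like <adapter>:C<channel>:T<target>:L<lun>.
RUNTIME_NAME = re.compile(r"^(vmhba\d+):C(\d+):T(\d+):L(\d+)$")


def parse_runtime_name(name):
    """Split a runtime name such as vmhba2:C0:T0:L0 into its parts."""
    match = RUNTIME_NAME.match(name)
    if match is None:
        raise ValueError("not a runtime name: %r" % name)
    adapter, channel, target, lun = match.groups()
    return {
        "adapter": adapter,
        "channel": int(channel),
        "target": int(target),
        "lun": int(lun),
    }


def paths_by_device(text):
    """Group (runtime name, state) pairs by device ID.

    Assumes the first three whitespace-separated fields are the runtime
    name, 'state:<value>' and the device ID, as in the lines above.
    """
    grouped = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        runtime_name, state, device = fields[0], fields[1], fields[2]
        grouped[device].append((runtime_name, state.split(":", 1)[1]))
    return dict(grouped)


if __name__ == "__main__":
    for device, paths in paths_by_device(PATH_LINES).items():
        print(device)
        for runtime_name, state in paths:
            print("   ", runtime_name, state, parse_runtime_name(runtime_name))
```

Grouped this way, the listing shows two different devices answering at LUN 0: one behind two paths and one behind a single path.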

20 Replies
binoche
VMware Employee

I did not fully understand your question. Do you want to clean up deleted LUNs from ESX?

If so, running esxcfg-rescan -d vmhba1; esxcfg-rescan -d vmhba2 will clean up the deleted LUNs.

binoche, VMware VCP, Cisco CCNA

vmwarefc
Contributor

Hi, 91gsixty

I think you want to remove either LUN 0 or the volume on it.

To remove LUN 0, unmap it from your storage array and then run esxcfg-rescan vmhbaX or rescan via the UI.

To remove only the volume on LUN 0, delete the datastore from the UI. (You can still see and use the LUN afterwards.)

Thanks

Zhifeng

91gsixty
Contributor

I did a rescan through the GUI but with no luck. I'll try the -d option at the command line next.

91gsixty
Contributor

update

# esxcfg-rescan -d vmhba1

# esxcfg-rescan -d vmhba2

#

Same result.

It doesn't show the datastore in the VI Client, and the LUN doesn't show on my SAN.

Same as the attachment above.

binoche
VMware Employee

The path is 'busy': do you still have I/O going to this LUN?

Also, please clarify what result you expect. Thanks.

binoche, VMware VCP, Cisco CCNA

vmwarefc
Contributor

Actually, esxcfg-mpath is not the right tool if you want to remove a LUN. (esxcfg-mpath cannot remove a LUN or a datastore.)

To remove a LUN, follow these steps:

1. Unmap the LUN from the host (your ESX host in this case) on the storage array (the HP array in this case).

2. On your ESX host, run esxcfg-rescan vmhba1 and esxcfg-rescan vmhba2 (your FC adapters), or rescan from the UI.

3. Check whether LUN 0 has been removed.
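
As a rough illustration of step 2, the sketch below only builds the rescan command lines for a list of adapters so they can be reviewed before anything is run. The helper name `rescan_commands` is hypothetical, the `-d` option is taken from the earlier reply in this thread, and the sketch assumes a Python 3 interpreter on whatever machine you use to prepare the commands.

```python
def rescan_commands(adapters, delete_dead=False):
    """Return one esxcfg-rescan argument list per adapter.

    delete_dead=True adds the -d option mentioned earlier in the thread.
    Nothing is executed here; the caller decides what to do with the result.
    """
    commands = []
    for adapter in adapters:
        command = ["esxcfg-rescan"]
        if delete_dead:
            command.append("-d")
        command.append(adapter)
        commands.append(command)
    return commands


if __name__ == "__main__":
    # Print the commands for both FC adapters named in this thread.
    for command in rescan_commands(["vmhba1", "vmhba2"]):
        print(" ".join(command))
```

Printing the commands first is a deliberate choice: on a production host it is safer to read the exact command line before running it.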

To remove a datastore:

1. Right-click the datastore in the UI and choose Delete.

After this, the datastore is removed but the LUN still exists.

Does that make sense?

Please feel free to contact me if you have any more questions.

Thanks

Zhifeng

91gsixty
Contributor

Weird that it is seeing it as an enclosure.

I went into my SAN's (MSA2324fc) CLI and still didn't see a LUN 0 there.

I have restarted vCenter and one of my ESX hosts (in a cluster of three), but I think the only way I'm going to get this to work is to restart all of my ESX hosts. In a production environment, that doesn't help.

One more thing: of my three clustered ESX hosts, only one was showing the problem until I rescanned the others. Now they are all reporting SNMP errors about connecting to storage.

There must be a conf file somewhere that stores these paths?

binoche
VMware Employee

Hi,

Could you please post the output of the commands below? Thanks.

esxcli nmp device list -d naa.600c0ff000d7e9120000000000000000

esxcli nmp path list -d naa.600c0ff000d7e9120000000000000000

esxcfg-scsidevs -l -d naa.600c0ff000d7e9120000000000000000

binoche, VMware VCP, Cisco CCNA

vmwarefc
Contributor

Hi, 91gsixty,

I notice the picture shows that LUN 0's type is enclosure instead of disk and its capacity is null. Do your other servers sharing the storage array show the same thing?

Are you sure that LUN 0 has been removed from the storage array?

Thanks

Zhifeng

91gsixty
Contributor

# esxcli nmp device list -d naa.600c0ff000d7e9120000000000000000

naa.600c0ff000d7e9120000000000000000

Device Display Name: HP Fibre Channel Enclosure Svc Dev (naa.600c0ff000d7e9120000000000000000)

Storage Array Type: VMW_SATP_ALUA

Storage Array Type Device Config: {implicit_support=on;explicit_support=off;explicit_allow=on;alua_followover=on;{TPG_id=0,TPG_state=ANO}}

Path Selection Policy: VMW_PSP_MRU

Path Selection Policy Device Config: Current Path=vmhba2:C0:T0:L0

Working Paths: vmhba2:C0:T0:L0

# esxcli nmp path list -d naa.600c0ff000d7e9120000000000000000

fc.20000000c987cd55:10000000c987cd55-fc.208000c0ffd7eaad:217000c0ffd7eaad-naa.600c0ff000d7e9120000000000000000

Runtime Name: vmhba2:C0:T0:L0

Device: naa.600c0ff000d7e9120000000000000000

Device Display Name: HP Fibre Channel Enclosure Svc Dev (naa.600c0ff000d7e9120000000000000000)

Group State: active unoptimized

Storage Array Type Path Config: {TPG_id=0,TPG_state=ANO,RTP_id=2,RTP_health=UP}

Path Selection Policy Path Config: {current path}

fc.20000000c987f29f:10000000c987f29f-fc.208000c0ffd7eaad:207000c0ffd7eaad-naa.600c0ff000d7e9120000000000000000

Runtime Name: vmhba1:C0:T1:L0

Device: naa.600c0ff000d7e9120000000000000000

Device Display Name: HP Fibre Channel Enclosure Svc Dev (naa.600c0ff000d7e9120000000000000000)

Group State: active unoptimized

Storage Array Type Path Config: {TPG_id=0,TPG_state=ANO,RTP_id=1,RTP_health=UP}

Path Selection Policy Path Config: {non-current path}

# esxcfg-scsidevs -l -d naa.600c0ff000d7e9120000000000000000

naa.600c0ff000d7e9120000000000000000

Device Type: Enclosure Svc Dev

Size: 0 MB

Display Name: HP Fibre Channel Enclosure Svc Dev (naa.600c0ff000d7e9120000000000000000)

Plugin: NMP

Console Device: /vmfs/devices/genscsi/naa.600c0ff000d7e9120000000000000000

Devfs Path: /vmfs/devices/genscsi/naa.600c0ff000d7e9120000000000000000

Vendor: HP Model: MSA2324fc Revis: M100

SCSI Level: 5 Is Pseudo: false Status: on

Is RDM Capable: true Is Removable: false

Is Local: false

Other Names:

vml.020d000000600c0ff000d7e91200000000000000004d5341323332

Yes, there is no LUN 0 on the SAN anymore. I've checked in both the CLI and the GUI.
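
The esxcfg-scsidevs output above already hints at what this device is: the type is "Enclosure Svc Dev" and the size is 0 MB. A small sketch of that check follows. The helper names and the rule "not an enclosure and larger than 0 MB" are assumptions made for illustration, not an official test for a data LUN.

```python
import re

# A few lines copied from the esxcfg-scsidevs output above.
SCSIDEVS_OUTPUT = """\
naa.600c0ff000d7e9120000000000000000
Device Type: Enclosure Svc Dev
Size: 0 MB
Display Name: HP Fibre Channel Enclosure Svc Dev (naa.600c0ff000d7e9120000000000000000)
Plugin: NMP
"""


def summarize_device(text):
    """Pull the device type and size (in MB) out of the listing."""
    device_type = re.search(r"^\s*Device Type:\s*(.+?)\s*$", text, re.MULTILINE)
    size = re.search(r"^\s*Size:\s*(\d+)\s*MB", text, re.MULTILINE)
    return {
        "device_type": device_type.group(1) if device_type else None,
        "size_mb": int(size.group(1)) if size else None,
    }


def looks_like_data_lun(summary):
    """Heuristic (an assumption): a data LUN is not an enclosure and has a size."""
    device_type = summary["device_type"] or ""
    size_mb = summary["size_mb"] or 0
    return "Enclosure" not in device_type and size_mb > 0


if __name__ == "__main__":
    summary = summarize_device(SCSIDEVS_OUTPUT)
    print(summary, "data LUN?", looks_like_data_lun(summary))
```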

binoche
VMware Employee

Zhifeng is correct: this LUN 0 looks like a management LUN from your storage array.

You forgot to map a data LUN 0 to vmhba1 and vmhba2. Please re-check the LUN mapping on your storage array.

binoche, VMware VCP, Cisco CCNA

91gsixty
Contributor

MSA output

# show volume-maps
Info: Retrieving data...

Volume View Serial Number (00c0ffd7e86900009a21fc4a01000000) Name (Exchange_Data) Mapping:
Ports        LUN  Access      Host-Port-Identifier  Nickname  Profile
----------------------------------------------------------------------
A1,A2,B1,B2  3    read-write  all other hosts                 Standard

Volume View Serial Number (00c0ffd7e8690000a122fc4a01000000) Name (Exchange_Logs) Mapping:
Ports        LUN  Access      Host-Port-Identifier  Nickname  Profile
----------------------------------------------------------------------
A1,A2,B1,B2  4    read-write  all other hosts                 Standard

Volume View Serial Number (00c0ffd7e912000081fa7a4a01000000) Name (Virtuals_v000) Mapping:
Ports        LUN  Access      Host-Port-Identifier  Nickname  Profile
----------------------------------------------------------------------
A1,A2,B1,B2  2    read-write  all other hosts                 Standard

Volume View Serial Number (00c0ffd7e912000081fa7a4a02000000) Name (Virtuals_v001) Mapping:
Ports        LUN  Access      Host-Port-Identifier  Nickname  Profile
----------------------------------------------------------------------
A1,A2,B1,B2  1    read-write  all other hosts                 Standard

# show host-maps

Host View ID (5001438001697B28) Name () Profile (Standard) Mapping:
Name           Serial Number                     LUN  Access      Ports
------------------------------------------------------------------------
Virtuals_v000  00c0ffd7e912000081fa7a4a01000000  2    read-write  A1,A2,B1,B2
Exchange_Data  00c0ffd7e86900009a21fc4a01000000  3    read-write  A1,A2,B1,B2
Virtuals_v001  00c0ffd7e912000081fa7a4a02000000  1    read-write  A1,A2,B1,B2
Exchange_Logs  00c0ffd7e8690000a122fc4a01000000  4    read-write  A1,A2,B1,B2

Host View ID (10000000C987F217) Name () Profile (Standard) Mapping:
Name           Serial Number                     LUN  Access      Ports
------------------------------------------------------------------------
Virtuals_v000  00c0ffd7e912000081fa7a4a01000000  2    read-write  A1,A2,B1,B2
Exchange_Data  00c0ffd7e86900009a21fc4a01000000  3    read-write  A1,A2,B1,B2
Virtuals_v001  00c0ffd7e912000081fa7a4a02000000  1    read-write  A1,A2,B1,B2
Exchange_Logs  00c0ffd7e8690000a122fc4a01000000  4    read-write  A1,A2,B1,B2

Host View ID (5001438001697BDE) Name () Profile (Standard) Mapping:
Name           Serial Number                     LUN  Access      Ports
------------------------------------------------------------------------
Virtuals_v000  00c0ffd7e912000081fa7a4a01000000  2    read-write  A1,A2,B1,B2
Exchange_Data  00c0ffd7e86900009a21fc4a01000000  3    read-write  A1,A2,B1,B2
Virtuals_v001  00c0ffd7e912000081fa7a4a02000000  1    read-write  A1,A2,B1,B2
Exchange_Logs  00c0ffd7e8690000a122fc4a01000000  4    read-write  A1,A2,B1,B2

Host View ID (10000000C987CD6A) Name () Profile (Standard) Mapping:
Name           Serial Number                     LUN  Access      Ports
------------------------------------------------------------------------
Virtuals_v000  00c0ffd7e912000081fa7a4a01000000  2    read-write  A1,A2,B1,B2
Exchange_Data  00c0ffd7e86900009a21fc4a01000000  3    read-write  A1,A2,B1,B2
Virtuals_v001  00c0ffd7e912000081fa7a4a02000000  1    read-write  A1,A2,B1,B2
Exchange_Logs  00c0ffd7e8690000a122fc4a01000000  4    read-write  A1,A2,B1,B2

Host View ID (10000000C987CD55) Name () Profile (Standard) Mapping:
Name           Serial Number                     LUN  Access      Ports
------------------------------------------------------------------------
Virtuals_v000  00c0ffd7e912000081fa7a4a01000000  2    read-write  A1,A2,B1,B2
Exchange_Data  00c0ffd7e86900009a21fc4a01000000  3    read-write  A1,A2,B1,B2
Virtuals_v001  00c0ffd7e912000081fa7a4a02000000  1    read-write  A1,A2,B1,B2
Exchange_Logs  00c0ffd7e8690000a122fc4a01000000  4    read-write  A1,A2,B1,B2

Host View ID (10000000C987F29F) Name () Profile (Standard) Mapping:
Name           Serial Number                     LUN  Access      Ports
------------------------------------------------------------------------
Virtuals_v000  00c0ffd7e912000081fa7a4a01000000  2    read-write  A1,A2,B1,B2
Exchange_Data  00c0ffd7e86900009a21fc4a01000000  3    read-write  A1,A2,B1,B2
Virtuals_v001  00c0ffd7e912000081fa7a4a02000000  1    read-write  A1,A2,B1,B2
Exchange_Logs  00c0ffd7e8690000a122fc4a01000000  4    read-write  A1,A2,B1,B2

Host View ID (21FD00051E8E1140) Name () Profile (Standard) Mapping:
Name           Serial Number                     LUN  Access      Ports
------------------------------------------------------------------------
Virtuals_v000  00c0ffd7e912000081fa7a4a01000000  2    read-write  A1,A2,B1,B2
Exchange_Data  00c0ffd7e86900009a21fc4a01000000  3    read-write  A1,A2,B1,B2
Virtuals_v001  00c0ffd7e912000081fa7a4a02000000  1    read-write  A1,A2,B1,B2
Exchange_Logs  00c0ffd7e8690000a122fc4a01000000  4    read-write  A1,A2,B1,B2

Host View ID (21FD00051E9A8FD6) Name () Profile (Standard) Mapping:
Name           Serial Number                     LUN  Access      Ports
------------------------------------------------------------------------
Virtuals_v000  00c0ffd7e912000081fa7a4a01000000  2    read-write  A1,A2,B1,B2
Exchange_Data  00c0ffd7e86900009a21fc4a01000000  3    read-write  A1,A2,B1,B2
Virtuals_v001  00c0ffd7e912000081fa7a4a02000000  1    read-write  A1,A2,B1,B2
Exchange_Logs  00c0ffd7e8690000a122fc4a01000000  4    read-write  A1,A2,B1,B2


# show advanced-settings

Background Scrub: Enabled

Partner Firmware Upgrade: Enabled

Utility Priority: High

SMART: Enabled

Dynamic Spare Configuration: Disabled

Enclosure Polling Rate: 5

Host Control of Caching: Disabled

Sync Cache Mode: Immediate

Missing LUN Response: Not Ready <=--- hmmmm

Controller Failure: Disabled

Supercap Failure: Enabled

CompactFlash Failure: Enabled

Power Supply Failure: Disabled

Fan Failure: Disabled

Temperature Exceeded: Disabled

Partner Notify: Disabled

Auto Write Back: Enabled

Message was added by: 91gsixty

Missing LUN Response explained:

Some operating systems do not look beyond LUN 0 if they do not find a LUN 0 or cannot handle noncontiguous LUNs. Missing LUN Response handles these situations by enabling the host drivers to continue probing for LUNs until they reach the LUN to which they have access.

• Not Ready – Sends a reply that there is a LUN where a gap has been created but that it's "not ready." Sense data returned is sensekey = 2, code = 4, qualifier = 3.

• Illegal Request – Sends a reply that there is a LUN but that the request is "illegal." Sense data returned is sensekey = 5, code = 25h, qualifier = 0.
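
To make the two sense-data triples quoted above easier to read, here is a small lookup sketch. The descriptions are the standard SCSI meanings of those sense key / additional sense code / qualifier values as I understand them; the table and the function name `describe_sense` are only for illustration and cover just the two cases from the manual excerpt.

```python
# (sense key, additional sense code, qualifier) -> human-readable meaning.
# Only the two triples quoted in the MSA documentation are listed.
SENSE_MEANINGS = {
    (0x2, 0x04, 0x03): "NOT READY: logical unit not ready, manual intervention required",
    (0x5, 0x25, 0x00): "ILLEGAL REQUEST: logical unit not supported",
}


def describe_sense(sense_key, code, qualifier):
    """Return a description for a sense triple, or a fallback string."""
    return SENSE_MEANINGS.get((sense_key, code, qualifier), "unknown sense data")


if __name__ == "__main__":
    print("Not Ready setting      ->", describe_sense(0x2, 0x04, 0x03))
    print("Illegal Request setting ->", describe_sense(0x5, 0x25, 0x00))
```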

Still ongoing....

vmwarefc
Contributor

Hi, 91gsixty

Could you check whether vmhba2:C0:T0:L0 is a disk (LUN) using esxcfg-mpath -l -P vmhba2:C0:T0:L0?

If it is an enclosure rather than a LUN, the error you got is expected. Otherwise, this is a real issue.

In other words, your vmhba2:C0:T0:L0 might be an enclosure. If you have any questions, please feel free to contact me.

Thanks

Zhifeng

vmwarefc
Contributor

[root@ESX2 ~]# esxcli nmp path list -d naa.600c0ff000d7e9120000000000000000

fc.20000000c987cd55:10000000c987cd55-fc.208000c0ffd7eaad:217000c0ffd7eaad-naa.600c0ff000d7e9120000000000000000

Runtime Name: vmhba2:C0:T0:L0

Device: naa.600c0ff000d7e9120000000000000000

Device Display Name: HP Fibre Channel Enclosure Svc Dev (naa.600c0ff000d7e9120000000000000000)

Group State: active unoptimized

Storage Array Type Path Config: {TPG_id=0,TPG_state=ANO,RTP_id=2,RTP_health=UP}

Path Selection Policy Path Config: {current path}

The result shows that your device vmhba2:C0:T0:L0 is an enclosure rather than a LUN, so the error you got is expected.

Zhifeng
binoche
VMware Employee

vmhba2, fc.20000000c987cd55:10000000c987cd55, is that Host View ID (10000000C987CD55)?

vmhba1, fc.20000000c987f29f:10000000c987f29f, is that Host View ID (10000000C987F29F)?

show host-maps says these two hosts only have LUNs 1, 2, 3 and 4, so your data LUN 0 is not mapped?

binoche, VMware VCP, Cisco CCNA
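
The question above (which LUN numbers does each host actually have mapped?) can be answered mechanically from the show host-maps output. The sketch below is one way to do it, using two of the host blocks from this thread as sample input. The function names are hypothetical, and the parser assumes each mapping row has the shape name, 32-hex-digit serial number, LUN, access, ports.

```python
import re

# Two host blocks copied from the show host-maps output in this thread.
HOST_MAPS = """\
Host View ID (10000000C987CD55) Name () Profile (Standard) Mapping:
Name Serial Number LUN Access Ports
Virtuals_v000 00c0ffd7e912000081fa7a4a01000000 2 read-write A1,A2,B1,B2
Exchange_Data 00c0ffd7e86900009a21fc4a01000000 3 read-write A1,A2,B1,B2
Virtuals_v001 00c0ffd7e912000081fa7a4a02000000 1 read-write A1,A2,B1,B2
Exchange_Logs 00c0ffd7e8690000a122fc4a01000000 4 read-write A1,A2,B1,B2
Host View ID (10000000C987F29F) Name () Profile (Standard) Mapping:
Name Serial Number LUN Access Ports
Virtuals_v000 00c0ffd7e912000081fa7a4a01000000 2 read-write A1,A2,B1,B2
Exchange_Data 00c0ffd7e86900009a21fc4a01000000 3 read-write A1,A2,B1,B2
Virtuals_v001 00c0ffd7e912000081fa7a4a02000000 1 read-write A1,A2,B1,B2
Exchange_Logs 00c0ffd7e8690000a122fc4a01000000 4 read-write A1,A2,B1,B2
"""

HOST_RE = re.compile(r"^Host View ID \(([0-9A-Fa-f]+)\)")
# name, 32-hex-digit serial, LUN number, access, ports
ROW_RE = re.compile(r"^(\S+)\s+([0-9A-Fa-f]{32})\s+(\d+)\s+(\S+)\s+(\S+)$")


def luns_by_host(text):
    """Map each host ID to the set of LUN numbers presented to it."""
    result = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        host = HOST_RE.match(line)
        if host:
            current = host.group(1).upper()
            result[current] = set()
            continue
        row = ROW_RE.match(line)
        if row and current is not None:
            result[current].add(int(row.group(3)))
    return result


def hosts_missing_lun(text, lun=0):
    """List the host IDs that do not have the given LUN number mapped."""
    return sorted(h for h, luns in luns_by_host(text).items() if lun not in luns)


if __name__ == "__main__":
    print(luns_by_host(HOST_MAPS))
    print("hosts without LUN 0:", hosts_missing_lun(HOST_MAPS))
```

The column-header lines are skipped automatically because their second field is not a 32-digit serial number.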

91gsixty
Contributor

Binoche,

How do I determine what the hosts are on the MSA?

I have two HBA cards in each ESX server and two FC switches; I'm assuming those account for the eight hosts.

Here is the list of hosts on the MSA:

Host ID 10000000C987CD55

Host ID 10000000C987F29F

Host ID 5001438001697B28

Host ID 10000000C987F217

Host ID 5001438001697BDE

Host ID 10000000C987CD6A

Host ID 21FD00051E8E1140

Host ID 21FD00051E9A8FD6

But on the three ESX hosts:

# hostid

11ac0f00

# hostid

11ac1000

# hostid

11ac1100

How do I match these up?

binoche
VMware Employee

Hi,

I am not sure how hostid is generated. Maybe you can check the vmhba FC port WWNs instead? For example, 10000000c987cd55 is the FC port WWN of vmhba2; see which host ID on your MSA has this WWN.

As a workaround, you can create one new LUN (for example, 1 GB) on your MSA and map it as LUN 0 to all eight hosts.
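
A minimal sketch of the matching suggested above: take the port WWN from each adapter identifier and look it up in the MSA host list. It assumes the adapter identifier has the form fc.<WWNN>:<WWPN>, which agrees with the WWNN/WWPN lines in the first post. The function names are made up, and only the two adapters shown in this thread are included.

```python
# Host IDs as listed on the MSA in this thread.
MSA_HOST_IDS = [
    "10000000C987CD55",
    "10000000C987F29F",
    "5001438001697B28",
    "10000000C987F217",
    "5001438001697BDE",
    "10000000C987CD6A",
    "21FD00051E8E1140",
    "21FD00051E9A8FD6",
]

# Adapter identifiers from the esxcfg-mpath output on ESX2.
ESX2_ADAPTERS = {
    "vmhba2": "fc.20000000c987cd55:10000000c987cd55",
    "vmhba1": "fc.20000000c987f29f:10000000c987f29f",
}


def wwpn_from_adapter_id(adapter_id):
    """Extract the port WWN from 'fc.<WWNN>:<WWPN>' in upper case."""
    if not adapter_id.startswith("fc."):
        raise ValueError("not an FC adapter identifier: %r" % adapter_id)
    _wwnn, wwpn = adapter_id[3:].split(":")
    return wwpn.upper()


def match_adapters(adapters, host_ids):
    """Map each adapter name to its MSA host ID, or None if not listed."""
    known = {host_id.upper() for host_id in host_ids}
    matches = {}
    for name, adapter_id in adapters.items():
        wwpn = wwpn_from_adapter_id(adapter_id)
        matches[name] = wwpn if wwpn in known else None
    return matches


if __name__ == "__main__":
    for name, host_id in match_adapters(ESX2_ADAPTERS, MSA_HOST_IDS).items():
        print(name, "->", host_id)
```

The output of the hostid command is not used here at all; the match is done purely on the FC port WWNs.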

91gsixty
Contributor

FIXED.

This is how it came about.

Having no LUN 0 causes problems with datastores. ESX shows LUN 0 as an enclosure but gives storage errors in the Events tab.

To solve this, I removed my LUN 4 and remapped it as LUN 0. I rescanned all hosts, then disabled and re-enabled the "Cannot connect to storage" alarm, and no more storage warnings appeared.

Note: On my MSA there is an option, "Missing LUN Response: Not Ready / Illegal Request".

Keep the setting at "Not Ready" if you already have LUNs on your SAN. What this means is that if LUN numbers have been skipped, any LUNs after the gap will still show. For example, with LUNs 0, 1, 2, 4 and 5, LUN 3 will show an error but you will still be able to see 4 and 5. If "Illegal Request" is selected, the scan would stop at 3: you would see LUNs 0, 1 and 2, and LUNs 4 and 5 would not be shown even if they have paths.

This is the reason my LUN 0 was still showing. You have to have a LUN 0 to see LUN 1. Because I had "Missing LUN Response: Not Ready" set as the default, the array was presenting a LUN 0 anyway, but as an enclosure, so that the hosts could see my other LUNs 1, 2, 3 and 4.
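
The behaviour described above can be written down as a toy model. This is only a sketch of the explanation in this post, assuming a host that probes LUN numbers in order and gives up at the first "illegal request"; it is not how ESX or the MSA firmware is actually implemented, and the function name `scan` is made up.

```python
def scan(mapped_luns, missing_response):
    """Model a sequential LUN probe under the two Missing LUN Response settings.

    mapped_luns: set of LUN numbers that really have a volume mapped.
    missing_response: "not ready" or "illegal request".
    Returns a list of (lun, status) pairs the host ends up seeing.
    """
    if missing_response not in ("not ready", "illegal request"):
        raise ValueError("unknown setting: %r" % missing_response)
    seen = []
    if not mapped_luns:
        return seen
    for lun in range(max(mapped_luns) + 1):
        if lun in mapped_luns:
            seen.append((lun, "ok"))
        elif missing_response == "not ready":
            # The array answers for the gap, so the probe keeps going.
            seen.append((lun, "not ready"))
        else:
            # Assumed here: the host stops probing at the first gap.
            break
    return seen


if __name__ == "__main__":
    print(scan({0, 1, 2, 4, 5}, "not ready"))
    print(scan({0, 1, 2, 4, 5}, "illegal request"))
    # The situation in this thread: LUNs 1-4 mapped, nothing on LUN 0.
    print(scan({1, 2, 3, 4}, "not ready"))
```

In the last case the model shows a placeholder at LUN 0 followed by LUNs 1 to 4, which matches the enclosure device the hosts were reporting.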

See attached.

binoche
VMware Employee

Thanks for sharing your findings.

binoche, VMware VCP, Cisco CCNA
