aduitsis
Enthusiast

Failed to open device naa.xxxxx:1 : not supported (xpost from vSphere Hypervisor forum)


(Cross-posted from the vSphere Hypervisor subforum; I posted this earlier as "Failed to open device naa.xxxxx:1 : not supported". Apologies for the duplicate.)

I have a Dell R610 running ESXi 5.5 that is trying to access a 250 GB LUN on an EMC Clariion CX3-40f array. The LUN is indeed visible, as I can see by issuing:

~ # esxcli storage core device list
....
naa.600601601f712100669db307a087e011
   Display Name: DGC Fibre Channel Disk (naa.600601601f712100669db307a087e011)
   Has Settable Display Name: true
   Size: 256000
   Device Type: Direct-Access 
   Multipath Plugin: NMP
   Devfs Path: /vmfs/devices/disks/naa.600601601f712100669db307a087e011
   Vendor: DGC     
   Model: VRAID           
   Revision: 0326
   SCSI Level: 4
   Is Pseudo: false
   Status: on
   Is RDM Capable: true
   Is Local: false
   Is Removable: false
   Is SSD: false
   Is Offline: false
   Is Perennially Reserved: false
   Queue Full Sample Size: 0
   Queue Full Threshold: 0
   Thin Provisioning Status: unknown
   Attached Filters: 
   VAAI Status: unknown
   Other UIDs: vml.0200000000600601601f712100669db307a087e011565241494420
   Is Local SAS Device: false
   Is USB: false
   Is Boot USB Device: false
   No of outstanding IOs with competing worlds: 32
...
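As an aside, the full listing is long; a single field can be pulled out with awk. The sample lines below are copied from the listing above so the snippet runs anywhere; on the host itself you can also narrow the output to one device with `esxcli storage core device list -d naa.600601601f712100669db307a087e011`.

```shell
# Sketch: grab a single field from `esxcli storage core device list` output.
# The sample is inlined from the listing above; on a live host you would pipe
# the esxcli output in instead.
sample='naa.600601601f712100669db307a087e011
   Display Name: DGC Fibre Channel Disk (naa.600601601f712100669db307a087e011)
   Is RDM Capable: true
   Is Local: false'

printf '%s\n' "$sample" | awk -F': ' '/Is RDM Capable/ { print $2 }'
# prints: true
```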

However, when I try to mount the LUN via the ESXi console, I get an error. The vmkernel.log says:

2015-03-23T14:20:28.006Z cpu9:33519)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x16 (0x412e803c0300, 0) to dev "naa.600601601f712100669db307a087e011" on path "vmhba2:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2015-03-23T14:20:28.006Z cpu9:33519)ScsiDeviceIO: 2338: Cmd(0x412e803c0300) 0x16, CmdSN 0xc44 from world 0 to dev "naa.600601601f712100669db307a087e011" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
2015-03-23T14:20:28.006Z cpu4:34247 opID=50ddd7ac)World: 14299: VC opID D6D55F59-0000010A maps to vmkernel opID 50ddd7ac
2015-03-23T14:20:28.006Z cpu4:34247 opID=50ddd7ac)LVM: 11786: Failed to open device naa.600601601f712100669db307a087e011:1 : Not supported
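For what it's worth, the sense data in those lines can be decoded: if I'm reading the SPC tables right, sense key 0x5 is ILLEGAL REQUEST, ASC 0x20 is INVALID COMMAND OPERATION CODE, and command 0x16 is RESERVE(6), i.e. the array seems to be rejecting a SCSI reservation. A quick sketch for pulling the sense bytes out of such a line (the log line is inlined from the excerpt above so the snippet runs anywhere; on the host you would grep /var/log/vmkernel.log instead):

```shell
# Sketch: extract the "sense-key ASC ASCQ" triple from a vmkernel.log line.
line='2015-03-23T14:20:28.006Z cpu9:33519)ScsiDeviceIO: 2338: Cmd(0x412e803c0300) 0x16, CmdSN 0xc44 from world 0 to dev "naa.600601601f712100669db307a087e011" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.'

printf '%s\n' "$line" |
  sed -n 's/.*Valid sense data: \(0x[0-9a-fA-F]*\) \(0x[0-9a-fA-F]*\) \(0x[0-9a-fA-F]*\).*/\1 \2 \3/p'
# prints: 0x5 0x20 0x0
```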

Does anybody know what the culprit is?

The LUN is formatted; I've been able to format it and mount it correctly on other ESXi machines (Dell R620), where everything goes smoothly. So as far as I can tell, the only difference is that this server is an R610.

Thanks,

Athanasios


Accepted Solutions
aduitsis
Enthusiast

OK, problem solved, so I'll briefly describe what we did.

So there's this Dell R610 with a fibre channel card, previously running FreeBSD 9.1 and connected to a Clariion CX3 array. Because FreeBSD does not have a host agent, its HBA had to be registered manually on the Clariion. Apart from that small annoyance, everything ran smoothly.

When we decided to wipe the system and install ESXi, we didn't touch anything on the Clariion, since (we thought) the card would obviously keep its fibre channel WWID(s) and the ESXi host would have access to the same LUN as before. The LUN would naturally have to be reformatted as a datastore, but that was to be expected.

As it turned out, we could see the LUN from the ESXi host and partition it, but we could not format it (from the console or over SSH). We also presented the LUN to another ESXi host (OK) and formatted it there (success), but when we moved it back to the original owner we couldn't mount it, for reasons unknown. Fearing that the previous contents of the LUN might somehow interfere (stray superblocks?), we gave it to a VM via Raw Device Mapping and dd'ed it full of zeros. Again, no effect. Basically, all our tests showed that the only way we could use a LUN on that specific machine was via Raw Device Mapping; there was no way to format or mount VMFS.
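The zero-fill step described above can be sketched like this. The in-guest device node for the RDM disk is system-dependent (and hypothetical here), so a scratch file stands in for it, making the snippet safe to run anywhere:

```shell
# Sketch of the zero-fill we did inside the VM. On the real system the RDM
# disk appeared as a block device in the guest (node name varies); here a
# scratch file stands in for it.
target=$(mktemp)   # stand-in for the RDM block device

# Write zeros over the target. On a real disk you would drop `count` so dd
# runs until the device is full, and use a larger block size for throughput.
dd if=/dev/zero of="$target" bs=1024 count=4 2>/dev/null

# Verify: deleting every NUL byte should leave nothing behind.
[ "$(tr -d '\000' < "$target" | wc -c)" -eq 0 ] && echo zeroed
rm -f "$target"
# prints: zeroed
```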

One suggestion was to change the Host LUN ID presented to the machine by the Clariion. We did so, but it changed nothing. The suggestion was nevertheless helpful, because it got us thinking about differences from the Clariion's perspective. Indeed, the problem was that this specific client had been registered manually, and that never changed, despite the fact that ESXi actually has a host agent.

So we powered off the R610, disconnected the fibre links (OK, maybe that wasn't required :-) ) and de-registered the client from the connectivity status menu of the Clariion array. We powered the R610 back up and voila: the host agent registered itself automatically, and the LUN mounted right after that. No other actions were necessary. A deeper explanation of why ESXi was able to see the LUN yet not fully use it eludes me at this point.

TL;DR: If your ESXi host is installed on a machine that used to run another OS, be sure to de-register the old fibre channel client from any storage arrays, then let the host agent do the work for you.


18 Replies
vickey0rana
Enthusiast

Can you confirm the below:

--> How frequently does this issue occur?

--> Can you check whether the vmkernel/messages logs contain the entry below? (Make sure you check this before rebooting the ESXi host, on the next occurrence of the issue.)

ALERT: APIC: 1823: APICID 0x00000000 - ESR = 0x40.




Thanks,

~Ravinder S Rana

---------------------------------------------------------------- If you found this or any other answer helpful, please consider to award points. (use Correct or Helpful buttons) BR, Ravinder S Rana

The system may have a problem with the partition xxx:1.

If you don't have any VMs on this datastore/LUN, I would delete the LUN and create a new one...

------------------------------------------------------------------------------- If you found this or any other answer helpful, please consider to award points. (use Correct or Helpful buttons) Regards from Switzerland, B. Fernandez http://vpxa.info/
aduitsis
Enthusiast

Hello,

The issue is persistent: I cannot format the LUN, and if I format it on another ESXi machine (R620) and give it back to the problematic machine (R610), I cannot mount it. Basically, I cannot use the LUN in any way except via raw device mapping. I tried ESXi 5.5 (vanilla and the Dell flavor) and ESXi 6, and the behavior is the same.

There are no *.log files containing the word APIC in the /var/log directory.

Thanks,

Athanasios

aduitsis
Enthusiast

I have no problem deleting it, but I still won't be able to format it on the problematic system. The system seems able to partition the LUN, but then fails to format the first partition.

Other ESXi 5.5 machines can mount the LUN without trouble, so I highly doubt there is any real problem with the LUN itself.

vickey0rana
Enthusiast

I have experienced issues like this where a LUN is visible via the CLI but hangs or is very slow, or sometimes the GUI cannot show it at all. Moreover, a LUN not showing up on a single server makes me suspect that either the HBA driver/firmware is wrong or out of date, or there is an issue with the HBA itself.

aduitsis
Enthusiast

Yeah, but raw device access works, which shouldn't be the case if there were a problem with the HBA, right?

cykVM
Expert

Just found this old KB: VMware KB: Recovering a lost partition table on a VMFS volume

Maybe you're experiencing a similar problem on your R610 and it can't read the partition table for some reason?

aduitsis
Enthusiast

Well, since I have also tried deleting everything and starting from scratch (and failed), that's not it, unfortunately. The log messages do not point to anything wrong with the partition.

Basically, there is nothing of value on the LUN, so I can experiment at will. I haven't been able to format it at all, and accessing a partition created by another ESXi host is also impossible. IMHO there's something wrong with either the R610 model or this specific R610, but I cannot wrap my head around what could simultaneously allow ESXi to partition the LUN (success) and use it for raw device access (success), yet fail on format or mount. The mind boggles.


What LUN ID does the LUN have?

Regards from Switzerland, B. Fernandez
aduitsis
Enthusiast
fc.20000024ff2bf9e7:21000024ff2bf9e7-fc.50060160c1e0d553:5006016841e0d553-naa.600601601f712100669db307a087e011
   UID: fc.20000024ff2bf9e7:21000024ff2bf9e7-fc.50060160c1e0d553:5006016841e0d553-naa.600601601f712100669db307a087e011
   Runtime Name: vmhba3:C0:T0:L0
   Device: naa.600601601f712100669db307a087e011
   Device Display Name: DGC Fibre Channel Disk (naa.600601601f712100669db307a087e011)
   Adapter: vmhba3
   Channel: 0
   Target: 0
   LUN: 0
   Plugin: NMP
   State: active
   Transport: fc
   Adapter Identifier: fc.20000024ff2bf9e7:21000024ff2bf9e7
   Target Identifier: fc.50060160c1e0d553:5006016841e0d553
   Adapter Transport Details: WWNN: 20:00:00:24:ff:2b:f9:e7 WWPN: 21:00:00:24:ff:2b:f9:e7
   Target Transport Details: WWNN: 50:06:01:60:c1:e0:d5:53 WWPN: 50:06:01:68:41:e0:d5:53
   Maximum IO Size: 33553920

fc.20000024ff2bf9e6:21000024ff2bf9e6-fc.50060160c1e0d553:5006016141e0d553-naa.600601601f712100669db307a087e011
   UID: fc.20000024ff2bf9e6:21000024ff2bf9e6-fc.50060160c1e0d553:5006016141e0d553-naa.600601601f712100669db307a087e011
   Runtime Name: vmhba2:C0:T0:L0
   Device: naa.600601601f712100669db307a087e011
   Device Display Name: DGC Fibre Channel Disk (naa.600601601f712100669db307a087e011)
   Adapter: vmhba2
   Channel: 0
   Target: 0
   LUN: 0
   Plugin: NMP
   State: active
   Transport: fc
   Adapter Identifier: fc.20000024ff2bf9e6:21000024ff2bf9e6
   Target Identifier: fc.50060160c1e0d553:5006016141e0d553
   Adapter Transport Details: WWNN: 20:00:00:24:ff:2b:f9:e6 WWPN: 21:00:00:24:ff:2b:f9:e6
   Target Transport Details: WWNN: 50:06:01:60:c1:e0:d5:53 WWPN: 50:06:01:61:41:e0:d5:53
   Maximum IO Size: 33553920
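A side note on reading those path entries: each UID packs the initiator port, the target port, and the device ID into one dash-separated string, which can be split mechanically (the UID below is copied from the first path above):

```shell
# Sketch: split an ESXi FC path UID into initiator, target and device parts.
uid='fc.20000024ff2bf9e7:21000024ff2bf9e7-fc.50060160c1e0d553:5006016841e0d553-naa.600601601f712100669db307a087e011'

printf '%s\n' "$uid" | awk -F'-' '{
  print "initiator (WWNN:WWPN): " $1
  print "target    (WWNN:WWPN): " $2
  print "device:                " $3
}'
```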

As I can see, the LUN has ID 0.

I have already seen this problem a lot of times, where the ESX server can't handle LUNs with ID 0.

Try giving it another LUN ID.

Regards from Switzerland, B. Fernandez
aduitsis
Enthusiast

Um, do you happen to know how to do that?


You have to do this on the storage side. As I can see, you wrote that you have an EMC CX3?

Here is a link on how to change the LUN ID:

https://community.emc.com/message/730638

Regards from Switzerland, B. Fernandez
aduitsis
Enthusiast

Hm, I see what you're saying. I'll try that and get back. In the meantime, I'm marking your answer as helpful, which it definitely was. Oh, and thanks very much!


You are welcome, buddy :-)

Thanks for marking the reply as helpful.

Regards from Switzerland, B. Fernandez
aduitsis
Enthusiast

It may very well be more complicated than that. I can see that other ESXi machines also see Host LUN 0 and have no trouble accessing it, so this is probably not just about the number being 0.

BUT, in my Clariion CX3 console I can see that this specific server is marked "host agent not reachable", unlike the other ESXi hosts. So maybe that has some significance.

Before ESXi was installed, the server used to run FreeBSD, which AFAIK does not have a host agent proper. I'll probably rename the server or reinstall it on Thursday and see what happens.
