(x-post from the vSphere Hypervisor subforum; I posted this earlier in "Failed to open device naa.xxxxx:1 : not supported", apologies for the duplicate.)
I have a Dell R610 running ESXi 5.5, trying to access a 250 GB LUN on an EMC Clariion CX3-40f array. The LUN is indeed visible, as I can see by issuing:
~ # esxcli storage core device list
....
naa.600601601f712100669db307a087e011
   Display Name: DGC Fibre Channel Disk (naa.600601601f712100669db307a087e011)
   Has Settable Display Name: true
   Size: 256000
   Device Type: Direct-Access
   Multipath Plugin: NMP
   Devfs Path: /vmfs/devices/disks/naa.600601601f712100669db307a087e011
   Vendor: DGC
   Model: VRAID
   Revision: 0326
   SCSI Level: 4
   Is Pseudo: false
   Status: on
   Is RDM Capable: true
   Is Local: false
   Is Removable: false
   Is SSD: false
   Is Offline: false
   Is Perennially Reserved: false
   Queue Full Sample Size: 0
   Queue Full Threshold: 0
   Thin Provisioning Status: unknown
   Attached Filters:
   VAAI Status: unknown
   Other UIDs: vml.0200000000600601601f712100669db307a087e011565241494420
   Is Local SAS Device: false
   Is USB: false
   Is Boot USB Device: false
   No of outstanding IOs with competing worlds: 32
...
However, when I try to mount the LUN from the ESXi console, I get an error. The vmkernel.log says:
2015-03-23T14:20:28.006Z cpu9:33519)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x16 (0x412e803c0300, 0) to dev "naa.600601601f712100669db307a087e011" on path "vmhba2:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2015-03-23T14:20:28.006Z cpu9:33519)ScsiDeviceIO: 2338: Cmd(0x412e803c0300) 0x16, CmdSN 0xc44 from world 0 to dev "naa.600601601f712100669db307a087e011" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
2015-03-23T14:20:28.006Z cpu4:34247 opID=50ddd7ac)World: 14299: VC opID D6D55F59-0000010A maps to vmkernel opID 50ddd7ac
2015-03-23T14:20:28.006Z cpu4:34247 opID=50ddd7ac)LVM: 11786: Failed to open device naa.600601601f712100669db307a087e011:1 : Not supported
Does anybody know what the culprit is?
The LUN itself is formatted: I've been able to format it and mount it correctly on other ESXi machines (Dell R620), where everything goes smoothly. So as far as I can tell, the only difference is that this server is an R610.
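(For anyone reproducing this: the console sequence for turning such a LUN into a VMFS datastore looks roughly like the following; the datastore label here is just a placeholder.)

```shell
# Check the device and its current partition table
esxcli storage core device list | grep -A1 naa.600601601f712100669db307a087e011
partedUtil getptbl /vmfs/devices/disks/naa.600601601f712100669db307a087e011

# Create a VMFS5 filesystem on the first partition -- this is the
# step that fails on the R610 with "Not supported"
vmkfstools -C vmfs5 -S myDatastore \
    /vmfs/devices/disks/naa.600601601f712100669db307a087e011:1
```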
Thanks,
Athanasios
Can you confirm the following:
--> How frequently does this issue occur?
--> Can you check whether the vmkernel/messages log contains the entry below? (Make sure you check this before rebooting the ESXi, the next time the issue occurs.)
ALERT: APIC: 1823: APICID 0x00000000 - ESR = 0x40.
Thanks,
~Ravinder S Rana
The system may have a problem with the partition xxx:1.
If you don't have any VMs on this datastore/LUN, I would delete the LUN and create a new one...
Hello,
The issue is persistent: I cannot format the LUN, and if I format it on another ESXi machine (R620) and give it back to the machine with the problem (R610), I cannot mount it. Basically, I cannot use the LUN in any way, except via raw device mapping. I tried ESXi 5.5 (vanilla and the Dell flavor) and ESXi 6, and the behavior is the same.
There are no *.log files containing the word APIC in the /var/log directory.
Thanks,
Athanasios
I have no problem deleting it, but I won't be able to format it on the system with the problem. That system seems able to partition the LUN, but then fails to format the first partition.
Other ESXi 5.5 machines can mount the LUN without trouble. I highly doubt there is any real problem with it.
I have experienced issues where a LUN is visible via the CLI but hangs, is very slow, or sometimes doesn't show up in the GUI. Moreover, a LUN not showing up on a single server forces me to suspect that either the HBA driver/firmware is not correct or up to date, or there is an issue with the HBA itself.
Yeah, but raw device access is working. That shouldn't happen if there were a problem with the HBA, right?
Just found this old KB: VMware KB: Recovering a lost partition table on a VMFS volume
Maybe you're experiencing similar problems on your R610 and it can't read the partition table for some reason?
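That KB essentially boils down to checking the partition table with partedUtil and, if it is gone, recreating it. The general shape is something like this (device name taken from the earlier output; the start/end sectors below are illustrative only, the KB shows how to compute the real ones):

```shell
# Show the current partition table (label type, geometry, partitions)
partedUtil getptbl /vmfs/devices/disks/naa.600601601f712100669db307a087e011

# If the table is missing, recreate it per the KB. The long GUID is the
# VMFS partition type; start/end sectors here are EXAMPLES ONLY --
# compute the real values as the KB describes before running this.
partedUtil setptbl /vmfs/devices/disks/naa.600601601f712100669db307a087e011 gpt \
    "1 2048 524287966 AA31E02A400F11DB9590000C2911D1B8 0"
```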
Well, since I have also tried to delete everything from scratch (and failed), that's not it, unfortunately. The log messages do not signify something wrong with the partition.
Basically, there is nothing of value in the LUN, so I can experiment at will. I haven't been able to format it at all. Accessing a partition created with another esxi is also impossible. Basically, IMHO there's something wrong with either the Dell R610 or the specific R610, but I cannot wrap my head around what can be wrong that could at the same time allow the esxi to e.g. partition the lun (success), allow raw device access (success) and at the same time fail on format or mount. The mind boggles.
What LUN ID does the LUN have?
fc.20000024ff2bf9e7:21000024ff2bf9e7-fc.50060160c1e0d553:5006016841e0d553-naa.600601601f712100669db307a087e011
   UID: fc.20000024ff2bf9e7:21000024ff2bf9e7-fc.50060160c1e0d553:5006016841e0d553-naa.600601601f712100669db307a087e011
   Runtime Name: vmhba3:C0:T0:L0
   Device: naa.600601601f712100669db307a087e011
   Device Display Name: DGC Fibre Channel Disk (naa.600601601f712100669db307a087e011)
   Adapter: vmhba3
   Channel: 0
   Target: 0
   LUN: 0
   Plugin: NMP
   State: active
   Transport: fc
   Adapter Identifier: fc.20000024ff2bf9e7:21000024ff2bf9e7
   Target Identifier: fc.50060160c1e0d553:5006016841e0d553
   Adapter Transport Details: WWNN: 20:00:00:24:ff:2b:f9:e7 WWPN: 21:00:00:24:ff:2b:f9:e7
   Target Transport Details: WWNN: 50:06:01:60:c1:e0:d5:53 WWPN: 50:06:01:68:41:e0:d5:53
   Maximum IO Size: 33553920

fc.20000024ff2bf9e6:21000024ff2bf9e6-fc.50060160c1e0d553:5006016141e0d553-naa.600601601f712100669db307a087e011
   UID: fc.20000024ff2bf9e6:21000024ff2bf9e6-fc.50060160c1e0d553:5006016141e0d553-naa.600601601f712100669db307a087e011
   Runtime Name: vmhba2:C0:T0:L0
   Device: naa.600601601f712100669db307a087e011
   Device Display Name: DGC Fibre Channel Disk (naa.600601601f712100669db307a087e011)
   Adapter: vmhba2
   Channel: 0
   Target: 0
   LUN: 0
   Plugin: NMP
   State: active
   Transport: fc
   Adapter Identifier: fc.20000024ff2bf9e6:21000024ff2bf9e6
   Target Identifier: fc.50060160c1e0d553:5006016141e0d553
   Adapter Transport Details: WWNN: 20:00:00:24:ff:2b:f9:e6 WWPN: 21:00:00:24:ff:2b:f9:e6
   Target Transport Details: WWNN: 50:06:01:60:c1:e0:d5:53 WWPN: 50:06:01:61:41:e0:d5:53
   Maximum IO Size: 33553920
As I can see, the LUN has the ID 0.
I have already seen this problem a lot of times, where the ESX server can't handle LUNs with the ID 0.
Try giving it another LUN ID.
Um, do you happen to know how to do that?
You have to do this on the storage side. As I can see, you wrote that you have an EMC CX3?
Here is a link on how to change the LUN ID:
https://community.emc.com/message/730638
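On a Clariion this is typically done by removing the LUN from the host's storage group and re-adding it with a different host LUN number (HLU). With the Navisphere CLI it would look roughly like this; the SP address, storage group name, and LUN numbers below are all placeholders:

```shell
# Remove the LUN from the host's storage group, then re-add it with a
# different host LUN number. -alu is the array-side LUN number, -hlu is
# the number the host sees. All names/numbers here are placeholders.
naviseccli -h <SP-address> storagegroup -removehlu -gname MyStorageGroup -hlu 0
naviseccli -h <SP-address> storagegroup -addhlu   -gname MyStorageGroup -hlu 1 -alu 42
```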
Hm, I see what you're saying. I'll try that and get back. In the meantime, I'm marking your answer as helpful, which it definitely was. Oh, and thanks very much!
you are welcome buddy
thx for marking the reply as helpful
It may very well be more complicated than that. I can see that other ESXi machines also see host LUN 0 and have no trouble accessing it, so it's probably not just the ID being 0.
BUT, in my Clariion CX3 console I can see that this specific server is flagged "host agent not reachable", contrary to the other ESXi hosts. So maybe that has some significance.
Before installing ESXi, the server used to run FreeBSD, which AFAIK does not have a host agent proper. I'll probably rename the server or reinstall it on Thursday and see what happens.
Maybe this helps: https://community.emc.com/message/530614
OK, problem solved, so I'll briefly describe what we did.
So there's this Dell R610 with a fibre channel card, running FreeBSD 9.1 and connected to a Clariion CX3 array. Because FreeBSD does not have a host agent, manual registration of its HBA on the Clariion was necessary. Apart from that small annoyance, everything ran smoothly.
When we decided to wipe the system and install ESXi, we neglected to do anything on the Clariion side, since (we thought) the card would obviously keep its fibre channel WWIDs and ESXi would have access to the same LUN as before. The LUN would have to be freshly formatted as a datastore, naturally, but that was to be expected.
As it turns out, we could see the LUN from ESXi and we could partition it, but we could not format it (from the console or over SSH). We also tried presenting the LUN to another ESXi host (ok) and formatting it there (success), but when we moved it back to the original owner we couldn't mount it, for reasons unknown. Fearing that the previous contents of the LUN might somehow interfere (stray superblocks?), we gave it to a VM via raw device mapping and dd'ed it entirely with zeros. Again, no effect. Basically, all our tests showed that the only way we could use a LUN from that specific machine was via raw device mapping. No way to format or mount VMFS.
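The zero-wipe inside the VM was nothing fancy; something along these lines (the device node depends on how the guest enumerates the RDM disk, so /dev/sdb here is an assumption):

```shell
# Inside the Linux guest that received the LUN as an RDM: overwrite the
# whole disk with zeros. /dev/sdb is a placeholder -- double-check which
# device the RDM actually shows up as before running this!
dd if=/dev/zero of=/dev/sdb bs=1M
```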
One suggestion was to try changing the host LUN ID presented to the machine by the Clariion. We did so, but it changed nothing. The suggestion was nevertheless helpful, because it got us thinking about differences from the Clariion's perspective. And indeed, the problem was that this specific client had been manually registered, and that registration never changed, regardless of the fact that ESXi actually has a host agent.
So we powered off the R610, disconnected the fibre links (yeah, maybe that wasn't required) and de-registered the client from the connectivity status menu of the Clariion array. We powered the R610 back up and voila! The host agent presented itself automatically and the LUN got mounted right after that. No other actions were necessary. A deeper explanation of why ESXi was able to see the LUN while at the same time not being able to fully use it eludes me at this point.
TL;DR: If your ESXi is installed on a machine that used to run another OS, be sure to de-register the fibre channel client from any storage arrays, and then let the host agent do the work for you.
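(If the LUN doesn't reappear on its own after re-registration, a manual rescan should pick it up; these are standard ESXi commands:)

```shell
# Rescan all HBAs for new devices, then rescan for VMFS volumes
esxcli storage core adapter rescan --all
vmkfstools -V

# Verify the datastore is now mounted
esxcli storage filesystem list
```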