VMware Cloud Community
leejb
Contributor

ESXi 4.1 not recognizing existing VMFS volume.

Added a new host into the system at ESXi v4.1.  The other hosts are ESX v4.0.  A shared LUN over Fibre Channel was formatted to VMFS v3.31 from an ESX v4.0 host.  The other ESX v4.0 hosts can access the LUN with no problems.  However, when I try to add it to the new v4.1 host, it sees the LUN and even reads the VMFS label, but only gives me the option 'Format the disk'.

Have tried from both vCenter (newly upgraded to vCenter v4.1) and directly from the v4.1 host; no change.

The only difference I can think of, other than v4.1, is the firmware on the HBAs.  I have two QLogic 2432 PCI-E cards, both with the same results.  When I get a second, I have a dual-port and an earlier version of the same HBA model I can try, but if it can see the LUN, I don't see how this would be an issue.

My understanding is that there should be no problems with v4.1 reading a v3.31 VMFS; it's just that some of the newer features like hardware acceleration are not available.

Any thoughts?

30 Replies
AndreTheGiant
Immortal

A vSphere 4.1 host can also read old 3.x VMFS partitions.

So this should not be the issue.

Are you sure that it is the same LUN?

Can you check the fdisk -l output from one 4.0 host and the new one?
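
Something like this on each host should do it (run from the service console on 4.0, or from Tech Support Mode on the ESXi 4.1 host; the device path below is only an example, use the one that matches your LUN):

# on the ESX 4.0 service console:
fdisk -l
# on the ESXi 4.1 host the LUNs are under /dev/disks,
# so list them and point fdisk at the right one:
ls /dev/disks/
fdisk -l /dev/disks/eui.<device-id>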

Andre

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
leejb
Contributor

Pretty positive.

From i4.1...

Disk /dev/disks/eui.222a000155ed3901: 1999.9 GB, 1999999598592 bytes
255 heads, 63 sectors/track, 243152 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

From 4.0...

Disk /dev/sda: 1999.9 GB, 1999999598592 bytes
255 heads, 63 sectors/track, 243152 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

If I do an 'esxcfg-mpath -b' I get similar info: eui.222a000155ed3901 is the disk seen on both systems (it is also shown on both hosts under Configuration | Storage Adapters | selected connected HBA).  On both it is defined as type 'disk', with the proper size (1.82TB), and owner NMP.
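
For the record, this is roughly how I cross-checked it on both hosts (using the eui identifier from the fdisk output above):

esxcfg-mpath -b | grep eui.222a000155ed3901
# the device line should appear on each host with the same identifier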

opbz
Hot Shot

OK, same LUN...

How different are the BIOS versions? Are we talking a really old version, a really new one, or what?

Is the new server seeing all paths to the LUN? How big is the LUN? What sort of SAN is it? Is your SAN seeing the server OK? On your fibre switches, are you seeing any errors on the ports used by the new server? Is the speed on the ports set correctly? What about a new LUN: is that seen properly if you format it on the old server?

I think on this one we need to ensure that the SAN is being accessed properly first, then check for problems on the LUN...

leejb
Contributor

The two newer (originally tried) single-port QLE2460s and the dual-port QLE2462 are at v5.02.00.  The QLE2462 single-port card on the 4.0 host currently accessing the volume is at v4.04.09.

A previous 4.0 host that was accessing it had an Emulex LP10000 at v1.90A4; unfortunately that is a PCI-X card, so I can't test it in the ESXi 4.1 host, and the system the Emulex is in has been re-appropriated to other duties.

The next thing I can think of trying on the firmware front is to swap the 2462 in the current ESX 4.0 host with the one in the new ESXi 4.1 host, but it is currently in production and I will have to schedule a maintenance window for it.  I guess I could downgrade the ESXi host back to ESX to see if that's the source.

Regardless, the 4.0 and 4.1 hosts both recognize the existing HBAs as model 'ISP2432-based 4Gb Fibre Channel to PCI Express HBA'.

The LUN is a 2TB (1.82TB) volume, basic RAID 10 striped across 4 disks.

Storage is an Astra ES with dual (1 per controller) Promise VTrak connections to a Brocade E300.  After rescanning the HBA, then rescanning all under Storage on the 4.1 host, I have 0 errors, rejections, or busies reported on the FC switch; I don't know how to check for such errors on the host.  The speed setting reads 4Gb on the switch, host, and storage.  The HBA on the 4.1 host recognizes 2 paths to the 1 device and has one tagged with the (I/O) designation.

I will build and assign another LUN to the ESX 4.0 host, format it, then assign it to the new 4.1 host as well to see what happens.

idle-jam
Immortal

Hi,

One possible reason is that the host detects it as a snapshot. If the KB below does not resolve your problem, I would advise creating a support request with VMware support.

http://kb.vmware.com/kb/1011387

leejb
Contributor

Interesting...  esxcfg-volume -l  displays:

VMFS3 UUID/label: 4917d152-0eaf3442-55ca-001d096b03f7/AstraLD1
Can mount: Yes
Can resignature: No (the volume is being actively used)
Extent name: eui.222a000155ed3901:1     range: 0 - 1907199 (MB)

When I performed esxcfg-volume -m "AstraLD1" and then did a rescan on the storage in 4.1 through vCenter, the datastore is there, and if I browse it I can see the existing folders.  I can select a .vmx and add it to the 4.1 inventory; however, it won't let me power on the VM "due to the connection state of the host"... maybe because I used -m rather than the persistent -M when mounting.  Checking the switch after mounting shows that no errors/rejections/busies occur.

So I assume this means it's not a Fibre or SAN issue per se, but specifically a 4.1 issue, or 4.1 with this HBA.  My concern is whether I can simply mount it persistently by hand and move on (assuming the 'connection state' is rectifiable)... I just don't like "hacking a solution".  Trying to get around $$ for a support call :)  Thanks, idle-jam, for the article... that at least moves me forward in troubleshooting.
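
In case it helps anyone following along, the persistent variant of what I did would look roughly like this (the adapter name vmhba1 is only an example; use whichever HBA the LUN is on):

# list volumes the host is treating as snapshots
esxcfg-volume -l
# mount persistently (survives reboots), keeping the existing signature
esxcfg-volume -M AstraLD1
# rescan so the datastore shows up
esxcfg-rescan vmhba1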

idle-jam
Immortal

Glad that my KB pointed you somewhere. I wish I could help more; normally at this stage I would just hand it over to support (I have a phobia of touching/troubleshooting anything related to DATA). Perhaps someone else in the community will chip in. I wish you all the best, buddy. :)

leejb
Contributor

Just an FYI on the inability to power on the VM added from the manually mounted storage... per a Dutch blog (I love Google Translate), someone suggested merely disconnecting and reconnecting the host, and indeed that allowed me to power up the VM without error.

Just to check whether anything else changed, I stopped the VM, removed it from inventory, used esxcfg-volume -u <label> (exact commands below) and then attempted to add the storage back properly through the GUI, but again I only get the 'Format the disk' option.

I think it has something to do either with the new v4.1 vCenter client (it happens whether I go through vCenter or directly to the host) or with 4.1 itself not liking something about the VMFS structure/parameters/etc.  Will keep working on it.
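
For reference, the sequence I used for that test was more or less the following (the label is from my setup, and vmhba1 is only an example adapter name):

# unmount the force-mounted volume
esxcfg-volume -u AstraLD1
# rescan so the host re-evaluates the LUN
esxcfg-rescan vmhba1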

leejb
Contributor

In more searching I keep coming back around to idle-jam's suggestion of a snapshot volume, and specifically this post as the best summary.  Apparently in ESX before v4 you could control how a volume was seen, but according to this post that is no longer the case in v4+: if a volume already exists with a label, then it is automatically assumed the volume is a snapshot.  The post also goes on to say that the GUI option "Keep the existing signature" does nothing other than a forced persistent mount (esxcfg-volume -M <UUID|label>).  How factual the guy is, or my interpretation, is another thing... but my guess is I'll need to resignature the disk or, assuming that isn't what does this, figure out how to set the DisallowSnapshotLUN bit on the volume (CLI equivalents are sketched at the end of this post).

I did create a new LUN from scratch on the ESX 4.0 host and mounted it there; I was able to see it on the new 4.1 host, and when going to add it, all options are available under 'Specify a VMFS mount option'... but when I select the 'Keep the existing signature' option and finish, it hangs a bit and then errors with:

Call "HostStorageSystem.ResolveMultipleUnresolvedVmfsVolumes" for object "storageSystem-29" on vCenter Server "DCRC-VCENTER.col.missouri.edu" failed.

If I delete that VMFS from the ESX host, assign it to the 4.1 host, add/format it there, then re-add it back to ESX 4.0, I get the same error.
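
For anyone who wants to try the CLI equivalents of those GUI choices, as I understand them (label is from my setup; note that a resignature changes the UUID other hosts expect):

# list unresolved/snapshot volumes
esxcfg-volume -l
# 'Keep the existing signature' = persistent force mount
esxcfg-volume -M AstraLD1
# 'Assign a new signature' = resignature
esxcfg-volume -r AstraLD1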

opbz
Hot Shot

Be careful with resignaturing, as it might affect your other servers. Basically, they might then not be able to see it.

Snapshot problems used to be more serious in 3.x; in 4.x they have solved a lot of the problems. Basically the LUN is identified by the NAA number rather than the signature. Also, in 4.x you can resignature or allow mounting of LUNs easily by simply trying to add storage. It should see the volume and then give you the option. In your case it does not...

I would suggest migrating whatever VMs you have on this problem LUN to your new LUN, then deleting it and recreating it.

leejb
Contributor

Creating a new VMFS volume on a new LUN isn't solving it: if I create it and assign it to ESX 4.0, the 4.1 host sees it as a snapshot.  If I create it on 4.1, then ESX 4.0 sees it as a snapshot.  If I do a resignature on these volumes from the system that sees it as a snapshot, that system can then mount it, but then the other host suddenly sees it as a snapshot and cannot mount it.

It's like full ESX 4.0 and ESXi 4.1 are fighting over it; whichever one isn't the 'signer' sees it as a 'snapshot' and refuses to mount it via the GUI... although I can still mount it manually via the CLI.  Maybe this is what I have to do from now on, or until I get all hosts updated to 4.1.

It's just that I had no problems with ESX 4.0, so now I'm second-guessing my upgrade.

opbz
Hot Shot

So a datastore created on one server is not properly visible to the other server.

VMware uses the host ID number (kind of like SCSI IDs) and how the HBAs see the LUN to generate the signature. This should be consistent between most versions of ESX.

Host IDs are set by the SAN and should be consistent for all servers that see the same LUN. I have seen cases where this is not so. I have not used the storage you are using; I am more familiar with EMC gear. On certain models with older FLARE versions we had issues with not being able to fix host IDs; this is why the AX series was originally not fully supported on older ESX clusters. If you can see the host ID given to the servers, ensure it's the same.

Secondly, HBAs have BIOS-level parameters that affect how they scan LUNs. That's why there is a vendor-specific download section on both the QLogic and Emulex sites. These can cause the datastore signatures to be different and prevent ESX servers from seeing things properly.

It's beginning to seem like your original idea about the BIOSes might be correct...

Can you not update them?

leejb
Contributor

Yeah, both see the same eui.xxx, but the UUID changes depending on which one 'signs' the volume.

I will look into updating that one.  In fact, since I have two already at that BIOS, I can replace the HBA in the ESX 4.0 system and see if that makes a difference.  I'll just have to schedule a long maintenance window and start a clone of the VMs on the volume (in addition to Veeam backups) in case something goes wrong, just to make sure.

leejb
Contributor

Here's an interesting tidbit.  A friend sent me a link to KB 1015986, which identifies the error message I received when creating the new VMFS and trying to add it to a second host: "HostStorageSystem.ResolveMultipleUnresolvedVmfsVolumes".

Basically it outlines what I was doing (shared storage, a LUN seen as a snapshot, an attempted force mount... the same as mounting with the existing signature, although I did it from the command line as well; it focuses on ESX 4.0 and doesn't say anything about ESXi or v4.1).

It basically gives the go-ahead to do the command-line force mount (my interpretation), so I feel better about doing that.  But it also gives an alternative of moving the second host into another site.  Just for grins, I created a second site, removed the 4.1 host, then added it to this new site... lo and behold, I can mount both the newly created VMFS and the older one I have in production via the GUI, with no error messages or issues.

I'm not sure what splitting across dissimilar sites would do, but I would assume that as long as the network and storage structure are the same, I should be able to migrate inventory between the two hosts without issues.  I'm not doing anything fancy like clustering, HDR, etc., so this is a possible solution for me.

I will still need to try HBAs with like firmware, and then try between two 4.1 hosts.

leejb
Contributor

After replacing the HBA in the ESX 4.0 host with the same model and firmware version as in the 4.1 host, no change in behavior.  I could still manually force mount via the CLI with both hosts in the same site, but the GUI failed, with the 4.1 host seeing the existing production VMFS as a snapshot and wanting to resignature a newly created one.  Again, if I moved the 4.1 host to its own site, I was able to mount both through the GUI.

I've decided not to upgrade my ESX 4.0 host for now, so there is still a possibility that two 4.1 hosts would not have this problem.  Also, as noted by opbz, it may have to do with how my storage is reporting the LUNs to different hosts.  Lastly, I <edit>did have an off-channel contact point to a</edit> KB article 6482648, which talks about how the shared storage needs to be presented under the same LUN ID to both hosts.  This had no effect for me, but then again my storage's LUN presentation may be usurping this.
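
If anyone wants to verify the LUN ID presentation from the CLI rather than via screenshots, something along these lines on each host should show it (the runtime name ends in the LUN number, e.g. vmhba1:C0:T0:L1):

esxcfg-mpath -b
# or the compact one-line-per-path listing
esxcfg-mpath -L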

From all accounts, things seem to be a whole lot easier going to iSCSI.  Fibre Channel was the best solution for me when I first built our system (and iSCSI was new then), but iSCSI looks like it may be the best route unless you have hard-core I/O needs, and even then it's 8Gb Fibre vs 10Gb iSCSI.

Lastly, rumors in several posts have stated that ESXi 5 may be updating the 32-bit SCSI addressing now used, to go beyond the 2TB limit for RDMs, and possibly add more options and efficiency.

Josh26
Virtuoso

leejb wrote:

  Also, as noted by opbz, it may have to do with how my storage is reporting the LUNs to different hosts.  Lastly, an off-channel contact pointed me to KB article 6482648, which talks about how the shared storage needs to be presented under the same LUN ID to both hosts.  This had no effect for me, but then again my storage's LUN presentation may be usurping this.

You can check this.

Can you get us a screenshot of the "Storage Adapters" configuration tab, with each HBA selected?

bulletprooffool
Champion

I presume you have actually exported the NFS store to the new host (on the NFS server?)

One day I will virtualise myself . . .
leejb
Contributor

Hope my ignorance isn't showing too badly here, but I do not have any NFS volumes.  This started as wanting a 4.1 host to mount an existing VMFS volume already in use by an ESX 4.0 host.  The process with my storage is pretty simple... the storage lists all initiators seen on the fabric, I assign the initiator to a logical disk with a LUN assignment, commit the change, rescan the HBA on the host... then the sequence of events detailed above.

With regard to the screenshot request, I could, but I'm not sure what you are wanting to see, and I would have to bring hosts down and swap HBAs again; right now I don't want to take the time to do that again unless there is a strong theory for me to test or check.  It seems to come down to how the host sees the volume, i.e. as a snapshot or not.  If it does see it as a snapshot, I cannot add that storage to the host via the GUI unless I move it to a different site; otherwise I have to force mount via the CLI.

Now, how or what is telling the host that it is a snapshot... I don't know where that is stored.  Ideally it would be a simple bit change or an entry in a config file I could edit, i.e. forcing ESX to view it as not a snapshot.  The theory I'm left with, posed by opbz, is that the HBA and storage could have an effect on this... but again, I don't know or understand how the snapshot 'tag' is set or where; what the difference is in the LUN presentation by the storage, or as seen through a specific HBA model/firmware combo (and how the host tells the difference when it isn't presented the same... as I said, the eui.xxx is identical from both hosts and all HBAs used); or whether there is a difference in how ESX 4.0 vs ESXi 4.1 interpret these.

Regardless, if I can't force how the host sees it, there's nothing I can do other than change storage hardware, try different HBAs, or move to a different medium (iSCSI), none of which are at my disposal.  Since the previously stated KB article seems to imply that a force mount via the CLI poses no potential damage, and I can move the host under a different site <edit>Seems this is not necessarily true.  After further testing like moving HBAs around and using different HBAs... moving to a different site doesn't necessarily mean you can mount through the GUI.  Still trying to figure this one out.</edit>, I have a workaround.  The only thing I haven't tried yet is to have both hosts be 4.1 or both 4.0... not sure if it warrants the effort now.  If I get around to either, I'll update the post.

Josh26
Virtuoso

Hi,

I don't think this is an ignorance issue - you aren't using NFS so questions around it aren't related.

Have a look at the screenshot attached.

What you want to see is the "runtime name" being identical across your servers.
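
If the screenshots are awkward, the same information is also available from the CLI on each host; something like this should show it (the first column of each path line is the runtime name, vmhbaX:C0:TY:LZ):

esxcfg-mpath -L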
