VMware Cloud Community
PauliPinguin
Contributor

vcbMounter: "Error: failed to open the disk: One LUN could not be opened"

Hello all,

We're encountering a problem with VCB and have not been able to find a solution. We have two ESX 3.0 hosts and one VCB server with access to the VMFS storage. On the VCB server, all paths to the LUN except one are disabled, as described in the knowledge base.

Every time we run vcbMounter, the snapshot is created and some files (*.log, .vmx) are copied to the destination path. Then vcbMounter fails with this error:

vcbMounter: "Error: failed to open the disk: One LUN could not be opened"

Could anyone help?

Many thanks

37 Replies
geckman
Contributor

The solution to this issue is to make sure that no inactive drives are visible in the Disk Management console on the VCB proxy Windows server. Remove any multipathing software on the VM, remove redundant zones on your fiber switches, and select "do not use this device (disable)" within Windows for any inactive drives remaining in the Disk Management console (specifically, these are the drives that show a red "no entry" symbol in the console). Once none of your inactive drives are visible in the Disk Management console on the VCB proxy, your problem should disappear, as long as you also have LUN alignment between your ESX servers and the VCB proxy. Let me know if this works for you all.

geckman
Contributor

When I say remove redundant zones on the fiber switches, I mean remove the redundant zone sets for just your VCB proxy server. This is assuming you have two fiber switches each with two zone sets for your HBAs. Your configuration may be different.

todd_shawcross
Contributor

You don't need to reboot if you get the "Operation failed due to concurrent modification by another operation." error. Just restart the management service on the ESX host with the following command:

/etc/init.d/mgmt-vmware restart

Once restarted, you can delete the snapshot.

bpierfy
Enthusiast

I'm having the "One LUN could not be opened" problem as well. At first, we had been successfully backing up one ESX server using VCB. Since we now have 3 servers, I changed the config file to point to VC for backups. Now I am getting this error. We have also since deleted our main datastore and created two new ones.

Thinking it is a LUN issue, I removed the proxy's access to all LUNs and added them in the order they appear on the ESX servers. Still get the same error though.

In VC, I see the "concurrent operation" error when it tries to delete the snapshot. I've restarted the mgmt services on all 3 ESX servers and still get the same error.

What is left to do? How can I make sure that the LUN IDs are the same or that the proxy sees the exact same thing as the ESX servers do? There are also no other inactive disks in the Windows Disk Manager...

Help!!

beneddt
Contributor

Same here bpierfy. It was working fine. I'm trying to correlate it with either a VC patch, or one of the ESX patches that came out. Unfortunately one of the admins put all the patches on in one day, making it hard to determine which one did it.

Time to open a support case I suppose.

dean

bpierfy
Enthusiast

I also applied the patches in sequence - all at once... The only thing is I didn't do the ones that came out 3/29/07.

I did notice that there seems to be a VCB 3.0.1 - does this mean I needed to upgrade VCB or (for my case) the BE10d interoperability module when I upgraded from 3.0.0 to 3.0.1 and applied the patches?

eharvill
Enthusiast

Another quirk I run into (and maybe I don't have something configured properly) is that although I disable the disks in HW manager, I am hosed when a LUN fails over or trespasses from one SP to the other. I then have the wrong disk disabled and my LUN essentially goes away.

Is there a workaround for this or do I have to manually enable/disable disks every time a LUN trespasses?

ZMkenzie
Enthusiast

Are you using an EMC CLARiiON system or another disk array with active/standby controllers? Usually this error happens when you have a single path between the VCB proxy and the storage controller, and the storage sometimes trespasses the LUN from one controller to the other. VCB is then unable to see the other controller and, having no multipath drivers, is not able to trespass the LUN back.

Usually we use a "navicli" script on the VCB proxy to make sure that the LUN is presented on the right controller before running the VCB scripts.
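
The navicli script itself was not posted in this thread. As a rough illustration of the idea only, here is a minimal Python sketch that parses LUN ownership text and decides whether a trespass would be needed before a backup. The line format (`Current owner: SP A`, `Default Owner: SP A`), which would come from something like a `navicli ... getlun` query, is an assumption to verify against your own Navisphere CLI output; `parse_owners` and `needs_trespass` are hypothetical names.

```python
import re

# Assumed line format (verify against your own CLI output):
#   Default Owner:  SP A
#   Current owner:  SP B
OWNER_LINE = re.compile(
    r"\s*(current|default)\s+owner\s*:\s*SP\s*([AB])\s*$", re.IGNORECASE
)


def parse_owners(text):
    """Return a dict such as {'current': 'B', 'default': 'A'}."""
    owners = {}
    for line in text.splitlines():
        match = OWNER_LINE.match(line)
        if match:
            owners[match.group(1).lower()] = match.group(2).upper()
    return owners


def needs_trespass(text, proxy_sp):
    """True if the LUN is currently owned by an SP other than the one
    the proxy's single path reaches (proxy_sp is 'A' or 'B')."""
    owners = parse_owners(text)
    if "current" not in owners:
        raise ValueError("no 'Current owner' line found")
    return owners["current"] != proxy_sp.upper()


# Hypothetical sample text, not captured from a real array.
sample = """Default Owner:  SP A
Current owner:  SP B
"""
print(needs_trespass(sample, "A"))
```

A wrapper script could run this check for each backed-up LUN and skip or postpone the VCB job (or trigger a trespass through the array's CLI) when it returns true.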

Hope this helps,

regards.

bpierfy
Enthusiast

I don't personally care about redundancy for the proxy, but do you have that navicli script you mentioned? My VCB problems seem to be stemming from differing LUN IDs, and although I think I can fix that problem, it would be nice to have a script that would prevent (or circumvent) the problem in the future.

eharvill
Enthusiast

Yes, it's a Clariion CX3-20. I know that, in theory, Clariions aren't supposed to trespass unless there is an issue with an SP, but I've seen it happen a few times and am not enough of a SAN guy to know why. I'm pretty sure both SPs are fine. This is not an issue with the ESX hosts since they can see all the paths, but it obviously is an issue with the VCB proxy and how it is supposed to be set up for single pathing.

I would definitely be interested in your navicli script as well if you would be willing to share. Thanks!

Eric

toe-mas
Contributor

I had the same problem and was able to fix it by putting the ESX and VCB proxy servers in the same Storage Group in Navisphere. I have an EMC CX-500. When I had them in different storage groups, even though I was assigning LUNs in the same order, they still had different Host IDs (the LUN IDs were the same). In Navisphere you can verify whether they have the same Host IDs by right-clicking on the host, selecting Properties, and clicking on the Storage tab. You will be presented with the LUN information (LUN ID, Host LUN ID, Capacity, etc.).
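
To make the Host ID comparison concrete, here is a minimal Python sketch. It assumes you have copied the LUN ID and Host LUN ID pairs by hand from the Storage tab of an ESX host and of the VCB proxy. The function name `host_id_mismatches` and all sample values are hypothetical.

```python
def host_id_mismatches(esx_group, vcb_group):
    """Compare {array LUN ID: host LUN ID} maps for two storage groups.

    Returns {lun: (esx_host_id, vcb_host_id)} for every LUN visible to the
    VCB proxy whose host ID differs from the one the ESX hosts see. A LUN
    that is missing from the ESX group is reported with None.
    """
    mismatches = {}
    for lun, vcb_host_id in sorted(vcb_group.items()):
        esx_host_id = esx_group.get(lun)
        if esx_host_id != vcb_host_id:
            mismatches[lun] = (esx_host_id, vcb_host_id)
    return mismatches


# Hypothetical values, as read from the Storage tab of each host.
esx_sg = {10: 0, 11: 1, 12: 2}
vcb_sg = {10: 0, 11: 1, 12: 3}

for lun, (esx_id, vcb_id) in host_id_mismatches(esx_sg, vcb_sg).items():
    print(f"LUN {lun}: ESX host ID {esx_id}, VCB host ID {vcb_id}")
```

Any LUN the sketch reports is one the proxy sees under a different Host ID than the ESX hosts do, which matches the situation described above.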

Also make sure LUNs are visible in Device Manager on your VCB proxy (but that was already mentioned before). In order to do it you might need to reboot your proxy (rescanning drives didn't work for me). Hope this helps

MaDax
Contributor

Awesome, thanks for that Tomasz. That's fixed the problem we had.

We use a CX3-20 and the VCB proxy was in a different storage group; placing it in the same storage group as the ESX servers cured the problem.

Cheers

beneddt
Contributor

We found out our issue as well (thanks again VMware support!)

Apparently the SAN folks forgot to set the 'common serial bit' on our EMC Symmetrix array, so each LUN was presented with a different serial number to each host. They were shocked it was working at all (which it was, for about a month).

Anyway, setting that to enabled, and re-signaturing all the LUNs solved our problem.

shlomo_rivkin
Contributor

Thank you very much, toe-mas!!!

We had the same weird problem with HDS 9585.

After managing a little war with our storage guys, we've finally solved it.

Jeff1981
Enthusiast

Thanks toe-mas, I'm almost certain this will solve our problem as well (we are receiving the same error message for some (not all) of our VCB backups, only for those which are on a newly created LUN, which does indeed have a different Host ID in the storage group of our VCB proxy)

We're just investigating what the consequences are (if any) of putting our VCB proxy in the same storage group as our ESX servers before we actually do this.

uma_kits
Contributor

I was able to solve this problem with the checklist below.

1. Verify that the VCB proxy sees the SAME LUN ID as the VMFS datastore.

2. Make sure multipath software such as RDAC virtual disk is NOT installed.

3. On the guest, do NOT select independent-persistent disk mode.

4. These are useful documents about VCB:

http://download3.vmware.com/vmworld/2006/labs2006/vmworld.06.lab01-VCB-MANUAL.pdf

www.vmware-tsx.com/download.php?asset_id=49
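
Item 3 of the checklist can be checked without opening each VM's settings. The following is a minimal Python sketch that scans the text of a .vmx file for disks set to an independent mode (such disks are excluded from snapshots). It assumes the disk mode is stored in keys of the form `scsi0:1.mode = "independent-persistent"`, which you should verify against your own .vmx files; `independent_disks` and the sample content are hypothetical.

```python
import re

# Assumed key format: <bus><n>:<unit>.mode = "<mode>"
MODE_LINE = re.compile(
    r'^\s*((?:scsi|ide)\d+:\d+)\.mode\s*=\s*"([^"]*)"', re.IGNORECASE
)


def independent_disks(vmx_text):
    """Return [(device, mode)] for disks whose mode starts with 'independent'."""
    found = []
    for line in vmx_text.splitlines():
        match = MODE_LINE.match(line)
        if match and match.group(2).lower().startswith("independent"):
            found.append((match.group(1), match.group(2)))
    return found


# Hypothetical .vmx excerpt for illustration.
sample_vmx = '''scsi0:0.present = "TRUE"
scsi0:0.fileName = "vm01.vmdk"
scsi0:1.fileName = "vm01_1.vmdk"
scsi0:1.mode = "independent-persistent"
'''
print(independent_disks(sample_vmx))
```

An empty list means no disk in that file is flagged as independent under the assumed key format.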

Jeff1981
Enthusiast

Toe-mas (and others), you don't have to put the VCB server and ESX servers in the same storage group on your SAN. We had the same problem: we were also getting the "One LUN could not be opened" error and saw that the LUN in question had a different Host ID in the two storage groups where we had placed it (one SG for our ESX servers, one for our VCB server).

The trick is to go to the properties of the VCB Storage Group, go to Select LUNs, and deselect the LUN with the different Host ID (move it to the window on the left). Click OK, then go to the ESX Storage Group and note which Host ID the LUN should have. Then go back to the properties of the VCB Storage Group, go to Select LUNs, and add the LUN in question again (set your View to "All" in the dropdown list on the left): click the LUN and move it to the window on the right. Now, before clicking OK, scroll to the far right in the right window. Under Host ID, you can click next to the LUN in question and select the Host ID it should receive. Select the Host ID you previously noted in the ESX Storage Group and click OK. On your VCB server, rescan disks and retry the VCB backup. All should work well now.

Good luck!

Peter_Channing
Contributor

Has anyone used VCB with extended VMFS volumes? I'm just starting the setup and have run a few test jobs, which fail with the same error.

I suspect it's the extended LUNs, but I'm not 100% sure.

Any ideas?
