VMware Cloud Community
ekisner
Contributor

VCB not mounting VMs :(

I had it working... I honestly did. It was the coolest thing I've ever seen (well, not quite... but it comes close!).

Gerrit_Lehr
Commander

The line

[2008-06-23 16:11:20.489 'App' 3504 error] No path to device LVID:4747a6fc-63cff054-f99c-001cc4ed7c5c/4747a6fb-bfe7bb5e-4d50-001cc4ed7c5c/1 found.

looks like there is a path problem between the proxy and the LUN storing the VM. Have you tried adding the parameter -m san? Also, make sure the proxy has access to that specific LUN and is not multipath-connected to the SAN. Check the zoning and LUN masking. Did you change anything since it last worked? Did the VM move to another datastore that is not accessible from the proxy?

Kind Regards,

Gerrit Lehr

If you found this or other information useful, please consider awarding points for "Correct" or "Helpful".

ekisner
Contributor

As to what I changed, I changed one thing: I edited the VM to add another HDD and then moved the Backup Exec backup-to-disk folder to that HDD.

I did, however, do something rather silly the other day (and it caused me an absolutely insane amount of grief): I unplugged the wrong NICs on the ESX hosts. Those NICs provided the connection to the SAN. One of my hosts was okay, but on the other I had to remove the iSCSI send targets, reboot the host, re-add the send targets, and then rescan the HBA for it to see the SAN again. Which was a real pain, as my VirtualCenter / licensing server was on that host!!

The BE / VCB server has been restarted several times since this "mishap", and it did have access before (the initiator didn't change). The server being targeted in this command is stored on two separate LUNs: one LUN contains the VMDK files for "C drives" (operating system partitions) and one LUN is for production data. Before the problem, it was able to see both and mount all drives successfully.

I just added the -m san switch, and these logs are the result.

[2008-06-24 08:31:51.130 'App' 180 trivia] Attempting to open LVID:4747a6fc-63cff054-f99c-001cc4ed7c5c/4747a6fb-bfe7bb5e-4d50-001cc4ed7c5c/1.
[2008-06-24 08:31:51.130 'App' 180 error] No path to device LVID:4747a6fc-63cff054-f99c-001cc4ed7c5c/4747a6fb-bfe7bb5e-4d50-001cc4ed7c5c/1 found.
[2008-06-24 08:31:51.130 'BlockList' 180 error] 
[2008-06-24 08:31:51.130 'SOAP' 180 trivia] Sending soap request to [TCP:systems-console:443]: release
[2008-06-24 08:31:51.209 'BlockList' 180 info] Closing connection systems-console:administrator
[2008-06-24 08:31:51.209 'SOAP' 180 trivia] Sending soap request to [TCP:systems-console:443]: logout
[2008-06-24 08:31:51.209 'vcbMounter' 180 error] Error: Failed to open the disk: Cannot access a SAN/iSCSI LUN backing this virtual disk. (Hint: Option "-m ndb" swtiches vcbMounter to network base disk access if this is what you want.) 
[2008-06-24 08:31:51.209 'vcbMounter' 180 error] An error occurred, cleaning up...
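For anyone sifting through longer vcbMounter logs, a small Python sketch like this can pull out just the error entries. It assumes the line layout seen in the excerpts in this thread (timestamp, quoted component, a numeric id, level, then the message). The names `LogEntry`, `parse_line`, and `errors` are made up for illustration and are not part of VCB.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Layout inferred from the log excerpts in this thread (an assumption):
# [YYYY-MM-DD HH:MM:SS.mmm 'Component' <numeric id> <level>] <message>
LINE_RE = re.compile(
    r"^\[(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"'(?P<component>[^']+)' (?P<ident>\d+) (?P<level>\w+)\]\s?(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: str
    component: str
    ident: int      # numeric field; presumably a thread or process id
    level: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line, or return None if it does not match the layout."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    return LogEntry(
        match["timestamp"],
        match["component"],
        int(match["ident"]),
        match["level"],
        match["message"],
    )


def errors(lines: Iterable[str]) -> List[LogEntry]:
    """Return entries at level 'error' that actually carry a message."""
    parsed = (parse_line(line) for line in lines)
    return [e for e in parsed if e and e.level == "error" and e.message.strip()]
```

Feeding it the lines of out.log (for example `errors(open("out.log"))`) should leave only entries such as the "No path to device" one, and skip the empty 'BlockList' error line.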

polysulfide
Expert

The host that VCB is running on is not able to connect to the LUN that the VM is on. If you switched hosts, make sure that the new host has the same connections as the old host. If it's the same host, make sure that your iSCSI initiator is connected or that your Fibre Channel is connected (check the fabric ACLs if it's a new host) and try again.






If it was useful, give me credit

http://communities.vmware.com/blogs/polysulfide

VI From Concept to Implementation

ekisner
Contributor

I am able to browse both datastores within both ESX hosts (connecting through the VI Client directly to the ESX hosts to eliminate any "funny stuff")... if I'm able to do this, that would mean to me that the host can access the datastores. Am I wrong?

The VM is on the same host as before, as the tape drive is an external SCSI autoloader attached directly to the host via an HBA.

Thanks.

polysulfide
Expert

That means that the VC server can access the datastores, not your backup server. I think you can do VCB over the LAN if you're using 2.5, but if it's older than that, your backup server needs to have an actual SCSI (iSCSI or FC) connection to your datastore.






ekisner
Contributor

The VCB proxy has the MS iSCSI initiator software installed on it... that software does correctly show both LUNs. I've also completely reset the initiator settings and re-done them.

I'll try re-installing the initiator, but I'm not gonna keep my hopes up on that one. Will let you know how that turns out.

polysulfide
Expert

If the initiator sees the LUNs, that should be sufficient for VCB. (Make sure you're actually connecting to them, not just that the path is available.)

Can you do me a favor and make sure that your DNS entries are correct for your VirtualCenter server, the target machine, and the ESX server?

Then use fully qualified domain names for the host and target machines in your script. Repost your full command line and the new log.

Thanks,
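If it helps, the DNS check can be scripted rather than done one nslookup at a time. This is only a rough Python sketch using the standard `socket` module; `resolve` and `check_names` are illustrative names, and it only confirms that a name resolves from the machine it runs on (ideally the VCB proxy), not that the address is the right one.

```python
import socket
from typing import Dict, Iterable, List


def resolve(hostname: str) -> List[str]:
    """Return the sorted addresses a name resolves to, or [] if lookup fails."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


def check_names(hostnames: Iterable[str]) -> Dict[str, List[str]]:
    """Map each hostname to its resolved addresses so missing entries stand out."""
    return {name: resolve(name) for name in hostnames}
```

Calling `check_names` with the fully qualified names of the VirtualCenter server, the target VM, and the ESX host would show an empty list next to any name that does not resolve.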






ekisner
Contributor

I am unsure of how to actually browse the LUNs from within the VCB proxy... my only indication that things are working as they should on the initiator is the fact that I can remove the target, watch the bound volumes disappear, then re-connect to get the target back, and watch the bound volumes come back.

C:\Program Files\VMware\VMware Consolidated Backup Framework>vcbMounter.exe -h systems-console.countygp.ab.ca -u administrator -a ipaddr:sql01.countygp.ab.ca -r C:\MNT\sql01 -m san -L 6 > out.log

Password:

I ran a bunch of nslookup commands to get the DNS / IP stuff checked, and it all looks right... also checked our DNS server.

Not sure how much you know about iSCSI, but I see in the initiator, when I pull up the connection properties for the target, it says that the source address is 0.0.0.0:1973... that is not a valid IP address if I remember correctly? Destination IP correctly points to the portal, but the source looks a little funky... thing is, I cannot see any way to change that.

polysulfide
Expert

0.0.0.0 isn't a valid IP, and I'm not sure why it would say that. Computers are often mysterious ;) It shouldn't matter as long as you can see the LUNs.

Are you binding your VCB proxy host to the LUNs on your iSCSI target?

Make sure you've used diskpart to disable automount (VCB docs or Google) on your VCB Proxy host or you can corrupt your VMFS volumes.

Make sure you're actually binding to the VMFS LUNs. (You should be able to see disks with unknown partition types in Disk Management or you're not connected to the LUNs)

Do you possibly have a max connections issue with your iSCSI target? (Have you added any ESX hosts since your last success?)

All else failing, try using -m nbd instead of -m san to clone the disks over the network rather than over the SAN while you get things resolved. (The "-m ndb" spelling in the log hint looks like a typo in the message itself.) With a software initiator you should get similar throughput to iSCSI that way anyway.
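To make switching between the two transport modes less error-prone, the command line can be assembled by a small helper. This Python sketch only rebuilds the invocation already posted earlier in this thread (-h, -u, -a ipaddr:, -r, -m, -L). The function name `vcb_mounter_args` is made up, and the mode names are an assumption based on my reading of the VCB documentation (`san` and `nbd`), so check them against `vcbMounter.exe -help` on your version.

```python
from typing import List

# Transport modes this sketch accepts; "nbd" is the network mode as I
# understand the VCB docs (assumption), "san" is the one used in the thread.
KNOWN_MODES = ("san", "nbd")


def vcb_mounter_args(host: str, user: str, vm_name: str, mount_dir: str,
                     mode: str = "san", log_level: int = 6) -> List[str]:
    """Build the vcbMounter argument list for a mount of one VM."""
    if mode not in KNOWN_MODES:
        raise ValueError(f"unknown transport mode: {mode!r}")
    return [
        "vcbMounter.exe",
        "-h", host,                   # VirtualCenter or ESX host to log in to
        "-u", user,                   # login user; the password is prompted for
        "-a", "ipaddr:" + vm_name,    # VM identified by its IP address or DNS name
        "-r", mount_dir,              # directory to mount the VM under
        "-m", mode,                   # transport mode
        "-L", str(log_level),         # log verbosity
    ]
```

The returned list can be passed to `subprocess.run` from the VCB framework directory, so falling back from SAN to network transport is a one-word change in the call.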






ekisner
Contributor

The plot thickens... no, it is not showing up in Disk Management. And automount was not disabled (guess I should have gone through the documentation better, eh?), however it looks like all of the VMFS volumes are unharmed.

I'll spend a bit of time trying to get those disks to show up in Disk Management. No reason for me to ask anyone here to help me with that one. I never knew that the disks were supposed to show up in the disk manager... makes sense now, but doh.

I'll let you know.

ekisner
Contributor

I have the volumes showing up in the disk manager now (I haven't a clue how I did it either... funny how that works eh?). The VCB proxy still ends up doing the same thing, but it seems to take a whole lot longer now (talking about the snapshot, deletion of said snapshot, and then releasing the disk lease).

I am somewhat interested in these lines of the log though... could MPIO stand for multipath IO? If I remember right, VCB does not support MPIO...?

[2008-06-26 10:45:07.097 'App' 1676 trivia] Evaluating 1 paths.
[2008-06-26 10:45:07.097 'App' 1676 trivia] Trying to open path \\?\mpio#disk&ven_hp&prod_msa1510i_volume&rev_1.32#1&7f6ac24&0&36303538423330393542393230413030394530393844303134#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}.
[2008-06-26 10:45:07.097 'App' 1676 info] Now using Path \\?\mpio#disk&ven_hp&prod_msa1510i_volume&rev_1.32#1&7f6ac24&0&36303538423330393542393230413030394530393844303134#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}.

polysulfide
Expert

MPIO probably isn't the issue, or you'd see a related error. I imagine that's just the full path enumeration of the LUN, and you only have one path anyway.

I would check if your LUNs are still visible in Disk Management directly after the failure. Are you using iSCSI for anything else? This could be a tuning or packet loss issue / bad IP connection.

Is your iSCSI on its own subnet? Does your host have a dedicated adapter on that subnet? Does your switch support jumbo frames? Have you adjusted the TCP window size on the VCB proxy itself?

What you want: dedicated subnet, dedicated adapter, large MTU (9999), large TCP window size.

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\[Adapter ID]\TcpWindowSize

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\[Adapter ID]\MTU

Try quadrupling the default TcpWindowSize and setting the MTU to 9999 (decimal). You'll need to reboot.
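If you'd rather not hand-edit the registry, a .reg file can be generated and reviewed first. The Python sketch below just renders the text of such a file. It assumes both values are REG_DWORDs under the per-interface Tcpip key, that you look up the adapter's interface ID yourself, and `reg_fragment` is a made-up helper name. The numbers passed in are whatever you decide on; nothing here picks them for you.

```python
# Per-interface TCP/IP parameters key (assumption: both values are set here).
INTERFACES_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services"
    r"\Tcpip\Parameters\Interfaces"
)


def reg_fragment(interface_id: str, mtu: int, tcp_window_size: int) -> str:
    """Render .reg file text that sets MTU and TcpWindowSize for one interface."""
    for name, value in (("MTU", mtu), ("TcpWindowSize", tcp_window_size)):
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"{name} does not fit in a REG_DWORD: {value}")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[" + INTERFACES_KEY + "\\" + interface_id + "]",
        # .reg files store DWORDs as eight hex digits, so decimal input is converted.
        '"MTU"=dword:{:08x}'.format(mtu),
        '"TcpWindowSize"=dword:{:08x}'.format(tcp_window_size),
        "",
    ]
    return "\r\n".join(lines)
```

Write the result to a file, read it over, and import it on the proxy only once you are happy with it. The conversion matters because the advice above is given in decimal while .reg files expect hex.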






ekisner
Contributor

I wrote a batch file to simplify my testing... included right after the mount attempt is the line

(ECHO list disk & ECHO exit) | diskpart

Before the mount attempt, and after, the disks show up the same. VERY frustrating! :(

I think, as a last attempt, I'm going to try and reboot the SAN. Gonna have to wait until 6pm my time to do it though >.<

ekisner
Contributor

So this is fun... I'm on the phone right now with tech support, because although the LUNs are showing up in ESX, none of the VMFS volumes are.

And the tech just tried to format about $500,000 worth of data because the Add Storage wizard said the LUN was empty.

>.<

polysulfide
Expert

Did you tell them that you connected a VCB proxy to them without disabling automount?






ekisner
Contributor

No... I think that may have been smart to tell them, as here is what happened (after 3 HP technicians and 2 VMware storage specialists). We found out that the partition tables were gone. Kaput. Because ESX had the partition tables cached, it was still working... however when I rebooted, the cached tables were gone. Thus it was able to see the LUNs, but not the storage inside.

And unfortunately, because one of the LUNs was 2TB in size, the fdisk utility wasn't able to fix it... the second VMware tech ended up needing to use partedUtil, or a program with a similar name, to manually rebuild the partition table.

And it only took 17 hours of overtime! Yehaw >.<

polysulfide
Expert

Is your VCB working now? ;)

Go to the VMworld site and read the Advanced VMFS presentation. It'll tell you how to back up your partition tables. I think there's a VCB one also that outlines what to do if this happens to you.






ekisner
Contributor

Truth be told, I'm afraid to try VCB again hahah.

I'll go read the VMworld stuff and back up my partition tables before I make any more attempts.

polysulfide
Expert

That's always a good practice. As long as you've got automount off, you shouldn't have any more corruption issues with VCB. It works great.
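For reference, turning automount off on the proxy comes down to two diskpart commands, `automount disable` followed by `automount scrub`. The Python sketch below writes them to a script file and builds the `diskpart /s` command line. It is only an outline: `build_automount_command` is an invented name, it must run on the Windows proxy from an administrator account, and nothing is executed unless you pass `run=True`.

```python
import subprocess
import tempfile
from typing import List

# diskpart script: stop Windows from auto-mounting new volumes, then clean up
# mount-point entries left over from volumes that were mounted earlier.
DISKPART_SCRIPT = "automount disable\nautomount scrub\nexit\n"


def build_automount_command(run: bool = False) -> List[str]:
    """Write the diskpart script to a temp file and return the command to run it."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write(DISKPART_SCRIPT)
        script_path = handle.name
    command = ["diskpart", "/s", script_path]
    if run:
        # Only meaningful on the Windows VCB proxy; raises if diskpart fails.
        subprocess.run(command, check=True)
    return command
```

Do this before the proxy is ever presented with the VMFS LUNs, and reboot the proxy afterwards so the setting is in effect when the initiator logs in.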





