VMware Cloud Community
dspent
Contributor

VM has crashed and it can't get up...

Let me explain the scenario....

We have a VM (Windows 2003 Server) that had 2 virtual disks: C:\ at 30GB and D:\ (size unknown, but I believe it was a 400GB disk).

On Friday, 3/14/2008, it was apparently in the process of being backed up by ESX Ranger when something happened (don't know what) that caused a massive number of delta files to be created (68 of them, of various sizes). On Monday, our resident systems architect came in early to find what appeared to be a hung, non-responsive VM. He could not do anything with it, so he rebooted the host... Once the host came back up, the VM in question would no longer start. It originally gave the following error:

This CPU is VT-capable, but VT is not enabled (check your BIOS settings). <----- this appears a few dozen times and then...

Cannot open the disk '/vmfs/volumes/47666f18-b53352a8-a394-0019b9f5a44c/ktboapdv01/ktboapdv01.vmdk' or one of the snapshot disks it depends on.

Reason: too many levels of redo logs

We called VMware support and, after tinkering around a bit, the support guy said we're hosed and there is no way to recover this server, thank you, have a good day, check our website for a better backup solution....

I refuse to believe that, with a folder that contains every file (vmdk, vmx, delta etc...), there is no way to at least recover the data. At this point I don't care about the server. If I can just get the data it would be like Christmas in March and I would be hailed as the new Santa Claus. Anyway, after trying to, as he put it, compile the snapshots, this is the error we are getting:

This CPU is VT-capable, but VT is not enabled (check your BIOS settings). <----- this appears a few dozen times and then...

Cannot open the disk '/vmfs/volumes/47666f18-b53352a8-a394-0019b9f5a44c/ktboapdv01/ktboapdv01.vmdk' or one of the snapshot disks it depends on.

Reason: Device or resource busy.

Can anyone help me here? Be advised that I am as newbie as they come with Linux and VMware, but I guess the best way to learn something is trial by fire.... And it's damn hot here...

Here's what I have tried so far:

1. Built a new VM and attached the vmdk file of the bad VM to it.... fail

2. Tried to copy all of the files from the current folder to a new folder so we could work without loss of data.... Only the even-numbered deltas etc. would copy; the odd-numbered ones give the "Device or resource busy" error

3. Tried to find the process that is locking things up.... not sure if I did that correctly, so I don't want to say it failed, but obviously I didn't find it....

4. Tried using one of the deltas to boot from... fail

5. Prayer...jury is still out on that one..

Any help is appreciated. I have attached a text file that shows what's in the folder now....

Thanks,

Clarence

1 Solution

Accepted Solutions
Rob_Bohmann1
Expert

I agree that the cpu message is misleading and most likely not your problem. The device or resource busy message about your vmdk/snapshot files is probably where the problem lies.

If you know what ESX host your VM was last running on (you should be able to see this in VC), then you can try this: http://communities.vmware.com/message/568371

# lsof | grep servername

That will list any open files related to your VM (files matching the value you put for servername above, where servername is the name of the file with the .vmdk extension).

If you get any positive responses from that command, then you want to kill those processes as shown in the link above. You may need to run the lsof command on all the hosts in the cluster to find the one that has a lock on the file, if indeed there is a lock on the file(s).
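A rough sketch of that lookup as a small shell function, in case it helps. It is only an illustration: the function name pids_for_vm is made up, and it assumes lsof prints its usual columns (COMMAND, PID, USER, ...) so that the PID is the second field. ktboapdv01 is the VM name from the error message earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: read lsof output on stdin and print the unique PIDs
# of every line that mentions the given VM name.
# Assumption: lsof's default column layout, where the PID is field 2.
pids_for_vm() {
    vm_name="$1"
    # -F treats the name as a fixed string, not a regular expression
    grep -F -- "$vm_name" | awk '{print $2}' | sort -u
}

# Typical use on the service console, as root:
#   lsof | pids_for_vm ktboapdv01
# Look at what each PID is before doing anything with it, then:
#   kill <pid>
```

The function only reports PIDs; deciding which of them is a stale process that is safe to kill is still a manual step, and it has to be repeated on each host that might hold the lock.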

If snapshot manager does not help you resolve it, then try this. Good luck. If neither snapshot manager nor what I outlined above helps, I would try to get VMware on the horn again and see if you can escalate the case to someone who can help. Try to call when the people in Cork are on the line (very early U.S. time); I have had very good service from them.

PS if you haven't already, take a look at the vmware.log files to see if there are any clues there too
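One way to skim those logs quickly is to search them for the error strings quoted earlier in this thread. This is only a sketch: scan_vm_logs is a made-up name, the search terms are just the ones from the messages above, and it assumes the logs sit in the VM's folder with names starting with vmware and ending in .log.

```shell
#!/bin/sh
# Hypothetical helper: print matching lines (with file name and line number)
# from the vmware*.log files in a VM's directory.
# The patterns are taken from the errors quoted in this thread; extend as needed.
scan_vm_logs() {
    dir="$1"
    grep -H -n -i -E 'cannot open the disk|redo log|device or resource busy' \
        "$dir"/vmware*.log
}

# Typical use, with the VM folder from the error message:
#   scan_vm_logs /vmfs/volumes/47666f18-b53352a8-a394-0019b9f5a44c/ktboapdv01
```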

5 Replies
opbz
Hot Shot

No idea what you did, but that file listing looks REALLY nasty!!

Just to clear up one thing: that VT error message you are getting is irrelevant. It refers to a setting in the BIOS of your server; under the CPU options you will find the VT settings, and you can enable it from there. That will get rid of the message.

Do you have any idea what caused all these snapshots? Did you try using VCB, or what?

What happens if you go to snapshot manager? Does it list all your snapshots? Are you able to select Delete All? That will consolidate your disk, but you will also need a lot of free space. You can get rid of some of the stuff you have, like the vmware.log files...

You can also look at all the smaller vmdk files; you will see that each points to its source file. Ensure those files are there.
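To make that last check less tedious with 68 deltas, here is a rough sketch of a script for the service console. It is not from the thread and rests on assumptions: that the smaller .vmdk files are plain-text descriptors naming their parent in a parentFileNameHint line, and that the -delta and -flat files are data extents to skip. The function names parent_of and check_parents are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the parent named in a VMDK descriptor file.
# Prints nothing for a base disk (no parentFileNameHint line).
parent_of() {
    sed -n 's/^parentFileNameHint="\(.*\)"[[:space:]]*$/\1/p' "$1"
}

# Hypothetical helper: for each descriptor in a directory, report whether
# the parent it points to actually exists.
check_parents() {
    dir="$1"
    for f in "$dir"/*.vmdk; do
        # Skip the large data files; only the small descriptors are text
        case "$f" in *-delta.vmdk|*-flat.vmdk) continue ;; esac
        [ -f "$f" ] || continue
        parent=$(parent_of "$f")
        [ -n "$parent" ] || continue
        # A hint may be an absolute path or a name relative to the VM folder
        case "$parent" in
            /*) target="$parent" ;;
            *)  target="$dir/$parent" ;;
        esac
        if [ -e "$target" ]; then
            echo "OK      $(basename "$f") -> $parent"
        else
            echo "MISSING $(basename "$f") -> $parent"
        fi
    done
}

# Typical use:
#   check_parents /vmfs/volumes/47666f18-b53352a8-a394-0019b9f5a44c/ktboapdv01
```

Any MISSING line marks a break in the snapshot chain. The script only reads files, so it should be safe to run before attempting anything else.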

dspent
Contributor

When I first talked with VMware support there were 2 snapshots... One was Consolidated Helper and below that was one that said ESX Ranger xxxx (where xxxx is something I have forgotten). We tried deleting the snapshots, or I should say reverting to the parent. In the VI Client it appeared to do that, but the server would not start and now there are no snapshots showing in snapshot manager.

As I understand it, ESX Ranger backup software caused the snapshots. I have an open ticket with them but no luck yet as to how to resolve the problem. I don't know what VCB is or how to use it... Please explain.

I will look at the smaller files while I await your response....

Thanks.

dspent
Contributor

Rob...

You are the genius of the day.... The server is back up and running although I have no idea what happened to the second disk... I'll take what I can get for now....I talked with the developers a second ago and they only need the C drive back anyway.

Thanks 1000%, you have made my weekend!!!!!!!!!!!!! If ever comes a day in this small world where we should cross paths, I owe you one.

dspent
Contributor

Now that the server is back up, the developers who use it have noticed that it appears to be using files from a month ago.... How can I get it to go back to on or around the day it crashed?
