VMware Cloud Community
ysflnm
Contributor

ESXi crash - cannot start VM's nor delete files from disk

Hi

I have an ESXi server which crashed during the weekend with pretty much no warning. Last I checked it had sufficient free space (10 gigs of 64), but apparently not.

When I try to power on a VM I get "Could not power on VM: No swap file". So I looked at this post: http://communities.vmware.com/thread/50436

The problem is that this doesn't work for me. I went to the "hidden console" and tried "ps auxfww | grep" followed by the name of my virtual machine (for example "WinXPProSP2") and then "kill -9". ps + grep just returns to a new console prompt (nothing found), and kill reports -9 as an unknown parameter. The same goes for -TERM.

Before all this I tried to go to the datastore and delete unused machines. This gives me the following error: "General fault caused by file". Most of the VMs I'm trying to delete were not mounted or in use by the server, so I don't know why the server would lock them. I tried going into the /vmfs/volumes/datastore-folder to delete the files manually with rm -r, but I get this error: "Unable to remove http://... Input/output error". I also tried rm -rf, with the same result. Even the log files can't be deleted.

Someone on another post suggested booting into linux or linux-up mode. I tried to find the GRUB or LILO config, but with no luck. I also don't know what to do once I get there, or how to reverse my changes so I can boot into VMware again. I also tried booting the server from a Gentoo minimal disk to mount the disk, but I'm unsure which partition to use, sda2 or sda4. Only these had partitions, but they were unknown to fdisk.

Is booting into linux(-up) the way to go, and most importantly how do I do it, or do you guys recommend something else?

Reinstalling is sadly not an option.

Thank you.

8 Replies
DSTAVERT
Immortal

To start please post the output from

fdisk -l

and

df -h

-- David -- VMware Communities Moderator
DSTAVERT
Immortal

Please describe "crashed": how did you find the machine, and what did you do afterwards (restarted, etc.)? I wouldn't do anything drastic or even try powering on until you have a better idea of what condition you are in.

-- David -- VMware Communities Moderator
ysflnm
Contributor

Crashed, as in it looks like the power in the building went off. Another server was turned off, but the ESXi server was on. A coworker told me he couldn't access it, so I went to the Infrastructure Client to turn them on again. I got the "no swap" error described above. I tried to turn the machines on again, and this time it worked for a couple of hours. Later the coworker reported the same problem, and now I can't power on any of the VMs, so I did what I described in my first post.

DSTAVERT
Immortal

Use the unsupported console: press ALT + F1, type unsupported (nothing will echo to the screen) and press Enter. Now type the root password.

type

uptime

It will tell you how long the server has been running. Did it really go off??

collect the output of

fdisk -l

and

df -h

-- David -- VMware Communities Moderator
ysflnm
Contributor

uptime is 19:09. Which makes sense since I rebooted it yesterday trying to mount the disk with Gentoo to remove the files.

df -h

Filesystem  Size    Used    Available  Use%  Mounted on
Unknown     188,3m  121,5m  66,8m      65%   /
Unknown     63,3g   63,3g   0          100%  /vmfs/volumes/49f322f1-cdb87870-dbee-0014221a7bfe
Unknown     539,8m  180m    359,8m     33%   /vmfs/volumes/a79407ec-71c546c0-1368-0fcab0ae7895
Unknown     47,8m   1,0k    47,8m      8%    /vmfs/volumes/104f587b-25da6104-1edc-048421ee622d
Unknown     47,8m   38,7m   9,1m       81%   /vmfs/volumes/aa5e9d78-7dbeca3b-3663-7ee79b924cc2
Unknown     4,0g    1,0g    3,0g       25%   /vmfs/volumes/49f322c7-2342ee58-7769-0014221a7bfe
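
The df -h listing above can be scanned for full volumes with a small filter. This is only a rough sketch: the function name full_filesystems and the 90% default are made up for illustration, it assumes a POSIX shell with awk (which may not be available on the ESXi console itself, so it may be easier to run on a workstation against pasted output), and it assumes mount points contain no spaces.

```shell
# Print the mount point of every filesystem whose Use% is at or above a limit.
# Reads df-style output on stdin: the 5th column is Use%, the 6th the mount point.
# full_filesystems is a hypothetical helper name, not a VMware or BusyBox tool.
full_filesystems() {
    threshold="${1:-90}"            # default limit of 90% is an arbitrary choice
    awk -v limit="$threshold" '
        NF >= 6 {
            pct = $5
            sub(/%$/, "", pct)      # drop a trailing percent sign if present
            # the header and any malformed rows have a non-numeric 5th column
            if (pct ~ /^[0-9]+$/ && pct + 0 >= limit + 0) print $6
        }
    '
}

# Usage: pipe live or saved df output through the filter, for example
#   df -h | full_filesystems 90
```

Run against the listing in this post, the only volume at or above 90% is the 63,3g datastore, which fits the "No swap file" error when powering on a VM.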

fdisk -l

Device                   Boot  Start  End    Blocks    ID  System
/dev/disks/vmhba1:0:0:1        5      751    763904    5   Extended
/dev/disks/vmhba1:0:0:2        751    4845   4193280   6   Fat16
/dev/disks/vmhba1:0:0:3        4844   69880  69595840  fb  vmfs
/dev/disks/vmhba1:0:0:4  *     1      4      4080      4   fat16

[...] to truncate the file, but with no luck.

DSTAVERT
Immortal

I would get a copy of the files off the server before anything else. As I said, the icons don't display as they should, and the xp-flat.vmdk file (or whatever your vmdk is called) is displayed. Normally the xp-flat.vmdk file is hidden and only xp.vmdk is shown. Not a good sign. I could easily be wrong.

If you get a copy of the files, you can try doing some things with them to see if you can make a VMware Player image, or just try mounting the disk. If the disk is OK, consider replacing the disk in the server and preserving the old one. Reinstall ESXi and then use Converter to copy the VM back to the server.

-- David -- VMware Communities Moderator
ysflnm
Contributor

I'm backing up the files as I write this and am going to reinstall the server. I'm copying the files AND exporting them as virtual appliances. Hopefully I can reimport them.

Which icons are you referring to?

I can view both the -flat and normal .vmdk-file. Nothing is hidden.

As I wrote, I'm 100% sure the disk is OK. I think this whole problem was caused by the disk being full and then the power failure in the building.

This still doesn't explain why I can't remove the files from the server. I tried stopping all the services and tried to find a process using the files via the ps auxfww command from the thread I linked to, but got empty results for all the VMs. So it doesn't look like a process was locking the files against deletion. That brings me back to not being able to delete a file even though the disk is full, which I don't understand. As I wrote, I even tried to truncate it, and I also tried vmkfstools to delete, shrink, and create an empty disk on top of the old one. Shrink is deprecated in v. 3.0.x, which I don't understand either (why would you remove this function?), and delete and create both failed due to lack of disk space.

DSTAVERT
Immortal

The -flat file should NOT display. It should be hidden.

-- David -- VMware Communities Moderator