GlenB
Contributor

Could not power on VM: no swap file

My ESXi 3.5 machine normally runs 8-10 VMs (Win2k3 and WinXP). At the moment, 5 of them cannot power on. They seem to start and then fail with "Could not power on VM: no swap file". I had a look with the Datastore Browser. It's a small installation, so the vswp files ought to be in the same directory as the vmx file (I did not intentionally put them anywhere else). Of course I don't see a vswp file there because the machine is not running. I don't know enough about the vmx file structure to tell whether anything is wrong in the specifications. I have downloaded one of the vmx files and attached it here. Please either tell me what to change in that vmx file, or suggest another approach to get the machines to start.



Regards - Glen

Chamon
Commander

SSH for root is disabled by default. You will need direct console access to enable root SSH.

GlenB
Contributor

OK, discovered how to log in as root using the unsupported option. Attached is the ls -lh output for the Web machine directory. There was also an error message that appeared on the console but not in the output when I piped it into a log file (also attached). The complete output was:

# ls -lh
ls: ./Web-93b0ea0b.vswp: No such file or directory
-rw------- 1 root root 2.0G Feb 12 03:14 Web-000001-delta.vmdk
-rw------- 1 root root 241 Feb 12 02:41 Web-000001.vmdk
-rw------- 1 root root 261.1M Oct 12 17:54 Web-Snapshot1.vmsn
-rw------- 1 root root 8.0G Oct 12 17:53 Web-flat.vmdk
-rw------- 1 root root 8.5k Feb 12 03:14 Web.nvram
-rw------- 1 root root 396 Oct 12 03:53 Web.vmdk
-rw------- 1 root root 482 Oct 12 17:54 Web.vmsd
-rw------- 1 root root 2.4k Feb 12 16:13 Web.vmx
-rw------- 1 root root 258 Feb 12 03:19 Web.vmxf
-rw------- 1 root root 2.4G Feb 12 03:14 Web_1-000001-delta.vmdk
-rw------- 1 root root 245 Feb 12 02:41 Web_1-000001.vmdk
-rw------- 1 root root 8.0G Oct 12 17:52 Web_1-flat.vmdk
-rw------- 1 root root 398 Oct 12 03:53 Web_1.vmdk
-rw-r--r-- 1 root root 0 Feb 12 17:45 ls.log
-rw-r--r-- 1 root root 15.7k Feb 12 04:00 vmware-45.log
-rw-r--r-- 1 root root 15.7k Feb 12 04:48 vmware-46.log
-rw-r--r-- 1 root root 15.7k Feb 12 05:25 vmware-47.log
-rw-r--r-- 1 root root 15.7k Feb 12 05:33 vmware-48.log
-rw-r--r-- 1 root root 15.7k Feb 12 05:34 vmware-49.log
-rw-r--r-- 1 root root 15.7k Feb 12 05:36 vmware-50.log
-rw-r--r-- 1 root root 15.6k Feb 12 16:13 vmware.log


I'm thinking that the error about not being able to find the vswp file is significant ... but what was it that it ran into that made it expect to find that?
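One plausible reading of that error: ls first reads the list of names stored in the directory and then looks up the metadata for each name, so a name that is still in the directory listing but whose metadata lookup fails produces exactly this kind of "No such file or directory" message. The following is a minimal sketch of that two-step check, assuming an environment where Python is available (nothing in this thread suggests it is on the ESXi 3.5 console). The function name and the `stat_fn` parameter are illustrative, not part of any VMware tool.

```python
import os


def find_unreadable_entries(directory, stat_fn=os.lstat):
    """Return names that appear in the directory listing but cannot be stat'ed.

    stat_fn is a hypothetical hook so the lookup step can be swapped;
    by default the entry itself is examined (os.lstat).
    """
    broken = []
    for name in sorted(os.listdir(directory)):   # step 1: read the names
        try:
            stat_fn(os.path.join(directory, name))  # step 2: look up metadata
        except OSError:
            broken.append(name)                  # listed, but lookup failed
    return broken
```

Called on a VM's directory, an empty result would mean every listed name can be looked up; any name returned is one that behaves like the phantom vswp entry described here.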



Regards - Glen

GlenB
Contributor

DELETE (((I have direct console access - how do I enable SSH?)))

OK, found out the answer at http://www.vm-help.com/esx/esx3i/ESXi_enable_SSH.php

and now I have SSH working. Attached file is the output from ls -lh



Regards - Glen

GlenB
Contributor

Rubeck -- I think I see the problem but I don't know how to fix it --- there is a "phantom" vswp file. These are the 3 situations:

1 - Using the VC Datastore Browser on the messed up VM I do not see any vswp file at all

2 - using HTTP to the host machine and browsing the data volume I see:

DC2-93b09a46.vswp modified 31-Dec-1969 @ 23:59 size = -1

3 - using Telnet SSH to the host machine and running ls -lh on the messed up VM's directory it generates the error:

ls: ./DC2-93b094a6.vswp: No such file or directory

So is that vswp file really there or not? Using Telnet SSH I cannot delete it or copy something to that name or list it. I'm pretty sure that if I could delete all references to that in the directory then the VM would power on OK.

PS - I found an earlier post of yours that suggested making the allocated and reserved memory the same so that no vswp was required. I tried that, but it didn't help in my case. I still think it's all the fault of that phantom file. Got any ideas how to get rid of it? I'm going to try a few things ....



Regards - Glen

Chamon
Commander

What patch level are you on?

DSTAVERT
Immortal

I would be very, very careful about "just trying stuff" until you really understand the problem. How much free space do you have in the datastore? You have several snapshots. The machines that you recreated and pointed at the existing disks will now be without their snapshots, if there were any. It is very common to encounter the problem you describe when you run out of storage space.

-- David -- VMware Communities Moderator

Chamon
Commander

I believe in an earlier post he showed a lot of available space. I could be wrong; I am on iPhone email right now. The reason I asked about patch level was the modified date of the vswp file in his previous post: 1969. What was the U2 for 3.5 issue? Something with the date and powering on VMs.

Chamon
Commander

That was an expiring licence issue. Still want to know about the date on that file.

GlenB
Contributor

You're right, Chamon. Lots of space as noted in an earlier post, thanks.

This is ESXi 3.5.0 build 153875. I always have trouble relating a build number to an update number with VMware.

Regards - Glen

GlenB
Contributor

As I noted, a couple of things odd about that file:

- it can't be found but it's there

- the date is 1 second before date zero (as set by many Y2K solutions)

- the size is -1, that's MINUS ONE, byte

Question #1 - how do I delete that file so that it does not confuse the power-on sequence?

Question #2 - how did that file get so messed up (so I can avoid doing it next time)?
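On the odd date and size: a value of -1 is what many lookup calls return on failure, and a timestamp of -1 seconds is one second before the Unix epoch (1 January 1970, 00:00:00 UTC). So both oddities are consistent with a listing that printed failed-lookup values rather than real metadata. That is an interpretation, not something confirmed in this thread. A small sketch of the date arithmetic, assuming the listing shows UTC (the helper name is made up for illustration):

```python
from datetime import datetime, timedelta, timezone

# Unix time counts seconds from this instant.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def describe_timestamp(seconds):
    """Format a Unix timestamp (seconds since the epoch) as a UTC date string."""
    return (EPOCH + timedelta(seconds=seconds)).strftime("%d-%b-%Y %H:%M:%S")


print(describe_timestamp(-1))  # 31-Dec-1969 23:59:59
```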



Regards - Glen

Chamon
Commander

Can you change the location of the swap file to local storage? These are not production VMs, correct?

GlenB
Contributor

These are not Production VMs in the sense that I have no other users. This is my own computing facility, but I am trying to do work here, not spend my time maintaining and troubleshooting the machines. That said, I can change anything I want anytime I want. And at the moment, almost all of my services are dead in the water (Web, SQL, Mail, Desktop) so it is of some urgency.

Most of the damaged machines are set to have the swap file in the "default" location which is, I believe, the same directory as the vmx file. I would have called that "local storage". Were you implying something else?

HOWEVER - I think my earlier note about the "phantom" vswp files might be at the root of this, and I'd really appreciate some help trying to figure out how to get rid of them and see if the VM will then again power on correctly.



Regards - Glen

Erik_Zandboer
Expert

I might have missed it, but are you sure you have enough space left on the disk your VMs are running from? When a LUN fills up, you get this apparently strange behaviour where VMs cannot restart. When you add a memory change to a VM, even less space becomes available on the LUN, and other VMs will suddenly fail to start...



Visit my blog at http://www.vmdamentals.com

GlenB
Contributor

Quoted from post #2 in this thread:

"The host machine has 8 Gb and the total of all the VMs allocated memory is 5256 Mb. The datastore is 1.81 Tb and the free space is 1.08 Tb."



Regards - Glen
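To put the quoted figures in perspective: a VM's swap file is sized as its configured memory minus its memory reservation, so with no reservations at all the swap files for every VM together would need at most the 5256 MB of total allocated memory. A rough check against the reported free space, assuming the 1.08 TB figure is in binary units (the function name is illustrative):

```python
def vswp_size_mb(configured_mb, reservation_mb=0):
    """Swap file size for one VM: configured memory minus memory reservation."""
    return max(configured_mb - reservation_mb, 0)


TOTAL_VM_MEMORY_MB = 5256            # total allocated memory quoted in the thread
FREE_SPACE_MB = 1.08 * 1024 * 1024   # 1.08 TB free, assumed to be binary units

# Worst case: no VM has any reservation, so swap equals allocated memory.
worst_case_swap_mb = vswp_size_mb(TOTAL_VM_MEMORY_MB)
enough_space = worst_case_swap_mb < FREE_SPACE_MB
print(enough_space)  # True
```

On these numbers the worst-case swap demand is well under 1% of the free space, which supports the view that a full datastore is not the cause here.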

GlenB
Contributor

I think I have an approach that works to get these VMs running again, but it still does not answer the question - why did they get messed up in the first place?

In broad steps, here's what I do to get rid of that phantom vswp file:

1 - remove the VM from the inventory in VirtualCenter

2 - open a Telnet SSH connection to the host

3 - rename the VM's directory ... if it was XX then mv XX XX-original

4 - mkdir XX

5 - cp XX-original/* XX (and the key thing here is that the phantom vswp file reports an error and does NOT get copied)

6 - edit the XX.vmx file to remove the sched.swap.derivedName line that pointed to the phantom vswp file

7 - in the Datastore Browser right click on the XX.vmx file and Add to Inventory

This has worked 4 out of 6 times and the other 2 are messed up for other reasons so I rebuilt them from my template (but that's another story ....)
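Steps 3 to 6 above can be sketched as follows. This is an illustration of the logic only, assuming an environment where Python is available; on the host itself the same thing was done by hand with mv, mkdir, cp and an editor. Both function names are made up for this sketch, and steps 1 and 7 (removing the VM from the inventory and adding it back) are not covered.

```python
import os
import shutil


def rebuild_vm_directory(vm_dir):
    """Steps 3-5: rename vm_dir, recreate it, and copy every entry that can be read.

    Returns the names that could not be copied (the phantom entries).
    """
    original = vm_dir + "-original"
    os.rename(vm_dir, original)          # step 3: mv XX XX-original
    os.mkdir(vm_dir)                     # step 4: mkdir XX
    skipped = []
    for name in sorted(os.listdir(original)):
        try:                             # step 5: cp XX-original/* XX
            shutil.copy2(os.path.join(original, name),
                         os.path.join(vm_dir, name))
        except OSError:
            skipped.append(name)         # unreadable entry is left behind
    return skipped


def strip_swap_reference(vmx_path):
    """Step 6: drop any sched.swap.derivedName line from the .vmx file.

    Returns the number of lines removed.
    """
    with open(vmx_path) as f:
        lines = f.readlines()
    kept = [line for line in lines
            if not line.strip().lower().startswith("sched.swap.derivedname")]
    with open(vmx_path, "w") as f:
        f.writelines(kept)
    return len(lines) - len(kept)
```

Note that the copy duplicates the virtual disks as well, so it needs enough free space and time for files of several GB each; the XX-original directory is kept as a fallback.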



Regards - Glen

GlenB
Contributor

See last post for details of remedy. Still wish I knew why / how it got messed up in the first place.

Regards - Glen

danm66
Expert

Did something happen that caused all of these VMs to be shut down? What do you use for storage? NFS?

GlenB
Contributor

All the machines use the local drives - it's a single host with a SATA RAID array in the box. So, loss of power to the box might do it, but it seems unlikely that many machines would be in such a fragile state that a power loss would make 6 of 8 VMs unstartable - that just sounds like bad software design. It happened just after Patch Tuesday (Microsoft's monthly event) so many or all of the VMs could have needed reboots, but that shouldn’t have affected the host's file system.

You're making me think I had better check the procedures being used for stopping and starting the VMs and the host too, just in case we're doing something wrong there. Do you know of a reference to the documentation that describes that?

Regards - GlenB

Erik_Zandboer
Expert

As you say, not likely. I have lost power to my testing rig at least a hundred times and never had VMs fail to boot afterwards. There can be one problem, and that would be enabling write cache without a cache battery attached... But even then, losing that many VMs seems unlikely...

Visit my blog at http://www.vmdamentals.com

danm66
Expert

There are 2 kinds of locking that can prevent the file from being opened: a process lock, where you have to kill the process to clear the lock, and a heartbeat region that is written to the disk. If the heartbeat slot(s) got corrupted, then the only way to clear it is to re-format the datastore or use a binary editor to rewrite it. I don't know of any published ways to edit the heartbeat slots, so if that's the issue, you'll need to get help from support to delete those swap files. Still, it would be highly selective for it to corrupt only the vswp files.

In terms of shutting down/rebooting a host, you should shut down all VMs, enter maintenance mode (not necessary, but a good way to check that step 1 is complete), then shut down/reboot using the client GUI.
