VMware Cloud Community
Seventh77
Enthusiast

VM showing up as invalid - failed to find a host for powering on

Hi folks.

I have a good-sized ESX setup with about 20 hosts and Enterprise Plus licensing. I recently powered my entire setup down and back up, and all but a handful of VMs came back cleanly. About 5 of my virtual machines came up as "Invalid", and if I try to power them on, I get the error "Failed to find a host for powering on the virtual machine".

Here's what I've tried. I'm working with just one of them now, assuming that once I find the magic bullet for the first one, the rest will be the same.

- Simply removing the VM from inventory and re-adding it (no change)

- Removing it, and adding it to a different cluster (no change)

- Removing it, adding it to a different cluster and different resource pool (no change)

Other VMs on the same datastore, on the same host and in the same resource pool came up with no issues. In the vmware.log file in the VM's datastore folder, the last thing I see is the VM shutting down before my power cycle.

I've checked my alarms/settings on my actual hosts, and as far as I can tell it's not an HA issue, but I am definitely in over my head here. My current failover capacity is 9 hosts, and configured failover capacity is set to one host.

The VM itself has a simple config with 2 vCPUs and 4 GB of RAM, and the cluster I'm trying to start it on has 10 hosts, 80 CPUs and 600 GB of RAM.

I'm not sure what to try next. I can't edit the VM's settings after adding it; the option simply isn't there. (All I have is a Power On button, basically.)

Any insight would really be appreciated. I'm a relative novice and I'm trying to troubleshoot a production environment as best I can without breaking anything.

Thanks!

Accepted Solutions
a_p_
Leadership

If the uploaded .vmx still shows up as type "File", you may need to log in to the console (or connect through SSH), change to the VM's folder, and set the .vmx file's permissions to "-rwxr-xr-x" by running:

chmod 755 CAL-GW-ARCSIGHT1.vmx
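For reference, mode 755 grants read, write and execute to the owner and read and execute to group and others, which is exactly the "-rwxr-xr-x" string above. A minimal Python sketch of the same operation, assuming a POSIX system where the datastore files are visible as ordinary files; the path you pass in is your own (hypothetical here):

```python
import os
import stat


def set_vmx_mode(path: str, mode: int = 0o755) -> str:
    """Apply `mode` to `path` (the same as `chmod 755 path`) and return the
    resulting permission string in `ls -l` notation.

    Assumes a POSIX filesystem; `path` is whatever .vmx you point it at.
    """
    os.chmod(path, mode)
    return stat.filemode(os.stat(path).st_mode)


# Octal 755 rendered for a regular file, without touching any real file.
print(stat.filemode(stat.S_IFREG | 0o755))  # -rwxr-xr-x
```

Calling `set_vmx_mode("CAL-GW-ARCSIGHT1.vmx")` from inside the VM's folder should return "-rwxr-xr-x", the same result as the `chmod` command.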

André

a_p_
Leadership

So no "Edit Settings" for the VM!? Interesting. Can you please post (attach) the VM's .vmx file?

Maybe there is something wrong in the configuration, although I wonder why this did not happen before the shutdown.

See e.g. http://kb.vmware.com/kb/1001637

André

Seventh77
Enthusiast

Thanks Andre.

Upon further digging (and I honestly should have noticed this initially), the .vmx file for the one I'm working on first is 0 KB. Unfortunately, there are so many VMs on this cluster that I haven't the slightest idea what this particular VM's original settings were. I've been looking through the logs and trying to pull the settings out of them by following this post:

http://www.techhead.co.uk/vmware-esx-how-to-easily-recreate-a-missing-or-corrupt-vmx-file

However, this VM runs Red Hat 5, and while that approach does get it into a state where I can power it on, the kernel panics immediately.
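Since the other "Invalid" VMs may have the same problem, one quick check is to look for zero-byte .vmx files across the datastore. A minimal Python sketch, assuming the datastore (or a copy of it) can be read as an ordinary directory tree; the root path is hypothetical:

```python
import os
from typing import List


def find_empty_vmx(root: str) -> List[str]:
    """Return the paths of all zero-byte .vmx files below `root`.

    Assumes `root` is a datastore (or a copy of one) readable as a normal
    directory tree, e.g. a hypothetical /vmfs/volumes/<datastore> path.
    """
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Match ".vmx" exactly; ".vmxf", ".vmsd" and ".vmxold" are skipped.
            if name.lower().endswith(".vmx"):
                path = os.path.join(dirpath, name)
                if os.path.getsize(path) == 0:
                    empty.append(path)
    return sorted(empty)
```

For example, `for p in find_empty_vmx("/vmfs/volumes/datastore1"): print(p)` would list every empty .vmx on a datastore named (hypothetically) datastore1.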

a_p_
Leadership

Do you still have a vmware.log file from the working VM? All of the settings for the .vmx file can easily be derived from the log file.
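The idea behind deriving the .vmx from the log is that vmware.log dumps the VM's configuration when the VM starts. A minimal Python sketch, assuming the configuration section begins with a "DICT --- CONFIGURATION" header, lists one "DICT key = value" line per setting, and ends at the next "DICT ---" header. The exact layout differs between ESX versions, so treat this as an illustration rather than a replacement for the scripts linked in this thread, and compare its output against your own log:

```python
import re
from typing import Dict, Iterable

# Assumed log layout: "... DICT --- CONFIGURATION <path>" opens the section,
# "... DICT <key> = <value>" lines hold settings, the next "DICT ---" ends it.
SECTION_RE = re.compile(r"DICT\s+---\s+(\S+)")
ENTRY_RE = re.compile(r"DICT\s+(\S+)\s*=\s*(.*)$")


def config_from_log(lines: Iterable[str]) -> Dict[str, str]:
    """Collect key/value pairs from the CONFIGURATION section of a vmware.log."""
    settings: Dict[str, str] = {}
    in_config = False
    for line in lines:
        header = SECTION_RE.search(line)
        if header:
            in_config = header.group(1) == "CONFIGURATION"
            continue
        if not in_config:
            continue
        entry = ENTRY_RE.search(line)
        if entry:
            key, value = entry.group(1), entry.group(2).strip()
            # Values may or may not be quoted in the log; store them bare.
            if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
                value = value[1:-1]
            settings[key] = value
    return settings


def to_vmx(settings: Dict[str, str]) -> str:
    """Render settings in .vmx syntax: one `key = "value"` line per setting."""
    return "".join(f'{key} = "{value}"\n' for key, value in settings.items())
```

Usage would be along the lines of `print(to_vmx(config_from_log(open("vmware.log"))))`; values containing special characters may need escaping that this sketch does not attempt.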

André

Seventh77
Enthusiast

I do have the log, yes. It's huge, though - I assume that's normal?

(Definitely appreciate the replies, by the way!)

Edit: I ran the script from this page against the log file:

http://www.vi-toolkit.com/wiki/index.php/Recover_vmx_from_log_file

That seemed to create the .vmx, but I can't tell yet whether it's correct: when I upload this .vmx back into the datastore, it shows a type of "File" instead of "Virtual Machine", so I can't add it back into inventory. Hm...

a_p_
Leadership

We need to be careful here: the VM has active snapshots, and if you followed the first link you posted, you probably started it from the base virtual disk. In that case the snapshot chain might now be broken and would have to be repaired.
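To see whether the chain is still intact, note that each snapshot descriptor (the small text file named like vm-000001.vmdk) records its parent in `parentFileNameHint` and the parent's content ID in `parentCID`. A child whose `parentCID` no longer matches its parent's `CID` is the usual sign of a broken chain. A minimal Python sketch, assuming the descriptors can be read as text; the folder and file names are hypothetical, and it only reports, it repairs nothing:

```python
import os
import re
from typing import Dict, List, Optional, Tuple

# CID, parentCID and parentFileNameHint are standard descriptor fields;
# the folder and starting file name are supplied by the caller (hypothetical).
KEY_RE = re.compile(r'^\s*(CID|parentCID|parentFileNameHint)\s*=\s*"?([^"\n]*?)"?\s*$')


def read_descriptor(path: str) -> Dict[str, str]:
    """Return the CID, parentCID and parentFileNameHint fields of a descriptor."""
    fields: Dict[str, str] = {}
    with open(path, "r", errors="replace") as fh:
        for line in fh:
            m = KEY_RE.match(line)
            if m:
                fields[m.group(1)] = m.group(2)
    return fields


def walk_chain(folder: str, top: str) -> List[Tuple[str, bool]]:
    """Follow parentFileNameHint from `top` down to the base disk.

    Returns (descriptor, cid_ok) pairs; cid_ok is False when the child's
    parentCID does not match this disk's CID.
    """
    chain: List[Tuple[str, bool]] = []
    seen = set()
    current: Optional[str] = top
    expected_cid: Optional[str] = None
    while current and current not in seen:
        seen.add(current)
        fields = read_descriptor(os.path.join(folder, current))
        # The top disk has no child to compare against, so it counts as OK.
        cid_ok = expected_cid is None or fields.get("CID") == expected_cid
        chain.append((current, cid_ok))
        expected_cid = fields.get("parentCID")
        current = fields.get("parentFileNameHint") or None
    return chain
```

Any `False` in the result marks the link where the chain no longer matches, which is the information needed before attempting a repair.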

Please post a list of all the files in the VM's folder with full details: names, extensions, sizes and time stamps.
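From the console, `ls -l` in the VM's folder gives exactly this listing. If you only have the files mounted somewhere Python can reach, a minimal sketch that produces the same kind of listing might look like this (the folder path is hypothetical):

```python
import os
import time
from typing import List, Tuple


def list_folder(folder: str) -> List[Tuple[str, int, str]]:
    """Return (name, size in bytes, modified time) for each file in `folder`,
    sorted by name. `folder` is a hypothetical path to the VM's directory."""
    rows: List[Tuple[str, int, str]] = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            info = os.stat(path)
            stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(info.st_mtime))
            rows.append((name, info.st_size, stamp))
    return rows


def format_rows(rows: List[Tuple[str, int, str]]) -> str:
    """Lay the rows out like a simple `ls -l` listing: time, size, name."""
    return "\n".join(f"{stamp}  {size:>14}  {name}" for name, size, stamp in rows)
```

`print(format_rows(list_folder("/vmfs/volumes/datastore1/myvm")))` would print one line per file, ready to paste into a reply.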

André

Seventh77
Enthusiast

I'm not sure how to get a raw list, so hopefully this screenshot will do the trick.

Thank you for the .vmx file too. Yours behaves the same way as the one I generated with the Perl script (I can upload it just fine, but it shows up as just "File", not "Virtual Machine").

Did I mention that I really appreciate the help? Especially on a Sunday. Thank you again!

a_p_
Leadership

It looks like someone manually worked on this VM's configuration in November. According to the log file, only snapshots 6 and 2 and the base disk are in use!? Anyway, please rename the .vmxold file back to .vmx, then upload the .vmx file I attached to my previous post, right-click the .vmx file and select "Add to Inventory". If the VM is currently listed in the inventory, remove it from the inventory before uploading the .vmx file. The .vmx may show up as "File" because of its file attributes, which may not be correct unless you replace the existing .vmx file.
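Before adding the VM back, one sanity check is to see which .vmdk each disk entry in the .vmx points at: it should be the delta the VM was last running on according to vmware.log, not the base disk. A minimal Python sketch, assuming the usual `<slot>.fileName = "..."` lines and the six-digit snapshot suffix; a base disk that happens to be named with such a suffix would be misreported, and the chain itself is not checked:

```python
import re
from typing import Dict, Iterable

# Assumed .vmx layout: one `<slot>.fileName = "<disk>.vmdk"` line per disk,
# e.g. scsi0:0.fileName = "vm-000006.vmdk".
DISK_RE = re.compile(r'^\s*(\S+)\.fileName\s*=\s*"([^"]*\.vmdk)"', re.IGNORECASE)
# Snapshot delta descriptors normally carry a six-digit suffix: vm-000006.vmdk.
DELTA_RE = re.compile(r"-(\d{6})\.vmdk$", re.IGNORECASE)


def disks_in_vmx(lines: Iterable[str]) -> Dict[str, str]:
    """Map each disk slot to 'base' or 'snapshot NNNNNN' based on its file name."""
    result: Dict[str, str] = {}
    for line in lines:
        m = DISK_RE.match(line)
        if m:
            slot, disk = m.group(1), m.group(2)
            delta = DELTA_RE.search(disk)
            result[slot] = f"snapshot {delta.group(1)}" if delta else "base"
    return result
```

Running `disks_in_vmx(open("CAL-GW-ARCSIGHT1.vmx"))` and comparing the result against the disks named in vmware.log shows quickly whether the .vmx points at the base disk by mistake.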

If the VM comes up and works as expected, I'd recommend deleting the snapshots next using the "Delete All" button in the Snapshot Manager. Assuming you have at least 10-15 GB of free disk space on the datastore, this should work without issues. If, after deleting the snapshots, there are still some 00000x.vmdk files left in the datastore, report back so we can see how to get rid of them safely.
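Both checks mentioned here, free space before "Delete All" and leftover snapshot files afterwards, can be sketched in a few lines of Python. This assumes the datastore is mounted where Python can see it (the vSphere Client's datastore summary shows the same free-space figure); the 15 GiB default simply mirrors the upper end of the estimate above, and the paths are hypothetical:

```python
import os
import re
import shutil
from typing import List

# Snapshot descriptors (vm-000002.vmdk) and their data files (vm-000002-delta.vmdk).
DELTA_RE = re.compile(r"-\d{6}(-delta)?\.vmdk$", re.IGNORECASE)
GIB = 1024 ** 3


def has_room(datastore_path: str, needed_gib: float = 15) -> bool:
    """True if the filesystem holding `datastore_path` has `needed_gib` GiB free."""
    return shutil.disk_usage(datastore_path).free >= needed_gib * GIB


def leftover_deltas(vm_folder: str) -> List[str]:
    """Snapshot .vmdk files still present in the VM folder, sorted by name."""
    return sorted(n for n in os.listdir(vm_folder) if DELTA_RE.search(n))
```

Run `has_room(...)` before deleting the snapshots and `leftover_deltas(...)` afterwards; a non-empty list is the case worth reporting back about.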

André

Seventh77
Enthusiast

Thanks again.

I renamed the .vmxold file to .vmx, and it then showed up as a virtual machine (this is the old 0.0 KB file). However, when I upload your .vmx, it goes right back to being type "File". (Weird...)

I'm still digging around..

KubaLibre
Contributor

Hi to anyone who has this issue,

rebooting vCenter solved this problem for me.

In my case it happened because the machines that could not start were not set to power on automatically after a power loss.

In my setup, APC PowerChute shuts down and powers on the machines.
