Hello all!
I have looked into the other threads regarding invalid machines in Infrastructure Client, but none of them seems to apply to my problem.
3 x ESX 3.5 U1/Virtual Center 2.5 U1
When you look in the Infrastructure Client, the VM (running W2k3 + BlackBerry Enterprise Server) is greyed out and marked as "invalid". If I connect directly to the host (with PowerShell) it says the machine is not running. But it is running and responding.
I have restarted services on all ESX servers, re-registered the VM and restarted the VM. No luck. I cannot see anything weird in the .vmx, but I attached it in case I missed something.
Please advise, thank you!
Best regards
Björn Johansson
I had a similar issue yesterday - I removed the VM from the inventory, restarted the vpxa and hostd services, and added the VM back to the inventory. This worked for me.
Also try restarting your VirtualCenter server service.
Thanks guys,
Unfortunately I already tried that without success. Just to be sure, I tried it again and did the following:
Unregistered the VM from the Infrastructure Client
Stopped VMware VirtualCenter Service
Restarted vpxa, mgmt, webAccess and vmkauthd services on all hosts in the cluster
Disabled HA on cluster
Started VirtualCenter service
Added vmx to inventory
Enabled HA on cluster
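For reference, the unregister/register cycle can also be driven from the ESX 3.x service console instead of the client; a rough sketch, where the datastore and VM paths are placeholders to adjust for your environment:

```
# Placeholder paths - substitute your own datastore and VM name.
vmware-cmd -s unregister /vmfs/volumes/datastore1/blackberry-srv/blackberry-srv.vmx
service mgmt-vmware restart     # restarts hostd (the host management agent)
service vmware-vpxa restart     # restarts the VirtualCenter agent
vmware-cmd -s register /vmfs/volumes/datastore1/blackberry-srv/blackberry-srv.vmx
```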
Still no luck, still marked as invalid. I guess I covered everything there... is the order OK?
I also checked the VC logs via PowerShell:
Get-VM "blackberry-srv" | Get-VIEvent | Format-Table CreatedTime, FullFormattedMessage -AutoSize
No clues there either, just that the machine is now registered in the datacenter.
Anyone see anything fishy in the .vmx or have any other suggestion?
Thanks!
/Björn
I had the same issue once; the only solution was removing the ESX host from VC and adding it again. For some reason the VM was stuck, and by removing the ESX host the VM was removed from the database, so when adding the ESX host back there was nothing stale left to conflict.
Duncan
Blogging: http://www.yellow-bricks.com
If you find this information useful, please award points for "correct" or "helpful".
Hi,
This issue can also appear in storage access cases.
If the VM is on a LUN that is not visible in the storage area of the ESX host, you can try the below.
Remove the VM from the inventory.
Try to refresh the storage section on the ESX host from the VI client, or run esxcfg-rescan on the console of the ESX host, then restart the mgmt-vmware service.
Then try to add the VM back to the inventory of the ESX machine.
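On the ESX 3.x service console, that rescan-and-restart sequence looks roughly like this (adapter names are examples; repeat the rescan for each vmhba the host has):

```
esxcfg-rescan vmhba0        # rescan each storage adapter in turn
esxcfg-rescan vmhba1
service mgmt-vmware restart # restart the host management agent (hostd)
```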
This should resolve the issue.
-Karunakar
Thanks for the suggestion. Perhaps a stupid question: when I re-registered it, it ended up on another host. Does that mean I have to remove all hosts (we have three) from the cluster? Or would the current host be sufficient?
The Register Virtual Machine wizard does not allow me to specify a host, only a cluster. I guess that is because DRS is enabled.
Thanks
/Björn
You need not unregister the ESX machine. Try to remove the virtual machine from the inventory of the ESX host where you see the VM.
Then, on all the hosts, refresh the storage area, go into Storage Adapters, and rescan all the storage adapters.
Then locate the LUN or datastore where the virtual machine resides, browse the datastore, and find the virtual machine's folder; in that folder you have the vmx file of the VM.
Right-click the vmx file and choose Add to Inventory.
-Karunakar
Yep, I was replying to depping's post. You posted while I was writing it.
Thanks for the tip though; I did as you suggested without any luck. It actually got worse: another VM is now also marked as invalid. BUT they reside on the very same LUN, which implies that it is a storage problem. The VMs also reside on the same host.
The LUN itself contains a bunch of VMs that are successfully registered. I have also checked that it is visible from all hosts that have the LUN presented to them.
Any suggestions? (except removing and adding esx from cluster - I will try that asap)
Thanks guys!
/B
P.S. What will happen to the invalid machines during host removal from the cluster? I can't migrate them because they are invalid... AFAIK they should continue to run. Or...? Catch-22...
I've seen this problem with a CLARiiON SAN - the ESX hosts became unregistered from the SAN. Try re-registering the ESX hosts on the SAN and see if that fixes the problem. It did for us.
Hmm... good thought. We are running HP EVA; does anyone have experience with this issue there?
Since my last post I have tried:
Upgraded Virtual Center to Update 3 - still no luck retrying the steps above
Removed the host from the cluster and added it again - no luck
I'm thinking about shutting down the troublesome VMs (from Remote Desktop) and copying the vmdk files. Then I would create new virtual machines and use the existing vmdks. Would that be something that might work?
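If you try the copy approach, note that on the service console a vmdk is normally cloned with vmkfstools rather than plain cp, so the descriptor and data files stay consistent. A sketch with made-up paths:

```
# Placeholder paths - adjust datastore and VM names.
# -i clones an existing virtual disk to a new one.
vmkfstools -i /vmfs/volumes/datastore1/blackberry-srv/blackberry-srv.vmdk \
              /vmfs/volumes/datastore1/blackberry-srv-new/blackberry-srv-new.vmdk
```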
But now I'm going home... long f**king day hitting my head into the wall...
Thanks guys!
/Björn
Hi guys,
Just wanted to give you an update. Unfortunately I never got any of the posted suggestions to work. This is the workaround that worked:
Created a new, identical custom VM, except without any hard drive
Copied the .vmdk to the new VM's folder
Edited the new VM's hardware and added an existing .vmdk - the one I copied to the folder
Successfully started the VM
Thanks for all suggestions!
/Björn
Hi Björn,
I experienced the same problem twice – once a few months back and once recently. The first time I experienced this the VMware tech performed the same steps you outlined that resolved your problem. It worked, but it required that the VM be powered off.
I experienced it again and performed the following steps to resolve it:
1. Remove the invalid VM from Virtual Center by pressing the delete key.
2. Delete the vmxf file in the VM’s directory. Note: my vmxf file was empty.
3. Add the VM to the ESX server inventory manually by right clicking the vmx file and choosing “Add to Inventory.” You must connect directly to the ESX server to do this.
4. Add the VM to the Virtual Center inventory following the same steps – right-click the vmx file and choose “Add to Inventory.”
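On ESX 3.5 those steps can also be done from the service console; a sketch with placeholder paths (vmware-vim-cmd talks to hostd directly on the host):

```
# Placeholder paths - adjust datastore and VM name.
cd /vmfs/volumes/datastore1/myvm
rm myvm.vmxf                                     # step 2: delete the (empty) vmxf
vmware-vim-cmd solo/registervm $(pwd)/myvm.vmx   # steps 3-4: add back to inventory
```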
Note: I’ve never heard of a vmxf file before, but the vmx file indicated it was for extended configuration settings.
It appears that this problem can be caused by multiple host servers trying to access the metadata on the VMFS partition at the same time. In some cases, it may be a mis-configuration of the host group or LUN.
http://www.vmware.com/pdf/hds_svd_technote.pdf
This is the entry in the vmkernel log that tipped me off. It was recorded over 50 times on each host, which is not common:
Date11 10:25:49 esxserver vmkernel: 8:02:54:08.798 cpu1:1147)StorageMonitor: 196: vmhba0:2:3:0 status = 24/0 0x0 0x0 0x0
You can find it by running the following command on the ESX host:
# grep 24.0 /var/log/vmkernel
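To illustrate what that grep matches, here is a small self-contained demo using the sample entry quoted above; the temp file merely stands in for /var/log/vmkernel:

```shell
# Create a stand-in for /var/log/vmkernel containing the sample entry above.
log=$(mktemp)
cat > "$log" <<'EOF'
Date11 10:25:49 esxserver vmkernel: 8:02:54:08.798 cpu1:1147)StorageMonitor: 196: vmhba0:2:3:0 status = 24/0 0x0 0x0 0x0
EOF
# In "grep 24.0" the dot is a regex wildcard, so the pattern matches the
# literal "24/0" SCSI status in the line.
matches=$(grep -c "24.0" "$log")
echo "$matches"   # one matching line in this sample
rm -f "$log"
```

On a real host you would of course run the grep against /var/log/vmkernel itself and look at how often the status repeats.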
Hope this is helpful to others – leave a note on this discussion if it works for you.
Blane
We had the same issue here after pushing a VMware Tools update to some VMs.
Deleting and re-adding the machine in the VI client did it in one case.
The second machine seemed to be off after that, but connecting via RDP was still possible. We then additionally disconnected the VMware Tools installer disk from the machine; deleting and re-adding was then successful.
HTH
Juergen