Hi folks,
This morning I was unable to start one of the VMs on one of our ESX 3.5 clusters. Here are the messages from the VM’s vmware.log:
And here are the corresponding messages from /var/log/vmkernel on the ESX host:
Apr 4 11:38:58 roggwil vmkernel: 148:16:40:33.238 cpu15:3051)World: vm 3052: 901: Starting world vmm0:ovada.zurich.ibm.com with flags 8
Apr 4 11:38:58 roggwil vmkernel: 148:16:40:33.238 cpu15:3051)Sched: vm 3052: 5333: adding 'vmm0:ovada.zurich.ibm.com': group 'host/user': cpu: shares=488 min=0 minLimit=-1 max=-1
Apr 4 11:38:58 roggwil vmkernel: 148:16:40:33.238 cpu15:3051)Sched: vm 3052: 5352: renamed group 457 to vm.3051
Apr 4 11:38:58 roggwil vmkernel: 148:16:40:33.238 cpu15:3051)Sched: vm 3052: 5366: moved group 457 to be under group 4
Apr 4 11:38:58 roggwil vmkernel: 148:16:40:33.254 cpu15:3051)Swap: vm 3052: 2169: extending swap to 1548288 KB
Apr 4 11:39:00 roggwil vmkernel: 148:16:40:34.700 cpu10:3051)Cow: 1659: Failed on handle 1 (274582) of 2 with Not supported
Apr 4 11:39:00 roggwil vmkernel: 148:16:40:34.700 cpu10:3051)WARNING: Cow: 1262: Creation of the device with device lib failed Not supported
Apr 4 11:39:00 roggwil vmkernel: 148:16:40:34.749 cpu12:3051)Sched: vm 3052: 1031: name='vmm0:ovada.zurich.ibm.com'
Apr 4 11:39:00 roggwil vmkernel: 148:16:40:34.749 cpu12:3051)CpuSched: vm 3052: 13870: zombified unscheduled world: runState=NEW
Apr 4 11:39:00 roggwil vmkernel: 148:16:40:34.749 cpu12:3051)World: vm 3052: 2489: deathPending set; world not running, scheduling reap
Apr 4 11:59:06 roggwil vmkernel: 148:17:00:41.419 cpu8:1047)World: vm 3053: 901: Starting world vmware-vmx with flags 4
Apr 4 11:59:08 roggwil vmkernel: 148:17:00:42.729 cpu14:3053)World: vm 3054: 901: Starting world vmm0:ovada.zurich.ibm.com with flags 8
Apr 4 11:59:08 roggwil vmkernel: 148:17:00:42.730 cpu14:3053)Sched: vm 3054: 5333: adding 'vmm0:ovada.zurich.ibm.com': group 'host/user/pool8': cpu: shares=488 min=0 minLimit=-1 max=-1
Apr 4 11:59:08 roggwil vmkernel: 148:17:00:42.730 cpu14:3053)Sched: vm 3054: 5352: renamed group 458 to vm.3053
Apr 4 11:59:08 roggwil vmkernel: 148:17:00:42.730 cpu14:3053)Sched: vm 3054: 5366: moved group 458 to be under group 23
Apr 4 11:59:08 roggwil vmkernel: 148:17:00:42.745 cpu13:3053)Swap: vm 3054: 2169: extending swap to 1548288 KB
Apr 4 11:59:09 roggwil vmkernel: 148:17:00:44.249 cpu9:3053)Cow: 1659: Failed on handle 1 (1261657) of 2 with Not supported
Apr 4 11:59:09 roggwil vmkernel: 148:17:00:44.249 cpu9:3053)WARNING: Cow: 1262: Creation of the device with device lib failed Not supported
Apr 4 11:59:09 roggwil vmkernel: 148:17:00:44.291 cpu14:3053)Sched: vm 3054: 1031: name='vmm0:ovada.zurich.ibm.com'
Apr 4 11:59:09 roggwil vmkernel: 148:17:00:44.291 cpu14:3053)CpuSched: vm 3054: 13870: zombified unscheduled world: runState=NEW
Apr 4 11:59:09 roggwil vmkernel: 148:17:00:44.291 cpu14:3053)World: vm 3054: 2489: deathPending set; world not running, scheduling reap
The VM’s directory looks like this (minus the logs):
# ls -l
total 58631232
-rw------- 1 root root 116736 Apr 2 02:12 ovada.zurich.ibm.com-000001-delta.vmdk
-rw------- 1 root root 250 Apr 2 02:12 ovada.zurich.ibm.com-000001.vmdk
-rw-r--r-- 1 root root 37 Nov 8 17:28 ovada.zurich.ibm.com-681017c8.hlog
-rwxr-xr-x 1 root root 2424 Apr 4 11:32 ovada.zurich.ibm.com-bkup.vmx
-rw------- 1 root root 60020917248 Apr 2 02:12 ovada.zurich.ibm.com-flat.vmdk
-rw------- 1 root root 8684 Apr 2 02:12 ovada.zurich.ibm.com.nvram
-rw------- 1 root root 414 Mar 26 03:39 ovada.zurich.ibm.com.vmdk
-rw------- 1 root root 553 Apr 2 02:13 ovada.zurich.ibm.com.vmsd
-rwxr-xr-x 1 root root 2424 Apr 4 11:23 ovada.zurich.ibm.com.vmx
-rw------- 1 root root 275 Apr 4 11:38 ovada.zurich.ibm.com.vmxf
The disk file mentioned in ovada.zurich.ibm.com.vmx is: scsi0:0.fileName = "ovada.zurich.ibm.com-000001.vmdk". That file looks like this:
# Disk DescriptorFile
version=1
CID=f8bd0a58
parentCID=f8bd0a58
createType="vmfsSparse"
parentFileNameHint="ovada.zurich.ibm.com.vmdk"
# Extent description
RW 117228354 VMFSSPARSE "ovada.zurich.ibm.com-000001-delta.vmdk"
# The Disk Data Base
#DDB
ovada.zurich.ibm.com.vmdk looks like this:
# Disk DescriptorFile
version=1
CID=f8bd0a58
parentCID=ffffffff
createType="vmfs"
# Extent description
RW 117228354 VMFS "ovada.zurich.ibm.com-flat.vmdk"
# The Disk Data Base
#DDB
ddb.virtualHWVersion = "4"
ddb.toolsVersion = "7304"
ddb.uuid = "60 00 C2 9c d2 e8 2e 56-f6 ba d0 6a 13 4f 45 32"
ddb.geometry.cylinders = "7297"
ddb.geometry.heads = "255"
ddb.geometry.sectors = "63"
ddb.adapterType = "buslogic"
I’ve never seen the "Function not implemented" error before. I would be grateful if anyone could give tips on how to resolve this problem.
Regards,
Michael L.
Hi,
First take a backup of ovada.zurich.ibm.com-000001.vmdk.
The snapshot disk seems to be corrupted, as there is a mismatch between its parent CID and CID.
Once you have backed up ovada.zurich.ibm.com-000001.vmdk, run the command below.
vmkfstools -U ovada.zurich.ibm.com-000001.vmdk
This removes the snapshot vmdk from the virtual machine.
Then remove the snapshot disk from the virtual machine under Edit Settings and add the parent disk "ovada.zurich.ibm.com.vmdk" to the virtual machine.
Power on the virtual machine and it will work.
Try that out.
Hi Virtualinfra,
That worked. Thanks!
I was surprised that there even was a snapshot disk. The owner of the VM does not recall having made a snapshot, and no snapshot appeared in the Snapshot Manager.
Oh well—removing it worked, and the VM is now up and running. Thanks again.
Regards,
Michael L.
Did you check whether the VM still uses the latest data? Removing a snapshot without knowing more about the case can also set the VM back to an outdated state,
so be very careful with the big-hammer approach.
> Seems to be the snapshot disk is corrupted as its has mismatched between parent CID and CID.
And? That does not mean that the snapshot is corrupt, not at all.
Hi Mic,
That's great, you got your VM back.
Don't forget to award points.
Regards
> Seems to be the snapshot disk is corrupted as its has mismatched between parent CID and CID.
> And? That does not mean that the snapshot is corrupt, not at all.
Please check the logs and comment.
It is correct that removing a snapshot without knowing much about it can lead to data loss.
# Disk DescriptorFile
version=1
CID=f8bd0a58
parentCID=f8bd0a58
createType="vmfsSparse"
parentFileNameHint="ovada.zurich.ibm.com.vmdk"
# Extent description
RW 117228354 VMFSSPARSE "ovada.zurich.ibm.com-000001-delta.vmdk"
# The Disk Data Base
#DDB
ovada.zurich.ibm.com.vmdk looks like this:
# Disk DescriptorFile
version=1
CID=f8bd0a58
parentCID=ffffffff
createType="vmfs"
Did you check the CID and parentCID values above?
They are all the same, which indicates that no data changed after the snapshot was taken.
And the VM fails to power on; did you notice the log Michael posted?
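The CID comparison argued above can be sketched in a few lines of Python. This is only an illustration run against the descriptor text pasted in this thread; the function names `parse_descriptor` and `compare_cids` are made up for the example, and the result says nothing about whether the delta file actually holds data.

```python
import re

# Descriptor text copied from the thread.
SNAPSHOT_DESCRIPTOR = '''\
# Disk DescriptorFile
version=1
CID=f8bd0a58
parentCID=f8bd0a58
createType="vmfsSparse"
parentFileNameHint="ovada.zurich.ibm.com.vmdk"
# Extent description
RW 117228354 VMFSSPARSE "ovada.zurich.ibm.com-000001-delta.vmdk"
'''

BASE_DESCRIPTOR = '''\
# Disk DescriptorFile
version=1
CID=f8bd0a58
parentCID=ffffffff
createType="vmfs"
# Extent description
RW 117228354 VMFS "ovada.zurich.ibm.com-flat.vmdk"
'''

_KEY_VALUE = re.compile(r'^([A-Za-z][\w.]*)\s*=\s*"?([^"]*)"?$')
_EXTENT = re.compile(r'^(RW|RDONLY|NOACCESS)\s+(\d+)\s+(\S+)\s+"([^"]+)"')


def parse_descriptor(text):
    """Split a vmdk descriptor into a dict of fields and a list of extents."""
    fields, extents = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        extent = _EXTENT.match(line)
        if extent:
            extents.append({
                "access": extent.group(1),
                "sectors": int(extent.group(2)),
                "type": extent.group(3),
                "file": extent.group(4),
            })
            continue
        pair = _KEY_VALUE.match(line)
        if pair:
            fields[pair.group(1)] = pair.group(2)
    return fields, extents


def compare_cids(snapshot, base):
    """Report how the CID fields of a snapshot relate to its parent."""
    return {
        # The snapshot's parentCID should equal the parent's current CID.
        "parent_link_matches": snapshot["parentCID"].lower() == base["CID"].lower(),
        # Equal CID and parentCID is what was read above as "no change".
        "snapshot_cid_unchanged": snapshot["CID"].lower() == snapshot["parentCID"].lower(),
        # A base disk has no parent, marked by parentCID=ffffffff.
        "base_is_root": base["parentCID"].lower() == "ffffffff",
    }


snap, snap_extents = parse_descriptor(SNAPSHOT_DESCRIPTOR)
base, base_extents = parse_descriptor(BASE_DESCRIPTOR)
report = compare_cids(snap, base)
for name in sorted(report):
    print(name, "=", report[name])
```

As a cross-check, the extent size of 117228354 sectors times 512 bytes equals the 60020917248 bytes reported for the -flat.vmdk in the directory listing.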
There are ways to change a snapshot that do not change the CIDs.
It was safe in this case because the snapshot delta was so small that it held either no changes or only minimal ones.
Judging whether it is safe to skip a snapshot just by checking CIDs is dangerous.
Of course I did read the log snippets.
Ulli Hankeln wrote:
> There are ways to change a snapshot that do not change the CIDs.
> It was safe in this case because the snapshot delta was so small that it held either no changes or only minimal ones.
> Judging whether it is safe to skip a snapshot just by checking CIDs is dangerous.
> Of course I did read the log snippets.
What are the ways to which you refer?
Do you know why the snapshot did not appear in the Snapshot Manager?
It might be useful to know this if it happens again.
Thanks!
/Michael
> Do you know why the snapshot did not appear in the Snapshot Manager?
That does not mean anything. The Snapshot Manager only displays what is listed in the vmsd file.
And vmsd files get wiped blank or end up with invalid data for many reasons, such as buggy third-party backup tools in action, datastores running full, ESX bugs...
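Since the Snapshot Manager view depends on the vmsd file, one way to spot a snapshot regardless of it is to look at the file names on the datastore. The sketch below is a naming heuristic only: it assumes snapshot descriptors follow the `-000001.vmdk` pattern seen in this thread, and `find_snapshot_descriptors` is a name invented for the example.

```python
import re

# File names from the directory listing in the original post.
LISTING = [
    "ovada.zurich.ibm.com-000001-delta.vmdk",
    "ovada.zurich.ibm.com-000001.vmdk",
    "ovada.zurich.ibm.com-681017c8.hlog",
    "ovada.zurich.ibm.com-bkup.vmx",
    "ovada.zurich.ibm.com-flat.vmdk",
    "ovada.zurich.ibm.com.nvram",
    "ovada.zurich.ibm.com.vmdk",
    "ovada.zurich.ibm.com.vmsd",
    "ovada.zurich.ibm.com.vmx",
    "ovada.zurich.ibm.com.vmxf",
]

# Assumption: snapshot descriptors end in a six-digit sequence number.
_SNAPSHOT_NAME = re.compile(r"^(?P<base>.+)-(?P<seq>\d{6})\.vmdk$")


def find_snapshot_descriptors(filenames):
    """Return (descriptor, presumed base descriptor, sequence) tuples."""
    found = []
    for name in filenames:
        match = _SNAPSHOT_NAME.match(name)
        if match:
            found.append((name, match.group("base") + ".vmdk", int(match.group("seq"))))
    return sorted(found, key=lambda item: item[2])


for descriptor, base, seq in find_snapshot_descriptors(LISTING):
    print(seq, descriptor, "->", base)
```

Any descriptor found this way should still be opened and read, because the name alone does not prove that the file is part of the disk chain the .vmx points at.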
continuum,
So how would you suggest dealing with this problem?
If you can share your approach, that would be great; in future we will follow the same.
I generally do not trust vmdk descriptor files, so I would not judge such a case by looking at the CID values alone.
If the delta file in this case had been several gigabytes large, deleting it would have caused data loss.
So I always check the file size of the delta file as well.
I also routinely check older vmware.log files to find out whether this snapshot has been used successfully before.
If the delta file is even slightly larger than that of a new snapshot created while the VM is powered off, I would first try to clone both snapshot and base disk into a new vmdk.
I hope you do not feel offended; in this case your procedure was OK.
But I would not have recommended deleting the snapshot from disk without having more details.
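The two extra checks described above, delta size and earlier log mentions, could be scripted roughly as follows. This is a sketch under assumptions: the baseline size of a freshly created snapshot is not given in the thread and has to be measured on your own host, and both function names are hypothetical.

```python
import os


def delta_size(path):
    """Size in bytes of a delta file on disk."""
    return os.path.getsize(path)


def delta_looks_used(delta_bytes, fresh_snapshot_bytes):
    """True if the delta is larger than a freshly created, unused snapshot.

    fresh_snapshot_bytes is a baseline you measure yourself, for example by
    snapshotting a powered-off VM with a disk of the same size.
    """
    return delta_bytes > fresh_snapshot_bytes


def lines_mentioning(lines, disk_name):
    """Return (line number, text) for each log line that names the disk."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, 1)
        if disk_name in line
    ]
```

An open file object can be passed as `lines`, for example one of the older vmware.log files in the VM directory. If either check gives a hit, cloning snapshot and base disk into a new vmdk before deleting anything is the safer route.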