Hi,
We just had the nasty experience of ESX (3.5 Update 2) corrupting one of our Virtual Machines while trying to remove a snapshot. I'm interested to see if anyone on here can shed some light on why it might have happened - there are quite a few threads on similar issues but none seem directly equivalent to our experience.
This evening while performing some other work I noticed that the VM in question (W2K8 64-bit, if it matters) had a snapshot that we'd forgotten to remove - the snapshot was over a month old, but relatively small (only 800MB or so in the delta-disks).
I deleted the snapshot via VirtualCenter, which "worked" on it for a few seconds before erroring out with the message "Doing an online commit, cannot power off". At the same time the Virtual Machine stopped (it was powered on when I started the snapshot commit).
Subsequent attempts to power on the Virtual Machine resulted in VC showing errors like "Failed to power on xxxx on xxx in xxx: A general system error occurred: Internal error". Examining the VM's log files showed the problem was to do with inconsistencies between the delta disks and the main disk:
Nov 30 18:20:57.718: vmx| DISK: Cannot open disk "/vmfs/volumes/9a9d0976-a1cfd695/xxx/xxx-000003.vmdk": The parent virtual disk has been modified since the child was created (18).
Nov 30 18:20:57.718: vmx| Msg_Post: Error
Nov 30 18:20:57.718: vmx| [msg.disk.noBackEnd] Cannot open the disk '/vmfs/volumes/9a9d0976-a1cfd695/xxx/xxx-000003.vmdk' or one of the snapshot disks it depends on.
Nov 30 18:20:57.718: vmx| [msg.disk.configureDiskError] Reason: The parent virtual disk has been modified since the child was created.
Also, the VM's log file at the time that the snapshot deletion was running shows lots of opening & closing of the delta disk and base VMDK files, followed by:
Nov 30 18:11:29.726: vmx| DISKLIB-LINK : Attach: Content ID mismatch (9bb515b0 != 95b90c26).
Nov 30 18:11:29.736: vmx| DISKLIB-CHAIN : "/vmfs/volumes/9a9d0976-a1cfd695/xxx/xxx.vmdk" : failed to open (The parent virtual disk has been modified since the child was created).
Nov 30 18:11:29.738: vmx| DISKLIB-VMFS : "/vmfs/volumes/9a9d0976-a1cfd695/xxx/xxx-000003-delta.vmdk" : closed.
Nov 30 18:11:29.738: vmx| DISKLIB-VMFS : "/vmfs/volumes/9a9d0976-a1cfd695/xxx/xxx-flat.vmdk" : closed.
Nov 30 18:11:29.738: vmx| DISKLIB-LIB : Failed to open '/vmfs/volumes/9a9d0976-a1cfd695/xxx/xxx-000003.vmdk' with flags 0xa (The parent virtual disk has been modified since the child was created).
Nov 30 18:11:29.738: vmx| DISK: Cannot open disk "/vmfs/volumes/9a9d0976-a1cfd695/xxx/xxx-000003.vmdk": The parent virtual disk has been modified since the child was created (18).
Nov 30 18:11:29.738: vmx| Msg_Post: Error
Nov 30 18:11:29.738: vmx| [msg.disk.noBackEnd] Cannot open the disk '/vmfs/volumes/9a9d0976-a1cfd695/xxx/xxx-000003.vmdk' or one of the snapshot disks it depends on.
Nov 30 18:11:29.738: vmx| [msg.disk.configureDiskError] Reason: The parent virtual disk has been modified since the child was created.
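For anyone wanting to check for the same "Content ID mismatch" condition: as far as I understand it, each VMDK descriptor file (the small text .vmdk, not the -flat or -delta file) carries a CID= line, and a child descriptor also carries a parentCID= line that is expected to match the parent's CID. The following is only a rough sketch of that comparison in Python, not a VMware tool. The function names are made up for illustration, and the assignment of the two IDs from the log to parent and child is an assumption, since the log does not say which is which.

```python
import re

def read_cids(descriptor_text):
    """Return (CID, parentCID) from VMDK descriptor text; None where a field is missing."""
    fields = {}
    for line in descriptor_text.splitlines():
        # Anchored at line start, so "parentCID=" is never mistaken for "CID=".
        m = re.match(r'^\s*(CID|parentCID)\s*=\s*([0-9a-fA-F]+)\s*$', line)
        if m:
            fields[m.group(1)] = m.group(2).lower()
    return fields.get("CID"), fields.get("parentCID")

def chain_is_consistent(parent_text, child_text):
    """True if the child's parentCID equals the parent's CID."""
    parent_cid, _ = read_cids(parent_text)
    _, child_parent_cid = read_cids(child_text)
    return parent_cid is not None and parent_cid == child_parent_cid

# Hypothetical descriptor excerpts. The two IDs come from the log line above,
# but which one belongs to the base disk is an assumption.
base_descriptor = "CID=95b90c26\nparentCID=ffffffff\n"
delta_descriptor = 'CID=1234abcd\nparentCID=9bb515b0\nparentFileNameHint="xxx.vmdk"\n'

print(chain_is_consistent(base_descriptor, delta_descriptor))
```

In practice the descriptor text would be read from the real files on the datastore (for example xxx.vmdk and xxx-000003.vmdk) while the VM is powered off; the sketch only reads and compares, and does not modify anything.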
Looking at the snapshot data in the VM's .vmsd file after the above occurred, there were two snapshots listed - one was the original snapshot that I'd tried to delete, and the other was named "Consolidate Helper- 0". I have no idea where the "Consolidate Helper- 0" one came from, and unfortunately I am also not sure whether it was present before the attempt to delete snapshots or not - it certainly wasn't listed in the "Snapshot Manager" in VC though.
Nothing I tried managed to get it "working again", so we restored the most recent SAN (NetApp) snapshot (luckily taken 5 minutes before the problem), which seems to have brought the VM back to life again without issues (aside from having to remove it from VC and re-add it).
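To compare what the .vmsd file lists against what Snapshot Manager shows, something like the sketch below could help. It assumes the .vmsd file is plain text with entries of the form snapshotN.displayName = "..." (that layout is my assumption here, not something confirmed by the log output above); the function name and the sample text, including the "pre-patch" snapshot name, are hypothetical.

```python
import re

def snapshot_names(vmsd_text):
    """Return snapshot display names from .vmsd text, ordered by snapshot index."""
    names = {}
    for line in vmsd_text.splitlines():
        # Assumed entry format: snapshot0.displayName = "some name"
        m = re.match(r'^\s*snapshot(\d+)\.displayName\s*=\s*"(.*)"\s*$', line)
        if m:
            names[int(m.group(1))] = m.group(2)
    return [names[i] for i in sorted(names)]

# Hypothetical sample; the first snapshot name is invented for illustration.
sample_vmsd = (
    'snapshot.numSnapshots = "2"\n'
    'snapshot0.displayName = "pre-patch"\n'
    'snapshot1.displayName = "Consolidate Helper- 0"\n'
)

print(snapshot_names(sample_vmsd))
```

Any name that appears in this list but not in Snapshot Manager would be a hint that a helper snapshot was left behind.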
Post-restore, a subsequent attempt to delete the ESX snapshot succeeded, with the VM showing no ill effects.
So - can anyone suggest why this may have occurred? While we suffered no data loss I did lose 4 hours of my time - on a Sunday night no less - and would prefer this didn't happen again!!
Cheers,
Matt Kilham
Message was edited by: mattjk