Here's what happens when trying to power up one of my VMs (see also attachments):
Error stack:
An error was received from the ESX host while powering on VM vzilla-ws2012r2e.
Failed to start the virtual machine.
Cannot open the disk '/vmfs/volumes/51286ca4-ef967828-664d-001b2129ad71/vzilla-ws2012r2e/vzilla-ws2012r2e_3.vmdk' or one of the snapshot disks it depends on.
22 (Invalid argument)
Module DiskEarly power on failed.
Cannot open the disk '/vmfs/volumes/51286ca4-ef967828-664d-001b2129ad71/vzilla-ws2012r2e/vzilla-ws2012r2e_4.vmdk' or one of the snapshot disks it depends on.
22 (Invalid argument)
This may be related to a SATA cabling issue that caused a momentary loss of connectivity, which I realize could result in data loss or corruption. This is a lab box. Especially telling: the 2 VMDKs it's complaining about (when trying to power on) are both on one physical drive enclosure. Data read from and written to that enclosure since the problem arose is fine, which suggests the cabling problem has been resolved and that the VMFS5 filesystem seems to be healthy.
No snapshots. No linked clones. Just a Windows Server 2012-based VM with several drive letters, whose underlying VMDK files reside on different VMFS5 datastores. They're thin provisioned (those drives aren't actually that huge), and nowhere near running out of physical space for the data either. It's all been working great for months, until today, when I tried to power it up again.
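To double-check the "same enclosure" observation against the error stack itself, here is a small Python sketch (my own illustration, not a VMware tool) that pulls the quoted VMDK paths out of the "Cannot open the disk" messages and groups them by the entry under /vmfs/volumes. It assumes the message wording shown above, and it only says something about enclosures when each datastore sits on a single device (vmkfstools -P later in this thread reports this one as spanning 1 partition).

```python
import re
from collections import defaultdict

# Assumes the vSphere error wording shown above:
#   Cannot open the disk '/vmfs/volumes/<datastore>/<path>.vmdk' or one of ...
DISK_RE = re.compile(r"Cannot open the disk '(/vmfs/volumes/([^/']+)/[^']+)'")


def disks_by_datastore(error_text: str) -> dict[str, list[str]]:
    """Group the VMDK paths named in an error stack by their /vmfs/volumes entry."""
    groups: dict[str, list[str]] = defaultdict(list)
    for match in DISK_RE.finditer(error_text):
        path, datastore = match.group(1), match.group(2)
        if path not in groups[datastore]:  # skip duplicates if a disk is reported twice
            groups[datastore].append(path)
    return dict(groups)


stack = """\
Cannot open the disk '/vmfs/volumes/51286ca4-ef967828-664d-001b2129ad71/vzilla-ws2012r2e/vzilla-ws2012r2e_3.vmdk' or one of the snapshot disks it depends on.
22 (Invalid argument)
Cannot open the disk '/vmfs/volumes/51286ca4-ef967828-664d-001b2129ad71/vzilla-ws2012r2e/vzilla-ws2012r2e_4.vmdk' or one of the snapshot disks it depends on.
"""
for datastore, paths in disks_by_datastore(stack).items():
    print(datastore, len(paths))  # 51286ca4-ef967828-664d-001b2129ad71 2
```

Both failing disks land under the same datastore UUID, which lines up with them sharing one enclosure.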
Searching for "error module disk early power on failed" turns up this KB article:
error module disk early power on failed
which indicates .lck files might be present. There aren't.
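For anyone repeating that check, here is a minimal Python sketch that lists entries in a VM folder whose names contain ".lck", such as the .lck files the KB article mentions. It's a name-based heuristic I wrote for illustration, not a VMware API. VMFS keeps its own file locks in on-disk metadata rather than as separate files, so an empty result doesn't rule out a lock (VOMA reports stale locks later in this thread).

```python
import os
import sys


def find_lock_entries(vm_dir: str) -> list[str]:
    """Return directory entries whose names contain '.lck', sorted by name.

    Heuristic only: matches lock files or directories by name. VMFS stores its
    own file locks in on-disk metadata, so an empty list is not proof that no
    lock exists.
    """
    return sorted(name for name in os.listdir(vm_dir) if ".lck" in name)


if __name__ == "__main__":
    # Pass the VM's folder (e.g. /vmfs/volumes/<datastore>/<vm>); defaults to the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(find_lock_entries(target))
```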
Next up, a variety of other articles:
Re: Unable to start VM : Invalid argument on *-flat.vmdk
error module disk early power on failed
but alas, none of them seem to match my situation exactly. My vmware.log file is attached below, along with some screenshots showing the drive layout of this VM. I'm hoping this post proves fruitful if somebody has had a similar experience. The data at stake here is (mostly) redundant, but I'd rather figure my way out of this, in case it happens to me again and/or so I can help others. That's much preferred over giving up, reformatting the VMFS, and starting again.
Thank you!
Attempting to add one of these suspect vmdk files (as "Existing drive") to another healthy VM gets this immediate error:
Failed to add disk scsi0:1.
22 (Invalid argument)
Cannot open the disk '/vmfs/volumes/51286ca4-ef967828-664d-001b2129ad71/vzilla-ws2012r2e/vzilla-ws2012r2e_3.vmdk' or one of the snapshot disks it depends on.
Failed to power on scsi0:1.
So this is looking more like a corruption rescue mission.
Recover Deleted VM From VMFS5?
Any ideas to share on the best path forward, for my circumstance?
http://blog.asiantuntijakaveri.fi/2012/01/fixing-broken-vmware-vsphere-5-vmdk.html
Some useful vmkfstools 'hidden' options | VMware vSphere Blog - VMware Blogs
Here's a look at good old vmkfstools results:
/vmfs/volumes/51286ca4-ef967828-664d-001b2129ad71/vzilla-ws2012r2e # ls -l
total 2620022784
-rw------- 1 root root 67070209294336 May 31 03:05 vzilla-ws2012r2e_3-flat.vmdk
-rw------- 1 root root 536 Apr 29 15:27 vzilla-ws2012r2e_3.vmdk
-rw------- 1 root root 68169720922112 May 31 07:41 vzilla-ws2012r2e_4-flat.vmdk
-rw------- 1 root root 536 Apr 29 15:27 vzilla-ws2012r2e_4.vmdk
/vmfs/volumes/51286ca4-ef967828-664d-001b2129ad71/vzilla-ws2012r2e # vmkfstools --fix check vzilla-ws2012r2e_3.vmdk
Disk is error free
/vmfs/volumes/51286ca4-ef967828-664d-001b2129ad71/vzilla-ws2012r2e # vmkfstools -P -v10 vzilla-ws2012r2e_3.vmdk
Could not retrieve max file size: Inappropriate ioctl for device
VMFS-5.60 file system spanning 1 partitions.
File system label (if any): S2E-5TB-Mediasonic-PC Backups-AXT9IK
Mode: public
Capacity 6000874618880 (5722880 file blocks * 1048576), 3316831485952 (3163177 blocks) avail, max file size 0
Volume Creation Time: Sat Feb 23 07:15:48 2013
Files (max/free): 130000/129880
Ptr Blocks (max/free): 64512/61985
Sub Blocks (max/free): 32000/31984
Secondary Ptr Blocks (max/free): 256/256
File Blocks (overcommit/used/overcommit %): 0/2559703/0
Ptr Blocks (overcommit/used/overcommit %): 0/2527/0
Sub Blocks (overcommit/used/overcommit %): 0/16/0
Volume Metadata size: 832995328
UUID: 51286ca4-ef967828-664d-001b2129ad71
Logical device: 51286c94-672900eb-1f09-001b2129ad71
Partitions spanned (on "lvm"):
t10.ATA_____H2FW_RAID5_______________________________IKWIWQ60SIN5IRAXT9IK:1
Is Native Snapshot Capable: YES
OBJLIB-LIB: ObjLib cleanup done.
/vmfs/volumes/51286ca4-ef967828-664d-001b2129ad71/vzilla-ws2012r2e # vmkfstools --fix check vzilla-ws2012r2e_4.vmdk
Disk is error free
/vmfs/volumes/51286ca4-ef967828-664d-001b2129ad71/vzilla-ws2012r2e # vmkfstools -P -v10 vzilla-ws2012r2e_4.vmdk
Could not retrieve max file size: Inappropriate ioctl for device
VMFS-5.60 file system spanning 1 partitions.
File system label (if any): S2E-5TB-Mediasonic-PC Backups-AXT9IK
Mode: public
Capacity 6000874618880 (5722880 file blocks * 1048576), 3316831485952 (3163177 blocks) avail, max file size 0
Volume Creation Time: Sat Feb 23 07:15:48 2013
Files (max/free): 130000/129880
Ptr Blocks (max/free): 64512/61985
Sub Blocks (max/free): 32000/31984
Secondary Ptr Blocks (max/free): 256/256
File Blocks (overcommit/used/overcommit %): 0/2559703/0
Ptr Blocks (overcommit/used/overcommit %): 0/2527/0
Sub Blocks (overcommit/used/overcommit %): 0/16/0
Volume Metadata size: 832995328
UUID: 51286ca4-ef967828-664d-001b2129ad71
Logical device: 51286c94-672900eb-1f09-001b2129ad71
Partitions spanned (on "lvm"):
t10.ATA_____H2FW_RAID5_______________________________IKWIWQ60SIN5IRAXT9IK:1
Is Native Snapshot Capable: YES
OBJLIB-LIB: ObjLib cleanup done.
/vmfs/volumes/51286ca4-ef967828-664d-001b2129ad71/vzilla-ws2012r2e #
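The ls -l sizes above look alarming (61 TiB and 62 TiB on a roughly 6 TB datastore), but for thin disks they're provisioned sizes, not space actually used. Here's a Python sketch (mine, not part of any VMware tooling) that checks the vmkfstools -P figures add up (bytes equal blocks times the 1 MiB block size, and used plus free blocks equal the total) and compares the provisioned sizes against the datastore's capacity. The regular expressions assume the exact output format shown in this thread.

```python
import re

TIB = 1024 ** 4

# Assumes the exact line layout printed by 'vmkfstools -P -v10' in this thread.
CAPACITY_RE = re.compile(
    r"Capacity (\d+) \((\d+) file blocks \* (\d+)\), (\d+) \((\d+) blocks\) avail"
)
USED_RE = re.compile(r"File Blocks \(overcommit/used/overcommit %\): \d+/(\d+)/\d+")


def check_capacity(report: str) -> dict[str, int]:
    """Parse capacity figures from a VMFS report and cross-check the block arithmetic."""
    cap = CAPACITY_RE.search(report)
    used = USED_RE.search(report)
    if cap is None or used is None:
        raise ValueError("not a VMFS 'vmkfstools -P -v10' report")
    total_b, total_blocks, block_size, free_b, free_blocks = map(int, cap.groups())
    used_blocks = int(used.group(1))
    # Byte counts must equal block counts * block size.
    if total_b != total_blocks * block_size or free_b != free_blocks * block_size:
        raise ValueError("byte counts do not match block counts")
    # Used + free blocks must add up to the total.
    if used_blocks + free_blocks != total_blocks:
        raise ValueError("used + free blocks do not add up to capacity")
    return {"capacity": total_b, "used": used_blocks * block_size, "block_size": block_size}


def provisioned_ratio(flat_sizes: list[int], capacity: int) -> float:
    """How many times the datastore capacity the thin disks' provisioned sizes add up to."""
    return sum(flat_sizes) / capacity


# Figures copied from the vzilla-ws2012r2e_3/_4 output above.
report = (
    "Capacity 6000874618880 (5722880 file blocks * 1048576), "
    "3316831485952 (3163177 blocks) avail, max file size 0\n"
    "File Blocks (overcommit/used/overcommit %): 0/2559703/0\n"
)
info = check_capacity(report)
flats = [67070209294336, 68169720922112]  # -flat.vmdk sizes from ls -l
print([size / TIB for size in flats])  # [61.0, 62.0]
print(round(info["used"] / TIB, 2))  # 2.44
print(round(provisioned_ratio(flats, info["capacity"]), 1))  # 22.5
```

So the two disks are provisioned at about 22.5 times the datastore's capacity, while only about 2.44 TiB of it is actually allocated, and the block counts in the report are internally consistent.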
OK, now we're getting warmer with VOMA (things not looking too good):
vSphere 5.1 Storage Enhancements – Part 1: VMFS-5 | CormacHogan.com
If you find yourself in the unfortunate position that you suspect that you may have data corruption on your VMFS volume, prepare to do a restore from backup, or look to engage with a 3rd party data recovery organization if you do not have backups. VMware support will be able to help in diagnosing the severity of any suspected corruption issues, but they are under no obligation to recover your data.
In my VOMA output below, you'll see a healthy SATA enclosure, followed by the suspect one with errors. What to do about it is the next bridge to cross...
VMware KB: Data recovery services for data not recoverable by VMware Technical Support
VMware KB: Unable to access certain files on a VMFS datastore
/dev/disks # voma -f check -d t10.ATA_____H2FW_RAID5_______________________________GIJYMNPA2MGWGWKFNX4O
Module name is missing. Using "vmfs" as default
Checking if device is actively used by other hosts
Running VMFS Checker version 1.0 in check mode
Initializing LVM metadata, Basic Checks will be done
Phase 1: Checking VMFS header and resource files
Detected VMFS file system (labeled:'S2E-5TB-Mediasonic-Slower-e-RAID5') with UUID:52c0ed77-e8ea86b5-2af3-a0369f271a12, Version 5:60
Phase 2: Checking VMFS heartbeat region
Phase 3: Checking all file descriptors.
Phase 4: Checking pathname and connectivity.
Phase 5: Checking resource reference counts.
Total Errors Found: 0
/dev/disks # voma -f check -d t10.ATA_____H2FW_RAID5_______________________________IKWIWQ60SIN5IRAXT9IK
Module name is missing. Using "vmfs" as default
Checking if device is actively used by other hosts
Running VMFS Checker version 1.0 in check mode
Initializing LVM metadata, Basic Checks will be done
Phase 1: Checking VMFS header and resource files
Detected VMFS file system (labeled:'S2E-5TB-Mediasonic-PC Backups-AXT9IK') with UUID:51286ca4-ef967828-664d-001b2129ad71, Version 5:60
Phase 2: Checking VMFS heartbeat region
ON-DISK ERROR: Invalid HB address <3758110978>
Phase 3: Checking all file descriptors.
Found stale lock [type 10c00001 offset 214593536 v 1740, hb offset 3543040
gen 2133, mode 1, owner 535fc449-3ac678ea-a1d7-a0369f271a12 mtime 2845
num 0 gblnum 0 gblgen 0 gblbrk 0]
Found stale lock [type 10c00001 offset 214597632 v 1745, hb offset 3543040
gen 2133, mode 1, owner 535fc449-3ac678ea-a1d7-a0369f271a12 mtime 2847
num 0 gblnum 0 gblgen 0 gblbrk 0]
Phase 4: Checking pathname and connectivity.
ON-DISK ERROR: <FD c0 r0>has incorrect linkCount 4
Phase 5: Checking resource reference counts.
ON-DISK ERROR: FB inconsistency found: (23825,0) allocated in bitmap, but never used
Total Errors Found: 3
/dev/disks #
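To compare the two VOMA runs side by side, here's a Python sketch (my own, not a VMware utility) that summarizes a saved "voma -f check" log: the Total Errors Found figure, each ON-DISK ERROR line, and the stale-lock records, which span several lines and are grouped by owner. The patterns assume the VOMA 1.0 output format shown above.

```python
import re
from collections import Counter

# A stale-lock record spans several lines, so match across newlines with DOTALL.
STALE_LOCK_RE = re.compile(
    r"Found stale lock \[type (\w+) offset (\d+) v (\d+).*?owner (\S+).*?\]", re.DOTALL
)
TOTAL_RE = re.compile(r"Total Errors Found: (\d+)")


def summarize_voma(output: str) -> dict[str, object]:
    """Summarize one 'voma -f check' run: error total, on-disk errors, stale locks."""
    total = TOTAL_RE.search(output)
    errors = [
        line.split("ON-DISK ERROR:", 1)[1].strip()
        for line in output.splitlines()
        if "ON-DISK ERROR:" in line
    ]
    locks = [
        {"type": lock_type, "offset": int(offset), "version": int(version), "owner": owner}
        for lock_type, offset, version, owner in STALE_LOCK_RE.findall(output)
    ]
    return {
        "total_errors": int(total.group(1)) if total else None,
        "on_disk_errors": errors,
        "stale_locks": locks,
        "lock_owners": Counter(lock["owner"] for lock in locks),
    }
```

Fed the suspect run above (the IKWIWQ60SIN5IRAXT9IK device), it reports 3 total errors, the same 3 ON-DISK ERROR lines, and 2 stale locks, both held by owner 535fc449-3ac678ea-a1d7-a0369f271a12; the healthy run comes back with 0 errors and no locks. Note that the stale locks are listed separately from the 3 counted errors.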
I also ran vmkfstools -P against both drives: the first is the OK VMFS filesystem, and the second is the suspect VMFS filesystem.
Here are the results:
/dev/disks # vmkfstools -P t10.ATA_____H2FW_RAID5_______________________________GIJYMNPA2MGWGWKFNX4O
devfs-1.00 file system spanning 0 partitions.
File system label (if any):
Mode: private
Capacity 512 (1 file blocks * 512), 512 (1 blocks) avail, max file size 0
UUID: 00000000-00000000-0000-000000000000
Partitions spanned (on "notDCS"):
Is Native Snapshot Capable: NO
/dev/disks # vmkfstools -P t10.ATA_____H2FW_RAID5_______________________________IKWIWQ60SIN5IRAXT9IK
devfs-1.00 file system spanning 0 partitions.
File system label (if any):
Mode: private
Capacity 512 (1 file blocks * 512), 512 (1 blocks) avail, max file size 0
UUID: 00000000-00000000-0000-000000000000
Partitions spanned (on "notDCS"):
Is Native Snapshot Capable: NO
/dev/disks #
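Both reports above say devfs-1.00 with an all-zero UUID rather than VMFS-5.60. That appears to be because vmkfstools -P describes the file system containing the path it's given, and the entries under /dev/disks live on devfs, so these two runs don't say much about either VMFS volume; the VMFS details came from the earlier runs against files on the datastore. Here's a small Python helper (my own, not VMware's) that pulls the type, version, span and UUID out of saved output, which makes that difference easy to spot.

```python
import re

ZERO_UUID = "00000000-00000000-0000-000000000000"
# e.g. "VMFS-5.60 file system spanning 1 partitions." or "devfs-1.00 file system ..."
HEADER_RE = re.compile(r"^(\S+)-(\d+\.\d+) file system spanning (\d+) partitions", re.M)
UUID_RE = re.compile(r"^UUID: (\S+)", re.M)


def describe_fs(report: str) -> dict[str, object]:
    """Pull the file-system type, version, span and UUID out of 'vmkfstools -P' text."""
    header = HEADER_RE.search(report)
    if header is None:
        raise ValueError("unrecognised 'vmkfstools -P' output")
    uuid_match = UUID_RE.search(report)
    uuid = uuid_match.group(1) if uuid_match else None
    fs_type = header.group(1)
    return {
        "type": fs_type,
        "version": header.group(2),
        "partitions": int(header.group(3)),
        "uuid": uuid,
        # Anything other than VMFS with a real UUID means no datastore was queried.
        "is_vmfs_datastore": fs_type == "VMFS" and uuid not in (None, ZERO_UUID),
    }
```

Run against the earlier report for vzilla-ws2012r2e_3.vmdk, it returns type VMFS, version 5.60, 1 partition and UUID 51286ca4-ef967828-664d-001b2129ad71, while both /dev/disks reports come back as devfs with is_vmfs_datastore set to False.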
Good news, the best outcome I could have hoped for. No data lost. No corruption of the VMFS datastore or of the NTFS drives in the VM. Nice! It saved me from restoring a few terabytes of data, and I learned a bit more about filesystem troubleshooting along the way.
It took an excellent, careful, methodical remote VMware Support technician about 3 hours in a WebEx session earlier to manually resolve the issues with these 2 VMDK files, since he did find there was a lock on them. I had opened a Service Request (SR#) with VMware following the guidelines specified here:
VMware KB: Unable to access certain files on a VMFS datastore
To resolve this issue, file a support request with VMware Support and note this Knowledge Base article ID (1012036) in the problem description. For more information on filing a support request, see How to Submit a Support Request.
I'll be covering this saga, and the exact process for collecting and uploading logs, over at TinkerTry.com, including a video walkthrough. I even captured much of the technical work that was done. That said, admittedly, some of the magic that was done to resurrect the metadata will remain a mystery, since that piece happens back at VMware.
I'm OK with a bit of a black box, given how happy I am that I got all my data back, and the time savings that quick recovery represented.
Clicking the "Answered" button now.
Good for you. How did you manage to resolve it? Did you document it anywhere?
Well, to be fair, I didn't actually resolve it, but VMware Support sure did, by applying a lot of undocumented tricks to the damaged datastore over an SSH session to my ESXi hosts, during a roughly 90-minute WebEx session.
I had trouble getting the complex video published, and didn't see a lot of folks interested (or asking), but I'm glad to hear you are interested.
I'll see what I can dig up, and if I find something specific, I'll share it. But no matter what I find (if anything), it was very clear to me that these fixes were something only VMware Support should be doing, and the support rep acknowledged they're unlikely to publish a customer self-serve KB article on the procedures.
While this response is not likely what you were hoping for, I still hope it helps.
aidinbaran, I was wrong: with over 2,800 views on this thread, there is interest. I'm sorry to let folks down on getting a video of this. I'll try harder to see what I can scrape up, but again, please don't pin your hopes on what I might find actually helping folks (unsupported, can't open an SR#) fix their situation on a self-serve basis.
I appreciate your bringing my attention back to this!
Thanks pbraren,
I am having exactly the same problem you described here. I get the following error:
Failed to start the virtual machine.
Module DiskEarly power on failed.
Cannot open the disk '/vmfs/volumes/5324b52d-ba6a77e8-1519-e8393521f86e/Mailserver/Mailserver.vmdk' or one of the snapshot disks it depends on.
22 (Invalid argument)
I have been working to fix the issue for the last two days without any success. It's really giving me a headache, since there's important data there that I need to save 😞
Any kind of help would be highly appreciated.
Check the vmkernel.log for any corruption messages.
--Avinash
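Following Avinash's suggestion, here's a minimal Python filter for a copy of vmkernel.log (on ESXi 5.x it normally lives at /var/log/vmkernel.log). The keyword list is purely my assumption for illustration, not an official list of VMkernel corruption messages, so treat hits as leads to read in context rather than a diagnosis. Adding the affected datastore's UUID as a keyword narrows the output to that volume.

```python
from typing import Iterable, Iterator

# Assumption: illustrative keywords only, not an official list of VMkernel messages.
DEFAULT_KEYWORDS = ("corrupt",)


def matching_lines(lines: Iterable[str], keywords: Iterable[str] = DEFAULT_KEYWORDS) -> Iterator[str]:
    """Yield each line containing any keyword, compared case-insensitively."""
    needles = [keyword.lower() for keyword in keywords]
    for line in lines:
        lowered = line.lower()
        if any(needle in lowered for needle in needles):
            yield line.rstrip("\n")


def scan_log(path: str, keywords: Iterable[str] = DEFAULT_KEYWORDS) -> list[str]:
    """Read a log file (undecodable bytes replaced) and return the matching lines."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return list(matching_lines(log, keywords))
```

For example, scan_log("vmkernel.log", ["corrupt", "5324b52d-ba6a77e8-1519-e8393521f86e"]) returns every line mentioning either keyword or the Mailserver datastore's UUID.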
Hi pbraren,
Any updates on the video? I am still trying, with two other fellow ESXi experts by my side, but no success yet!
Hi, I have the same problem. Can you remember the fix?
It was a carefully constructed, manually crafted recovery plan that VMware support had to perform for me, saving my data, a tale I retold in brief form here:
http://TinkerTry.com/a-bunch-of-stuff-broke-this-month-learned-a-lot-fixing-it-all
but with more details right here in this forum thread. I don't believe the Camtasia recording of the video turned out to be a good one, nor would it have been a valid way for others to follow along and create their own recovery plans.
I hope this encourages you, if the data is valuable, to get a Service Request opened with VMware Support, even if there's a cost.
If you feel there's still value in me trying to dig up (and narrate) that video so I can share it, let me know, and I'll try to prioritize sifting through the footage and rechecking whether there's anything good in there, on a best-effort basis.