0 Replies Latest reply on Jun 10, 2014 4:06 AM by rgilja

    VMDK Corrupt - probably after RAID10 failure

    rgilja Lurker

      Hi,

       

      Our customer has an IBM x3400 running ESXi 4.1, build 800380.

       

      They have an LSI RAID controller with 8 disks. 4 disks are set up as RAID 10 for the system drives, and 4 disks are set up as RAID 5 for data.


      Since we don't have hardware monitoring on the server, we missed the fact that 1 drive had failed in the RAID 10. Then, suddenly, another drive failed and the VMs powered off. After talking to IBM we managed to force the last failed drive back online, and the second drive that failed was replaced. The RAID is optimal now and everything seems fine, except that we can't back up the system C: drive of one of the VMs.

       

      The servers are running fine; only the backup of this particular VM's system C: drive is failing. The VM is a Windows Server 2003 Standard machine with 4-5 virtual disks. For some reason (probably a misconfiguration during P2V), DISK 1 is the data disk (marked as the boot partition) and DISK 2 is the system C: drive (marked as the system partition). Anyway, this has been running fine for years, so I don't think this is the cause.

       

      1. I've tried cloning the disk to a brand-new datastore, which fails. I've also tried downloading the VMDK, and that fails too. I can back up the server using Veeam if I exclude DISK 2 (System C:).

      2. I've run the command chkdsk c: /r /f with no luck. It said it repaired some sectors, but clearly it did not, because when I run chkdsk c: again it reports 8 KB in bad sectors.

      3. We've tried taking manual snapshots and then deleting them. No luck.

      4. We have tried shutting the VM down, clicking Migrate, and choosing to move DISK 2 to the RAID 5 datastore, where all the other VMDKs are. No luck.
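
      Since every full copy of the disk fails (clone, download, migrate), one way to narrow things down might be to read the disk file block by block and note which offsets cannot be read. The following is only a rough sketch under some assumptions: the flat VMDK (or a partial copy of it) is reachable as a regular file on a machine with Python 3, and it is not locked by a running VM. The function name and the path in the usage comment are hypothetical.

```python
import os


def find_unreadable_ranges(path, block_size=1024 * 1024):
    """Read `path` block by block and return a list of (offset, length)
    tuples for every block whose read raised an OSError."""
    bad = []
    size = os.path.getsize(path)
    # O_BINARY only exists on Windows; fall back to 0 elsewhere.
    flags = os.O_RDONLY | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags)
    try:
        offset = 0
        while offset < size:
            length = min(block_size, size - offset)
            try:
                # Seek explicitly so a failed read does not derail later blocks.
                os.lseek(fd, offset, os.SEEK_SET)
                os.read(fd, length)
            except OSError:
                bad.append((offset, length))
            offset += length
    finally:
        os.close(fd)
    return bad


# Hypothetical usage (the path is a placeholder, not a real file here):
#   for offset, length in find_unreadable_ranges("/path/to/disk2-flat.vmdk"):
#       print("unreadable block at byte", offset, "length", length)
```

      An empty result would only mean that every block could be read on that machine; it says nothing about whether the data inside the blocks is intact.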

       

      We have a Veeam backup of the VM that seems to be okay, but we'd rather not restore the whole VM because it's been close to a month since the backup last ran successfully. I've been thinking about restoring only the C: drive from the Veeam backup to see if that works, but could that do more damage to the guest OS? They're running Exchange on the server and some Visma products. I suspect the databases would end up corrupted if I restore only the C: drive to an earlier point in time.

       

      Since this is a production VM with close to 400 GB of data, we haven't had the chance to shut it down to perform a V2V conversion and see if that works. If that doesn't work either, I'm out of ideas.

       


      I would like to thank you guys in advance for any help and tips.

       


      Best Regards,


      Ruben Gilja