VMware Cloud Community
asark
Contributor

Failed disk on local datastore

Hello all,

I have/had a local datastore comprising 3 disks. One of the disks has failed and, fortunately, I only lost one VM. The issue is that I now have the following:

[root@test-vmw:/dev/disks] vmkfstools -P -h /vmfs/volumes/datastore1
VMFS-5.61 (Raw Major Version: 14) file system spanning 3 partitions.
File system label (if any): datastore1
Mode: public
Capacity 2.7 TB, 1.1 TB available, file block size 1 MB, max supported file size 62.9 TB
Disk Block Size: 512/512/0
UUID: 5710114e-deec2c28-5dda-001018f08ddc
Partitions spanned (on "lvm"):
        t10.ATA_____Hitachi_HUA722010CLA330_______________________JPW9P0N03KWDKD:1
        (device t10.ATA_____ST31000528AS________________________________________6VPG53JQ:1 might be offline)
        t10.ATA_____WDC_WD10EZEX2D22RKKA0_________________________WD2DWMC1S5972516:1
        (One or more partitions spanned by this volume may be offline)
Is Native Snapshot Capable: YES
[root@test-vmw:/dev/disks]
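The missing extent above is easy to spot by eye, but a script can pick it out too, for example before attempting to add space to a spanned datastore. A minimal sketch, assuming the `(device <name> might be offline)` wording shown in this output (other ESXi builds may phrase it differently, and I have not tested it against ESXi's busybox `sed`); `list_offline_extents` is a hypothetical helper name, not a VMware tool:

```shell
#!/bin/sh
# Hypothetical helper (not a VMware tool): read `vmkfstools -P -h` output on
# stdin and print the device name of every extent reported as
# "(device <name> might be offline)", one per line.
list_offline_extents() {
  # In a basic regular expression "(" is literal and "\(...\)" captures,
  # so \1 is the device/partition name between "device " and the next space.
  sed -n 's/^[[:space:]]*(device \([^ ]*\) might be offline)[[:space:]]*$/\1/p'
}

# Example usage on the host (assumes the output format shown above):
#   vmkfstools -P -h /vmfs/volumes/datastore1 | list_offline_extents
```

On the output in this post it would print only the ST31000528AS partition; the summary line "(One or more partitions ... may be offline)" does not match because it does not start with "(device ".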

Now, I am trying to add a new disk to this datastore and I am getting an error:

2018-10-10T09:47:48.872Z cpu2:2099193 opID=2f3a8b94)LVM: 10679: Error adding space (0) on device t10.ATA_____Hitachi_HDP725050GLA360_______________________GEA534RJ14BEDA:1 to volume 57101140-d0590300-8662-001018f08ddc: VMFS volume missing physical exten$

I guess I need to fix the offline issue before I can add any new disks?

Can someone guide me through the steps to fix the above issue?

Thanks in advance

6 Replies
continuum
Immortal

> Now, I am trying to add a new disk to this datastore ...

Really???
When one extent fails, this may render all VMs stored on the combined datastore unusable.
You are extremely lucky that only one VM has been lost so far, and you still want to add more extents?
> I guess I need to fix the offline issue before I can add any new disks?
Yes - you are right.
A few weeks ago I came across a similar case.
The second of three extents was unmountable, and because of that all but one VM was corrupt.
I tried to fix the second extent but could not find what was wrong.
So instead I extracted all VMs and told my customer to build a new datastore.
Looks like you are in the same situation.
Anyway - if you want me to look into this case, I need a VMFS header dump for all 3 partitions.
dd if=/dev/disks/t10.ATA_____Hitachi_HUA722010CLA330_______________________JPW9P0N03KWDKD:1 bs=1M count=1536 of=/vmfs/volumes/somedatastore/extend1.1536
dd if=/dev/disks/t10.ATA_____ST31000528AS________________________________________6VPG53JQ:1 bs=1M count=1536 of=/vmfs/volumes/somedatastore/extend2.1536
dd if=/dev/disks/t10.ATA_____WDC_WD10EZEX2D22RKKA0_________________________WD2DWMC1S5972516:1 bs=1M count=1536 of=/vmfs/volumes/somedatastore/extend3.1536
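The three dd commands above follow one pattern (first 1536 MB of each extent partition, one output file per extent), so they can also be written as a loop. A minimal sketch: `dump_vmfs_headers` and the `DEV_DIR` override are hypothetical names introduced here, the target datastore needs roughly 4.5 GB free (3 × 1536 MB), and a failed read, e.g. from an extent that is offline like the ST31000528AS partition, is reported and skipped instead of stopping the loop:

```shell
#!/bin/sh
# Hypothetical helper: copy the first 1536 MB of each VMFS extent partition
# into OUT_DIR as extend1.1536, extend2.1536, ... (same dd arguments as above).
# Usage: dump_vmfs_headers OUT_DIR DEVICE_PARTITION...
# DEV_DIR defaults to /dev/disks and can be overridden from the environment.
dump_vmfs_headers() {
  out_dir=$1
  shift
  dev_dir=${DEV_DIR:-/dev/disks}
  n=1
  for dev in "$@"; do
    # Keep the "/" between the directory and the device name.
    if ! dd if="$dev_dir/$dev" bs=1M count=1536 of="$out_dir/extend$n.1536"; then
      echo "could not read $dev_dir/$dev (extent offline?)" >&2
    fi
    # Number the output by argument position, so a skipped extent keeps its slot.
    n=$((n + 1))
  done
}

# Example usage on the host (placeholders for the three t10.ATA_... names above):
#   dump_vmfs_headers /vmfs/volumes/somedatastore <extent1>:1 <extent2>:1 <extent3>:1
```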
Download the 3 extend*.1536 files, compress them and provide a download link.
Call me via skype "sanbarrow" when you are ready.
Ulli



________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

asark
Contributor

Hi,

I am unable to run the dd command on the ST31000528AS disk as it will no longer come online - it's completely dead!

So, what you are saying is when a datastore is created spanning multiple disks a failure on one disk can corrupt the whole datastore?

In that case am I better off copying off my remaining VMs while I still can and rebuilding the datastore with the good disks?

Also, just out of interest, is it possible to get, say, a weekly or monthly report on disk health from the host?

continuum
Immortal

> So, what you are saying is when a datastore is created spanning multiple disks a failure on one disk can corrupt the whole datastore?
Yes - that's why I do not recommend using extents - don't even consider it!!!
> In that case am I better off copying off my remaining VMs while I still can and rebuilding the datastore with the good disks?
Absolutely - if that is possible, do that.
If not, let me know. I can assist when this is not possible with standard procedures.

Dave_the_Wave
Hot Shot

I don't know how much storage your VMs are using on the host, but I am sure it is finite.

Don't you think it might be faster to copy everything off it, do whatever you need to do to the host, and then do a fresh install of ESXi on a host that has been given a clean bill of health?

All of that can be done in a few hours or less, as opposed to re-stitching something back together and then being unable to sleep because you are not sure it was done right.

I'm not saying your spanned ESXi setup can't be saved; I'm just telling you that what I do is the fastest for me and gives reliable results.

I do this often when I am swapping all the drives in an HP ProLiant host for larger-capacity ones.

I admit I don't do any of that for failed drives, since Smart Array hardware RAID takes care of all that at the iron level. While the host is up and running in production, I just pull the bad drive that is lit up by its LED and put in a good one; the host keeps running uninterrupted and is unaware of the swap.

If your host is mission-critical for production, you may want to rethink your hardware setup. I don't think cost should really be the deciding factor.

I RAID5 everything, because life is too short for stress.

asark
Contributor

Yep. Obviously this is not a production box; it's just a scratch box.

So, the latest: I have backed up all my VMs by adding some spare disks, creating a second datastore (datastore2) and copying them there. I then deleted the bad datastore1 with the missing extent. However, now I have another problem. I can successfully re-create datastore1 with one of the two available disks.

But when I try and increase capacity I can no longer see the other disk!

Any ideas why this is happening? I created datastore2 with two disks with no problems!

asark
Contributor

OK, I have managed to do this, but only by downgrading from version 6.7 to 6.5!

I tried a fresh reinstall of 6.7, which still gave me the problem. I downgraded to 6.5 and, hey presto, the GUI works! I am sure there must be a command-line route I could have used, but hey: the server is back on 6.7, the new datastore is intact, and all is good in the world again.
