VMware Cloud Community
socbizkaia
Contributor

Preventing a virtual hard disk failure from stopping the virtual machine

Hello,

I would like to know whether some virtual hard disks of a virtual machine (vmdk or RDM) can be marked so that they don't stop the virtual machine if they become unavailable.

I have a virtual machine with two additional virtual hard disks (in this case vmdk but it would be the same for RDM) from two LUNs located in two different storage systems. The virtual machine performs a software RAID-1 with these two virtual hard disks. So, I would like my virtual machine not to stop if one of these storage systems fails and thus the LUN where the hard disk resides becomes unavailable.

Is this possible?

Thanks in advance,

Christian

7 Replies
daphnissov
Immortal

No, I don't know of a way to do that. Why are you using RAID inside the guest? In general that doesn't make sense in a virtual environment. You're unnecessarily complicating things when you use storage abstractions at both levels.

socbizkaia
Contributor

This was an "imaginative" way to get highly available storage using two different storage systems. If one storage system fails, in theory the data survives thanks to the RAID-1. I know the best solution would be an active-active product such as VPLEX (EMC), HyperMetro (Huawei) and the like, but they thought this could be a real solution, and I found that it doesn't work: if one hard disk becomes unavailable, the virtual machine is stopped.

Christian

nachogonzalez
Commander

Hey, hope you are doing fine

I have seen what you are asking for done with guest OS RAID (LVM or similar), but on VMDKs that is pointless.
Why is it pointless?

1. Virtual disks don't degrade or fail the way physical disks do.

Since they are defined in software, VMDKs don't have hardware failures, which are one of the main reasons to use RAID.

Note: When VMDKs or virtual disks fail or become corrupted, it is almost always because of a host or storage hardware failure (unexpected host shutdown, LUN corruption, bad HA failover, etc.); the virtual disk itself hasn't failed.

2. VMDKs are files consumed from a datastore, which is (in most cases) backed by a storage array LUN, and the LUN itself has some sort of RAID protection.
- If you do have a disk or hardware failure, the whole datastore is affected, and so is every disk in that datastore.

- In that same scenario you would be adding an extra layer of RAID, so you would pay the performance penalty twice or consume double the disk space.
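
To put a rough number on the "double disk space" point, here is a minimal sketch in Python. The function name `usable_fraction` and the 0.8 efficiency figure (a hypothetical 4+1 RAID-5 group on the array) are assumptions for illustration only, not values from any real system.

```python
def usable_fraction(array_efficiency: float, guest_mirror_copies: int = 1) -> float:
    """Fraction of raw disk capacity that ends up holding unique data.

    array_efficiency: usable/raw ratio of the array's own RAID
                      (e.g. 0.8 for a hypothetical 4+1 RAID-5 group).
    guest_mirror_copies: copies kept by in-guest mirroring
                         (1 = no guest RAID, 2 = RAID-1 across two VMDKs).
    """
    if not 0 < array_efficiency <= 1:
        raise ValueError("array_efficiency must be in (0, 1]")
    if guest_mirror_copies < 1:
        raise ValueError("guest_mirror_copies must be >= 1")
    # Each guest-level copy is stored on array-protected capacity,
    # so the two overheads multiply.
    return array_efficiency / guest_mirror_copies


print(f"array RAID only:          {usable_fraction(0.8):.0%}")
print(f"array RAID + guest RAID-1: {usable_fraction(0.8, 2):.0%}")
```

Under these assumed figures, mirroring inside the guest halves whatever usable capacity the array RAID already left you.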

Hope that helps. Please let me know if I can provide further assistance.

continuum
Immortal

> 1. Virtual disks don't degrade or have failures as like physical disks.

Wow - that is a tough statement !!!

In reality, virtual disks can degrade much like physical disks. The difference is that with virtual disks you don't get early warnings as you do with their physical relatives.

Virtual disks can become partly unreadable, with I/O errors caused by VMFS errors or flaws.

They can also become completely unreadable when the VMFS metadata for thin-provisioned vmdks gets damaged during power failures.

Building virtual RAIDs with software solutions is not a bad idea at all.

Actually it makes sense if, for whatever reason, automatic daily backups with third-party tools like Veeam are not an option or are not implemented.

Anyway - do not assume that virtual disks cannot deteriorate as long as the underlying storage is healthy.

They can die like physical disks - they just do not give you early warnings ...

@OP

There is no way to assign a vmdk to a specific VM that would work around ESXi's early "are all assigned vmdks present and usable" check.

You would have to use ugly workarounds via iSCSI ...

But would it be a showstopper? If the software mirror of your important vmdks is missing, you could still restart your VM after a quick reconfig of the VM.

That would not be as flawless as software-mirrored disks on physical hardware, but it is still better than having no backups.


nachogonzalez
Commander

Thanks for the correction, really appreciate it.
I've sent you a private message to discuss virtual disk degradation further, if that's not an issue for you.

Warm regards

socbizkaia
Contributor

Yes, the idea of this software RAID inside the virtual machine is to protect against a total failure of a storage array. The goal was to increase the availability of the virtual machine with a software RAID of two vmdks located on different LUNs of different storage arrays. However, as I verified, if one vmdk of the virtual machine suddenly becomes unavailable (because one of the two LUNs or storage arrays fails), the virtual machine is stopped by VMware. So I don't get increased availability; in fact I get lower availability, because the virtual machine now depends on two LUNs/storage arrays being available instead of only one.
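
The effect on availability can be sketched with a little arithmetic. This is only an illustration: it assumes the two arrays fail independently, and the 0.999 figure is a made-up example, not a measurement of any real array.

```python
def series_availability(*parts: float) -> float:
    """Every part must be up: the VM stops if any assigned disk is missing."""
    result = 1.0
    for availability in parts:
        result *= availability
    return result


def parallel_availability(*parts: float) -> float:
    """At least one part must be up: what the in-guest mirror was meant to give."""
    all_down = 1.0
    for availability in parts:
        all_down *= 1.0 - availability
    return 1.0 - all_down


array_a = array_b = 0.999  # hypothetical availability of each storage array

print(f"single array:              {array_a:.6f}")
print(f"VM needs both arrays:      {series_availability(array_a, array_b):.6f}")
print(f"VM survives on either one: {parallel_availability(array_a, array_b):.6f}")
```

With any availability below 1, the "needs both" figure is lower than a single array's, which is the situation I ended up in; the "either one" figure is what I was hoping for.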

That was the reason I asked if it was possible to signal to vmware "please don't stop the virtual machine if this vmdk becomes unavailable". According to your answers I understand it is not possible.

So now, I think it is better to avoid this stupid idea of software raid inside the virtual machine. It would be better to depend on only one LUN/Storage Array, perform LUN replication between storage arrays and in case of failure in the first storage array I can reconfigure the virtual machine to the location of vmdks in the second storage array.
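
As a rough sketch of what "reconfigure the virtual machine" amounts to, the Python snippet below rewrites disk paths in the text of a .vmx file. It assumes the disks are referenced by absolute `/vmfs/volumes/...` paths in `scsiX:Y.fileName` entries; the datastore names `array1-lun` and `array2-replica` and the helper `repoint_disks` are hypothetical. In practice this change would normally be made through the vSphere Client or API with the VM powered off, so take it only as an illustration of the idea.

```python
import re

# Matches lines such as: scsi0:1.fileName = "/vmfs/volumes/array1-lun/vm01/data.vmdk"
DISK_LINE = re.compile(
    r'^(?P<key>[A-Za-z]+\d+:\d+\.fileName)\s*=\s*"(?P<path>[^"]*)"\s*$'
)


def repoint_disks(vmx_text: str, old_prefix: str, new_prefix: str) -> str:
    """Return vmx_text with disk paths under old_prefix moved to new_prefix.

    Lines that are not disk entries, or whose path does not start with
    old_prefix (for example relative paths), are left unchanged.
    """
    out = []
    for line in vmx_text.splitlines():
        match = DISK_LINE.match(line)
        if match and match.group("path").startswith(old_prefix):
            new_path = new_prefix + match.group("path")[len(old_prefix):]
            line = f'{match.group("key")} = "{new_path}"'
        out.append(line)
    return "\n".join(out)


# Hypothetical .vmx fragment: one local disk, one disk on the failed array.
sample = '''displayName = "vm01"
scsi0:0.fileName = "vm01.vmdk"
scsi0:1.fileName = "/vmfs/volumes/array1-lun/vm01/data.vmdk"'''

print(repoint_disks(sample, "/vmfs/volumes/array1-lun/", "/vmfs/volumes/array2-replica/"))
```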

daphnissov
Immortal

I get the rationale behind what you're trying to do, but you're going to be better served by implementing some sort of replication solution outside the VM. It doesn't even have to be per-LUN replication. There are many solutions that can replicate, fail over, and fail back at a per-VM level. For example, the community favorite around here seems to be Veeam, and it's quite robust. Whichever you choose, it's an availability solution driven at a lower level of abstraction, which is easier to manage and causes fewer problems than in-guest solutions like this.
