VMware Cloud Community
foofighter26
Enthusiast

vSphere virtual machine caching to memory when storage is lost

Hi,

We are testing our disaster recovery and have noticed that when SAN connectivity is lost, the virtual machines continue to cache commands in memory; when SAN connectivity is restored, the cached commands are executed in one go. The VM also still responds to pings, but I would guess that is because the TCP stack is loaded into memory as well?

We tested this by kicking off a copy from a VM to a workstation, disconnecting the SAN for 10 minutes or so, then reconnecting it; the copy then springs back to life and carries on. I know the OS will try to clear its cached memory as soon as it can write the data, so that would explain the dump of data once the connection is restored.

Is there any way to stop or reduce this caching? We would prefer the VM to fail pretty quickly, as there will be issues with VMs staying up but not actually working correctly.

We also tested this on an ESX 3.5 VM connected to the same SAN etc., and it fails instantly, so this seems to be a vSphere "feature".

Thanks in advance

2 Replies
RParker
Immortal


Is there any way to stop or reduce this caching as we would prefer the VM to fail pretty quickly as there will be issues with VMs staying up but not actually working correctly?

Well, if you mean that the OS inside the VM is resilient, yes. They have improved logic. It is the OS that needs to fail, NOT the VM, so unless you can find a way to make the OS fail, I don't see how you can get this to work.

Besides, you WANT the OS to keep going as long as possible and NOT crash... That's how you keep from corrupting data.

Anders_Gregerse
Hot Shot

One possibility is to reduce the disk timeout, which is recommended to be set to 120 seconds, to something lower. It is located at HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Disk\TimeOutValue in Windows (I don't know how to set it in other OSes). Bear in mind that extensive testing is recommended, because it's a change against the recommended guidelines.
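
For anyone who wants to script that change, here is a rough Python sketch rather than a tested procedure. It assumes TimeOutValue is a REG_DWORD measured in seconds, that the script runs with administrator rights inside the Windows guest, and that a restart may be needed before a new value is picked up. The function names and the 60-second figure are only illustrative; the right value is something you would have to find by testing.

```python
DISK_KEY = r"SYSTEM\CurrentControlSet\Services\Disk"
VALUE_NAME = "TimeOutValue"


def validate_timeout(seconds):
    """Return the timeout as an int, rejecting values that do not fit a DWORD."""
    seconds = int(seconds)
    if not 1 <= seconds <= 0xFFFFFFFF:
        raise ValueError("timeout must be between 1 and 4294967295 seconds")
    return seconds


def reg_file_text(seconds):
    """Build the text of a .reg file that sets the disk timeout.

    This touches nothing on the machine, so it can be reviewed or
    distributed before anyone imports it.
    """
    seconds = validate_timeout(seconds)
    return (
        "Windows Registry Editor Version 5.00\n\n"
        "[HKEY_LOCAL_MACHINE\\" + DISK_KEY + "]\n"
        '"' + VALUE_NAME + '"=dword:' + format(seconds, "08x") + "\n"
    )


def read_disk_timeout():
    """Read the current timeout in seconds (Windows only)."""
    import winreg  # imported here so the module still loads on other OSes

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, DISK_KEY) as key:
        value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
    return value


def write_disk_timeout(seconds):
    """Write a new timeout in seconds (Windows only, needs admin rights)."""
    import winreg

    seconds = validate_timeout(seconds)
    with winreg.OpenKey(
        winreg.HKEY_LOCAL_MACHINE, DISK_KEY, 0, winreg.KEY_SET_VALUE
    ) as key:
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_DWORD, seconds)
```

Calling `print(reg_file_text(60))` gives a file you can look over first, while `write_disk_timeout(60)` changes the live registry directly. Either way, note the current value with `read_disk_timeout()` beforehand so you can put it back.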
