As many of us already know, there are issues with running Microsoft clusters (MSCS) in VMware that warrant some planning.
First of all there's the requirement for storing the boot disk of the VM on a _local_ (as in Direct Attached) file system, which is a rather severe limitation.
This in turn makes it impossible to run MSCS on a boot-from-SAN ESX server, since such a server has no local drives.
Little information has been provided by VMware as to why you aren't allowed to store the boot disk of a cluster node on SAN, but in this thread a VMware technician explains:
if we have swapped out a page for your VM to SAN and there is a failover or some other event we have no option but to stall the VM (e.g. it will not get any CPU time) until the page has been brought in
If I understand that statement correctly, we are talking about overcommitment of memory and VMware swap here. If an ESX server is overcommitting memory and the MSCS node has active memory in swap, the VM must be paused while that memory is fetched from disk. This is not normally a problem, BUT: in a SAN with multiple paths, you sometimes experience path failovers, which ESX typically takes 30-45 seconds to handle. If the swapped-out pages needed to populate the active memory of a cluster node happened to reside on this temporarily unavailable disk, the VM in question would be stalled for the entire failover, up to 45 seconds - and that is long enough for the sister cluster node to detect a cluster failure and initiate failover actions.
The scenario above is my personal best guess as to what the actual reason for VMware's rather strange requirement might be.
However, if the assumptions I describe above are correct, then the root cause of the problem is memory overcommitment combined with the likelihood of path switching in a SAN, as opposed to a local storage environment.
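To make the timing argument concrete, here is a minimal sketch of the comparison involved. The heartbeat interval and the number of tolerated missed heartbeats are hypothetical placeholders, not MSCS defaults; only the 45-second path failover figure comes from the discussion above.

```python
def stall_triggers_failover(stall_seconds, heartbeat_interval_s, missed_heartbeats_allowed):
    """Return True if a node stalled for stall_seconds would be declared dead by its peer.

    The peer is assumed to give up once the stall spans the full detection
    window (interval * tolerated misses). This is a simplified model.
    """
    detection_window_s = heartbeat_interval_s * missed_heartbeats_allowed
    return stall_seconds >= detection_window_s


# Worst-case path failover time mentioned in the text.
PATH_FAILOVER_S = 45
# Hypothetical cluster heartbeat settings, for illustration only.
HEARTBEAT_INTERVAL_S = 1.0
MISSED_HEARTBEATS_ALLOWED = 5

print(stall_triggers_failover(PATH_FAILOVER_S, HEARTBEAT_INTERVAL_S, MISSED_HEARTBEATS_ALLOWED))
```

With any detection window shorter than the path failover time, the stalled node is treated as failed even though nothing is actually wrong with it.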
If so, then one way to work around the problem would simply be to reserve *all* the memory of the VM. By doing so you guarantee that VMware will never swap out any of its pages to VM swap, and hence a path failover will not cause the VM to pause.
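As a rough way to audit this, the sketch below checks whether a VM's configuration reserves all of its memory. It assumes the reservation appears in the .vmx file as `sched.mem.min` (in MB) alongside `memsize`; verify those key names against your own ESX version before relying on it. The helper names are made up for this example.

```python
def parse_vmx(text):
    """Parse .vmx-style 'key = "value"' lines into a dict with lowercased keys."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not a key/value pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip().lower()] = value.strip().strip('"')
    return settings


def memory_fully_reserved(settings):
    """True if the reservation (assumed key: sched.mem.min) covers all configured memory."""
    configured_mb = int(settings.get("memsize", 0))
    reserved_mb = int(settings.get("sched.mem.min", 0))
    return configured_mb > 0 and reserved_mb >= configured_mb


sample = '''
memsize = "4096"
sched.mem.min = "4096"
'''
print(memory_fully_reserved(parse_vmx(sample)))
```

A VM that passes this check should have no pages eligible for VMware swap, which is the property the workaround depends on.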
Furthermore, the restrictions imposed by VMware to support MSCS in a virtualized environment are in and of themselves an increased risk to your platform! If you think about it, the reason failovers occur in a SAN is that some component just failed. If the same happened in a local storage system, your disk would most likely be gone - period. And any failure which causes the ESX server to go offline will take your cluster node down too - with no way of starting the VM from another ESX host, since it resides on a local disk. Lastly, there's the fact that ESX servers by design contain no vital data locally and thus do not require any specific data protection measures beyond the obvious - this all changes when you start hosting production VMs on local storage.
Personally, even for a production platform, I am far from convinced that going with local storage for the boot disk is the correct solution to this problem.
Tags:
mscs,
clustering,
local_storage