I was recently told by an integrator that if I build a Windows 2003 R2 Cluster across physical ESX hosts, I must store the Windows OS vmdk files on ESX local disk and not on the SAN.
Has anyone ever heard/read of this configuration?
Has anyone ever implemented a Windows Cluster in this manner across physical ESX hosts?
Thanks.
Hi weinstein5,
I was not referring to the ESX OS. I was referring to the Windows Guest OS. Sorry if I did not explain the situation clearly enough.
So, with that in mind... any thoughts?
Thanks.
I have also heard this before, but it's not really relevant - if you read the following guide, all is explained: ...
If you found this information useful, please consider awarding points for "Correct" or "Helpful". Thanks!!!
Prior to ESX 3.5 Update 1, VMware only supported Microsoft Clusters in that configuration - where the VMs booted from system disks stored on local VMFS volumes. With ESX 3.5 Update 1, VMware now supports the VMs booting from SAN LUNs:
VMware ESX Server 3.5 Update 1 supports Microsoft Cluster Service. Support is similar to ESX Server 3.0.1 with the following additions:
Both 64-bit and 32-bit Windows 2003 guests are supported with MSCS.
Boot from SAN for VMs using MSCS is now supported.
Majority Node Set clusters with application-level replication (for example, Exchange 2007 Cluster Continuous Replication (CCR)) are now supported.
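For reference, in the cluster-across-boxes setup the shared quorum/data disks hang off a second virtual SCSI controller with bus sharing set to physical, while the boot disk stays on scsi0. A rough sketch of the relevant .vmx entries (quorum.vmdk is just an example name, not anything from this thread):
scsi1.present = "TRUE"
scsi1.virtualDev = "lsilogic"
scsi1.sharedBus = "physical"
scsi1:0.present = "TRUE"
scsi1:0.fileName = "quorum.vmdk"
scsi1:0.mode = "independent-persistent"
The shared disks themselves are RDMs in physical compatibility mode - the .vmdk above is only the mapping file, and it sits on a VMFS volume visible to both hosts.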
Hi alanrenouf,
I read that document some weeks ago, but it didn't answer my specific question. We've been running a Windows cluster for file serving and another Windows cluster for Exchange 2003 on ESX 2.5 for over three years, with all guest OS vmdk files up on the SAN. Never had a problem.
Hi weinstein5,
That's very strange. We've been running a Windows cluster for file serving and another Windows cluster for Exchange 2003 on ESX 2.5 for over three years, with all guest OS vmdk files up on the SAN. Never had a problem.
If putting the guest OS vmdk files on the SAN for Windows clusters was not supported in previous versions of ESX, did that really stop/deter most companies from running Windows clusters that way?
Key word in my explanation is 'supported' by VMware - it does not mean it will not work - and no, I do not think that stopped people from doing it.
I agree, it's always been a support issue until Update 1.
If you found this information useful, please consider awarding points for "Correct" or "Helpful". Thanks!!!
The problem appears to be related to disk timeouts. While it worked previously about 98.5% of the time, there is a scenario in which the node boot disk is located on the SAN along with the quorum/data shared disks. In that scenario, the boot disk connection is "lost" from the SAN, triggering a cluster failover to the passive node, but then the connection to the boot disk returns, and the original master tries to read/write to disks it "knows" to be mastered on it. This is now a split-brain scenario, which can cause many an issue, as some will know if they've encountered it in the physical world. This is why Update 1 includes timeouts related to MSCS that "fix" this scenario, so the limitation for disks to be local is now gone.
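One practical note: the standard guidance for MSCS nodes in VMs is also to raise the Windows disk timeout inside each guest, so a brief loss of the SAN path doesn't immediately register as a dead disk. This is the generic registry setting, not anything specific to Update 1 - run it in each node and reboot (60 decimal is the value usually recommended, but check the MSCS guide for your ESX version):
reg add "HKLM\SYSTEM\CurrentControlSet\Services\Disk" /v TimeOutValue /t REG_DWORD /d 60 /f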
Hope that makes it clear,
-KjB
Hi kjb007,
That's exactly the kind of detailed answer I was looking for.
Thanks.