kghammond2009
Enthusiast

vSphere 4.0 Update 1 and MSCS (Microsoft Clustering) on iSCSI

I have spent the past week collecting as much information as I can about MSCS (Microsoft clustering) on VMware using an iSCSI SAN. iSCSI is gaining a lot of ground as a legitimate SAN solution, but it appears both vSphere 4 and Windows 2008 are still in their infancy when it comes to supporting iSCSI as fully as an FC SAN.

Below is a collection of notes that I have put together regarding running an MSCS cluster on vSphere 4 Update 1. As best I can tell, it is impossible to run a fully supported MSCS cluster on an iSCSI SAN while booting from the iSCSI SAN. Disappointing.

So what to do? I will keep updating this post with any additional information that is confirmed or corrected as I continue to research this.

Well here it all is...

vSphere and Microsoft Clustering

Issues:

Here are the supported clustering configurations in VMware:

• The OS or System volume must be located on the ESX host’s local physical disks (which defeats HA and DR) or it is supported on a FC SAN. The individual cluster node’s OS (vmdk) is not supported on an iSCSI LUN. This appears to be the single largest issue that, with iSCSI, we cannot overcome and be in a supported configuration.

• VMware HA is supported. DRS and vMotion are not supported. You can run a clustered node in a DRS cluster, but DRS must be disabled for all clustered nodes.

- This is a by-product of the requirement for independent disks.

• Shared Quorum drives and Shared clustered drives are supported on FC SANs only.

• Round Robin MPIO for multi-pathing is not supported (We are using round robin MPIO).

• I believe snapshots are not supported.

- This is a by-product of the requirement for independent disks.

• Thin provisioned vmdks are not supported

- I believe this applies to shared storage volumes only

• Memory over-commit is not recommended with SWAP stored on SAN.

- I believe the SWAP needs to be located on the local server disks if you are using memory over-commit and your OS vmdk is located on a SAN volume.

- This may have changed in vSphere 4.0 Update 1
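As a quick way to reason about the list above, here is a minimal sketch of a checklist in Python. It encodes only my notes from this post, so it is not an official VMware check, and every key name (`os_disk_on`, `path_policy`, and so on) is made up for illustration.

```python
# Hypothetical checklist built only from the notes in this post.
# None of these keys correspond to a real VMware API or config file.
def support_issues(vm):
    """Return the reasons a clustered VM falls outside the supported setup."""
    issues = []
    if vm.get("os_disk_on") not in ("local", "fc"):
        issues.append("OS vmdk must be on local disk or an FC SAN")
    shared = vm.get("shared_disks_on")  # None means no shared disks at all
    if shared is not None and shared != "fc":
        issues.append("shared quorum/data disks must be on an FC SAN")
    if vm.get("drs_enabled", False):
        issues.append("DRS must be disabled for clustered nodes")
    if vm.get("vmotion_used", False):
        issues.append("vMotion is not supported")
    if vm.get("path_policy") == "round_robin":
        issues.append("Round Robin MPIO is not supported")
    if vm.get("snapshots", False):
        issues.append("snapshots are (I believe) not supported")
    if vm.get("shared_disks_thin", False):
        issues.append("thin provisioned shared disks are not supported")
    return issues


# The setup we would like to run: everything on iSCSI with round robin MPIO.
our_plan = {
    "os_disk_on": "iscsi",
    "shared_disks_on": "iscsi",
    "drs_enabled": False,
    "vmotion_used": False,
    "path_policy": "round_robin",
    "snapshots": False,
    "shared_disks_thin": False,
}

for issue in support_issues(our_plan):
    print("NOT SUPPORTED:", issue)
```

An empty list only means the configuration does not trip any of the items noted here; the VMware guide in the references remains the authority.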

This basically means the following regarding running Exchange or SQL in a cluster on VMware:

• Unless we use an FC SAN, we cannot place a clustered node on the SAN, and thus we cannot use HA, DRS or vMotion while staying in a supported configuration. This defeats the value of virtualizing our clusters for DR purposes.

• Lack of snapshots and round robin IO would also put substantial limits on performance, backups and maintainability.

From what I have read, we should be able to build clusters on vSphere that work with all of the following:

• iSCSI

• vMotion

• HA

• DRS

• Maybe Snapshots

• Maybe Round Robin MPIO

• Shared storage may be slightly challenging for SQL based clusters. I have read mixed reviews of this on iSCSI/NFS storage.

Alternatives and unclear options:

• Windows 2008 supports an MNS cluster (Majority Node Set with a File Share Witness). In this solution, you do not need to use shared storage between clustered nodes. This should get around the FC requirement, but I cannot find any documentation that says whether MNS with a File Share Witness is supported in specific VMware environments.

• For Exchange CCR, with an MNS cluster, there should be no need for any shared storage between nodes, quorum or data.

o But I have read differing opinions on what the effect of vMotion and iSCSI based storage can be in this type of environment. As best I understand it, any SAN based virtual machine can be paused by VMware if there is congestion on the SAN. VMware will pause the VMs temporarily until the SAN catches up. From what I have read, during these short pauses a non-FC clustered node may act like it has gone offline and may initiate a node failover. The same may be true for vMotion, which would explain why it is not officially supported: it may inadvertently cause a node to appear to have gone offline.

o Now from other reading there are people who have numerous clusters running with vMotion and no problems. But vMotion is definitely not supported.

• Additionally, iSCSI based storage is supported by Microsoft in a clustered environment, provided you are using Microsoft's software iSCSI initiator in the OS. The Microsoft iSCSI initiator is also supported in VMware with HA, DRS and vMotion. So this might be another solution.

• In the SQL cluster world, shared storage is required unlike the Exchange CCR which does not require shared storage. This may be an instance where we need to use the Microsoft iSCSI initiator for the shared storage if we want to be close to a supported configuration.
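To make the Majority Node Set idea from the list above concrete, here is a minimal sketch of the vote counting as I understand it. It is a simplified model, not Microsoft's implementation: each node and the File Share Witness gets one vote, and the cluster keeps quorum only while more than half of all votes are present. The function name and parameters are made up for illustration.

```python
def has_quorum(nodes_up, total_nodes, witness_configured=True, witness_up=True):
    """Simplified majority check: one vote per node, plus one for the witness."""
    total_votes = total_nodes + (1 if witness_configured else 0)
    votes_present = nodes_up + (1 if witness_configured and witness_up else 0)
    # Quorum requires a strict majority of all configured votes.
    return votes_present > total_votes // 2


if __name__ == "__main__":
    # Two-node cluster with a File Share Witness (3 votes in total).
    for nodes_up in (2, 1, 0):
        for witness_up in (True, False):
            state = "quorum" if has_quorum(nodes_up, 2, witness_up=witness_up) else "no quorum"
            print(f"nodes up={nodes_up} witness up={witness_up}: {state}")
```

In this model a two-node cluster survives the loss of one node or the loss of the witness, but not both, which is why the witness share has to be reliable.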

With all of that being stated, I think the most supported environment we can get is the following:

• Windows 2008, MNS Cluster for Exchange CCR, no shared storage, no vMotion, HA, no DRS, VM’s might be able to live on iSCSI, no snapshots.

• For SQL, Windows 2008, MNS Cluster, shared storage using MS iSCSI Initiator, no vMotion, HA, no DRS, might be able to live on iSCSI, no snapshots.

References:

Setup for Failover Clustering and Microsoft Cluster Service (vSphere 4.0 Update 1) – VMware publication

http://www.vmware.com/pdf/vsphere4/r40_u1/vsp_40_u1_mscs.pdf

Microsoft FAQ on iSCSI support for clusters with Windows 2003

http://www.microsoft.com/windowsserver2003/technologies/storage/iscsi/iscsicluster.mspx

A thread on how to setup a VM cluster that can be vmotioned (unsupported by VMware)

http://communities.vmware.com/message/625404#625404

Discussion threads and blogs:

pvSCSI driver appears to be incompatible with the Windows 2008 Failover cluster validation tool.

http://communities.vmware.com/message/1490569#1490569

Reference to shared storage being supported only on FC or via software iSCSI initiator

http://communities.vmware.com/message/1285706#1285706

Issues with memory over commitment, SWAP stored on SAN and possible multi-pathing. Also a discussion on the negative issues with the VMware requirement for putting the VM’s on local storage.

http://communities.vmware.com/message/638956#638956

Also discussed in this thread, speculation as to why VMware does not support SAN based VM clusters.

http://communities.vmware.com/message/639452#639452

My personal thoughts about this thread: if you are using a File Share Witness, this should alleviate the issues of SAN connectivity, thus introducing more stability into a SAN based VM cluster.

Additionally, this thread reinforces that you should not put a Quorum drive on an unsupported storage system. Software iSCSI “should work” for a quorum drive, as should using a File Share Witness.

Blog about why vmotion and DRS do not work with a VM clustered node, but HA does work:

http://www.rtfm-ed.co.uk/2007/05/04/vmmscs-clusteringvmware-ha/

There is another thread that discusses changing the shared storage controller to virtual (unsupported by VMware) in order to overcome the vMotion and DRS VM configuration issues.

Blog on how the File Share Witness works and why it is a bad idea to locate the File Share Witness on a Hub Transport server.

http://blogs.technet.com/timmcmic/archive/2009/01/22/exchange-2007-sp1-ccr-windows-2008-clusters-fil...

To sum this up, if we are using a File Share Witness in a VM cluster, the File Share Witness needs to be very reliable and needs to be online throughout any Exchange patching maintenance. We most likely would want to update the policy for the File Share Witness to be more aggressive than the 1 hour default setting.

This is an MSKB article on placing volume mount points on clustered shared disks.

http://support.microsoft.com/kb/947021

A thread on a partially working MNS cluster on VMware with some pros and cons.

http://communities.vmware.com/thread/73508

Some good articles on configuring and managing the File Share Witness in an Exchange CCR cluster and how to manage and move the FSW and how to manage and failover the CCR.

http://technet.microsoft.com/en-us/library/bb676490(EXCHG.80).aspx

http://www.msexchange.org/articles_tutorials/exchange-server-2007/migration-deployment/deploying-exc...

Interesting blog on Exchange 2010, VMware and DAG Groups

http://kennethvanditmarsch.wordpress.com/2009/11/20/vmotion-and-exchange-2010/

VMmatty
Virtuoso

Wow, great post with lots of great info. Thanks for sharing your research with the community.

I try to stick with using the iSCSI Initiator inside the guest for clusters where necessary as you get additional flexibility that way. I've done this with Exchange 2007 CCR clusters, file clusters, SQL clusters, etc. And while I've been able to perform VMotion on those, I wouldn't necessarily recommend it since I have seen bad things happen. As the technology progresses and/or you do not need to access SAN based software (replication, snapshots) from inside the guest, I've seen people start to move away from this and back to using straight VMDK files. MNS clusters like those used by Exchange 2007/2010 make this much easier since they do not require shared storage.

I too am frustrated by the support policies around clustering and VMware and find it to be extremely limiting. But one thing to keep in mind is that you're generally going to speak with the application vendor for an application level issue and not VMware, so their support (to me) is less of a concern. For example, you should always check the Server Virtualization Validation Program to make sure that your configuration is supported on a specific hypervisor. Configurations like Exchange 2007 CCR or Exchange 2010 DAG are supported by Microsoft when running on the VMware platform, so to me that is most important.

Here's a link to Microsoft's SVVP wizard where you can put in your specific configuration and verify support:

http://www.windowsservercatalog.com/svvp.aspx?svvppage=svvpwizard.htm

Matt | http://www.thelowercasew.com | @mattliebowitz
kghammond2009
Enthusiast

I am in the process of trying to build an MS-validated Failover Cluster using the iSCSI initiator on vSphere 4.0 Update 1. I am hitting a few minor hurdles but it seems like it should work.

I would tend to agree that an MS supported cluster is more important than a VMware supported solution. VMware isn't going to support an iSCSI cluster no matter what.

Out of curiosity, from a maintenance perspective: if you have a live cluster running across two nodes and you need to do maintenance on the ESX host, do you fail over the cluster, then take the node offline and do the maintenance? Or do you vMotion the primary node of the cluster to a different host and then do your ESX maintenance?

I also like the idea of the iSCSI initiator over a VMDK. The one downside to a VMDK is that you have less flexibility from your SAN tools, and I wouldn't trust Storage vMotion in a cluster either, so any changes to your VMFS will require taking your cluster offline.

The downside to the iSCSI initiator is that you need one or two more physical NICs to pass VM iSCSI traffic through our ESX hosts. We are already at 9 - 13 NICs per ESX host... With 10 - 15 hosts, that occupies a lot of switching infrastructure.

kghammond2009
Enthusiast

Of note, I added a thread regarding pvSCSI driver causing the Failover Cluster Validation tool to fail.

VMmatty
Virtuoso

For me, if I need to do maintenance on the ESX hosts (like patches), I'll fail over the cluster to one node and shut the other one down. I have used VMotion to move the live cluster nodes in the past but I don't like doing it for production workloads. Using VMotion in that configuration isn't supported by either VMware or Microsoft so I would avoid doing it. In fact I recently tried to do it during a scheduled downtime with a SQL cluster and the timeout during VMotion ended up causing the cluster to fail over. Clusters that don't use shared storage (like the majority node set clustering used by Exchange 2007/2010) seem to handle VMotion a lot better but I'd still avoid it.

I do like using the iSCSI initiator inside the guest since it makes things more flexible if you need to use SAN software. If you don't then actually having a VMDK gives you some options as well. With a VMDK you can use backup software like Veeam Backup which can be nice depending on your backup strategy. Depends on what you're really after. With Exchange 2010 we've been using VMDKs more often since we find that the SAN snapshots aren't really giving us much.

I don't think you need so many NICs. Is there a reason why you can't create VLANs and do VLAN tagging at the portgroup level? Unless you have a really complicated network I can't see putting 13 NICs in an ESX host just to separate out traffic. I would use VLANs if possible to simplify that setup.

Matt | http://www.thelowercasew.com | @mattliebowitz
geddam
Expert

Thanks for sharing your experience! Information given above states pros and cons, really appreciate it!

So, what's your bottom line on MSCS with VMware? Are you happy?

Thanks,,

Ramesh. Geddam,

VCP 3&4, MCTS(Hyper-V), SNIA SCP.

VMmatty
Virtuoso

For me I would be happier with running Failover Clusters on VMware if the support policies weren't so strict. Having to exclude those VMs from HA is a little annoying (though I understand why they do it) and just in general having to be so specific to get support is frustrating.

Most of the major applications like Exchange and SQL are being run in Failover Clusters more and more, especially as their features mature to take advantage of clustering. I'd really like it if I wasn't walking a tightrope when it comes to support.

All of that said - I've called both Microsoft and VMware for support on an Exchange 2007 cluster with iSCSI attached volumes inside the guest and neither had any issue supporting me. That isn't to say that you won't run into a stricter support engineer; you never know.

Matt | http://www.thelowercasew.com | @mattliebowitz
kghammond2009
Enthusiast

My gut feel is that the main issues with vMotion, snapshots, etc occur when you are using a shared quorum drive. In the shared quorum solution, any pauses to the OS caused by iSCSI congestion, vMotion pauses, MPIO path fail-overs, memory swapping, etc all can cause an inadvertent fail-over on the shared quorum since the OS is paused.

The MNS with a File Share Witness should resolve most of these issues since it can tolerate more substantial latencies. The downside to the File Share Witness is that if both nodes go offline for some reason, the cluster will not come back online "automatically" until the node that owned the File Share Witness comes back online.

I think we are going to try to implement our SQL clusters sticking as close to physical boxes as possible and staying as supported as possible. So we will do the iSCSI initiator, shared quorum, no vMotion, no DRS, just HA. Then fail-over clusters when doing maintenance on ESX hosts. It just feels like we should be able to use some of the advanced VMware features like vMotion. Oh well.

For our organization, Exchange 2007 CCR is staying on physical servers after all the research. Keeping FC around just for Exchange CCR is not cost effective. The lack of support for iSCSI based Exchange CCR in VMware is too concerning. As is the case in a lot of organizations, Exchange is probably our most critical application.

On the NIC side: 2 NICs for the Service Console, 2 NICs for vMotion, 2 NICs for iSCSI, 2 NICs for production VLANs, 2 NICs for a separate switch fabric set of VLANs, 2 NICs for DMZ (also on a 3rd switch fabric), and 1 NIC for a direct Internet connection (externally connected test VMs), also on a separate switch fabric. There you have it, 13 NICs. You could argue that in vSphere 4 the Service Console is fully supported on a VLAN, so we probably don't need those two dedicated anymore, but that is still 11 NIC ports.
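For anyone who wants to check the arithmetic or try out a consolidated layout, here is a small sketch of the NIC tally in Python. The role names are just labels taken from the breakdown above, and the 10 and 15 host counts are the range I mentioned earlier in the thread.

```python
# NIC counts per ESX host, copied from the breakdown above.
nic_plan = {
    "Service Console": 2,
    "vMotion": 2,
    "iSCSI": 2,
    "production VLANs": 2,
    "separate switch fabric VLANs": 2,
    "DMZ": 2,
    "direct Internet (test VMs)": 1,
}


def total_nics(plan, without=()):
    """Sum the NICs in a plan, optionally leaving out some roles."""
    return sum(count for role, count in plan.items() if role not in without)


per_host = total_nics(nic_plan)
without_sc = total_nics(nic_plan, without=("Service Console",))
print("NICs per host:", per_host)
print("NICs per host without dedicated Service Console ports:", without_sc)

# Switch ports consumed across the whole farm.
for hosts in (10, 15):
    print(hosts, "hosts need", per_host * hosts, "switch ports")
```

Dropping or merging a role in `nic_plan` shows quickly how much switching infrastructure a VLAN based design would free up.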

VMmatty
Virtuoso

I agree regarding the shared quorum clusters. I think Microsoft is moving away from them so it will probably be less of an issue in the future. I still would stay away from doing VMotion for clusters unless you had no other choice since you definitely run the risk of timeouts and cluster failovers. VMware has been talking about improvements coming to VMotion in the near future, such as big increases in performance, so again this might not be a big deal in the future.

For me I care more about Microsoft support for CCR in a VM, and even running on VMware you are supported. I can live without VMware supporting my cluster.

When you're talking about two nodes going down at the same time and not coming back until the node that owns the FSW does, are you talking about actually putting the FSW on the cluster itself? The article you linked above describes a specific scenario in which putting the FSW on a Hub Transport is a bad idea. While he is technically correct, you could run into the scenario he describes no matter what server you put the FSW on. I don't want to get too far off topic but I do believe that putting the FSW on an Exchange Hub Transport is a best practice. See the following for more info: http://msexchangeteam.com/archive/2007/04/25/438185.aspx

I think you could combine some of those vSwitches and use VLANs to reduce the NIC count. You can still have redundancy between SC, VMotion, VMs, etc, and do it with fewer NICs. Network/vSwitch design is another whole discussion and every environment is different, but in general using VLANs should be able to reduce the total NICs necessary.

Good discussion... Hopefully discussions like this help VMware realize organizations want to run their applications on Failover Clusters running on VMware ESX. Changing their support policies would be another way to steer people away from Hyper-V instead of driving them towards it.

Matt | http://www.thelowercasew.com | @mattliebowitz
DaveBerm
Contributor

If VMware HA were extended to support application monitoring, such that an application specific agent would monitor the availability of the application, would that eliminate the need to cluster within the VM? For instance, what if there were a solution like this...

-Application specific agent runs in the VM and heartbeats to vCenter

-The agent periodically checks the health of the application

-If the agent detects a failure of the application it attempts to fix the problem

-If the agent is unable to fix the problem, vCenter is notified and the VM is restarted

-The process continues....
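The steps above could be sketched roughly as follows. This is only an illustration of the idea, not an existing VMware feature or API: the four callbacks (`check_health`, `attempt_fix`, `send_heartbeat`, `request_vm_restart`) are hypothetical hooks that a real agent would have to implement.

```python
from typing import Callable


def monitor_cycle(
    check_health: Callable[[], bool],
    attempt_fix: Callable[[], bool],
    send_heartbeat: Callable[[], None],
    request_vm_restart: Callable[[], None],
) -> str:
    """Run one pass of the agent loop and report what happened."""
    send_heartbeat()  # tell the management server the agent itself is alive
    if check_health():
        return "healthy"
    # The application is unhealthy: try a local repair, then check again.
    if attempt_fix() and check_health():
        return "repaired"
    request_vm_restart()  # escalate: ask for the VM to be restarted
    return "restart-requested"


if __name__ == "__main__":
    events = []
    # Simulate an application that is down and cannot be repaired locally.
    result = monitor_cycle(
        check_health=lambda: False,
        attempt_fix=lambda: False,
        send_heartbeat=lambda: events.append("heartbeat"),
        request_vm_restart=lambda: events.append("restart"),
    )
    print(result, events)
```

A real agent would call something like `monitor_cycle` on a timer, which is the "process continues" step.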

Would this eliminate the need for MSCS/WSFC? Let me know what you think.

Thanks!

VMmatty
Virtuoso

I don't think anything is going to truly eliminate the need for Microsoft clustering. Microsoft is incorporating application level replication/HA features that rely on some form of clustering. I would much rather rely on the application level awareness than an agent that VMware (or other vendor) has to write, update for new versions, and ultimately certify with Microsoft so that it is supported.

What I'd really like to see is more support from VMware for virtual clusters. I'm not sure how organizations can get to 100% virtualization when there are so many restrictions on support for virtual clusters.

Matt | http://www.thelowercasew.com | @mattliebowitz
vSeanClark
Enthusiast

So for larger shops, are folks building pairs of ESX servers to be purpose built for SQL Clusters? Or are folks still building larger clusters with DRS/HA, but disabling HA/DRS features for the clustered VMs?

Simplest solution (taking into account the restrictions of virtual clusters) seems to be 2 ESX Server clusters w/ DRS/HA disabled.

Please consider voting for my 2010 VMworld Session if you'd like to hear more: http://seanclark.us/?p=314

Sean Clark - vExpert, VCP - http://twitter.com/vseanclark - http://seanclark.us

Sean Clark - http://twitter.com/vseanclark
VMmatty
Virtuoso

I haven't seen folks create dedicated vSphere clusters just for clustered Windows VMs. The extra complexity (and potentially cost) isn't really worth it when you can simply exclude those VMs from DRS/HA and still remain supported. I agree that the dedicated cluster is a simpler solution but may negate some of the cost saving benefits of virtualizing in the first place.

Matt | http://www.thelowercasew.com | @mattliebowitz
aenagy
Hot Shot

vSeanClark:

We have a medium size environment (1000 VMs, 50 hosts, including a half dozen remote offices) and have built two hosts (ESXi 4.0.0 Update 1) exclusively for MSCS virtual machines for exactly the support concerns raised by the original poster. Because there are no non-MSCS virtual machines on these two ESXi hosts, they have not been configured in a VMware Cluster. I have even gone to the extra step of configuring the SAN LUNs for these two hosts in a different Storage Group (we are using CLARiiON CX4 via FC). As a result VMotion to/from the other hosts at the data center is a moot point. We don't use iSCSI in any way, and NFS will only be implemented for Templates and ISO files. We also don't use RDMs, even for MSCS virtual machines. The local (boot) virtual disks for these MSCS virtual machines reside on the SAN, not local HDD of the ESXi host.

All of this was done to ensure that neither vendor (VMware or Microsoft) could point to some cross dependency if we encountered a problem. Otherwise I would have a lot of explaining to do if there were ever a problem.

That being said, I would really like it if both VMware and Microsoft would support VMotion/SVMotion of MSCS virtual machines.

vSeanClark
Enthusiast

aenagy,

I like that idea of not doing RDMs. How does that improve the supportability of the solution? I thought RDMs were required.

aenagy
Hot Shot

vSeanClark:

I should have been more clear: we have not used RDMs for MSCS virtual machines, only because until recently we were doing cluster in a box on ESX 3.0.3. We are just now implementing the second ESXi 4 (build 219382) host (the first one was already upgraded), but I have not had the time to work out the process for our support team. As I re-read the documentation (http://www.vmware.com/pdf/vsphere4/r40_u1/vsp_40_u1_mscs.pdf), it does clearly state on page 21 that RDMs are required for cluster across boxes. It looks like I'm going to have to move up my MSCS on vSphere dev work.

kghammond2009
Enthusiast

Well after a few months of having both a SQL 2005 and a SQL 2008 cluster running on vSphere on Windows 2008 MSCS, our conclusion is that it is not stable in our environment.

We attempted to use the MS iSCSI initiator for a quorum drive and shared storage. We tried to stay as close to a supported configuration as possible. The end result running in test/dev is that the cluster services fail regularly.

I suspect a timing issue of some sort, but what part is failing is hard to pinpoint.

For clarity our infrastructure is as follows:

IBM x3650 VMware vSphere hosts

Windows 2008 Enterprise MSCS using Microsoft iSCSI initiator

HP Lefthand iSCSI SAN

HP ProCurve Network Infrastructure

vSphere using teamed NICs for MS iSCSI initiator traffic

iSCSI SAN on same switch backbone as VMware hosts so no trunking between switches is involved

It is possible that switching to a File Share Witness may stabilize the quorum, but if the quorum drive is having issues, can I really trust the shared storage?

VMmatty
Virtuoso

Sorry to hear you're having problems. I haven't seen those problems on the many different kinds of clusters I've run. I have SQL 2005 clusters running on similar hardware (including LeftHand SANs) and don't have any issues failing over or back. In fact the only issue I saw that caused the cluster to fail was when I tried to VMotion one of the nodes. VMotion isn't supported on those anyway and I was only doing it after hours, but it didn't like it. For all other operations it has been fine.

I'd suspect the networking or the storage as the culprit. I don't think that it is running poorly just because it is running as a virtual machine.

Matt | http://www.thelowercasew.com | @mattliebowitz
UHCU
Contributor

I've been researching how to install a successful SQL2008R2 cluster across ESXi4.x hosts myself, and seen similar replies; some folks have no problem, others experience regular issues with failover.

If you are using a cluster heartbeat with your clustering solution, you might consider changing the default timing of the heartbeat itself:

http://technet.microsoft.com/en-us/library/dd197562(WS.10).aspx

According to that link, the default heartbeat is 1 second, and the default timeout at which a failover is initiated is 5 heartbeats, or 5 seconds, yet the settings can be adjusted so that you have as much as a 20 second delay before a failover is initiated.

Here's the rub: in the world of iSCSI it's not unheard of to sometimes have delays of 5 seconds or more, which is why Microsoft best practice is to adjust one's iSCSI disk timeout to 60 seconds. In fact, a delay of 5-10 seconds is extremely common when it comes to vMotion. I have to wonder, then, whether simply raising the default timeouts for the cluster heartbeat might result in a far more stable failover cluster using iSCSI. It's not as if a regular, non-clustered server (even a SQL or Exchange server) has any particular problems with iSCSI delays, so why should a clustered version, except as regards the default cluster heartbeat timeout? The fact that Microsoft even allows you to set the cluster heartbeat timeout to a maximum of 20 seconds indicates they at least support it (if not exactly recommend it, I would imagine).
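To put numbers on that, here is a rough sketch in Python of how the heartbeat interval and the missed-heartbeat threshold combine into a failover window. The 1 second / 5 heartbeat defaults and the 20 second ceiling are from the article linked above; the split of that ceiling into a 2000 ms interval and a threshold of 10, and all of the names in the code, are my own assumptions for illustration.

```python
def failover_window_seconds(delay_ms, threshold):
    """How long a node can stay silent before the cluster declares it down."""
    return delay_ms * threshold / 1000.0


def survives_pause(pause_seconds, delay_ms, threshold):
    """True if a stall of this length stays inside the failover window."""
    return pause_seconds < failover_window_seconds(delay_ms, threshold)


DEFAULT = {"delay_ms": 1000, "threshold": 5}   # 1 s heartbeat, 5 misses
RELAXED = {"delay_ms": 2000, "threshold": 10}  # assumed split of the 20 s ceiling

if __name__ == "__main__":
    for name, cfg in (("default", DEFAULT), ("relaxed", RELAXED)):
        window = failover_window_seconds(**cfg)
        print(f"{name}: node declared down after {window:.0f} s of silence")
        for pause in (3, 7, 15):
            outcome = "rides it out" if survives_pause(pause, **cfg) else "fails over"
            print(f"  {pause} s stall: {outcome}")
```

With these assumed numbers, a 7 second iSCSI or vMotion stall trips the default settings but not the relaxed ones.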

Anyway, just figured I might suggest this as a way to help with your failover issues.
