VMware Cloud Community
zwhitea
Contributor

What happens when back end storage fails

I cannot find much information about this. We use NAS with NFS as our storage. It is clustered, but a cluster failover causes us problems. A failover usually takes 10+ seconds, and during that time the Windows-based virtual machines blue screen because they realise that they no longer have a virtual disk.

Is it supposed to work like this?


Accepted Solutions
jpdicicco
Hot Shot

Review this article. Some values are Exchange-specific, but it covers the registry value in question:

JP

7 Replies
TobiasKracht
Expert

You are possibly using active-passive HA storage. To avoid this situation you need active-active HA storage, such as StarWind 5, DataCore SANmelody, or LeftHand.

StarWind Software R&D http://www.starwindsoftware.com
mcowger
Immortal

It can. You can change the disk timeout in Windows so that the guest waits longer before it blue screens.

--Matt

VCP, vExpert, Unix Geek

raadek
Enthusiast

"We use NAS and NFS as our storage, it is clustered but a cluster failover causes us problems."

What particular storage device are we talking about? NetApp? EMC NS? Something else?

Is the cluster failover you are talking about related to failing over from one storage controller to the other? Is it done purely for testing purposes, or is it triggered by something else?

Are you using any form of cross-stack LACP port grouping (like Cisco EtherChannel) that allows both network paths to be active?

Regards,

Radek

zwhitea
Contributor

We are using NetApp clustered filers in active-passive mode. I have now set the disk timeout value on the Windows virtual servers to 125 seconds, so hopefully they will be OK now. It does raise the question of how long a Windows server can survive without writing to disk.

raadek
Enthusiast

"We are using NetApp clustered filers in active-passive mode. I have now set the disk timeout value on the virtual Windows servers to 125 seconds"

You are on the right track - the actual NetApp recommendation is to set it to 190 seconds.

You can read more here (provided you have a NetApp NOW login):

https://now.netapp.com/Knowledgebase/solutionarea.asp?id=kb41511
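
For anyone who wants to script the change, here is a minimal sketch in Python that builds a .reg file for the disk timeout. It assumes the value discussed in this thread is the standard Windows `TimeOutValue` DWORD under `HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Disk` (the thread itself never spells out the key), and the function name `build_reg_file` is made up for illustration. Treat it as a starting point, not a vendor-approved procedure.

```python
# Sketch only: generates the text of a .reg file that sets the guest
# disk timeout. The registry path below is an assumption (the usual
# Windows disk timeout location); it is not quoted from this thread.
REG_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Disk"
VALUE_NAME = "TimeOutValue"


def build_reg_file(timeout_seconds: int) -> str:
    """Return .reg file text that sets the disk timeout, in seconds."""
    if not 0 < timeout_seconds <= 0xFFFFFFFF:
        raise ValueError("timeout must fit in a positive 32-bit DWORD")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{REG_KEY}]",
        # .reg files store DWORDs as 8 hexadecimal digits.
        f'"{VALUE_NAME}"=dword:{timeout_seconds:08x}',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    # 190 seconds is the figure recommended above.
    print(build_reg_file(190))
```

The output can be saved as a .reg file and imported inside each Windows guest. A guest restart is typically needed before the new timeout takes effect, so check that against the KB article for your Windows version.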

Regards,

Radek

AaronDelp01
Enthusiast

If you don't modify the systems, you will not fail over without blue screens. On NetApp, a cluster failover takes a minimum of around 40 seconds. If you don't increase the I/O timeout value on EVERY VM, you will probably blue screen some of them during a takeover. I have done a bunch of installs, and setting this value is mandatory if you want to survive a failover. You will need a NetApp NOW account to see the article.

https://now.netapp.com/Knowledgebase/solutionarea.asp?id=kb41511
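
To make the "EVERY VM" point concrete, here is a rough sketch in Python that flags guests whose disk timeout is no longer than the failover window. The VM names and timeout values are invented for illustration, and `vms_at_risk` is a hypothetical helper, not part of any NetApp or VMware tool. How you collect the real timeouts from your guests is up to you.

```python
from typing import Dict, List


def vms_at_risk(timeouts: Dict[str, int], failover_seconds: int) -> List[str]:
    """Return the names of VMs whose disk timeout does not exceed the failover time."""
    return sorted(
        name for name, timeout in timeouts.items()
        if timeout <= failover_seconds
    )


if __name__ == "__main__":
    # Hypothetical inventory: VM name -> configured disk timeout in seconds.
    inventory = {"web01": 10, "db01": 125, "mail01": 190}
    # 40 seconds is the minimum failover time mentioned above.
    print(vms_at_risk(inventory, 40))
```

Any VM that shows up in the list still needs its timeout raised before you test a takeover.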

In addition, if you haven't already, take a look at the NetApp/VMware Best Practices document. It has values to set at the ESX/vSphere level that help with performance and stability during a failover. Read the doc and perform all the edits mentioned.

http://blogs.netapp.com/virtualization/2009/07/new-tr3749-netapp-vmware-vsphere-best-practices.html

Lastly, make sure the network interfaces on your NetApp are properly configured. There is a tool called Cluster Configuration Checker. Run it to make sure both heads are configured properly. I wrote an article on it a while back. Here is the link:

http://blog.aarondelp.com/2009/10/netapp-cluster-confiugration-tool.html

If you do all of the above, the systems will be rock solid during failover. I have installed this and tested it for both LUNs and NFS many times.

Aaron Delp

http://blog.aarondelp.com
