Hello everyone,
We are having lots of problems with multipathing and losing connection to the SAN.
If you can, please help or speak up if you are having similar problems.
There doesn't seem to be a detailed reference anywhere on how the MRU and Fixed policies work with a SAN that wants to control which paths are used, or on the situations in which MRU or Fixed should be used.
We have:
6 IBM 445s, each with 2 QLogic HBAs (hba0 & hba1), about 15 VMs per server
2 McData fiber switches (Switch0 & Switch1)
IBM FastT900 with 2 Storage Processors (SP0 & SP1)
7 400GB LUNs - 6 each assigned to a single VMware server and 1 shared
each hba0 connects to Switch0, which connects to SP0
each hba1 connects to Switch1, which connects to SP1
The default VMware install set the policy to Fixed path.
From the get-go, the fixed/preferred path on the VMware server did not match the preferred path on the FastT.
First Incident:
Three VMware servers lost connection to the SAN. We thought that was because ADT (Auto Logical Drive Transfer) on the FastT was fighting with the VMware drivers over which path should be used.
We disabled ADT on the FastT900.
A couple of weeks later
Second Incident:
One VMware server lost connection to the SAN. This caused 18 VMs to die, and the VMware server could not see its 400GB LUN. The only thing we could find was that the preferred path on the FastT900 did not match the preferred path on the VMware server. We changed the FastT900 to match the preferred path of the VMware server, restarted the VMware server, and everything was good again.
The only documentation we could find mentioned that ESX 2.1 uses MRU with most SANs. This leads us to believe that ESX 2.1, for some reason, decided that a Fixed policy would be best when talking to the FastT900. That being said, does the Fixed policy try to detect the preferred path settings from the SAN, or does it define the preferred path on its own?
From what we have seen, it looks like the VMware servers fight with the FastT900 over which path to use. When a failover tries to occur and both paths look available to VMware, the failover takes a very long time: long enough to kill all the VMs and force us to rescan the SAN or reboot before the LUNs become available again.
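To make the question concrete, here is a rough sketch of how the two policies are usually described in general terms: Fixed goes back to its preferred path whenever that path is usable, while MRU stays on whatever path it used last. This is only a toy model under that assumption, not the actual ESX 2.1 driver logic, and the path names ("hba0:SP0", "hba1:SP1") and function names are made up for illustration. It does not answer whether Fixed picks up the preferred path from the SAN.

```python
def select_path(policy, current, preferred, available):
    """Pick the next path under a simplified (assumed) policy model."""
    if policy == "fixed" and preferred in available:
        # Fixed: return to the preferred path whenever it is usable.
        return preferred
    if current in available:
        # MRU (and Fixed while preferred is down): stay on the working path.
        return current
    # Current path is gone: fail over to any remaining path.
    remaining = sorted(available)
    return remaining[0] if remaining else None


def simulate(policy, events, preferred="hba0:SP0", start="hba0:SP0"):
    """Replay a list of 'available paths' sets and record the chosen path."""
    current = start
    history = []
    for available in events:
        current = select_path(policy, current, preferred, set(available))
        history.append(current)
    return history


# Hypothetical sequence: both paths up, SP0 path lost, SP0 path restored.
events = [
    {"hba0:SP0", "hba1:SP1"},
    {"hba1:SP1"},
    {"hba0:SP0", "hba1:SP1"},
]

history_fixed = simulate("fixed", events)
history_mru = simulate("mru", events)
print("fixed:", history_fixed)
print("mru:  ", history_mru)
```

In this model the only difference shows up in the last step: Fixed moves back to hba0:SP0 as soon as it reappears, while MRU stays on hba1:SP1. If the FastT900 also has its own idea of which SP should own the LUN, that move back is where the two sides could end up disagreeing.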
We tested failover initially by pulling the fiber from one of the switches to the SP; when this is done, failover happens quickly. We took that method as a good way to test failover.