We have 21 ESX servers all sharing the same set of LUNs. When the 21st ESX server was added some weeks ago, one of the LUN IDs was set incorrectly on the SAN side, so that this one LUN was presented to the 21st server with a different signature from the one the other 20 ESX servers were already using. We did not discover the error right away.
The 21st ESX server generated an error because it saw the different signature, and it disabled access to the LUN. Over this past weekend we had some major SAN maintenance that caused all the ESX servers to go through failover on the HBAs. The net result is that only 3 ESX servers can still see the LUN (the ones hosting the VMs on it); all the other ESX servers show a broken link. We have since corrected the LUN ID for the 21st server on the SAN side, but reboots and rescans of that server still do not clear up the problem.
VMware support suggests the following:
1. Power off all the VMs on the LUN in question.
2. Turn on resignaturing on the 21st server.
3. Rescan the 21st server.
4. Turn off resignaturing on the 21st server.
5. Rescan all the other 20 ESX servers.
The VMs on the LUN will then be orphaned, so we will have to add them back to inventory before we can power them on again.
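To make the order of operations concrete, here is a rough sketch of how I understand the sequence support described. It only builds a list of (host, command) pairs and executes nothing. The command strings are ESX 3.x service console commands as I understand them (esxcfg-advcfg, esxcfg-rescan, vmware-cmd); the host names, the adapter name vmhba1, and the .vmx path are made-up placeholders, and I am assuming the .vmx path is still valid after the resignature, which may not hold if the datastore label changes.

```python
from typing import List, Tuple

# A step is (host to run on, service console command). Nothing is executed here.
Step = Tuple[str, str]


def build_resignature_plan(resig_host: str,
                           other_hosts: List[str],
                           vms: List[Tuple[str, str]],
                           adapter: str = "vmhba1") -> List[Step]:
    """Build the ordered steps for the resignature procedure.

    resig_host  -- the host that does the resignature (the 21st server)
    other_hosts -- every other host that shares the LUN
    vms         -- (owning host, path to .vmx) for each VM on the LUN
    adapter     -- HBA to rescan; "vmhba1" is a placeholder assumption
    """
    plan: List[Step] = []

    # 1. Power off every VM on the affected LUN, on the host that owns it.
    for owner, vmx in vms:
        plan.append((owner, f"vmware-cmd {vmx} stop"))

    # 2-4. Enable resignaturing, rescan, then disable it again on one host only.
    plan.append((resig_host, "esxcfg-advcfg -s 1 /LVM/EnableResignature"))
    plan.append((resig_host, f"esxcfg-rescan {adapter}"))
    plan.append((resig_host, "esxcfg-advcfg -s 0 /LVM/EnableResignature"))

    # 5. Rescan all remaining hosts so they pick up the new signature.
    for host in other_hosts:
        plan.append((host, f"esxcfg-rescan {adapter}"))

    # 6. Re-register the orphaned VMs (assumes the .vmx path is unchanged).
    for owner, vmx in vms:
        plan.append((owner, f"vmware-cmd -s register {vmx}"))

    return plan


if __name__ == "__main__":
    # Hypothetical host names and path, for illustration only.
    demo = build_resignature_plan(
        "esx21",
        ["esx01", "esx02"],
        [("esx01", "/vmfs/volumes/lun5/vm1/vm1.vmx")],
    )
    for host, command in demo:
        print(f"{host}: {command}")
```

The point of writing it out this way is that resignaturing is switched on for exactly one rescan on one host, and the VMs stay powered off from the first step through the last, which is where the downtime comes from.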
I have two questions. Is there any solution that would not require downtime for the VMs? And how is it that a LUN ID problem on this one server could affect all the other ESX servers, and not just the 21st?
-Robert
Tags: luns, resignature