VMware

8 Replies Last post: Apr 15, 2008 4:57 PM by rreynol  

LUN resignature question posted: Apr 14, 2008 4:20 PM

rreynol

We have 21 ESX servers all sharing the same set of LUNs. When the 21st ESX server was added some weeks ago, there was an error in one of the LUN ID settings on the SAN side, so this one LUN was presented to the 21st ESX server with a different signature from the one the other 20 ESX servers were already using. We did not discover the error right away.

The 21st ESX server generated an error because it saw the different signature, and it disabled access to the LUN. Over this past weekend we had some major SAN maintenance that caused all the ESX servers to go through failover on the HBAs. The net result is that only 3 ESX servers can still see the LUN (the ones hosting the VMs on this LUN); all the other ESX servers show a broken link. We have corrected the LUN ID for the 21st ESX server on the SAN side, but reboots and rescans of that server still do not clear up the problem.

VMware support suggests that we:

1. Power off all the VMs on the LUN in question.
2. Turn on resignaturing on the 21st server.
3. Rescan the 21st server.
4. Turn off resignaturing on the 21st server.
5. Rescan all the other 20 ESX servers.

The VMs on the LUN will then be orphaned, so we will have to add them back to inventory before we can power them on again.
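The sequence VMware support suggested can be written out as an ordered plan. The sketch below only builds the list of steps and runs nothing. It assumes the ESX 3.x service console tools `esxcfg-advcfg` (for the `/LVM/EnableResignature` option) and `esxcfg-rescan`; the host names and the `vmhba1` adapter are placeholders, not values from this thread.

```python
from typing import List, Tuple


def resignature_plan(resig_host: str,
                     other_hosts: List[str],
                     hba: str = "vmhba1") -> List[Tuple[str, str]]:
    """Return (where, action) steps in the order support described.

    Nothing is executed; the strings are meant for review only.
    The HBA name is a placeholder and will differ per host.
    """
    steps = [("all hosts", "power off every VM on the affected LUN")]
    # Resignature from a single host only, then switch the option back off.
    steps.append((resig_host, "esxcfg-advcfg -s 1 /LVM/EnableResignature"))
    steps.append((resig_host, f"esxcfg-rescan {hba}"))
    steps.append((resig_host, "esxcfg-advcfg -s 0 /LVM/EnableResignature"))
    # Every other host then rescans to pick up the new signature.
    for host in other_hosts:
        steps.append((host, f"esxcfg-rescan {hba}"))
    steps.append(("VirtualCenter",
                  "add the orphaned VMs back to inventory, then power them on"))
    return steps
```

Calling `resignature_plan("esx21", ["esx01", "esx02"])` returns seven steps. Having the plan as data makes it easy to check that resignaturing is switched off again before any other host rescans.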

I have two questions. Is there any solution that would not require downtime for the VMs? And how is it that this one LUN ID problem impacts all the other ESX servers, and not just the 21st?

-Robert


Re: LUN resignature question

1. Apr 14, 2008 5:55 PM in response to: rreynol
mike008
If the disk signature does not match the characteristics of the LUN as it is presented to the host (flags, LUN ID, etc.), then the ESX host will generally view it as a snapshot LUN. If you set LVM.DisallowSnapshotLun to 0 in Advanced Settings --> LVM on the host, you should then be able to see the LUN. You may run into other issues, though. I have had to deal a lot with signatures and snapshot LUNs, and it can be a bit of a headache.

Although I feel that I don't quite have all the info for your situation to accurately give my $.02, I'll try. After re-reading your post, unless you can present the LUN on the same ID the same way as before, I don't think there is any long-term solution other than what VMware support suggests. You will have to resignature it at some point to match its presentment. With regard to your second question, the key is that the signature must match the presentment. Either the presentment was changed for all the other servers at the same time, or the disk was resignatured to match the presentment on that one server, breaking it for everyone else. One thing I can advise, though: if you have VMs running on that LUN right now (on those three hosts), don't rescan those hosts until you have a plan of attack for this situation, unless you can afford the downtime.
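To make the "signature must match the presentment" rule concrete, here is a small model of how an ESX 3.x host is generally understood to treat a VMFS volume, depending on the two LVM advanced options. It is a sketch and not VMware code: it reduces the presentment check to a comparison of LUN IDs (the real check covers more than that), and the function and enum names are made up for illustration.

```python
from enum import Enum


class Outcome(Enum):
    MOUNT_NORMAL = "mounted normally"
    MOUNT_AS_IS = "mounted with its existing signature"
    RESIGNATURE = "resignatured and mounted under a new label"
    HIDDEN = "not mounted (treated as a snapshot)"


def lvm_outcome(header_lun_id: int,
                presented_lun_id: int,
                enable_resignature: bool = False,
                disallow_snapshot_lun: bool = True) -> Outcome:
    """Simplified decision for one host and one volume.

    The keyword defaults mirror the usual ESX 3.x defaults:
    LVM.EnableResignature = 0 and LVM.DisallowSnapshotLun = 1.
    """
    if header_lun_id == presented_lun_id:
        # Header and presentment agree, so the flags are irrelevant.
        return Outcome.MOUNT_NORMAL
    if enable_resignature:
        # Resignaturing takes precedence over DisallowSnapshotLun.
        return Outcome.RESIGNATURE
    if disallow_snapshot_lun:
        return Outcome.HIDDEN
    return Outcome.MOUNT_AS_IS
```

With hypothetical IDs, `lvm_outcome(5, 7)` gives `Outcome.HIDDEN`, which corresponds to a host with default settings losing access to the LUN. Passing `disallow_snapshot_lun=False` gives `Outcome.MOUNT_AS_IS`, and `enable_resignature=True` gives `Outcome.RESIGNATURE`, which is the case that changes the on-disk signature for every host.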

Mike

P.S. Check out the VMworld 2007 Breakout Session slides for "Top Support Issues & How to Troubleshoot Them (Part I)", Issue #2: VMFS Volumes and Snapshots. May be helpful. Not sure if I can post it here. If I can't, moderators please remove the attachment.

Re: LUN resignature question

4. Apr 14, 2008 6:23 PM in response to: rreynol
mike008

Assuming the LUN has NOT been resignatured, then yes, in theory I believe it can be presented back to the servers the same way it was. Rescan, and if the data in the LVM header matches what is returned to the server, then voila, you should be back to the original configuration. The big question is: what changed about the LUN presentation? Is it just the ID? Something else? This is the area where I didn't feel like I quite had all the info from the original post.

Mike


Re: LUN resignature question

6. Apr 14, 2008 6:50 PM in response to: rreynol
mike008
Unfortunately, that doesn't tell us whether or not the LUN was resignatured. It is being viewed by the 21st ESX server as a snapshot LUN (hence the prefixed name assigned to it), so LVM.DisallowSnapshotLun must have been changed from its default setting of enabled. Hopefully LVM.EnableResignature was not changed and enabled as well. The rest of the ls -al output is just a GUID, not the header info we are looking for.

Unless you have a backup of the VMFS header, we have nothing to compare it to, so the only way to tell is by rescanning the LUN from a server that until then saw the old signature (which I'm pretty sure will bring down your working servers) and then looking at the vmkernel log. Can you sacrifice 1 of 3? Probably not. Otherwise, I am not sure how we would determine whether the signature was rewritten. I think I determined it before by looking through the vmkernel logs of the server I suspected as the culprit. If you know when the 21st server was added, the log from around that time should say that it saw the LUN as a snapshot and, because resignaturing was enabled, resignatured it (or something like that).
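Searching the vmkernel log of the suspect host can be done with a simple keyword filter. The sketch below is only a starting point: the exact wording of the vmkernel messages is not given in this thread, so the default keywords are guesses to adjust, and `/var/log/vmkernel` is assumed to be the log location on an ESX 3.x service console.

```python
from typing import Iterable, List, Tuple


def find_lvm_events(lines: Iterable[str],
                    keywords: Tuple[str, ...] = ("snapshot", "resignatur")
                    ) -> List[str]:
    """Return the lines that mention any keyword, case-insensitively.

    The default keywords are assumptions, not known message text.
    """
    wanted = tuple(k.lower() for k in keywords)
    return [line.rstrip("\n")
            for line in lines
            if any(k in line.lower() for k in wanted)]


def scan_log(path: str = "/var/log/vmkernel") -> List[str]:
    """Apply the filter to a log file; the default path is an assumption."""
    with open(path, errors="replace") as log:
        return find_lvm_events(log)
```

Running `scan_log()` on the 21st server, or on a copy of its rotated logs from the date it was added, and reading the hits in time order would show whether a resignature was logged there.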
