We are working with boot from
SAN and installed ESX 4.0 GA on a
After Installation the ESX Server booted with no problems.
We are cloning the iSCSI boot LUN in order to provision more
When I do so and boot the clone on a different physical server (exact same h/w though), I get the error message
"sh: can't access tty; job control turned off"
and the server is not operational.
I must mention that the same process works flawlessly with ESX 3.5.
The problems started with ESX 4.
FYI: I use a QLogic iSCSI HBA and NetApp storage.
Any ideas?
Off the top of my head I can think of one reason why this might be happening. One difference between ESX 3.x and 4.0 is that the service console no longer has a dedicated file system of its own; instead it resides on a VMFS volume.
Furthermore, VMFS has built-in safeguards against accidental presentation of clone or snapshot LUNs, e.g. it requires the volume signature to match the LUN ID. When the LUN is cloned it may get a different UUID from the one hardcoded into the VMFS label, in which case the vmkernel will refuse to touch it unless you either enable resignaturing or set disallowsnapshotlun to 0.
Can you see any warnings in the F11 view of the console?
I do not think cloning is going to work. The cloned LUN is going to have a new LUN UUID, so VMFS would treat it as a snapshot and would not register the file system. If you resignature the volume from another system, it may work.
Thanks for the quick reply.
Can you elaborate on how the ESX gets a new UUID, since I'm using the same LUN ID on the original ESX and on the cloned ESX (always LUN ID 0)?
Is there any way to resignature the LUN from the cloned host, without exposing the LUN to another host just for resignaturing?
BTW, has anyone seen official VMware documentation about boot from SAN with ESX 4? I didn't find any.
Come to think of it, this is actually a very interesting topic.
Based on this blog I assumed that there are storage-system-generated values, in addition to the LUN ID as seen by the ESX host, which go into the VMFS header.
However, according to Chad Sakac from EMC (referenced in Duncan Epping's blog post here) it would appear that only the host LUN ID and SCSI device type are used, which ought to be the same as long as the clone LUN gets the same LUN ID every time.
One question: are you able to switch to the F11 console of the ESX server? This is the log dumper, and errors such as an inconsistent VMFS label should get printed here. If the console window exists and contains no relevant entries, we are probably barking up the wrong tree...
Thanks for the quick response.
I tried to find out how to enable resignaturing on VI4 and only saw this:
<code>[root@****]# esxcfg-volume --help
esxcfg-volume <options>
-l|--list                                 List all volumes which have been detected as snapshots/replicas.
-m|--mount <VMFS UUID|label>              Mount a snapshot/replica volume, if its original copy is not online.
-u|--umount <VMFS UUID|label>             Umount a snapshot/replica volume.
-r|--resignature <VMFS UUID|label>        Resignature a snapshot/replica volume.
-M|--persistent-mount <VMFS UUID|label>   Mount a snapshot/replica volume persistently, if its original copy is not online.
-h|--help                                 Show this message.
</code>
Running esxcfg-volume -l on the original ESX, I get:
VMFS3 UUID/label: 49d22e2e-996a0dea-b555-001f2960aed8/VMFS_1
Can mount: Yes
Can resignature: Yes
Extent name: naa.60a98000503349394f3450667a744245:1 range: 0 - 97023 (MB)
What I understand from this is that LUN resignaturing is turned on for the boot LUN.
Is esxcfg-volume the only way to enable resignaturing with VI4?
I didn't see another option to set it in the GUI, as there was in VI3.
Regarding your question about F11, I will check and update soon.
Okay, from the screenshot I can see we are probably on the right track here. The next service which should have started is mount-root, which is where the service console file system gets mounted. Up to this point I believe that you are seeing messages from vmkernel.
At this stage you may be able to view the contents of /vmfs/volumes - just type 'ls -l /vmfs/volumes' to see whether it has any VMFS filesystems or not. Or maybe there is no /vmfs at this point, since (presumably) you aren't talking to the service console yet. This is new territory, right there.
There is one thing you can test to check if the problem here is that you are using a cloned LUN. You will need an ESX server which didn't boot from the same iSCSI SAN. What you want to do is to present one of your cloned LUNs to this ESX, using SCSI ID 0 (which I am assuming is the ID you created it on), and rescan that ESX server's iSCSI adapter.
If the ESX server discovers your LUN and lists it as a new datastore, using the volume name you assigned to the original VMFS volume, we have pretty much eliminated the volume header as the problem source.
If however your ESX does NOT display the datastore as a new volume, but you can see it in the target list, check your vmkwarning or vmkernel log. It should state that a VMFS was found with an inconsistent header. Alternatively, if your test ESX has been set to enable resignaturing, the volume will have been assigned an auto-created name of the form snap-<UUID>-origname.
Now, if you confirm that ESX considers the LUN to have an invalid volume header, you have a problem. Even if you resignature the volume you will probably still not have a bootable service console, because the COS VMDK location references /vmfs/volumes/<UUID>/.
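To illustrate why a resignatured volume can still leave the service console unbootable, here is a minimal sketch of the path problem described above. The UUID is the one shown earlier in the thread for the original volume; the file name under it and the "new" UUID are hypothetical placeholders, not values taken from a real ESX 4.0 install.

```python
# Sketch only: checks whether a stored /vmfs/volumes/<UUID>/... path still
# points at a volume UUID that exists after a resignature.

OLD_UUID = "49d22e2e-996a0dea-b555-001f2960aed8"   # from the thread
NEW_UUID = "4a000000-00000000-0000-000000000000"   # hypothetical new signature
COS_PATH = "/vmfs/volumes/%s/cos/console.vmdk" % OLD_UUID  # hypothetical file name


def volume_uuid_from_path(path):
    """Return the <UUID> component of a /vmfs/volumes/<UUID>/... path, or None."""
    parts = path.strip("/").split("/")
    if len(parts) >= 3 and parts[0] == "vmfs" and parts[1] == "volumes":
        return parts[2]
    return None


def path_survives_resignature(path, uuids_after):
    """True if the UUID embedded in the path is still among the visible volumes."""
    uuid = volume_uuid_from_path(path)
    return uuid is not None and uuid in uuids_after


print(path_survives_resignature(COS_PATH, [OLD_UUID]))  # signature kept
print(path_survives_resignature(COS_PATH, [NEW_UUID]))  # signature replaced
```

If the second check fails, any boot configuration that hardcodes the old UUID would have to be updated as well, which is the part that is hard to do on a host that cannot boot.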
The only workaround I can think of would be to set disallowsnapshotlun=0 in the advanced configuration for the ORIGINAL ESX installation before you clone it, so that the cloned ESX servers will disregard the inconsistent VMFS header when they boot. However, this is DANGEROUS since you lose the protection against accidentally presented clones of an existing volume. If you run production with this configuration, you WILL get data corruption if you ever present a clone of a VMFS volume alongside its original.
One last note - this final caution is based on ESX 3.5. There may be changes in the multipathing driver of ESX 4.0 which I am not yet aware of and which change the whole "disallow snapshot lun" problem. But I doubt it.
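For readers trying to keep the cases apart, the behaviour described in this post can be summarised as a small decision function. This is only a model of what the thread describes for ESX 3.x-style settings, not VMware's implementation; the function name, the outcome strings, and the precedence between the two settings are assumptions made for illustration.

```python
# Sketch only: models how a VMFS volume with an inconsistent header is handled,
# as described in this thread. Names and return values are illustrative.

def vmfs_mount_outcome(header_matches_lun,
                       enable_resignature=False,
                       disallow_snapshot_lun=True):
    """Return what happens to a discovered VMFS volume.

    header_matches_lun    -- True if the volume header is consistent with the LUN
    enable_resignature    -- resignaturing enabled on the host
    disallow_snapshot_lun -- the "disallow snapshot lun" protection (1 = on)
    """
    if header_matches_lun:
        return "mount"            # normal datastore, original name
    if enable_resignature:
        return "resignature"      # new signature, auto-created snap-... name
    if not disallow_snapshot_lun:
        return "mount-as-is"      # header mismatch ignored (the dangerous case)
    return "refuse"               # volume not registered, warning in the logs


print(vmfs_mount_outcome(False))                               # cloned LUN, defaults
print(vmfs_mount_outcome(False, disallow_snapshot_lun=False))  # protection disabled
```

The "mount-as-is" branch is the workaround discussed above, and it is also the branch that risks corruption if a clone is ever presented next to its original.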
The esxcfg-volume command is the only option for resignaturing a specific volume; using it, you will be able to resignature the replica volume. Please let us know if this doesn't resolve your problem.
Thanks for the post.
As I said, when I'm booting the cloned ESX LUN I get some kind of prompt which doesn't have the esxcfg-volume command (see attached screenshot).
I did, however, map that LUN to a different working ESX server which is not booted from SAN.
I ran esxcfg-resignature on it and it still didn't help. Furthermore, I even tried the CV GUI (host->configuration->storage->add storage...) and tried both options,
keep existing signature and resignature; that still didn't help.
Can you please confirm whether the ESX detected the replica volume when you mapped the LUN to the same ESX server? This can be checked using the -l option:
~ # esxcfg-volume -l
VMFS3 UUID/label: 49ee74f1-ed0d40ff-a418-00145e5a8bb7/vol-resignature
Can mount: No (the original volume is still online)
Can resignature: Yes
Extent name: naa.6005076305ffc4930000000000001111:1 range: 0 - 20223 (MB)
After resignaturing, can you mount the replica (the source volume is already mounted) and see whether both volumes are accessible at the same time? Then try to boot the second ESX server using the resignatured replica volume.
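If you want to check this output on several hosts, a small parser can help. The following is a rough sketch that turns `esxcfg-volume -l` style text into dictionaries; it assumes the output always follows the four-line layout shown above, which is the only format seen in this thread, and the function and key names are made up for the example.

```python
# Sketch only: parses text laid out like the `esxcfg-volume -l` output above.

SAMPLE = """\
VMFS3 UUID/label: 49ee74f1-ed0d40ff-a418-00145e5a8bb7/vol-resignature
Can mount: No (the original volume is still online)
Can resignature: Yes
Extent name: naa.6005076305ffc4930000000000001111:1 range: 0 - 20223 (MB)
"""


def parse_volume_list(text):
    """Return one dict per volume found in the listing."""
    volumes = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only; extent names contain further colons.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key.startswith("VMFS"):
            uuid, _, label = value.partition("/")
            current = {"uuid": uuid, "label": label, "extents": []}
            volumes.append(current)
        elif current is None:
            continue  # ignore anything before the first volume header
        elif key == "Can mount":
            current["can_mount"] = value.startswith("Yes")
            current["mount_note"] = value
        elif key == "Can resignature":
            current["can_resignature"] = value.startswith("Yes")
        elif key == "Extent name":
            current["extents"].append(value.split()[0])
    return volumes


for vol in parse_volume_list(SAMPLE):
    print(vol["uuid"], vol["label"], vol["can_mount"], vol["can_resignature"])
```

A volume that reports "Can mount: No" because the original is still online, as in the sample, is one that would have to be resignatured before both copies can be mounted side by side.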
Sorry, had a little typo there: I meant to write esxcfg-volume -r and not esxcfg-resignature...
Anyway, I ran esxcfg-volume -l prior to "resignaturing" to see that the volume is available and to copy the UUID for the esxcfg-volume -r command.
I haven't mounted the replica after doing esxcfg-volume -r. I will try it.
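The list-then-resignature sequence described here could be scripted roughly as follows. This is a sketch that only builds the command lines; it relies on the -l and -r options shown in the help text earlier in the thread, uses the listing posted for the original volume as sample input, and does not check the "Can resignature" field. The function name is hypothetical.

```python
# Sketch only: extracts volume UUIDs from `esxcfg-volume -l` style text and
# builds the matching `esxcfg-volume -r <UUID>` command lines (not executed).
import re

LISTING = """\
VMFS3 UUID/label: 49d22e2e-996a0dea-b555-001f2960aed8/VMFS_1
Can mount: Yes
Can resignature: Yes
Extent name: naa.60a98000503349394f3450667a744245:1 range: 0 - 97023 (MB)
"""

# Matches the "VMFS3 UUID/label: <uuid>/<label>" header line of each volume.
UUID_RE = re.compile(r"^VMFS\d* UUID/label:\s*([0-9a-f-]+)/(\S+)", re.MULTILINE)


def resignature_commands(list_output):
    """Return one argv list per volume found in the listing."""
    return [["esxcfg-volume", "-r", uuid]
            for uuid, _label in UUID_RE.findall(list_output)]


for argv in resignature_commands(LISTING):
    print(" ".join(argv))
```

On a real host the resulting argument lists would be handed to the shell or to something like subprocess, and a follow-up `esxcfg-volume -l` would show whether the volume is still reported as a snapshot/replica.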
Using the command "esxcfg-volume -r .
Can you please provide the commands you executed and their output?
(If you are unable to resignature and mount the volume using the esxcfg-volume -r command, we might have to open an SR and upload the vm-support and array logs.)
Once you provide the complete set of commands used to create the replica and resignature the volume, I can check with the team and let you know if we need to file the SR.
I used NetApp's ONTAP webUI to do all the NetApp related stuff.
1. Created a new volume and a new LUN on its root.
2. Mapped that LUN to a diskless host (we'll call it BFS)
3. Installed a fresh ESX4 Server on that LUN.
4. Created a snapshot after making sure the ESX Server works.
5. FlexCloned the volume in which that LUN resided, using the ONTAP wizard.
6. Mapped the cloned LUN to a different (working) host (we'll call it ESX).
7. "Resignatured" the snapshot
8. Mapped it back to BFS
9. Booted BFS2 with that LUN
Let me know if you still need more details.
There is already an open support request SR#1329029201
The ticket has been closed by VMware support without being resolved.
I got a laconic answer that this feature is not supported and that they have no clue regarding a possible workaround.
Would be extremely happy to get some help here if possible.