IT_ST
Contributor

Boot from SAN (iSCSI) isn't working for me on ESX 4.0: "sh: can't access tty; job control turned off"

We are working with Boot from SAN and installed ESX 4.0 GA on an iSCSI LUN.

After installation, the ESX server booted with no problems.

We are cloning the iSCSI boot LUN in order to provision more ESX servers.

When doing so and booting the clone on a different physical server (exactly the same hardware, though), I get the "sh: can't access tty; job control turned off" error message when trying to boot.

And the server is not operational.

I must mention that the same process works flawlessly when installing ESX 3.5.

The problems started with ESX 4.

FYI: I use a QLogic iSCSI HBA and NetApp storage.

Any ideas?

oschistad
Enthusiast

Off the top of my head, I can think of one reason why this might be happening. One difference between ESX 3.x and 4.0 is that the service console no longer has a dedicated file system of its own; instead, it resides on a VMFS volume.

Furthermore, VMFS has built-in safeguards against accidental presentation of clone or snapshot LUNs, e.g. it requires the volume signature to match the LUN ID. When the LUN is cloned it may get a different UUID from the one hardcoded into the VMFS label, in which case the vmkernel will refuse to touch it unless you either enable resignaturing or set DisallowSnapshotLUN to 0.
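The interplay between the two ESX 3.x advanced settings involved here (LVM.EnableResignature and LVM.DisallowSnapshotLun) can be summarized as a small decision function. This is a simplified sketch of the documented ESX 3.x behaviour, not vmkernel code: the boolean `header_matches_device` stands in for whatever identity check VMFS actually performs, and the function and enum names are hypothetical.

```python
from enum import Enum


class Action(Enum):
    MOUNT = "mount with the existing signature"
    RESIGNATURE = "write a new signature, then mount"
    IGNORE = "leave unmounted (treated as a snapshot)"


def snapshot_lun_action(header_matches_device: bool,
                        enable_resignature: bool,
                        disallow_snapshot_lun: bool) -> Action:
    """Simplified model of how ESX 3.x treats a VMFS volume on a LUN.

    header_matches_device is a hypothetical stand-in for the real check of
    the VMFS header against the LUN identity.
    """
    if header_matches_device:
        # Not a snapshot: the volume signature agrees with the LUN it sits on.
        return Action.MOUNT
    if enable_resignature:
        # EnableResignature=1 takes precedence; DisallowSnapshotLun is not consulted.
        return Action.RESIGNATURE
    if not disallow_snapshot_lun:
        # DisallowSnapshotLun=0: mount the copy as-is, keeping the old signature.
        return Action.MOUNT
    # Defaults (EnableResignature=0, DisallowSnapshotLun=1): refuse to touch it.
    return Action.IGNORE
```

With the defaults, a cloned boot LUN whose header no longer matches ends up in the `IGNORE` branch, which is consistent with the symptom in the original post.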

Can you see any warnings in the F11 view of the console?

paithal
VMware Employee

I do not think cloning is going to work. The cloned LUN is going to have a new LUN UUID, and VMFS would treat it as a snapshot and would not register the file system. If you resignature the volume from another system, it may work.

IT_ST
Contributor

Thanks for the quick reply.

Can you elaborate on how ESX gets a new UUID, since I'm using the same LUN ID on the original ESX and on the cloned ESX (always LUN ID 0)?

Is there any way to resignature the LUN from the cloned host, without exposing the LUN to another host just for resignaturing?

BTW, has anyone seen official VMware documentation about boot from SAN with ESX 4? I didn't find any.

oschistad
Enthusiast

Come to think of it, this is actually a very interesting topic.

Based on this blog, I assumed that there are storage-system-generated values, in addition to the LUN ID as seen by the ESX host, which go into the VMFS header.

However, according to Chad Sakac from EMC (referenced in Duncan Epping's blog post here), it would appear that only the host LUN ID and the SCSI device type are used, which ought to be the same as long as the clone LUN gets the same LUN ID every time.

One question: are you able to switch to the F11 console of the ESX server? This is the log dumper, and errors such as an inconsistent VMFS label should get printed there. If the console window exists and contains no relevant entries, we are probably barking up the wrong tree...

IT_ST
Contributor

Thanks for the quick response.

I tried to find out how to enable resignaturing on VI4 and only found this:

<code>[root@****]# esxcfg-volume --help
esxcfg-volume &lt;options&gt;
-l|--list                                List all volumes which have been
                                         detected as snapshots/replicas.
-m|--mount &lt;VMFS UUID|label&gt;             Mount a snapshot/replica volume, if
                                         its original copy is not online.
-u|--umount &lt;VMFS UUID|label&gt;            Umount a snapshot/replica volume.
-r|--resignature &lt;VMFS UUID|label&gt;       Resignature a snapshot/replica volume.
-M|--persistent-mount &lt;VMFS UUID|label&gt;  Mount a snapshot/replica volume
                                         persistently, if its original copy is
                                         not online.
-h|--help                                Show this message.
</code>

When running esxcfg-volume -l on the original ESX, I get:

VMFS3 UUID/label: 49d22e2e-996a0dea-b555-001f2960aed8/VMFS_1
Can mount: Yes
Can resignature: Yes
Extent name: naa.60a98000503349394f3450667a744245:1 range: 0 - 97023 (MB)
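To check this on several hosts, the output above can be parsed into records. A minimal sketch, assuming the field layout is exactly what this thread shows (one `VMFS3 UUID/label:` line per volume, followed by `Can mount:`, `Can resignature:` and `Extent name:` lines); `parse_volume_list` is a hypothetical helper, not part of any VMware tool. Note that, per the help text above, `-l` lists only volumes that have been detected as snapshots/replicas.

```python
import re
from typing import Dict, List, Optional

# Assumed layout of an "Extent name:" line, copied from the samples in this thread.
EXTENT_RE = re.compile(r"Extent name:\s*(\S+)\s+range:\s*(\d+)\s*-\s*(\d+)\s*\(MB\)")


def _value(line: str) -> str:
    """Text after the first colon, e.g. 'Yes' from 'Can mount: Yes'."""
    return line.split(":", 1)[1].strip()


def parse_volume_list(text: str) -> List[Dict]:
    """Turn `esxcfg-volume -l` output into one dict per listed volume."""
    volumes: List[Dict] = []
    current: Optional[Dict] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("VMFS3 UUID/label:"):
            # "<UUID>/<label>": the UUID contains no slash, so split on the first one.
            uuid, _, label = _value(line).partition("/")
            current = {"uuid": uuid, "label": label, "extents": []}
            volumes.append(current)
        elif current is None:
            continue  # ignore anything before the first volume header
        elif line.startswith("Can mount:"):
            value = _value(line)
            current["can_mount"] = value.startswith("Yes")
            # Optional reason in parentheses, e.g. "(the original volume is still online)".
            reason = re.search(r"\((.*)\)", value)
            current["mount_reason"] = reason.group(1) if reason else None
        elif line.startswith("Can resignature:"):
            current["can_resignature"] = _value(line).startswith("Yes")
        elif line.startswith("Extent name:"):
            m = EXTENT_RE.match(line)
            if m:
                # (device:partition, first MB, last MB)
                current["extents"].append((m.group(1), int(m.group(2)), int(m.group(3))))
    return volumes
```

For the sample above, this yields a single record whose `can_mount` and `can_resignature` are both `True`.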

From what I understand, this means resignaturing is turned on for the boot LUN.

Is esxcfg-volume the only way to turn resignaturing on in VI4?

I didn't see an option to set it in the GUI, as there was in VI3.

Regarding your question about F11, I will check and update soon.

IT_ST
Contributor

I'm unable to switch to the F11 Console.

I'm getting this error/prompt when choosing either ESX 4.0 or Recovery mode at boot (image attached).


Moreover, the esxcfg-volume command does not exist when I try to run it at the aforementioned prompt.

oschistad
Enthusiast

Okay, from the screenshot I can see we are probably on the right track here. The next service which should have started is mount-root, which is where the service console file system gets mounted. Up to this point I believe that you are seeing messages from vmkernel.

At this stage you may be able to view the contents of /vmfs/volumes; just type 'ls -l /vmfs/volumes' to see whether it has any VMFS file systems or not. Or maybe there is no /vmfs at this point, since (presumably) you aren't talking to the service console yet. This is new territory, right there.

Anyhow.

There is one thing you can test to check whether the problem here is that you are using a cloned LUN. You will need an ESX server which didn't boot from the same iSCSI SAN. What you want to do is present one of your cloned LUNs to this ESX, using SCSI ID 0 (which I am assuming is the ID you created it on), and rescan that ESX server's iSCSI adapter.

If the ESX server discovers your LUN and lists it as a new datastore, using the volume name you assigned to the original VMFS volume, we have pretty much eliminated the volume header as the problem source.

If, however, your ESX does NOT display the datastore as a new volume but you can see it in the target list, check your vmkwarning or vmkernel log. It should state that a VMFS was found with an inconsistent header. Alternatively, if your test ESX has been set to enable resignaturing, the volume will have been assigned an auto-created name of the form snap-<UUID>-origname.
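When scanning many datastores during this test, an auto-resignatured volume can be spotted by its name. A small sketch, assuming the name really follows the "snap-<ID>-origname" form described here and that the ID is a single run of hex digits (its exact width varies between releases and is not checked); `original_label` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed form of an auto-resignatured datastore name: "snap-<hex ID>-<original name>".
SNAP_LABEL_RE = re.compile(r"^snap-([0-9A-Fa-f]+)-(.+)$")


def original_label(datastore_label: str) -> Optional[str]:
    """Return the original volume name if the label looks auto-resignatured, else None."""
    match = SNAP_LABEL_RE.match(datastore_label)
    return match.group(2) if match else None
```

Because the hex run cannot contain a dash, an original name that itself contains dashes (for example `my-datastore`) is still recovered whole.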

Now, if you confirm that ESX considers the LUN to have an invalid volume header, you have a problem. Even if you resignature the volume, you will probably still not have a bootable service console, because the COS VMDK location references /vmfs/volumes/<UUID>/.
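The point about the COS VMDK location can be illustrated with a simple path check: a path pinned to `/vmfs/volumes/<old UUID>/` stops naming the volume once resignaturing gives it a new UUID. A sketch under stated assumptions: it only compares the UUID component of the path, and the directory and file name used in the usage example are hypothetical.

```python
from typing import Optional

VOLUMES_PREFIX = "/vmfs/volumes/"


def volume_component(path: str) -> Optional[str]:
    """Return the <UUID> part of a /vmfs/volumes/<UUID>/... path, if present."""
    if not path.startswith(VOLUMES_PREFIX):
        return None
    component = path[len(VOLUMES_PREFIX):].split("/", 1)[0]
    return component or None


def still_resolves(path: str, current_uuid: str) -> bool:
    """True only if the path still names the volume by its current UUID."""
    return volume_component(path) == current_uuid


# Hypothetical console VMDK path on the original volume.
old_uuid = "49d22e2e-996a0dea-b555-001f2960aed8"
cos_vmdk = VOLUMES_PREFIX + old_uuid + "/esxconsole-example/esxconsole.vmdk"
```

Before resignaturing, `still_resolves(cos_vmdk, old_uuid)` is true; against any freshly generated UUID it is false, which is why a resignatured clone would still not find its console disk.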

The only workaround I can think of would be to set disallowsnapshotlun=0 in the advanced configuration for the ORIGINAL ESX installation before you clone it, so that the cloned ESX servers will disregard the inconsistent VMFS header when they boot. However, this is DANGEROUS since you lose the protection against accidentally presented clones of an existing volume. If you run production with this configuration, you WILL get data corruption if you ever present a clone of a VMFS volume alongside its original.

One last note: this final caution is based on ESX 3.5. There may be changes in the multipathing driver of ESX 4.0 that I am not yet aware of, which change the whole "disallow snapshot LUN" problem. But I doubt it.

IT_ST
Contributor

Hi,

Thanks for the great explanation.

I will check whether I can present my boot LUN to another ESX and whether that solves my problem.

How do I set disallowsnapshotlun=0 on ESX 4?

Dave_Mishchenko
Immortal

You get these options in the Add Storage wizard; see page 72 of http://www.vmware.com/pdf/vsphere4/r40/vsp_40_san_cfg.pdf

admin
Immortal

Hi,

> Moreover, the esxcfg-volume does not exist when trying to run it at the aforementioned prompt.

"esxcfg-volume -r

Thanks

Sudhish

admin
Immortal

Hi,

The esxcfg-volume command is the only option for resignaturing a specific volume, and with it you will be able to resignature the replica volume. Please let us know if this doesn't resolve your problem.

Thanks

IT_ST
Contributor

Thanks for the post.

As I said, when I'm booting the cloned ESX LUN I get some kind of prompt which doesn't have the esxcfg-volume command (see attached screenshot).

I did, however, map that LUN to a different, working ESX server which is not booted from SAN.

I ran esxcfg-resignature on it and it still didn't help. Furthermore, I even tried the VC GUI (Host -> Configuration -> Storage -> Add Storage...) and tried both options, keep existing signature and resignature, and neither helped.

admin
Immortal

Can you please confirm whether ESX detected the replica volume when you mapped the LUN to the same ESX server? This can be checked using the -l option:

~ # esxcfg-volume -l
VMFS3 UUID/label: 49ee74f1-ed0d40ff-a418-00145e5a8bb7/vol-resignature
Can mount: No (the original volume is still online)
Can resignature: Yes
Extent name: naa.6005076305ffc4930000000000001111:1 range: 0 - 20223 (MB)

After resignaturing, can you mount the replica (the source volume is already mounted) and check whether both volumes are accessible at the same time? Then try to boot the second ESX server using the resignatured replica volume.
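The choice between the two esxcfg-volume actions discussed here follows from the `-l` flags and the help text posted earlier: `-r` gives the copy a new signature, while `-m` mounts it under its existing signature only if the original copy is not online. A minimal sketch that only builds the command line and runs nothing; `esxcfg_volume_argv` and its `prefer_resignature` parameter are hypothetical.

```python
from typing import List, Optional


def esxcfg_volume_argv(volume: str, can_mount: bool, can_resignature: bool,
                       prefer_resignature: bool = True) -> Optional[List[str]]:
    """Build an esxcfg-volume command for a volume listed by `esxcfg-volume -l`.

    `volume` is the VMFS UUID or label. Resignaturing is tried first when
    preferred; otherwise the volume is mounted under its existing signature,
    which `-l` reports as impossible ("Can mount: No") while the original is online.
    """
    if prefer_resignature and can_resignature:
        return ["esxcfg-volume", "-r", volume]
    if can_mount:
        return ["esxcfg-volume", "-m", volume]
    return None  # nothing esxcfg-volume can do with this volume as reported
```

For the listing above ("Can mount: No", "Can resignature: Yes") this returns the `-r` command; on the host, the list could then be passed to `subprocess.run(argv, check=True)`.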

Thanks

Sudhish

IT_ST
Contributor

Sorry, I had a little typo there: I meant to write esxcfg-volume -r, not esxcfg-resignature...

Anyway, I ran esxcfg-volume -l before resignaturing, to check that the volume was available and to copy the UUID for the esxcfg-volume -r command.

I haven't mounted the replica after running esxcfg-volume -r. I will try it.

IT_ST
Contributor

I tried mounting (esxcfg-volume -m <UUID>).

It didn't help.

admin
Immortal

Using the command "esxcfg-volume -r .

Can you please provide the commands you executed and their output?

(If you are unable to resignature and mount the volume using the esxcfg-volume -r command, we might have to open an SR and upload the vm-support and array logs.)

Once you provide the complete set of commands used to create the replica and resignature the volume, I can check with the team and let you know if we need to file an SR.

Thanks

Sudhish

IT_ST
Contributor

I used NetApp's ONTAP web UI to do all the NetApp-related work.

1. Created a new volume and a new LUN at its root.

2. Mapped that LUN to a diskless host (we'll call it BFS).

3. Installed a fresh ESX 4 server on that LUN.

4. Created a snapshot after making sure the ESX server works.

5. FlexCloned the volume in which that LUN resided, using the ONTAP wizard.

6. Mapped the cloned LUN to a different (working) host (we'll call it ESX).

7. "Resignatured" the snapshot.

8. Mapped it back to BFS.

9. Booted BFS2 with that LUN.

Screenshots

Let me know if you still need more details.

BTW, there is already an open support request: SR#1329029201.

admin
Immortal

Thanks for the detailed steps.

> There is already an open support request SR#1329029201

If we have an open SR, I think we should wait for the fix or an update on the SR.

Thanks

Sudhish

IT_ST
Contributor

The ticket has been closed by VMware support without being resolved.

I got a laconic answer that this feature is not supported and that they have no clue regarding a possible workaround.

I would be extremely happy to get some help here, if possible.

Thanks guys.
