VMware Cloud Community
IT_ST
Contributor

Boot from SAN (iSCSI) isn't working for me on ESX 4.0: "sh: can't access tty; job control turned off"

We are working with boot from SAN and installed ESX 4.0 GA on an iSCSI LUN.

After installation the ESX server booted with no problems.

We are cloning the iSCSI boot LUN in order to provision more ESX servers.

When we boot a clone on a different physical server (exact same hardware, though), I get the error message "sh: can't access tty; job control turned off" during boot, and the server is not operational.

I must mention that the same process works flawlessly with ESX 3.5. The problems started with ESX 4.

FYI: I use a QLogic iSCSI HBA and NetApp storage.

Any ideas?

30 Replies
admin
Immortal

I don't know why everyone assumed this has to do with volume resignaturing etc.

What do you see when you type ls -l /vmfs/volumes? What about ls -l /vmfs/devices/disks/<>? Can you also paste the output of esxcfg-scsidevs -l? Do you see the LUN that you are using for boot-from-SAN?

Can you upload your serial logs or vmkernel logs? There is no point speculating without further logs.
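
A minimal sketch for gathering the requested output into one file that can be attached to a reply. The helper name collect_diag and the output file are hypothetical; the three commands are the ones named in this post, and the /vmfs/devices/disks/ listing is run on the directory itself because the device name is not given here.

```sh
#!/bin/sh
# collect_diag OUTFILE: run each diagnostic command and append its output
# (stdout and stderr) to OUTFILE under a header naming the command.
# collect_diag is a hypothetical helper name, not a VMware tool.
collect_diag() {
    out=$1
    : > "$out"                      # start with an empty file
    for cmd in \
        "ls -l /vmfs/volumes" \
        "ls -l /vmfs/devices/disks/" \
        "esxcfg-scsidevs -l"
    do
        printf '===== %s =====\n' "$cmd" >> "$out"
        # Word splitting of $cmd is intentional: every entry is a simple
        # command followed by plain arguments.
        $cmd >> "$out" 2>&1 || printf '(command failed or not found)\n' >> "$out"
    done
}

# Example: collect_diag /tmp/bfs-diag.txt
```

A failing command is recorded rather than aborting the run, so a partial result is still collected from a host that only reaches the emergency shell.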

IT_ST
Contributor

Hi,

Just to clarify things, this is the current status:

When installing ESX 4 on a boot-from-iSCSI LUN, everything works fine. Using the installed LUN on different hardware also works fine.

When cloning the LUN and booting the hardware from the cloned LUN, the OS does not load correctly. (As far as we understand, because the LUN was cloned it gets a new serial number, so ESX sees it as a snapshot and is unable to load the Service Console.)

a. The init process fails when trying to load the 'vsd' package. According to messages.log, the boot seems to be stuck because sysboot does not find the VMDK file.

b. It does not find it because it looks for /vmfs/originaluuid/originaluuid-esxconsole.vmdk while on the disk there is /vmfs/newuuid/originaluuid-esxconsole.vmdk. This parameter (CONSOLE BOOT) is inside /etc/vmware/esx.conf.

c. Changing the esx.conf file did not help: the change did not persist after reboot, probably because the file lives on a RAM drive.

d. We tried to change the esx.conf file and continue the boot process from the point where it stopped, but this failed because one of the init scripts (switchroot) requires PID 1 and the process name 'init' in order to run.

e. We looked at the file system and could not find the source boot files ('initrd.img'), so we couldn't change them.

f. Playing with the advanced install options in the ESX installation wizard did not help.
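
A rough sketch of how the path mismatch described in this list could be confirmed: search the mounted volumes for the console VMDK and compare the result with what esx.conf expects. The helper name find_cos_vmdk is hypothetical, the file name esxconsole.vmdk is assumed to match the usual ESX 4.0 layout (the paths in this post are paraphrased), and the find in the emergency shell is assumed to support -maxdepth.

```sh
#!/bin/sh
# find_cos_vmdk [VOLUMES_DIR]: print every esxconsole.vmdk found at most
# three levels below VOLUMES_DIR (default /vmfs/volumes), i.e.
#   VOLUMES_DIR/<volume uuid>/<console directory>/esxconsole.vmdk
# find does not follow symlinks by default, so a volume that is also
# reachable through a label symlink is reported only once.
find_cos_vmdk() {
    find "${1:-/vmfs/volumes}" -maxdepth 3 -type f -name 'esxconsole.vmdk' 2>/dev/null
}

# Example: find_cos_vmdk
#          grep cosvmdk /etc/vmware/esx.conf
```

If the directory printed by the first command differs from the one in the esx.conf line, the clone was mounted under a new volume UUID.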

Can someone shed light on where 'initrd.img' is located in such an install, so we can change it?

vilayann, I will post the logs you asked for here in a few hours.

admin
Immortal

This comment of yours, "It does not find it because it looks for /vmfs/originaluuid/originaluuid-esxconsole.vmdk while on the disk there is /vmfs/newuuid/originaluuid-esxconsole.vmdk. This parameter (CONSOLE BOOT) is inside /etc/vmware/esx.conf", indicates that resignaturing changed the UUID of the volume, so the vsd script could not find the COS VMDK and the boot failed.

You can try the following steps, but this is obviously quite a hack and would not be a supported trick. Be sure to back up a copy of initrd.img before you do anything, just in case.

You can find initrd.img (or initrd-2.6.*.img) under /boot. It is usually mounted from a local hard disk. Here is the sequence of steps I ran on a local ESX host.

# ls -l /boot/initrd.img

lrwxrwxrwx 1 root root 25 Jun 8 13:16 /boot/initrd.img -> initrd-2.6.18-128.ESX.img

# file /boot/initrd.img

/boot/initrd.img: symbolic link to `initrd-2.6.18-128.ESX.img'

# file /boot/initrd-2.6.18-128.ESX.img

/boot/initrd-2.6.18-128.ESX.img: gzip compressed data, from Unix, last modified: Wed Jun 10 13:01:09 2009

# cp /boot/initrd-2.6.18-128.ESX.img /tmp/1.gz

# cd /tmp/

# gunzip 1.gz

# file 1

1: ASCII cpio archive (SVR4 with no CRC)

# mkdir blah

# cd blah

# cpio -id < ../1

133222 blocks

# ls -l

total 48

drwxr-xr-x 2 root root 4096 Jun 10 13:11 bin

drwxr-xr-x 2 root root 4096 Jun 10 13:11 dev

drwxr-xr-x 3 root root 4096 Jun 10 13:11 etc

-r-x------ 1 root root 2225 Jun 10 13:11 init

drwxr-xr-x 2 root root 4096 Jun 10 13:11 lib

drwxr-xr-x 2 root root 4096 Jun 10 13:11 proc

drwxr-xr-x 2 root root 4096 Jun 10 13:11 sbin

drwxr-xr-x 2 root root 4096 Jun 10 13:11 sys

drwxr-xr-x 2 root root 4096 Jun 10 13:11 sysroot

drwxr-xr-x 2 root root 4096 Jun 10 13:11 tmp

drwxr-xr-x 5 root root 4096 Jun 10 13:11 usr

drwxr-xr-x 3 root root 4096 Jun 10 13:11 var

# ls -l etc/vmware/esx.conf

-rw------- 1 root root 33420 Jun 10 13:11 etc/vmware/esx.conf

# vi etc/vmware/esx.conf <-- edit esx.conf here. Look for the line:

/boot/cosvmdk = "/vmfs/volumes/4a2d7020-b1d07370-e3b4-001b78bcd79e/esxconsole-4a2d6f6a-d1b1-e238-ea9b-001b78bcd79e/esxconsole.vmdk"

Change it to the new path and save.

# ls -l

total 48

drwxr-xr-x 2 root root 4096 Jun 10 13:11 bin

drwxr-xr-x 2 root root 4096 Jun 10 13:11 dev

drwxr-xr-x 3 root root 4096 Jun 10 13:11 etc

-r-x------ 1 root root 2225 Jun 10 13:11 init

drwxr-xr-x 2 root root 4096 Jun 10 13:11 lib

drwxr-xr-x 2 root root 4096 Jun 10 13:11 proc

drwxr-xr-x 2 root root 4096 Jun 10 13:11 sbin

drwxr-xr-x 2 root root 4096 Jun 10 13:11 sys

drwxr-xr-x 2 root root 4096 Jun 10 13:11 sysroot

drwxr-xr-x 2 root root 4096 Jun 10 13:11 tmp

drwxr-xr-x 5 root root 4096 Jun 10 13:11 usr

drwxr-xr-x 3 root root 4096 Jun 10 13:11 var

# find . | cpio --quiet -c -o > /tmp/initrd-new

# file /tmp/initrd-new

/tmp/initrd-new: ASCII cpio archive (SVR4 with no CRC)

# gzip /tmp/initrd-new

# ls -l /tmp/initrd-new.gz

-rw-r--r-- 1 root root 23663955 Jun 10 13:13 /tmp/initrd-new.gz <-- newly generated initrd. You can now rename this to /boot/initrd-2.6.18-128.ESX-new.img and use it to boot.

# ls -l /boot/initrd-2.6.18-128.ESX.img

-rw-r--r-- 1 root root 23662343 Jun 10 13:01 /boot/initrd-2.6.18-128.ESX.img
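
The manual steps above could be condensed into a script along these lines. This is only a sketch under the same caveats (unsupported hack, keep a backup of the original image): the function names set_cosvmdk and repack_initrd are hypothetical, the pipes replace the intermediate files of the transcript, both image paths are assumed to be absolute, and it is meant to be run as root so that ownership inside the archive is preserved.

```sh
#!/bin/sh
# set_cosvmdk ESX_CONF NEW_PATH: rewrite the /boot/cosvmdk line in place.
# '|' is the sed delimiter because the path contains '/'; NEW_PATH is
# assumed to contain neither '|' nor '&'.
set_cosvmdk() {
    conf=$1
    sed "s|^/boot/cosvmdk = .*|/boot/cosvmdk = \"$2\"|" "$conf" > "$conf.tmp" &&
        cat "$conf.tmp" > "$conf" &&   # cat keeps the file's mode and owner
        rm -f "$conf.tmp"
}

# repack_initrd SRC_IMG DST_IMG NEW_PATH: unpack SRC_IMG (gzip + cpio),
# point esx.conf at NEW_PATH, and write the result to DST_IMG.
repack_initrd() {
    src=$1 dst=$2 new=$3
    work=$(mktemp -d) || return 1
    chmod 755 "$work"                  # same mode a plain mkdir would give
    (
        cd "$work" &&
        gunzip -c "$src" | cpio -id --quiet &&
        set_cosvmdk etc/vmware/esx.conf "$new" &&
        find . | cpio --quiet -c -o | gzip > "$dst"
    )
    rc=$?
    rm -rf "$work"
    return $rc
}

# Example (paths are placeholders):
#   repack_initrd /boot/initrd-2.6.18-128.ESX.img /tmp/initrd-new.gz "<new vmdk path>"
```

Writing to a new file instead of overwriting the original image leaves the old initrd available as a fallback boot entry.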

Hope this helps

paithal
VMware Employee

If your volume is resignatured and the VMFS volume appears in /vmfs/volumes, you can try the 'cosvmdk=<vmdk path>' grub option and see if the system boots. Once the system boots, you can change esx.conf to point to the new vmdk and reboot the host.

IT_ST
Contributor

.

admin
Immortal

That is odd. ls -l /vmfs/volumes is empty!

Can you do a "mount -t vmfs vmfs /vmfs" and then an ls -l /vmfs/volumes?

I thought you mentioned that the volume was resignatured correctly. Perhaps it just wasn't mounted? The above will confirm.
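
The check suggested here could be wrapped as follows. This is a sketch that assumes awk is available in the shell you land in and that a mounted /vmfs shows up in /proc/mounts; the helper names is_mounted and ensure_vmfs are hypothetical, and only the mount command itself comes from this post.

```sh
#!/bin/sh
# is_mounted MOUNTPOINT [MOUNTS_FILE]: succeed if MOUNTPOINT appears as a
# mount point (second field) in MOUNTS_FILE, default /proc/mounts.
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# ensure_vmfs: mount /vmfs if it is not mounted yet, then list the volumes.
ensure_vmfs() {
    is_mounted /vmfs || mount -t vmfs vmfs /vmfs || return 1
    ls -l /vmfs/volumes
}

# Example: ensure_vmfs
```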

IT_ST
Contributor

It's not empty...

admin
Immortal

There are actually a couple of ways around this. The easiest is to add the argument:

cosvmdk=/vmfs/volumes/path/to/the/cos.vmdk

to the 'kernel' line of your /boot/grub/grub.conf file. The only thing you need to watch out for is that you are not sharing COS VMDKs between multiple hosts. If two hosts use the same COS VMDK file, bad things will happen. You can even test this first by editing the string in grub while ESX is booting, before making the modification to grub.conf.

The other way of doing this is to go along the same steps that you were doing by changing the esx.conf file. After you've corrected the path for /boot/cosvmdk you will need to re-run esxcfg-boot which will write out the new initial ramdisk with the correct parameters. This requires jumping through all of the hoops with esxcfg-volume though which is a pain when you're mass-cloning machines.
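
For mass cloning, the first approach (the cosvmdk= argument on the kernel line) could be scripted roughly like this. The helper name add_cosvmdk_arg is hypothetical, and the sketch assumes that every 'kernel' line in the file should receive the argument, which may not be what you want if grub.conf carries several boot entries.

```sh
#!/bin/sh
# add_cosvmdk_arg GRUB_CONF VMDK_PATH: print GRUB_CONF to stdout with
# cosvmdk=VMDK_PATH appended to every 'kernel' line. A cosvmdk= argument
# already present on such a line is dropped first, so running the helper
# twice does not stack arguments. The input file itself is not modified.
# VMDK_PATH is assumed to contain neither '|' nor '&'.
add_cosvmdk_arg() {
    sed -e "/^[[:space:]]*kernel[[:space:]]/ {
        s| cosvmdk=[^ ]*||g
        s|\$| cosvmdk=$2|
    }" "$1"
}

# Example: add_cosvmdk_arg /boot/grub/grub.conf /vmfs/volumes/path/to/the/cos.vmdk
```

Printing to stdout rather than editing in place makes it easy to review the result before copying it over grub.conf on the clone. Each clone still needs its own COS VMDK, as noted above.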

Tpm73
Contributor

Is there any solution if I have this issue with my boot LUN?

I am booting from SAN, and the WWN of the storage has now changed, so I get the same error:

1) After a storage virtualisation the WWN of the storage port changed, and now the host no longer boots.

2) On another host, I found that the path to the storage changed (path 1 to storage port 1 is down/defective).

So I thought the host would boot from SAN over path 2 (set in the QLogic BIOS) from storage port 2, but this host now also stops with a vsd-mount error!

So it seems to me that if the WWN changes to a different one from the one that was active during the ESX host installation, the host cannot boot anymore!

Any solution?

Thanks!

admin
Immortal

I don't believe that should make a difference. The path to the COS VMDK in /etc/vmware/esx.conf shouldn't have changed. One of two things happened:

- the LUN with your COS is no longer being mounted (you can check in /vmfs/volumes to make sure that it's still there)

OR:

- you're booting off of the wrong LUN which has a different copy of ESX on it.

Regardless, check the path of the COS VMDK in /etc/vmware/esx.conf and then make sure the path is valid in /vmfs/volumes.
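
That last check could look something like the sketch below. The helper names cosvmdk_path and check_cosvmdk are hypothetical, and the /boot/cosvmdk = "..." line format is assumed to match the esx.conf excerpt quoted earlier in this thread.

```sh
#!/bin/sh
# cosvmdk_path [ESX_CONF]: print the value of /boot/cosvmdk from ESX_CONF
# (default /etc/vmware/esx.conf), without the surrounding quotes.
cosvmdk_path() {
    sed -n 's|^/boot/cosvmdk = "\(.*\)"$|\1|p' "${1:-/etc/vmware/esx.conf}"
}

# check_cosvmdk [ESX_CONF]: report whether the configured path exists.
check_cosvmdk() {
    p=$(cosvmdk_path "$1")
    if [ -z "$p" ]; then
        echo "no /boot/cosvmdk entry found"
        return 2
    elif [ -e "$p" ]; then
        echo "ok: $p"
    else
        echo "missing: $p"
        return 1
    fi
}

# Example: check_cosvmdk
```

A "missing" result with the volume visible in /vmfs/volumes points at a changed volume UUID; a "missing" result with no volume at all points at the LUN not being mounted.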

titaniumlegs
Enthusiast

I have this figured out, and finally written up.

I would love any feedback.

Share and enjoy!

Peter
If this helped you, please award points! Or beer. Or jump tickets.