VMware Cloud Community
meistermn
Expert

Scripted Install of ESX 3.0.1 with attached SAN and removed FC drivers

Hi all,

I read the following post

http://www.vmware.com/community/thread.jspa?threadID=65414&start=0&tstart=0

and came to the conclusion that none of the proposed solutions for a scripted ESX install with an attached SAN satisfies me.

1.) Blocking the switch ports: I don't like this, because the SAN team can forget to block a port, and the scripted install can then format LUNs.

2.) Unpresenting the LUNs in your SAN management software, or changing the zoning in the SAN switch: same objection as 1).

3.) Changing the device discovery order in the BIOS, which disables the HBA boot BIOS and makes sure the primary HDD is on the internal bus: also not an option, because the person who installs ESX can miss this step, and it seems it does not work for all vendors and tools, as Vliegenmepper wrote:

"I know for example that when using HP RDP (Rapid Deployment Pack), disabling the BIOS is not enough and RDP scripts are still able to enumerate the SAN LUNs.

I agree this is an issue that should have been addressed long ago. We have a completely automated install, yet someone has to trudge out to the datacenter to disconnect/connect fibre every time we want to install/reinstall a host."

4.) Taking precautions to "disable" the HBA, or changing the enumeration of the devices so the local disk gets enumerated first, is also error prone.

5.) Protecting the SAN by manually pulling the cables out of the HBA: this is the way I do it today for a manual install of ESX. But this is also error prone, because sometimes you forget to pull the fibre cables, and then you have to pull them and restart the manual ESX installation, so that the enumeration of the local hard disk devices starts with sda.

So now I think the best way is to modify the ESX CD sources. If I run the command vmkload_mod -l as described in the KB article

http://kb.vmware.com/KanisaPlatform/Publishing/341/1560391_f.SAL_Public.html

I can find the FC driver. Now, if I remove (or rename?) the HBA driver in the ESX sources and do a scripted install, then without the QLogic HBA driver no LUNs are seen during the ESX install, and we have no enumeration conflicts between the local hard disk devices and the SAN LUNs.

In the %post script, the QLogic driver can be copied to the ESX server and then installed as described in the KB article above. After rebooting, the ESX system should see the LUNs again.
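As a minimal sketch of what that %post step could look like - the module name (qla2300.o) and the staging path are my assumptions, not taken from the KB article:

%post
# copy the QLogic module back into the freshly installed system
# (assumes the module was staged somewhere reachable from %post,
# e.g. copied onto the install media or an NFS share)
cp /mnt/stage/qla2300.o /lib/modules/2.4.21-37.0.2.ELvmnix/kernel/drivers/scsi/
depmod -a
# the vmkernel side of the driver installation is what the KB article describes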

Has anyone tried this, or optimized it, or does anyone see a problem?

Meistermn

18 Replies
meistermn
Expert

Could it be even simpler?

If it is possible to unload the FC driver in the init process, then there should likewise be no LUNs visible during the install.

So only the init process script would need modifying???
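As a rough, untested sketch of that idea - the module names are my assumption, and the installer may well simply re-load them later:

# unload the QLogic FC modules before the installer probes storage
rmmod qla2300 2>/dev/null
rmmod qla2200 2>/dev/null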

_the_dude_
Enthusiast

Another option may be to disable hardware probing during installation (noprobe on the boot command line). You would then load the appropriate drivers during the installation phase by using the device command in your kickstart script.

In some (old) posts on several kickstart forums, people also mention a 'nostorage' boot option, which disables only storage hardware probing.

I've never tested this, but it may be worth looking into.
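Untested, but combined it could look something like this (cciss here is just an example module name for HP Smart Array controllers):

# boot the installer with the extra option:
#   linux ks=nfs:server:/path/ks.cfg noprobe
# then declare only the local controller in ks.cfg:
device scsi cciss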

ITQPG
Contributor

Hey dude, I think the 'nostorage' boot option only works with ESX 2.5.x.

It is rather strange that VMware has not yet patched this.

We lost an entire LUN during an upgrade from 2.5.2. The upgrades of the other farms have been postponed, and we are now considering a different upgrade procedure.

We have an infrastructure with 192 ESX hosts and do installs and reinstalls with HP RDP. Now, for every re-installation, we have to ask the storage guys, via RFC, to unpresent the LUNs.

For enterprise environments I think this is unacceptable; it should have been patched long ago.

sbeaver
Leadership

If you have HP servers and are using RDP (which we do): if you define the drives in the cfg using cciss, it will install on the local drives.

Steve Beaver
VMware Communities User Moderator
VMware vExpert 2009 - 2020
VMware NSX vExpert - 2019 - 2020
====
Co-Author of "VMware ESX Essentials in the Virtual Data Center"
(ISBN:1420070274) from Auerbach
Come check out my blog: [www.virtualizationpractice.com/blog|http://www.virtualizationpractice.com/blog/]
Come follow me on twitter http://www.twitter.com/sbeaver

**The Cloud is a journey, not a project.**
ITQPG
Contributor

We indeed have HP blades and RDP.

We are using cciss in the cfg, and it works... but not always.

Sometimes it sees the first LUN as sda instead of the internal hard disk.

It is not known when this happens; that is why VMware advises detaching the fibre or unpresenting the LUNs.

This is the storage part of our .cfg; do you have roughly the same?

# Clear Partitions
clearpart --all --initlabel --drives=cciss/c0d0

# Partitioning
part /boot --fstype ext3 --size 250 --ondisk cciss/c0d0
part / --fstype ext3 --size 4997 --ondisk cciss/c0d0
part swap --size 544 --ondisk cciss/c0d0
part None --fstype vmfs3 --size 10000 --grow --ondisk cciss/c0d0
part None --fstype vmkcore --size 94 --ondisk cciss/c0d0
part /var/log --fstype ext3 --size 1992 --ondisk cciss/c0d0

sbeaver
Leadership

Mine is just a little different:

clearpart --all --initlabel --drives=cciss/c0d0
part /boot --fstype ext3 --size 256 --asprimary --ondisk cciss/c0d0
part / --fstype ext3 --size 8192 --asprimary --ondisk cciss/c0d0
part swap --size 1600 --asprimary --ondisk cciss/c0d0
part /opt --fstype ext3 --size 2048 --ondisk cciss/c0d0
part /tmp --fstype ext3 --size 2048 --ondisk cciss/c0d0
part /var --fstype ext3 --size 2048 --ondisk cciss/c0d0
part None --fstype vmfs3 --size 10240 --grow --ondisk cciss/c0d0
part None --fstype vmkcore --size 100 --ondisk cciss/c0d0

Steve Beaver
ITQPG
Contributor

Just about the same indeed....

According to VMware it is a fault in the QLogic driver; they advise to always disconnect the fibre or unpresent the LUNs.

Do you have the QLogic HBAs?

So you'd better watch out; you never know when disaster is going to strike.

noteregis
Contributor

Hi all, and thanks for this interesting discussion.

I ran into the same problem, and I also do not want the SAN teams to be involved in the install process, or even the rescue process.

I use RDP, and as far as I can see the damage is done when the GRUB job runs. This job is based on the rdeploy binary, which deploys an image to the first partition it finds. By default, this job runs in a Linux environment.

In the previous comments everyone seems to converge on the fact that the Linux environment, when it starts, is able to probe the LUNs and in some situations assigns the first LUN to /dev/sda. The rdeploy binary has no intelligence of its own and will use the device at sda without knowing whether it is local or SAN. In this situation it is therefore a Linux problem.

This GRUB step can instead be run in DOS via a DOS boot menu option. This boot menu is not very sophisticated and will only present local hard disks. It is tested and working well.

WinPE has the same issue as the Linux PE and presents the SAN first (even with correct BIOS options).

Disabling the PCI devices in the BIOS was not chosen, because the Linux boot process hangs when the PCI devices are disabled.

Conclusion, to solve the problem:

- add a DOS boot menu option to your RDP
- modify your script: Deploy Disk Image (GRUB.IMG) to be launched in DOS mode instead of Linux mode.

We have HP hardware with QLogic HBAs. If some customers with other hardware (Dell or other) can test and share....

Regards

Régis

sbeaver
Leadership

Where did you add the DOS boot menu?

Steve Beaver
noteregis
Contributor

Sorry,

I got further, but I am now blocked at a new step, during the preparation of the GRUB menu. At this step it is still a Linux menu, and it presents the incorrect drive order.

The DOS boot menu has to be added to PXE before it can be used by a job.

I am trying to see if the vmesx.sh provided by Altiris can be changed to identify with more certainty whether a drive is local or SAN. Currently it just looks in /proc/partitions.
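A hypothetical direction for such a check - untested, and the device names and proc paths are assumptions on my part:

# prefer the local Smart Array device if one exists
if grep -q 'cciss/c0d0' /proc/partitions; then
    DISK=/dev/cciss/c0d0
# if the qla driver is not loaded, sd* devices cannot be SAN LUNs
elif [ ! -d /proc/scsi/qla2300 ]; then
    DISK=/dev/sda
else
    echo "cannot safely determine a local disk" >&2
    exit 1
fi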

regards

tsugliani
VMware Employee

Hi,

I actually removed the Fibre Channel (qla*) drivers from the install disk to ensure that LUNs won't be harmed by my scripted DVD or PXE installation. (I can now use sda as the primary server disk without any fear of destroying my SAN production LUNs.)

The method used isn't something beautiful, but it works nicely for my needs.

I hope they will upgrade to a Red Hat 4 based installation, as it has a nice boot keyword to disable storage detection ("nostorage"), and in the kickstart script you have another line to add your SCSI adapter, such as "device scsi mptscsih" or "device scsi cciss", and so on.
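For anyone wanting to try the same, here is a rough sketch of how the removal could look. It assumes the installer keeps its drivers in a gzipped cpio archive (modules.cgz) inside the boot image, as Red Hat based installers of that era did; all paths and file names are illustrative:

# unpack the boot image and its module archive
mount -o loop esx-boot.img /mnt/img
mkdir /tmp/mods && cd /tmp/mods
gunzip -c /mnt/img/modules.cgz | cpio -idv

# drop the QLogic FC modules so the installer cannot see SAN LUNs
rm -f ./*/qla2*.o

# repack (modules.cgz uses the crc cpio format) and clean up
find . | cpio -o -H crc | gzip -9 > /mnt/img/modules.cgz
cd / && umount /mnt/img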

meistermn
Expert

Can you make a howto about removing the qla drivers?

Where are these drivers located in /lib?

meistermn
Expert

The location of the qla driver is:

/lib/modules/2.4.21-37.0.2.ELvmnix/kernel/drivers/scsi

Did you modify the initrd?

meistermn
Expert

Does anyone have a solution for removing the qla drivers???

Michelle_Laveri
Virtuoso

I recently experimented with "nostorage" and "noprobe" - using a driver disk boot image, and with modifying the modules.img/tgz files to remove the qla drivers used by anaconda. Results were poor, although I haven't given up completely.

I found a workaround which I blogged about here:

http://www.rtfm-ed.co.uk/?p=389

Interestingly, the Red Hat Linux link there showed a script to work out which LUNs were visible to the HBA, in order to exclude them. Unfortunately, it wouldn't execute. The "Dude" thinks this may simply be an "interpreter" problem - and that simply telling the kickstart file which interpreter to use would fix it. This is attractive because it would be a way to stop SAN access without having to manipulate the ESX source code - which causes major support issues/politics for most corporates.
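If it really is an interpreter problem, the fix would presumably be something along these lines - a sketch only; kickstart's %pre section accepts an --interpreter option, and the script body here is a placeholder:

%pre --interpreter /bin/bash
# the LUN-exclusion script from the Red Hat link would go here,
# now run explicitly by bash instead of the default shell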

I've also logged a post on the private "VMware Education" trainers' site, which I have access to as an instructor - in the hope of garnering some thoughts/attitudes from VMware.

To me, we seem to be moving into another phase of virtualisation - automation. This is becoming a major bugbear for people who work with remote datacenters where physical access is limited - we definitely need a soft solution to this problem that is as dependable as yanking out a cable...

I will keep this thread posted on any future developments...

Regards

Mike

Michelle Laverick
@m_laverick
http://www.michellelaverick.com
meistermn
Expert

I will try this:

http://www-1.ibm.com/support/docview.wss?uid=tss1wp100564&aid=5

Look at page 45, steps 1-23, the "Red Hat Boot Diskette Modification Process". This is an example of removing network drivers; it should be much the same for SCSI drivers.

Steps 13-15 describe removing the old network drivers.


ITQPG
Contributor

Regis,

Using the DOS PE to push the GRUB.IMG to the local disk is NOT an option, since there are no DOS drivers for our SCSI controller (BL465c blades).

So we have to use the Linux PE.

Our solution so far, sketched below:

Disable the HBA in the BIOS (no IRQ) and delete the qla drivers in the rootfs.gz, so the Linux PE cannot load the HBA drivers.
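For illustration only - this assumes rootfs.gz is a gzipped ext2 ramdisk image with its modules under the usual 2.4 path; adjust for the actual PE layout:

gunzip rootfs.gz
mkdir /mnt/rootfs
mount -o loop rootfs /mnt/rootfs
# remove the QLogic FC modules so the PE cannot see SAN LUNs
rm -f /mnt/rootfs/lib/modules/*/kernel/drivers/scsi/qla*.o
umount /mnt/rootfs
gzip -9 rootfs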

So far no problems yet.

Paul

Michelle_Laveri
Virtuoso
Accepted Solution

User smikeyp has created a workaround for this using the img files:

http://www.vmware.com/community/thread.jspa?threadID=43958&messageID=673600#673600

Regards

Mike