Hi all,
I read the following post
http://www.vmware.com/community/thread.jspa?threadID=65414&start=0&tstart=0
and came to the conclusion that the proposed solutions for a scripted ESX install with an attached SAN do not satisfy me.
1.) Solution one, blocking ports: I don't like this, because the SAN team can forget to block a port, and then the scripted install can still format LUNs.
2.) Solution two, unpresenting the LUN in your SAN management software or changing the zoning on the SAN switch: same problem as 1).
3.) Changing the device discovery order in the BIOS (disabling the HBA boot BIOS and making sure the primary HDD is on the internal bus) is also not an option, because the person installing ESX can miss this step, and it seems it does not work for all vendors and software, as written by Vliegenmepper:
"I know for example that when using HP RDP (Rapid Deployment Pack), disabling the BIOS is not enough and RDP scripts are still able to enumerate the SAN LUNs. I agree this is an issue that should have been addressed long ago. We have a completely automated install, yet someone has to trudge out to the datacenter to disconnect/connect fibre every time we want to install/reinstall a host."
4.) Taking precautions to "disable" the HBA, or changing the enumeration of the devices so that the local disk gets enumerated first, is also error prone.
5.) Protecting the SAN by manually pulling the cables out of the HBA.
This is the way I do it today with a manual install of ESX, but this is also
error prone, because sometimes you forget to pull out the fibre cables;
then you have to pull them out and restart the manual ESX installation,
so that the enumeration of the local hard disk devices starts with sda.
So now I think the best way is to modify the ESX CD sources. If I run the command vmkload_mod -l as described in KB article
http://kb.vmware.com/KanisaPlatform/Publishing/341/1560391_f.SAL_Public.html
I can find the FC driver. If I remove (or rename?) the HBA driver in the ESX sources and do a scripted install,
then without the special HBA driver for the QLogic cards no LUNs are seen during the ESX install, and we have no enumeration problems between the local hard disk devices
and the SAN LUNs.
In the %post script the QLogic driver can be copied back to the ESX server and loaded as described in the above KB article.
After rebooting the ESX system we should see the LUNs again.
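As a sketch only, the %post step could look like this. The module file name (qla2300_604.o), the source and target paths, and the exact vmkload_mod arguments are assumptions; verify them against the KB article and your own ESX build:

```
%post
# Hypothetical: restore the QLogic driver that was removed from the CD sources.
# File name and paths are illustrative -- check them against your ESX build.
cp /mnt/source/drivers/qla2300_604.o /usr/lib/vmware/vmkmod/
vmkload_mod /usr/lib/vmware/vmkmod/qla2300_604.o vmhba
```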
Has anyone tried this, optimized it, or does anyone see a problem?
Meistermn
User smikeyp has created a workaround for this using the img files -
http://www.vmware.com/community/thread.jspa?threadID=43958&messageID=673600#673600
Regards
Mike
Could it be simpler?
If it is possible to unload the FC driver in the init process, then there should
also be no LUNs visible during the install.
So would modifying only the init process script be enough?
Another option may be to disable hardware probing during installation (noprobe on the boot command line). You then have to load the appropriate drivers during the installation phase by using the device command in your kickstart script.
In some (old) posts on several kickstart forums, people also mention a 'nostorage' boot option, which disables only storage hardware probing.
I've never tested this, but it may be worth looking into.
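As a sketch, assuming HP Smart Array (cciss) and LSI (mptscsih) controllers -- the module names and the ks.cfg URL are placeholders you would replace with your own:

```
# On the installer boot line (isolinux.cfg or the PXE append line):
#   append initrd=initrd.img noprobe ks=http://server/ks.cfg
# Then, in ks.cfg, declare only the controllers Anaconda should see:
device scsi cciss
device scsi mptscsih
```

With noprobe set, Anaconda skips hardware autodetection entirely, so the QLogic HBA driver is never loaded and no SAN LUN can end up as sda.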
Hey dude, I think the 'nostorage' boot option works only with ESX 2.5.x.
It is rather strange that VMware has not yet patched this.
We lost an entire LUN during an upgrade from 2.5.2. The upgrades of the other farms have been postponed; we are now thinking of another upgrade procedure.
We have an infrastructure with 192 ESX hosts and do installs and reinstalls with HP RDP. Now, for every re-installation, we have to ask the storage guys, via RFC, to unpresent the LUNs.
For enterprise environments I think this is unacceptable and should have been patched long ago.
If you have HP servers and are using RDP (which we do): if you define the drives in the .cfg using cciss, it will install on the local drives.
We indeed have HP blades and RDP.
We are using cciss in the .cfg, and it works... but not always.
Sometimes it sees the first LUN as sda instead of the internal hard disk.
It is not known when this happens; that is why VMware advises detaching the fibre or unpresenting the LUNs.
This is the storage part of our .cfg, do you have something similar?
# Clear Partitions
clearpart --all --initlabel --drives=cciss/c0d0
# Partitioning
part /boot --fstype ext3 --size 250 --ondisk cciss/c0d0
part / --fstype ext3 --size 4997 --ondisk cciss/c0d0
part swap --size 544 --ondisk cciss/c0d0
part None --fstype vmfs3 --size 10000 --grow --ondisk cciss/c0d0
part None --fstype vmkcore --size 94 --ondisk cciss/c0d0
part /var/log --fstype ext3 --size 1992 --ondisk cciss/c0d0
Mine is just a little different
clearpart --all --initlabel --drives=cciss/c0d0
part /boot --fstype ext3 --size 256 --asprimary --ondisk cciss/c0d0
part / --fstype ext3 --size 8192 --asprimary --ondisk cciss/c0d0
part swap --size 1600 --asprimary --ondisk cciss/c0d0
part /opt --fstype ext3 --size 2048 --ondisk cciss/c0d0
part /tmp --fstype ext3 --size 2048 --ondisk cciss/c0d0
part /var --fstype ext3 --size 2048 --ondisk cciss/c0d0
part None --fstype vmfs3 --size 10240 --grow --ondisk cciss/c0d0
part None --fstype vmkcore --size 100 --ondisk cciss/c0d0
Just about the same indeed...
According to VMware it is a fault in the QLogic driver; they advise to always disconnect the fibre or unpresent the LUNs.
Do you have QLogic HBAs?
So you'd better watch out, you never know when disaster is going to strike.
Hi all, and thanks for this interesting discussion.
I have the same problem, and I also do not want the SAN teams to be involved in the install process or even the rescue process.
I use RDP, and as far as I can see, the problem occurs when the GRUB job is running.
This job is based on the rdeploy binary, which deploys an image to the first partition found. By default, this job runs in a Linux environment.
In the previous comments, everyone seems to converge on the fact that the Linux environment, when it starts, probes the LUNs and in some situations assigns the first LUN to /dev/sda. The rdeploy binary has no intelligence of its own and will use the device at sda without knowing whether it is local or SAN. So in this situation it is really a Linux problem.
For this GRUB step, we can run the job in DOS via a DOS boot menu option. This boot menu is not very sophisticated and will only present local hard disks. It has been tested and works well.
WinPE has the same issue as the Linux PE and presents the SAN first (even with correct BIOS options).
Disabling the PCI devices in the BIOS was not chosen, because the Linux boot process hangs when the PCI devices are disabled.
Conclusion, to solve the problem:
- add a DOS boot menu option to your RDP
- modify your "Deploy disk Image (GRUB.IMG)" script to be launched in DOS mode instead of Linux mode.
We have HP hardware with QLogic HBAs. If customers with other hardware (Dell or others) can test and share...
Regards
Régis
Where did you add the DOS boot menu?
Sorry,
I got further, but I am now blocked at a new step during the preparation of the GRUB menu. At this step it is still a Linux menu that presents the incorrect drive order.
The DOS boot menu has to be added to PXE before it can be used by a job.
I am trying to see whether the vmesx.sh script provided by Altiris can be changed to identify with more certainty whether a drive is local or SAN. Currently it just looks at /proc/partitions.
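A hypothetical sketch of the kind of check vmesx.sh could make: on these HP blades the internal Smart Array disks appear in /proc/partitions as cciss/c0dN, while SAN LUNs arrive as sd*, so preferring a cciss device when one exists avoids picking a LUN that happened to enumerate as sda. The snippet parses a captured /proc/partitions sample so the logic can be tried standalone:

```shell
#!/bin/sh
# Captured /proc/partitions output (sample data for this demo); on a live
# system you would read the real file instead.
PARTITIONS='major minor  #blocks  name

   8     0  104857600 sda
 104     0   71652960 cciss/c0d0'

# Prefer the first internal Smart Array disk over any sd* device.
TARGET=$(printf '%s\n' "$PARTITIONS" | awk '$4 ~ /^cciss\/c[0-9]+d[0-9]+$/ {print $4; exit}')
# Fall back to sda only if no internal controller was found at all.
[ -n "$TARGET" ] || TARGET=sda
echo "$TARGET"
```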
Regards
Hi,
I actually removed the Fibre Channel (qla*) drivers from the install disk to ensure that LUNs can't be harmed by my scripted DVD or PXE installation. (So far I can use sda as the primary server disk without any fear of destroying my production SAN LUNs.)
The method I used isn't pretty, but it works nicely for my needs.
I hope they will move to a Red Hat 4 based installation, as it has a nice boot keyword to disable storage detection ("nostorage"), and in the kickstart script you get an additional line to declare your SCSI adapters, e.g. "device scsi mptscsih:cciss", and so on.
Can you write a howto about removing the qla drivers?
Where are these drivers located in /lib?
The location of the qla driver is:
/lib/modules/2.4.21-37.0.2.ELvmnix/kernel/drivers/scsi
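As a sketch of the rename approach (renaming rather than deleting, so a later %post step could restore the driver): the commands below recreate that directory layout in a scratch area so they can be tried safely; on the real extracted install media you would run the loop against the image root instead. The module file names are assumptions:

```shell
#!/bin/sh
# Recreate the module directory layout in a throwaway area (demo only).
ROOT=$(mktemp -d)
MODDIR="$ROOT/lib/modules/2.4.21-37.0.2.ELvmnix/kernel/drivers/scsi"
mkdir -p "$MODDIR"
touch "$MODDIR/qla2200.o" "$MODDIR/qla2300.o" "$MODDIR/cciss.o"

# Rename every QLogic module so the installer cannot load it; the cciss
# driver for the internal Smart Array stays untouched.
for m in "$MODDIR"/qla*.o; do
    mv "$m" "$m.disabled"
done

ls "$MODDIR"
```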
Did you modify the initrd?
Does anyone have a solution for removing the qla drivers?
I recently experimented with "nostorage" and "noprobe" - using a driver disk boot image, and trying to modify the modules.img/tgz files in order to remove the qla drivers used by Anaconda. Results were poor, although I haven't given up completely.
I found a work around which I blogged about here:
http://www.rtfm-ed.co.uk/?p=389
The Red Hat Linux link there is interesting - it showed a script to work out which LUNs were visible to the HBA, in order to exclude them. Unfortunately, it wouldn't execute. The "Dude" thinks this may simply be an "interpreter" problem, and that simply telling the kickstart file which interpreter to use would fix it. This is attractive because it would be a way to stop SAN access without having to manipulate the ESX source code, which causes major support issues/politics for most corporates.
I've also logged a post on the private "VMware Education" trainers site, which as an instructor I have access to, in the hope of garnering some thoughts/attitudes from VMware.
To me we seem to be moving into another phase of virtualisation: automation. This is becoming a major bugbear for people who work with remote datacenters where physical access is limited - we definitely need a soft solution to this problem that is as dependable as yanking out a cable...
I will keep this thread posted on any future developments...
Regards
Mike
I will try this:
http://www-1.ibm.com/support/docview.wss?uid=tss1wp100564&aid=5
Look at page 45 and follow steps 1-23 of the "Red Hat Boot Diskette Modification Process". It is an example of removing network drivers, but the same should apply to SCSI drivers.
Steps 13-15 describe removing the old network drivers.
Regis,
Using DOS PE to push the GRUB.IMG to the local disk is NOT an option for us, since there are no DOS drivers for our SCSI controller (BL465c blades).
So we have to use LinuxPE.
Our solution so far:
Disable the HBA in the BIOS (no IRQ) and delete the qla drivers from rootfs.gz, so LinuxPE cannot load the HBA drivers.
So far no problems.
Paul