VMware Cloud Community
David1975
Contributor

ESX 3.5: Kickstart install doesn't see DHCP server

Hello,

We are trying to deploy our new HP c-Class blade servers (BL680c G5) using a kickstart script that connects to an FTP server.

The FTP service is running on our new VC server and is accessible anonymously.

We use VLANs, but the default VLAN (PVID 101) is the one the FTP/DHCP server is on.

PortFast is enabled on our Cisco and HP switches.

We boot the ESX 3.5 CD via the iLO and enter this command to start the network install:

esx ks=ftp://192.168.1.39/vmware/esxinstall/config/esx01.cfg

We don't specify the NIC in the command (we also tried with the NIC specified), so it asks which of the 4 NICs we want to use to connect to the network.

But whichever one we choose, none can reach the network (no DHCP response).

When I enter a manual IP address, after a while we get a message that it cannot connect to the FTP server.

We also run Altiris RDP for our physical servers, and when I tried booting the server in PXE mode all went fine: it receives an IP address and waits to start an install.

Has anyone here deployed ESX on HP c-Class blades using FTP, or does anyone know what this could be linked to? The DHCP server is a Windows 2003 machine that serves both DHCP and BOOTP clients.

Regards,

David

21 Replies
konabumm
Contributor

I think there needs to be a

ksdevice=eth0

or eth1, something like that. The problem is you're not sure what ESX is going to pick as eth0. Start with the cards; they seem to get chosen first. I've had the same problem in the past.

espi3030
Expert

I just completed my scripted installs. I have to specify all of the following for everything to go "just" right: (at the ESX Server 3 install screen)

1) IP address

2) netmask

3) gateway

4) nameserver

5) ksdevice=eth0

6) ks=http://192.168.1.100/esx302/ks.cfg

All on one line. Hope this helps.
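For anyone assembling that line by hand, here is a minimal Python sketch that joins the six values into one boot-prompt line and catches obvious address typos first. Only `ksdevice=` and `ks=` are spelled out in this thread; the key names `ip=`, `netmask=`, `gateway=` and `nameserver=` are assumptions taken from the list labels, and the addresses in the usage example are made up. Check the exact spellings against your installer documentation.

```python
import ipaddress


def build_boot_line(ip, netmask, gateway, nameserver, ksdevice, ks_url):
    """Join the six values into a single boot-prompt line."""
    # ip_interface() raises ValueError on a malformed address or netmask.
    iface = ipaddress.ip_interface(f"{ip}/{netmask}")
    if ipaddress.ip_address(gateway) not in iface.network:
        raise ValueError(f"gateway {gateway} is not inside {iface.network}")
    ipaddress.ip_address(nameserver)  # syntax check only

    # ksdevice= and ks= appear in this thread; the other key spellings
    # are assumptions and may differ in your installer.
    params = [
        ("ip", ip),
        ("netmask", netmask),
        ("gateway", gateway),
        ("nameserver", nameserver),
        ("ksdevice", ksdevice),
        ("ks", ks_url),
    ]
    return "esx " + " ".join(f"{key}={value}" for key, value in params)


if __name__ == "__main__":
    # Hypothetical addresses; only the ks URL and eth0 come from the post.
    print(build_boot_line(
        "192.168.1.50", "255.255.255.0", "192.168.1.1", "192.168.1.10",
        "eth0", "http://192.168.1.100/esx302/ks.cfg",
    ))
```

The subnet check is only a sanity test on what you type; it says nothing about which physical NIC the installer will call eth0.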

David1975
Contributor

If you don't specify the NIC, you get a prompt in the GUI after it has discovered all NICs. We see the 4 NICs in the GUI, and when we select one it reports that it can't find the DHCP server. When you enter a manual IP here, you get the same result as when entering the IP address at the beginning of the kickstart command:

Failed to log into x.x.x.x.: Failed to connect to ftp server.
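One way to rule out the FTP server itself is to fetch the same kickstart file anonymously from another machine on the same VLAN. The following is a rough Python sketch using the standard `ftplib` module; the function names are hypothetical, and it only proves the server side works, not that the blade's NIC can reach it.

```python
from ftplib import FTP, all_errors
from urllib.parse import urlparse


def split_ks_url(ks_url):
    """Split an ftp:// kickstart URL into (host, port, path)."""
    parts = urlparse(ks_url)
    if parts.scheme != "ftp" or not parts.hostname or not parts.path.strip("/"):
        raise ValueError(f"not a usable ftp:// kickstart URL: {ks_url!r}")
    return parts.hostname, parts.port or 21, parts.path


def fetch_ks_anonymously(ks_url, timeout=10):
    """Try an anonymous download; return (ok, message)."""
    host, port, path = split_ks_url(ks_url)
    chunks = []
    try:
        with FTP() as ftp:
            ftp.connect(host, port, timeout=timeout)
            ftp.login()  # no arguments means anonymous login
            ftp.retrbinary(f"RETR {path}", chunks.append)
    except all_errors as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, f"fetched {sum(len(c) for c in chunks)} bytes"


if __name__ == "__main__":
    # URL taken from the first post in this thread.
    print(fetch_ks_anonymously(
        "ftp://192.168.1.39/vmware/esxinstall/config/esx01.cfg"))
```

If this succeeds from a neighbouring host but the installer still reports the login failure, the problem is more likely on the blade's network path than on the FTP service.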

espi3030
Expert

I know this shouldn't matter, but is the ESX server you are trying to kickstart on the same subnet as your DHCP server? Also, what is the speed of the ports? I had the same issue communicating with my IIS server when the host was not on the same subnet and/or not on a gigabit-capable switch. After you type "esx ks=<KS.CFG location>", at some point you can press F3 and F4 to see more helpful information.

grantjo78
Contributor

I've just come across the same problem with some HP BL460c's. Would love to get it resolved!

lamw
Community Manager

Are you using PXE boot to initially boot the image and specify where to find the kickstart configuration, or are you booting off an ISO and then passing in the arguments? With PXE boot and kickstart, if there is more than one physical NIC and the server PXE boots off one interface, you're not guaranteed that it will use that same interface to locate the kickstart source, whether that is FTP or NFS. What I've found working with the 385, 585, BL460c and BL680c is that if you have more than one interface, you should either disable the extras, pull the expansion card, or cable everything up so that all interfaces can see the PXE/DHCP server. That guarantees the build works whichever interface your kickstart specifies. I believe you need to specify the interface; otherwise you have to go through the menu and know which interface is active and able to reach the PXE server and the kickstart source.

Oh, also make sure the switch port has spanning-tree PortFast enabled, or you'll see timeouts for the port. I've also read about a new kickstart feature that takes a delay time to help with ports that do not have PortFast enabled. I believe the syntax is "esx delay=n" where n is the number of seconds.

ac57846
Hot Shot

This is a known issue with HP c-Class blades.

The physical NIC changes between the kickstart phase and the Red Hat kernel phase that follows it.

A typical symptom is that the .ks file can be downloaded from an FTP site but the install media cannot be downloaded from the same FTP site.

There are three resolutions:

1. Use HP Rapid Deploy Pack V3.8 to deploy the ESX server. This is the official HP answer.

2. Disable all but one NIC in the server BIOS until the reboot at the end of the install. This is only usable if you don't have NICs in the mezzanine slots, as the mezzanine NICs can't be disabled in the BIOS.

3. Get the .ks file off the FTP server but the install media from the CD. Yes, it is extremely slow, since the media is presented through the iLO remote media.

I have spent several days fighting this one and kept losing. I haven't been able to pull install media off a nice fast FTP server with HP c-Class blades, only through the iLO media redirection. Install time has been around four hours rather than 30 minutes.

Al.

lamw
Community Manager

Yes, you're correct that this is a known issue, but I believe it goes beyond just the blades. Even the regular ProLiant series has this issue. We use kickstart over NFS, so disabling all but one internal NIC and removing any add-on NICs is probably the cleanest solution. You could also get additional network cables and plug them into all the internal NICs the blade sees. I believe by default the BL460c gets up to 2 internal NICs and 2 add-ons, so technically you could just cable up the other internal NIC, disable the add-on and build over NFS/FTP. It shouldn't take 4 hours to do a base build; I would look at one of the alternatives described in the previous post.

ac57846
Hot Shot

I haven't kickstart-built ESX on a recent ProLiant, but I wouldn't be surprised if they had the same issues. Disappointed but not surprised.

Since the blades are at a remote site it is not possible to pull out mez cards every time we want to rebuild, so I can't get down to a single NIC.

All six NICs are cabled and I am getting the .ks file downloaded from the FTP server, so physical & IP connectivity is good. The first error on the F3 console is about not connecting to the FTP server for one of the .img files. This happens whichever of the NICs I choose to use as KSDevice.

lamw
Community Manager

Interesting, I'm wondering if there's something wrong with the ks.cfg. I've never done an FTP build, but I assume it should be similar to an NFS build. Could it be something on the FTP side? Have you checked the logs? Does Alt+F4 give any additional information, such as a permission or access-denied error? Do you specify which interface you want to use by default in "ksdevice"? If you have all NICs hooked up and they're all on the same VLAN, technically your build should be okay going through either FTP or NFS. Yeah, it's very unfortunate that this is an issue with PXE boot/kickstart plus HP hardware. Solaris JumpStart doesn't run into this issue with its unattended installation, and if it does, it's relatively painless to resolve.

ac57846
Hot Shot

The FTP server never sees the request for the .img file, though it does show the .ks being downloaded. The .ks is fine: if I change from using a URL as the install source to using the CDROM, the install works, and the URL is good too. The issue is specific to the HP blades; it's been confirmed by the local product manager, who provided two of the solutions I listed.

Glad to know it doesn't hurt Solaris, but I'd expect it to hurt Redhat.

Unfortunately I don't have a C class of my own so can't do more testing.

EXPRESS
Enthusiast

Okay, I am having this same problem, but I have not seen any resolution. What was done to get it fixed? I have an HP 460c, and I tried floppy, FTP and HTTP, and nothing works. How did you get it to work? PLEASE share it with me. VM support can't give me an answer either; this is very frustrating.

Thanks for your help in advance....

Thank you, Express

ac57846
Hot Shot

My resolution was to use the CDROM as the source for the install files, which was very slow over remote media, and to pull only the kickstart file from the FTP server.

The official HP answer is to use the Rapid Deploy pack which is HP's branded version of Altiris deployment.

Al.

EXPRESS
Enthusiast

Thanks for the reply, ac57846, but that's where I am having the problem. I use the CDROM for the install and have been trying to get the ks file from FTP, HTTP or the a: drive, but to no avail. I can run the install from the CDROM and it runs fairly quickly, but I am unable to get the ks to go.

Thank you, Express

ac57846
Hot Shot

If the ks file is not being used from any source I'd be concerned that the file is corrupt.

I would suggest doing a manual build and then recovering the file /root/anaconda-ks.cfg from the built server. This is the actual ks file used during the manual build.

Other things to look at are network configuration and errors on the third & fourth pseudo terminals, use <ALT><F3> and <ALT><F4> to switch to these terminals while the build is progressing.

EXPRESS
Enthusiast

I have manually installed ESX 3.5 with Update 1 and set up the ks.cfg from there. I have tried all the different methods (floppy, FTP, HTTP), but nothing connects. I have installed ESX 3.5 without the update as well, with the same results. I think it must be the way the servers are set up, or the network, but I don't know where to look. The help here at my place isn't that great, so I am reaching out to you guys for anything you can give me.

The servers are HP ProLiant BL460c G1, and I am attaching to them via the iLO Integrated Remote Console. The CDROM attaches with no problem. The a:\ drive seems to connect (I see the light reading on my hardware), but ESX doesn't seem to see it to mount the drive. If it is a network issue, where should I look on the HP server?

This is driving me nuts....

Thanks again in advance for your help...

Thank you, Express

lamw
Community Manager

You might want to verify that the DHCP server is configured correctly, and also that the ESX server sits on a subnet that can talk to the DHCP server. One easy test, if you can do it: bring your laptop, plug in the cable the ESX server is plugged into, and check that the laptop gets a DHCP address. If it does, DHCP is set up properly and that points back to how your ks.cfg is configured. Also, when you do a kickstart, you'll want to boot the "boot.iso" from your iLO. This is the initrd with the micro image, from which you pass in the kickstart parameters and specify whether the build uses a static or DHCP address. You might want to provide your ks.cfg as an attachment so we can verify that it looks good. If you've been editing it on a Windows machine, you might want to run "dos2unix" on a UNIX/Linux box to make sure you don't have any stray spaces or carriage returns from the Windows environment, because the file is parsed in a UNIX/Linux environment. We also use BL460c blades with 3.5u1 and it works great. Our ks.cfg is based on the 3.x ks.cfg; there are no differences with respect to the vanilla installation of ESX.
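If dos2unix isn't at hand, the line-ending check can be approximated in a few lines of Python. This is only a sketch of the same idea (replace each CRLF pair with LF); the function names are hypothetical and it does not reproduce every dos2unix option.

```python
from pathlib import Path


def count_crlf(data: bytes) -> int:
    """Number of Windows-style CRLF line endings in the data."""
    return data.count(b"\r\n")


def to_unix_newlines(data: bytes) -> bytes:
    """Replace every CRLF pair with a single LF."""
    return data.replace(b"\r\n", b"\n")


def fix_ks_file(path) -> int:
    """Rewrite the file in place if it has CRLF endings; return how many."""
    ks_path = Path(path)
    data = ks_path.read_bytes()
    found = count_crlf(data)
    if found:
        ks_path.write_bytes(to_unix_newlines(data))
    return found


if __name__ == "__main__":
    # "ks.cfg" is a placeholder path for your own kickstart file.
    print(fix_ks_file("ks.cfg"), "CRLF line endings converted")
```

A result of 0 means the file already had UNIX line endings, so the problem lies elsewhere.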

EXPRESS
Enthusiast

Unfortunately I can't do the laptop test since the servers are located offsite. Our server group told me that they enabled DHCP on the servers just now. I just tried doing an FTP install and got "Failed to log into 10.10.10.10: Failed to connect to FTP server". Attached is my ks file.

Thanks

Thank you, Express

lamw
Community Manager

I would say it might be a networking issue. I would first verify that you can boot up and actually obtain an address: download something like Knoppix, mount the ISO and boot the blade. The system should boot the live CD, and you can see whether you get network connectivity. It'll be a little slow via the iLO, but at least you'll know whether DHCP is working. Also, I see that in your ks.cfg you're passing in a static IP rather than using DHCP to obtain the address for the initial build. In either case, verify that DHCP is working before proceeding; it could save you time.
