DFATAnt
Enthusiast

Scripted kickstart installation using NFS

Hi,

I am setting up a scripted kickstart install using NFS. At the initial connection to the NFS share to get the ks.cfg file, I am prompted that the network connection cannot be found. After pressing OK several times, I am able to connect to the NFS share, and the installation continues without any further problems.

It looks like there is a delay in getting a DHCP address. I know the NFS share is set up correctly, as I can connect to it and mount the share from other ESX servers.
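
(For anyone reproducing this: a typical pxelinux.cfg/default for an NFS kickstart looks roughly like the sketch below - the address, filenames and path are examples only, not taken from this thread.)

default esx
prompt 0
label esx
  kernel vmlinuz
  append initrd=initrd.img ip=dhcp ksdevice=eth0 ks=nfs:192.168.0.10:/kickstart/ks.cfg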

I am using a Windows 2003 DHCP server and a Windows 2003 NFS share. I don't have problems with either server at any other time, for either of the functions mentioned.

Can anyone tell me what I need to do so that I don't get prompted about the network connection?

Thanks

Ant

29 Replies
MayurPatel
Expert

Folks,

I am testing how to deploy bare-metal VI3 servers using the ks.cfg method, but with a focus on a pure Windows deployment solution. I have previously developed solutions using Altiris to achieve this. I am now putting together a WDS/RIS + NFS setup, but I can't figure out what I am doing wrong: the installation will not proceed to copying the ESX CD contents from the Windows 2003 NFS share. The PXE boot image is able to connect to the NFS share, find the KS.CFG and execute it, and the ESX CD contents are copied into that same share.

In the KS.CFG file I have the line:

nfs --server 192.168.31.44 --dir /nfsshare
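
(A quick sanity check from any Linux box, since the installer's loader mounts the share the same way - the mount point below is just an example:)

showmount -e 192.168.31.44
mount -t nfs 192.168.31.44:/nfsshare /mnt/test
ls -a /mnt/test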

I receive an error message:

The VMware ESX Server 3 installation tree in that directory does not seem to match your boot media.

I know I could make it work using IIS/FTP, but I want to use NFS because I have never tried NFS before. PS: this is all running in a VM on my Mac under VMware Fusion.

I have set the relevant NTFS file permissions for Anonymous Logon users and enabled Anonymous Access on the NFS share as well.

Any idea what I have forgotten to do?

dinny
Expert

Hiya,

The only time I have seen that message is when I was working on removing the HBA drivers from the ESX install media - so I didn't need to pull the SAN cables each time I rebuilt.

It was because one of the file copies I had done had omitted some hidden files (ones preceded by a ".", which "cp" ignores by default).

It would surprise me if any of the files visible directly on the CD were of that format - but maybe they are? Worth a check...
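
(If you re-do the copy on a Linux box, either of these picks up the dotfiles that a bare "cp" misses - paths are examples only:)

mount -o loop esx.iso /mnt/iso
cp -a /mnt/iso/. /export/esx/     # the trailing "/." includes hidden files
# or, equivalently, the tar pipe:
(cd /mnt/iso && tar cf - .) | (cd /export/esx && tar xf -)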

Failing that - have you pulled anything out of the ISO and copied it elsewhere to get your PXE stuff to work?

Maybe that is missing the odd hidden file?

(Out of interest - I'm intrigued as to how you are using a RIS server to PXE build ESX anyway - but I've never looked at the later WDS stuff - so maybe that is how?)

Dinny

MayurPatel
Expert

What I did was simply mount the ESX .iso image in my VM and copy the contents of the ISO into the nfsshare directory. I can confirm the NFS share contains an exact copy of the ESX ISO image, which I have previously used to deploy an ESX server in a VM.

Basically it is not able to execute the hdstg2.img and netstg2.img files for some reason. Since the PXE image does load, connect to the share and actually run the KS.CFG, I suspect one of two things: either the ESX CD files have to be mounted and copied into the folder under Linux, much as it is done natively (the Windows copy may be modifying some attributes of the files), or the KS.CFG needs some additional settings.
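
(One way to rule out an incomplete copy is to compare the share against the mounted ISO - "diff -r" walks hidden files too. The paths below are examples:)

mount -o loop esx.iso /mnt/iso
mount -t nfs 192.168.31.44:/nfsshare /mnt/nfs
diff -r /mnt/iso /mnt/nfs && echo "share matches the ISO"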

I will post the details of my setup once I have it all working. I have configured WDS to deploy Windows OSes to VMs, and added RIS to deploy the Syslinux PXE image, because at present it is not possible to convert an .iso image into the .wim format that WDS uses to deploy to clients via PXE.

MP

dinny
Expert

Yup - that matches the problem I encountered when I rebuilt the netstg2.img file.

I'm not sure it will help in your case - but if you're interested, there's a whitepaper describing what I did on the xtravirt.com site.

http://www.xtravirt.com/index.php?option=com_remository&Itemid=75&func=fileinfo&id=13

The section where I mention using tar (as opposed to cp) to copy hidden files is where I hit that particular issue.

But as you say - I suspect you have the same error message but a different cause?

Dinny

DFATAnt
Enthusiast

I have seen this message when I was using an ESX 3.0.1 boot image and trying to load an ESX 3.0.2 ISO from the NFS share.

It sounds like this isn't your problem, though.

MayurPatel
Expert

After trying different things I decided to drop the Windows NFS option and configured IIS instead. I can now deliver the ESX files and build the ESX server using RIS. Now I have to work out how to supply the ESX hostname, IP address and nameserver configuration details, because I can't inject them into a ks.cfg file dynamically. The only way it can be done is to pre-configure a ks.cfg file for each new server and place it in the . This, I think, is very messy.

I was looking at providing these details by changing the network line to "network --device eth0 --bootproto query" and adding a %pre section to the ks.cfg that opens a new console and prompts for them. As I am not an expert bash/perl scripter, I will have to search for examples I can adapt to achieve this. I tried the autostep and interactive options in ks.cfg, but those fail in anaconda.
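
(For what it's worth, a rough sketch of that %pre trick - untested on ESX's older anaconda. The usual pattern is to have %pre prompt on a spare console, write a complete network line to a temp file, and %include that file from the main section. All names below are made up:)

# in the main section of ks.cfg, instead of a hard-coded network line:
%include /tmp/network.ks

%pre
# take over virtual console 6 for interactive input
exec < /dev/tty6 > /dev/tty6 2>&1
chvt 6
echo -n "Hostname: "   ; read HN
echo -n "IP address: " ; read IP
echo -n "Netmask: "    ; read NM
echo -n "Gateway: "    ; read GW
echo -n "Nameserver: " ; read NS
echo "network --device eth0 --bootproto static --ip $IP --netmask $NM --gateway $GW --nameserver $NS --hostname $HN" > /tmp/network.ks
chvt 1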

mikef123
Contributor

You are probably using the wrong initrd.img and vmlinuz files.

stumpr
Virtuoso

I've been working on this recently and found your post in a Google search.

I believe this issue is more likely to occur in a trunked configuration with a switch (VLAN tagging). It's possible certain vendor combinations will not see the issue if their negotiation process is fast enough.

In my case, it's Broadcom (bnx) to Cisco Catalyst. The NIC is initialized several times during the PXE-NetInstall process. Essentially anytime the interface is "pumped" by the loader program (bin/loader in the initrd.img), a new trunk negotiation occurs with the upstream switch port. This can take 30-40 seconds. You can see this trunk negotiation if you watch the Cisco switch.
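
(For anyone who is allowed to touch the switch config - which, as noted further down, I can't - the usual mitigation on a Catalyst is to hard-code the trunk and skip the negotiation and spanning-tree delays on that port. The interface name is an example:)

interface GigabitEthernet1/0/1
 switchport mode trunk
 switchport nonegotiate
 spanning-tree portfast trunk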

This issue was addressed by other Linux vendors via a nicdelay parameter added in some Anaconda builds (I found references to this being resolved in later RHEL3 and RHEL4 updates, but not in RHEL5). However, I reviewed the ESX anaconda source code, and nicdelay is not supported in the loader used by ESX's PXE images. You can verify this by running 'strings loader' after extracting the PXE initrd.img. As a side note, the linksleep parameter waits for the link status to be "ok". In the trunk negotiation issue I'm encountering, the link layer is "ok" almost immediately, but the trunk negotiation process blocks higher-layer protocols until it completes. In essence, linksleep will not address this issue and is useless here.
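
(To reproduce that check on a Linux box - the ESX 3 PXE initrd.img is, if I recall correctly, a gzipped ext2 image rather than a cpio archive; paths are examples:)

zcat initrd.img > initrd.ext2
mkdir -p /mnt/initrd
mount -o loop initrd.ext2 /mnt/initrd
strings /mnt/initrd/bin/loader | grep -i -e nicdelay -e linksleep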

I will likely meet my build-automation goals by moving to a CD-ROM-based kickstart install delivered by the onboard remote management card's virtual CD-ROM functionality, though I did submit a bug report to VMware and hope the parameter is eventually added to the PXE loader.

I believe this is likely a common issue in larger network environments with ESX. VLAN tagging (trunking) becomes almost impossible to avoid when dealing with nearly a dozen VLANs. Also, in my case the ESX environment will be in a remote data center, so options such as using a non-trunking switch or reconfiguring the switch port for the installation are not viable. My goal is a minimal installation/rebuild process, so cluster nodes can be quickly and consistently rebuilt remotely with no physical intervention on the cluster hardware or the network infrastructure.

I wanted to share what I learned while investigating this issue, should someone else travel down this dark road. :)

Reuben

Reuben Stump | http://www.virtuin.com | @ReubenStump
bill260
Contributor

Thought I was having the same problem; followed the link but got a missing-page error. Looks like they moved it close by.

Did some further digging and found that while the symptoms were similar, the cause was different. I was trying an NFS kickstart install of x86_64 Fedora 7 onto a VMware Server VM. It stalled at the same place, with similar errors re: DHCP. Turns out the problem with F7 was not the switch (mine are cheap, dumb Linksys units) but the F7 initrd that PXE was installing - the e1000 NIC driver was broken. Found this bug on Red Hat Bugzilla - "Anaconda netboot fails to DHCP" - which led here.

Hope this helps...

mikef123
Contributor

I have also found that a 2.4 kernel will enumerate the NICs differently than a 2.6 kernel. I have a Dell 1950 with 2 onboard Broadcom GbE NICs and 2 add-on Intel Pro GbE NICs, and an HP DL360 with the same configuration. A 2.6 kernel numbers the onboard NICs eth0,eth1 and the add-on PCI NICs eth2,eth3. The 2.4 kernel (ESX 2.x, 3.x, RH3) numbers the add-on NICs eth0,eth1 and the onboard ones eth2,eth3. This drove me crazy for a day. I changed my ks file to reflect this and it works fine.
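
(In ks.cfg terms that just means the same physical port needs a different ethN depending on which installer kernel is running - a sketch, with device names as described above:)

# 2.6-kernel installer: onboard Broadcoms enumerate first
network --device eth0 --bootproto dhcp
# 2.4-kernel installer (ESX 2.x/3.x, RH3): add-on Intels enumerate first,
# so the same onboard port is now eth2
network --device eth2 --bootproto dhcp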

Mike
