VMware Cloud Community
DFATAnt
Enthusiast

Scripted kickstart installation using NFS

Hi,

I am setting up a scripted kickstart install using NFS. At the initial connection to the NFS share to get the ks.cfg file, I am being prompted that the network connection cannot be found. After pressing OK several times, I am able to connect to the NFS share and the installation continues without any further problems.

It seems to me that there is a delay in getting a DHCP address. I know that the NFS share is set up correctly, as I can connect to it and mount the share from other ESX servers.

I am using a Windows 2003 DHCP server and a Windows 2003 NFS share. I don't have problems with either server at any other time for either of these functions.

Can anyone tell me what I need to do so that I don't get prompted about the network connection?

Thanks

Ant

29 Replies
VirtualKenneth
Virtuoso

I've experienced the same thing while doing FTP installs.

What type of switches are you connected to?

Cisco? Try enabling "PortFast"

Other switches? Try disabling "STP"

While starting, ESX loads the NIC driver, which causes the switch port to reset. The port reset re-initializes STP, and that can take 20-30 seconds while the NFS transfer is trying to start. So at first it times out, and after the 20-30 second period it works.
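For what it's worth, on a Cisco Catalyst running IOS it is only a couple of commands on the ESX-facing port. The interface name below is just an example, so adjust it for your own port:

conf t
interface GigabitEthernet0/1
 ! skip the STP listening/learning delay on this edge port
 spanning-tree portfast
end

With PortFast the port goes straight to forwarding, so the installer isn't left waiting for STP while it tries to reach the FTP/NFS server.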

virtech
Expert

I am having this exact same problem and it's driving me crazy!

IBM server using Broadcom NICs connected to a Cisco switch.

Portfast (STP) enabled on the port.

Portfast has helped, but I am still getting a DHCP timeout. I have also tried adding the following to my isolinux.cfg:

ksdevice=eth0 nicdelay=50 linksleep=50, but this hasn't helped either.
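For anyone wanting to try the same thing, those options go on the append line in isolinux.cfg. Roughly like this, where the kernel/initrd names and the NFS server and path are only placeholders for my lab values:

  kernel vmlinuz
  append initrd=initrd.img ksdevice=eth0 nicdelay=50 linksleep=50 ks=nfs:192.168.1.20:/kickstart/esx/ks.cfg

(everything after "append" goes on a single line)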

It would appear that the NIC is recycled up to three times during the setup phase before connecting to the NFS share.

Anything else I can try? Scratching my head with this one.

DFATAnt
Enthusiast

We are using Cisco switches and we have PortFast enabled.

I have experienced the same problem when using ftp to source the installation files (which is why I am trying NFS, hoping that the problem will go away - unfortunately it hasn't).

I am wondering whether other people (other than Mooihoek) have experienced this problem, and if so, what did they do to resolve the problem (other than enabling PortFast).

Ant

VirtualKenneth
Virtuoso

Interesting, I recently created a new deployment with RDP again (running Cisco switches) and it worked fine with default settings.

Another site didn't work with default settings (I left the problem with the network department), so it could also have something to do with the networking "behind" the first switch, I guess.

DFATAnt
Enthusiast

The ESX server and the DHCP and NFS server are all connected to the same switch, so there is nothing "behind" the first switch.

I have been doing all this in my dev lab as a "proof of concept", so that when I get it right I can do it on my production network (which will have multiple switches and VLANs). We will be using Juniper switches in the prod network, so I am trying to source a couple of servers to do a test there and see if the problem is only with the Cisco switch in the dev lab.

Ant

virtech
Expert

At what point are you getting stuck in your NFS install?

virtech
Expert

There seems to be some trunking in place behind the Cisco switch I am connected to, so I can only assume this is causing the DHCP timeout problems I am seeing. If I connect both IBM servers to a local switch, all is OK.

DFATAnt
Enthusiast

I am getting prompted that the NFS share cannot be found when the installation looks for the ks.cfg file. If I wait around 30 seconds and click OK, the NFS share is found and the installation continues without any more prompts. I am fairly sure that DHCP is the issue.

Ant

virtech
Expert

Yep, I'm pretty sure you have the same problem as me! I don't have an answer for you at the moment; I'm still looking into it.

Things I have tried (rough port config below):

Enable Portfast/STP

Disable Keepalive

Set port to 1000 Mb full duplex
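On the Cisco side those port settings look roughly like this (the interface name is only an example, and clearly it hasn't cured it for me yet):

interface GigabitEthernet0/1
 speed 1000
 duplex full
 no keepalive
 spanning-tree portfast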

Check out the following link, which is useful.

http://fedoraproject.org/wiki/AnacondaNetworkIssues

BUGCHK
Commander

I have a dim memory that this problem is documented in VMware's KB:

No current workaround except for skipping it by pressing OK.

A fix is planned for a future version.

Unfortunately I could not find the article again.

skearney
Enthusiast

Check out the following link, which is useful.

http://fedoraproject.org/wiki/AnacondaNetworkIssues

Nice link, thanks!

DFATAnt
Enthusiast

Thanks for the link. It has some information that could be helpful.

It mentions a bug with the Broadcom NICs on HP servers. It just so happens that I am using HP DL380 G4 and G5 servers, which have the Broadcom NICs.

Joe_Wulf
Contributor

I am having a similar problem. The OS I'm deploying via kickstart is RHEL AS4u5 32-bit, though it doesn't matter which OS I deploy: I see the same problem with any of the 32- and 64-bit versions of RHEL AS4, as well as Fedora 7 and RHEL5.

The anaconda portions of the boot (with DHCP and TFTP) go very well. I can see in VC3 and VC4 that the booted system gets all the right network attributes, and I have carefully verified them (hostname, IP, gateway, DNS, netmask, broadcast, network, next_server, and domain). In VC3, eventually, I get the error message "reverse name lookup failed", then it says "failed to mount nfs source". It displays a URL and file location, both of which are, in fact, correct. Efforts at toying with various anaconda options in the pxeconfig isolinux file provide no improvement.

On the kickstart server, I configure the dhcp and tftp/pxeconfig entries to use the MAC address of another system that I've successfully kickstarted before, boot that other system, and the installation runs perfectly to completion. The NFS mount occurs instantly there, where it times out and fails on the other server. I switch the MAC address on the kickstart server back to the problem box (no other changes at all) and the NFS mount fails at the same point again. When I attempt to continue the build interactively and supply the NFS server (192.168.10.2) and the directory (/kickstart/ks-files/ks.cfg), it always results in the error message "That directory could not be mounted from the server". Yet a manually built VM, on the same physical server where the kickstarted VM is, can immediately mount the exact same path without fail.
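For what it's worth, the manual check I run from any other box on the same hub is roughly this (standard nfs-utils commands; the IP and path are the lab values above):

showmount -e 192.168.10.2                                   # list what the server is exporting
mkdir -p /mnt/kstest
mount -t nfs 192.168.10.2:/kickstart/ks-files /mnt/kstest   # the same path anaconda is asked to mount
ls -l /mnt/kstest/ks.cfg
umount /mnt/kstest

Outside of anaconda that mount succeeds immediately.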

The physical network is a 4-port Linksys hub (yes, a hub), one PC with Fedora 7 installed and configured as the kickstart server, one Dell XPS laptop with VMware Workstation v6 installed (32-bit VMs kickstarted onto it always install successfully), and a MacPro with Windoze XP 64-bit installed, which also has VMware Workstation v6 (32/64-bit VMs kickstarted onto it ALWAYS exhibit the problems outlined above). Whether this hub is connected to the internet or completely standalone, the results are the same.

To do other troubleshooting, I've 'shared' the C: drive of the MacPro, mapped it to a drive letter on the Dell, pointed VMware on the Dell at the mounted filesystem on the Mac for the 'problem' VM, and it boots and installs to completion three times in a row. I try it natively on the Mac again, and it fails as before. I've temporarily replaced Windoze XP on the Mac with Windoze 2003, and the VM died at the same place.

Also, I can successfully build manually, via ISO image, any RHEL AS4, RHEL5, or Fedora 7 OS I want, either 32- or 64-bit, on the MacPro. Subsequent to the build, NFS mounts work correctly.

Unfortunately, posts to the Red Hat kickstart mailing list didn't provide any useful help. One suggestion was to switch over to squid and apache on the kickstart server and 'try' that. That isn't a good suggestion for resolving an NFS problem within anaconda: my customer has a pretty widely installed base deployed via NFS-based kickstart servers, and I'm not familiar with squid/apache and their complexities.

Reviews of /var/log/messages on the kickstart server show only normal DHCP handshaking. No DNS, DHCP, or NFS errors. Any suggestions on what I can do to temporarily increase the volume of messages those services send to syslog (or some other log file) so I can review them for problems I'm not aware of yet? Any suggestions on what I can do to further troubleshoot this NFS mount problem?
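In the meantime, about the only extra visibility I can think of is watching the mount attempt on the wire from the kickstart server, something like this (the interface name and the client's DHCP address are placeholders):

tcpdump -n -i eth0 host 192.168.10.50
# the portmap (111), mountd, and nfs (2049) exchanges from the installer should all show up here

If anyone knows a cleaner way to turn up the logging on dhcpd/named/nfs themselves, please share.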

R,

-Joe

Joe_Wulf
Contributor

I was wondering... if anyone else has a thought on this, please chime in... is it possible that Fedora Core 5, an older OS, could be more stable or 'better' for doing this than Fedora 7?

R,

-Joe

DFATAnt
Enthusiast

It's definitely a timeout issue. The funny thing is that it is now intermittent for me. Sometimes it works without failing (using the same scripts and NFS share). Other times it fails and I have to press ENTER several times (using the same information that it failed with) to get the automated process moving again.

I have given up trying to resolve the issue and just watch my scripted build for the first 5 minutes until it reaches the point where it could fail. If it fails, I get it started again and walk away. If it doesn't fail, I scratch my head and wonder why it didn't fail as I walk away.

Cheers

Ant

Joe_Wulf
Contributor

Thank you.

For me it is looking more and more like a DNS timeout issue, but not an intermittent one. A few weeks ago, I did get a couple of kickstarted VMs to build to completion, but NEVER again. Now every one fails with "reverse name lookup failed" as seen on VC3.

I've examined my DNS config. It's on Fedora 7, manually built, so I might have an error or two in there. I am working that angle at the moment, in my spare time, of course. What is funny is that I can do an nslookup and sometimes the domain is there, other times it times out.
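The quick checks I keep running against the Fedora 7 box (assuming it really is the one serving DNS at 192.168.10.2; the client name and address below are only examples):

dig @192.168.10.2 ks-client.mydomain.local      # forward lookup
dig @192.168.10.2 -x 192.168.10.50              # reverse lookup, which is what anaconda appears to trip on

If the reverse zone is missing or the PTR records are wrong, that would line up with the "reverse name lookup failed" message.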

DFATAnt
Enthusiast

It sounds like you're using the hostname. I ended up using the IP address of my NFS share to ensure that I didn't have any DNS issues.
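In other words, in the boot line I have something like the first form below rather than the second (the address, hostname, and path are made up, just to show the difference):

ks=nfs:192.168.0.10:/kickstart/ks.cfg

instead of

ks=nfs:nfsserver.mydomain.local:/kickstart/ks.cfg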

Joe_Wulf
Contributor

My "/tftpboot/linux_install/pxelinux.cfg/isolinux.cfg" file is below:

default linux

label linux

kernel RHELAS4u5_x32/vmlinuz

append {elements below are actually all on THIS line, seperated by whitespace}

console=tty0

ip=dhcp

initrd=RHELAS4u5_x32/initrd.img

vga=788

ks=nfs:192.168.10.2:/kickstart/ks-files/ks.cfg

ksdevice=eth0

noipv6

ramdisk_size=16384

Again I can see in VC3 that the IP address and general resolution are succeeding, but then it just HANGS there for a while, apparently doing nothing until the "reverse name lookup failed" error appears.

DFATAnt
Enthusiast

I think my problems are caused by the switches that we use (which are Nortel). I built an ESX server yesterday and the script went through without stopping, yet a couple of days ago using the same script process, the script stopped saying the NFS mount point could not be found. Like usual, I pressed ENTER several times accepting the same NFS information and the script continued on.

Having pointed the finger at the Nortel switches, though, the same problem occurred when we were using Cisco switches, so maybe I should be pointing the finger at the switches in combination with the Anaconda process.
