VMware Communities
trodemaster
Hot Shot

Fusion 3.0.1 breaks OS X Server netboot

With Fusion 2.x and 3.0.0 you could configure a Mac OS X Server virtual machine to boot from the network by editing com.apple.Boot.plist. When I attempt to boot one of these VMs under Fusion 3.0.1, the system fails to boot. The boot process appears to stall just before networking would normally load.

I have updated VMware Tools in the VMs and verified this behavior on two OS X 10.6.2 systems. Uninstalling 3.0.1 and reinstalling 3.0.0 returns the VMs to a working state on both systems. I'm passing the following kernel arguments to the system via com.apple.Boot.plist:

-v rd=*enet rp=nfs:10.0.0.28:/private/tftpboot/NetBoot/NetBootSP0:netboot.nbi/NetInstall.dmg

Boot progress stalls here "finding root"

Please Advise,

Blake

HPReg
VMware Employee

Blake,

So far I thought only the bootloader could provide NetBoot functionality (i.e. fetch a kernel + kexts from a tftp server, then boot that). And because I know our BIOS-based bootloader (in Fusion 2) and our EFI (in Fusion 3) don't provide any NetBoot infrastructure, we have never tried NetBoot in a VM in-house. So you are in untested territory, and I'm not surprised if we have inadvertently broken that setup.

Your post made me realize that a minimal kernel can also NetBoot, using a scheme similar to boot!=root, which is awesome. Do you have a URL for me to get more info on how to set this up? I would like to reproduce this in-house. Assume that I know nothing about NetBoot: How do I create NetInstall.dmg, how do I set up the tftp server, ...?

Now to your problem: you are telling me this is a regression between Fusion 3 and Fusion 3.0.1. There are only a few things which have changed between these 2 versions, so let's find out which one is the culprit:

1) We changed some boot logic in our virtual EFI so Mac OS X Server guests boot faster. It should not matter because the entity which NetBoots here is the kernel, not the bootloader, so at the time of NetBoot the firmware should be irrelevant. But who knows. So my first question is "do you use EFI", i.e. do you have firmware = "efi" in your .vmx file? If yes, I will give you further instructions to use Fusion 3.0.1 with the virtual EFI from Fusion 3, which will tell us if EFI is the culprit.

2) We changed the network daemon executables on the host from thin 32-bit to fat 32-bit/64-bit. If your box has 64-bit processors, the 64-bit executables are used in Fusion 3.0.1 whereas the 32-bit executables were used in Fusion 3. One thing you can do is install Fusion 3.0.1, and use lipo (a tool which is part of an Xcode install) to replace the fat executables with their respective 32-bit versions. Do this as root:

cd /Library/Application\ Support/VMware\ Fusion

for i in vmnet-cli vmnet-natd vmnet-sniffer vmnet-bridge vmnet-dhcpd vmnet-netifup; do

lipo -thin i386 -output "$i".i386 "$i"

mv -f "$i".i386 "$i"

done

Also which type of networking are you using in your VMs (NAT, bridge, ...)? Did you test with the exact same VM on Fusion 3 and Fusion 3.0.1?

3) We also improved the locking in the vmnet kext, but that should not change the content of the network packets seen by VMs. We will worry about this one if everything else fails.

4) The image you posted is that of a stalled boot. Can you capture the log of a successful boot too? I would like to know if the "Waiting on" line is the same. To capture the log of a successful boot, you can let the VM boot, then retrieve the log with "sudo dmesg". If that does not work, you can add the flags "debug=0x14a serial=1" to the boot flags in com.apple.Boot.plist and capture the output of the virtual serial port serial0 to a file.
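
For the serial-port route, the VM needs a virtual serial port that writes to a file. A minimal sketch, assuming the usual serial0.* settings in the .vmx file; the helper name add_serial_log and the default log file name are made up for illustration, and the .vmx should only be edited while the VM is powered off:

```shell
#!/bin/sh
# add_serial_log VMX [LOGFILE]
# Hypothetical helper: appends a file-backed serial port (serial0) to a .vmx
# file so that kernel output sent to the serial port is written to LOGFILE.
add_serial_log() {
    vmx=$1
    log=${2:-serial0.log}   # assumed log file name, not from the thread
    # Leave the file alone if the VM already defines serial0.
    if grep -q '^serial0\.present' "$vmx"; then
        echo "serial0 is already configured in $vmx" >&2
        return 0
    fi
    cat >> "$vmx" <<EOF
serial0.present = "TRUE"
serial0.fileType = "file"
serial0.fileName = "$log"
EOF
}

# Example (the path is a placeholder):
# add_serial_log "/path/to/NetBootClient.vmx"
```

With "debug=0x14a serial=1" added to the boot flags, the kernel messages should then show up in the log file on the host.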

Thanks for reporting this issue, and for helping us to narrow it down.

trodemaster
Hot Shot

Setting up an OS X netboot server is fairly easy. I recommend starting with a 10.6.2 Server VM configured with bridged networking. Configure the NetBoot service and build a NetInstall image based on the 10.6 installer DVD dmg. This guide should cover the configuration details: http://images.apple.com/server/macosx/docs/System_Imaging_and_SW_Update_Admin_v10.6.pdf

For the netboot client, use another 10.6.2 Server VM with bridged networking. Modify /Library/Preferences/SystemConfiguration/com.apple.Boot.plist to include the kernel flags from my original post. Make sure you update the IP address and the path to the dmg to match what you have set up on the server. You can test the path by mounting it via NFS in the Finder.
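
As a rough sketch of how those kernel flags are put together: the helper name netboot_flags is made up for illustration, and the values are the ones from the original post. The resulting string is what goes into the "Kernel Flags" string of com.apple.Boot.plist on the client.

```shell
#!/bin/sh
# netboot_flags SERVER_IP NFS_EXPORT IMAGE_PATH
# Prints kernel flags in the same form as the ones quoted in the original
# post: verbose boot (-v), root device on the network (rd=*enet), and the
# root path as nfs:<server>:<export>:<image relative to the export>.
netboot_flags() {
    printf '%s\n' "-v rd=*enet rp=nfs:$1:$2:$3"
}

# Values taken from the original post; adjust them to match your server.
netboot_flags 10.0.0.28 /private/tftpboot/NetBoot/NetBootSP0 netboot.nbi/NetInstall.dmg
```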

Answers to your questions.

1) These VMs were created fresh with Fusion 3.0, and they have firmware = "efi".

2) I thinned the vmnet-* executables to 32-bit and tested without success. I verified in Activity Monitor that they were running as 32-bit. Networking is always bridged, as tftp/nfs and NAT don't get along. I started from the same clean VM on both test systems, with 3.0 and 3.0.1.

3) possible suspect

4) see attachment

Based on this output, it looks like the kernel is unable to find the NIC.

Thanks for taking a look! The SR# is 1466158191

Blake

HPReg
VMware Employee

> Setting up a OS X netboot server is fairly easy.
> ...

Thanks for the pointer.

> 4) see attachment
> Looks like the kernel is unable to find the nic based on this output.

I agree. The key lines in the successful output are these:

AppleIntel8254XEthernet: Ethernet address 00:0c:29:4a:58:71

Got boot device = IOService:/AppleACPIPlatformExpert/PCI0@0/AppleACPIPCI/P2P0@11/IOPCI2PCIBridge/S3F0@2/AppleIntel8254XEthernet/IOEthernetInterface

BSD root: en0

netboot: using network interface 'en0'

It shows that in the successful case (Fusion 3), the AppleIntel8254XEthernet driver (which drives the virtual e1000 NIC) loads, whereas in the failing case (Fusion 3.0.1) it does not. Based on this valuable information, I strongly suspect the issue is with #1, i.e. our virtual EFI change.
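
The same check can be run against any saved boot log. A small sketch, assuming the log was saved to a file (for example with "sudo dmesg > boot.log" on a VM that did boot); the function name nic_driver_loaded is made up for illustration:

```shell
#!/bin/sh
# nic_driver_loaded LOGFILE
# Succeeds if the saved kernel log shows the e1000 driver attaching and the
# kernel picking a network interface as its BSD root device.
nic_driver_loaded() {
    grep -q 'AppleIntel8254XEthernet: Ethernet address' "$1" &&
    grep -q 'BSD root: en[0-9]' "$1"
}

# Example usage:
# nic_driver_loaded boot.log && echo "NIC driver loaded" || echo "NIC driver missing"
```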

Let's check this theory:

o Install a fresh Fusion 3.0.1

o Download the EFI ROMs from Fusion 3 which I have made available to you at:

http://ftpsite.vmware.com/download/bgarner/EFI32.ROM

http://ftpsite.vmware.com/download/bgarner/EFI64.ROM

o Modify your VM to use the downloaded EFI ROMs (from Fusion 3) instead of the default ones (from Fusion 3.0.1), by adding these 2 lines to your .vmx file:

efi32.filename = "<path to the downloaded EFI32.ROM>"

efi64.filename = "<path to the downloaded EFI64.ROM>"

o Power on the VM

If I'm correct, then this time NetBoot will work.

So what did we change in our virtual EFI? In the new EFI, we improved boot time by allowing the Apple bootloader (boot.efi) to load the single .mkext file (kernel + all kexts in one file), whereas in the old EFI the Apple bootloader was loading all these components as individual files.

So I suspect the following: on the small filesystem that you initially boot in your VM, the folder /System/Library/Extensions/IONetworkingFamily.kext/Contents/PlugIns/AppleIntel8254XEthernet.kext (which is being picked up by the bootloader running on top of the old EFI) exists, but for some reason it is not part of the file /System/Library/Extensions.mkext (which is being picked up by the bootloader running on top of the new EFI). You can check the contents of an .mkext file with the mkextunpack utility.
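
A quick way to run that check inside the guest might look like the sketch below. It assumes that "mkextunpack -v" prints one entry per kext in the archive and that each entry contains the kext's name or bundle identifier; the helper name listing_has_kext is made up for illustration.

```shell
#!/bin/sh
# listing_has_kext NAME
# Reads a kext listing on stdin and succeeds if NAME appears in it.
listing_has_kext() {
    grep -q -- "$1"
}

MKEXT=/System/Library/Extensions.mkext
KEXT=AppleIntel8254XEthernet

# Only meaningful inside the Mac OS X guest, where mkextunpack exists.
if command -v mkextunpack >/dev/null 2>&1 && [ -f "$MKEXT" ]; then
    if mkextunpack -v "$MKEXT" 2>&1 | listing_has_kext "$KEXT"; then
        echo "$KEXT is in $MKEXT"
    else
        echo "$KEXT is NOT in $MKEXT"
    fi
fi
```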

Interestingly, on my physical MacPro2,1 here, the kextstat shows that the AppleIntel8254XEthernet.kext driver is loaded, but for some reason it is not part of /System/Library/Extensions.mkext either. Forcing a rebuild of that file by touching /System/Library/Extensions does not help. I suspect this is because in the AppleIntel8254XEthernet.kext, the Info.plist has its key OSBundleRequired set to Network-Root instead of Root, and to rebuild the .mkext cache, the system calls kextcache with the -l option (local disk boot) only.
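
To see what the driver's Info.plist actually declares, something like the following could be used. This is only a sketch: it assumes the Info.plist is stored as XML with the <string> value on the line right after the <key> line, and the function name os_bundle_required is made up for illustration.

```shell
#!/bin/sh
# os_bundle_required INFO_PLIST
# Prints the value of the OSBundleRequired key from an XML Info.plist.
# Assumes the <string> value sits on the line directly after the <key> line.
os_bundle_required() {
    awk '/<key>OSBundleRequired<\/key>/ {
             getline
             gsub(/.*<string>|<\/string>.*/, "")
             print
             exit
         }' "$1"
}

PLIST=/System/Library/Extensions/IONetworkingFamily.kext/Contents/PlugIns/AppleIntel8254XEthernet.kext/Contents/Info.plist
if [ -f "$PLIST" ]; then
    os_bundle_required "$PLIST"
fi
```

If the suspicion above is right, this prints Network-Root rather than Root for the e1000 driver.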

So I see 2 possible ways to solve this issue. Please let me know if they work for you (don't forget to re-install Fusion 3.0.1 and remove the 2 lines from your .vmx file before you test them!):

1) Change /System/Library/Extensions/IONetworkingFamily.kext/Contents/PlugIns/AppleIntel8254XEthernet.kext/Contents/Info.plist so it says Root instead of Network-Root. Check that the system has re-built /System/Library/Extensions.mkext and reboot the VM. This will only work until a system update resets the content of the Info.plist file (maybe that never happens in your case).

2) Re-build /System/Library/Extensions.mkext yourself, but pass the -n option (network boot) on top of the -l option to kextcache, then reboot the VM. This will only work until the system re-builds /System/Library/Extensions.mkext (maybe that never happens in your case).
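
Option 2 could be sketched roughly as follows. The -l and -n options are the ones named above; using -m to name the output .mkext file is an assumption about kextcache on 10.6, so check "man kextcache" in the guest before running it. The helper name kextcache_cmd and the RUN variable are made up for illustration.

```shell
#!/bin/sh
# kextcache_cmd MKEXT EXTENSIONS_DIR
# Prints a kextcache command line that builds MKEXT from EXTENSIONS_DIR and
# includes the kexts required for local-root (-l) and network-root (-n) boots.
# Assumption: -m names the output mkext file.
kextcache_cmd() {
    printf 'kextcache -l -n -m %s %s\n' "$1" "$2"
}

# Dry run by default: print the command so it can be reviewed first.
cmd=$(kextcache_cmd /System/Library/Extensions.mkext /System/Library/Extensions)
echo "$cmd"

# Set RUN=1 inside the guest to actually rebuild the cache.
if [ "${RUN:-0}" = 1 ] && command -v kextcache >/dev/null 2>&1; then
    sudo $cmd   # intentionally unquoted; the paths contain no spaces
fi
```

Afterwards, the rebuilt file can be checked for the AppleIntel8254XEthernet driver before rebooting the VM.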

To conclude: If, as I believe, you are responsible for generating the .mkext file in this kind of setup, then I believe that on a real Mac, you would need the -n option as well for this setup to work. So far (Fusion 2 and Fusion 3), our virtualization ignored this .mkext file to boot Mac OS, so you have been lucky it worked without the -n option. But as Fusion 3.0.1 improved and now honors the .mkext file, you need the -n option.

I hope it helps,

--

hpreg

trodemaster
Hot Shot

So the OS X Server VM booted properly with the EFI ROM files you provided. Using your suggestion #2, I was able to build a kext cache that included the needed NIC driver. I'm back in business with Fusion 3.0.1!!

Thanks a ton for your help on this issue!!

Blake

HPReg
VMware Employee

You are welcome, and thanks for the beer :-)
