7 Replies Latest reply on Nov 19, 2018 3:25 PM by patpat

    TFTP problem PXE installing ESXi 6 on BIOS targets.

    patpat Enthusiast

      When PXE-installing ESXi 6 on BIOS clients, it seems the TFTP requests are not handled correctly.

       

      kernel_bios  = /NWA_PXE/$HEAD_DIR$/mboot.c32
      append_bios  = -c /NWA_PXE/$HEAD_DIR$/BOOT.CFG
      ipappend_bios = 2

       

      For every file it needs, a TFTP client usually performs a "Read Request" asking for the size of the file (tsize), which the
      server answers with an "Option Acknowledgment" returning "tsize=xxxx". The client then "Aborts" the transfer, checks that the
      returned file size is OK, and performs a second "Read Request", this time really transferring the requested file.
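As a rough illustration of the three packets in that size probe, here is a minimal Python sketch. The function names are hypothetical, the packet layouts follow RFC 1350 and the TFTP option extension, and the error code 0 with the message "TFTP Aborted" is only an assumed example of what a client might send, not what any particular boot loader sends.

```python
import struct

OP_RRQ, OP_ERROR, OP_OACK = 1, 5, 6  # TFTP opcodes


def build_rrq(filename, mode="octet", options=None):
    """Build a Read Request, optionally carrying options such as tsize=0."""
    pkt = struct.pack("!H", OP_RRQ)
    pkt += filename.encode("ascii") + b"\0" + mode.encode("ascii") + b"\0"
    for name, value in (options or {}).items():
        pkt += name.encode("ascii") + b"\0" + str(value).encode("ascii") + b"\0"
    return pkt


def parse_oack(packet):
    """Return the option/value pairs of an Option Acknowledgment as a dict."""
    (opcode,) = struct.unpack("!H", packet[:2])
    if opcode != OP_OACK:
        raise ValueError("not an OACK packet")
    fields = packet[2:].split(b"\0")
    if fields and fields[-1] == b"":
        fields.pop()  # drop the empty piece after the final NUL
    if len(fields) % 2:
        raise ValueError("malformed OACK option list")
    return {
        fields[i].decode("ascii").lower(): fields[i + 1].decode("ascii")
        for i in range(0, len(fields), 2)
    }


def build_error(code=0, message="TFTP Aborted"):
    """Build the ERROR packet a client sends to abort the probe transfer.

    Code 0 and this message are illustrative assumptions.
    """
    return struct.pack("!HH", OP_ERROR, code) + message.encode("ascii") + b"\0"
```

A client following the sequence described above would send `build_rrq(name, options={"tsize": 0})`, read the size from `parse_oack(...)["tsize"]`, send `build_error()` to the server's transfer port, and only then issue the second Read Request.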

       

      On BIOS targets the VMware-provided mboot.c32 is the TFTP client in charge of transferring the components listed in BOOT.CFG.

      mboot.c32 consecutively requests the 151 files listed in BOOT.CFG but "forgets to Abort" those requests; it then requests
      those 151 files "again", this time to really retrieve them.

       

      This presents several problems with TFTP servers (e.g. Serva) that limit the number of orphan TFTP transfers in order to
      protect themselves from resource-abusive TFTP clients.

       

      On the other hand, installing ESXi 6 on UEFI targets shows the classic TFTP initial sequence, which leaves no orphan transfers behind.

      "Read Request"->

                                    <- "Option Acknowledgment"

      "Abort" ->


      In this case VMware's BOOTX64.efi (not mboot.c32) is in charge of the client side of the TFTP transfers.

       

      kernel_efi64   = /NWA_PXE/$HEAD_DIR$/EFI/BOOT/BOOTX64.efi

      append_efi64   = -c /NWA_PXE/$HEAD_DIR$/BOOT.CFG

      ipappend_efi64 = 2

       

      See the attached Wireshark traffic capture.

       

      Best,

      Patrick

        • 1. Re: TFTP problem PXE installing ESXi 6 on BIOS targets.
          dariusd Virtuoso
          VMware Employees, User Moderators

          Hi again patpat!

           

          Thanks for the awesome detailed post!  It looks like we're using an older version of pxelinux... its PXE close_file function has the following comment attached to it:

          ; XXX: We should check to see if this file is still open on the server

          ; side and send a courtesy ERROR packet to the server.

           

          ... which would explain the lack of any TFTP ERROR packets to abort the connection. 

           

          RFC 1350 - The TFTP Protocol (Revision 2) section 7 clearly states that any ERROR packet is only a courtesy and won't be reliably received (no retransmits, etc.).  That doesn't excuse us not even trying to send the ERROR packet -- the RFC does say "... an ERROR packet (opcode 5) is sent." -- but it does mean that the server could reasonably be expected to be resilient against not receiving them.

           

          You might be able to work around the problem by using a newer version of pxelinux.  We bundle (and support) version 3.86, but I believe mboot is at least compatible with pxelinux version 4.0x (although not officially supported at all), and my research here suggests that pxelinux 4.0x does seem to close all of its TFTP connections.

           

          Cheers,

          --

          Darius

          • 2. Re: TFTP problem PXE installing ESXi 6 on BIOS targets.
            patpat Enthusiast

            Hi Darius;

            Nice to see you again.

            RFC1350

            7. Premature Termination

             

              If a request can not be granted, or some error occurs during the
              transfer, then an ERROR packet (opcode 5) is sent. This is only a
              courtesy since it will not be retransmitted or acknowledged, so it
              may never be received. Timeouts must also be used to detect errors.

            The spirit of the paragraph is that (unreliably delivered) error messages must not replace the unavoidable timeout control.
            But when a client first asks only for the file size and never sends an error message to abort that transfer, the server can very quickly run into resource starvation.
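To make the starvation argument concrete, here is a small Python sketch of how a server might combine both mechanisms: a timeout that eventually reclaims orphan transfers, and a per-host cap that rejects new requests while too many are still open. The class name `TransferTable`, the 5-second timeout and the cap of 3 are hypothetical values chosen for illustration; this is not how Serva or any specific server is implemented.

```python
class TransferTable:
    """Tracks open transfers per (host, port) with a timeout and a per-host cap."""

    def __init__(self, timeout=5.0, max_per_host=8):
        self.timeout = timeout
        self.max_per_host = max_per_host
        self._last_seen = {}  # (host, port) -> time of the last packet

    def active(self, host):
        """Number of transfers currently considered open for this host."""
        return sum(1 for (h, _port) in self._last_seen if h == host)

    def expire(self, now):
        """Drop transfers that have been silent for `timeout` seconds or more."""
        stale = [k for k, t in self._last_seen.items() if now - t >= self.timeout]
        for key in stale:
            del self._last_seen[key]
        return len(stale)

    def open(self, host, port, now):
        """Register a new Read Request; refuse it if the host has too many open."""
        self.expire(now)
        if self.active(host) >= self.max_per_host:
            return False
        self._last_seen[(host, port)] = now
        return True

    def close(self, host, port):
        """Called when an ERROR packet (or the final ACK) ends the transfer."""
        self._last_seen.pop((host, port), None)
```

With a table like this, a client that aborts each size probe frees its slot immediately, while a client that never aborts keeps every slot occupied until the timeout fires, so a burst of unaborted probes hits the cap almost at once.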

             

            "You might be able to work around the problem by using a newer version of pxelinux"

            "I believe mboot is at least compatible with pxelinux version 4.0x"

            Using mixed versions of Syslinux is never good; it leads to all sorts of issues that are very hard to troubleshoot.

             

            I have tried the genuine mboot.c32/pxelinux.0 v3.86 booting XenServer 6.5, and the initial non-aborted request is not present at all.
            See the attached Wireshark capture.

             

            Could you please check your
            \VMware-syslinux-3.86\com32\mboot\mboot.c

            Please see whether you are doing anything else besides the main do{} loop calling zloadfile(*argp, &mp->data, &mp->len);

             

            Best,

            Patrick

            • 3. Re: TFTP problem PXE installing ESXi 6 on BIOS targets.
              TimMann Enthusiast
              VMware Employees

              In case you're still interested...

               

              VMware's bootloader opens all the boot modules to get their sizes, then later opens each one again to actually read it. In the bootloader code, we do close the file, but the pxelinux 3.86 back end does not send a TFTP ERROR packet when that happens; calling close() just makes pxelinux forget about the file.
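The effect of that two-pass behaviour can be sketched with a toy model in Python. Everything here is hypothetical (the `FakeServer` and `FakeClient` classes, the module names, the `send_error_on_close` switch); it only models which transfers the server still believes are open, not the real pxelinux or mboot code.

```python
import itertools


class FakeServer:
    """Toy TFTP server that only tracks which transfer IDs it considers open."""

    def __init__(self):
        self.open_transfers = set()

    def read_request(self, tid):
        self.open_transfers.add(tid)

    def error(self, tid):
        # An ERROR packet from the client ends the transfer immediately.
        self.open_transfers.discard(tid)

    def transfer_complete(self, tid):
        self.open_transfers.discard(tid)


class FakeClient:
    """Toy client; send_error_on_close models the patched close behaviour."""

    def __init__(self, server, send_error_on_close):
        self.server = server
        self.send_error_on_close = send_error_on_close
        self._tids = itertools.count(1)

    def open(self, name):
        tid = next(self._tids)
        self.server.read_request(tid)
        return tid

    def close(self, tid):
        if self.send_error_on_close:
            self.server.error(tid)
        # Otherwise the client just forgets the file; the server is not told.

    def read_all(self, tid):
        self.server.transfer_complete(tid)


def boot(modules, client):
    for name in modules:  # pass 1: open each module only to learn its size
        client.close(client.open(name))
    for name in modules:  # pass 2: open each module again and read it
        client.read_all(client.open(name))


def orphans_after_boot(module_count, send_error_on_close):
    server = FakeServer()
    client = FakeClient(server, send_error_on_close)
    boot([f"module{i}" for i in range(module_count)], client)
    return len(server.open_transfers)
```

In this model, `orphans_after_boot(151, False)` leaves all 151 size-probe transfers open on the server, while `orphans_after_boot(151, True)` leaves none, which is the difference a close that sends an ERROR packet is meant to make.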

               

              I'm attaching a patch for pxelinux 3.86 that fixes this issue. It should apply fine to either the syslinux 3.86 source from our ODP ISO (see Download VMware vSphere) or the upstream 3.86.

              • 4. Re: TFTP problem PXE installing ESXi 6 on BIOS targets.
                patpat Enthusiast

                Hi Tim,

                 

                Thanks for providing the patch.

                At the time of reporting this bug I coded a patch myself, but I found that the real problem is not the patch
                but finding the actual source code used to build the binaries offered by VMware.
                From what I was able to see, the provided binaries are not built from the published sources.

                 

                If you have a minute, please provide the binaries that you created when testing your patch.
                Thanks again.

                 

                Best,

                Patrick

                • 5. Re: TFTP problem PXE installing ESXi 6 on BIOS targets.
                  TimMann Enthusiast
                  VMware Employees

                  My patch is for the base pxelinux. It should work on upstream syslinux-3.86 sources, though it's true that I tested it on a very lightly modified syslinux-3.86 tree that we use internally. I believe the sources for that are on our open-source disclosure package ISO image available from https://my.vmware.com/group/vmware/details?downloadGroup=ESXI670-OSS&productId=742 if you want to look.

                   

                  In your earlier message you discovered that our mboot.c32 is not the same as the one in syslinux-3.86. That's because it's actually a very different program that confusingly has the same name. I didn't need to change our mboot to fix this issue. Our mboot itself calls "close" on all the files it opens to check their sizes; it is the base pxelinux that treats the close as mostly a no-op and fails to send an abort. My patch fixes that.

                   

                  If you are interested, you can find recent sources for our mboot here: GitHub - vmware/esx-boot: The ESXi bootloader

                  • 6. Re: TFTP problem PXE installing ESXi 6 on BIOS targets.
                    TimMann Enthusiast
                    VMware Employees

                    p.s. Attached is the pxelinux.0 binary I tested and that we're using on our own internal network at VMware.

                    • 7. Re: TFTP problem PXE installing ESXi 6 on BIOS targets.
                      patpat Enthusiast

                      Hi Tim,

                       

                      1)

                      I just tested your pxelinux.0 booting VMware-VMvisor-Installer-6.7.0-8169922.x86_64 using its included mboot.c32

                      Now, for every file (jumpstrt.gz, useropts.gz, features.gz, ..., imgpayld.tgz), the install process seems to:

                      1. open the file 3 times,
                      2. close it once,
                      3. open it once more for the real transfer.
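A count like the one above can be reproduced from a decoded capture with a short Python sketch. The input format is an assumption: a list of `(client_port, opcode, filename)` tuples exported from the capture, where each transfer uses its own client port and ERROR packets carry no file name. The function name `summarize` and the sample ports are hypothetical.

```python
from collections import defaultdict

OP_RRQ, OP_ERROR = 1, 5  # TFTP opcodes


def summarize(events):
    """Count Read Requests and client aborts per file name.

    ERROR packets are matched to a file through the client port that sent
    the corresponding Read Request.
    """
    port_to_file = {}
    stats = defaultdict(lambda: {"rrq": 0, "error": 0})
    for port, opcode, filename in events:
        if opcode == OP_RRQ:
            port_to_file[port] = filename
            stats[filename]["rrq"] += 1
        elif opcode == OP_ERROR and port in port_to_file:
            stats[port_to_file[port]]["error"] += 1
    return dict(stats)
```

For the pattern described above, a file that is opened three times, closed once and then opened once more shows up as four Read Requests and one abort.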

                       

                      Please see the attached Wireshark capture section.

                       

                      2)

                      Your patch (when working) will still rely on patching pxelinux.0, but:

                      1. VMware does not include this file in its distribution.
                      2. Users would have to either use the patched pxelinux.0 binary, probably experiencing some sort of regression, or
                      3. patch and recompile a very old source themselves.

                       

                       

                      When I tried fixing this issue I focused my effort on mboot.c32 instead, but I found (as I mentioned before) that the source published by VMware did not match
                      the distributed binary (as you just confirmed).

                      I really think having a patched mboot.c32 binary would be the best solution in this case.

                       

                      Best,

                      Patrick