6 Replies Latest reply on Sep 4, 2015 3:03 PM by patpat

    UEFI PXE boot; Extremely low TFTP performance

    patpat Enthusiast

      Test scenario:


      Host Hardware: HP Elitebook 8470p
      Intel(R) Core(TM) i7-3630QM CPU @ 2.40 GHz, 16 GB RAM
      Host OS: Windows 8.1 Pro
      VMware: Workstation 9, 10, 11, 12
      PXE server: Serva
      Boot Manager: Syslinux 6.03
      Client Preboot Environment: EFI 64 (VMware FW)


      When PXE booting guest OSs under Workstation's EFI firmware, e.g. ubuntu-15.04-desktop-amd64.iso,

      syslinux.efi initially transfers vmlinuz (6.5 MB) and initrd.lz (22.5 MB) over TFTP.
      These transfers (even when they do not present any error) are extremely slow:


                            v9     v10    v11    v12    HP 2570p (EFI hardware)
      vmlinuz 6.5 MB        8s     16s    9s     9s     2s
      initrd.lz 22.5 MB     62s    121s   115s   116s   27s

      The last column represents a hardware client booting exactly the same OS

      files from the same PXE server.


      I can see that under the VMware EFI FW all the TFTP transfers (when handled by syslinux.efi) are very slow.

      Of course syslinux.efi uses EFI FW resources for these transfers, such as the protocol instances provided by the firmware.

      Paradoxically, the Workstation version with the best UEFI PXE performance is the old v9, but it is

      still far slower than e.g. an EFI notebook like the HP Elitebook 2570p booting the same test OS

      files against the same PXE server.


      I have run Wireshark traffic captures (e.g. on Workstation 11):


      43128    235.451388000    TFTP    1454    Data Packet, Block: 16388        Delta 0.015367

      43129    235.466755000    TFTP      46    Acknowledgement, Block: 16388    Delta 0.000128

      43130    235.466883000    TFTP    1454    Data Packet, Block: 16389        Delta 0.019316

      43131    235.486199000    TFTP      46    Acknowledgement, Block: 16389


      where the pattern shows a high delay between the arrival of a data packet and the ACK sent in response.

      The reason for this behavior, from an EFI FW point of view, could be e.g. not handling the

      priority of events correctly; syslinux.efi relies on these events to know when a data packet has arrived

      so it can then send the corresponding ACK.
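
      A back-of-the-envelope check (a sketch; the 1408-byte payload is inferred from the 1454-byte frames above, assuming Ethernet II + IPv4 + UDP + TFTP headers, and both per-block delays are illustrative round numbers): with lock-step TFTP every block costs one full DATA/ACK turnaround, so the ~15-20 ms inter-packet delta seen in the capture is by itself enough to explain transfers lasting minutes.

```python
import math

FRAME_OVERHEAD = 14 + 20 + 8 + 4   # Ethernet II + IPv4 + UDP + TFTP DATA header
PAYLOAD = 1454 - FRAME_OVERHEAD    # 1408 data bytes per 1454-byte frame

def transfer_time(file_bytes, per_block_delay_s):
    # Lock-step TFTP: one DATA/ACK round trip per block, so the total time
    # is simply (number of blocks) x (per-block delay).
    blocks = math.ceil(file_bytes / PAYLOAD)
    return blocks * per_block_delay_s

slow = transfer_time(22.5 * 1024 * 1024, 0.020)  # ~20 ms/block, as in the capture
fast = transfer_time(22.5 * 1024 * 1024, 0.002)  # ~2 ms/block, plausible on hardware
print(f"VM: ~{slow:.0f}s   hardware: ~{fast:.0f}s")
# prints: VM: ~335s   hardware: ~34s
```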


      Please let me know if you guys need more info




        • 1. Re: UEFI PXE boot; Extremely low TFTP performance
          dariusd Virtuoso
          User Moderators, VMware Employees

          Hi again patpat!


          Which virtual NIC were you using for the tests?  Also, due to a quirk in the way we need to share resources with the host, some vNICs perform better in UEFI when the virtual machine has only one virtual CPU, so it might be informative to try with one vCPU and with more than one vCPU and compare the results.





          • 2. Re: UEFI PXE boot; Extremely low TFTP performance
            patpat Enthusiast

            Hi Darius !

            1. It looks like the behavior is independent of the virtual NIC.
              The test was performed using the default ethernet0.virtualDev ("vlance"), but I have also tested clients with ethernet0.virtualDev = "e1000e" with similar results.
            2. The tests were always performed on single-vCPU VMware clients. Now I've tested v12 with 2 vCPUs and it gets better figures on the big transfer, when it should have been the other way around, right?

                                  v12 (1 vCPU)   v12 (2 vCPU)
            vmlinuz 6.5 MB        9s             10s
            initrd.lz 22.5 MB     116s           71s


            The results are repeatable.

            Let me know.




            • 3. Re: UEFI PXE boot; Extremely low TFTP performance
              dariusd Virtuoso

              Hmmm... I did get that the wrong way around: 1 vCPU VMs can perform worse than 2 vCPU VMs during UEFI PXE boot.


              If you add to the VM's configuration file:


                  vnet.recvClusterSize = "1"

              does the 1 vCPU VM's performance improve?

              It probably still won't approach the performance of a physical machine, though.  The architecture of UEFI does not permit us to use interrupts for NIC events -- we must use polling -- and polling is very bad when we need to steal CPU time from the host OS in a shared environment.  We've made a few compromises to achieve a balance between TFTP throughput and not causing too much host CPU usage while the firmware is running (i.e. limiting the polling rate), and those compromises do make it somewhat unlikely that we will be able to match the performance of a native firmware implementation (which realistically can use just as much CPU as it wants... no one will care).
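
               The polling penalty described above can be sketched with a toy model (the 10 ms poll interval below is purely illustrative, not VMware's actual polling rate): a DATA packet that arrives between two poll ticks sits unnoticed until the next tick, so a poll interval T adds a uniform [0, T) delay, i.e. T/2 on average, to every DATA-to-ACK turnaround.

```python
import random

def avg_poll_delay(poll_interval_s, trials=100_000, seed=1):
    # Simulate packets arriving at random offsets within a poll interval;
    # each one waits until the next poll tick before it can be processed.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        arrival = rng.uniform(0.0, poll_interval_s)  # arrival offset within a tick
        total += poll_interval_s - arrival           # wait until the next tick
    return total / trials

# A 10 ms poll interval adds ~5 ms of latency per block; over the ~16,000
# blocks of a 22.5 MB transfer that alone accounts for well over a minute.
print(f"average added latency: {avg_poll_delay(0.010) * 1000:.1f} ms")
```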

              There's probably more that we can do here to improve our virtual UEFI firmware's TFTP performance if we have the time and opportunity to dig deeper...




              • 4. Re: UEFI PXE boot; Extremely low TFTP performance
                patpat Enthusiast

                Hi there


                (default)                      v12 (1 vCPU)   v12 (2 vCPU)
                vmlinuz 6.5 MB                 9s             10s
                initrd.lz 22.5 MB              116s           71s


                (vnet.recvClusterSize = "1")   v12 (1 vCPU)   v12 (2 vCPU)
                vmlinuz 6.5 MB                 11s            13s
                initrd.lz 22.5 MB              129s           74s


                Unfortunately vnet.recvClusterSize = "1" does not help.


                I have also seen other vnet variables like:

                vnet.bwInterval
                vnet.dontClusterSize
                vnet.maxRecvClusterSize
                vnet.maxSendClusterSize
                vnet.noPollCallback
                vnet.pollInterval
                vnet.recalcInterval
                vnet.recvClusterSize
                vnet.recvThreshold
                vnet.sendClusterSize
                vnet.sendThreshold
                vnet.useSelect


                Some of them (e.g. vnet.pollInterval) sound interesting, but I wonder if a description and their default values could be made available.


                I understand what you say about the need to compromise when balancing EFI/NIC/polling/CPU time etc.,

                but the approach can probably still be improved; today EFI PXE is just too slow.

                E.g. I think those FW compromises should probably not be the same before and after calling ExitBootServices();
                I do not know if you guys already consider that difference today.




                • 5. Re: UEFI PXE boot; Extremely low TFTP performance
                  dariusd Virtuoso

                  I don't have a ready reference with a description of all the options and their defaults, sorry.


                  After ExitBootServices, the OS "owns" the platform and can do as it chooses, and it is not constrained by the UEFI restrictions -- it is free to reinitialize the PIC/APIC/LAPIC and MSI/MSI-X and use hardware interrupts as it chooses.  The compromises are made solely in the firmware (in the DXE environment, where the firmware's own NIC drivers are in use) and are only relevant prior to ExitBootServices.


                  I've been contemplating setting the firmware up to use interrupts in a way that is "concealed" from the rest of the DXE environment, so that we can achieve better performance while not visibly breaking the UEFI specification...    Haven't actually implemented anything along those lines yet.





                  • 6. Re: UEFI PXE boot; Extremely low TFTP performance
                    patpat Enthusiast

                    Well, without more variables to tweak I can only hope you guys can somehow do something about this issue.



                    Something I have noticed: when bootmgfw.efi has to TFTP transfer Boot.wim (270 MB) with windowsize=8:


                                          v12    HP 2570p (EFI hardware)
                    Boot.wim (270 MB)     47s    36s


                    OK, windowsize=8 means 8 times fewer ACKs, but in this case the difference in performance is not so big.

                    I wonder if it could be something else besides driver performance, like e.g. the use of

                    (bootmgfw.efi does not use these protocols)
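
                    The diminishing return of windowsize can be illustrated with a small model (all timing constants are made up for illustration, not measured): only the ACK round trips are divided by the window, while the per-block handling cost is paid regardless, so a client with a slow ACK turnaround converges toward hardware performance as the window grows.

```python
import math

def transfer_time(file_bytes, payload, per_block_s, ack_rtt_s, windowsize):
    # Total time = per-block handling cost for every block
    #            + one ACK round trip per window of blocks.
    blocks = math.ceil(file_bytes / payload)
    acks = math.ceil(blocks / windowsize)
    return blocks * per_block_s + acks * ack_rtt_s

MB = 1024 * 1024
for w in (1, 8):
    vm = transfer_time(270 * MB, 1408, 0.0001, 0.0010, w)  # slow ACK turnaround
    hw = transfer_time(270 * MB, 1408, 0.0001, 0.0002, w)  # fast ACK turnaround
    print(f"windowsize={w}: VM {vm:.0f}s vs HW {hw:.0f}s (ratio {vm / hw:.1f})")
```

                    With windowsize=1 the slow client is several times slower, while with windowsize=8 the gap shrinks sharply, which matches the pattern of bootmgfw.efi's windowed transfer being much closer to hardware than syslinux.efi's lock-step ones.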