This wiki article was written by Shiva Katta and Krishnaprasad K of the Dell Hypervisor Engineering team.

This article describes a specific problem, and its workarounds, encountered when using gPXE to deploy operating systems over the network.

What's gPXE?

Preboot eXecution Environment (PXE) provides the ability to boot computers over a network interface. gPXE is an open-source PXE implementation and network boot loader. It can replace proprietary PXE ROMs and adds functionality such as retrieving boot files over protocols like HTTP and iSCSI. Some recent operating systems and hypervisors, such as VMware ESXi 5.x, need gPXE as a prerequisite for deployment over the network. If you already have a legacy PXE implementation, you can migrate to gPXE by placing the gPXE executable on your TFTP server. PXE-capable machines then download gPXE via TFTP and instantly become gPXE-capable machines.

This document is intended for datacenter administrators who want to automate operating system deployment on multiple servers in parallel.

Setting up gPXE

Follow the white paper posted on Dell Tech Center to set up gPXE for network deployment of operating systems. Although it describes setting up gPXE specifically for VMware ESXi 5, it is also useful for setting up the general gPXE environment, and thus for meeting the prerequisites for deploying various operating systems over the network.

All set to boot from gPXE... What’s next?

While servers boot to PXE and then chainload gPXE, some NICs fail to obtain an IP address: the DHCP request issued by gPXE ends with a connection timeout error.

A natural question is: why does gPXE request a DHCP address again when the NIC already obtained one at the beginning? The reason is that when the chainloaded gPXE starts up (PXE-capable NICs download gPXE via TFTP), it issues a fresh DHCP request because it does not see the DHCP address that was issued to the legacy PXE ROM. The DHCP connection timeout is caused by a timing issue in the NIC firmware when gPXE is chainloaded.

Wondering what to do? Here are the workarounds!

There are a few workarounds for this problem. Two of them are described below:

  1. Modifying the PXE-chainloadable gPXE image
  2. Using gPXE shell prompt

Elaborating the workarounds

 1. Modifying the PXE-Chainloadable gPXE image

This workaround rebuilds the gPXE image with an embedded custom script. The script simply waits (sleeps) for a few seconds before the NIC requests a DHCP address again after the gPXE chainload, giving the NIC firmware time to become ready. The detailed steps are:

a. Download the latest gPXE source tarball (version 1.0.1).

b. Extract the source as follows:

   ~# tar xvfz gpxe-1.0.1.tar.gz #Assuming gpxe-1.0.1.tar.gz is the downloaded filename.
   ~# cd gpxe-1.0.1/

c. Change '#undef TIME_CMD' to '#define TIME_CMD' in src/config/general.h. This enables the sleep command used by the custom script.

d. Create a custom script (for example, sleep.gpxe) in the src directory with the following content. The first line, #!gpxe, marks the file as a gPXE script:

         #!gpxe
         echo "Greetings... Running through the custom script..."
         sleep 10
         echo "Fetching DHCP IP for the network adapter"
         ifopen net0
         dhcp net0

   For some NICs, a sleep of 5 seconds may be good enough. We tested this on a couple of Broadcom NICs, which required a 10-second delay to get an IP.

e. Recompile the source from the src directory to build a custom gPXE image with the script embedded:

   ~# cd src
   ~# make clean
   ~# make bin/undionly.kpxe EMBEDDED_IMAGE=sleep.gpxe

    Copy the resulting bin/undionly.kpxe to the TFTP server.
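
Steps b through e can also be scripted. The following is a minimal bash sketch, not part of the tested procedure above. It assumes the gpxe-1.0.1.tar.gz tarball is already downloaded, that it unpacks to a directory named after the tarball, and that your TFTP root is /var/lib/tftpboot (a hypothetical default; pass your own path). The helper names enable_time_cmd and write_sleep_script are illustrative only.

```bash
#!/usr/bin/env bash
# Sketch: automate steps b-e of workaround 1 (rebuild undionly.kpxe with an
# embedded sleep script). Defaults below are assumptions; adjust for your site.
set -euo pipefail

# Step c: turn "#undef TIME_CMD" into "#define TIME_CMD" so gPXE is built
# with the sleep command. sed leaves a backup copy as <file>.bak.
enable_time_cmd() {
  local header=$1
  sed -i.bak -E 's/^#undef[[:space:]]+TIME_CMD([[:space:]]|$)/#define TIME_CMD\1/' "$header"
  # Fail loudly if the header did not contain the expected line.
  grep -Eq '^#define TIME_CMD([[:space:]]|$)' "$header" || {
    echo "TIME_CMD not found in $header" >&2
    return 1
  }
}

# Step d: write the custom gPXE script with a configurable delay in seconds.
write_sleep_script() {
  local out=$1 secs=$2
  case "$secs" in
    ''|*[!0-9]*) echo "sleep seconds must be a whole number: '$secs'" >&2; return 1 ;;
  esac
  cat > "$out" <<EOF
#!gpxe
echo "Greetings... Running through the custom script..."
sleep ${secs}
echo "Fetching DHCP IP for the network adapter"
ifopen net0
dhcp net0
EOF
}

# Steps b-e end to end: extract, patch, write script, build, copy to TFTP.
main() {
  local tarball=$1
  local secs=${2:-10}
  local tftp_dir=${3:-/var/lib/tftpboot}   # assumption: your TFTP root may differ
  local srcdir
  srcdir="$(basename "$tarball" .tar.gz)/src"   # assumes gpxe-1.0.1/src layout
  tar xzf "$tarball"
  enable_time_cmd "$srcdir/config/general.h"
  write_sleep_script "$srcdir/sleep.gpxe" "$secs"
  ( cd "$srcdir" && make clean && make bin/undionly.kpxe EMBEDDED_IMAGE=sleep.gpxe )
  cp "$srcdir/bin/undionly.kpxe" "$tftp_dir/"
}

# Only build when a tarball is given; otherwise just print usage.
if [ "$#" -ge 1 ]; then
  main "$@"
else
  echo "usage: $0 <gpxe-tarball> [sleep-seconds] [tftp-dir]" >&2
fi
```

For example, saving this as build-gpxe.sh (a hypothetical name) and running `./build-gpxe.sh gpxe-1.0.1.tar.gz 10 /srv/tftp` would build the image with a 10-second delay and copy it to /srv/tftp. Running it without arguments only prints the usage line.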

2a. Using gPXE shell

This workaround uses the gPXE shell to fetch the DHCP address manually and then boot into the gPXE menu. Right after the connection timeout, press CTRL-B when the screen prompts for it. This drops you to the gPXE> shell, where you can request the DHCP address again (for example, with dhcp net0) and then boot into the menu.

2b. Using gPXE Shell prompt - Second Option

This workaround is also based on the gPXE shell. If you have not created a menu.cfg on your web server, enter the OS boot details manually from the shell as follows:

 gPXE> dhcp net0
 gPXE> kernel -n mboot.c32 http://<WebserverIP>/mboot.c32
 gPXE> imgargs mboot.c32 -c http://<WebserverIP>/boot.cfg
 gPXE> boot mboot.c32

Note that these commands are specific to VMware ESXi 5.x; the commands for Linux may differ slightly.
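
Typing these commands at every boot gets tedious across many servers. One option, sketched below as an untested assumption rather than one of the workarounds described here, is to write them into a gPXE script preceded by the sleep from workaround 1, and embed it with EMBEDDED_IMAGE as in step e (this still requires the TIME_CMD change from step c). The helper write_esxi_boot_script and the file name esxi.gpxe are hypothetical; the web server must serve mboot.c32 and boot.cfg as above.

```bash
#!/usr/bin/env bash
# Sketch: write a gPXE script that combines the delay from workaround 1 with
# the ESXi 5.x commands from workaround 2b, so they need not be typed by hand.
set -euo pipefail

# write_esxi_boot_script <output-file> <webserver-address> [sleep-seconds]
write_esxi_boot_script() {
  local out=$1 server=$2 secs=${3:-10}
  if [ -z "$server" ]; then
    echo "a web server address is required" >&2
    return 1
  fi
  case "$secs" in
    ''|*[!0-9]*) echo "sleep seconds must be a whole number: '$secs'" >&2; return 1 ;;
  esac
  # The commands after the sleep are the same ones typed at the gPXE> prompt.
  cat > "$out" <<EOF
#!gpxe
sleep ${secs}
dhcp net0
kernel -n mboot.c32 http://${server}/mboot.c32
imgargs mboot.c32 -c http://${server}/boot.cfg
boot mboot.c32
EOF
}

# Only write a file when both arguments are given; otherwise print usage.
if [ "$#" -ge 2 ]; then
  write_esxi_boot_script "$@"
else
  echo "usage: $0 <output-file> <webserver-address> [sleep-seconds]" >&2
fi
```

For example, `./make-esxi-script.sh esxi.gpxe 192.0.2.10` (script name and address are examples) writes esxi.gpxe; you would then run `make bin/undionly.kpxe EMBEDDED_IMAGE=esxi.gpxe` from the src directory.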

Test your gPXE setup 

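If you chose workaround 1, the messages echoed by the custom script ("Greetings... Running through the custom script..." and "Fetching DHCP IP for the network adapter") should appear on the console before gPXE requests its DHCP address.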

The intent of this article is to provide workarounds for the specific DHCP timeout issue that we encountered while using gPXE.