mirror of
https://github.com/danderson/netboot.git
synced 2025-10-15 17:41:38 +02:00
pixiecore: update the boot walkthrough to include iPXE and UEFI.
This commit is contained in:
parent
fef14c66e8
commit
7929a15c6a
210
pixiecore/README.booting.md
Normal file
210
pixiecore/README.booting.md
Normal file
@ -0,0 +1,210 @@
|
||||
# How it works
|
||||
|
||||
Pixiecore implements four different, but related protocols in one
|
||||
binary, which together can take a PXE ROM from nothing to booting
|
||||
Linux. They are: ProxyDHCP, PXE, TFTP, and HTTP. Let's walk through
|
||||
the boot process for a PXE ROM.
|
||||
|
||||
"];
|
||||
|
||||
ProxyDHCP1 -> PXE [label=< <i>UEFI</i> >, fontsize=11];
|
||||
ProxyDHCP1 -> TFTP [label=< <i>BIOS</i> >, fontsize=11];
|
||||
PXE -> TFTP;
|
||||
TFTP -> ProxyDHCP2;
|
||||
ProxyDHCP2 -> HTTP;
|
||||
}
|
||||
)
|
||||
|
||||
## Step 1: DHCP/ProxyDHCP
|
||||
|
||||
The first thing a PXE ROM does is request a configuration through
|
||||
DHCP, with some additional PXE options set to indicate that it wants
|
||||
to netboot. It expects a reply that mirrors some of these options, and
|
||||
includes boot instructions in addition to network configuration.
|
||||
|
||||
The normal way of providing these options is to edit your DHCP
|
||||
server's configuration to provide them to clients that identify
|
||||
themselves as PXE clients. Unfortunately, reconfiguring your network's
|
||||
DHCP server is tedious at best, and impossible if you DHCP server is
|
||||
built into a consumer router, or managed by someone else.
|
||||
|
||||
Pixiecore instead uses a feature of the PXE specification called
|
||||
_ProxyDHCP_. As you might guess from the name, ProxyDHCP is not a
|
||||
proxy at all (yeah, the PXE spec is like that), but a second DHCP
|
||||
server that only provides PXE configuration.
|
||||
|
||||
When the PXE ROM sends out a `DHCPDISCOVER`, it gets two replies back:
|
||||
one containing only network configuration from the primary DHCP server
|
||||
(no PXE options), and one containing only PXE DHCP options from the
|
||||
ProxyDHCP server. The PXE firmware combines the two, and continues as
|
||||
if the primary server had provided all of the configuration.
|
||||
|
||||
The client will finish network configuration with the primary DHCP
|
||||
server (we're not involved with that), and will then proceed with the
|
||||
next steps of booting.
|
||||
|
||||
## Step 1.5: PXE-ish
|
||||
|
||||
For classic BIOS clients, the ProxyDHCP response points to a TFTP
|
||||
server and filename, and we go straight to step 2. For UEFI firmwares,
|
||||
however, there's an additional step.
|
||||
|
||||
Sadly, many UEFI firmwares in the wild don't implement PXE properly,
|
||||
and fail to chainload correctly if you send then a ProxyDHCP response
|
||||
pointing directly to a TFTP server.
|
||||
|
||||
To get UEFI clients to boot reliably, we need to send them a ProxyDHCP
|
||||
response that is invalid according to the PXE
|
||||
specification. Specifically, a reply that lacks DHCP option 43 (PXE
|
||||
Vendor Options).
|
||||
|
||||
Once the UEFI client has configured its network, it will then send a
|
||||
DHCPREQUEST packet to port 4011 of our ProxyDHCP server. This is the
|
||||
"PXE Boot Server" port, another relatively obscure part of the PXE
|
||||
specification that allows PXE firmwares to display boot menus
|
||||
natively, among other things.
|
||||
|
||||
Like our ProxyDHCP response, the PXE boot request and response in this
|
||||
exchange are not valid according to the PXE specification, since they
|
||||
both lack DHCP option 43 but include other PXE-specific options. Our
|
||||
response to this request is essentially what we told BIOS clients in
|
||||
step 1: here's a TFTP server and filename, go boot that.
|
||||
|
||||
So, UEFI clients need to do this little indirection before catching up
|
||||
with its BIOS cousin.
|
||||
|
||||
### What is this strange protocol?
|
||||
|
||||
I haven't fully verified this yet, but the protocol seems to be
|
||||
"BINL", a Microsoft proprietary fork of PXE that was introduced in the
|
||||
early days of EFI.
|
||||
|
||||
There's no public specification for this protocol, but there is an
|
||||
open-source implementation of a BINL client in the form of the
|
||||
TianoCore EDK2 UEFI firmware. We can also examine packet captures of
|
||||
machines being booted by "Windows Deployment Services", the service
|
||||
that performs network installation of windows, and see that they use
|
||||
this protoocl.
|
||||
|
||||
Both of these secondary sources strongly indicate that what we're
|
||||
actually doing here is telling the UEFI client to use BINL in our
|
||||
ProxyDHCP response, and then telling it to use TFTP in our BINL
|
||||
response.
|
||||
|
||||
Modern UEFI firmwares (e.g. OVMF, derived from the TianoCore codebase)
|
||||
support both standard PXE and this BINL variant, if BINL is what this
|
||||
is. However, many firmwares that are still shipping in new devices
|
||||
seem to only support BINL, which makes BINL the lowest common
|
||||
denominator that has the best chance of booting all UEFI clients.
|
||||
|
||||
This is a somewhat sad state of affairs given that Intel provides an
|
||||
open-source reference UEFI implementation that has supported PXE for a
|
||||
long time. However, industry practice seems to be to maintain
|
||||
seldom-to-never updated private forks of TianoCore, with extensive
|
||||
non-public modifications. As a result, it's likely we'll be stuck in
|
||||
this situation for a long time to come.
|
||||
|
||||
## Step 2: TFTP
|
||||
|
||||
TFTP is, as the name suggests, a trivial protocol for transferring
|
||||
files. I have found some PXE ROMs that manage to add unnecessary
|
||||
complexity even to that, but by and large, this step is
|
||||
straightforward.
|
||||
|
||||
However, TFTP is quite slow, because it doesn't support transfer
|
||||
windows (well, it does, but it's an extension defined in an RFC
|
||||
published in 2015, so guess how many PXE ROMs implement it...). As a
|
||||
result, you must pay one round-trip per ~1500 bytes transferred, and
|
||||
even on a gigabit network, that slows things down.
|
||||
|
||||
Given that some netboot images are quite large (CoreOS clocks in at
|
||||
almost 200MB), what we really want is to switch to a more efficient
|
||||
protocol. That's where iPXE comes in.
|
||||
|
||||
iPXE is a small bootloader that knows how to boot Linux kernels, and
|
||||
can speak HTTP. iPXE is between 50kB and 900kB (depending on the
|
||||
architecture and BIOS vs. UEFI), which even over TFTP is very fast to
|
||||
transfer.
|
||||
|
||||
Thus, Pixiecore uses TFTP only to transfer iPXE, and from there steers
|
||||
to HTTP for the rest of the loading process.
|
||||
|
||||
## Step 3: ProxyDHCP, again
|
||||
|
||||
Unlike some other bootloaders like PXELINUX, iPXE does not reuse the
|
||||
firmware's preexisting network settings. Instead, it starts the
|
||||
process all over again with a DHCP request. Again, we send it a
|
||||
ProxyDHCP response.
|
||||
|
||||
To break the infinite loop here, we can detect in the DHCP request
|
||||
that the client is iPXE, and so we serve up a different response, one
|
||||
that just points to an HTTP URL as the boot filename. iPXE interprets
|
||||
this as a script (a sequence of iPXE commands, with minimal control
|
||||
flow) that it should download and run.
|
||||
|
||||
One more catch is that iPXE has a race condition: when configuring
|
||||
DHCP, if it receives the regular DHCP response before the ProxyDHCP
|
||||
response, it will quickly finish configuring the network... and then
|
||||
complain that it has no boot instructions. To counteract this, we
|
||||
embed an iPXE script in the iPXE binary itself, telling it to retry
|
||||
network configuration until it gets a boot filename out of it. So,
|
||||
we're actually chainloading from one iPXE script (embedded) to another
|
||||
(from HTTP).
|
||||
|
||||
## HTTP
|
||||
|
||||
We've finally crawled our way up to the late nineties - we can speak
|
||||
HTTP! Pixiecore's HTTP server is wonderfully familiar and normal. It
|
||||
just serves up a trivial iPXE script telling it to boot a Linux
|
||||
kernel, and the user-provided kernel and initrd files.
|
||||
|
||||
iPXE grabs all of that, and finally, Linux boots.
|
||||
|
||||
## Recap
|
||||
|
||||
This is what the whole boot process looks like on the wire.
|
||||
|
||||
### Dramatis Personae
|
||||
|
||||
- **PXE ROM**, a brittle firmware burned into the network card.
|
||||
- **DHCP server**, a plain old DHCP server providing network configuration.
|
||||
- **Pixieboot**, the Hero and server of ProxyDHCP, PXE, TFTP and HTTP.
|
||||
- **iPXE**, an open source [bootloader](http://ipxe.org).
|
||||
|
||||
### Timeline
|
||||
|
||||
- PXE ROM starts, broadcasts `DHCPDISCOVER`.
|
||||
- DHCP server responds with a `DHCPOFFER` containing network configs.
|
||||
- Pixiecore's ProxyDHCP server responds with a `DHCPOFFER` listing a TFTP file (BIOS) or BINL options (UEFI).
|
||||
- PXE ROM does a `DHCPREQUEST`/`DHCPACK` exchange with the DHCP server to get a network configuration.
|
||||
- (UEFI only) PXE ROM sends a `DHCPREQUEST` to Pixiecore's "PXE" server, asking for boot instructions.
|
||||
- (UEFI only) Pixiecore's "PXE" server responds with a `DHCPACK` listing a TFTP file.
|
||||
- PXE ROM downloads iPXE from Pixiecore's TFTP server, and hands off to iPXE.
|
||||
- iPXE starts, broadcasts `DHCPDISCOVER`.
|
||||
- DHCP server responds with a `DHCPOFFER` containing network configs.
|
||||
- Pixiecore's ProxyDHCP server responds with a `DHCPOFFER` listing an HTTP URL.
|
||||
- iPXE does a `DHCPREQUEST`/`DHCPACK` exchange with the DHCP server to get a network configuration.
|
||||
- iPXE fetches its boot script from Pixiecore's HTTP server.
|
||||
- iPXE fetches a kernel and ramdisk from Pixiecore's HTTP server, and boots Linux.
|
||||
|
||||
# Known deviations from specifications
|
||||
|
||||
Pixiecore aims to be compliant with the relevant specifications for
|
||||
TFTP, DHCP, and PXE. This section lists the places where Pixiecore
|
||||
deliberately deviates from the spec to support buggy clients.
|
||||
|
||||
## Missing Client Machine Identifier (GUID) option
|
||||
|
||||
Some PXE ROMs don't send DHCP option 97, "Client Machine Identifier
|
||||
(GUID)", in their DHCP and PXE requests. According to the PXE 2.1
|
||||
specification and RFC 4578, this makes the requests non-compliant:
|
||||
|
||||
> This option MUST be present in all DHCP and PXE packets sent by PXE-compliant clients and servers.
|
||||
|
||||
Pixiecore's behavior implements "SHOULD" instead of "MUST": if a
|
||||
client request has a GUID, Pixiecore's response will respond with a
|
||||
GUID. If the client request has no GUID, Pixiecore omits option 97 in
|
||||
its response.
|
@ -18,6 +18,9 @@ into a single binary that can cooperate with your network's existing
|
||||
DHCP server. You don't need to reconfigure anything else in the
|
||||
network.
|
||||
|
||||
If you're curious about the whole process that Pixiecore manages, you
|
||||
can read the details in [README.booting](README.booting.md).
|
||||
|
||||
## Installation
|
||||
|
||||
Install Pixiecore via `go get`:
|
||||
@ -117,148 +120,3 @@ the host network stack.
|
||||
```shell
|
||||
sudo docker run -v .:/image --net=host danderson/pixiecore boot /image/coreos_production_pxe.vmlinuz /image/coreos_production_pxe_image.cpio.gz
|
||||
```
|
||||
|
||||
## How it works
|
||||
|
||||
Pixiecore implements four different, but related protocols in one
|
||||
binary, which together can take a PXE ROM from nothing to booting
|
||||
Linux. They are: ProxyDHCP, PXE, TFTP, and HTTP. Let's walk through
|
||||
the boot process for a PXE ROM.
|
||||
|
||||
### DHCP/ProxyDHCP
|
||||
|
||||
The first thing a PXE ROM does is request a configuration through
|
||||
DHCP, waiting for a DHCP reply that includes PXE vendor options. The
|
||||
normal way of providing these options is to edit your DHCP server's
|
||||
configuration to provide them to clients that identify themselves as
|
||||
PXE clients. Unfortunately, reconfiguring your network's DHCP server
|
||||
is tedious at best, and impossible if you DHCP server is built into a
|
||||
consumer router, or managed by someone else.
|
||||
|
||||
Pixiecore instead uses a feature of the PXE specification called
|
||||
_ProxyDHCP_. As you might guess from the name, ProxyDHCP is not a
|
||||
proxy at all (yeah, the PXE spec is like that), but a second DHCP
|
||||
server that only provides PXE configuration.
|
||||
|
||||
When the PXE ROM sends out a `DHCPDISCOVER`, it gets two replies back:
|
||||
one containing network configuration from the primary DHCP server, and
|
||||
one containing only PXE DHCP options from the ProxyDHCP server. The
|
||||
PXE firmware combines the two, and continues as if the primary server
|
||||
had provided all the configuration.
|
||||
|
||||
### PXE
|
||||
|
||||
In theory, you'd expect the ProxyDHCP server to just provide a TFTP
|
||||
server IP and a filename to the PXE firmware, and it would proceed to
|
||||
download and boot that just like the BOOTP of old.
|
||||
|
||||
Sadly, the average quality of PXE ROM implementations is abysmal, and
|
||||
many of them fail to chainload correctly if you try to do this from a
|
||||
ProxyDHCP server.
|
||||
|
||||
So, instead, we make use of the spec's "PXE menu" functionality, which
|
||||
lets you tell the PXE firmware to display a boot menu. Just like
|
||||
everything else in PXE, this is quite brittle, so nobody actually uses
|
||||
it to display menus - instead, they just push a more fully featured
|
||||
bootloader over PXE, and let that bootloader do the fancy work.
|
||||
|
||||
However, PXE menus seem to work reliably when combined with
|
||||
ProxyDHCP... And the PXE configuration can provide a timeout after
|
||||
which the first menu entry is booted... And that timeout can be set to
|
||||
zero.
|
||||
|
||||
So, we can just provide a single-entry menu, with a zero timeout, and
|
||||
chainload that way! But wait, there's more terribleness. PXE menu
|
||||
entries don't just list a TFTP server and file to load, because that
|
||||
would be too simple. Instead, each menu entry maps to a "Boot Server
|
||||
Type", and yet another DHCP option maps that boot server type to a set
|
||||
of IP addresses.
|
||||
|
||||
Those IP addresses aren't TFTP servers, but PXE boot servers. PXE boot
|
||||
servers listen on port 4011. They use the DHCP packet format, but only
|
||||
as a way of conveying a DHCP option that says "please tell me how to
|
||||
boot the following Boot Server Type". It's quite possibly the least
|
||||
efficient protocol encoding ever devised.
|
||||
|
||||
At long last, when the PXE server receives that request, it can reply
|
||||
with a BOOTP-ish packet that specified next-server and a filename. And
|
||||
_those_ are, at long last, TFTP.
|
||||
|
||||
### TFTP
|
||||
|
||||
After navigating the eldritch horror of PXE, TFTP is a breath of fresh
|
||||
air. It is indeed a trivial protocol for transferring files. I have
|
||||
found some PXE ROMs that manage to add unnecessary complexity even to
|
||||
that, but by and large, this step is straightforward.
|
||||
|
||||
However, TFTP is quite slow, because it doesn't support transfer
|
||||
windows (well, it does, but it's an extension defined in an RFC
|
||||
published in 2015, so guess how many PXE ROMs implement it...). As a
|
||||
result, you must pay one round-trip per ~1500 bytes transferred, and
|
||||
even on a gigabit network, that slows things down.
|
||||
|
||||
Given that some netboot images are quite large (CoreOS clocks in at
|
||||
almost 200MB), what we really want is to switch to a more efficient
|
||||
protocol. That's where PXELINUX comes in.
|
||||
|
||||
PXELINUX is a small bootloader that knows how to boot Linux kernels,
|
||||
and it comes in a variant that can speak HTTP. PXELINUX is 90kB, which
|
||||
even over TFTP is very fast to transfer.
|
||||
|
||||
Thus, Pixiecore uses TFTP only to transfer PXELINUX, and from there
|
||||
steers it to HTTP for the rest of the loading process.
|
||||
|
||||
### HTTP
|
||||
|
||||
We've finally crawled our way up to the late nineties - we can speak
|
||||
HTTP! Pixiecore's HTTP server is wonderfully familiar and normal. It
|
||||
just serves up a support file that PXELINUX needs (`ldlinux.c32`), a
|
||||
trivial PXELINUX configuration telling it to boot a Linux kernel, and
|
||||
the user-provided kernel and initrd files.
|
||||
|
||||
PXELINUX grabs all of that, and finally, Linux boots.
|
||||
|
||||
### Recap
|
||||
|
||||
This is what the whole boot process looks like on the wire.
|
||||
|
||||
#### Dramatis Personae
|
||||
|
||||
- **PXE ROM**, a brittle firmware burned into the network card.
|
||||
- **DHCP server**, a plain old DHCP server providing network configuration.
|
||||
- **Pixieboot**, the Hero and server of ProxyDHCP, PXE, TFTP and HTTP.
|
||||
- **PXELINUX**, an open source bootloader of the [Syslinux project](http://www.syslinux.org).
|
||||
|
||||
#### Timeline
|
||||
|
||||
- PXE ROM starts, broadcasts `DHCPDISCOVER`.
|
||||
- DHCP server responds with a `DHCPOFFER` containing network configs.
|
||||
- Pixiecore's ProxyDHCP server responds with a `DHCPOFFER` containing a PXE boot menu.
|
||||
- PXE ROM does a `DHCPREQUEST`/`DHCPACK` exchange with the DHCP server to get a network configuration.
|
||||
- PXE ROM processes the PXE boot menu, decides to boot menu entry 0.
|
||||
- PXE ROM sends a `DHCPREQUEST` to Pixiecore's PXE server, asking for a boot file.
|
||||
- Pixiecore's PXE server responds with a `DHCPACK` listing a TFTP
|
||||
server, a boot filename, and a PXELINUX vendor option to make it use
|
||||
HTTP.
|
||||
- PXE ROM downloads PXELINUX from Pixiecore's TFTP server, and hands off to PXELINUX.
|
||||
- PXELINUX fetches its configuration from Pixiecore's HTTP server.
|
||||
- PXELINUX fetches a kernel and ramdisk from Pixiecore's HTTP server, and boots Linux.
|
||||
|
||||
## Known deviations from specifications
|
||||
|
||||
Pixiecore aims to be compliant with the relevant specifications for
|
||||
TFTP, DHCP, and PXE. This section lists the places where Pixiecore
|
||||
deliberately deviates from the spec to support buggy clients.
|
||||
|
||||
### Missing Client Machine Identifier (GUID) option
|
||||
|
||||
Some PXE ROMs don't send DHCP option 97, "Client Machine Identifier
|
||||
(GUID)", in their DHCP and PXE requests. According to the PXE 2.1
|
||||
specification and RFC 4578, this makes the requests non-compliant:
|
||||
|
||||
> This option MUST be present in all DHCP and PXE packets sent by PXE-compliant clients and servers.
|
||||
|
||||
Pixiecore's behavior implements "SHOULD" instead of "MUST": if a
|
||||
client request has a GUID, Pixiecore's response will respond with a
|
||||
GUID. If the client request has no GUID, Pixiecore omits option 97 in
|
||||
its response.
|
||||
|
Loading…
x
Reference in New Issue
Block a user