From c3052f317f30468927801f63a86c8e9babcb6db4 Mon Sep 17 00:00:00 2001 From: David Anderson Date: Tue, 9 Aug 2016 00:32:45 -0700 Subject: [PATCH] pixiecore: copy READMEs over. The documentation doesn't match the current code at all, but it's the target to aim for. --- pixiecore/README.api.md | 201 ++++++++++++++++++++++++++++ pixiecore/README.md | 282 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 483 insertions(+) create mode 100644 pixiecore/README.api.md create mode 100644 pixiecore/README.md diff --git a/pixiecore/README.api.md b/pixiecore/README.api.md new file mode 100644 index 0000000..4e0909b --- /dev/null +++ b/pixiecore/README.api.md @@ -0,0 +1,201 @@ +# API server + +Pixiecore supports two modes of operation: static and API-driven. + +In static mode, you just pass it a -kernel and a -initrd, and +Pixiecore will boot any PXE client that it sees. + +In API mode, requests made by PXE clients will be translated into +calls to an HTTP API, essentially asking "Should I boot this client? +If so, what should I boot it with?" This lets you implement fancy +dynamic booting things, as well as construct per-machine commandlines +and whatnot. + +## API specification + +Note that Pixiecore is the _client_ of this API. It's your job to +implement it and point Pixiecore at the right URL prefix. + +The API consists of a single endpoint: +`/v1/boot/`. Pixiecore calls this endpoint +to learn whether/how to boot a machine with a given MAC address. + +Any non-200 response from the server will cause Pixiecore to ignore +the requesting machine. + +A 200 response will cause Pixiecore to boot the requesting machine. A +200 response must come with a JSON document conforming to the +following specification, with **_italicized_** entries being optional: + +- **kernel** (string): the URL of the kernel to boot. +- **_initrd_** (list of strings): URLs of initrds to load. The kernel + will flatten all the initrds into a single filesystem. 
+- **_cmdline_** (object): commandline parameters for the kernel. Each + key/value pair maps to key=value, where value can be: + - **string**: the value is passed verbatim to the kernel + - **true**: the value is omitted, only the key is passed to the + kernel. + - **object**: the value is a URL that Pixiecore will rewrite such + that it proxies the request (see below for why you'd want that). + - **url** (string): any URL. Pixiecore will rewrite the URL such + that it proxies the request. +- **_message_** (string): A message to display before booting the + provided configuration. Note that displaying this message is on + a _best-effort basis only_, as particular implementations of the + boot process may not support displaying text. + +Malformed 200 responses will have the same result as a non-200 +response - Pixiecore will ignore the requesting machine. + +### Kernel, initrd and cmdline URLs + +As described above, the kernel and initrds are specified as URLs, +enabling you to host them as you please - you could even link directly +to a distro's download links if you wanted. + +URLs provided by the API server can be absolute, or just a naked +path. In the latter case, the path is resolved with reference to the +API server URL that Pixiecore is using - although note that the path +is _not_ rooted within Pixiecore's API path. For example, if you +provide `/foo` as a URL to Pixiecore running with `-api +http://bar.com/baz`, Pixiecore will fetch `http://bar.com/foo`, _not_ +`http://bar.com/baz/foo`. + +In addition to `http` and `https` URLs, Pixiecore supports `file://` +URLs to serve files from the filesystem of the machine running +Pixiecore. You can use this to host large OS images near the target +machines, while still deciding what to boot from a central but remote +location. Pixiecore uses the "path" segment of the URL, so all +`file://` URLs are absolute filesystem paths. + +Pixiecore will not point booting machines directly at the given +URLs. 
Instead, it will point the booting machines to a proxy URL on +Pixiecore's HTTP server, and proxy the transfer. + +This is done for two reasons. First, the booting machine may be in a +restricted network environment. For example, you may have a policy +that machines must do 802.1x authentication to get full network +access, else they get dropped on a "remediation" vlan. Proxying the +downloads through Pixiecore means you need only one set of edge ACLs +on the remediation vlan, regardless of _what_ you're booting: just +whitelist Pixiecore's IP:port, and from there your API server can boot +whatever you want. + +Second, the booting machine is limited to using HTTP to fetch +images. This is probably okay (though not ideal, admittedly - but then +again, PXE forces us to TFTP anyway, so we're already screwed for +security) on the machine's local ethernet broadcast domain, but is +definitely not okay for retrieval over the internet. Proxying through +Pixiecore means that your API server can provide HTTPS URLs, and +everything but the very last mile between Pixiecore and the machine +will be secure. + +The exact URLs visible to the booting machine are an implementation +detail of Pixiecore and are subject to breaking change at any +time. + +For the curious, the current implementation translates API server +provided URLs into `/f/`. The signed URL blob is the base64 encoding of the output of NaCl's +secretbox authenticated encryption function applied to the server-provided +URL, using an ephemeral key generated when Pixiecore starts. This +steers the booting machine through Pixiecore for the fetch, and lets +Pixiecore verify that it's only proxying for URLs that the API server +gave it, so it's not an open proxy on your remediation vlan. + +### Multiple calls + +Pixiecore in API mode is stateless. Due to the unique way that PXE +works, the API server may receive multiple requests for a single +machine boot. 
Unfortunately, there is no good way to reliably provide +a 1:1 mapping between a machine boot and an API server request. + +If you want to implement "single-shot" boot behavior (i.e. "netboot +this MAC once, then go back to ignoring it"), you'll need to add a +signalling backchannel to the OS image, so that it signals your API +server when it's booted. Responding only to the first request for a +MAC address will not have the desired effect. + +### Example responses + +Boot into CoreOS stable. **WARNING**: this example is **unsafe**, +because the images are linked to over HTTP, and we're not doing GPG +verification of the image signatures. This is an example only. + +```json +{ + "kernel": "http://stable.release.core-os.net/amd64-usr/current/coreos_production_pxe.vmlinuz", + "initrd": ["http://stable.release.core-os.net/amd64-usr/current/coreos_production_pxe_image.cpio.gz"] +} +``` + +Boot from API server provided files. Pixiecore will grab kernel and +initrd from `/kernel` and `/initrd.[01]`. + +```json +{ + "kernel": "/kernel", + "initrd": ["/initrd.0", "/initrd.1"] +} +``` + +Boot from HTTPS, with extra commandline flags. + +```json +{ + "kernel": "https://files.local/kernel", + "initrd": ["https://files.local/initrd"], + "cmdline": { + "selinux": "1", + "coreos.autologin": true + } +} +``` + +Boot from Pixiecore's local filesystem. + +```json +{ + "kernel": "file:///mnt/data/kernel", + "initrd": ["file:///mnt/data/initrd"] +} +``` + +Provide a proxied cloud-config and an unproxied other URL. + +```json +{ + "kernel": "https://files.local/kernel", + "initrd": ["https://files.local/initrd"], + "cmdline": { + "cloud-config-url": { + "url": "https://files.local/cloud-config" + }, + "non-proxied-url": "https://files.local/something-else" + } +} +``` + +### Example API server + +There is a very small example API server implementation in the +`example` subdirectory. This sample server is not production-quality +code (e.g. 
it uses panic for error handling), but should be a +reasonable starting point nonetheless. It implements a reduced form of +Pixiecore's static mode: you give it a kernel, initrd and commandline +as flags, and it serves those for all boot requests it +receives. Unlike Pixiecore's builtin static mode, the sample server +can only boot one initrd image. + +## Deprecated features + +### Kernel commandline as a string + +The `cmdline` parameter returned by the API server can also be a plain +string instead of an object. That string is the full verbatim +commandline to be passed to the booting kernel. + +This form was replaced by the object form to allow Pixiecore to do +additional processing of the commandline before passing it to the +booting kernel - specifically to allow for URL translation and +proxying. diff --git a/pixiecore/README.md b/pixiecore/README.md new file mode 100644 index 0000000..2977587 --- /dev/null +++ b/pixiecore/README.md @@ -0,0 +1,282 @@ +NOTE THAT THIS IS NOT YET READY AT ALL, THIS README WAS JUST COPIED +OVER TO THIS NEW VERSION, THE CODE SUPPORTS NONE OF THIS RIGHT +NOW. Come back later if you want working software. + +[![software](https://img.shields.io/badge/software-unstable-red.svg)](https://github.com/google/netboot/pixiecore) +[![software2](https://img.shields.io/badge/software-unready-red.svg)](https://github.com/google/netboot/pixiecore) +[![production](https://img.shields.io/badge/production-avoid-red.svg)](https://github.com/google/netboot/pixiecore) + +# Pixiecore, PXE booting for people in a hurry + +``` +There once was a protocol called PXE, +Whose specification was overly tricksy. +A committee refined it +Into a big Turing tarpit, +And now you're using it to boot your PC. +``` + +Booting a Linux system over the network is quite tedious. 
You have to +set up a TFTP server, configure your DHCP server to recognize PXE +clients, and send them the right set of magical options to get them to +boot, often fighting rubbish PXE ROM implementations. + +Pixiecore aims to simplify this process, by packing everything +into a single binary that can cooperate with your network's existing +DHCP server. + +Pixiecore can be used either as a simple "just boot into this OS +image" tool, or as a building block of a machine management system +with its API mode. + +[![Build Status](https://travis-ci.org/danderson/pixiecore.svg?branch=master)](https://travis-ci.org/danderson/pixiecore) + +## Pixiecore in static mode ("I just want to boot 5 machines") + +Run the pixiecore binary, passing it a kernel and initrd, and +optionally some extra kernel commandline arguments. + +Here are a couple of examples. If you feel like a screencast instead, +there's a +[very short demo](https://www.youtube.com/watch?v=xjdTOt5YDQM). + +### Tiny Core Linux + +Tiny Core Linux is a positively tiny distro, clocking in at 10M in the +configuration we'll be using (it can go lower than that). Let's set +ourselves up such that any PXE booting machine on the network boots +into a TinyCore ramdisk: + +```shell +# Fetch the kernel and the 2 cpio files that form the filesystem. +wget http://tinycorelinux.net/7.x/x86/release/distribution_files/{vmlinuz64,modules64.gz,rootfs.gz} + +# In the real world, you would AUTHENTICATE YOUR DOWNLOADS here. TCL sadly +# only distributes images over HTTP, so it's anyone's guess what you +# just downloaded. + +# Go! +pixiecore -kernel vmlinuz64 -initrd rootfs.gz,modules64.gz +``` + +That's it. Any machine that tries to netboot on this network will now +boot into a TinyCore ramdisk. + +Notice that we passed multiple cpio archives to `-initrd`. All +provided archives will be merged on boot to form the final +ramdisk. 
This is quite handy for things like providing OEM +configuration without having to respin the upstream initrd image. + +### CoreOS + +Pixiecore was originally written as a component in an automated +installation system for CoreOS on bare metal. For this example, let's +set up a netboot for the alpha CoreOS release: + +```shell +# Grab the PXE images and verify them +wget http://alpha.release.core-os.net/amd64-usr/current/coreos_production_pxe.vmlinuz +wget http://alpha.release.core-os.net/amd64-usr/current/coreos_production_pxe_image.cpio.gz + +# In the real world, you would AUTHENTICATE YOUR DOWNLOADS +# here. CoreOS distributes image signatures, but that only really +# helps if you already know the right GPG key. + +# Go! +pixiecore -kernel coreos_production_pxe.vmlinuz -initrd coreos_production_pxe_image.cpio.gz --cmdline coreos.autologin +``` + +Notice that we're passing an extra commandline argument to make CoreOS +automatically log in once it's booted. + +## Pixiecore in API mode + +Think of Pixiecore in API mode as a "PXE to HTTP" translator. Whenever +Pixiecore sees a machine trying to PXE boot, it will ask a remote HTTP +API (which you implement) what to do. The API server can tell +Pixiecore to ignore the machine, or tell it to boot into a given +kernel/initrd/commandline. + +Effectively, Pixiecore in API mode lets you pretend that your machines +speak a simple JSON protocol when trying to netboot. This makes it +_far_ easier to play with netbooting in your own software. + +To start Pixiecore in API mode, pass it the HTTP API endpoint through +the `-api` flag. The endpoint you provide must implement the Pixiecore +boot API, as described in the [API spec](README.api.md). + +You can find a sample API server implementation in the `example` +subdirectory. The code is not production-grade, but gives a short +illustration of how the protocol works by reimplementing a subset of +Pixiecore's static mode as an API server. 
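
As a rough illustration of the API side, here is a minimal Python sketch of a boot API server (this is not the sample from the `example` subdirectory). It assumes the MAC address arrives as the last path segment under `/v1/boot/`, which the spec text does not spell out. The `BOOT_TABLE` contents, the MAC address, and the `serve` helper with its port 8080 are made up for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical per-machine boot table, keyed by lowercase MAC address.
BOOT_TABLE = {
    "52:54:00:12:34:56": {
        "kernel": "file:///mnt/data/kernel",
        "initrd": ["file:///mnt/data/initrd"],
        "cmdline": {"coreos.autologin": True},
    },
}

# Assumed path layout: /v1/boot/ followed by the MAC address.
PREFIX = "/v1/boot/"


def boot_response(path):
    """Return (HTTP status, body bytes) for a request path."""
    if not path.startswith(PREFIX):
        return 404, b""
    mac = path[len(PREFIX):].lower()
    spec = BOOT_TABLE.get(mac)
    if spec is None:
        # Any non-200 response tells Pixiecore to ignore the machine.
        return 404, b""
    return 200, json.dumps(spec).encode("utf-8")


class BootHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = boot_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the sketch server until interrupted (port is arbitrary)."""
    HTTPServer(("", port), BootHandler).serve_forever()
```

Calling `serve()` and pointing the `-api` flag at that host and port should be enough to experiment: unknown MAC addresses get a 404 and are ignored, known ones get a JSON boot spec.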
+ +## Running in Docker + +Pixiecore is available as a Docker image called +`danderson/pixiecore`. It's an automatic Docker Hub build that tracks +the repository. + +Because Pixiecore needs to listen for DHCP traffic, it has to run with +the host network stack. + +```shell +sudo docker run -v .:/image --net=host danderson/pixiecore -kernel /image/coreos_production_pxe.vmlinuz -initrd /image/coreos_production_pxe_image.cpio.gz +``` + +## How it works + +Pixiecore implements four different, but related protocols in one +binary, which together can take a PXE ROM from nothing to booting +Linux. They are: ProxyDHCP, PXE, TFTP, and HTTP. Let's walk through +the boot process for a PXE ROM. + +### DHCP/ProxyDHCP + +The first thing a PXE ROM does is request a configuration through +DHCP, waiting for a DHCP reply that includes PXE vendor options. The +normal way of providing these options is to edit your DHCP server's +configuration to provide them to clients that identify themselves as +PXE clients. Unfortunately, reconfiguring your network's DHCP server +is tedious at best, and impossible if your DHCP server is built into a +consumer router, or managed by someone else. + +Pixiecore instead uses a feature of the PXE specification called +_ProxyDHCP_. As you might guess from the name, ProxyDHCP is not a +proxy at all (yeah, the PXE spec is like that), but a second DHCP +server that only provides PXE configuration. + +When the PXE ROM sends out a `DHCPDISCOVER`, it gets two replies back: +one containing network configuration from the primary DHCP server, and +one containing only PXE DHCP options from the ProxyDHCP server. The +PXE firmware combines the two, and continues as if the primary server +had provided all the configuration. + +### PXE + +In theory, you'd expect the ProxyDHCP server to just provide a TFTP +server IP and a filename to the PXE firmware, and it would proceed to +download and boot that just like the BOOTP of old. 
+ +Sadly, the average quality of PXE ROM implementations is abysmal, and +many of them fail to chainload correctly if you try to do this from a +ProxyDHCP server. + +So, instead, we make use of the spec's "PXE menu" functionality, which +lets you tell the PXE firmware to display a boot menu. Just like +everything else in PXE, this is quite brittle, so nobody actually uses +it to display menus - instead, they just push a more fully featured +bootloader over PXE, and let that bootloader do the fancy work. + +However, PXE menus seem to work reliably when combined with +ProxyDHCP... And the PXE configuration can provide a timeout after +which the first menu entry is booted... And that timeout can be set to +zero. + +So, we can just provide a single-entry menu, with a zero timeout, and +chainload that way! But wait, there's more terribleness. PXE menu +entries don't just list a TFTP server and file to load, because that +would be too simple. Instead, each menu entry maps to a "Boot Server +Type", and yet another DHCP option maps that boot server type to a set +of IP addresses. + +Those IP addresses aren't TFTP servers, but PXE boot servers. PXE boot +servers listen on port 4011. They use the DHCP packet format, but only +as a way of conveying a DHCP option that says "please tell me how to +boot the following Boot Server Type". It's quite possibly the least +efficient protocol encoding ever devised. + +At long last, when the PXE server receives that request, it can reply +with a BOOTP-ish packet that specifies next-server and a filename. And +_those_ are, at long last, TFTP. + +### TFTP + +After navigating the eldritch horror of PXE, TFTP is a breath of fresh +air. It is indeed a trivial protocol for transferring files. I have +found some PXE ROMs that manage to add unnecessary complexity even to +that, but by and large, this step is straightforward. 
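
To give a sense of how small the protocol is, here is a Python sketch of the read request (RRQ) that opens a TFTP transfer, following the packet layout in RFC 1350. The filename `pxelinux.0` is only an illustrative assumption, not necessarily the name Pixiecore serves, and `build_rrq`/`parse_rrq` are hypothetical helper names.

```python
import struct

OP_RRQ = 1  # TFTP opcode for a read request (RFC 1350)


def build_rrq(filename, mode="octet"):
    """Build a TFTP read request: opcode, filename, NUL, mode, NUL."""
    return (
        struct.pack("!H", OP_RRQ)
        + filename.encode("ascii") + b"\x00"
        + mode.encode("ascii") + b"\x00"
    )


def parse_rrq(packet):
    """Parse a read request back into (filename, mode)."""
    (opcode,) = struct.unpack("!H", packet[:2])
    if opcode != OP_RRQ:
        raise ValueError("not a read request")
    # Anything after the mode's NUL terminator (e.g. options) is ignored.
    filename, mode, _ = packet[2:].split(b"\x00", 2)
    return filename.decode("ascii"), mode.decode("ascii")
```

The whole request is a two-byte opcode followed by two NUL-terminated strings, which is about as simple as a wire format gets.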
+ +However, TFTP is quite slow, because it doesn't support transfer +windows (well, it does, but it's an extension defined in an RFC +published in 2015, so guess how many PXE ROMs implement it...). As a +result, you must pay one round-trip per ~1500 bytes transferred, and +even on a gigabit network, that slows things down. + +Given that some netboot images are quite large (CoreOS clocks in at +almost 200MB), what we really want is to switch to a more efficient +protocol. That's where PXELINUX comes in. + +PXELINUX is a small bootloader that knows how to boot Linux kernels, +and it comes in a variant that can speak HTTP. PXELINUX is 90kB, which +even over TFTP is very fast to transfer. + +Thus, Pixiecore uses TFTP only to transfer PXELINUX, and from there +steers it to HTTP for the rest of the loading process. + +### HTTP + +We've finally crawled our way up to the late nineties - we can speak +HTTP! Pixiecore's HTTP server is wonderfully familiar and normal. It +just serves up a support file that PXELINUX needs (`ldlinux.c32`), a +trivial PXELINUX configuration telling it to boot a Linux kernel, and +the user-provided kernel and initrd files. + +PXELINUX grabs all of that, and finally, Linux boots. + +### Recap + +This is what the whole boot process looks like on the wire. + +#### Dramatis Personae + +- **PXE ROM**, a brittle firmware burned into the network card. +- **DHCP server**, a plain old DHCP server providing network configuration. +- **Pixiecore**, the Hero and server of ProxyDHCP, PXE, TFTP and HTTP. +- **PXELINUX**, an open source bootloader of the [Syslinux project](http://www.syslinux.org). + +#### Timeline + +- PXE ROM starts, broadcasts `DHCPDISCOVER`. +- DHCP server responds with a `DHCPOFFER` containing network configs. +- Pixiecore's ProxyDHCP server responds with a `DHCPOFFER` containing a PXE boot menu. +- PXE ROM does a `DHCPREQUEST`/`DHCPACK` exchange with the DHCP server to get a network configuration. 
+- PXE ROM processes the PXE boot menu, decides to boot menu entry 0. +- PXE ROM sends a `DHCPREQUEST` to Pixiecore's PXE server, asking for a boot file. +- Pixiecore's PXE server responds with a `DHCPACK` listing a TFTP + server, a boot filename, and a PXELINUX vendor option to make it use + HTTP. +- PXE ROM downloads PXELINUX from Pixiecore's TFTP server, and hands off to PXELINUX. +- PXELINUX fetches its configuration from Pixiecore's HTTP server. +- PXELINUX fetches a kernel and ramdisk from Pixiecore's HTTP server, and boots Linux. + +## Known deviations from specifications + +Pixiecore aims to be compliant with the relevant specifications for +TFTP, DHCP, and PXE. This section lists the places where Pixiecore +deliberately deviates from the spec to support buggy clients. + +### Missing Client Machine Identifier (GUID) option + +Some PXE ROMs don't send DHCP option 97, "Client Machine Identifier +(GUID)", in their DHCP and PXE requests. According to the PXE 2.1 +specification and RFC 4578, this makes the requests non-compliant: + +> This option MUST be present in all DHCP and PXE packets sent by PXE-compliant clients and servers. + +Pixiecore's behavior implements "SHOULD" instead of "MUST": if a +client request has a GUID, Pixiecore's response will include a +GUID. If the client request has no GUID, Pixiecore omits option 97 in +its response. + +## Development + +You can use [Vagrant](https://www.vagrantup.com/) to quickly set up a test environment: + + (HOST)$ vagrant up --provider=libvirt pxeserver + (HOST)$ vagrant ssh pxeserver + (PXESERVER)$ wget http://alpha.release.core-os.net/amd64-usr/current/coreos_production_pxe.vmlinuz + (PXESERVER)$ wget http://alpha.release.core-os.net/amd64-usr/current/coreos_production_pxe_image.cpio.gz + (PXESERVER)$ pixiecore -debug -kernel coreos_production_pxe.vmlinuz -initrd coreos_production_pxe_image.cpio.gz --cmdline coreos.autologin + ### In another terminal + (HOST)$ vagrant up --provider=libvirt pxeclient1 +