mirror of
https://github.com/danderson/netboot.git
synced 2025-10-24 14:01:52 +02:00
iPXE has an annoying race condition where it sometimes doesn't notice the ProxyDHCP response when booting, and fails. So we embed a boot script in the builtin iPXE binaries that implements the retry loop recommended in the documentation. Empirically, this has resolved flaky boots on my test machine, usually no more than a single retry is needed.
57 lines
1.9 KiB
Plaintext
57 lines
1.9 KiB
Plaintext
#!ipxe
|
|
#
|
|
# This is the iPXE boot script that we embed into the iPXE binary.
|
|
#
|
|
# The entire reason for the existence of this script is that iPXE very
|
|
# eagerly configures DHCP as soon as it gets a DHCP response, and
|
|
# because of this it might miss the ProxyDHCP response that tells it
|
|
# how to boot. In this situation, `autoboot` (the default command)
|
|
# just fails and falls out of the PXE boot codepath, so we end up with
|
|
# machines that sometimes fail to "catch" the network boot.
|
|
#
|
|
# This script implements what the ipxe documentation recommends, which
|
|
# is to just retry the `dhcp` command a bunch until ipxe does see a
|
|
# ProxyDHCP response. It's quite ugly, and a proper fix should really
|
|
# get upstreamed to ipxe, but for right now, this works.
|
|
|
|
set attempts:int32 10
|
|
set x:int32 0
|
|
|
|
# Try to get a filename from ProxyDHCP, retrying a couple of times if
|
|
# we fail.
|
|
:loop
|
|
dhcp && isset ${filename} || goto retry
|
|
goto boot
|
|
:retry
|
|
iseq ${x} ${attempts} && goto fail ||
|
|
inc x
|
|
echo No ProxyDHCP response, retrying (attempt ${x}/${attempts})
|
|
goto loop
|
|
|
|
# Got a filename from ProxyDHCP, that's the actual boot script,
|
|
# off we go!
|
|
:boot
|
|
chain ${filename}
|
|
|
|
# Failure at this point probably means Pixiecore changed its mind
|
|
# about whether this machine should be booted in the middle of the
|
|
# boot cycle, so we had already handed off to iPXE, but now we're
|
|
# no longer serving a boot script for it.
|
|
#
|
|
# Reboot the machine to restart the whole cycle (and presumably skip
|
|
# PXE completely this time).
|
|
#
|
|
# It's also possible we just got horribly unlucky and the network
|
|
# environment is such that we're consistently missing the ProxyDHCP
|
|
# reply. That really sucks, so give people pointers to bug filing
|
|
# here.
|
|
:fail
|
|
echo Failed to get a ProxyDHCP response after ${attempts} attempts
|
|
echo
|
|
echo If you are sure that Pixiecore is still trying to boot this machine,
|
|
echo please file a bug at https://github.com/google/netboot .
|
|
echo
|
|
echo Rebooting in 5 seconds...
|
|
sleep 5
|
|
reboot
|