mirror of
https://github.com/danderson/netboot.git
synced 2025-08-07 07:07:17 +02:00
pixiecore: Add a retry loop in the iPXE boot.
iPXE has an annoying race condition where it sometimes doesn't notice the ProxyDHCP response when booting, and fails. So we embed a boot script in the built-in iPXE binaries that implements the retry loop recommended in the documentation. Empirically, this has resolved flaky boots on my test machine; usually no more than a single retry is needed.
parent b5e956b9fc
commit ff7a0b56c6
56
pixiecore/boot.ipxe
Normal file
@@ -0,0 +1,56 @@
+#!ipxe
+#
+# This is the iPXE boot script that we embed into the iPXE binary.
+#
+# The entire reason for the existence of this script is that iPXE very
+# eagerly configures DHCP as soon as it gets a DHCP response, and
+# because of this it might miss the ProxyDHCP response that tells it
+# how to boot. In this situation, `autoboot` (the default command)
+# just fails and falls out of the PXE boot codepath, so we end up with
+# machines that sometimes fail to "catch" the network boot.
+#
+# This script implements what the ipxe documentation recommends, which
+# is to just retry the `dhcp` command a bunch until ipxe does see a
+# ProxyDHCP response. It's quite ugly, and a proper fix should really
+# get upstreamed to ipxe, but for right now, this works.
+
+set attempts:int32 10
+set x:int32 0
+
+# Try to get a filename from ProxyDHCP, retrying a couple of times if
+# we fail.
+:loop
+dhcp && isset ${filename} || goto retry
+goto boot
+:retry
+iseq ${x} ${attempts} && goto fail ||
+inc x
+echo No ProxyDHCP response, retrying (attempt ${x}/${attempts})
+goto loop
+
+# Got a filename from ProxyDHCP, that's the actual boot script,
+# off we go!
+:boot
+chain ${filename}
+
+# Failure at this point probably means Pixiecore changed its mind
+# about whether this machine should be booted in the middle of the
+# boot cycle, so we had already handed off to iPXE, but now we're
+# no longer serving a boot script for it.
+#
+# Reboot the machine to restart the whole cycle (and presumably skip
+# PXE completely this time).
+#
+# It's also possible we just got horribly unlucky and the network
+# environment is such that we're consistently missing the ProxyDHCP
+# reply. That really sucks, so give people pointers to bug filing
+# here.
+:fail
+echo Failed to get a ProxyDHCP response after ${attempts} attempts
+echo
+echo If you are sure that Pixiecore is still trying to boot this machine,
+echo please file a bug at https://github.com/google/netboot .
+echo
+echo Rebooting in 5 seconds...
+sleep 5
+reboot
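The control flow of the retry loop in boot.ipxe can be easier to follow outside of iPXE's label-and-goto syntax. The following Go program is only an illustrative model of that flow, not code from this commit: `simulateBoot` and its `dhcpOK` callback are hypothetical names, and `dhcpOK` stands in for the script's `dhcp && isset ${filename}` test. It does not model a failure of `chain`.

```go
package main

import "fmt"

// simulateBoot models the control flow of the boot.ipxe retry loop.
// dhcpOK is a hypothetical stand-in for `dhcp && isset ${filename}`:
// it reports whether the given DHCP call (1-based) produced a filename.
// It returns how many DHCP calls were made and whether the script
// would reach :boot (true) or :fail (false).
func simulateBoot(attempts int, dhcpOK func(call int) bool) (calls int, booted bool) {
	x := 0 // mirrors `set x:int32 0`
	for {
		calls++
		if dhcpOK(calls) { // dhcp && isset ${filename}
			return calls, true // goto boot
		}
		// :retry
		if x == attempts { // iseq ${x} ${attempts} && goto fail
			return calls, false
		}
		x++ // inc x
		fmt.Printf("No ProxyDHCP response, retrying (attempt %d/%d)\n", x, attempts)
	}
}

func main() {
	// The ProxyDHCP response is seen on the second call: one retry, then boot.
	calls, booted := simulateBoot(10, func(call int) bool { return call == 2 })
	fmt.Println(calls, booted)

	// The ProxyDHCP response is never seen: the script gives up and reboots.
	calls, booted = simulateBoot(10, func(int) bool { return false })
	fmt.Println(calls, booted)
}
```

Under this model, a ProxyDHCP response that never arrives leads to 11 `dhcp` calls (the initial one plus 10 retries) before the script reaches `:fail`, because the counter is compared with `attempts` before it is incremented.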
4
third_party/Makefile
vendored
@@ -5,7 +5,9 @@ ipxe:
 	git clone git://git.ipxe.org/ipxe.git
 	(cd ipxe && git rev-parse HEAD >COMMIT-ID)
 	rm -rf ipxe/.git
-	(cd ipxe/src && make bin/undionly.kpxe bin-x86_64-efi/ipxe.efi bin-i386-efi/ipxe.efi)
+	(cd ipxe/src && make bin/undionly.kpxe EMBED=../../../pixiecore/boot.ipxe)
+	(cd ipxe/src && make bin-x86_64-efi/ipxe.efi EMBED=../../../pixiecore/boot.ipxe)
+	(cd ipxe/src && make bin-i386-efi/ipxe.efi EMBED=../../../pixiecore/boot.ipxe)
 	(cd ipxe && rm -rf bin && mkdir bin)
 	mv -f ipxe/src/bin/undionly.kpxe ipxe/bin/undionly.kpxe
 	mv -f ipxe/src/bin-x86_64-efi/ipxe.efi ipxe/bin/ipxe-x86_64.efi
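Because boot.ipxe is compiled into each binary through `EMBED`, a mistake in the script only shows up when a machine actually boots. As a sketch of how that could be caught earlier, the Go program below runs two cheap checks on a script: that its first line is `#!ipxe`, and that every `goto` names a label that exists. `checkScript` is a hypothetical helper, not part of netboot or iPXE, and it is deliberately naive: for example, it would also flag the word `goto` inside an `echo` line.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// checkScript is a hypothetical pre-build lint for an iPXE script.
// It reports a missing `#!ipxe` first line and any `goto` whose
// target label is never defined in the script.
func checkScript(src string) []string {
	var problems []string
	labels := map[string]bool{}
	var targets []string

	sc := bufio.NewScanner(strings.NewReader(src))
	first := true
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if first {
			first = false
			if line != "#!ipxe" {
				problems = append(problems, "first line is not #!ipxe")
			}
		}
		if line == "" || strings.HasPrefix(line, "#") {
			continue // blank line or comment
		}
		if strings.HasPrefix(line, ":") {
			labels[strings.TrimPrefix(line, ":")] = true
			continue
		}
		// Record the word following each `goto` as a jump target.
		fields := strings.Fields(line)
		for i, f := range fields {
			if f == "goto" && i+1 < len(fields) {
				targets = append(targets, fields[i+1])
			}
		}
	}
	if first {
		problems = append(problems, "script is empty")
	}
	for _, t := range targets {
		if !labels[t] {
			problems = append(problems, "goto to undefined label: "+t)
		}
	}
	return problems
}

func main() {
	// A shortened script with the same label structure as boot.ipxe.
	script := "#!ipxe\n:loop\ndhcp && isset ${filename} || goto retry\ngoto boot\n" +
		":retry\ngoto loop\n:boot\nchain ${filename}\n"
	problems := checkScript(script)
	fmt.Println(len(problems), "problem(s) found")
	for _, p := range problems {
		fmt.Println(" -", p)
	}
}
```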
6
third_party/ipxe/ipxe-bin.go
vendored
File diff suppressed because one or more lines are too long