Due to the race, main goroutine might consume all the errors from
`errCh` and close `nodesCh`, so node goroutine might hit panic on send
to closed channel.
```
panic: send on closed channel
goroutine 40 [running]:
github.com/talos-systems/talos/internal/pkg/provision/providers/firecracker.(*provisioner).createNodes.func1(0x26ab668, 0xc00025a000, 0xc0005a83c0, 0xc00029d540, 0xc000536120, 0xc000464540, 0xc000041d80, 0x18, 0xc0006d406c, 0x4, ...)
/src/internal/pkg/provision/providers/firecracker/node.go:55 +0x1fa
created by github.com/talos-systems/talos/internal/pkg/provision/providers/firecracker.(*provisioner).createNodes
/src/internal/pkg/provision/providers/firecracker/node.go:50 +0x1ca
```
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Since the `--once` option of `extlinux` seems to only work with BIOS, we
needed to change to remove any reliance on this option. Instead of
booting the upgraded version once, and then making it the default after
a successful boot, we now make it the default, and then revert on any
boot error.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This is a rename of the osctl binary. We decided that talosctl is a
better name for the Talos CLI. This does not break any APIs, but does
make older documentation only accurate for previous versions of Talos.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This PR introduces a new strategy for upgrades. Instead of attempting to
zap the partition table, create a new one, and then format the
partitions, this change will only update the `vmlinuz`, and
`initramfs.xz` being used to boot. It introduces an A/B style upgrade
process, which will allow for easy rollbacks. One deviation from our
original intention with upgrades is that this change does not completely
reset a node. It falls just short of that and does not reset the
partition table. This forces us to keep the current partition scheme in
mind as we make changes in the future, because an upgrade assumes a
specific partition scheme. We can improve upgrades further in the
future, but this will at least make them more dependable. Finally, one
more feature in this PR is the ability to keep state. This enables
single node clusters to upgrade since we keep the etcd data around.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
Every node is reset, rebooted and it comes back up again except for the
init node due to known issues with init node boostrapping etcd cluster
from scratch when metadata is missing (as node was wiped).
Planned workaround is to prohibit resetting init node (should be coming
next).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This extracts admin kubeconfig generation out of bootkube, now based on
Talos x509 library. On each API request for `kubeconfig`, config is
generated on the fly and sent back on the wire.
This fixes two issues:
* any master node can now generate `kubeconfig` (worker nodes can do
that too, but that should probably change in the future)
* after upgrade-and-wipe the disk scenario, `osctl kubeconfig` still
works
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
With the fix#1904, it's now possible to upgrade 0.4.x with
`machine.File` extra files (caused by registry mirror for
registry.ci.svc).
Bump resources for upgrade tests in attempt to speed it up.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This makes use of the external procfs pacakge that is based on the
pacakge we are removing here.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
When inner function was added, `return nil` was not aborting launch
sequence, but rather leading to VM restart. `cluster destroy` still
worked fine, as it removes state directory and launcher exits on
failure.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Firecracker launches tries to open VM disk image before every boot,
parses partition table, finds boot partition, tries to read it as FAT32
filesystem, extracts uncompressed kernel from `bzImage` (firecracker
doesn't support `bzImage` yet), extracts initramfs and passes it to
firecracker binary.
This flow allows for extended tests, e.g. testing installer, upgrade and
downgrade tests, etc.
Bootloader emulation is disabled by default for now, can be enabled via
`--with-bootloader-emulation` flag to `osctl cluster create`.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR contains generic simple TCP loadbalancer code, and glue code for
firecracker provisioner to use this loadbalancer.
K8s control plane is passed through the load balancer, and Talos API is
passed only to the init node (for now, as some APIs, including
kubeconfig, don't work with non-init node).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Cleaning things up so that IP network can be re-used with another
network name (and inteface name).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This implements `osctl cluster destroy` for Firecracker, adds
new utility command `osctl cluser show`.
Firecracker mode now has control process for firecracker VMs, allowing
clean reboots and background operations.
Lots of small fixes to Firecracker mode, clean CNI shutdown, cleaning up
netns, etc.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This is initial PR to push the initial code, it has several known
problems which are going to be addressed in follow-up PRs:
1. there's no "cluster destroy", so the only way to stop the VMs is to
`pkill firecracker`
2. provisioner creates state in `/tmp` and never deletes it, that is
required to keep cluster running when `osctl cluster create` finishes
3. doesn't run any controller process around firecracker to support
reboots/CNI cleanup (vethxyz interfaces are lingering on the host as
they're never cleaned up)
The plan is to create some structure in `~/.talos` to manage cluster
state, e.g. `~/.talos/clusters/<name>` which will contain all the
required files (disk images, file sockets, VM logs, etc.). This
directory structure will also work as a way to detect running clusters
and clean them up.
For point number 3, `osctl cluster create` is going to exec lightweight
process to control the firecracker VM process and to simulate VM reboots
if firecracker finishes cleanly (when VM reboots).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>