By default, builds outside of Drone work the same as before: only the amd64
version is built, images are loaded back into dockerd, etc.
If multiple platforms are given, multi-arch images are built; these can't
be exported to docker or to a `.tar` image, so they're always pushed to the
registry (even for PR builds, to our internal CI registry).
File artifacts (initramfs, kernel) now carry an `-arch` suffix:
`vmlinuz-amd64`, `initramfs-amd64.xz`. A "magic" script normalizes output
paths depending on whether a single platform or multiple platforms were
given.
VM provisioners accept the magic `${ARCH}` string in initramfs/kernel paths,
which gets replaced with the cluster architecture, as in the sketch below.
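A minimal sketch of the substitution (the function name is hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// expandArch replaces the magic ${ARCH} token in an artifact path
// with the cluster architecture.
func expandArch(path, arch string) string {
	return strings.ReplaceAll(path, "${ARCH}", arch)
}

func main() {
	// prints "_out/vmlinuz-amd64" for an amd64 cluster
	fmt.Println(expandArch("_out/vmlinuz-${ARCH}", "amd64"))
}
```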
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Add a Kubernetes upgrade as part of provisioning (upgrade tests): the K8s
control plane is upgraded first, then Talos is upgraded (with the kubelet),
and the e2e test is run last.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Add sonobuoy runner code with log fetching on failure. Use a hand-picked
set of e2e tests to run: verify basic pod functionality and verify service
connectivity.
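A rough sketch of how such a runner could shell out to the sonobuoy CLI;
the focus pattern and the overall invocation are illustrative assumptions,
not the code from this change:

```go
package main

import (
	"log"
	"os/exec"
)

func main() {
	// Run only a hand-picked subset of e2e tests via a ginkgo focus
	// pattern; the pattern below is illustrative.
	cmd := exec.Command("sonobuoy", "run",
		"--e2e-focus", "Pods should be submitted and removed|Services should serve a basic endpoint",
		"--wait",
	)

	if out, err := cmd.CombinedOutput(); err != nil {
		// On failure the runner would fetch logs here
		// (e.g. via `sonobuoy retrieve`).
		log.Fatalf("sonobuoy failed: %v\n%s", err, out)
	}
}
```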
Add a `--run-e2e` option to `talosctl health` to run a quick e2e test that
verifies cluster health.
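For example:

```
talosctl health --run-e2e
```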
Add an option to run provision tests with a custom CNI, and run one track
of the provision tests with Cilium.
Bump Cilium to 1.8.2.
Talos 0.6 won't uncordon a node automatically after an upgrade from 0.5, as
0.5 doesn't set the annotation. Work around that in the upgrade tests.
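A hedged sketch of such a workaround using client-go; the package and
function names are hypothetical, and the real test logic may differ:

```go
package upgradetest

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// forceUncordon clears the cordon that Talos 0.5 left behind without
// the annotation Talos 0.6 checks for.
func forceUncordon(ctx context.Context, c kubernetes.Interface, name string) error {
	node, err := c.CoreV1().Nodes().Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return err
	}

	if !node.Spec.Unschedulable {
		return nil // already schedulable, nothing to do
	}

	node.Spec.Unschedulable = false
	_, err = c.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})

	return err
}
```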
Bump upgrade test version to 0.6.0 release.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This moves to using GRUB instead of syslinux.
BREAKING CHANGE: Single node upgrades will fail with this change. It will
also break the A/B fallback setup, since this version introduces an
entirely new partition scheme that any fallback will not know about.
We plan on addressing these issues in a follow-up change.
Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
This moves `pkg/config`, `pkg/client`, and `pkg/constants`
under the `pkg/machinery` umbrella.
`pkg/machinery` is published as a Go module inside the Talos repository.
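For consumers, imports would look roughly like this (blank imports only to
show the paths; the module path is assumed from the repository layout):

```go
package main

import (
	_ "github.com/talos-systems/talos/pkg/machinery/client"
	_ "github.com/talos-systems/talos/pkg/machinery/config"
	_ "github.com/talos-systems/talos/pkg/machinery/constants"
)

func main() {}
```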
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This change is only moving packages and updating import paths.
Goal: expose `internal/pkg/provision` as `pkg/provision` to enable other
projects to import the Talos provisioning library.
As cluster checks are almost always required as part of the provisioning
process, the package `internal/pkg/cluster` was also made public as
`pkg/cluster`.
The remaining changes update direct dependencies discovered by `importvet`.
Public packages (useful, general-purpose packages with a stable API):
* `internal/pkg/conditions` -> `pkg/conditions`
* `internal/pkg/tail` -> `pkg/tail`
Private packages (used only by the provisioning library internally):
* `internal/pkg/inmemhttp` -> `pkg/provision/internal/inmemhttp`
* `internal/pkg/kernel/vmlinuz` -> `pkg/provision/internal/vmlinuz`
* `internal/pkg/cniutils` -> `pkg/provision/internal/cniutils`
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes #2363 #2364 #2370 #2371
Several changes packed together:
* use compressed `vmlinuz` everywhere; the firecracker provisioner
uncompresses it before first use; drop `vmlinux`
* handle reboots in the qemu launcher to support the reset API case (see
the sketch after this list); update the empty disk check to handle reset
behavior (erasing the partition table)
* make bootloader support the default in provisioners, with a flag to
disable it
* early support for the target architecture in the qemu provisioner
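A hedged sketch of reboot handling in a launcher: qemu's `-no-reboot` flag
makes a guest reboot terminate the qemu process, so the launcher can treat
"start qemu again" as a reboot. The poweroff check below is hypothetical:

```go
package launcher

import "os/exec"

// run keeps relaunching qemu until the VM powers off for good. With
// -no-reboot, a guest-initiated reboot (or reset) terminates the qemu
// process, so handling it means starting qemu again.
func run(args []string, poweredOff func() bool) error {
	for {
		cmd := exec.Command("qemu-system-x86_64",
			append([]string{"-no-reboot"}, args...)...)

		if err := cmd.Run(); err != nil {
			return err
		}

		if poweredOff() { // hypothetical check for a clean poweroff
			return nil
		}
	}
}
```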
This should allow us to use `qemu` in CI/CD (not included in this PR): the
integration test passes with qemu.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This makes `pkg/config` directly importable from other projects.
There should be no functional changes.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This bumps the next version to the latest 0.6 alpha and the latest 0.5.
It also enables the single node preserve test.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Handling of multiple endpoints was already implemented in #2094.
This PR enables the round-robin policy so that gRPC picks a new endpoint
for each call (instead of sending every request to the first control plane
node).
The endpoint list is randomized to handle cases when only one request is
going to be sent, so that it doesn't always go to the first node in the
list.
gRPC handles dead/unresponsive nodes automatically for us.
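A minimal sketch of the client-side pattern with grpc-go (a manual resolver
feeding the `round_robin` balancing policy); this illustrates the approach
rather than the actual Talos client, and uses insecure credentials where
the real client uses TLS:

```go
package client

import (
	"math/rand"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/resolver"
	"google.golang.org/grpc/resolver/manual"
)

// dialRoundRobin connects to a shuffled list of control plane endpoints
// and lets gRPC rotate calls across them.
func dialRoundRobin(endpoints []string) (*grpc.ClientConn, error) {
	// Shuffle so that a single call doesn't always hit the first node.
	rand.Shuffle(len(endpoints), func(i, j int) {
		endpoints[i], endpoints[j] = endpoints[j], endpoints[i]
	})

	addrs := make([]resolver.Address, len(endpoints))
	for i, ep := range endpoints {
		addrs[i] = resolver.Address{Addr: ep}
	}

	// A manual resolver hands the whole endpoint list to the balancer.
	r := manual.NewBuilderWithScheme("talos")
	r.InitialState(resolver.State{Addresses: addrs})

	return grpc.Dial(
		r.Scheme()+":///cluster",
		grpc.WithResolvers(r),
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
		grpc.WithTransportCredentials(insecure.NewCredentials()), // real client uses TLS
	)
}
```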
`talosctl cluster create` and the provision tests switched to using the
client-side load balancer for the Talos API.
Additional improvements we got:
* `talosctl` now reports the correct node IP when using commands without
`-n`, not the load balancer IP (if using multiple endpoints, of course)
* a load balancer can't provide reliable error handling when an upstream
server is unresponsive or there are no upstreams available; gRPC returns
much more helpful errors
Fixes #1641
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This is a rewrite of machined. It addresses some of the limitations and
complexity in the implementation. This introduces the idea of a
controller. A controller is responsible for managing the runtime, the
sequencer, and a new state type introduced in this PR.
A few highlights are:
- no more event bus
- functional approach to tasks (no more types defined for each task)
- the task function definition now offers a lot more context, like
access to raw API requests, the current sequence, a logger, the new
state interface, and the runtime interface.
- no more panics to handle reboots
- additional initialize and reboot sequences
- graceful gRPC server shutdown on critical errors
- config is now stored at install time to avoid having to download it
again at boot time
- upgrades now use the local config instead of downloading it
- the upgrade API's preserve option takes precedence over the config's
install force option
Additionally, this pulls various packages in under machined to make the
code easier to navigate.
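A hedged sketch of the controller/task shape described above; every name
here is hypothetical rather than the actual machined API:

```go
package machined

import (
	"context"
	"log"
)

// Runtime and Sequencer stand in for the real interfaces.
type Runtime interface{}
type Sequencer interface{}
type Sequence int

// TaskFunc is the functional approach to tasks: one shared signature
// carrying context (logger, runtime, ...) instead of a type per task.
type TaskFunc func(ctx context.Context, logger *log.Logger, r Runtime) error

// Controller manages the runtime, the sequencer, and the state.
type Controller interface {
	Runtime() Runtime
	Sequencer() Sequencer
	Run(seq Sequence, tasks ...TaskFunc) error
}
```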
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This extracts the health and crashdump features, which were specific to the
provisioning code, into a separate package that can be used standalone.
Everything else is just new glue.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This updates the upgrade tests to run two flows with 3+1 clusters:
1. 0.3 -> current (testing upgrade with partition wiping)
2. 0.4-alpha.7 -> current (testing upgrade without partition wiping,
boot-a/boot-b)
Plus a small upgrade with preserve enabled for a single-node cluster.
Provision tests are now split into two parallel tracks in Drone.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This is a rename of the osctl binary. We decided that talosctl is a
better name for the Talos CLI. This does not break any APIs, but it does
make older documentation accurate only for previous versions of Talos.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This PR introduces a new strategy for upgrades. Instead of attempting to
zap the partition table, create a new one, and then format the
partitions, this change only updates the `vmlinuz` and `initramfs.xz`
used to boot. It introduces an A/B style upgrade process, which will
allow for easy rollbacks. One deviation from our
original intention with upgrades is that this change does not completely
reset a node. It falls just short of that and does not reset the
partition table. This forces us to keep the current partition scheme in
mind as we make changes in the future, because an upgrade assumes a
specific partition scheme. We can improve upgrades further in the
future, but this will at least make them more dependable. Finally, one
more feature in this PR is the ability to keep state. This enables
single node clusters to upgrade since we keep the etcd data around.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This keeps backwards compatibility with the `osctl` CLI binary, with the
exception of `osctl config generate`, which was renamed to `osctl
gen config` to avoid confusion with the other `osctl config`
commands, which operate on the client config, not the Talos server config.
Command implementations and helpers were split into subpackages for
cleaner code and more visible boundaries. The resulting binary still
combines commands from both sections (see the sketch below).
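A minimal sketch of combining commands from split subpackages into a single
cobra binary; the command grouping shown is hypothetical:

```go
package main

import (
	"os"

	"github.com/spf13/cobra"
)

// In the real split these would come from separate subpackages, e.g.
// one for client-config commands and one for Talos API commands.
var mgmtCmds = []*cobra.Command{{Use: "gen"}}
var talosCmds = []*cobra.Command{{Use: "kubeconfig"}}

func main() {
	rootCmd := &cobra.Command{Use: "osctl"}

	// Both sections still end up in the one binary.
	rootCmd.AddCommand(mgmtCmds...)
	rootCmd.AddCommand(talosCmds...)

	if err := rootCmd.Execute(); err != nil {
		os.Exit(1)
	}
}
```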
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This extracts admin kubeconfig generation out of bootkube; it is now based
on the Talos x509 library. On each API request for `kubeconfig`, the config
is generated on the fly and sent back on the wire.
This fixes two issues:
* any master node can now generate a `kubeconfig` (worker nodes can do
that too, but that should probably change in the future)
* after the upgrade-and-wipe-the-disk scenario, `osctl kubeconfig` still
works
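A hedged sketch of on-the-fly generation using client-go's kubeconfig
types; the real implementation issues certificates with the Talos x509
library, and the names here are illustrative:

```go
package kubeconfig

import (
	clientcmd "k8s.io/client-go/tools/clientcmd"
	clientcmdapi "k8s.io/client-go/tools/clientcmd/api"
)

// Generate builds an admin kubeconfig from freshly issued credentials
// and serializes it for the wire.
func Generate(cluster, endpoint string, ca, crt, key []byte) ([]byte, error) {
	cfg := clientcmdapi.Config{
		Clusters: map[string]*clientcmdapi.Cluster{
			cluster: {Server: endpoint, CertificateAuthorityData: ca},
		},
		AuthInfos: map[string]*clientcmdapi.AuthInfo{
			"admin": {ClientCertificateData: crt, ClientKeyData: key},
		},
		Contexts: map[string]*clientcmdapi.Context{
			"admin@" + cluster: {Cluster: cluster, AuthInfo: "admin"},
		},
		CurrentContext: "admin@" + cluster,
	}

	return clientcmd.Write(cfg)
}
```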
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
With fix #1904, it's now possible to upgrade 0.4.x with
`machine.File` extra files (caused by the registry mirror for
registry.ci.svc).
Bump resources for the upgrade tests in an attempt to speed them up.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This class of tests is included/excluded by build tags (see the sketch
below), but as it is pretty different from the other integration tests, we
build it as a separate executable. Provision tests provision a cluster for
the test run, perform some actions, and verify the results (this could be
an upgrade, reset, scale up/down, etc.).
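A minimal sketch of the build-tag gating; the tag name is an assumption:

```go
// +build integration_provision

// This file is compiled only when the provision tag is set, so the
// provision tests end up in their own test executable.
package provision_test
```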
There's now a framework for implementing upgrade tests; the first of them
tests the upgrade from the latest 0.3 (0.3.2 at the moment) to the current
version of Talos (as built in CI). The test starts by booting with the 0.3
kernel/initramfs, runs the 0.3 installer to install a 0.3.2 cluster, waits
for bootstrap, and follows up with a rolling upgrade to 0.4. As Firecracker
supports a bootloader, this boots the 0.4 system from the boot disk (as
installed by the installer).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>