Kubeconfig is merged into `~/.kube/config` with a rename option
(existing configuration is never overwritten).
If an endpoint was used, it is automatically put into the `kubeconfig`.
This should make the OS X experience literally `talosctl cluster create`
followed by any `kubectl get ...`.
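A minimal sketch of rename-on-conflict merging follows. It assumes contexts are modeled as a plain map from name to API endpoint. A real kubeconfig has separate clusters, users and contexts sections. `mergeWithRename` and its `-N` suffix scheme are hypothetical and are not the talosctl implementation.

```go
package main

import "fmt"

// mergeWithRename copies every entry of src into dst. An existing entry in dst
// is never overwritten: a conflicting name gets a numeric suffix instead.
// It returns the name each src entry ended up under.
func mergeWithRename(dst, src map[string]string) map[string]string {
	renames := make(map[string]string, len(src))

	for name, value := range src {
		newName := name
		for i := 1; ; i++ {
			if _, exists := dst[newName]; !exists {
				break
			}
			newName = fmt.Sprintf("%s-%d", name, i)
		}

		dst[newName] = value
		renames[name] = newName
	}

	return renames
}

func main() {
	existing := map[string]string{"admin@talos": "https://10.5.0.2:6443"}
	generated := map[string]string{"admin@talos": "https://10.5.0.3:6443"}

	renames := mergeWithRename(existing, generated)
	fmt.Println(renames["admin@talos"]) // admin@talos-1
	fmt.Println(len(existing))          // 2
}
```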
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Library `blockdevice` was extracted as `talos-systems/go-blockdevice`;
this PR finalizes the move by removing the Talos copy of it.
Some functions around `mkfs`/`growfs` were extracted into the `makefs`
package, as they depend on the `cmd` package.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Kubeconfig merge was completely rewritten to be "smarter":
* automatically apply renames done at previous stages to avoid asking
over and over again (in general it should ask just once)
* skip checks if parts of the config match exactly
* allow overwrite as an option
* flexible way to control the output
* activate the context at the end
* custom merged context name
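The "ask just once" and "skip exact matches" points can be sketched as follows. This is a sketch under assumptions. `Context` is a simplified stand-in for a kubeconfig context. `merger` and `ask` are hypothetical names, and `ask` plays the role of prompting the user.

```go
package main

import "fmt"

// Context is a simplified, hypothetical stand-in for a kubeconfig context.
type Context struct {
	Cluster string
	User    string
}

// merger remembers the renames chosen earlier, so each conflict is asked about once.
type merger struct {
	renames map[string]string
}

// merge adds ctx to dst under name and returns the name actually used.
// Previously chosen renames are re-applied, exact matches are skipped, and
// only a genuine conflict calls ask (which stands in for prompting the user).
func (m *merger) merge(dst map[string]Context, name string, ctx Context, ask func(string) string) string {
	target := name
	if renamed, ok := m.renames[name]; ok {
		target = renamed // reuse the answer given at a previous stage
	}

	if existing, found := dst[target]; found {
		if existing == ctx {
			return target // identical entry: nothing to merge, nothing to ask
		}

		target = ask(target)
		m.renames[name] = target
	}

	dst[target] = ctx

	return target
}

func main() {
	dst := map[string]Context{"admin@talos": {Cluster: "old", User: "admin"}}
	m := &merger{renames: map[string]string{}}

	questions := 0
	ask := func(name string) string {
		questions++
		return name + "-new"
	}

	ctx := Context{Cluster: "new", User: "admin"}
	fmt.Println(m.merge(dst, "admin@talos", ctx, ask)) // admin@talos-new
	fmt.Println(m.merge(dst, "admin@talos", ctx, ask)) // admin@talos-new
	fmt.Println(questions)                             // 1
}
```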
Fixes #2578
Fixes #2587
Fixes #2577
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Adds the ability to apply (replace) an existing node configuration with
a new one via the Machine API.
Fixes #2345
Signed-off-by: Seán C McCord <ulexus@gmail.com>
This bug is sometimes reproducible with QEMU/arm64, as it runs really
slowly. It looks like multiple concurrent image unpacks sharing some
layers might fail unexpectedly.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This uses a go-retry feature
(https://github.com/talos-systems/go-retry/pull/3) to print errors being
retried.
If the image is not found in the index, abort retries immediately.
Don't pull the installer image twice (if it was already pulled by the
validation code before).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This moves to using grub instead of syslinux.
BREAKING CHANGE: Single node upgrades will fail with this change. This
will also break the A/B fallback setup, since this version introduces
an entirely new partition scheme that no fallback will know about.
We plan on addressing these issues in a follow-up change.
Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
In order to perform upgrades the way we would like, it is important that
we avoid any bind mounts into containers. This change ensures that all
system services get their config via stdin.
Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
This moves `pkg/config`, `pkg/client` and `pkg/constants`
under the `pkg/machinery` umbrella.
`pkg/machinery` is published as a Go module inside the Talos repository.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Package `pkg/crypto` was extracted as `github.com/talos-systems/crypto`
repository and Go module.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This change is only moving packages and updating import paths.
Goal: expose `internal/pkg/provision` as `pkg/provision` to enable other
projects to import Talos provisioning library.
As cluster checks are almost always required as part of provisioning
process, package `internal/pkg/cluster` was also made public as
`pkg/cluster`.
The remaining changes update direct dependencies discovered by
`importvet`.
Public packages (useful, general-purpose packages with a stable API):
* `internal/pkg/conditions` -> `pkg/conditions`
* `internal/pkg/tail` -> `pkg/tail`
Private packages (used only internally by the provisioning library):
* `internal/pkg/inmemhttp` -> `pkg/provision/internal/inmemhttp`
* `internal/pkg/kernel/vmlinuz` -> `pkg/provision/internal/vmlinuz`
* `internal/pkg/cniutils` -> `pkg/provision/internal/cniutils`
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This integrates [importvet](https://github.com/talos-systems/importvet)
into `lint` target.
The first rule file was added for public packages `pkg/`, which shouldn't
depend on other parts of the Talos tree (except for the API definitions).
The only code change: `internal/cis` was moved under its single user,
`pkg/config/internal/cis`, to satisfy the rules.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This makes the config provider a pure interface definition by replacing
all concrete internal types with interfaces.
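Here is an illustration of a provider that is a pure interface definition. Consumers depend only on method sets, and the concrete types stay private. All names below (`Provider`, `MachineConfig`, `ClusterConfig`, `describe`) are hypothetical and heavily reduced. They are not the actual `pkg/config` API.

```go
package main

import "fmt"

// MachineConfig and ClusterConfig are hypothetical, reduced interfaces.
type MachineConfig interface {
	Type() string
}

type ClusterConfig interface {
	Endpoint() string
}

// Provider exposes configuration only through interfaces.
type Provider interface {
	Machine() MachineConfig
	Cluster() ClusterConfig
}

// Concrete types stay private to the package that decodes the config.
type machine struct{ typ string }

func (m machine) Type() string { return m.typ }

type cluster struct{ endpoint string }

func (c cluster) Endpoint() string { return c.endpoint }

type config struct {
	machine machine
	cluster cluster
}

func (c *config) Machine() MachineConfig { return c.machine }
func (c *config) Cluster() ClusterConfig { return c.cluster }

// describe works with any Provider implementation, including test fakes.
func describe(p Provider) string {
	return fmt.Sprintf("%s node, control plane at %s", p.Machine().Type(), p.Cluster().Endpoint())
}

func main() {
	var p Provider = &config{machine{"worker"}, cluster{"https://10.5.0.2:6443"}}
	fmt.Println(describe(p))
}
```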
Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
This includes better machine args, support for the UEFI parallel flash
images required as the low-level bootloader, and miscellaneous cleanups.
QEMU is now configured to map the host random source into the guest as an
entropy source, which prevents boot stalls while waiting for entropy.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes #2363 #2364 #2370 #2371
Several changes packed together:
* use compressed `vmlinuz` everywhere (the firecracker provisioner
uncompresses it before first use), drop `vmlinux`
* handle reboots in the qemu launcher to support the reset API case, update
the empty disk check to handle reset behavior (erasing the partition table)
* make bootloader support the default in provisioners, with a flag to
disable it
* early support for the target architecture in the qemu provisioner
This should allow us to use `qemu` in CI/CD (not included in this PR):
the integration test passes with qemu.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This makes `pkg/config` directly importable from other projects.
There should be no functional changes.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Starts and stops qemu VMs and supports an initial subset of configuration.
Sets up networking through CNI tools and sets up a DHCP server which gives
IP addresses to nodes.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
We're not using a load balancer for `apid` (we always use client-side load
balancing), so we can remove this safely.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Talos depends on accurate time for many actions, so many services depend
on a successful timed health check. If timed fails to do the initial
sync, it enters a rather long wait loop before the next attempt, which
might not come in time for the boot timeout. Instead, fail the timed
service if the initial sync fails and rely on the service restart for
another attempt.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This ensures that the generated kubeconfig has a namespace. This fixes
an edge case when a user attempts to use the kubeconfig from within a
pod of a different kubernetes cluster. If the kubeconfig does not have a
namespace, kubectl will use the "in cluster namespace" which is
unexpected, especially if the "in cluster namespace" does not exist in
the target cluster.
Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
Due to the race between `Read()` and context cancellation, an error might
be returned, which we can safely ignore.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Second part of refactoring to split common logic for VM provisioners
from Firecracker provisioner.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
Firecracker never executes the bootloader, so kernel args passed to the
installer aren't used, but if the same disk image is used to boot Talos
elsewhere (e.g. in `qemu`), it fails, for example, to set up the console
properly.
This PR simply provides those kernel args to the installer so that
they're persisted in the image.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Created a base provisioner struct for all VM-based provisioners.
Moved `state.go` and `reflect.go` to the common module.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
Fixes #2272
`gofumpt` is now included in `golangci-lint`, but `gofumports` is not,
so we keep using it as a separate binary, keeping its version in sync
with `golangci-lint`.
This contains fixes from:
* `gofumpt` (automated, mostly around octal constants)
* `exhaustive` in `switch` statements
* `noctx` (adding context with default timeout to http requests)
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This implements existing server-side health checks as defined in
`internal/pkg/cluster/checks` in Talos API.
Summary of changes:
* new `cluster` API
* `apid` now listens without auth on a local file socket
* `cluster` API is for now implemented in `machined`, but we can move it
to a new service if we find it more appropriate
* `talosctl health` by default now does server-side health check
UX: `talosctl health` without arguments checks the health of the
cluster, using Kubernetes (if it is healthy) to discover master/worker
nodes. If needed, the node list can be overridden with flags.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Talos will mark a node as schedulable if it was previously cordoned by
Talos (for upgrade, reset, etc.).
If the user marked the node as not schedulable, Talos won't change it on
boot.
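The rule can be sketched as follows: Talos records that it cordoned the node and reverts only its own cordon on boot. The `Node` struct and the annotation key `example.talos.dev/cordoned` are hypothetical. They are not the actual Kubernetes types or the marker Talos uses.

```go
package main

import "fmt"

// cordonedByTalosAnnotation is a hypothetical marker name used for illustration.
const cordonedByTalosAnnotation = "example.talos.dev/cordoned"

// Node is a reduced, hypothetical view of a Kubernetes node.
type Node struct {
	Unschedulable bool
	Annotations   map[string]string
}

// cordon marks the node unschedulable and records that Talos did it,
// unless the node is already cordoned (e.g. by the user).
func cordon(n *Node) {
	if n.Unschedulable {
		return
	}

	if n.Annotations == nil {
		n.Annotations = map[string]string{}
	}

	n.Unschedulable = true
	n.Annotations[cordonedByTalosAnnotation] = "true"
}

// uncordonOnBoot reverts only a cordon that Talos itself applied.
func uncordonOnBoot(n *Node) {
	if _, ok := n.Annotations[cordonedByTalosAnnotation]; !ok {
		return // cordoned by the user (or not cordoned at all): leave as is
	}

	n.Unschedulable = false
	delete(n.Annotations, cordonedByTalosAnnotation)
}

func main() {
	upgraded := &Node{}
	cordon(upgraded) // e.g. before an upgrade
	uncordonOnBoot(upgraded)

	manual := &Node{Unschedulable: true}
	cordon(manual)
	uncordonOnBoot(manual)

	fmt.Println(upgraded.Unschedulable, manual.Unschedulable) // false true
}
```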
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Handling of multiple endpoints has already been implemented in #2094.
This PR enables the round-robin policy so that gRPC picks a new endpoint
for each call (instead of sending each request to the first control
plane node).
The endpoint list is randomized to handle cases when only one request is
going to be sent, so that it doesn't always go to the first node in the
list.
gRPC handles dead/unresponsive nodes automatically for us.
`talosctl cluster create` and provision tests were switched to use the
client-side load balancer for the Talos API.
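The change itself enables gRPC's `round_robin` policy. The stdlib-only sketch below imitates the behavior described above: shuffle the endpoint list once, then hand out endpoints in rotation. `roundRobin` is a hypothetical type and is not gRPC balancer code. It requires at least one endpoint. On Go 1.20 and later the global `math/rand` source is seeded automatically, so the shuffle differs between runs.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync/atomic"
)

// roundRobin hands out endpoints in rotation, starting from a shuffled list.
type roundRobin struct {
	next      uint64 // first field: keeps 64-bit atomic access aligned on 32-bit platforms
	endpoints []string
}

func newRoundRobin(endpoints []string) *roundRobin {
	// Shuffle a copy so a single request doesn't always hit the first node.
	shuffled := append([]string(nil), endpoints...)
	rand.Shuffle(len(shuffled), func(i, j int) {
		shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
	})

	return &roundRobin{endpoints: shuffled}
}

// pick returns the next endpoint; it is safe for concurrent use.
func (r *roundRobin) pick() string {
	n := atomic.AddUint64(&r.next, 1) - 1

	return r.endpoints[n%uint64(len(r.endpoints))]
}

func main() {
	rr := newRoundRobin([]string{"10.5.0.2", "10.5.0.3", "10.5.0.4"})

	for i := 0; i < 6; i++ {
		fmt.Println(rr.pick()) // each endpoint appears twice, in a random rotation
	}
}
```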
Additional improvements we got:
* `talosctl` now reports the correct node IP when using commands without
`-n`, not the load balancer IP (if using multiple endpoints, of course)
* the load balancer can't reliably handle errors when the upstream
server is unresponsive or there are no upstreams available, while gRPC
returns much more helpful errors
Fixes#1641
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Include kube-apiserver in the list of daemon sets to be checked, and
for each daemon set verify the number of pods running and ready, because
when the control plane is damaged, daemon set properties are not updated
properly.
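The double check can be sketched as follows: the status fields are not trusted alone, and the pods running and ready are counted too. `DaemonSetStatus` mirrors two real fields of the Kubernetes `apps/v1` DaemonSet status (`DesiredNumberScheduled`, `NumberReady`). `Pod` and `checkDaemonSet` are hypothetical reductions, not the Talos check code.

```go
package main

import "fmt"

// DaemonSetStatus mirrors two fields of the Kubernetes DaemonSet status.
type DaemonSetStatus struct {
	DesiredNumberScheduled int32
	NumberReady            int32
}

// Pod is a reduced, hypothetical view of a pod owned by the daemon set.
type Pod struct {
	Running bool
	Ready   bool
}

// checkDaemonSet checks the status and also counts the pods themselves,
// since the status may be stale when the control plane is damaged.
func checkDaemonSet(name string, status DaemonSetStatus, pods []Pod) error {
	if status.NumberReady != status.DesiredNumberScheduled {
		return fmt.Errorf("%s: %d/%d ready (status)", name, status.NumberReady, status.DesiredNumberScheduled)
	}

	var ready int32

	for _, p := range pods {
		if p.Running && p.Ready {
			ready++
		}
	}

	if ready != status.DesiredNumberScheduled {
		return fmt.Errorf("%s: %d/%d pods running and ready", name, ready, status.DesiredNumberScheduled)
	}

	return nil
}

func main() {
	// the status claims 3/3, but one kube-apiserver pod is not ready
	status := DaemonSetStatus{DesiredNumberScheduled: 3, NumberReady: 3}
	pods := []Pod{{true, true}, {true, true}, {true, false}}

	fmt.Println(checkDaemonSet("kube-apiserver", status, pods))
	// kube-apiserver: 2/3 pods running and ready
}
```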
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
There's a global timeout for all services to be up: it's 5 minutes. We
need to make sure each service startup takes less than that, otherwise
boot sequence is aborted and there's no way to see the error message for
each particular service.
Also propagate contexts correctly and set some default timeouts to make
sure API operations don't hang forever.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
I had to copy over some OCI code from a newer package version, but as we
have been using the newer OCI package for a long time, we don't need the
copy anymore.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes #2243
These tests rely on some kind of sync between readers and writers: if the
circular buffer is overrun, the test no longer runs as expected.
We use a time-sensitive rate limiter to limit write speed to make sure
readers can always catch up. Lowering the rate should slow down writers
and make tests more likely to succeed.
For #2243, the failure was from a buffer overrun: when an overrun is
detected, the `Watch` function closes the channel (and the test
"receives" a zero element).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This adds the `/system` directory to provide a dedicated
directory for all system related runtime files.
Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
This replaces logging to files (followed via inotify) with a pure
in-memory circular buffer which grows on demand, capped at a specified
maximum capacity.
The concern with the previous approach was that logs on tmpfs were growing
without any bound, potentially consuming all of the node memory.
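A minimal sketch of a circular buffer that grows on demand up to a cap follows. Storage starts small and doubles until it reaches `maxCap`. After that, the oldest bytes are overwritten. `ringBuffer`, the initial size of 16 and the doubling policy are assumptions for illustration and are not the actual Talos implementation.

```go
package main

import "fmt"

// ringBuffer keeps the most recent bytes written to it, never allocating
// more than maxCap bytes.
type ringBuffer struct {
	data   []byte
	maxCap int
	pos    int // total number of bytes ever written
}

// grow doubles the backing array (starting at 16), capped at maxCap.
func (b *ringBuffer) grow() {
	newCap := 2 * cap(b.data)
	if newCap == 0 {
		newCap = 16
	}
	if newCap > b.maxCap {
		newCap = b.maxCap
	}

	grown := make([]byte, len(b.data), newCap)
	copy(grown, b.data)
	b.data = grown
}

// Write appends p, overwriting the oldest data once the buffer is full.
func (b *ringBuffer) Write(p []byte) (int, error) {
	for _, c := range p {
		if len(b.data) < b.maxCap {
			if len(b.data) == cap(b.data) {
				b.grow() // grow on demand
			}
			b.data = append(b.data, c)
		} else {
			b.data[b.pos%b.maxCap] = c // full: overwrite the oldest byte
		}
		b.pos++
	}

	return len(p), nil
}

// Bytes returns a copy of the retained data, oldest first.
func (b *ringBuffer) Bytes() []byte {
	if b.pos <= b.maxCap {
		return append([]byte(nil), b.data...)
	}

	start := b.pos % b.maxCap // next byte to be overwritten is the oldest

	return append(append([]byte(nil), b.data[start:]...), b.data[:start]...)
}

func main() {
	b := &ringBuffer{maxCap: 8}
	fmt.Fprint(b, "hello, ")
	fmt.Fprint(b, "world")
	fmt.Printf("%q cap=%d\n", b.Bytes(), cap(b.data)) // "o, world" cap=8
}
```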
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This moves our test scripts to using the bootstrap API. Some
automation around invoking the bootstrap API was also added
to give the same ease of use when creating clusters with the
CLI.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This adds a sentinel error for a missing partition table. This error
is used to detect if a partition table already exists when setting
up user defined disks.
In addition to the fix, this removes a legacy parameter from the
`PartitionTable` method that indicated that the partition table
should be read. It is safer to just read it every time. Also, I
can't think of a case when the block device partition table is nil
and we want to read.
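The sentinel-error approach can be sketched as follows. The caller uses `errors.Is` to tell "no partition table yet" apart from real read errors, and the table is read on every call. `ErrMissingPartitionTable`, `Device` and `needsPartitioning` are illustrative names. The real sentinel's name and the `PartitionTable` signature may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrMissingPartitionTable is an illustrative sentinel error.
var ErrMissingPartitionTable = errors.New("missing partition table")

// Device is a hypothetical block device.
type Device struct {
	hasTable bool
}

// PartitionTable always reads the table (there is no "should read" parameter).
func (d *Device) PartitionTable() ([]string, error) {
	if !d.hasTable {
		return nil, fmt.Errorf("reading GPT: %w", ErrMissingPartitionTable)
	}

	return []string{"p1"}, nil
}

// needsPartitioning distinguishes an empty disk from a real read failure.
func needsPartitioning(d *Device) (bool, error) {
	_, err := d.PartitionTable()

	switch {
	case err == nil:
		return false, nil
	case errors.Is(err, ErrMissingPartitionTable):
		return true, nil
	default:
		return false, err
	}
}

func main() {
	fresh, _ := needsPartitioning(&Device{})
	existing, _ := needsPartitioning(&Device{hasTable: true})
	fmt.Println(fresh, existing) // true false
}
```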
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
Using `LoggingManager`, all the log flows (reading and writing) were
refactored. The interface of `LoggingManager` should now be generic enough
to replace log handling with almost any implementation: log rotation,
sending logs to a remote destination, keeping logs in memory, etc.
There should be no functional changes.
As part of the changes, `follow.Reader` was implemented, which makes an
appended-to file feel like a stream. `file.NewChunker` was refactored to
use `follow.Reader` and `stream.NewChunker` to do the actual work. So
now we have only a single chunker implementation, the stream chunker,
as everything is represented as a stream.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>