* drop old resources API, which was deprecated long time ago
* use bootstrapped event in `talosctl get --watch` to better align
columns in the table output
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This allows to put keys to META partition.
META contents can be viewed with `talosctl get metakeys`.
There is not real usecase for it yet, but the next PRs will introduce
two special keys which can be written:
* platform network config for `metal`
* `${code}` variable
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This allows to safely recover out of space quota issues, and perform
degragmentation as needed.
`talosctl etcd status` command provides lots of information about the
cluster health.
See docs for more details.
Fixes#4889
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
We add a controller that provides the etcd member id as a resource
and change the etcd related commands to support member ids next to
hostnames.
Fixes: #6223
Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>
There's a cyclic dependency on siderolink library which imports talos
machinery back. We will fix that after we get talos pushed under a new
name.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Talos client can connect to Talos API via a proxy with basic auth.
Additionally it is now optional to specify a TLS CA,key or crt. Optionally
Developers can build talosctl with WITH_DEBUG=1 to allow insecure
connections when http:// endpoints are specified.
Fixes#5980
Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>
Overview: deprecate existing Talos resource API, and introduce new COSI
API.
Consequences:
* COSI API can only go via one-2-one proxy (`client.WithNode`)
* client-side API access is way easier with `state.State` wrappers
* lots of small changes on the client side to use new APIs
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Add a new function EventsWatchV2 that blocks until receiving the first event then switches to non-blocking mode.
Also add new API functions to return responses of the lifecycle actions `reboot`, `reset` and `shutdown`.
Required for the client-side part of siderolabs/talos#5499.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Users can now add a port suffix to the endpoints used by talosctl. Either
in the CLI flag or the ~/.talos/config. The default port is still 50000.
Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>
When message is sent via the proxy, `metadata.error` carries only string
representation which can't be unmarshalled back into an `error` which we
can match against. A similar fix was already done for "unary" responses,
but we missed the streaming case.
This fixes a spurious failure in integration tests when calling
`talosctl pcap --duration 1s`.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This uses the `go-packet` library with native bindings for the packet
capture (without `libpcap`). This is not the most performant way, but it
allows us to avoid CGo.
There is a problem with converting network filter expressions (like
`tcp port 3222`) into BPF instructions, it's only available in C
libraries, but there's a workaround with `tcpdump`.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This is the follow-up fix to the PR #5129.
1. Correctly catch only expected errors in the tests.
2. Rewind the snapshot each time the upload is retried.
3. Correctly unwrap errors in the `EtcdRecovery` client.
4. Update the `grpc-proxy` library to pass through the EOF error.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Cordon & drain a node when the Shutdown message is received.
Also adds a '--force' option to the shutdown command in case the control
plane is unresponsive.
Signed-off-by: Tim Jones <timniverse@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes#4279
These APIs were deprecated in 0.13, now it's time to drop them for 0.14.
They were not used anywhere in Talos, so no changes on Talos side.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes#4094
Deprecate old networkd APIs, `talosctl interfaces` and `talosctl routes`
now suggest different commands to be used to achieve same task.
TUI installer was updated to stop using Interfaces API.
Those APIs will be completely removed in 0.14.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This simply uses new protobuf package instead of old one.
Old protobuf package is still in use by Talos dependencies.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
The problem was that gRPC method `status.Code(err)` doesn't unwrap
errors, while Talos client returns errors wrapped with
`multierror.Error` and `fmt.Errrorf`, so `status.Code` doesn't return
error code correctly.
Fix that by introducing our own client method which correctly goes over
the chain of wrapped errors.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
* `talosctl config new` now sets endpoints in the generated config.
* Avoid duplication of roles in metadata.
* Remove method name prefix handling. All methods should be set explicitly.
* Add tests.
Closes#3421.
Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
This change is for Theila which is going to use gRPC proxy to forward
requests from TS frontend right to the node's apid.
`gRPC` proxy operates on top of `grpc.ClientConn` objects, so getting
this connection from the clients which are already being created is the
easiest path.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
When Talos `controlplane` node is waiting for a bootstrap, `etcd`
contents can be recovered from a snapshot created with
`talosctl etcd snapshot` on a healthy cluster.
Bootstrap process goes same way as before, but the etcd data directory
is recovered from the snapshot.
This flow enables disaster recovery for the control plane: given that
periodic backups are available, destroy control plane nodes, re-create
them with the same config, and bootstrap one node with the saved
snapshot to recover etcd state at the time of the snapshot.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This adds a simple API and `talosctl etcd snapshot` command to stream
snapshot of etcd from one of the control plane nodes to the local file.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes: https://github.com/talos-systems/talos/issues/3219
We already have `etcd leave`, which makes the node exclude itself from
etcd members.
But in case if the node can't remove itself because it doesn't have
connection to etcd we need this etcd remove-member cli, which basically removes
a node from a different node.
No unit tests for that as it's going to destroy the test cluster.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
Also fix recovery grpc handler to print panic stacktrace to the log.
Any API should follow the structure compatible with apid proxying
injection of errors/nodes.
Explicitly fail GenerateConfig API on worker nodes, as it panics on
worker nodes (missing certificates in node config).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
In our client API, as the request goes through `apid` proxying, actual
error might be wrapped into the response (to support multi-node
requests), so it should always be correctly unwrapped.
This has UX issue: `talosctl apply-config` silently doesn't work (server
API fails, but no errror is shown in `talosctl`).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Our upgrades are safe by default - we check etcd health, take locks,
etc. But sometimes upgrades might be a way to recover broken (or
semi-broken) cluster, in that case we need upgrade to run even if the
checks are not passing. This is not a safe way to do upgrades, but it
might be a way to recover a cluster.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Control plane components are running as static pods managed by the
kubelets.
Whole subsystem is managed via resources/controllers from os-runtime.
Many supporting changes/refactoring to enable new code paths.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This brings in `os-runtime` package and exposes resources with first
iteration of read-only API.
Two Talos resources (and one controller) are implemented:
* legacy.Service resource tracks Talos 'service' `RUNNING` state
* config.V1Alpha1 stores current runtime config
Glue point between existing runtime and new os-runtime based runtime is
in `v1alpha2` implementation and `V1Alpha2()` sub-interfaces of existing
`Runtime`, `State`, `Controller` interfaces.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Idea is to add an option to perform "selective" reset: default reset
operation is to wipe all partitions (triggering reinstall), while spec
allows only to wipe some of the operations.
Other operations are performed exactly in the same way for any reset
flow.
Possible use case: reset only `EPHEMERAL` partition.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Regular upgrade path takes just one reboot, but it requires all the
processes to be stopped on the node before upgrade might proceed. Under
some circumstances and with potential Talos bugs it might not work
rendering Talos upgrades almost impossible.
Staged upgrades build upon regular install flow to run the upgrade on
the node reboot. Such upgrades require two reboots of the node, and it
requires two pulls of the installer image, but they should be much less
suspicious to the failure. Once the upgrade is staged, node can be
rebooted in any possible way, including hard reset and upgrade is
performed on the next boot.
New ADV format was implemented as well to allow to store install image
ref/options across reboots. New format allows for bigger values and
takes 50% of the `META` partition. Old ADV is still kept for
compatibility reasons.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Now config context is resolved only when it is about to be used, so that
client can operate in config-less mode if config is not required (e.g.
when doing `apply-config --insecure`).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This is initial commit of the installer.
What's done:
- verifying node availability before starting any operations.
- gathering information about disks on the machine.
- allows setting: install disk, hostname, machine type, installer image,
kubernetes version, dns domain, cluster-name.
- dumps/merges talosconfig to a file after applying configuration.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
Fixes: https://github.com/talos-systems/talos/issues/2766
This API is implemented in Maintenance and Machine services.
Can be used to generate configuration on the node, instead of using
talosctl to generate it locally.
To be used in interactive installer and talosctl gen config.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
Now maintenance service implements `MachineService` interface, stubbing
all not implemented methods.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
This fixes the reverse Go dependency from `pkg/machinery` to `talos`
package.
Add a check to `Dockerfile` to prevent `pkg/machinery/go.mod` getting
out of sync, this should prevent problems in the future.
Fix potential security issue in `token` authorizer to deny requests
without grpc metadata.
In provisioner, add support for launching nodes without the config
(config is not delivered to the provisioned nodes).
Breaking change in `pkg/provision`: now `NodeRequest.Type` should be set
to the node type (as config can be missing now).
In `talosctl cluster create` add a flag to skip providing config to the
nodes so that they enter maintenance mode, while the generated configs
are written down to disk (so they can be tweaked and applied easily).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes were applied automatically.
Import ordering might be questionable, but it's strict:
* stdlib
* other packages
* same package imports
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Instead of hosting a web service, we decided to implement a gRPC service
that exposes APIs that can be used in a client-side interactive installer.
Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
Return proper message back to the client in case if called method is not
supported by mode any particular node runs in.
Fixes: https://github.com/talos-systems/talos/issues/2629
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
Adds the ability to apply (replace) an existing node configuration with
a new one via the Machine API.
Fixes#2345
Signed-off-by: Seán C McCord <ulexus@gmail.com>
This moves `pkg/config`, `pkg/client` and `pkg/constants`
under `pkg/machinery` umbrella.
And `pkg/machinery` is published as Go module inside Talos repository.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>