* Replace logging.Wrap(log.Writer()) with zaptest.NewLogger(suite.T()) where possible.
* Replace reflect.DeepEqual with =|slices.Equal|bytes.Equal where possible.
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Fixes#8552
This fixes up the previous fix where `for` condition was inverted, and
also updates the idle timeout, so that the transition to idle happens
before the timeout expires.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Fixes#8552
When `apid` notices update in the PKI, it flushes its client connections
to other machines (used for proxying), as it might need to use new
client certificate.
While flushing, just calling `Close` might abort already running
connections.
So instead, try to close gracefully with a timeout when the connection
is idle.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Dynamically map Kubernetes and Talos API ports to an available port on
the host, so every cluster gets its own unique set of parts.
As part of the changes, refactor the provision library and interfaces,
dropping old weird interfaces replacing with (hopefully) much more
descriprive names.
Signed-off-by: Dmitry Sharshakov <dmitry.sharshakov@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Update Go to 1.22.2, update Go modules to resolve
[HTTP/2 issue](https://www.kb.cert.org/vuls/id/421644).
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This allows to roll all nodes to use a new CA, to refresh it, or e.g.
when the `talosconfig` was exposed accidentally.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
To be used in the `go-talos-support` module without importing the whole
Talos repo.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Fixes#8345
Both `apid` and `trustd` services use a gRPC connection back to
`machined` to watch changes to the certificates (new certificates being
issued).
This refactors the code to follow regular conventions, so that a failure
to watch will crash the process, and they have a way to restart and
re-establish the watch.
Use the context and errgroup consistently.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Previously a fix was deployed in the Talos API client, but when the
request passes through `apid`, we need to make sure that proxy doesn't
reject large responses.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This refactors code to handle partial machine config - only multi-doc
without v1alpha1 config.
This uses improvements from
https://github.com/cosi-project/runtime/pull/300:
* where possible, use `TransformController`
* use integrated tracker to reduce boilerplate
Sometimes fix/rewrite tests where applicable.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes#6391
Implement a set of APIs and commands to manage images in the CRI, and
pre-pull images on Kubernetes upgrades.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
* drop old resources API, which was deprecated long time ago
* use bootstrapped event in `talosctl get --watch` to better align
columns in the table output
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This brings many fixes, including a new Watch with support for
Bootstapped and Errored event types.
`talosctl` from before this change is still compatible, as there's gRPC
API level backwards compatibility versioning.
New client doesn't yet depend on new event types, so it will work
against Talos 1.2.x.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
There's a cyclic dependency on siderolink library which imports talos
machinery back. We will fix that after we get talos pushed under a new
name.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This the first step towards replacing all import paths to be based on
`siderolabs/` instead of `talos-systems/`.
All updates contain no functional changes, just refactorings to adapt to
the new path structure.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
I had to do several things:
- contextcheck now supports Go 1.18 generics, but I had to disable it because of this https://github.com/kkHAIKE/contextcheck/issues/9
- dupword produces to many false positives, so it's also disabled
- revive found all packages which didn't have a documentation comment before. And tehre is A LOT of them. I updated some of them, but gave up at some point and just added them to exclude rules for now.
- change lint-vulncheck to use `base` stage as base
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Don't allow worker nodes to act as apid routers:
* don't try to issue client certificate for apid on worker nodes
* if worker nodes receives incoming connections with `--nodes` set to
one of the local addresses of the nodd, it routes the request to
itself without proxying
Second point allows using `talosctl -e worker -n worker` to connect
directly to the worker if the connection from the control plane is not
available for some reason.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
There is no need to use `assert.Implements` since we can express this check during compile time. Go will eliminate `_` variables and any accompanying allocations during dead-code elimination phase.
This commit also removes:
tok := new(v1alpha1.ClusterConfig).Token()
assert.Implements(t, (*config.Token)(nil), tok)
Code since it doesn't check anything - v1alpha1.ClusterConfig.Token() already returns a config.Token interface.
Also - run `go work sync` and `go mod tidy`.
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
This is enabled via a machine config feature/version contract, as
`talosconfig` certificate generated previously didn't have proper key
usage set, so we need to keep backwards compatibility on upgrades.
New v1.3+ clusters will include this check.
This check prevents even potential mis-use of server certificates as a
client certificate.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This fixes a case when a node is rebooted, and connection via another
endpoint apid "caches" a connection error even when the node is up.
E.g. this command:
```
talosctl -e IP1 -n IP2 version
```
If node `IP2` is rebooted, `apid` at `IP1` might enter long backoff loop
and return an error still when `IP2` is actually up.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Overview: deprecate existing Talos resource API, and introduce new COSI
API.
Consequences:
* COSI API can only go via one-2-one proxy (`client.WithNode`)
* client-side API access is way easier with `state.State` wrappers
* lots of small changes on the client side to use new APIs
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This adds a new metadata field `node` which performs always proxying to
a single node without touching any protobuf structs on the way.
So with `node`, we can call APIs which do not conform to the Talos API
proxying standards, but from the UX point of view things will work same
way, but multiplexing will be handled on the client side.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This fixes apid and machined shutdown sequences to do graceful stop of
gRPC server with timeout.
Also sequences are restructured to stop apid/machined as late as
possible allowing access to the node while the long sequence is running
(e.g. upgrade or reset).
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This uses the `go-packet` library with native bindings for the packet
capture (without `libpcap`). This is not the most performant way, but it
allows us to avoid CGo.
There is a problem with converting network filter expressions (like
`tcp port 3222`) into BPF instructions, it's only available in C
libraries, but there's a workaround with `tcpdump`.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
With the advent of generics, redo pointer functionality and remove github.com/AlekSi/pointer dependency.
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Use `vtprobuf` optimized Marshal/Unmarshal methods which do not depend
on reflection to reduce memory and CPU usage while using Talos API.
See https://github.com/planetscale/vtprotobuf and
https://vitess.io/blog/2021-06-03-a-new-protobuf-generator-for-go/
Co-authored-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
* `talosctl config new` now sets endpoints in the generated config.
* Avoid duplication of roles in metadata.
* Remove method name prefix handling. All methods should be set explicitly.
* Add tests.
Closes#3421.
Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
This PR can be split into two parts:
* controllers
* apid binding into COSI world
Controllers
-----------
* `k8s.EndpointController` provides control plane endpoints on worker
nodes (it isn't required for now on control plane nodes)
* `secrets.RootController` now provides OS top-level secrets (CA cert)
and secret configuration
* `secrets.APIController` generates API secrets (certificates) in a bit
different way for workers and control plane nodes: controlplane nodes
generate directly, while workers reach out to `trustd` on control plane
nodes via `k8s.Endpoint` resource
apid Binding
------------
Resource `secrets.API` provides binding to protobuf by converting
itself back and forth to protobuf spec.
apid no longer receives machine configuration, instead it receives
gRPC-backed socket to access Resource API. apid watches `secrets.API`
resource, fetches certs and CA from it and uses that in its TLS
configuration.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>