* drop old resources API, which was deprecated long time ago
* use bootstrapped event in `talosctl get --watch` to better align
columns in the table output
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This brings many fixes, including a new Watch with support for
Bootstapped and Errored event types.
`talosctl` from before this change is still compatible, as there's gRPC
API level backwards compatibility versioning.
New client doesn't yet depend on new event types, so it will work
against Talos 1.2.x.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
There's a cyclic dependency on siderolink library which imports talos
machinery back. We will fix that after we get talos pushed under a new
name.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This the first step towards replacing all import paths to be based on
`siderolabs/` instead of `talos-systems/`.
All updates contain no functional changes, just refactorings to adapt to
the new path structure.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
I had to do several things:
- contextcheck now supports Go 1.18 generics, but I had to disable it because of this https://github.com/kkHAIKE/contextcheck/issues/9
- dupword produces to many false positives, so it's also disabled
- revive found all packages which didn't have a documentation comment before. And tehre is A LOT of them. I updated some of them, but gave up at some point and just added them to exclude rules for now.
- change lint-vulncheck to use `base` stage as base
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Don't allow worker nodes to act as apid routers:
* don't try to issue client certificate for apid on worker nodes
* if worker nodes receives incoming connections with `--nodes` set to
one of the local addresses of the nodd, it routes the request to
itself without proxying
Second point allows using `talosctl -e worker -n worker` to connect
directly to the worker if the connection from the control plane is not
available for some reason.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
There is no need to use `assert.Implements` since we can express this check during compile time. Go will eliminate `_` variables and any accompanying allocations during dead-code elimination phase.
This commit also removes:
tok := new(v1alpha1.ClusterConfig).Token()
assert.Implements(t, (*config.Token)(nil), tok)
Code since it doesn't check anything - v1alpha1.ClusterConfig.Token() already returns a config.Token interface.
Also - run `go work sync` and `go mod tidy`.
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
This fixes a case when a node is rebooted, and connection via another
endpoint apid "caches" a connection error even when the node is up.
E.g. this command:
```
talosctl -e IP1 -n IP2 version
```
If node `IP2` is rebooted, `apid` at `IP1` might enter long backoff loop
and return an error still when `IP2` is actually up.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Overview: deprecate existing Talos resource API, and introduce new COSI
API.
Consequences:
* COSI API can only go via one-2-one proxy (`client.WithNode`)
* client-side API access is way easier with `state.State` wrappers
* lots of small changes on the client side to use new APIs
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This adds a new metadata field `node` which performs always proxying to
a single node without touching any protobuf structs on the way.
So with `node`, we can call APIs which do not conform to the Talos API
proxying standards, but from the UX point of view things will work same
way, but multiplexing will be handled on the client side.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
With the advent of generics, redo pointer functionality and remove github.com/AlekSi/pointer dependency.
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Use `vtprobuf` optimized Marshal/Unmarshal methods which do not depend
on reflection to reduce memory and CPU usage while using Talos API.
See https://github.com/planetscale/vtprotobuf and
https://vitess.io/blog/2021-06-03-a-new-protobuf-generator-for-go/
Co-authored-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
* `talosctl config new` now sets endpoints in the generated config.
* Avoid duplication of roles in metadata.
* Remove method name prefix handling. All methods should be set explicitly.
* Add tests.
Closes#3421.
Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
This PR can be split into two parts:
* controllers
* apid binding into COSI world
Controllers
-----------
* `k8s.EndpointController` provides control plane endpoints on worker
nodes (it isn't required for now on control plane nodes)
* `secrets.RootController` now provides OS top-level secrets (CA cert)
and secret configuration
* `secrets.APIController` generates API secrets (certificates) in a bit
different way for workers and control plane nodes: controlplane nodes
generate directly, while workers reach out to `trustd` on control plane
nodes via `k8s.Endpoint` resource
apid Binding
------------
Resource `secrets.API` provides binding to protobuf by converting
itself back and forth to protobuf spec.
apid no longer receives machine configuration, instead it receives
gRPC-backed socket to access Resource API. apid watches `secrets.API`
resource, fetches certs and CA from it and uses that in its TLS
configuration.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This moves endpoint refresh from the context of the service `apid` in
`machined` into `apid` service itself for the workers. `apid` does
initial poll for the endpoints when it boots, but also periodically
polls for new endpoints to make sure it has accurate list of `trustd`
endpoints to talk to, this handles cases when control plane endpoints
change (e.g. rolling replace of control plane nodes with new IPs).
Related to #3069Fixes#3068
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Instead of doing our homegrown "try all the endpoints" method,
use gRPC load-balancing across configured endpoints.
Generalize load-balancer via gRPC resolver we had in Talos API client,
use it in remote certificate generator code. Generalized resolver is
still under `machinery/`, as `pkg/grpc` is not in `machinery/`, and we
can't depend on Talos code from `machinery/`.
Related to: #3068
Full fix for #3068 requires dynamic updates to control plane endpoints
while apid is running, this is coming in the next PR.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Before these changes, errors were always sent as strings, so if original
error was gRPC error (which is almost always the case for apid), it is
formatted as string and original fields (like code) are lost in the
formatted string.
With this change, apid sends errors as official `grpc.Status` protobuf
structure, and client decodes that into Go grpc.Status based error.
This change is backwards and forwards compatible.
This should fix more cases when integration tests were not able to
ignore grpc `transport is closing` errors when they were sent as strings
from the apid endpoint.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes were applied automatically.
Import ordering might be questionable, but it's strict:
* stdlib
* other packages
* same package imports
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This moves `pkg/config`, `pkg/client` and `pkg/constants`
under `pkg/machinery` umbrella.
And `pkg/machinery` is published as Go module inside Talos repository.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
The goal of `pkg/grpc/tls` is to generate `*tls.Config` based on some
input parameters, but it had dependency on `pkg/grpc/gen` for 'remote'
certificate provider (it uses trustd client to sign CSRs).
Package `pkg/client` which is a part of future `machinery/` module
depends on `pkg/grpc/tls` for TLS config generation, so this pulls
`pkg/grpc` into `machinery/`, while it's not really good idea as most of
`pkg/grpc` is about server-side gRPC handling.
So the idea is to move `pkg/grpc/tls` (which has nothing to do with
gRPC), to `github.com/talos-systems/crypto/tls`, so we need to make sure
it has no dependencies on other Talos code.
The idea of this refactoring is to squash local & remote certificate
renewing providers as they had common part extracted, but even after
that almost all the code was identical except for different generators
beind used.
There should be no functional changes with this PR.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This makes `pkg/config` directly importable from other projects.
There should be no functional changes.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This allows node names for APID, but checks for empty or port-appended
addresses in a best-effort manner.
Also adds common function for checking addresses for the presence of an
appended port.
Fixes#2163
Signed-off-by: Seán C McCord <ulexus@gmail.com>
This validates IPs by simple parsing and ensures `host:part` is correct
by using `net.FormatAddress`.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This is a rewrite of machined. It addresses some of the limitations and
complexity in the implementation. This introduces the idea of a
controller. A controller is responsible for managing the runtime, the
sequencer, and a new state type introduced in this PR.
A few highlights are:
- no more event bus
- functional approach to tasks (no more types defined for each task)
- the task function definition now offers a lot more context, like
access to raw API requests, the current sequence, a logger, the new
state interface, and the runtime interface.
- no more panics to handle reboots
- additional initialize and reboot sequences
- graceful gRPC server shutdown on critical errors
- config is now stored at install time to avoid having to download it at
install time and at boot time
- upgrades now use the local config instead of downloading it
- the upgrade API's preserve option takes precedence over the config's
install force option
Additionally, this pulls various packes in under machined to make the
code easier to navigate.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This PR cleans up the formatting for various package imports as they
were causing the linter to throw errors.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
New service `routerd` performs exactly single task: based on incoming
API call service name, it routes the requests to the appropriate Talos
service (`networkd`, `osd`, etc.) Service `routerd` listens of file
socket and routes requests to file sockets.
Service `apid` now does single task as well:
* it either fans out request to other `apid` services running on other
nodes and aggregates responses
* or it forwards requests to local `routerd` (when request destination
is local node)
Cons:
* one more proxying layer on request path
Pros:
* more clear service roles
* `routerd` is part of core Talos, services should register with it to
expose their API; no auth in the service (not exposed to the world)
* `apid` might be replaced with other implementation, it depends on TLS infra,
auth, etc.
* `apid` is better segregated from other Talos services (can only access
`routerd`, can't talk to other Talos services directly, so less exposure
in case of a bug)
This change is no-op to the end users.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR brings our protobuf files into conformance with the protobuf
style guide, and community conventions. It is purely renames, along with
generated docs.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
Fixes#1610
1. In `talosconfig`, deprecate `Target` in favor of `Endpoints`
(client-side LB to come next).
2. In `osctl`, use `--nodes` in place of `--target`.
3. In `osctl` add option `--endpoints` to override `Endpoints` for the
call.
Other changes are just updates to catch up with the changes. Most
probably I missed something... And CAPI provider needs update.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This replaces codegen version of apid proxying with
talos-systems/grpc-proxy based version. Proxying is transparent, it
doesn't require exact information about methods and response types. It
requires some common layout response to enhance it properly with node
metadata or errors.
There should be no signifcant changes to the API with the previous
version, but it's worth mentioning a few changes:
1. grpc.ClientConn is established just once per upstream (either local
service or remote apid instance).
2. When called without `-t` (`targets`), apid proxies immediately down
to local service skipping proxying to itself (as before), which results
in empty node metadata in response (before it had local node IP). Might
revert this later to proxy to itself (?).
3. Streaming APIs are now fully supported with multiple targets, but
message definition doesn't contain `ResponseMetadata`, so streaming APIs
are broken now with targets (needs a fix).
4. Errors are now returned as responses with `Error` field set in
`ResponseMetadata`, this requires client library update and `osctl` to
handle it properly.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>