Fixes#4231
This allows much faster worker join on `apid` level without waiting for
Kubernetes control plane to be up (and even if the Kubernetes control
plane doesn't go up or the kubelet isn't up).
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This distributes API CA (just the certificate, not the key) to the
worker nodes on config generation, and if the CA cert is present on the
worker node, it verifies TLS connection to the trustd with the CA
certificate.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This basically provides `talosctl get --insecure` in maintenance mode.
Only non-sensitive resources are available (equivalent to having
`os:reader` role in the Talos client certificate).
Changes:
* refactored insecure/maintenance client setup in talosctl
* `LinkStatus` is no longer sensitive as it shows only Wireguard public
key, `LinkSpec` still contains private key for obvious reasons
* maintenance mode injects `os:reader` role implicitly
The motivation behind this PR is to deprecate networkd-era interfaces &
routes APIs which are being used in TUI installer, and we need a
replacement.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Use `vtprobuf` optimized Marshal/Unmarshal methods which do not depend
on reflection to reduce memory and CPU usage while using Talos API.
See https://github.com/planetscale/vtprotobuf and
https://vitess.io/blog/2021-06-03-a-new-protobuf-generator-for-go/
Co-authored-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
This fixes endless block on RemoteGenerator.Close method rewriting the
RemoteGenerator using the retry package.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
* `talosctl config new` now sets endpoints in the generated config.
* Avoid duplication of roles in metadata.
* Remove method name prefix handling. All methods should be set explicitly.
* Add tests.
Closes#3421.
Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
This PR can be split into two parts:
* controllers
* apid binding into COSI world
Controllers
-----------
* `k8s.EndpointController` provides control plane endpoints on worker
nodes (it isn't required for now on control plane nodes)
* `secrets.RootController` now provides OS top-level secrets (CA cert)
and secret configuration
* `secrets.APIController` generates API secrets (certificates) in a bit
different way for workers and control plane nodes: controlplane nodes
generate directly, while workers reach out to `trustd` on control plane
nodes via `k8s.Endpoint` resource
apid Binding
------------
Resource `secrets.API` provides binding to protobuf by converting
itself back and forth to protobuf spec.
apid no longer receives machine configuration, instead it receives
gRPC-backed socket to access Resource API. apid watches `secrets.API`
resource, fetches certs and CA from it and uses that in its TLS
configuration.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Also fix recovery grpc handler to print panic stacktrace to the log.
Any API should follow the structure compatible with apid proxying
injection of errors/nodes.
Explicitly fail GenerateConfig API on worker nodes, as it panics on
worker nodes (missing certificates in node config).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This moves endpoint refresh from the context of the service `apid` in
`machined` into `apid` service itself for the workers. `apid` does
initial poll for the endpoints when it boots, but also periodically
polls for new endpoints to make sure it has accurate list of `trustd`
endpoints to talk to, this handles cases when control plane endpoints
change (e.g. rolling replace of control plane nodes with new IPs).
Related to #3069Fixes#3068
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Instead of doing our homegrown "try all the endpoints" method,
use gRPC load-balancing across configured endpoints.
Generalize load-balancer via gRPC resolver we had in Talos API client,
use it in remote certificate generator code. Generalized resolver is
still under `machinery/`, as `pkg/grpc` is not in `machinery/`, and we
can't depend on Talos code from `machinery/`.
Related to: #3068
Full fix for #3068 requires dynamic updates to control plane endpoints
while apid is running, this is coming in the next PR.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This fixes the reverse Go dependency from `pkg/machinery` to `talos`
package.
Add a check to `Dockerfile` to prevent `pkg/machinery/go.mod` getting
out of sync, this should prevent problems in the future.
Fix potential security issue in `token` authorizer to deny requests
without grpc metadata.
In provisioner, add support for launching nodes without the config
(config is not delivered to the provisioned nodes).
Breaking change in `pkg/provision`: now `NodeRequest.Type` should be set
to the node type (as config can be missing now).
In `talosctl cluster create` add a flag to skip providing config to the
nodes so that they enter maintenance mode, while the generated configs
are written down to disk (so they can be tweaked and applied easily).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes were applied automatically.
Import ordering might be questionable, but it's strict:
* stdlib
* other packages
* same package imports
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This moves `pkg/config`, `pkg/client` and `pkg/constants`
under `pkg/machinery` umbrella.
And `pkg/machinery` is published as Go module inside Talos repository.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
The goal of `pkg/grpc/tls` is to generate `*tls.Config` based on some
input parameters, but it had dependency on `pkg/grpc/gen` for 'remote'
certificate provider (it uses trustd client to sign CSRs).
Package `pkg/client` which is a part of future `machinery/` module
depends on `pkg/grpc/tls` for TLS config generation, so this pulls
`pkg/grpc` into `machinery/`, while it's not really good idea as most of
`pkg/grpc` is about server-side gRPC handling.
So the idea is to move `pkg/grpc/tls` (which has nothing to do with
gRPC), to `github.com/talos-systems/crypto/tls`, so we need to make sure
it has no dependencies on other Talos code.
The idea of this refactoring is to squash local & remote certificate
renewing providers as they had common part extracted, but even after
that almost all the code was identical except for different generators
beind used.
There should be no functional changes with this PR.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Package `pkg/crypto` was extracted as `github.com/talos-systems/crypto`
repository and Go module.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes#2272
`gofumpt` is now included into `golangci-lint`, but not the
`gofumports`, so we keep it using it as separate binary, but we keep
versions in sync with `golangci-lint`.
This contains fixes from:
* `gofumpt` (automated, mostly around octal constants)
* `exhaustive` in `switch` statements
* `noctx` (adding context with default timeout to http requests)
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
New service `routerd` performs exactly single task: based on incoming
API call service name, it routes the requests to the appropriate Talos
service (`networkd`, `osd`, etc.) Service `routerd` listens of file
socket and routes requests to file sockets.
Service `apid` now does single task as well:
* it either fans out request to other `apid` services running on other
nodes and aggregates responses
* or it forwards requests to local `routerd` (when request destination
is local node)
Cons:
* one more proxying layer on request path
Pros:
* more clear service roles
* `routerd` is part of core Talos, services should register with it to
expose their API; no auth in the service (not exposed to the world)
* `apid` might be replaced with other implementation, it depends on TLS infra,
auth, etc.
* `apid` is better segregated from other Talos services (can only access
`routerd`, can't talk to other Talos services directly, so less exposure
in case of a bug)
This change is no-op to the end users.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
The default gRPC dialer honors proxy environment variables, which causes
local unix socket connections to attempt to go through the proxy. This
fixes that by using a custom dialer.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
`gomnd` disabled, as it complains about every number used in the code,
and `wsl` became much more thorough.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR brings our protobuf files into conformance with the protobuf
style guide, and community conventions. It is purely renames, along with
generated docs.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
In `tls.Config`, there are two hooks for getting certificate for client
and server config. So we need separate configuration methods to
configure them both.
Required in apid to provide refreshing TLS client cert to
grpc.ClientConn.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Logging middleware should be the first one to log the request properly
including logging before proxy goes into action.
I had sec -> msec convertion wrong, but in the end I thought I should
replace it simply with `duration.String()` which is nicer.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Logging is pretty simple and bare minimum is being logged. I believe
better logging can be provided for apid when it does fan-out, but that
is beyond the scope for the first PR.
Sample logs:
```
$ osctl-linux-amd64 logs machined-api
machined 2019/11/11 21:16:43 OK [/machine.Machine/ServiceList] 0.000ms unary Success (:authority=unix:/run/system/machined/machine.sock;content-type=application/grpc;user-agent=grpc-go/1.23.0)
machined 2019/11/11 21:17:09 Unknown [/machine.Machine/Logs] 0.000ms stream open /run/system/log/machined.log: no such file or directory (:authority=unix:/run/system/machined/machine.sock;content-type=application/grpc;user-agent=grpc-go/1.23.0)
```
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This installs default middleware to recover from panics (convert them to
errors) in all the grpc servers by default.
Slight refactoring to allow that as grpc can only accept Unary/Stream
interceptors only once.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This removes the github.com/pkg/errors package in favor of the official
error wrapping in go 1.13.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This detangles the gRPC client code from the userdata code. The
motivation behind this is to make creating clients more simple and not
dependent on our configuration format.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
In order for other projects to make use of our APIs, they must not
reside underneath the internal directory. This moves the protobuf
definitions to a top-level "api" directory and scopes them according to
their domain. This change also removes generated code from the gitignore
file so that users don't have to generate the code themseleves.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
The gofumports does everything that gofumpt does with the addition of
formatting imports. This change proposes the use of the `-local` flag so
that we can have imports separated in the following order:
- standard library
- third party
- Talos specific
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
The gofumpt linter is a stricter drop-in replacement for gofmt. The
rules are ones that I strongly agree with and I think it would be better
if we added this linter instead of nit picking every PR.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
When talking to an IPv6 address for a gRPC server, enclose the IPv6
address in brackets.
Also fixes backwards implementation of IPv4/IPv6 test.
Fixes#983
Signed-off-by: Seán C McCord <ulexus@gmail.com>