68 Commits

Author SHA1 Message Date
Andrey Smirnov
96aa9638f7
chore: rename talos-systems/talos to siderolabs/talos
There's a cyclic dependency on siderolink library which imports talos
machinery back. We will fix that after we get talos pushed under a new
name.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-11-03 16:50:32 +04:00
Andrey Smirnov
343c55762e
chore: replace talos-systems Go modules with siderolabs
This the first step towards replacing all import paths to be based on
`siderolabs/` instead of `talos-systems/`.

All updates contain no functional changes, just refactorings to adapt to
the new path structure.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-11-01 12:55:40 +04:00
Andrey Smirnov
d7edd0e2e6
refactor: use go-circular, go-kubeconfig, and go-tail
Remove Talos versions, use new extracted Go modules.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-10-25 20:20:44 +04:00
Tim Jones
e6fba7d3bc
chore: update dependencies
Updates:
* pkgs v1.3.0-alpha.0-33-g8fe5cbc
* tools v1.3.0-alpha.0-20-g3b5f89a
* aws-sdk-go v1.44.120
* docker v20.10.20+incompatible
* fsnotify v1.6.0
* nftables v0.0.0-20221015190445-4f5cd5826fbd
* gen v0.4.0
* grpc-proxy v0.4.0
* spf13/cobra v1.6.0
* u-root v0.10.0
* x/net v0.1.0
* x/sync v0.1.0
* x/sys v0.1.0
* x/term v0.1.0
* x/time v0.1.0
* grpc v1.50.1
* genproto v0.0.0-20221018160656-63c7b68cfc55
* Linux kernel 5.15.74

Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
2022-10-21 15:20:01 +04:00
Dmitriy Matrenichev
93e55b85f2
chore: bump golangci-lint to v1.50.0
I had to do several things:
- contextcheck now supports Go 1.18 generics, but I had to disable it because of this https://github.com/kkHAIKE/contextcheck/issues/9
- dupword produces to many false positives, so it's also disabled
- revive found all packages which didn't have a documentation comment before. And tehre is A LOT of them. I updated some of them, but gave up at some point and just added them to exclude rules for now.
- change lint-vulncheck to use `base` stage as base

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-10-20 18:33:19 +03:00
Dmitriy Matrenichev
fc48849d00
chore: move maps/slices/ordered to gen module
Use github.com/siderolabs/gen

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-09-21 20:22:43 +03:00
Dmitriy Matrenichev
12827b861c
chore: move "implements" checks to compile time
There is no need to use `assert.Implements` since we can express this check during compile time. Go will eliminate `_` variables and any accompanying allocations during dead-code elimination phase.

This commit also removes:

    tok := new(v1alpha1.ClusterConfig).Token()
	assert.Implements(t, (*config.Token)(nil), tok)

Code since it doesn't check anything - v1alpha1.ClusterConfig.Token() already returns a config.Token interface.

Also - run `go work sync` and `go mod tidy`.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-09-12 16:57:24 +03:00
Andrey Smirnov
e626540dfb
chore: avoid double API request logging in trustd
There's a common logger for API calls already working, so no need to log
in the token authenticator.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-08 14:18:39 +04:00
Andrey Smirnov
f62d17125b
chore: update crypto to use new import path siderolabs/crypto
No functional changes in this PR, just updating import paths.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-07 23:02:50 +04:00
Philipp Sauter
6b23deddcf
feat: support custom ports for connecting to apid from talosctl
Users can now add a port suffix to the endpoints used by talosctl. Either
in the CLI flag or the ~/.talos/config. The default port is still 50000.

Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>
2022-08-11 16:52:46 +02:00
Andrey Smirnov
92314e47bf
refactor: use controllers/resources to feed trustd with data
This is mostly same as the way `apid` consumes certificates generated by
`machined` via COSI API connection.

Service `trustd` consumes two resources:

* `secrets.Trustd` which contains `trustd` server TLS certificates and
  it gets refreshed as e.g. node IP changes
* `secrets.OSRoot` which contains Talos API CA and join token

This PR fixes an issue with `trustd` certs not always including all IPs
of the node, as previously `trustd` certs will only capture addresses of
the node at the moment of `trustd` startup.

Another thing is that refactoring allows to dynamically change API CA
and join token. This needs more work, but `trustd` should now pick up
changes without any additional changes.

Fixes #5863

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-04 23:45:34 +04:00
Andrey Smirnov
2e790526f7
refactor: make apid stop gracefully and be stopped late
This fixes apid and machined shutdown sequences to do graceful stop of
gRPC server with timeout.

Also sequences are restructured to stop apid/machined as late as
possible allowing access to the node while the long sequence is running
(e.g. upgrade or reset).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-29 14:52:04 +04:00
Dmitriy Matrenichev
4dbbf4ac50
chore: add generic methods and use them part #2
Use things from #5702.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-06-09 23:10:02 +08:00
Dmitriy Matrenichev
8c675c6692
chore: siderolink maintenance mode
If SideroLink is enabled, maintenance mode should only allow Siderolink connections.

Closes #5627

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-05-25 02:23:58 +08:00
Dmitriy Matrenichev
9b9191c5e7
fix: increase intiial window and connection window sizes
For #4950

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-03-25 01:38:59 +04:00
Dmitriy Matrenichev
e06e1473b0
feat: update golangci-lint to 1.45.0 and gofumpt to 0.3.0
- Update golangci-lint to 1.45.0
- Update gofumpt to 0.3.0
- Fix gofumpt errors
- Add goimports and format imports since gofumports is removed
- Update Dockerfile
- Fix .golangci.yml configuration
- Fix linting errors

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-03-24 08:14:04 +04:00
Andrey Smirnov
6ccfdbaf1b
fix: avoid replacing default gRPC codec in machinery
Fixes #4987

As machinery is supposed to be widely used project, and gRPC lacks
proper support to override default codec easily, it might come into
conflict with other projects.

Instead, move codec to core talos, and register it explicitly in the
server code (which covers machined, apid, trustd) and client code
(talosctl).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-02-18 00:39:41 +03:00
Andrey Smirnov
949464e4b6
fix: use leaf certificate in the apid RBAC check
Fixes #4910

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-02-02 15:14:03 +03:00
Andrey Smirnov
dc9a0cfe94
chore: bump Go dependencies
Bump all dependencies, update `grpc.WithInsecure()` which is deprecated
now.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-12-20 23:05:32 +03:00
Andrey Smirnov
12f7888b75
feat: feed control plane endpoints on workers from cluster discovery
Fixes #4231

This allows much faster worker join on `apid` level without waiting for
Kubernetes control plane to be up (and even if the Kubernetes control
plane doesn't go up or the kubelet isn't up).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-10-18 18:49:29 +03:00
Alexey Palazhchenko
e60469a38c
feat: initial support for JSON logging
Hook into logging machinery.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
2021-10-16 16:46:59 +00:00
Andrey Smirnov
62acd62516
fix: check trustd API CA on worker nodes
This distributes API CA (just the certificate, not the key) to the
worker nodes on config generation, and if the CA cert is present on the
worker node, it verifies TLS connection to the trustd with the CA
certificate.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-28 15:14:23 +03:00
Andrey Smirnov
2b5204200a
feat: enable resource API in the maintenance mode
This basically provides `talosctl get --insecure` in maintenance mode.
Only non-sensitive resources are available (equivalent to having
`os:reader` role in the Talos client certificate).

Changes:

* refactored insecure/maintenance client setup in talosctl
* `LinkStatus` is no longer sensitive as it shows only Wireguard public
key, `LinkSpec` still contains private key for obvious reasons
* maintenance mode injects `os:reader` role implicitly

The motivation behind this PR is to deprecate networkd-era interfaces &
routes APIs which are being used in TUI installer, and we need a
replacement.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-22 21:36:34 +03:00
Alexey Palazhchenko
4dae9ea55c chore: use vtprotobuf compiled marshaling in Talos API
Use `vtprobuf` optimized Marshal/Unmarshal methods which do not depend
on reflection to reduce memory and CPU usage while using Talos API.

See https://github.com/planetscale/vtprotobuf and
https://vitess.io/blog/2021-06-03-a-new-protobuf-generator-for-go/

Co-authored-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
2021-08-09 08:42:13 -07:00
Alexey Palazhchenko
eea750de2c chore: rename "join" type to "worker"
Closes #3413.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-07-09 07:10:45 -07:00
Andrey Smirnov
07fb61e5d2 fix: issue worker apid certs properly on renewal
This fixes endless block on RemoteGenerator.Close method rewriting the
RemoteGenerator using the retry package.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-06-29 09:02:35 -07:00
Alexey Palazhchenko
ad047a7dee chore: small RBAC improvements
* `talosctl config new` now sets endpoints in the generated config.
* Avoid duplication of roles in metadata.
* Remove method name prefix handling. All methods should be set explicitly.
* Add tests.

Closes #3421.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-06-25 05:50:38 -07:00
Andrey Smirnov
d8c2bca1b5 feat: reimplement apid certificate generation on top of COSI
This PR can be split into two parts:

* controllers
* apid binding into COSI world

Controllers
-----------

* `k8s.EndpointController` provides control plane endpoints on worker
nodes (it isn't required for now on control plane nodes)
* `secrets.RootController` now provides OS top-level secrets (CA cert)
and secret configuration
* `secrets.APIController` generates API secrets (certificates) in a bit
different way for workers and control plane nodes: controlplane nodes
generate directly, while workers reach out to `trustd` on control plane
nodes via `k8s.Endpoint` resource

apid Binding
------------

Resource `secrets.API` provides binding to protobuf by converting
itself back and forth to protobuf spec.

apid no longer receives machine configuration, instead it receives
gRPC-backed socket to access Resource API. apid watches `secrets.API`
resource, fetches certs and CA from it and uses that in its TLS
configuration.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-06-23 13:07:00 -07:00
Alexey Palazhchenko
06209bba28 chore: update RBAC rules, remove old APIs
Refs #3421.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-06-18 09:54:49 -07:00
Alexey Palazhchenko
f63ab9dd9b feat: implement talosctl config new command
Refs #3421.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-06-17 09:06:43 -07:00
Alexey Palazhchenko
da329f00ab feat: enable RBAC by default
Refs #3421.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-06-09 11:03:56 -07:00
Alexey Palazhchenko
0f168a8801 feat: add configuration for enabling RBAC
Refs #3421.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-06-08 09:30:41 -07:00
Alexey Palazhchenko
5ad314fe7e feat: implement basic RBAC interceptors
It is not enforced yet.

Refs #3421.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-06-07 09:28:22 -07:00
Alexey Palazhchenko
df52c13581 chore: fix //nolint directives
That's the recommended syntax:
https://golangci-lint.run/usage/false-positives/

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-03-05 05:58:33 -08:00
Andrey Smirnov
d99a016af2 fix: correct response structure for GenerateConfig API
Also fix recovery grpc handler to print panic stacktrace to the log.

Any API should follow the structure compatible with apid proxying
injection of errors/nodes.

Explicitly fail GenerateConfig API on worker nodes, as it panics on
worker nodes (missing certificates in node config).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-11 06:34:10 -08:00
Andrey Smirnov
5855b8d532 fix: refresh control plane endpoints on worker apids on schedule
This moves endpoint refresh from the context of the service `apid` in
`machined` into `apid` service itself for the workers. `apid` does
initial poll for the endpoints when it boots, but also periodically
polls for new endpoints to make sure it has accurate list of `trustd`
endpoints to talk to, this handles cases when control plane endpoints
change (e.g. rolling replace of control plane nodes with new IPs).

Related to #3069

Fixes #3068

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-03 14:27:03 -08:00
Andrey Smirnov
389349c02b fix: use grpc load-balancing when connecting to trustd
Instead of doing our homegrown "try all the endpoints" method,
use gRPC load-balancing across configured endpoints.

Generalize load-balancer via gRPC resolver we had in Talos API client,
use it in remote certificate generator code. Generalized resolver is
still under `machinery/`, as `pkg/grpc` is not in `machinery/`, and we
can't depend on Talos code from `machinery/`.

Related to: #3068

Full fix for #3068 requires dynamic updates to control plane endpoints
while apid is running, this is coming in the next PR.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-01 16:48:00 -08:00
Andrey Smirnov
b2b86a622e fix: remove 'token creds' from maintenance service
This fixes the reverse Go dependency from `pkg/machinery` to `talos`
package.

Add a check to `Dockerfile` to prevent `pkg/machinery/go.mod` getting
out of sync, this should prevent problems in the future.

Fix potential security issue in `token` authorizer to deny requests
without grpc metadata.

In provisioner, add support for launching nodes without the config
(config is not delivered to the provisioned nodes).

Breaking change in `pkg/provision`: now `NodeRequest.Type` should be set
to the node type (as config can be missing now).

In `talosctl cluster create` add a flag to skip providing config to the
nodes so that they enter maintenance mode, while the generated configs
are written down to disk (so they can be tweaked and applied easily).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-09 14:10:32 -08:00
Andrey Smirnov
a2efa44663 chore: enable gci linter
Fixes were applied automatically.

Import ordering might be questionable, but it's strict:

* stdlib
* other packages
* same package imports

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-09 08:09:48 -08:00
Andrey Smirnov
8560fb9662 chore: enable nlreturn linter
Most of the fixes were automatically applied.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-09 06:48:07 -08:00
Andrey Smirnov
bddd4f1bf6 refactor: move external API packages into machinery/
This moves `pkg/config`, `pkg/client` and `pkg/constants`
under `pkg/machinery` umbrella.

And `pkg/machinery` is published as Go module inside Talos repository.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-17 09:56:14 -07:00
Andrey Smirnov
7875e9499f chore: re-import talos-systems/pkg/crypto/tls
See also https://github.com/talos-systems/crypto/pull/2

This should break dependency of `pkg/client` on `pkg/grpc`.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-17 08:06:38 -07:00
Andrey Smirnov
bd0f4f0564 refactor: rework pkg/grpc/tls to break dependency on pkg/grpc/gen
The goal of `pkg/grpc/tls` is to generate `*tls.Config` based on some
input parameters, but it had dependency on `pkg/grpc/gen` for 'remote'
certificate provider (it uses trustd client to sign CSRs).

Package `pkg/client` which is a part of future `machinery/` module
depends on `pkg/grpc/tls` for TLS config generation, so this pulls
`pkg/grpc` into `machinery/`, while it's not really good idea as most of
`pkg/grpc` is about server-side gRPC handling.

So the idea is to move `pkg/grpc/tls` (which has nothing to do with
gRPC), to `github.com/talos-systems/crypto/tls`, so we need to make sure
it has no dependencies on other Talos code.

The idea of this refactoring is to squash local & remote certificate
renewing providers as they had common part extracted, but even after
that almost all the code was identical except for different generators
beind used.

There should be no functional changes with this PR.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-17 06:41:21 -07:00
Andrey Smirnov
2697b99b7d refactor: extract pkg/net as github.com/talos-systems/net
This extracts common package as new module/repository.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-14 11:04:50 -07:00
Andrey Smirnov
52c5911fcd chore: extract pkg/crypto as external module
Package `pkg/crypto` was extracted as `github.com/talos-systems/crypto`
repository and Go module.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-14 06:33:30 -07:00
Andrey Smirnov
41d5f7859a chore: update golangci-lint to 1.28.3
Fixes #2272

`gofumpt` is now included into `golangci-lint`, but not the
`gofumports`, so we keep it using it as separate binary, but we keep
versions in sync with `golangci-lint`.

This contains fixes from:

* `gofumpt` (automated, mostly around octal constants)
* `exhaustive` in `switch` statements
* `noctx` (adding context with default timeout to http requests)

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-16 08:05:42 -07:00
Andrey Smirnov
81d1c2bfe7 chore: enable godot linter
Issues were fixed automatically.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-06-30 10:39:56 -07:00
Andrey Smirnov
a068acfbe4 feat: split routerd from apid
New service `routerd` performs exactly single task: based on incoming
API call service name, it routes the requests to the appropriate Talos
service (`networkd`, `osd`, etc.) Service `routerd` listens of file
socket and routes requests to file sockets.

Service `apid` now does single task as well:

* it either fans out request to other `apid` services running on other
nodes and aggregates responses
* or it forwards requests to local `routerd` (when request destination
is local node)

Cons:

* one more proxying layer on request path

Pros:

* more clear service roles
* `routerd` is part of core Talos, services should register with it to
expose their API; no auth in the service (not exposed to the world)
* `apid` might be replaced with other implementation, it depends on TLS infra,
auth, etc.
* `apid` is better segregated from other Talos services (can only access
`routerd`, can't talk to other Talos services directly, so less exposure
in case of a bug)

This change is no-op to the end users.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-03-05 22:05:56 +03:00
Andrew Rynhard
fcaed8b0dd fix: don't proxy gRPC unix connections
The default gRPC dialer honors proxy environment variables, which causes
local unix socket connections to attempt to go through the proxy. This
fixes that by using a custom dialer.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-02-10 05:37:50 -08:00
Andrey Smirnov
01d696ed10 chore: update golangci-lint-1.23.3
`gomnd` disabled, as it complains about every number used in the code,
and `wsl` became much more thorough.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-04 08:56:39 -08:00