95 Commits

Author SHA1 Message Date
Dmitriy Matrenichev
c6f90d0149
chore: replace sync.Map with concurrent.HashTrieMap
Also bump `cosi-project/runtime` to the v0.4.4

Closes #8851

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-10 20:45:47 +03:00
Dmitriy Matrenichev
aca475c665
chore: small usability fixes
* Replace logging.Wrap(log.Writer()) with zaptest.NewLogger(suite.T()) where possible.
* Replace reflect.DeepEqual with =|slices.Equal|bytes.Equal where possible.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-10 05:48:11 +03:00
Andrey Smirnov
41f92e0ba4
chore: update Go to 1.22.4, other updates
Bump go modules, adjust the code.

New linter warnings.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-05 20:59:52 +04:00
Andrey Smirnov
5d07ac5a7d
fix: close apid inter-backend connections gracefully for real
Fixes #8552

This fixes up the previous fix where `for` condition was inverted, and
also updates the idle timeout, so that the transition to idle happens
before the timeout expires.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-16 16:21:34 +04:00
Andrey Smirnov
336e611746
fix: close the apid connection to other machines gracefully
Fixes #8552

When `apid` notices update in the PKI, it flushes its client connections
to other machines (used for proxying), as it might need to use new
client certificate.

While flushing, just calling `Close` might abort already running
connections.

So instead, try to close gracefully with a timeout when the connection
is idle.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-08 19:47:04 +04:00
Dmitry Sharshakov
653f838b09
feat: support multiple Docker cluster in talosctl cluster create
Dynamically map Kubernetes and Talos API ports to an available port on
the host, so every cluster gets its own unique set of parts.

As part of the changes, refactor the provision library and interfaces,
dropping old weird interfaces replacing with (hopefully) much more
descriprive names.

Signed-off-by: Dmitry Sharshakov <dmitry.sharshakov@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-04 21:21:39 +04:00
Andrey Smirnov
951904554e
chore: bump dependencies (go 1.22.2)
Update Go to 1.22.2, update Go modules to resolve
[HTTP/2 issue](https://www.kb.cert.org/vuls/id/421644).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-04 14:59:24 +04:00
Andrey Smirnov
8eacc4ba80
feat: support rotation of Talos API CA
This allows to roll all nodes to use a new CA, to refresh it, or e.g.
when the `talosconfig` was exposed accidentally.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-03-22 12:16:47 +04:00
Dmitriy Matrenichev
19f15a840c
chore: bump golangci-lint to 1.57.0
Fix all discovered issues.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-03-21 01:06:53 +03:00
Artem Chernyshev
3c8f51d707
chore: move cli formatters and version modules to machinery
To be used in the `go-talos-support` module without importing the whole
Talos repo.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-03-07 16:29:15 +03:00
Andrey Smirnov
67ac6933d3
fix: handle errors to watch apid/trustd certs
Fixes #8345

Both `apid` and `trustd` services use a gRPC connection back to
`machined` to watch changes to the certificates (new certificates being
issued).

This refactors the code to follow regular conventions, so that a failure
to watch will crash the process, and they have a way to restart and
re-establish the watch.

Use the context and errgroup consistently.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-23 17:38:56 +04:00
Dmitriy Matrenichev
fa3b933705
chore: replace fmt.Errorf with errors.New where possible
This time use `eg` from `x/tools` repo tool to do this.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-02-14 17:39:30 +03:00
Andrey Smirnov
e9c7ac17a9
fix: set max msg recv size when proxying
Previously a fix was deployed in the Talos API client, but when the
request passes through `apid`, we need to make sure that proxy doesn't
reject large responses.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-09 21:08:54 +04:00
Andrey Smirnov
a52d3cda3b
chore: update gen and COSI runtime
No actual changes, adapting to use new APIs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-22 12:13:13 +04:00
Andrey Smirnov
a096f05a56
chore: update gRPC library and enable shared write buffers
Fixes #7576

See https://github.com/grpc/grpc-go/pull/6309

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-13 21:27:46 +04:00
Andrey Smirnov
3c9f7a7de6
chore: re-enable nolintlint and typecheck linters
Drop startup/rand.go, as since Go 1.20 `rand.Seed` is done
automatically.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-25 01:05:41 +04:00
Andrey Smirnov
544cb4fe7d
refactor: accept partial machine configuration
This refactors code to handle partial machine config - only multi-doc
without v1alpha1 config.

This uses improvements from
https://github.com/cosi-project/runtime/pull/300:

* where possible, use `TransformController`
* use integrated tracker to reduce boilerplate

Sometimes fix/rewrite tests where applicable.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-27 17:00:42 +04:00
Andrey Smirnov
6b39c6a4d3
fix: enable compression and bump gRPC max msg size
Fixes #7482

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-19 22:46:37 +04:00
Andrey Smirnov
8017afb107
feat: implement CRI image management and pre-pull on K8s upgrade
Fixes #6391

Implement a set of APIs and commands to manage images in the CRI, and
pre-pull images on Kubernetes upgrades.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-11 19:25:10 +04:00
Andrey Smirnov
bb02dd263c
chore: drop deprecated stuff for Talos 1.5
* drop old resources API, which was deprecated long time ago
* use bootstrapped event in `talosctl get --watch` to better align
  columns in the table output

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-05-18 19:46:37 +04:00
Andrey Smirnov
2ebe410e93
feat: update COSI to v0.2.0
This brings many fixes, including a new Watch with support for
Bootstapped and Errored event types.

`talosctl` from before this change is still compatible, as there's gRPC
API level backwards compatibility versioning.

New client doesn't yet depend on new event types, so it will work
against Talos 1.2.x.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-11-29 21:21:59 +04:00
Andrey Smirnov
96aa9638f7
chore: rename talos-systems/talos to siderolabs/talos
There's a cyclic dependency on siderolink library which imports talos
machinery back. We will fix that after we get talos pushed under a new
name.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-11-03 16:50:32 +04:00
Andrey Smirnov
343c55762e
chore: replace talos-systems Go modules with siderolabs
This the first step towards replacing all import paths to be based on
`siderolabs/` instead of `talos-systems/`.

All updates contain no functional changes, just refactorings to adapt to
the new path structure.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-11-01 12:55:40 +04:00
Tim Jones
e6fba7d3bc
chore: update dependencies
Updates:
* pkgs v1.3.0-alpha.0-33-g8fe5cbc
* tools v1.3.0-alpha.0-20-g3b5f89a
* aws-sdk-go v1.44.120
* docker v20.10.20+incompatible
* fsnotify v1.6.0
* nftables v0.0.0-20221015190445-4f5cd5826fbd
* gen v0.4.0
* grpc-proxy v0.4.0
* spf13/cobra v1.6.0
* u-root v0.10.0
* x/net v0.1.0
* x/sync v0.1.0
* x/sys v0.1.0
* x/term v0.1.0
* x/time v0.1.0
* grpc v1.50.1
* genproto v0.0.0-20221018160656-63c7b68cfc55
* Linux kernel 5.15.74

Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
2022-10-21 15:20:01 +04:00
Dmitriy Matrenichev
93e55b85f2
chore: bump golangci-lint to v1.50.0
I had to do several things:
- contextcheck now supports Go 1.18 generics, but I had to disable it because of this https://github.com/kkHAIKE/contextcheck/issues/9
- dupword produces to many false positives, so it's also disabled
- revive found all packages which didn't have a documentation comment before. And tehre is A LOT of them. I updated some of them, but gave up at some point and just added them to exclude rules for now.
- change lint-vulncheck to use `base` stage as base

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-10-20 18:33:19 +03:00
Dmitriy Matrenichev
fc48849d00
chore: move maps/slices/ordered to gen module
Use github.com/siderolabs/gen

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-09-21 20:22:43 +03:00
Andrey Smirnov
2dadcd6695
fix: stop worker nodes from acting as apid routers
Don't allow worker nodes to act as apid routers:

* don't try to issue client certificate for apid on worker nodes
* if worker nodes receives incoming connections with `--nodes` set to
  one of the local addresses of the nodd, it routes the request to
  itself without proxying

Second point allows using `talosctl -e worker -n worker` to connect
directly to the worker if the connection from the control plane is not
available for some reason.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-13 15:07:31 +04:00
Dmitriy Matrenichev
12827b861c
chore: move "implements" checks to compile time
There is no need to use `assert.Implements` since we can express this check during compile time. Go will eliminate `_` variables and any accompanying allocations during dead-code elimination phase.

This commit also removes:

    tok := new(v1alpha1.ClusterConfig).Token()
	assert.Implements(t, (*config.Token)(nil), tok)

Code since it doesn't check anything - v1alpha1.ClusterConfig.Token() already returns a config.Token interface.

Also - run `go work sync` and `go mod tidy`.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-09-12 16:57:24 +03:00
Andrey Smirnov
161a52a9ef
feat: check apid client certificate extended key usage
This is enabled via a machine config feature/version contract, as
`talosconfig` certificate generated previously didn't have proper key
usage set, so we need to keep backwards compatibility on upgrades.

New v1.3+ clusters will include this check.

This check prevents even potential mis-use of server certificates as a
client certificate.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-09 16:37:21 +04:00
Andrey Smirnov
f62d17125b
chore: update crypto to use new import path siderolabs/crypto
No functional changes in this PR, just updating import paths.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-07 23:02:50 +04:00
Andrey Smirnov
f6fa746193
fix: limit apid backoff max delay
This fixes a case when a node is rebooted, and connection via another
endpoint apid "caches" a connection error even when the node is up.

E.g. this command:

```
talosctl -e IP1 -n IP2 version
```

If node `IP2` is rebooted, `apid` at `IP1` might enter long backoff loop
and return an error still when `IP2` is actually up.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-29 21:59:46 +04:00
Andrey Smirnov
8c3ac4c42b
chore: limit GOMAXPROCS for Talos services
Fixes #5971

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-24 15:42:49 +04:00
Andrey Smirnov
9baca49662
refactor: implement COSI resource API for Talos
Overview: deprecate existing Talos resource API, and introduce new COSI
API.

Consequences:

* COSI API can only go via one-2-one proxy (`client.WithNode`)
* client-side API access is way easier with `state.State` wrappers
* lots of small changes on the client side to use new APIs

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-12 22:31:54 +04:00
Andrey Smirnov
1dee0579e9
feat: add support for proxying one-to-one to apid
This adds a new metadata field `node` which performs always proxying to
a single node without touching any protobuf structs on the way.

So with `node`, we can call APIs which do not conform to the Talos API
proxying standards, but from the UX point of view things will work same
way, but multiplexing will be handled on the client side.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-08 15:39:22 +04:00
Andrey Smirnov
a6b010a8b4
chore: update Go to 1.19, Linux to 5.15.58
See https://go.dev/doc/go1.19

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-03 17:03:58 +04:00
Andrey Smirnov
2e790526f7
refactor: make apid stop gracefully and be stopped late
This fixes apid and machined shutdown sequences to do graceful stop of
gRPC server with timeout.

Also sequences are restructured to stop apid/machined as late as
possible allowing access to the node while the long sequence is running
(e.g. upgrade or reset).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-29 14:52:04 +04:00
Andrey Smirnov
065b59276c
feat: implement packet capture API
This uses the `go-packet` library with native bindings for the packet
capture (without `libpcap`). This is not the most performant way, but it
allows us to avoid CGo.

There is a problem with converting network filter expressions (like
`tcp port 3222`) into BPF instructions, it's only available in C
libraries, but there's a workaround with `tcpdump`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-19 01:23:09 +04:00
Dmitriy Matrenichev
70fc424099
chore: add generic methods and use them
Things like ToSet, Keys etc...

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-06-09 02:59:23 +08:00
Dmitriy Matrenichev
6351928611
chore: redo pointer with github.com/siderolabs/go-pointer module
With the advent of generics, redo pointer functionality and remove github.com/AlekSi/pointer dependency.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-05-02 02:17:13 +04:00
Dmitriy Matrenichev
9b9191c5e7
fix: increase intiial window and connection window sizes
For #4950

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-03-25 01:38:59 +04:00
Dmitriy Matrenichev
e06e1473b0
feat: update golangci-lint to 1.45.0 and gofumpt to 0.3.0
- Update golangci-lint to 1.45.0
- Update gofumpt to 0.3.0
- Fix gofumpt errors
- Add goimports and format imports since gofumports is removed
- Update Dockerfile
- Fix .golangci.yml configuration
- Fix linting errors

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-03-24 08:14:04 +04:00
Andrey Smirnov
dc9a0cfe94
chore: bump Go dependencies
Bump all dependencies, update `grpc.WithInsecure()` which is deprecated
now.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-12-20 23:05:32 +03:00
Alexey Palazhchenko
0f169bf9b1
chore: add API deprecations mechanism
Refs #4576.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
2021-11-30 06:31:55 +00:00
Andrey Smirnov
753a82188f
refactor: move pkg/resources to machinery
Fixes #4420

No functional changes, just moving packages around.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-11-15 19:50:35 +03:00
Alexey Palazhchenko
fc7dc45484
chore: check our API idiosyncrasies
Closes #3760.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
2021-11-08 11:50:27 +00:00
Alexey Palazhchenko
4dae9ea55c chore: use vtprotobuf compiled marshaling in Talos API
Use `vtprobuf` optimized Marshal/Unmarshal methods which do not depend
on reflection to reduce memory and CPU usage while using Talos API.

See https://github.com/planetscale/vtprotobuf and
https://vitess.io/blog/2021-06-03-a-new-protobuf-generator-for-go/

Co-authored-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
2021-08-09 08:42:13 -07:00
Alexey Palazhchenko
ad047a7dee chore: small RBAC improvements
* `talosctl config new` now sets endpoints in the generated config.
* Avoid duplication of roles in metadata.
* Remove method name prefix handling. All methods should be set explicitly.
* Add tests.

Closes #3421.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-06-25 05:50:38 -07:00
Andrey Smirnov
d8c2bca1b5 feat: reimplement apid certificate generation on top of COSI
This PR can be split into two parts:

* controllers
* apid binding into COSI world

Controllers
-----------

* `k8s.EndpointController` provides control plane endpoints on worker
nodes (it isn't required for now on control plane nodes)
* `secrets.RootController` now provides OS top-level secrets (CA cert)
and secret configuration
* `secrets.APIController` generates API secrets (certificates) in a bit
different way for workers and control plane nodes: controlplane nodes
generate directly, while workers reach out to `trustd` on control plane
nodes via `k8s.Endpoint` resource

apid Binding
------------

Resource `secrets.API` provides binding to protobuf by converting
itself back and forth to protobuf spec.

apid no longer receives machine configuration, instead it receives
gRPC-backed socket to access Resource API. apid watches `secrets.API`
resource, fetches certs and CA from it and uses that in its TLS
configuration.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-06-23 13:07:00 -07:00
Alexey Palazhchenko
f63ab9dd9b feat: implement talosctl config new command
Refs #3421.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-06-17 09:06:43 -07:00
Alexey Palazhchenko
da329f00ab feat: enable RBAC by default
Refs #3421.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-06-09 11:03:56 -07:00