70 Commits

Author SHA1 Message Date
Dmitriy Matrenichev
fc48849d00
chore: move maps/slices/ordered to gen module
Use github.com/siderolabs/gen

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-09-21 20:22:43 +03:00
Andrey Smirnov
2dadcd6695
fix: stop worker nodes from acting as apid routers
Don't allow worker nodes to act as apid routers:

* don't try to issue client certificate for apid on worker nodes
* if worker nodes receives incoming connections with `--nodes` set to
  one of the local addresses of the nodd, it routes the request to
  itself without proxying

Second point allows using `talosctl -e worker -n worker` to connect
directly to the worker if the connection from the control plane is not
available for some reason.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-13 15:07:31 +04:00
Dmitriy Matrenichev
12827b861c
chore: move "implements" checks to compile time
There is no need to use `assert.Implements` since we can express this check during compile time. Go will eliminate `_` variables and any accompanying allocations during dead-code elimination phase.

This commit also removes:

    tok := new(v1alpha1.ClusterConfig).Token()
	assert.Implements(t, (*config.Token)(nil), tok)

Code since it doesn't check anything - v1alpha1.ClusterConfig.Token() already returns a config.Token interface.

Also - run `go work sync` and `go mod tidy`.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-09-12 16:57:24 +03:00
Andrey Smirnov
161a52a9ef
feat: check apid client certificate extended key usage
This is enabled via a machine config feature/version contract, as
`talosconfig` certificate generated previously didn't have proper key
usage set, so we need to keep backwards compatibility on upgrades.

New v1.3+ clusters will include this check.

This check prevents even potential mis-use of server certificates as a
client certificate.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-09 16:37:21 +04:00
Andrey Smirnov
f62d17125b
chore: update crypto to use new import path siderolabs/crypto
No functional changes in this PR, just updating import paths.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-07 23:02:50 +04:00
Andrey Smirnov
f6fa746193
fix: limit apid backoff max delay
This fixes a case when a node is rebooted, and connection via another
endpoint apid "caches" a connection error even when the node is up.

E.g. this command:

```
talosctl -e IP1 -n IP2 version
```

If node `IP2` is rebooted, `apid` at `IP1` might enter long backoff loop
and return an error still when `IP2` is actually up.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-29 21:59:46 +04:00
Andrey Smirnov
8c3ac4c42b
chore: limit GOMAXPROCS for Talos services
Fixes #5971

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-24 15:42:49 +04:00
Andrey Smirnov
9baca49662
refactor: implement COSI resource API for Talos
Overview: deprecate existing Talos resource API, and introduce new COSI
API.

Consequences:

* COSI API can only go via one-2-one proxy (`client.WithNode`)
* client-side API access is way easier with `state.State` wrappers
* lots of small changes on the client side to use new APIs

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-12 22:31:54 +04:00
Andrey Smirnov
1dee0579e9
feat: add support for proxying one-to-one to apid
This adds a new metadata field `node` which performs always proxying to
a single node without touching any protobuf structs on the way.

So with `node`, we can call APIs which do not conform to the Talos API
proxying standards, but from the UX point of view things will work same
way, but multiplexing will be handled on the client side.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-08 15:39:22 +04:00
Andrey Smirnov
a6b010a8b4
chore: update Go to 1.19, Linux to 5.15.58
See https://go.dev/doc/go1.19

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-03 17:03:58 +04:00
Andrey Smirnov
2e790526f7
refactor: make apid stop gracefully and be stopped late
This fixes apid and machined shutdown sequences to do graceful stop of
gRPC server with timeout.

Also sequences are restructured to stop apid/machined as late as
possible allowing access to the node while the long sequence is running
(e.g. upgrade or reset).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-29 14:52:04 +04:00
Andrey Smirnov
065b59276c
feat: implement packet capture API
This uses the `go-packet` library with native bindings for the packet
capture (without `libpcap`). This is not the most performant way, but it
allows us to avoid CGo.

There is a problem with converting network filter expressions (like
`tcp port 3222`) into BPF instructions, it's only available in C
libraries, but there's a workaround with `tcpdump`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-19 01:23:09 +04:00
Dmitriy Matrenichev
70fc424099
chore: add generic methods and use them
Things like ToSet, Keys etc...

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-06-09 02:59:23 +08:00
Dmitriy Matrenichev
6351928611
chore: redo pointer with github.com/siderolabs/go-pointer module
With the advent of generics, redo pointer functionality and remove github.com/AlekSi/pointer dependency.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-05-02 02:17:13 +04:00
Dmitriy Matrenichev
9b9191c5e7
fix: increase intiial window and connection window sizes
For #4950

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-03-25 01:38:59 +04:00
Dmitriy Matrenichev
e06e1473b0
feat: update golangci-lint to 1.45.0 and gofumpt to 0.3.0
- Update golangci-lint to 1.45.0
- Update gofumpt to 0.3.0
- Fix gofumpt errors
- Add goimports and format imports since gofumports is removed
- Update Dockerfile
- Fix .golangci.yml configuration
- Fix linting errors

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-03-24 08:14:04 +04:00
Andrey Smirnov
dc9a0cfe94
chore: bump Go dependencies
Bump all dependencies, update `grpc.WithInsecure()` which is deprecated
now.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-12-20 23:05:32 +03:00
Alexey Palazhchenko
0f169bf9b1
chore: add API deprecations mechanism
Refs #4576.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
2021-11-30 06:31:55 +00:00
Andrey Smirnov
753a82188f
refactor: move pkg/resources to machinery
Fixes #4420

No functional changes, just moving packages around.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-11-15 19:50:35 +03:00
Alexey Palazhchenko
fc7dc45484
chore: check our API idiosyncrasies
Closes #3760.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
2021-11-08 11:50:27 +00:00
Alexey Palazhchenko
4dae9ea55c chore: use vtprotobuf compiled marshaling in Talos API
Use `vtprobuf` optimized Marshal/Unmarshal methods which do not depend
on reflection to reduce memory and CPU usage while using Talos API.

See https://github.com/planetscale/vtprotobuf and
https://vitess.io/blog/2021-06-03-a-new-protobuf-generator-for-go/

Co-authored-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
2021-08-09 08:42:13 -07:00
Alexey Palazhchenko
ad047a7dee chore: small RBAC improvements
* `talosctl config new` now sets endpoints in the generated config.
* Avoid duplication of roles in metadata.
* Remove method name prefix handling. All methods should be set explicitly.
* Add tests.

Closes #3421.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-06-25 05:50:38 -07:00
Andrey Smirnov
d8c2bca1b5 feat: reimplement apid certificate generation on top of COSI
This PR can be split into two parts:

* controllers
* apid binding into COSI world

Controllers
-----------

* `k8s.EndpointController` provides control plane endpoints on worker
nodes (it isn't required for now on control plane nodes)
* `secrets.RootController` now provides OS top-level secrets (CA cert)
and secret configuration
* `secrets.APIController` generates API secrets (certificates) in a bit
different way for workers and control plane nodes: controlplane nodes
generate directly, while workers reach out to `trustd` on control plane
nodes via `k8s.Endpoint` resource

apid Binding
------------

Resource `secrets.API` provides binding to protobuf by converting
itself back and forth to protobuf spec.

apid no longer receives machine configuration, instead it receives
gRPC-backed socket to access Resource API. apid watches `secrets.API`
resource, fetches certs and CA from it and uses that in its TLS
configuration.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-06-23 13:07:00 -07:00
Alexey Palazhchenko
f63ab9dd9b feat: implement talosctl config new command
Refs #3421.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-06-17 09:06:43 -07:00
Alexey Palazhchenko
da329f00ab feat: enable RBAC by default
Refs #3421.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-06-09 11:03:56 -07:00
Alexey Palazhchenko
5ad314fe7e feat: implement basic RBAC interceptors
It is not enforced yet.

Refs #3421.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-06-07 09:28:22 -07:00
Alexey Palazhchenko
c81cfb2167 chore: allow building with debug handlers
Refs #3534.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-05-13 02:20:15 -07:00
Andrey Smirnov
e664362cec feat: add API and command to save etcd snapshot (backup)
This adds a simple API and `talosctl etcd snapshot` command to stream
snapshot of etcd from one of the control plane nodes to the local file.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-04-02 09:20:16 -07:00
Andrey Smirnov
b0209fd29d refactor: move networkd, timed APIs to machined, remove routerd
This moves implementation of the user-facing APIs to the machined, and
as now all the APIs are implemented by machined, remove routerd and
adjust apid to proxy to machined.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-24 00:00:28 -07:00
Andrey Smirnov
ac8764702f refactor: move apid, routerd, timed and trustd to single executable
This removes container images for the aforementioned services, they are
now built into `machined` executable which launches one or another
service based on `argv[0]`.

Containers are started with rootfs directory which contains only a
single executable file for the service.

This creates rootfs on squashfs for each container in
`/opt/<container>`.

Service `networkd` is not touched as it's handled in #3350.

This removes all the image imports, snapshots and other things which
were associated with the existing way to run containers.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-23 09:48:11 -07:00
Alexey Palazhchenko
df52c13581 chore: fix //nolint directives
That's the recommended syntax:
https://golangci-lint.run/usage/false-positives/

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-03-05 05:58:33 -08:00
Andrey Smirnov
953ce643ab feat: bump etcd client library to 3.5.0-alpha.0
This version is finally using working `go.mod` files and tags, so no
more hacks with imports, and allows us to bump `grpc` library to the
latest version (I also did for this PR).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-25 10:36:15 -08:00
Andrey Smirnov
5855b8d532 fix: refresh control plane endpoints on worker apids on schedule
This moves endpoint refresh from the context of the service `apid` in
`machined` into `apid` service itself for the workers. `apid` does
initial poll for the endpoints when it boots, but also periodically
polls for new endpoints to make sure it has accurate list of `trustd`
endpoints to talk to, this handles cases when control plane endpoints
change (e.g. rolling replace of control plane nodes with new IPs).

Related to #3069

Fixes #3068

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-03 14:27:03 -08:00
Andrey Smirnov
389349c02b fix: use grpc load-balancing when connecting to trustd
Instead of doing our homegrown "try all the endpoints" method,
use gRPC load-balancing across configured endpoints.

Generalize load-balancer via gRPC resolver we had in Talos API client,
use it in remote certificate generator code. Generalized resolver is
still under `machinery/`, as `pkg/grpc` is not in `machinery/`, and we
can't depend on Talos code from `machinery/`.

Related to: #3068

Full fix for #3068 requires dynamic updates to control plane endpoints
while apid is running, this is coming in the next PR.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-01 16:48:00 -08:00
Andrey Smirnov
512c79e8d6 fix: lower memory usage a bit by disabling memory profiling
As of now, we're not using Go profiling, so it's safe to disable it to
save some memory and CPU costs. Once we start using it, we can re-enable
it conditionally.

Each process allocates around 1.4MiB on amd64 for memory profiling
buckets.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-01 04:49:59 -08:00
Andrey Smirnov
11863dd74d feat: implement resource API in Talos
This brings in `os-runtime` package and exposes resources with first
iteration of read-only API.

Two Talos resources (and one controller) are implemented:

* legacy.Service resource tracks Talos 'service' `RUNNING` state
* config.V1Alpha1 stores current runtime config

Glue point between existing runtime and new os-runtime based runtime is
in `v1alpha2` implementation and `V1Alpha2()` sub-interfaces of existing
`Runtime`, `State`, `Controller` interfaces.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-01-19 11:45:46 -08:00
Andrey Smirnov
6a0e652f0c fix: correctly transport gRPC errors from apid
Before these changes, errors were always sent as strings, so if original
error was gRPC error (which is almost always the case for apid), it is
formatted as string and original fields (like code) are lost in the
formatted string.

With this change, apid sends errors as official `grpc.Status` protobuf
structure, and client decodes that into Go grpc.Status based error.

This change is backwards and forwards compatible.

This should fix more cases when integration tests were not able to
ignore grpc `transport is closing` errors when they were sent as strings
from the apid endpoint.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-23 11:08:51 -08:00
Andrey Smirnov
a2efa44663 chore: enable gci linter
Fixes were applied automatically.

Import ordering might be questionable, but it's strict:

* stdlib
* other packages
* same package imports

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-09 08:09:48 -08:00
Artem Chernyshev
e7e99cf1b3 feat: support disk usage command in talosctl
Usage example:

```bash
talosctl du --nodes 10.5.0.2 /var -H -d 2
NODE       NAME
10.5.0.2   8.4 kB   etc
10.5.0.2   1.3 GB   lib
10.5.0.2   16 MB    log
10.5.0.2   25 kB    run
10.5.0.2   4.1 kB   tmp
10.5.0.2   1.3 GB   .
```

Supported flags:
- `-a` writes counts for all files, not just directories.
- `-d` recursion depth
- '-H' humanize size outputs.
- '-t' size threshold (skip files if < size or > size).

Fixes: https://github.com/talos-systems/talos/issues/2504

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-10-13 09:30:31 -07:00
Andrew Rynhard
d4f103ffcb fix: pass config via stdin
In order to perform upgrades the way we would like, it is important that
we avoid any bind mounts into containers. This change ensures that all
system services get their config via stdin.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-08-20 15:26:13 -07:00
Andrey Smirnov
bddd4f1bf6 refactor: move external API packages into machinery/
This moves `pkg/config`, `pkg/client` and `pkg/constants`
under `pkg/machinery` umbrella.

And `pkg/machinery` is published as Go module inside Talos repository.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-17 09:56:14 -07:00
Andrey Smirnov
7875e9499f chore: re-import talos-systems/pkg/crypto/tls
See also https://github.com/talos-systems/crypto/pull/2

This should break dependency of `pkg/client` on `pkg/grpc`.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-17 08:06:38 -07:00
Andrey Smirnov
bd0f4f0564 refactor: rework pkg/grpc/tls to break dependency on pkg/grpc/gen
The goal of `pkg/grpc/tls` is to generate `*tls.Config` based on some
input parameters, but it had dependency on `pkg/grpc/gen` for 'remote'
certificate provider (it uses trustd client to sign CSRs).

Package `pkg/client` which is a part of future `machinery/` module
depends on `pkg/grpc/tls` for TLS config generation, so this pulls
`pkg/grpc` into `machinery/`, while it's not really good idea as most of
`pkg/grpc` is about server-side gRPC handling.

So the idea is to move `pkg/grpc/tls` (which has nothing to do with
gRPC), to `github.com/talos-systems/crypto/tls`, so we need to make sure
it has no dependencies on other Talos code.

The idea of this refactoring is to squash local & remote certificate
renewing providers as they had common part extracted, but even after
that almost all the code was identical except for different generators
beind used.

There should be no functional changes with this PR.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-17 06:41:21 -07:00
Andrey Smirnov
2697b99b7d refactor: extract pkg/net as github.com/talos-systems/net
This extracts common package as new module/repository.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-14 11:04:50 -07:00
Andrey Smirnov
47608fb874 refactor: make pkg/config not rely on machined/../internal/runtime
This makes `pkg/config` directly importable from other projects.

There should be no functional changes.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-29 12:40:12 -07:00
Andrey Smirnov
c54639e541 feat: implement server-side API for cluster health checks
This implements existing server-side health checks as defined in
`internal/pkg/cluster/checks` in Talos API.

Summary of changes:

* new `cluster` API

* `apid` now listens without auth on local file socket

* `cluster` API is for now implemented in `machined`, but we can move it
to the new service if we find it more appropriate

* `talosctl health` by default now does server-side health check

UX: `talosctl health` without arguments does health check for the
cluster if it has healthy K8s to return master/worker nodes. If needed,
node list can be overridden with flags.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-15 13:52:13 -07:00
Andrey Smirnov
cbb7ca8390 refactor: merge osd into machined
This merges `osd` API into `machined`. API was copied from `osd` into
`machined`, and `osd` API was deprecated.

For backwards compatibility, `machined` still implements `osd` API, so
older Talos API clients can still talk to the node without changes.

Docs were updated. No functional changes.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-13 12:50:00 -07:00
Andrew Rynhard
a5a2d959ed feat: upgrade runc to v1.0.0-rc90
This updates runc to the same version vendored by containerd.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-07-02 13:19:33 -07:00
Andrey Smirnov
81d1c2bfe7 chore: enable godot linter
Issues were fixed automatically.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-06-30 10:39:56 -07:00
Seán C McCord
c23d8ff49e fix: allow node names
This allows node names for APID, but checks for empty or port-appended
addresses in a best-effort manner.
Also adds common function for checking addresses for the presence of an
appended port.

Fixes #2163

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2020-06-05 08:46:22 -07:00