3932 Commits

Author SHA1 Message Date
Andrey Smirnov
4e9c322564
fix: correctly render hosts.toml with multiple endpoints
We should preserve the order of keys in the generated `hosts.toml`, but
the go-toml library has no real way to do that when marshaling, so fix
the previous workaround, which was generating invalid TOML.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-29 15:34:42 +04:00
Andrey Smirnov
cdd0f08bc5
feat: check client <> server version in some Talos commands
Talos commands which are sensitive to resource API changes:

* `get`
* `edit`, `patch`
* `upgrade-k8s`

Commands with upcoming changes for actorID:

* `reboot`
* `reset`
* `shutdown`
* `upgrade`
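A hedged sketch of a client <> server version check. The commit does not state the exact compatibility rule; this version assumes that a matching major.minor pair is required, and `majorMinor`/`checkVersions` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorMinor extracts the major and minor components from a tag such as
// "v1.2.0" or "v1.2.0-beta.1" (hypothetical helper).
func majorMinor(tag string) (int, int, error) {
	parts := strings.SplitN(strings.TrimPrefix(tag, "v"), ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("malformed version %q", tag)
	}

	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, err
	}

	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, err
	}

	return major, minor, nil
}

// checkVersions returns an error when client and server differ in major or
// minor version (assumed here to be where resource API changes happen).
func checkVersions(client, server string) error {
	cMaj, cMin, err := majorMinor(client)
	if err != nil {
		return err
	}

	sMaj, sMin, err := majorMinor(server)
	if err != nil {
		return err
	}

	if cMaj != sMaj || cMin != sMin {
		return fmt.Errorf("client %s is not compatible with server %s", client, server)
	}

	return nil
}
```

A real CLI might print a warning instead of failing; that policy choice is outside this sketch.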

Fixes #6101

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-26 18:37:51 +04:00
Noel Georgi
446b0af58b
chore: bump kernel and runc
Bump kernel to [5.15.63](https://github.com/siderolabs/pkgs/pull/564)
Bump runc to [v1.1.4](https://github.com/siderolabs/pkgs/pull/568)

This PR also brings in the kernel build with NFSv4.2 [client support](https://github.com/siderolabs/pkgs/pull/567)

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-08-26 19:01:56 +05:30
Andrey Smirnov
8c203ce9b1
feat: remove the machine from the discovery service on reset
Fixes #6137

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-25 22:05:52 +04:00
Dmitriy Matrenichev
b59ca5810e
chore: move from inet.af/netaddr to net/netip and go4.org/netipx
Closes #6007
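As a hedged illustration (not code from this change), the standard library `net/netip` value types cover the parse-and-contains operations that `netaddr.IP` and `netaddr.IPPrefix` were commonly used for, while `go4.org/netipx` supplies extras such as IP ranges and sets. The `inPrefix` helper is hypothetical.

```go
package main

import "net/netip"

// inPrefix reports whether addr belongs to prefix using the standard
// library net/netip types in place of inet.af/netaddr.
func inPrefix(addr, prefix string) (bool, error) {
	a, err := netip.ParseAddr(addr)
	if err != nil {
		return false, err
	}

	p, err := netip.ParsePrefix(prefix)
	if err != nil {
		return false, err
	}

	return p.Contains(a), nil
}
```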

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-08-25 17:51:32 +03:00
Andrey Smirnov
053af1d59e
fix: update etcd certificates when node addresses change
Fixes #6110

I somehow missed the fact that etcd certs were not made fully reactive
to node address changes (I wrongly assumed they already were).

This PR refactors the etcd certificate generation process to be
resource-based and introduces unit tests for the controller.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-25 00:27:52 +04:00
Andrey Smirnov
11edb2c6f8
test: re-enable upgrade tests
The final upgrade version is now COSI API compatible.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-24 22:23:49 +04:00
Dmitriy Matrenichev
0310e20890
chore: bump github.com/siderolabs/protoenc to v0.1.5
Get improvements from the new version

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-08-24 19:57:56 +03:00
Dmitriy Matrenichev
29bd632401
chore: remove old build tags syntax
This commit removes lines containing the old build tag syntax (`// +build`).

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-08-24 17:27:01 +03:00
Noel Georgi
b500d0aa90
chore: bump k8s to v1.25.0
Bump k8s to
[v1.25.0](https://github.com/kubernetes/kubernetes/releases/tag/v1.25.0)

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-08-24 18:58:44 +05:30
Noel Georgi
29e574be74
docs: update to v1.2.0-beta.1
Update Talos version in docs to v1.2.0-beta.1

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-08-24 18:21:36 +05:30
Andrey Smirnov
26b549f2a1
chore: bump dependencies
dependabot + go-mod-outdated

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-24 16:02:02 +04:00
Andrey Smirnov
8c3ac4c42b
chore: limit GOMAXPROCS for Talos services
Fixes #5971

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-24 15:42:49 +04:00
Andrey Smirnov
361e85b744
fix: properly read kexec disabled sysctl
Fixes #6046

Fix by @bzub

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-24 00:06:14 +04:00
Noel Georgi
cfe6c2bc2d
docs: nvidia oss drivers
Add docs on using NVIDIA OSS drivers

Part of #6127

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-08-23 20:34:39 +05:30
Andrey Smirnov
2f2d97b6b5
fix: don't wait for the hostname in maintenance mode
Fixes #6119

With the new stable default hostname feature, any default hostname is
disabled until the machine config is available.

Talos enters maintenance mode when the default config source is empty,
so it doesn't have any machine config available at the moment the
maintenance service is started.

The hostname might be set via different sources, e.g. kernel args or
DHCP, before the machine config is available, but if none of these
sources is available, the hostname won't be set at all.

This stops waiting for the hostname, and skips setting any DNS names in
the maintenance mode certificate SANs if the hostname is not available.

Also adds a regression test via new `--disable-dhcp-hostname` flag to
`talosctl cluster create`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-23 17:52:20 +04:00
Noel Georgi
b15a639246
chore: bump kernel to 5.15.62
Bump kernel to 5.15.62. Ref: https://github.com/siderolabs/pkgs/pull/559

This PR uses pkgs from https://github.com/siderolabs/pkgs/pull/562

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-08-23 18:12:27 +05:30
Andrey Smirnov
a0d94be30d
fix: stable default hostname bias
When converting a 256-bit number to base36, there's a bias in the
first character of the encoding, as 2^256 is not a power of 36, so a
256-bit number never maps perfectly onto a fixed number of base36 digits.

To give an example, when converting a 5-digit binary number to decimal,
the first digit of the decimal number will be in [0..3], while the
second digit is far less biased:

```
00000 -> 00
00001 -> 01
...
01111 -> 15
10000 -> 16
...
11111 -> 31
```

The same issue happens when going from, e.g., base16 to base36.

Stable hostnames were biased towards having a digit as the first
character.

The fix is to skip the first character of the base36 representation.
Also, as only 6 characters are used, there is no need to convert all
256 bits to base36: taking only 8 bytes instead of the full 32 bytes
saves some CPU resources.
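A sketch of the technique described above: hash, take 8 bytes, encode in base36, drop the biased leading digit, keep 6 characters. The seed input, the fixed-width zero padding, and the name `stableSuffix` are assumptions of this sketch, not taken from the Talos source.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"strconv"
	"strings"
)

// base36Digits is len(strconv.FormatUint(math.MaxUint64, 36)).
const base36Digits = 13

// stableSuffix derives a 6-character base36 string from a hypothetical seed.
func stableSuffix(seed []byte) string {
	sum := sha256.Sum256(seed)

	// 8 bytes are plenty for 6 base36 characters (36^6 < 2^32).
	v := binary.BigEndian.Uint64(sum[:8])

	s := strconv.FormatUint(v, 36)
	// Left-pad (an assumption here) so the skipped character is always the
	// leading digit of a fixed-width representation.
	s = strings.Repeat("0", base36Digits-len(s)) + s

	// The leading digit of a 64-bit value is only ever 0..3, so drop it.
	return s[1:7]
}
```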

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-22 21:36:05 +04:00
Andrey Smirnov
da4cd34ef5
feat: update etcd advertised peer addresses on the fly
This allows updating the member information (for the current node)
with new advertised peer URLs as the config changes.

E.g. if the node IP changes, this will update the peer URLs for the
member accordingly.

At the same time, any member update requires quorum, so IPs can only be
changed on a node-by-node basis.

If there are no changes to the advertised peer URLs, the controller
does nothing.

A Talos node might still need a reboot to update the listen addresses,
as these are not handled automatically for now.
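The "do nothing if unchanged" rule above can be sketched as an order-insensitive comparison. `peerURLsChanged` is hypothetical; a controller would call etcd's member update API (clientv3 `MemberUpdate`) only when it returns true.

```go
package main

import "sort"

// peerURLsChanged reports whether the desired advertised peer URLs differ
// from the currently registered ones, ignoring order.
func peerURLsChanged(current, desired []string) bool {
	if len(current) != len(desired) {
		return true
	}

	// Sort copies so the caller's slices are left untouched.
	a := append([]string(nil), current...)
	b := append([]string(nil), desired...)
	sort.Strings(a)
	sort.Strings(b)

	for i := range a {
		if a[i] != b[i] {
			return true
		}
	}

	return false
}
```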

Fixes #6080

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-22 19:49:51 +04:00
Noel Georgi
faf92ce016
chore: bump kubernetes to v1.25.0-rc.1
Bump kubernetes to v1.25.0-rc.1

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-08-19 00:21:23 +05:30
Noel Georgi
52de919e34
chore: bump containerd to v1.6.8
Bump containerd to [v1.6.8](https://github.com/siderolabs/pkgs/pull/552)

Use the fixed [pkgs version](https://github.com/siderolabs/pkgs/pull/555)

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-08-18 21:31:50 +05:30
Philipp Sauter
7d43fc79b1
fix: make 'ca', 'crt' and 'key' flags optional for 'talosctl config add'
As the 'ca', 'crt' and 'key' parameters are now optional for the Talos
client, requiring them for the 'talosctl config add' command no longer
makes sense.

Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>
2022-08-17 16:51:10 +02:00
Artem Chernyshev
fd467e02c1
fix: handle grub config being empty in the Revert function
It looks like the grub config is nil if it doesn't exist, and the code
didn't handle that case properly.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-08-16 23:05:43 +03:00
Artem Chernyshev
9492aca652
fix: clean up cancelCtxMu leftovers in PriorityLock
Removed it from one place but forgot to clean up the other usages.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-08-16 13:19:56 +03:00
Noel Georgi
61e3eb2eaa
fix: talosctl edit mc loop
Fixes re-opening the editor forever when using `talosctl edit mc`.
Also fixes the temp dir filling up with temporary files created for
editing the machine config.

Fixes: #6098

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-08-16 05:37:19 +05:30
Artem Chernyshev
32db7a7f5d
fix: surround cancelCtx with the mutex
It looks like access to `cancelCtx` from different goroutines wasn't
protected.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-08-15 22:21:07 +03:00
Philipp Sauter
f37da96ef3
feat: enable talos client to connect to Talos through an auth proxy
The Talos client can connect to the Talos API via a proxy with basic
auth. Additionally, specifying a TLS CA, key or crt is now optional.
Developers can optionally build talosctl with WITH_DEBUG=1 to allow
insecure connections when http:// endpoints are specified.
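For reference, a basic auth credential is sent as an `authorization` header value per RFC 7617. How talosctl attaches it to gRPC calls is not described here (a gRPC client would typically send it as per-RPC metadata), and `basicAuthHeader` is a hypothetical helper.

```go
package main

import "encoding/base64"

// basicAuthHeader builds an HTTP Basic authorization header value
// ("Basic " + base64("user:password")).
func basicAuthHeader(user, password string) string {
	return "Basic " + base64.StdEncoding.EncodeToString([]byte(user+":"+password))
}
```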

Fixes #5980

Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>
2022-08-15 18:05:26 +02:00
Noel Georgi
123d32174e
chore: validate that etcd ca is not empty
Validate that the etcd CA is not empty in machine configuration.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-08-15 21:15:40 +05:30
Dmitriy Matrenichev
0fe4492e72
chore: bump golangci-lint from 1.47.2 to 1.48.0
Minor version linter upgrade.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-08-15 18:11:30 +03:00
Andrey Smirnov
7e527777e8
chore: update API descriptors
Re-generate protobuf API descriptors in preparation for 1.2.0-beta.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-15 17:35:09 +04:00
Andrey Smirnov
65098c14e6
chore: bump to the final released versions
In preparation for Talos release 1.2.0, update tools/pkgs/extras to
1.2.0.

Also update Go modules to released versions.

There should be no actual changes.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-15 17:15:23 +04:00
Noel Georgi
9512e8f301
feat: allow modules to be loaded via extension
Allow modules to be loaded via [extensions](https://github.com/siderolabs/extensions/pull/52).

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-08-15 17:14:38 +05:30
Andrey Smirnov
2c482936bb
chore: bump dependencies
dependabot + go-mod-tidy

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-15 15:03:07 +04:00
Utku Ozdemir
586e29dfca
feat: add event actor id to client api and events cmd
Add the missing actor ID to the event, and a way to filter by it in the `events` CLI command.

Related to siderolabs/talos#5499.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-08-12 22:01:34 +02:00
Andrey Smirnov
9baca49662
refactor: implement COSI resource API for Talos
Overview: deprecate existing Talos resource API, and introduce new COSI
API.

Consequences:

* COSI API can only go via one-to-one proxy (`client.WithNode`)
* client-side API access is way easier with `state.State` wrappers
* lots of small changes on the client side to use new APIs

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-12 22:31:54 +04:00
Utku Ozdemir
d04211f85f
feat: add new event watch fn and return action responses on API
Add a new function EventsWatchV2 that blocks until receiving the first event, then switches to non-blocking mode.

Also add new API functions to return responses of the lifecycle actions `reboot`, `reset` and `shutdown`.

Required for the client-side part of siderolabs/talos#5499.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-08-12 19:07:02 +02:00
Steve Francis
f88d08e21b
docs: clarification of AWS set up process
AWS documentation fixes.

Signed-off-by: Steve Francis <steve.francis@talos-systems.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-12 19:34:14 +04:00
Andrey Smirnov
b48adb8ec5
chore: revert kernel with BTF support
It might be causing CI instability, so let's try backing it out.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-12 18:29:04 +04:00
Dmitriy Matrenichev
e422ea63d0
chore: add proto definitions for common types
This commit adds proto definitions for these types:
- *url.URL
- netaddr.IP
- netaddr.IPPort
- netaddr.IPPrefix
- *x509.PEMEncodedKey
- *x509.PEMEncodedCertificateAndKey

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-08-12 15:38:31 +03:00
Artem Chernyshev
5c6648e3d2
fix: make talosctl command return nonzero error codes if it had errors
Multinode requests were printing out the errors for each node to stderr,
but they didn't set the global error.

Refactor the code a bit to use a single function for handling that logic
to avoid rewriting it in many other places.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-08-12 14:19:45 +03:00
Andrey Smirnov
dce923f747
feat: allow configuring etcd listen addresses
This introduces new configuration settings to configure
advertised/listen subnets. For backwards compatibility, when using no
settings or the old 'subnet' argument, etcd still listens on all
addresses.

If the new `advertisedSubnets` setting is used, this automatically
limits etcd listen addresses to the same value. `listenSubnets` can
also be configured explicitly, e.g. to listen on additional addresses
for other scenarios (such as accessing etcd from outside of the
cluster).

See #5668

One more thing left (for a separate PR) is to update etcd advertised
URLs on the fly.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-12 14:57:54 +04:00
Andrey Smirnov
4c3485ae3f
feat: update Kubernetes to 1.25.0-rc.0
See https://github.com/kubernetes/kubernetes/releases/tag/v1.25.0-rc.0

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-12 00:17:45 +04:00
Noel Georgi
ea6ceab245
chore: bump kernel to 5.15.60
Bump kernel to [5.15.60](https://github.com/siderolabs/pkgs/pull/547)

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-08-12 00:34:29 +05:30
Andrey Smirnov
20a5640857
fix: introduce 'routed' NodeAddresses and use them in kubelet
Same change will be done for the etcd in a separate PR.

The idea is to introduce a subset of `current` addresses: `routed`
addresses don't include external IPs (e.g. on AWS), as they are not on
the node, and exclude SideroLink IPs (as these are not routable).

Reimplement `kubelet` nodeIP selection based on the new resources
removing the reliance on `net.IPAddrs`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-11 21:11:47 +04:00
Trevor Sullivan
f1de478943
docs: verbiage in Digital Ocean tutorial
Fix verbiage in the DigitalOcean tutorial.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-08-11 21:03:11 +05:30
Philipp Sauter
6b23deddcf
feat: support custom ports for connecting to apid from talosctl
Users can now add a port suffix to the endpoints used by talosctl,
either in the CLI flag or in ~/.talos/config. The default port is still 50000.
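A sketch of defaulting the port, assuming endpoints are plain `host` or `host:port` strings. `endpointWithPort` is hypothetical, not talosctl's parser; only the default port 50000 comes from this commit.

```go
package main

import (
	"net"
	"strings"
)

// defaultAPIDPort is the default apid port.
const defaultAPIDPort = "50000"

// endpointWithPort appends the default port to an endpoint that has none,
// leaving explicit ports untouched. Bare ("fd00::1") and bracketed
// ("[fd00::1]") IPv6 literals are both accepted.
func endpointWithPort(endpoint string) string {
	if _, _, err := net.SplitHostPort(endpoint); err == nil {
		return endpoint
	}

	// Strip brackets so JoinHostPort does not double them.
	host := strings.TrimSuffix(strings.TrimPrefix(endpoint, "["), "]")

	return net.JoinHostPort(host, defaultAPIDPort)
}
```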

Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>
2022-08-11 16:52:46 +02:00
Noel Georgi
07cd0924ea
fix: recursive seccomp mounts
Since `/var/lib/kubelet` was mounted with `rbind` and `rshared`,
mounting the host's seccomp profiles from `/var/lib/seccomp/profiles`
at `/var/lib/kubelet/seccomp/profiles` would propagate a mount back to
the host, creating an extra mount every time kubelet starts or restarts.

Fix the issue by using the same path for the seccomp profiles on both
host and kubelet.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-08-11 19:48:45 +05:30
Andrey Smirnov
696f2b735e
chore: update kernel to the version with BTF support
See https://github.com/siderolabs/pkgs/pull/482

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-11 17:50:47 +04:00
Utku Ozdemir
b5da686a7b
feat: add actor ID to events & emit an initial empty event
Add a new field `actorID` to the events and populate it with a UUID for the lifecycle actions `reboot`, `reset`, `upgrade` and `shutdown`. This actor ID will be present on all events emitted by this triggered action. We can use this ID later on the client side to be able to track triggered actions.

We also emit an event with an empty payload on the events streaming gRPC endpoint when a client connects. The purpose of this event is to signal to the client that the event streaming has actually started.
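A minimal sketch of generating a random (version 4) UUID actor ID with only the standard library. The real implementation may use a UUID library, and `newActorID`/`lifecycleEvent` are hypothetical names.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newActorID returns a random version 4 UUID string.
func newActorID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}

	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant

	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// lifecycleEvent is a hypothetical event carrying the actor ID; the empty
// initial event sent on connect would simply have a nil Payload.
type lifecycleEvent struct {
	ActorID string
	Payload any
}
```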

Server-side part of siderolabs/talos#5499.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-08-11 15:14:11 +02:00
Dmitriy Matrenichev
fec0ed29d4
fix: add missing LinkStatusType registration
Forgot about it. Also bump protoenc and fix encoders/decoders.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-08-11 14:59:29 +03:00