3256 Commits

Author SHA1 Message Date
Dmitriy Matrenichev
c0709d9707
feat: increase aio-max-nr and inotify.max_user_instances
Increase values:
- fs.aio-max-nr to 1048576 (for Ceph, Veritas, and other storage systems)
- fs.inotify.max_user_instances to 8192 (since the usual 512 is too small for today's needs)

There is no need to adjust fs.inotify.max_user_watches, since the kernel sets it dynamically during startup.
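Each sysctl name above corresponds to a file under `/proc/sys`, with dots replaced by path separators. The Go sketch below shows that mapping for the two raised limits. `sysctlPath` and `raisedLimits` are hypothetical names, not Talos code, and the sketch only prints the target paths instead of writing to them (which would require root).

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strings"
)

// raisedLimits holds the two values changed in this commit (hypothetical name).
var raisedLimits = map[string]string{
	"fs.aio-max-nr":                 "1048576",
	"fs.inotify.max_user_instances": "8192",
}

// sysctlPath converts a dotted sysctl name into its /proc/sys file path,
// e.g. "fs.aio-max-nr" -> "/proc/sys/fs/aio-max-nr".
func sysctlPath(name string) string {
	return filepath.Join(append([]string{"/proc/sys"}, strings.Split(name, ".")...)...)
}

func main() {
	names := make([]string, 0, len(raisedLimits))
	for n := range raisedLimits {
		names = append(names, n)
	}
	sort.Strings(names) // deterministic output order

	for _, n := range names {
		// Applying the value would mean writing it to this file as root;
		// this sketch only prints what would be written.
		fmt.Printf("%s <- %s\n", sysctlPath(n), raisedLimits[n])
	}
}
```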

Closes #5175

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-04-26 18:29:29 +04:00
Andrey Smirnov
85b328e997
refactor: convert secrets resources to use typed.Resource
No functional changes.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-26 14:51:56 +03:00
Andrey Smirnov
e91350acd7
refactor: convert time & v1alpha1 resources to use typed.Resource
No functional changes, just pure refactoring.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-25 22:41:52 +03:00
Andrey Smirnov
45464412e0
chore: bump dependencies
dependabot + go-mod-outdated

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-25 16:26:41 +03:00
Andrey Smirnov
0af6b35a66
feat: update etcd to 3.5.4
See https://github.com/etcd-io/etcd/releases/tag/v3.5.4

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-25 15:49:02 +03:00
Tim Jones
7ad27751cb
docs: fix analytics and sitemap
Fixes the Google Analytics tracking ID and
restores the production sitemap.

Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
2022-04-23 23:00:16 +02:00
Andrey Smirnov
55ff876dc6
chore: bump K8s Go modules to 1.24.0-rc.0
This was skipped due to https://github.com/kubernetes/kubernetes/issues/109565

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-22 20:32:42 +03:00
Andrey Smirnov
f1f43131f8
fix: strip 'v' prefix from versions on Kubernetes upgrade
This fixes an issue where `talosctl upgrade-k8s` fails with an unhelpful
message if the version is specified as `v1.23.5` rather than `1.23.5`.
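One straightforward way to accept both spellings is to trim a single leading `v` before comparing or parsing versions. This is a minimal sketch; `normalizeVersion` is a hypothetical helper and not necessarily how `talosctl` implements the fix.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeVersion strips surrounding whitespace and one optional leading "v",
// so "v1.23.5" and "1.23.5" are treated identically.
func normalizeVersion(v string) string {
	return strings.TrimPrefix(strings.TrimSpace(v), "v")
}

func main() {
	for _, v := range []string{"v1.23.5", "1.23.5"} {
		fmt.Println(normalizeVersion(v))
	}
}
```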

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-22 14:59:12 +03:00
Andrey Smirnov
ec621477bd
chore: tune QEMU disk provisioner options
As QEMU clusters are used for testing, use unsafe cache options to
reduce the number of fsyncs going to the host block device.
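In QEMU, `cache=unsafe` on a `-drive` makes the emulator ignore flush requests from the guest, which avoids host fsyncs at the cost of data safety if the host crashes. The sketch below builds such an argument; `qemuDriveArgs`, the raw format, and the virtio interface are illustrative assumptions, not the exact options used by the Talos provisioner.

```go
package main

import (
	"fmt"
	"strings"
)

// qemuDriveArgs builds hypothetical QEMU -drive arguments for a throwaway test disk.
// cache=unsafe ignores guest flush requests: fewer host fsyncs, no crash safety.
func qemuDriveArgs(diskPath string) []string {
	opts := []string{
		"file=" + diskPath,
		"format=raw",
		"if=virtio",
		"cache=unsafe",
	}
	return []string{"-drive", strings.Join(opts, ",")}
}

func main() {
	args := qemuDriveArgs("/tmp/talos-test/disk.img")
	fmt.Println("qemu-system-x86_64 " + strings.Join(args, " "))
}
```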

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-21 22:39:30 +03:00
Andrey Smirnov
b085343dcb
feat: use discovery information for etcd join (and other etcd calls)
Talos historically relied on the `kubernetes` `Endpoints` resource (which
specifies `kube-apiserver` endpoints) to find other controlplane members
of the cluster to connect to the `etcd` nodes for the cluster (when the
node-local etcd instance is not up, for example). This method works well,
but it relies on the Kubernetes API endpoint being up. If the Kubernetes API is
down for whatever reason, or if the load balancer malfunctions, endpoints
are not available and join/leave operations don't work.

This PR replaces the endpoints lookup with the `Endpoints` COSI
resource, which is filled in using two methods:

* from the discovery data (if discovery is enabled, which is the default)
* from the Kubernetes `Endpoints` resource

If discovery is disabled (or not available), this change does almost
nothing: Kubernetes is still used to discover control plane endpoints,
but as the data persists in memory, the cached copy will be used to connect
to the endpoint even if the Kubernetes control plane endpoint goes down.

If discovery is enabled, Talos can join the etcd cluster immediately
on boot without waiting for Kubernetes to be up on the bootstrap node.
This means that the initial Talos cluster bootstrap runs in parallel on all
control plane nodes, while previously nodes waited for the first
node to progress far enough in its bootstrap to fill in the endpoints data.

As `etcd` communication is protected with mutual TLS anyway,
there's no risk even if the discovery data is stale or poisoned, as etcd
operations would fail on a TLS mismatch.

Most of the changes in this PR actually enable populating Talos
`Endpoints` resource based on the `Kubernetes` `endpoints` resource
using the watch API.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-21 22:00:27 +03:00
Artem Chernyshev
2b03057b91
feat: implement a new mode try in the config manipulation commands
The new mode allows changing the config for a limited period of time, so
the configuration can be tried out and automatically rolled back if, for
example, it doesn't work.

The mode can only be used with changes that can be applied without a
reboot.

In this mode, the configuration is changed only in memory and is not
written to disk.
The `--timeout` parameter can be used to customize the rollback delay.
The default timeout is 1 minute.

Any subsequent configuration change aborts try mode, and the last
applied configuration is used.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-04-21 20:31:45 +03:00
Noel Georgi
51a68c31ff
chore: allow mounting files from the host
Allow mounting files from host into extension services as per the [OCI
spec](https://github.com/opencontainers/runtime-spec/blob/main/config.md#mounts)

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-04-21 21:00:31 +05:30
Noel Georgi
f3e330a0aa
docs: fix network dependency
Fix network dependency

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-04-21 19:04:33 +05:30
Steve Francis
7ba39bd600
docs: clarify discovery service
Clarify discovery service

Signed-off-by: Steve Francis <steve.francis@talos-systems.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-04-21 18:14:29 +05:30
Andrey Smirnov
8057d076ad
release(v1.1.0-alpha.1): prepare release
This is the official v1.1.0-alpha.1 release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
v1.1.0-alpha.1 pkg/machinery/v1.1.0-alpha.1
2022-04-20 20:56:48 +03:00
Noel Georgi
1d5c08e74f
chore: bump kernel to 5.15.35
Bump kernel to 5.15.35 LTS

Ref: https://github.com/siderolabs/pkgs/pull/454

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-04-20 20:33:10 +05:30
Andrey Smirnov
9bf23e5162
feat: update Kubernetes to 1.24.0-rc.0
See https://github.com/kubernetes/kubernetes/releases/tag/v1.24.0-rc.0

Go modules are not updated due to missing tags:
https://github.com/kubernetes/kubernetes/issues/109565

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-20 16:53:51 +03:00
Andrey Smirnov
d78ed320b7
docs: fix the docs reference to star registry redirects
Since Talos moved to the new CRI plugin registry redirect format, star
(`*`) redirects are no longer supported in the CRI plugin (see
https://github.com/containerd/containerd/blob/main/docs/hosts.md).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-20 16:03:46 +03:00
Andrey Smirnov
257dfb8709
fix: run the 'post' stage of the service always
For most Talos services the `post` stage does nothing, so the problem was
never noticed. For extension services, the pre/post stages mount and
unmount the overlayfs, so if the post stage doesn't run (for example, if the
runner can't be created), the service won't start the next time, as the
post stage never ran.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-20 15:35:04 +03:00
Andrey Smirnov
992e230234
fix: correctly handle stopping services with reverse dependencies
This bug showed up with extension services: say we have a service
`ext-foo` which depends on service `cri`.

Service `ext-foo` will be started correctly only once `cri` is up.

But we should also stop `ext-foo` before `cri` is stopped, as otherwise
the dependency chain is broken. This PR fixes exactly that: when `cri`
is stopped, anything which depends on it is stopped first. We should
also stop anything which depends on `ext-foo` (transitive
dependency).

In practical terms, we use a dependency on `cri` in extension services to
correctly stop/start them around the `/var` filesystem
mount/unmount.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-20 15:14:08 +03:00
Tim Jones
bb7a50bd5b
docs: fix netlify redirects
Fixes Netlify redirect commands by adding an extra
path segment, aligning the directory properly.

Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
2022-04-20 13:16:14 +02:00
Tim Jones
486f79bc77
docs: fix netlify deploy url
Fixes the URL from Netlify given to Hugo
to build absolute URLs with the proper base.

Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
2022-04-20 12:18:02 +02:00
Tim Jones
e8cbedb05b
docs: add canonical link ref
Adds a canonical link tag to doc pages
to help search engines find the current version of
documentation.

Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
2022-04-20 10:41:27 +02:00
Tim Jones
0fe4a7832b
docs: improve latest-version banner
Make the latest-version banner sticky and
more noticeable, and ensure the link to the
latest version links to the current document
if possible.

Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
2022-04-19 22:37:14 +02:00
Andrey Smirnov
23984efcdf
fix: detect lingering mounts in the installer correctly
Not sure how and when it got broken, but we were looking for mounts of
the block device (like `/dev/vda`), while the actual mount info contains
the partition device (like `/dev/vda6`).
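Matching mounts against a disk therefore means accepting partition devices as well as the disk itself. The sketch below relies on the common Linux naming convention (`/dev/vda` → `/dev/vda6`, but `/dev/nvme0n1` → `/dev/nvme0n1p2`) rather than a sysfs lookup. `belongsToDisk` and `lingeringMounts` are hypothetical helpers, not the installer's actual implementation. A plain prefix check would also wrongly match `/dev/vdab1`.

```go
package main

import (
	"fmt"
	"strings"
)

// belongsToDisk reports whether device is disk itself or one of its partitions,
// based on naming convention only (an assumption; sysfs would be authoritative).
func belongsToDisk(disk, device string) bool {
	if disk == "" || !strings.HasPrefix(device, disk) {
		return false
	}
	suffix := device[len(disk):]
	if suffix == "" {
		return true // the whole disk is mounted
	}
	// Disks whose names end in a digit use a "p" separator before the partition number.
	if last := disk[len(disk)-1]; last >= '0' && last <= '9' {
		if !strings.HasPrefix(suffix, "p") {
			return false
		}
		suffix = suffix[1:]
	}
	if suffix == "" {
		return false
	}
	for _, r := range suffix {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// lingeringMounts filters mount sources (e.g. the first field of /proc/mounts)
// down to those located on disk.
func lingeringMounts(disk string, sources []string) []string {
	var found []string
	for _, src := range sources {
		if belongsToDisk(disk, src) {
			found = append(found, src)
		}
	}
	return found
}

func main() {
	mounts := []string{"/dev/vda6", "/dev/vdb1", "/dev/vdab1", "overlay"}
	fmt.Println(lingeringMounts("/dev/vda", mounts))
}
```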

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-19 21:18:40 +03:00
Dmitriy Matrenichev
54dba925f8
chore: refactor network resource to use typed resource
Refactor all types except LinkStatus and LinkRefresh to use typed.Resource.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-04-19 18:10:40 +04:00
Andrey Smirnov
4eb9f45cc8
refactor: split polymorphic K8sControlPlane into typed resources
Having polymorphic (spec type depends on ID) resources is not a good
idea, and it's not compatible with protobuf encoding.

Introduce new resources for each polymorphic sub-spec using new Go 1.18
generic typed.Resource to reduce the boilerplate code.

(Still needs proper deepcopy-gen, but I'm skipping it for now, as
K8sControlPlane also had broken deep copy.)
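The idea of replacing one polymorphic resource with one resource type per sub-spec can be illustrated with a Go 1.18 generic wrapper whose type is derived from its spec type parameter rather than from its ID. This is a minimal sketch; `Spec`, `Resource`, `APIServerSpec`, and `SchedulerSpec` are hypothetical and are not the cosi-project `typed.Resource` API.

```go
package main

import "fmt"

// Spec is a hypothetical constraint: each spec type names its own resource type,
// so the resource type never depends on the resource ID.
type Spec interface {
	ResourceType() string
}

// Resource is a hypothetical generic wrapper pairing an ID with a typed spec.
type Resource[S Spec] struct {
	ID   string
	Spec S
}

// Type derives the resource type from the spec type.
func (r Resource[S]) Type() string {
	return r.Spec.ResourceType()
}

// APIServerSpec and SchedulerSpec are illustrative sub-specs that a single
// polymorphic resource would previously have selected by ID.
type APIServerSpec struct{ Image string }

func (APIServerSpec) ResourceType() string { return "APIServerConfig" }

type SchedulerSpec struct{ Image string }

func (SchedulerSpec) ResourceType() string { return "SchedulerConfig" }

func main() {
	api := Resource[APIServerSpec]{ID: "kube-apiserver", Spec: APIServerSpec{Image: "example/kube-apiserver"}}
	sched := Resource[SchedulerSpec]{ID: "kube-scheduler", Spec: SchedulerSpec{Image: "example/kube-scheduler"}}
	// Spec fields are accessed with full static typing; no type switch on the ID.
	fmt.Println(api.Type(), api.Spec.Image)
	fmt.Println(sched.Type(), sched.Spec.Image)
}
```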

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-19 16:53:09 +03:00
Andrey Smirnov
68dfdd3311
fix: provide logger to the etcd snapshot restore
With the update of the client library to 3.5.3, the etcd library started
using the logger, so passing `nil` no longer works.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-19 15:16:33 +03:00
Tim Jones
f190403f01
docs: add how to get config after interactive setup
Add a note on how machine configuration can be retrieved
from the node, after e.g. interactive setup.

Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
2022-04-19 10:53:53 +02:00
Tim Jones
fac7b94667
docs: improve vip caveats documentation
Many users have been using the VIP functionality to configure
endpoints in the Talos config. Add documentation clarifying the possible
issues with that option and that it should be avoided.

Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
2022-04-19 10:37:29 +02:00
Tim Jones
250df9e670
docs: improve rook-ceph description
Improve the Rook Ceph documentation.

Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
2022-04-18 22:50:52 +02:00
Tim Jones
b5c1d868de
docs: add talos/kubernetes config faq
Add an entry to our FAQs on why separate configurations
are needed for Talos and Kubernetes.

Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
2022-04-18 21:39:47 +02:00
Andrey Smirnov
39721ee939
chore: bump dependencies
dependabot + go-mod-outdated

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-18 21:20:14 +03:00
Noel Georgi
610945774a
chore: bump tools and pkgs
Bump tools and pkgs

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-04-18 20:51:21 +05:30
Andrey Smirnov
2b68c8b67b
fix: enable long timestamps for xfs
This "fixes" messages like:

```
xfs filesystem being mounted at /var supports timestamps until 2038 (0x7fffffff)
```

We should support Talos beyond 2038, even if we switch to a different
filesystem type by 2038 :)

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-18 16:21:03 +03:00
Dmitriy Matrenichev
be00d77492
chore: implement cluster resources using cosi typed resource
Bump github.com/cosi-project/runtime and use typed.Resource

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-04-18 16:28:12 +04:00
Tim Jones
460d5ab13f
docs: fix extension services alias
Fixes a typo in the Extension Services document alias
which serves as the redirect from the old location.

Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-15 21:48:51 +03:00
Noel Georgi
bbdfda2dd2
chore: xfs quota support in kernel
XFS quota support in kernel

Ref: https://github.com/siderolabs/pkgs/pull/451

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-04-15 17:02:04 +05:30
Noel Georgi
8ff8fc77f3
chore: enable rpi4 poe hat fan control
Enable the Rpi4 PoE hat fan control by pulling in the overlay
compatible with the upstream kernel driver.

Ref: https://github.com/siderolabs/pkgs/pull/450

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-04-15 00:17:40 +05:30
Artem Chernyshev
2b9722d1f5
feat: add dry-run flag in apply-config and edit commands
Dry run prints the config diff and the selected application mode without
changing the configuration.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-04-14 19:12:57 +03:00
Andrey Smirnov
8af50fcd27
fix: correct cri package import path
The containerd CRI plugin was merged into the main repo, but we were still using
the old import path, so our constants coming from the module were outdated.

This fixes the image version for the pause container.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-14 16:27:45 +03:00
Andrey Smirnov
ce09ede839
feat: update etcd to 3.5.3
See https://github.com/etcd-io/etcd/releases/tag/v3.5.3

This release should contain a fix for data consistency issue when etcd
is killed under high load.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-14 13:56:03 +03:00
Noel Georgi
13f41baddf
chore: bump kernel to 5.15.34
Bump kernel to 5.15.34

Ref: https://github.com/siderolabs/pkgs/pull/448

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-04-14 12:19:05 +05:30
Tim Jones
fa57b5d922
docs: reorganize documentation
Make improvements to help documentation discoverability and categorization.
Ensure all content pages have a description.
Ensure all links are replaced with Hugo shortcodes.
Ensure all moved pages have an alias so redirects work.

Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
2022-04-13 23:49:32 +02:00
Noel Georgi
a91eb9358d
chore: bump deps
Ref:
- https://github.com/siderolabs/tools/pull/185
- https://github.com/siderolabs/pkgs/pull/447
- https://github.com/siderolabs/extras/pull/44

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-04-13 22:22:11 +05:30
Andrey Smirnov
0aad0df2eb
refactor: remove String() for resource implementation
See https://github.com/cosi-project/runtime/pull/69

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-12 20:51:02 +03:00
Andrey Smirnov
a4060513c6
feat: build Talos with support for x86-64-v2 microarchitecture
See https://github.com/golang/go/wiki/MinimumRequirements#microarchitecture-support

This relies on a new Go 1.18 feature to use more efficient x86-64
instructions.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-11 21:12:59 +03:00
Noel Georgi
8faebd410b
chore: bump tools and pkgs
Bump tools and pkgs to get kernel 5.15.33

5.15.33 contains fixes for a number of CVEs;
there were too many to track and reference individually.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-04-11 19:56:42 +05:30
Andrey Smirnov
8499b7e7dc
chore: bump dependencies
dependabot + go-mod-outdated

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-11 16:53:39 +03:00
Dmitriy Matrenichev
a7ba7ea679
feat: migrate to go 1.18
Increase go.mod version from 1.17 to 1.18 in all projects. Update Makefile
to use the latest tooling. Fix golangci by disabling nolintlint for now.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-04-11 17:17:54 +04:00