3932 Commits

Author SHA1 Message Date
Bastiaan Schaap
2ff6db749a
chore: add Nedap Security Atlas as adopter
Add Nedap Security Atlas as a Talos adopter.

Signed-off-by: Bastiaan Schaap <bastiaan.schaap@nedap.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-05-05 16:31:08 +05:30
Noel Georgi
89cab200b8
chore: bump kubernetes to v1.24.0
Bump kubernetes to v1.24.0

Ref: https://github.com/siderolabs/kubelet/pull/45

Also update coredns [manifests](https://github.com/coredns/deployment/blob/master/kubernetes/coredns.yaml.sed)

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-05-05 00:34:35 +05:30
Dmitriy Matrenichev
09d16349f4
chore: refactor StaticPod and StaticPodStatus into typed.Resource
These two required some additional attention and were split into a separate branch. Also fix a data race in the NodeAddressSpec.DeepCopy method.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-05-04 19:53:16 +04:00
Dmitriy Matrenichev
d2935f98c4
chore: refactor LinkRefresh and LinkStatus into typed.Resource
Following Andrey's comments on #5472, this commit changes LinkRefresh and LinkStatus into typed.Resource by moving the Bump and Physical methods to the *Spec types.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-05-04 19:31:14 +04:00
Philipp Sauter
b52e0b9b9e
fix: talosctl throws error if gen option and --input-dir flags are combined
talosctl now prints an error message and aborts if `talosctl cluster create` is called with gen options together with the --input-dir flag.

Fixes #2275
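
A minimal sketch of such a mutual-exclusion check, using the standard library `flag` package rather than the CLI framework talosctl actually uses. The function name `validateCreateFlags` and the two generation option names are assumptions chosen for illustration only:

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"strings"
)

// genOptions lists flags that only make sense when configs are generated.
// The names are hypothetical examples, not the real talosctl option list.
var genOptions = map[string]bool{
	"kubernetes-version": true,
	"install-image":      true,
}

// validateCreateFlags returns an error if --input-dir is combined with
// any generation option that was explicitly set on the command line.
func validateCreateFlags(args []string) error {
	fs := flag.NewFlagSet("cluster create", flag.ContinueOnError)
	fs.SetOutput(io.Discard)

	inputDir := fs.String("input-dir", "", "directory with pre-generated configs")
	fs.String("kubernetes-version", "", "generation option")
	fs.String("install-image", "", "generation option")

	if err := fs.Parse(args); err != nil {
		return err
	}

	if *inputDir == "" {
		return nil
	}

	var conflicts []string

	// Visit only walks flags that were explicitly set by the user.
	fs.Visit(func(f *flag.Flag) {
		if genOptions[f.Name] {
			conflicts = append(conflicts, "--"+f.Name)
		}
	})

	if len(conflicts) > 0 {
		return fmt.Errorf("%s cannot be combined with --input-dir", strings.Join(conflicts, ", "))
	}

	return nil
}

func main() {
	fmt.Println(validateCreateFlags([]string{"--input-dir", "./cfg"}))
	fmt.Println(validateCreateFlags([]string{"--input-dir", "./cfg", "--kubernetes-version", "1.24.0"}))
}
```

Checking only explicitly set flags means default values of generation options never trigger the error.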

Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>
2022-05-04 16:18:03 +02:00
Tim Jones
0e15de3a8a
docs: add adopters file
Adds an ADOPTERS markdown file to the repo to allow users to show
they have adopted Talos Linux in their organization.

Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
2022-05-04 11:06:30 +02:00
Noel Georgi
bb932c2970
chore: bump containerd to v1.6.4
Bump containerd to v1.6.4

Ref: https://github.com/siderolabs/pkgs/pull/466

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-05-04 00:41:30 +05:30
Noel Georgi
4eaaa2d597
chore: bump kernel to 5.15.37
Bump kernel to 5.15.37

Ref: https://github.com/siderolabs/pkgs/pull/463

Also bump [pkgs](https://github.com/siderolabs/pkgs/pull/465) and [tools](https://github.com/siderolabs/tools/pull/193)

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-05-03 21:36:59 +05:30
Dmitriy Matrenichev
89dde8f2c4
chore: refactor remaining resources into typed.Resource
Refactor remaining resources into typed.Resource. Exceptions are:
- MachineConfig
- MachineType
- LinkRefresh
- LinkStatus
all of which contain additional methods and cannot simply be reworked into the new resource framework.

StaticPod and StaticPodStatus are also absent from this PR, because they result in e2e errors which are going to be resolved in the next PR.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-05-03 18:40:37 +04:00
Andrey Smirnov
bd089e702d
chore: bump dependencies
dependabot + go-mod-outdated

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-05-03 16:30:59 +03:00
Tames McTigue
3136334b93
docs: fix links in VMware documentation
The links to the patch and script files were changed and not reflected
here. There was also a missing curl command in the first example of
downloading the patch.

Signed-off-by: Tames McTigue <tames@northwestern.edu>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-05-03 16:07:31 +03:00
Andrey Smirnov
403df0e180
docs: provide example on using config generation package
There were many discussions on creating native Talos providers for TF,
Pulumi, etc., but there's no documented idiomatic way to use our
machinery package to generate the config. This PR tries to fill this
gap.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-05-03 15:44:51 +03:00
Dmitriy Matrenichev
6351928611
chore: redo pointer with github.com/siderolabs/go-pointer module
With the advent of generics, redo pointer functionality and remove github.com/AlekSi/pointer dependency.
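
As a rough illustration of what generics make possible here, the sketch below shows a single generic helper replacing per-type pointer constructors. The names `To` and `ValueOr` are assumptions for this example and are not claimed to match the module's exported API:

```go
package main

import "fmt"

// To returns a pointer to a copy of v. One generic function covers
// every type that previously needed its own helper.
func To[T any](v T) *T {
	return &v
}

// ValueOr dereferences p, or returns def when p is nil.
func ValueOr[T any](p *T, def T) T {
	if p == nil {
		return def
	}

	return *p
}

func main() {
	replicas := To(3)
	name := To("worker")

	var unset *bool

	fmt.Println(*replicas, *name, ValueOr(unset, true))
}
```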

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-05-02 02:17:13 +04:00
Andrey Smirnov
a269f740ce
docs: copy knowledge base to v1.0 docs
As Talos v1.0.4 now supports kubelet with graceful shutdown disabled,
update the docs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-29 12:22:52 +03:00
Dmitriy Matrenichev
4832010263
fix: return an error if there is no byte slice in ReadonlyProvider
Current code contains a data race, since access to r.bytes in Bytes() is unguarded and can be called from several goroutines. There is no need for it anyway, since WrapReadonly always gets a full slice. Refactor code to reflect that.
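
A simplified sketch of the idea, assuming a wrapper whose byte slice is set once at construction and never written afterwards. The type and function names below are hypothetical stand-ins, not the actual Talos code:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errNoBytes = errors.New("readonly provider has no byte slice")

// readonlyProvider is a hypothetical read-only config wrapper.
type readonlyProvider struct {
	bytes []byte // set once at construction, never written afterwards
}

func wrapReadonly(b []byte) *readonlyProvider {
	return &readonlyProvider{bytes: b}
}

// Bytes only reads r.bytes, so concurrent callers cannot race.
// A missing slice is reported as an error instead of being filled in lazily.
func (r *readonlyProvider) Bytes() ([]byte, error) {
	if r.bytes == nil {
		return nil, errNoBytes
	}

	return r.bytes, nil
}

func main() {
	p := wrapReadonly([]byte("version: v1alpha1"))

	var wg sync.WaitGroup

	for i := 0; i < 4; i++ {
		wg.Add(1)

		go func() {
			defer wg.Done()

			if _, err := p.Bytes(); err != nil {
				panic(err)
			}
		}()
	}

	wg.Wait()

	_, err := wrapReadonly(nil).Bytes()
	fmt.Println(err)
}
```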

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-04-28 17:10:49 +04:00
Andrey Smirnov
6e7486f099
fix: allow graceful node shutdown to be overridden
The problem is that these values need to be set to zero if the kubelet
feature gate is disabled, so we can't assume that we can override a zero
value with the proper config, so we have to do an extra check on the
supplied configuration.

Also creates a KB article on disabling this feature gate.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-28 14:33:58 +03:00
Dmitriy Matrenichev
867d38f28f
feat: add bond slaves ordering
Before this change, we didn't preserve bonded interfaces ordering, which caused problems in some scenarios. Fix this by remembering their position in the original config.

Fixes #5207.
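
One way to picture the fix: remember each interface's position in the original config and sort by it. The sketch below is an assumption-laden illustration (the function name `orderByConfig` and the interface names are made up), not the controller code itself:

```go
package main

import (
	"fmt"
	"sort"
)

// orderByConfig returns the discovered interfaces sorted by their position
// in the configured list. Interfaces missing from the config go last.
func orderByConfig(configured, discovered []string) []string {
	position := make(map[string]int, len(configured))
	for i, name := range configured {
		position[name] = i
	}

	rank := func(name string) int {
		if p, ok := position[name]; ok {
			return p
		}

		return len(configured)
	}

	out := append([]string(nil), discovered...)

	sort.SliceStable(out, func(i, j int) bool {
		return rank(out[i]) < rank(out[j])
	})

	return out
}

func main() {
	configured := []string{"eth1", "eth0", "eth2"}
	discovered := []string{"eth0", "eth2", "eth1"} // arbitrary order

	fmt.Println(orderByConfig(configured, discovered))
}
```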

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-04-28 01:15:11 +04:00
Andrey Smirnov
03ef62ad8b
fix: include Go primitive types into unstructured deepcopy
This code was written from a JSON point of view, but
when YAML is unmarshaled, we get more primitive Go types
as values, so why not include all of them?

This was showing as an error when applying a machine config e.g. for
kubelet extraArgs like:

```
shutdownGracePeriod: 0
```

Changing this to string fixes the problem, but it's not the best UX.
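
A minimal sketch of a deep copy over unstructured data that accepts the wider set of Go primitive types, such as the `int` produced by YAML for `0`. The function is illustrative only and panics on unsupported types for brevity, which is an assumption rather than the behavior of the real code:

```go
package main

import "fmt"

// deepCopy recursively copies maps and slices and passes primitive
// values through unchanged.
func deepCopy(v any) any {
	switch x := v.(type) {
	case map[string]any:
		out := make(map[string]any, len(x))
		for k, e := range x {
			out[k] = deepCopy(e)
		}

		return out
	case []any:
		out := make([]any, len(x))
		for i, e := range x {
			out[i] = deepCopy(e)
		}

		return out
	case nil, bool, string,
		int, int8, int16, int32, int64,
		uint, uint8, uint16, uint32, uint64,
		float32, float64:
		// Primitive values are immutable, so sharing them is safe.
		return x
	default:
		panic(fmt.Sprintf("deepCopy: unsupported type %T", v))
	}
}

func main() {
	orig := map[string]any{
		"shutdownGracePeriod": 0, // an int, as a YAML decoder would produce
		"featureGates":        map[string]any{"GracefulNodeShutdown": false},
	}

	cp := deepCopy(orig).(map[string]any)
	cp["featureGates"].(map[string]any)["GracefulNodeShutdown"] = true

	fmt.Println(orig, cp)
}
```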

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-27 23:16:45 +03:00
Noel Georgi
f06e6acf2f
chore: bump kernel to 5.15.36
Bump kernel to 5.15.36 LTS

Ref:
 - https://github.com/siderolabs/pkgs/pull/458
 - https://github.com/siderolabs/pkgs/pull/460

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-04-28 01:09:54 +05:30
Andrey Smirnov
c0d386abb6
fix: don't mount D-Bus socket via mount under recursive bind mount
`/var/run` was mounted from `/run`, and D-Bus socket to `/var/run/dbus/`
path, so when the container is stopped, container mounts are removed,
but on the host side mount propagates back, so D-Bus socket gets
propagated back to the host `/run`, and on the next kubelet restart
process continues adding even more mount levels exponentially.
Eventually on kubelet restart kernel resources are exhausted and the
node freezes.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-27 21:09:59 +03:00
Andrey Smirnov
9a8ff76df2
refactor: rewrite perf resource to use typed.Resource
No functional changes.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-27 17:57:49 +03:00
Andrey Smirnov
71d04c4d5c
refactor: rewrite runtime resources to use typed.Resource
No functional changes.

Also bump cosi-runtime to the version with the fix for UnmarshalProto.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-27 16:47:50 +03:00
Andrey Smirnov
7568d51fc8
fix: trigger CRI config merge on correct resource update
When registry CRI config gets updated, contents of the file are written
to the `EtcFileSpec` resource, which gets rendered to disk and resource
`EtcFileStatus` is updated when the config is ready.

CRI config parts are merged from contents of `*.part` files which come
from system extensions and dynamic registry config which is written via
`EtcFileSpec` resource. The controller was incorrectly triggered on
`EtcFileSpec` resource updates while reading files from disk, so it might
have read stale contents of a CRI config part (one that hadn't been fully
rendered to disk yet) and missed the latest content of the CRI config.

With the fix, the controller is triggered on `EtcFileStatus` update, that is,
once the file has been rendered to disk.

The symptom of the bug is an empty CRI registry config, such as:

```shell
talosctl read /etc/cri/conf.d/cri.toml

  ## /etc/cri/conf.d/00-base.part

version = 2

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
    runtime_type = "io.containerd.runc.v2"
    discard_unpacked_layers = true

  ## /etc/cri/conf.d/01-registries.part
```

Notice that the `01-registries.part` is empty.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-26 23:25:59 +03:00
Tim Jones
c456dbcb93
docs: remove references to init nodes
Init nodes were deprecated in v1.0 so it makes sense
to remove the documentation about them and consign
them to the past!

Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
2022-04-26 21:57:21 +02:00
Andrey Smirnov
1973095d14
feat: update containerd to 1.6.3
This includes a fix for image pull slowness from
https://github.com/containerd/containerd/pull/6702.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-26 21:43:28 +03:00
Tim Jones
b51292d884
docs: reformat config reference
Update the configuration reference documentation
to show field information in a tabular format.

Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
2022-04-26 18:06:55 +02:00
Dmitriy Matrenichev
c0709d9707
feat: increase aio-max-nr and inotify.max_user_instances
Increase values:
- fs.aio-max-nr to 1048576 (for Ceph|Veritas|other storages)
- fs.inotify.max_user_instances to 8192 (since the usual 512 is too small for today's needs)

There is no need to adjust fs.inotify.max_user_watches since it's set dynamically during startup by the kernel.

Closes #5175

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-04-26 18:29:29 +04:00
Andrey Smirnov
85b328e997
refactor: convert secrets resources to use typed.Resource
No functional changes.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-26 14:51:56 +03:00
Andrey Smirnov
e91350acd7
refactor: convert time & v1alpha1 resources to use typed.Resource
No functional changes, just pure refactoring.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-25 22:41:52 +03:00
Andrey Smirnov
45464412e0
chore: bump dependencies
dependabot + go-mod-outdated

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-25 16:26:41 +03:00
Andrey Smirnov
0af6b35a66
feat: update etcd to 3.5.4
See https://github.com/etcd-io/etcd/releases/tag/v3.5.4

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-25 15:49:02 +03:00
Tim Jones
7ad27751cb
docs: fix analytics and sitemap
Fixes the Google Analytics tracking ID and
restores the production sitemap.

Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
2022-04-23 23:00:16 +02:00
Andrey Smirnov
55ff876dc6
chore: bump K8s Go modules to 1.24.0-rc.0
This was skipped due to https://github.com/kubernetes/kubernetes/issues/109565

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-22 20:32:42 +03:00
Andrey Smirnov
f1f43131f8
fix: strip 'v' prefix from versions on Kubernetes upgrade
This fixes an issue when `talosctl upgrade-k8s` fails with unhelpful
message if the version is specified as `v1.23.5` vs. `1.23.5`.
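
The normalization itself is small. A sketch, with the function name `normalizeVersion` being an assumption for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeVersion strips surrounding whitespace and a leading 'v',
// so both "v1.23.5" and "1.23.5" yield "1.23.5".
func normalizeVersion(v string) string {
	return strings.TrimPrefix(strings.TrimSpace(v), "v")
}

func main() {
	fmt.Println(normalizeVersion("v1.23.5"))
	fmt.Println(normalizeVersion("1.23.5"))
}
```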

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-22 14:59:12 +03:00
Andrey Smirnov
ec621477bd
chore: tune QEMU disk provisioner options
As QEMU clusters are used for testing, use unsafe cache options to
reduce the number of fsyncs going to the host block device.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-21 22:39:30 +03:00
Andrey Smirnov
b085343dcb
feat: use discovery information for etcd join (and other etcd calls)
Talos historically relied on `kubernetes` `Endpoints` resource (which
specifies `kube-apiserver` endpoints) to find other controlplane members
of the cluster to connect to the `etcd` nodes for the cluster (when node
local etcd instance is not up, for example). This method works great,
but it relies on Kubernetes endpoint being up. If the Kubernetes API is
down for whatever reason, or if the loadbalancer malfunctions, endpoints
are not available and join/leave operations don't work.

This PR replaces the endpoints lookup to use the `Endpoints` COSI
resource which is filled in using two methods:

* from the discovery data (if discovery is enabled, which is the default)
* from the Kubernetes `Endpoints` resource

If the discovery is disabled (or not available), this change does almost
nothing: still Kubernetes is used to discover control plane endpoints,
but as the data persists in memory, even if the Kubernetes control plane
endpoint went down, cached copy will be used to connect to the endpoint.

If the discovery is enabled, Talos can join the etcd cluster immediately
on boot without waiting for Kubernetes to be up on the bootstrap node
which means that Talos cluster initial bootstrap runs in parallel on all
control plane nodes, while previously nodes were waiting for the first
node to finish bootstrap enough to fill in the endpoints data.

As the `etcd` communication is anyways protected with mutual TLS,
there's no risk even if the discovery data is stale or poisoned, as etcd
operations would fail on TLS mismatch.

Most of the changes in this PR actually enable populating Talos
`Endpoints` resource based on the `Kubernetes` `endpoints` resource
using the watch API.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-21 22:00:27 +03:00
Artem Chernyshev
2b03057b91
feat: implement a new mode try in the config manipulation commands
The new mode allows changing the config for a period of time, which
makes it possible to try a configuration and automatically roll it back if
it doesn't work, for example.

The mode can only be used with changes that can be applied without a
reboot.

In try mode the configuration is not written to disk; it is only changed
in memory.
`--timeout` parameter can be used to customize the rollback delay.
The default timeout is 1 minute.

Any subsequent configuration change will abort try mode and the last
applied configuration will be used.
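
The behavior described above can be sketched as an in-memory store with a rollback timer. This is a simplified model under stated assumptions: the `configStore` type, its fields, and the use of a generation counter to invalidate stale timers are all invented for illustration and do not mirror the actual implementation:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type configStore struct {
	mu        sync.Mutex
	persisted string // stands in for the config written to disk
	active    string // in-memory config currently in effect
	gen       int    // incremented on every change to invalidate old timers
}

func newConfigStore(initial string) *configStore {
	return &configStore{persisted: initial, active: initial}
}

func (s *configStore) Active() string {
	s.mu.Lock()
	defer s.mu.Unlock()

	return s.active
}

// Apply persists cfg and aborts any pending try-mode rollback.
func (s *configStore) Apply(cfg string) {
	s.mu.Lock()
	defer s.mu.Unlock()

	s.gen++
	s.persisted, s.active = cfg, cfg
}

// Try changes only the in-memory config and rolls it back after timeout,
// unless another change happened in the meantime.
func (s *configStore) Try(cfg string, timeout time.Duration) {
	s.mu.Lock()
	defer s.mu.Unlock()

	s.gen++
	gen := s.gen
	s.active = cfg

	time.AfterFunc(timeout, func() {
		s.mu.Lock()
		defer s.mu.Unlock()

		if s.gen == gen {
			s.active = s.persisted
		}
	})
}

// demo tries a config with a short timeout and reports the active
// config before and after the rollback delay.
func demo() (during, after string) {
	s := newConfigStore("stable")
	s.Try("experimental", 20*time.Millisecond)
	during = s.Active()

	time.Sleep(200 * time.Millisecond)

	return during, s.Active()
}

func main() {
	during, after := demo()
	fmt.Println(during, after)
}
```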

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-04-21 20:31:45 +03:00
Noel Georgi
51a68c31ff
chore: allow mounting files from the host
Allow mounting files from host into extension services as per the [OCI
spec](https://github.com/opencontainers/runtime-spec/blob/main/config.md#mounts)

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-04-21 21:00:31 +05:30
Noel Georgi
f3e330a0aa
docs: fix network dependency
Fix network dependency

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-04-21 19:04:33 +05:30
Steve Francis
7ba39bd600
docs: clarify discovery service
Clarify discovery service

Signed-off-by: Steve Francis <steve.francis@talos-systems.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-04-21 18:14:29 +05:30
Andrey Smirnov
8057d076ad
release(v1.1.0-alpha.1): prepare release
This is the official v1.1.0-alpha.1 release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
v1.1.0-alpha.1 pkg/machinery/v1.1.0-alpha.1
2022-04-20 20:56:48 +03:00
Noel Georgi
1d5c08e74f
chore: bump kernel to 5.15.35
Bump kernel to 5.15.35 LTS

Ref: https://github.com/siderolabs/pkgs/pull/454

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-04-20 20:33:10 +05:30
Andrey Smirnov
9bf23e5162
feat: update Kubernetes to 1.24.0-rc.0
See https://github.com/kubernetes/kubernetes/releases/tag/v1.24.0-rc.0

Go modules are not updated due to missing tags:
https://github.com/kubernetes/kubernetes/issues/109565

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-20 16:53:51 +03:00
Andrey Smirnov
d78ed320b7
docs: fix the docs reference to star registry redirects
Since Talos moved to the new registry redirect CRI plugin format, star
redirects are no longer supported in the CRI plugin (see
https://github.com/containerd/containerd/blob/main/docs/hosts.md).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-20 16:03:46 +03:00
Andrey Smirnov
257dfb8709
fix: run the 'post' stage of the service always
For most Talos services the `post` stage does nothing, so this was never
properly noticed. For extension services, the pre/post stages perform
mounting and unmounting of the overlayfs, so if the post stage doesn't run
(if the runner can't be created), the next time the service is started, it
won't start as the post stage never ran.
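
A minimal sketch of the pattern, assuming a service with three hypothetical callbacks (`pre`, `newRunner`, `post`). Once `pre` has succeeded, `post` is deferred so that it runs even when the runner cannot be created:

```go
package main

import (
	"errors"
	"fmt"
)

var errRunner = errors.New("runner unavailable")

// stages is a hypothetical description of a service lifecycle.
type stages struct {
	pre       func() error
	newRunner func() (run func() error, err error)
	post      func() error
}

func runService(s stages) (err error) {
	if err = s.pre(); err != nil {
		return fmt.Errorf("pre stage: %w", err)
	}

	// pre succeeded, so post must run no matter what happens next.
	defer func() {
		if postErr := s.post(); postErr != nil && err == nil {
			err = fmt.Errorf("post stage: %w", postErr)
		}
	}()

	run, err := s.newRunner()
	if err != nil {
		return fmt.Errorf("create runner: %w", err)
	}

	return run()
}

func main() {
	mounted := false

	err := runService(stages{
		pre:       func() error { mounted = true; return nil },
		newRunner: func() (func() error, error) { return nil, errRunner },
		post:      func() error { mounted = false; return nil },
	})

	fmt.Println(err, mounted)
}
```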

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-20 15:35:04 +03:00
Andrey Smirnov
992e230234
fix: correctly handle stopping services with reverse dependencies
This bug showed up with extension services: say we have a service
`ext-foo` which depends on service `cri`.

Service `ext-foo` will be started correctly only once `cri` is up.

But we should also stop `ext-foo` before `cri` is stopped, as otherwise
the dependency chain is broken. This PR fixes exactly that: once `cri`
is stopped, anything which depends on it should be stopped. We should
also stop anything which depends on `ext-foo` (transitive
dependency).

In practical terms we use the dependency on `cri` in extension services to
correctly stop/start extension services with `/var` filesystem
mount/unmount.
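
The stop ordering described above can be sketched as a post-order walk over reverse dependencies. The function `stopOrder` and the service `ext-bar` are assumptions for this illustration; the real service manager is more involved:

```go
package main

import (
	"fmt"
	"sort"
)

// stopOrder returns the services to stop when target is stopped:
// every transitive dependent comes first, target comes last.
// dependsOn maps a service to the services it depends on.
func stopOrder(dependsOn map[string][]string, target string) []string {
	// Invert the graph: service -> services that depend on it.
	dependents := map[string][]string{}

	for svc, deps := range dependsOn {
		for _, dep := range deps {
			dependents[dep] = append(dependents[dep], svc)
		}
	}

	for _, list := range dependents {
		sort.Strings(list) // deterministic output
	}

	var order []string

	visited := map[string]bool{}

	var visit func(svc string)

	visit = func(svc string) {
		if visited[svc] {
			return
		}

		visited[svc] = true

		for _, d := range dependents[svc] {
			visit(d)
		}

		// Appended only after all dependents have been appended.
		order = append(order, svc)
	}

	visit(target)

	return order
}

func main() {
	dependsOn := map[string][]string{
		"ext-foo": {"cri"},
		"ext-bar": {"ext-foo"}, // hypothetical transitive dependent
	}

	fmt.Println(stopOrder(dependsOn, "cri"))
}
```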

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-20 15:14:08 +03:00
Tim Jones
bb7a50bd5b
docs: fix netlify redirects
Fixes Netlify redirect commands by adding an extra
path segment aligning the directory properly.

Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
2022-04-20 13:16:14 +02:00
Tim Jones
486f79bc77
docs: fix netlify deploy url
Fixes the URL from Netlify given to Hugo
to build absolute URLs with the proper base.

Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
2022-04-20 12:18:02 +02:00
Tim Jones
e8cbedb05b
docs: add canonical link ref
Adds a canonical link tag to doc pages
to help SEO find the current version of
documentation.

Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
2022-04-20 10:41:27 +02:00
Tim Jones
0fe4a7832b
docs: improve latest-version banner
Make the latest-version banner sticky and
more noticeable, and ensure the link to the
latest version links to the current document
if possible.

Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
2022-04-19 22:37:14 +02:00