2230 Commits

Author SHA1 Message Date
Andrey Smirnov
11056a8034 docs: add highlights for 0.9 release
This describes high-level new features in Talos 0.9.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-10 07:21:13 -08:00
Andrey Smirnov
ae8bedb9a0 docs: add control plane conversion guide and 0.9 upgrade notes
These docs are critical to get 0.9.0-beta released.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-10 07:20:44 -08:00
Andrey Smirnov
ed9673e50a docs: add troubleshooting control plane documentation
Describe common failures and debugging approach.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Co-authored-by: Spencer Smith <rsmitty@users.noreply.github.com>
2021-03-09 13:31:08 -08:00
Andrey Smirnov
485cb1262f docs: update Kubernetes upgrade guide
CLI tool usage is the same, but the manual process is quite different.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-09 13:23:58 -08:00
Andrey Smirnov
d3798cd7a8 docs: document controller runtime, resources and talosctl get
This is more of an in-depth guide explaining internals.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Co-authored-by: Spencer Smith <rsmitty@users.noreply.github.com>
2021-03-09 11:27:48 -08:00
Artem Chernyshev
c2e353d6af fix: do not print out help string if the parameters are correct
There was an issue where `talosctl apply-config` printed out the help
even when the arguments were correct.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-03-09 10:30:45 -08:00
Andrey Smirnov
56c95eace3 chore: bump dependencies via dependabot
See #3267 #3268

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-09 06:03:55 -08:00
Andrey Smirnov
49853fc2ec fix: mkdir source of the extra mounts for the kubelet
This makes sure source directory exists before performing mount
operation.

Also adds the ability to patch the config bundle configs with a JSON patch,
which is exposed in `talosctl cluster create`; this allowed me to easily
test this fix:

```
talosctl cluster create ... --config-patch='[{"op": "add", "path": "/machine/kubelet/extraMounts", "value": [{"destination": "/var/log/containers", "type": "bind", "source": "/var/log/containers", "options": ["rshared", "rbind", "rw"]}]}]'
```

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-05 11:47:55 -08:00
Andrey Smirnov
e8e91d6434 fix: properly propagate nameservers to provisioned docker clusters
This was a failed refactoring to the new config options.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-05 11:09:37 -08:00
Andrey Smirnov
f4ca6e9a6e feat: update containerd to version 1.4.4
See https://github.com/containerd/containerd/releases/tag/v1.4.4

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-05 11:00:21 -08:00
Andrey Smirnov
3084a3f35b chore: update tools/pkgs/extras tags
No actual changes, just referencing tagged releases for 0.9.0.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-05 07:29:36 -08:00
Andrey Smirnov
81acadf345 fix: ignore connection refused errors when updating/converting cp
Without a load balancer, when the API server goes down there will be
connection refused errors, which should be retried.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-05 06:59:06 -08:00
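A minimal sketch of the retry idea from 81acadf345, not the code from the commit itself. It assumes the failure surfaces as an error wrapping `ECONNREFUSED`; the helper `retryOnConnRefused` and the demo error `errRefused` are hypothetical names introduced only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
	"time"
)

// errRefused is a demo error that wraps ECONNREFUSED the way a failed dial would.
var errRefused = fmt.Errorf("dial tcp: %w", syscall.ECONNREFUSED)

// retryOnConnRefused (hypothetical helper) calls op up to attempts times.
// It retries only while the error wraps ECONNREFUSED; success or any other
// error is returned immediately.
func retryOnConnRefused(attempts int, delay time.Duration, op func() error) error {
	var err error

	for i := 0; i < attempts; i++ {
		if err = op(); err == nil || !errors.Is(err, syscall.ECONNREFUSED) {
			return err
		}

		// Do not sleep after the final attempt.
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}

	return err
}

func main() {
	calls := 0

	err := retryOnConnRefused(5, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errRefused // simulated: API server still down
		}

		return nil
	})

	fmt.Println("calls:", calls, "err:", err)
}
```

Matching on the wrapped errno rather than on the error text keeps unrelated failures (for example, authentication errors) from being retried.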
Andrey Smirnov
db3785b930 fix: align partition start to the physical sector size
See https://github.com/talos-systems/go-blockdevice/pull/31

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-05 06:54:12 -08:00
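The alignment in db3785b930 lives in the linked go-blockdevice pull request. The arithmetic behind it can be sketched as follows, assuming byte offsets and a non-zero sector size; `alignUp` is a hypothetical helper, not the library's function.

```go
package main

import "fmt"

// alignUp (hypothetical helper) rounds offset up to the next multiple of
// sectorSize. sectorSize must be non-zero.
func alignUp(offset, sectorSize uint64) uint64 {
	if rem := offset % sectorSize; rem != 0 {
		return offset + sectorSize - rem
	}

	return offset
}

func main() {
	// A partition that would start at logical sector 34 (512-byte sectors)
	// on a disk with 4096-byte physical sectors.
	start := uint64(34 * 512)

	fmt.Println(alignUp(start, 4096))
}
```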
Alexey Palazhchenko
df52c13581 chore: fix //nolint directives
That's the recommended syntax:
https://golangci-lint.run/usage/false-positives/

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-03-05 05:58:33 -08:00
Andrey Smirnov
f3a32fff99 chore: expire objects in CI S3 bucket
Otherwise we can quickly overflow our storage backend.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-04 06:18:38 -08:00
Andrey Smirnov
7e8f13652c chore: fix upgrade tests by bumping 0.9 to alpha.5
Resources/types were renamed after alpha.4, so we need Talos API to
match expectations of the upgrade test built against master.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-03 13:53:06 -08:00
Andrey Smirnov
044fb7708c fix: chmod etcd PKI path to fix virtual IP for upgrades with persistence
On upgrade with persistence, the etcd PKI path retains the old mode 0600,
which breaks the networkd bind mount for etcd certs.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-03 13:27:23 -08:00
Andrey Smirnov
ec72ae892b release(v0.9.0-alpha.5): prepare release
This is the official v0.9.0-alpha.5 release.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
v0.9.0-alpha.5
2021-03-03 12:04:05 -08:00
Artem Chernyshev
4e47f6766e feat: bypass lock if ACPI reboot/shutdown issued
Fixes: https://github.com/talos-systems/talos/issues/2997

Listen for restart events in parallel with the boot sequence and cancel
the context if a `RestartEvent` is received.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-03-03 22:05:59 +03:00
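The pattern described in 4e47f6766e, listening for a restart event in parallel and cancelling the running sequence, might look roughly like this. It is a sketch under assumptions: the restart event is modelled as a plain channel, and `runSequence` and `demo` are hypothetical names rather than Talos APIs.

```go
package main

import (
	"context"
	"fmt"
)

// runSequence (hypothetical) runs seq with a context that is cancelled as
// soon as a value arrives on restart.
func runSequence(ctx context.Context, restart <-chan struct{}, seq func(context.Context) error) error {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	go func() {
		select {
		case <-restart:
			cancel() // restart requested: abort the sequence
		case <-ctx.Done():
			// sequence finished on its own; stop listening
		}
	}()

	return seq(ctx)
}

// demo runs a sequence that blocks until cancelled, with a restart event
// already pending.
func demo() error {
	restart := make(chan struct{}, 1)
	restart <- struct{}{}

	return runSequence(context.Background(), restart, func(ctx context.Context) error {
		<-ctx.Done()

		return ctx.Err()
	})
}

func main() {
	fmt.Println(demo())
}
```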
Andrey Smirnov
60b7f79fd8 feat: add --on-reboot flag to talosctl edit/patch machineConfig
This allows applying the config even if the sequencer is locked, to recover
from configuration mistakes.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-03 08:48:29 -08:00
Andrey Smirnov
49a23bbde8 chore: bump Go module dependencies
This bumps all the dependencies that can be bumped with minor fixups in
the code.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-03 18:45:12 +03:00
Andrey Smirnov
40a2e4d4fa feat: support JSON output in talosctl get, event types
This adds support for `-o json` (easier to use `jq` to query additional
data), and prints event name in `--watch` mode.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-03 06:18:14 -08:00
Artem Chernyshev
638af35db0 chore: properly propagate context object in the controller
This is required to correctly handle ACPI reboot or forceful reboots
during sequence that locks the controller.
Additionally fix `NoSchedule` untaint when the configuration is changed.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-03-03 16:59:27 +03:00
Andrey Smirnov
60aa011c7a feat: rename namespaces, resources, types etc
See https://github.com/talos-systems/os-runtime/pull/12 for new naming
conventions.

No functional changes.

Additionally implements printing extra columns in `talosctl get xyz`.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-02 13:34:15 -08:00
Andrey Smirnov
3a2caca781 release(v0.9.0-alpha.4): prepare release
This is the official v0.9.0-alpha.4 release.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
v0.9.0-alpha.4
2021-03-02 12:50:20 -08:00
Andrey Smirnov
8ffb55943c fix: ignore 'ENOENT' (no such file or directory) on mount
This fixes a race condition between `udevd` issuing ioctl `BLKRRPART`
when block device is closed after partitioning/formatting and Talos
trying to mount a partition. When `BLKRRPART` is issued, kernel
temporarily wipes out all the in-memory partitions killing `/dev/sdX`
devices until partition scan is done.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-02 11:19:18 -08:00
Spencer Smith
a241e9ee47 feat: update linux kernel to 5.10.19
This PR pulls in a new version of pkgs which includes a linux kernel
bump.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2021-03-02 11:08:40 -08:00
Andrey Smirnov
561f8aa15e fix: move etcd to cri containerd runner
This fixes a problem where Talos pulls the `etcd` image on every reboot, as
`etcd` was running in the system containerd which is completely
ephemeral (backed by `tmpfs`).

Also skip pulling if image is already present and unpacked (same fix for
the `kubelet` image).

Fixes #3229

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-02 07:58:07 -08:00
Andrey Smirnov
1d8ed9b5cd chore: update provision/upgrade tests to 0.9.0-alpha.3
This drops support for 0.7.x in upgrade tests, and bumps tests to use
version 0.9.0-alpha.3 as the next stable (it will eventually graduate to
0.9.0).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-02 07:11:16 -08:00
Artem Chernyshev
02c0c25bad docs: bump v0.8 release version in the SBCs guides
Makes sense to update these guides to point to the v0.8.4 as it contains
many good fixes.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-03-02 07:09:33 -08:00
Artem Chernyshev
9333e2a600 docs: add disk encryption guide
Describe usage tips, caveats, flow.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-03-02 06:44:40 -08:00
Andrey Smirnov
a12a5dd255 release(v0.9.0-alpha.3): prepare release
This is the official v0.9.0-alpha.3 release.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
v0.9.0-alpha.3
2021-03-01 12:55:08 -08:00
Andrey Smirnov
31e56e63db fix: update in-cluster kubeconfig validity to match other certs
Talos generates in-cluster kubeconfig for the kube-scheduler and
kube-controller-manager to authenticate to kube-apiserver. The bug was that
the validity of that kubeconfig was set to 24h by mistake. Fix that by
bumping validity to the default for other Kubernetes certs (1 year).

Add a certificate refresh at 50% of the validity.

Fix bugs with copying secret resources which were leading to updates not
being propagated correctly.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-01 11:16:04 -08:00
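The "refresh at 50% of the validity" rule from 31e56e63db can be illustrated with a small calculation. This is only a sketch; `refreshAfter` and `refreshAt` are hypothetical helpers and do not reflect how Talos schedules the refresh internally.

```go
package main

import (
	"fmt"
	"time"
)

// refreshAfter (hypothetical) returns how long after issuance a certificate
// should be refreshed: half of its validity period.
func refreshAfter(validity time.Duration) time.Duration {
	return validity / 2
}

// refreshAt (hypothetical) returns the instant halfway between notBefore
// and notAfter.
func refreshAt(notBefore, notAfter time.Time) time.Time {
	return notBefore.Add(refreshAfter(notAfter.Sub(notBefore)))
}

func main() {
	notBefore := time.Date(2021, time.March, 1, 0, 0, 0, 0, time.UTC)
	notAfter := notBefore.Add(365 * 24 * time.Hour) // 1 year validity

	fmt.Println("refresh at:", refreshAt(notBefore, notAfter))
}
```

Refreshing at the halfway point leaves the second half of the validity period as a margin in case a refresh attempt fails.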
Artem Chernyshev
c2f7a4b6f8 fix: add ApplyDynamicConfig call in the apply-config --immediate mode
This should align all `apply-config` modes to use the same flows.
Also added unit-tests for `ApplyDynamicConfig`.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-03-01 08:04:39 -08:00
Artem Chernyshev
376fdcf6cb feat: implement etcd remove-member cli command
Fixes: https://github.com/talos-systems/talos/issues/3219

We already have `etcd leave`, which makes the node exclude itself from
etcd members.
But if the node can't remove itself because it has no connection to etcd,
we need this `etcd remove-member` CLI command, which removes the node's
membership from a different node.

No unit tests for that as it's going to destroy the test cluster.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-03-01 07:55:08 -08:00
Andrey Smirnov
c8ae00937e chore: bump dependencies via dependabot
See #3226, #3227, #3228

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-01 06:16:53 -08:00
Andrey Smirnov
d173fd4c01 feat: update etcd to 3.4.15
See https://github.com/etcd-io/etcd/releases/tag/v3.4.15

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-01 06:16:40 -08:00
Alexey Palazhchenko
5ae315f493 fix: set hdmi_safe=1 on Raspberry Pi for maximum HDMI compatibility
Setting hdmi_safe to 1 will lead to "safe mode" settings being used to try to boot with
maximum HDMI compatibility.
See https://www.raspberrypi.org/documentation/configuration/config-txt/video.md

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-02-28 10:27:28 -08:00
Seán C McCord
61cb2fb25c feat: talosctl: allow v-prefixed k8s versions
Accept both `1.19.2` and `v1.19.2` formats for kubernetes versions in
`talosctl gen`.

Fixes #3155

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2021-02-26 15:31:37 -08:00
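Accepting both version formats, as 61cb2fb25c describes, amounts to stripping an optional leading `v`. A minimal sketch, assuming no further validation is needed; `normalizeK8sVersion` is a hypothetical name, not the function used in `talosctl`.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeK8sVersion (hypothetical) strips an optional leading "v" so that
// "v1.19.2" and "1.19.2" are treated identically.
func normalizeK8sVersion(v string) string {
	return strings.TrimPrefix(v, "v")
}

func main() {
	for _, v := range []string{"1.19.2", "v1.19.2"} {
		fmt.Println(normalizeK8sVersion(v))
	}
}
```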
Andrey Smirnov
c7ee239087 fix: show stopped/exited containers via CRI inspector
This fixes output of `talosctl containers` to show failed/exited
containers so that it's possible to see e.g. `kube-apiserver` container
when it fails to start. This also enables using ID from the container
list to see logs of failing containers, so it's easy to debug issues
when control plane pods don't start because of wrong configuration.

Also remove option to use either CRI or containerd inspector, default to
containerd for system namespace and to CRI for kubernetes namespace.

The only side effect is that we can't see `kubelet` container in the
output of `talosctl containers -k`, but `kubelet` itself is available in
`talosctl services` and `talosctl logs kubelet`.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-26 14:45:13 -08:00
Andrey Smirnov
d7cdc8cc15 feat: implement simple layer 2 shared IP for CP
This adds a VIP (virtual IP) option to the network configuration of an
interface, which will allow a set of nodes to share a floating IP
address among them.  For now, this is restricted to control plane use
and only a single shared IP is supported.

Fixes #3111

Signed-off-by: Seán C McCord <ulexus@gmail.com>
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-26 14:14:34 -08:00
Artem Chernyshev
63160277d6 fix: make ApplyDynamicConfig idempotent
Detect defined SANs and append only non-overlapping ones.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-02-26 12:15:58 -08:00
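One way to make the SAN update in 63160277d6 idempotent is to append only entries that are not already present. The sketch below assumes SANs are plain strings compared exactly; `mergeSANs` is a hypothetical helper, not the `ApplyDynamicConfig` implementation.

```go
package main

import "fmt"

// mergeSANs (hypothetical) returns existing followed by those entries of
// extra that are not already present. Applying it repeatedly with the same
// extra list yields the same result.
func mergeSANs(existing, extra []string) []string {
	seen := make(map[string]struct{}, len(existing))
	for _, s := range existing {
		seen[s] = struct{}{}
	}

	out := append([]string(nil), existing...)

	for _, s := range extra {
		if _, ok := seen[s]; ok {
			continue // already defined, skip to stay idempotent
		}

		seen[s] = struct{}{}
		out = append(out, s)
	}

	return out
}

func main() {
	defined := []string{"10.0.0.1", "cp.example.com"}
	dynamic := []string{"cp.example.com", "10.0.0.2"}

	once := mergeSANs(defined, dynamic)
	twice := mergeSANs(once, dynamic)

	fmt.Println(once)
	fmt.Println(twice)
}
```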
Artem Chernyshev
041620c852 feat: implement talosctl edit and patch config commands
Fixes: https://github.com/talos-systems/talos/issues/3209

Using parts of `kubectl` package to run the editor.
Also using the same approach as in `kubectl edit` command:
- add commented section to the top of the file with the description.
- if the config has errors, display validation errors in the commented
section at the top of the file.
- retry apply config until it succeeds.
- abort if no changes were detected or if the edited file is empty.

Patch currently supports jsonpatch only and can read it either from the
file or from the inline argument.

https://asciinema.org/a/wPawpctjoCFbJZKo2z2ATDXeC

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-02-26 02:00:20 +03:00
Andrey Smirnov
c29cfaa09b chore: build both Darwin and Linux versions of talosctl
This showed up as missing Darwin talosctl in the release.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-25 10:47:57 -08:00
Andrey Smirnov
953ce643ab feat: bump etcd client library to 3.5.0-alpha.0
This version is finally using working `go.mod` files and tags, so no
more hacks with imports, and it allows us to bump the `grpc` library to the
latest version (which I also did in this PR).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-25 10:36:15 -08:00
Andrey Smirnov
24b4c0bcb3 refactor: add context to the networkd
This change introduces a top-level context, cancellable on signal, to
networkd to abort operations when networkd is being stopped.

This allows for clean restarts of the networkd container, and it is required
to support a cancellable context for VIP etcd operations.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-25 09:39:53 -08:00
Andrey Smirnov
9464c4cbcd refactor: split WithNetworkConfig into sub-options
Allow setting individual options for the network interface while
generating config instead of providing whole config. This solves the
problem of merging options from different sources to build the config.

There should be no changes with this PR.

This is prep work for control plane VIP.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-25 07:54:53 -08:00
Andrey Smirnov
779ac74a08 fix: improve the drain function
The critical bug (I believe) was that the drain code re-entered the loop to
evict the pod after the wait for the pod to be deleted returned success,
effectively evicting the pod again once it got rescheduled to a different node.

Add a global timeout to prevent draining code from running forever.

Filter out more pod types which should never be drained.

Fixes #3124

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-25 07:02:24 -08:00
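The global timeout mentioned in 779ac74a08 can be sketched with `context.WithTimeout`. The timeout value, the `drainWithTimeout` helper and the simulated drain loop in `demo` are assumptions for illustration only, not the Talos drain code.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// drainWithTimeout (hypothetical) bounds the whole drain operation with a
// single deadline, so it cannot run forever.
func drainWithTimeout(ctx context.Context, timeout time.Duration, drain func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	return drain(ctx)
}

// demo simulates a drain that never completes and is stopped by the timeout.
func demo() error {
	return drainWithTimeout(context.Background(), 50*time.Millisecond, func(ctx context.Context) error {
		ticker := time.NewTicker(10 * time.Millisecond)
		defer ticker.Stop()

		for {
			select {
			case <-ctx.Done():
				return ctx.Err()
			case <-ticker.C:
				// a real drain would evict pods and check for completion here
			}
		}
	})
}

func main() {
	fmt.Println(demo())
}
```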
Andrey Smirnov
f24c815373 fix: correctly set service state in the resource
As 'healthy' was always set to true, some tasks started earlier than
expected; specifically, the etcd cert was generated while the time sync
was happening, leading to a half-broken cert on RPi4.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-24 13:28:14 -08:00
Andrey Smirnov
4e19b597ab test: add integration test with Canal CNI and reset API
Canal CNI is known to try to reach the k8s control plane on pod
teardown.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-24 11:34:02 -08:00