When Talos runs in a container, `ethtool` availability depends on host
kernel support, and we don't strictly need `ethtool` to make networking
work, so make it optional instead of a hard failure.
Example: https://gist.github.com/rgl/392d6e16d176f28430230b06ec80496c
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This is going to be useful in third-party code that uses the upgrade
modules to collect output logs instead of printing them to stdout.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
Route scope doesn't depend on the destination IP being link-local: e.g.
in Azure, a route to a link-local address is created with a gateway, and
that should be a global (universe) scope route.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes #3847
Fixes #3919
1. It looks like `::1/128` is assigned to the `lo` interface by the
kernel without our help, and the kernel does it properly whether IPv6 is
enabled or not (including for a particular interface).
2. If IPv6 is disabled completely via the kernel command line, we should
ignore failures to write IPv6 sysctls (as these are not
security-related, skipping them isn't a risk).
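The second point can be sketched as follows; this is a minimal illustration (the helper name is hypothetical, not the actual Talos code), assuming a missing IPv6 sysctl surfaces as a file-not-found error:

```go
package main

import (
	"errors"
	"io/fs"
	"os"
)

// writeSysctl writes a value to a sysctl file; when IPv6 is disabled on
// the kernel command line, the ipv6 entries don't exist, and we treat
// that as a non-fatal condition instead of failing the whole sequence.
func writeSysctl(path, value string) error {
	err := os.WriteFile(path, []byte(value), 0o644)
	if errors.Is(err, fs.ErrNotExist) {
		return nil // sysctl is absent (e.g. ipv6.disable=1): skip it
	}
	return err
}
```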
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
We run the etcd health check every 30s, creating and destroying a client
each time. This puts a lot of pressure on etcd itself and on machined:
there's protobuf overhead, TLS connection overhead, etc.
As we don't support changing the etcd PKI (yet), a client created once
is good enough for the lifetime of the node.
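The reuse can be sketched with a lazily-initialized shared client; the names are illustrative, not the actual Talos implementation:

```go
package main

import "sync"

// etcdClient stands in for the real etcd client, whose construction
// involves a TLS handshake and protobuf setup on every dial.
type etcdClient struct{ endpoint string }

func dial(endpoint string) *etcdClient { return &etcdClient{endpoint: endpoint} }

var (
	clientOnce sync.Once
	client     *etcdClient
)

// sharedClient creates the client on first use and reuses it for the
// lifetime of the node, instead of dialing on every 30s health check.
func sharedClient() *etcdClient {
	clientOnce.Do(func() {
		client = dial("https://127.0.0.1:2379")
	})
	return client
}
```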
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
The resources code extensively uses DeepCopy to prevent an in-memory
copy of a resource from being mutated outside of the resource model.
The previous implementation relied on YAML serialization to copy the
machine configuration, which was slow, could potentially lead to panics,
and generated pressure on the garbage collector.
This implementation uses the k8s code generator to produce DeepCopy
methods, with some manual helpers where the code generator can't handle
it.
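The generated methods have roughly this shape; this is a simplified sketch with a stand-in type, not the actual generated code:

```go
package main

// MachineConfig is a simplified stand-in for the machine configuration.
type MachineConfig struct {
	Hostname string
	CertSANs []string
}

// DeepCopy returns an independent copy, so callers can't mutate the
// in-memory resource; unlike a YAML round-trip, it allocates only what
// the copy itself needs and cannot fail at runtime.
func (in *MachineConfig) DeepCopy() *MachineConfig {
	if in == nil {
		return nil
	}
	out := new(MachineConfig)
	out.Hostname = in.Hostname
	if in.CertSANs != nil {
		out.CertSANs = make([]string, len(in.CertSANs))
		copy(out.CertSANs, in.CertSANs)
	}
	return out
}
```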
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This allows us to use tags for Go submodules (`pkg/machinery/v0.11.0`)
while still keeping the Talos tag following semantic versioning
(`v0.11.0`).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This simply uses the new protobuf package instead of the old one.
The old protobuf package is still in use by Talos dependencies.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Plus, convert a few absolute URLs with a version number to relative URLs without versions.
Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
This extends the network device machine configuration validation to make
sure that bond slaves don't have any addressing methods set, as this
might conflict with the bond setup.
It also makes sure no interface is part of two bonds.
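The second check can be sketched as a single pass over the configured bonds (names are illustrative, not the actual validation code):

```go
package main

import "fmt"

// validateBondSlaves ensures no interface is claimed by two bonds.
func validateBondSlaves(bonds map[string][]string) error {
	owner := map[string]string{} // interface name -> bond that owns it
	for bond, slaves := range bonds {
		for _, iface := range slaves {
			if prev, taken := owner[iface]; taken {
				return fmt.Errorf("interface %q is a slave of both %q and %q", iface, prev, bond)
			}
			owner[iface] = bond
		}
	}
	return nil
}
```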
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
The problem was that the gRPC method `status.Code(err)` doesn't unwrap
errors, while the Talos client returns errors wrapped with
`multierror.Error` and `fmt.Errorf`, so `status.Code` doesn't return the
error code correctly.
Fix that by introducing our own client method which correctly walks the
chain of wrapped errors.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
I believe a `clientv3.SetEndpoints()` call doesn't make the etcd client
connect to the given endpoints immediately; it might still reuse an old
connection.
At the same time, the `MaintenanceClient` which implements the
`MoveLeader` call doesn't support explicit endpoint setting (as other
similar calls do), so we have to manually force the connection to the
leader node we need.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
The problem is that there's no official way to close the Kubernetes
client's underlying TCP/HTTP connections. So each time Talos initializes
a connection to the control plane endpoint, a new client is built, but
this client is never closed, so the connection stays active on the load
balancers, at the API server level, etc. It also consumes some resources
on Talos itself.
We add a way to close the underlying connections by using a helper from
the Kubernetes client libraries to force-close all TCP connections,
which should shut down all HTTP/2 connections as well.
An alternative approach might be to cache a client for some time, but
many of the clients are created with temporary PKI, so even a cached
client still needs to be closed once it gets stale, and it's not clear
how to recreate a client in case the existing one is broken for one
reason or another (and we need to force a re-connection).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
When forfeiting etcd leadership, the node might still report leadership
status while no longer being the leader by the time the actual API call
is made. We should ignore such an error, as the node is not a leader.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This should fix an error like:
```
failed to create etcd client: error getting kubernetes endpoints: Unauthorized
```
The problem is that the generated cert was used immediately, so even a
slight time sync issue across nodes might render the cert not (yet)
usable. The cert is generated on one node, but might be used on any
other node (as requests go via the LB).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR bumps pkgs to v0.7.0-alpha.0, so that we gain a fix for
hotplugging of NVMe drives.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This removes `retrying error` messages while waiting for the API server
pod state to reflect changes from the updated static pod definition.
It logs more lines to report progress, and skips `kube-proxy` if it's
not found (as we allow it to be disabled).
```
$ talosctl upgrade-k8s -n 172.20.0.2 --from 1.21.0 --to 1.21.2
discovered master nodes ["172.20.0.2" "172.20.0.3" "172.20.0.4"]
updating "kube-apiserver" to version "1.21.2"
> "172.20.0.2": starting update
> "172.20.0.2": machine configuration patched
> "172.20.0.2": waiting for API server state pod update
< "172.20.0.2": successfully updated
> "172.20.0.3": starting update
> "172.20.0.3": machine configuration patched
> "172.20.0.3": waiting for API server state pod update
< "172.20.0.3": successfully updated
> "172.20.0.4": starting update
> "172.20.0.4": machine configuration patched
> "172.20.0.4": waiting for API server state pod update
< "172.20.0.4": successfully updated
updating "kube-controller-manager" to version "1.21.2"
> "172.20.0.2": starting update
> "172.20.0.2": machine configuration patched
> "172.20.0.2": waiting for API server state pod update
< "172.20.0.2": successfully updated
> "172.20.0.3": starting update
> "172.20.0.3": machine configuration patched
> "172.20.0.3": waiting for API server state pod update
< "172.20.0.3": successfully updated
> "172.20.0.4": starting update
> "172.20.0.4": machine configuration patched
> "172.20.0.4": waiting for API server state pod update
< "172.20.0.4": successfully updated
updating "kube-scheduler" to version "1.21.2"
> "172.20.0.2": starting update
> "172.20.0.2": machine configuration patched
> "172.20.0.2": waiting for API server state pod update
< "172.20.0.2": successfully updated
> "172.20.0.3": starting update
> "172.20.0.3": machine configuration patched
> "172.20.0.3": waiting for API server state pod update
< "172.20.0.3": successfully updated
> "172.20.0.4": starting update
> "172.20.0.4": machine configuration patched
> "172.20.0.4": waiting for API server state pod update
< "172.20.0.4": successfully updated
updating daemonset "kube-proxy" to version "1.21.2"
kube-proxy skipped as DaemonSet was not found
```
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes#3861
What this change effectively does is turn an immediate reconcile request
into an error return, so that the controller is restarted with a
backoff.
More details:
* the root cause of the update/teardown conflict is that a finalizer is
still pending on the tearing-down resource
* the finalizer might not be removed immediately, e.g. if the controller
which put the finalizer is itself in a crash loop
* if the merge controller queues a reconcile immediately, it restarts
itself, but the finalizer is still there, so it goes into the reconcile
loop again, and that continues forever until the finalizer is removed;
instead, if the controller fails, it is restarted with exponential
backoff, lowering the load on the system
The change is validated with unit tests reproducing the conflict.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
With the recent changes, the bootstrap API might wait for time to be in
sync (as apid is launched before time is synced). We set a 500ms timeout
for the bootstrap API call, so there's a chance the call times out, and
we should ignore that.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This fixes an endless block in the `RemoteGenerator.Close` method by
rewriting RemoteGenerator using the retry package.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This commit also introduces a hidden `--json` flag for the `talosctl
version` command; the flag is not supported and should be re-worked in
#907.
Refs #3852.
Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
Basically, all delay options are interlocked with `miimon`: if `miimon`
is zero, all delays are set to zero, and the kernel complains even if a
zero delay attribute is sent while `miimon` is zero.
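The interlock can be sketched as: omit the delay attributes entirely unless `miimon` is non-zero (illustrative types, not the actual netlink code):

```go
package main

// bondConfig is a simplified view of the bond options we would send.
type bondConfig struct {
	MIIMon    uint32
	UpDelay   uint32
	DownDelay uint32
}

// bondAttrs builds the netlink attribute set: when miimon is zero, the
// delay attributes are dropped entirely, since the kernel rejects them
// (even with a zero value) in that case.
func bondAttrs(c bondConfig) map[string]uint32 {
	attrs := map[string]uint32{"miimon": c.MIIMon}
	if c.MIIMon != 0 {
		attrs["updelay"] = c.UpDelay
		attrs["downdelay"] = c.DownDelay
	}
	return attrs
}
```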
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
The sequence of events to reproduce the problem:
* some resource was merged into a final representation with ID `x`
* the underlying source resource gets destroyed
* the merge controller marks the final resource `x` for teardown and
waits for the finalizers to be empty
* another source resource appears which gets merged into the same final
`x`
* as `x` is in the teardown phase, the spec controller ignores it
* the merge controller doesn't see the problem either, as the spec of
`x` is correct, but the phase is wrong (which the merge controller
ignores)
This pulls in a COSI fix to return an error if a resource in the
teardown phase is modified. This way the merge controller knows that the
resource `x` is in the teardown phase, so it should first be fully torn
down, and then the new representation should be re-created as a new
resource with the same ID `x`.
Regression unit tests are included (they don't always reproduce the
sequence of events reliably, but they do with about 10% probability).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR makes sure we pin to a known CAPI version because, with the new
v0.4.x released, we'll fail until we support the v1alpha4 APIs.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>