3932 Commits

Author SHA1 Message Date
Artem Chernyshev
13499fc302
feat: support patching the machine config in the apply-config cmd
Fixes: https://github.com/siderolabs/talos/issues/6045

`talosctl apply-config` now supports `--config-patch` flag that takes
machine config patches as the input.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-08-11 13:56:23 +03:00
Andrey Smirnov
be351dcb99
release(v1.2.0-alpha.2): prepare release
This is the official v1.2.0-alpha.2 release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
pkg/machinery/v1.2.0-alpha.2 v1.2.0-alpha.2
2022-08-10 23:03:53 +04:00
Andrey Smirnov
5dd1b40020
feat: disable Kubernetes discovery backend by default
Fixes #5827

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-10 22:25:31 +04:00
Noel Georgi
b62b18a972
feat: bump k8s to v1.25.0-beta.0
Bump k8s to v1.25.0-beta.0

Update most kubernetes `master` references to `controlplane`

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-08-10 22:17:53 +05:30
Dmitriy Matrenichev
7b80a747bc
feat: add protobuf encoding/decoding for Go structs
This commit adds support for encoding/decoding Go structs with `protobuf:<n>` tags.
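
As an illustration only (not the code added by this commit), the field numbers carried by such tags can be read with the standard `reflect` package. The `Example` struct, its field numbers, and the `fieldNumbers` helper are hypothetical names made up for this sketch.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
)

// Example is a hypothetical struct; field names and numbers are illustrative.
type Example struct {
	Hostname string `protobuf:"1"`
	Ready    bool   `protobuf:"2"`
	internal int    // no tag, so it is skipped
}

// fieldNumbers maps each tagged field name to the number in its `protobuf` tag.
func fieldNumbers(v any) (map[string]int, error) {
	t := reflect.TypeOf(v)
	if t == nil {
		return nil, fmt.Errorf("nil value")
	}
	if t.Kind() == reflect.Pointer {
		t = t.Elem()
	}
	if t.Kind() != reflect.Struct {
		return nil, fmt.Errorf("expected struct, got %s", t.Kind())
	}

	out := make(map[string]int)
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		tag, ok := f.Tag.Lookup("protobuf")
		if !ok {
			continue // untagged fields are not encoded in this sketch
		}
		n, err := strconv.Atoi(tag)
		if err != nil {
			return nil, fmt.Errorf("field %s: invalid tag %q: %w", f.Name, tag, err)
		}
		out[f.Name] = n
	}
	return out, nil
}

func main() {
	nums, err := fieldNumbers(Example{})
	if err != nil {
		panic(err)
	}
	fmt.Println(nums["Hostname"], nums["Ready"])
}
```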

Closes #5940

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-08-10 16:04:08 +03:00
Steve Francis
00c3ee3ac3
docs: remove obsolete references to init nodes
This PR removes obsolete references

Signed-off-by: Steve Francis <steve.francis@talos-systems.com>
Signed-off-by: Spencer Smith <spencer.smith@talos-systems.com>
2022-08-09 14:38:09 -04:00
Andrey Smirnov
6eefa9d9cb
fix: properly filter resources in maintenance server
The issue was introduced in PR #6042

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-09 14:26:37 +04:00
Andrey Smirnov
fa5aad01a0
docs: fix issues in GCP docs
Fixes #6034
Fixes #6035
Closes #6027

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-08 23:29:53 +04:00
Andrey Smirnov
98f056603e
chore: bump dependencies
go-mod-outdated

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-08 22:31:59 +04:00
Utku Ozdemir
84e712a9f1
feat: introduce Talos API access from Kubernetes
We add a new CRD, `serviceaccounts.talos.dev` (with `tsa` as short name), and its controller which allows users to get a `Secret` containing a short-lived Talosconfig in their namespaces with the roles they need. Additionally, we introduce the `talosctl inject serviceaccount` command to accept a YAML file with Kubernetes manifests and inject them with Talos service accounts so that they can be directly applied to Kubernetes afterwards. If Talos API access feature is enabled on Talos side, the injected workloads will be able to talk to Talos API.

Closes siderolabs/talos#4422.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-08-08 18:27:26 +02:00
Noel Georgi
d7be308921
chore: bump kernel to 5.15.59
Bump kernel to [5.15.59](https://github.com/siderolabs/pkgs/pull/546)

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-08-08 19:28:34 +05:30
Andrey Smirnov
c2c2d65bc9
refactor: use COSI access filter for resource access
This replaces the old resource API filter with a new one based on the new COSI
feature for filtering access to resources.

There should be no functional changes.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-08 17:25:09 +04:00
Andrey Smirnov
1dee0579e9
feat: add support for proxying one-to-one to apid
This adds a new metadata field `node` which always proxies to
a single node without touching any protobuf structs on the way.

With `node`, we can call APIs which do not conform to the Talos API
proxying standards. From the UX point of view things work the same
way, but multiplexing is handled on the client side.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-08 15:39:22 +04:00
Flightkick
86eb01cd6c
docs: add missing dev tools
Document missing dev tools.

Signed-off-by: Flightkick <Flightkick@users.noreply.github.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-08-08 16:27:55 +05:30
Gwyn
4fd676c046
docs: fix typo in theila name
Fixes typo of thelia -> theila

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-08-07 00:49:50 +05:30
Andrey Smirnov
856beb21cc
feat: containerd 1.6.7, Flannel 0.19.1
See

* https://github.com/flannel-io/flannel/releases/tag/v0.19.1
* https://github.com/containerd/containerd/releases/tag/v1.6.7

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-05 19:40:32 +04:00
Noel Georgi
e97b9f6d3e
feat: support dhcp options for vlan
Add `DHCPOptions` for VLAN device.

Fixes: #6011

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-08-05 01:41:46 +05:30
Andrey Smirnov
92314e47bf
refactor: use controllers/resources to feed trustd with data
This is mostly the same as the way `apid` consumes certificates generated by
`machined` via COSI API connection.

Service `trustd` consumes two resources:

* `secrets.Trustd` which contains `trustd` server TLS certificates and
  it gets refreshed as e.g. node IP changes
* `secrets.OSRoot` which contains Talos API CA and join token

This PR fixes an issue with `trustd` certs not always including all IPs
of the node, as previously `trustd` certs captured only the addresses of
the node at the moment of `trustd` startup.

In addition, the refactoring makes it possible to change the API CA
and join token dynamically. This needs more work, but `trustd` itself
should now pick up new values without further changes.

Fixes #5863

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-04 23:45:34 +04:00
Noel Georgi
80d298abfb
feat: support skipping node registration
This PR adds support for skipping node registration with Kubernetes.

This is an advanced use case and should only be used in special cases.
In this mode the kubelet runs only static pods.

Fixes: #5847

Operations that will be broken:

- `talosctl cluster create` would eventually timeout since it expects
  nodes to be registered.
- `talosctl health` since it expects nodes to be registered.
- `talosctl upgrade-k8s` since it expects nodes to be registered. Static
  pods can still be updated by editing the machine config.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-08-04 23:09:47 +05:30
Andrey Smirnov
7795de313a
fix: use controllers/resources for etcd configuration
This extracts etcd configuration and finalized run arguments as
resources managed by controllers.

The biggest change in terms of UX is that Talos now waits for the etcd
configured subnet to be actually available before starting etcd.
Previously etcd quickly failed if the requested subnet was not available
on the host.

Coupled with other fixes (#5951, #5988), this should bring etcd
join/promote sequence back into proper shape.

I also reverted all temporary measures for discovering etcd endpoints;
etcd join no longer depends on Kubernetes (once again).

Fixes #5889

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-04 21:14:43 +04:00
Andrey Smirnov
f9b664c947
fix: reload trusted CA list when client is recreated
Fixes #5652

This reworks and unifies HTTP client/transport management in Talos:

* cleanhttp is used everywhere consistently
* DefaultClient is using pooled client, other clients use regular
  transport
* like before, Proxy vars are inspected on each request (but now
  consistently)
* manifest download functions now recreate the client on each run to
  pick up latest changes
* system CA list is picked up from fixed locations, and supports
  reloading on changes

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-04 20:01:35 +04:00
Andrey Smirnov
8847ccd031
fix: shutdown some streaming API calls when machined API is shutting down
This provides a "quick" shutdown when the API server is going down.

This allows a "clean" shutdown of the `talosctl events` stream
when the server is rebooting.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-04 17:13:41 +04:00
Noel Georgi
f95b537262
fix: allow files in extension spec
Support allowing explicit files in extensions.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-08-04 01:54:28 +05:30
Andrey Smirnov
1a8f6ec8e1
fix: don't advertise Kubernetes pod networks over KubeSpan by default
This is incompatible with Calico and Cilium in default configuration, as
it's not easy to figure out exact PodCIDRs of the node.

We change the default but provide the option to revert the old behavior.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-03 22:33:59 +04:00
Artem Chernyshev
e3d4a0e4d1
fix: make reset work even if the node is not bootstrapped/not joined
Now the sequencer is smart enough to skip `LeaveEtcd` and
`CordonAndDrain` if the node has not fully joined the cluster
yet.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-08-03 17:15:50 +03:00
Andrey Smirnov
a6b010a8b4
chore: update Go to 1.19, Linux to 5.15.58
See https://go.dev/doc/go1.19

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-03 17:03:58 +04:00
Eng Zer Jun
fb058a7c92
test: use T.TempDir to create temporary test directory
This commit replaces `ioutil.TempDir` with `t.TempDir` in tests. The
directory created by `t.TempDir` is automatically removed when the test
and all its subtests complete.

Prior to this commit, a temporary directory created using `ioutil.TempDir`
had to be removed manually by calling `os.RemoveAll`, which was omitted
in some tests. The error handling boilerplate e.g.
	defer func() {
		if err := os.RemoveAll(dir); err != nil {
			t.Fatal(err)
		}
	}()
is also tedious, but `t.TempDir` handles this for us nicely.

Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-03 16:31:55 +04:00
Andrey Smirnov
6fc38bae69
fix: iterate over etcd members endpoints for member promotion
This uses all available (potential) etcd endpoints, which includes the
member being promoted as well. We avoid failures by iterating over the
list of endpoints on each attempt to make sure each and every endpoint
is tried.

Part of #5889

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-02 00:16:33 +04:00
Andrey Smirnov
c70b692fb3
fix: update default address if removed from the host
This fixes a case where an IP that had become the default at some point was
later removed completely from the node. In that case Talos should switch the
default address to another address, as a default IP that is not present on
the node doesn't make much sense.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-01 23:19:04 +04:00
Utku Ozdemir
cf620d4733
feat: read talosconfig from secrets directory
Similar to the way kubectl reads kubeconfig, we attempt to load the talosconfig file from multiple locations. If the file exists under `/var/run/secrets/talos.dev/config`, we load it with higher priority before falling back to `~/.talos/config`. This allows talosctl to access Talos API from inside a pod when talosconfig is mounted into `/var/run/secrets/talos.dev/config`, similar to the way Kubernetes service account tokens work.
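
A minimal sketch of that lookup order, assuming the last candidate is used as the fallback whether or not it exists. The helper names (`candidatePaths`, `resolveConfigPath`, `fileExists`) are hypothetical and not taken from talosctl; only the two paths come from the description above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// candidatePaths lists the locations in priority order.
func candidatePaths(home string) []string {
	return []string{
		"/var/run/secrets/talos.dev/config",
		filepath.Join(home, ".talos", "config"),
	}
}

// resolveConfigPath returns the first existing path, or the last candidate
// as the fallback when none exists (an assumption of this sketch).
func resolveConfigPath(paths []string, exists func(string) bool) string {
	if len(paths) == 0 {
		return ""
	}
	for _, p := range paths {
		if exists(p) {
			return p
		}
	}
	return paths[len(paths)-1]
}

// fileExists reports whether path is an existing non-directory.
func fileExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && !info.IsDir()
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		home = "."
	}
	fmt.Println(resolveConfigPath(candidatePaths(home), fileExists))
}
```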

Part of siderolabs/talos#5980.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-08-01 18:56:57 +02:00
Eirik Askheim
1ad8e6122c
fix: keep entire vlan id when parsing cmdline
Only last digit was kept.

Signed-off-by: Eirik Askheim <eirik@x13.no>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-01 19:52:15 +04:00
Andrey Smirnov
fe2ee3b100
feat: implement MachineStatus resource
Fixes #5789

Example:

```yaml
spec:
    stage: running
    status:
        ready: false
        unmetConditions:
            - name: staticPods
              reason: kube-system/kube-controller-manager-talos-default-master-1 not ready, kube-system/kube-scheduler-talos-default-master-1 not ready
```

As events (CLI doesn't show full contents):

```
172.20.0.2   cbhf2l6f9lrs738hehfg   talos/runtime/machine.MachineStatusEvent   BOOTING   ready: false, unmet conditions: [time network services]
```
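
For readers consuming the resource from Go, the shape above could be mirrored roughly like this. The type and function names are hypothetical stand-ins written for this sketch, not the types added by the commit; the summary format follows the event line shown above.

```go
package main

import "fmt"

// UnmetCondition mirrors one entry of `unmetConditions` in the YAML example.
type UnmetCondition struct {
	Name   string
	Reason string
}

// MachineStatus is a hypothetical, flattened mirror of the example spec.
type MachineStatus struct {
	Stage           string
	Ready           bool
	UnmetConditions []UnmetCondition
}

// summarize renders readiness and the names of unmet conditions on one line.
func summarize(s MachineStatus) string {
	names := make([]string, 0, len(s.UnmetConditions))
	for _, c := range s.UnmetConditions {
		names = append(names, c.Name)
	}
	return fmt.Sprintf("ready: %t, unmet conditions: %v", s.Ready, names)
}

func main() {
	status := MachineStatus{
		Stage: "booting",
		UnmetConditions: []UnmetCondition{
			{Name: "time"}, {Name: "network"}, {Name: "services"},
		},
	}
	fmt.Println(summarize(status))
}
```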

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-01 18:36:10 +04:00
Andrey Smirnov
670d274c45
chore: bump dependencies
Dependabot + go-mod-outdated

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-01 17:37:43 +04:00
Tommy Botten Jensen
08d2612e07
docs: bond devices are comma separated
Update kernel arguments bond doc.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-29 20:51:35 +04:00
Dmitriy Matrenichev
c3c3e14db5
chore: add gotagsrewrite tool and use it to add tags to resources
This commit adds the gotagsrewrite tool, which is used to add `protobuf:"<n>"` tags to structs with a `//gotagsrewrite:gen` comment. This will be used in conjunction with github.com/siderolabs/protoenc.

Closes #5941

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-07-29 14:51:02 +03:00
Andrey Smirnov
2e790526f7
refactor: make apid stop gracefully and be stopped late
This fixes apid and machined shutdown sequences to perform a graceful stop of
the gRPC server with a timeout.

Also sequences are restructured to stop apid/machined as late as
possible allowing access to the node while the long sequence is running
(e.g. upgrade or reset).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-29 14:52:04 +04:00
Andrey Smirnov
0cdf222431
fix: retry Conflict errors when upgrading k8s manifests
Fixes #5985

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-29 13:20:04 +04:00
Andrey Smirnov
1db097f509
release(v1.2.0-alpha.1): prepare release
This is the official v1.2.0-alpha.1 release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
pkg/machinery/v1.2.0-alpha.1 v1.2.0-alpha.1
2022-07-28 21:43:44 +04:00
Noel Georgi
5ac4947b63
feat: enable default seccomp profile for kubelet
Enable the default seccomp profile provided by the container runtime

Fixes: #5293

Ref: https://kubernetes.io/docs/tutorials/security/seccomp/

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-07-28 21:45:49 +05:30
Artem Chernyshev
e5994ff7a7
fix: skip ResetDuringBoot test if the Cluster config is unknown
And improve retry logic in the test.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-07-28 15:57:58 +03:00
Artem Chernyshev
8028e10749
fix: wait for boot done when rebooting a node in the integration tests
We shouldn't start cluster healthcheck until boot sequence is done.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-07-27 23:58:43 +03:00
Artem Chernyshev
ae1bec59e9
feat: allow running only one sequence at a time
Fix `Talos` sequencer to run only a single sequence at a time.
Sequence priorities were updated to match the table:

| requested (rows) / running (columns)               | boot | reboot | reset | upgrade |
|----------------------------------------------------|------|--------|-------|---------|
| reboot                                             | Y    | Y      | Y     | N       |
| reset                                              | Y    | N      | N     | N       |
| upgrade                                            | Y    | N      | N     | N       |

One addition: `WithTakeover` is still there.
If set, priority is ignored.

This is mainly used for `Shutdown` sequence invocation,
and when applying config with reboot enabled.
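
The table can be read as a lookup keyed by the requested and the running sequence. The sketch below is an illustration under assumptions, not the sequencer code: sequence names are plain strings, `canRun` is a hypothetical helper, and "Y" is taken to mean that the request is accepted.

```go
package main

import "fmt"

// allowed[requested][running] mirrors the table in this commit message.
var allowed = map[string]map[string]bool{
	"reboot":  {"boot": true, "reboot": true, "reset": true, "upgrade": false},
	"reset":   {"boot": true, "reboot": false, "reset": false, "upgrade": false},
	"upgrade": {"boot": true, "reboot": false, "reset": false, "upgrade": false},
}

// canRun reports whether the requested sequence is accepted while another
// sequence is running. With takeover set, priority is ignored.
func canRun(running, requested string, takeover bool) bool {
	if takeover || running == "" {
		return true // takeover wins, and an idle sequencer accepts anything
	}
	return allowed[requested][running] // missing entries default to false
}

func main() {
	fmt.Println(canRun("boot", "upgrade", false))   // upgrade requested during boot
	fmt.Println(canRun("upgrade", "reboot", false)) // reboot requested during upgrade
	fmt.Println(canRun("upgrade", "reset", true))   // takeover ignores priority
}
```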

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-07-27 17:21:36 +03:00
Andrey Smirnov
ec05aee040
fix: correctly unwrap errors when streaming
When a message is sent via the proxy, `metadata.error` carries only a string
representation which can't be unmarshalled back into an `error` which we
can match against. A similar fix was already done for "unary" responses,
but we missed the streaming case.

This fixes a spurious failure in integration tests when calling
`talosctl pcap --duration 1s`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-26 23:52:37 +04:00
Dmitriy Matrenichev
7c7f2d8c3b
feat: refactor disk size matcher to be compatible with DeepEqual
Replace Matcher field with Matcher method and store Op and size data directly in InstallDiskSizeMatcher.

Closes #5860.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-07-26 17:10:11 +03:00
Andrey Smirnov
3addea83b9
feat: introduce support for Talos API access from Kubernetes
This is a first step: providing a service to access Talos API.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-26 00:02:19 +04:00
Matthew Richardson
34d3a41643
docs: add missing <> to relref
Fixing small issue in syntax.

Signed-off-by: Matthew Richardson <M.Richardson@ed.ac.uk>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-25 23:20:51 +04:00
Andrey Smirnov
c4d2d20c41
fix: enable stable hostnames for worker configs as well
This fixes a small bug with stable hostnames when they were only enabled
for control plane nodes.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-25 22:30:44 +04:00
Noel Georgi
0326bac1f9
chore: bump kernel to 5.15.57
Bump kernel to [5.15.57](https://github.com/siderolabs/pkgs/pull/539)

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-07-25 21:16:18 +05:30
Andrey Smirnov
86820c33f1
chore: bump dependencies
dependabot + go-mod-outdated

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-25 18:14:49 +04:00
Andrey Smirnov
6e7dfeeb38
fix: data race in packet capture (part 2)
The `PacketSource` interface is racy, as it provides a channel to read
packets from, while packets are read in an (invisible) goroutine, so
closing the capture handle creates a data race with reading.

Unwrap that goroutine into an explicit loop to avoid the race.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-25 15:24:32 +04:00