155 Commits

Author SHA1 Message Date
Andrey Smirnov
10155c390e
feat: enable xfs project quota support, kubelet feature
This is controlled with a feature flag which gets enabled automatically
for Talos 1.5+.

Fixes #7181

If enabled, configures kubelet to use project quotas to track xfs volume
usage, which is much more efficient than doing `du` periodically.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-05-19 20:33:39 +04:00
Andrey Smirnov
bb02dd263c
chore: drop deprecated stuff for Talos 1.5
* drop old resources API, which was deprecated long time ago
* use bootstrapped event in `talosctl get --watch` to better align
  columns in the table output

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-05-18 19:46:37 +04:00
Utku Ozdemir
62c6e9655c
feat: introduce siderolink config resource & reconnect
Introduce a new resource, `SiderolinkConfig`, to store SideroLink connection configuration (api endpoint for now).

Introduce a controller for this resource which populates it from the Kernel cmdline.

Rework the SideroLink `ManagerController` to take this new resource as input and reconfigure the link on changes.

Additionally, if the siderolink connection is lost, reconnect to it and reconfigure the links/addresses.

Closes siderolabs/talos#7142, siderolabs/talos#7143.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2023-05-05 17:04:34 +02:00
Andrey Smirnov
860002c735
fix: don't reload control plane pods on cert SANs changes
Fixes #7159

The change looks big, but it's actually pretty simple inside: the static
pods had an annotation which tracks a version of the secrets which
forced control plane pods to reload on a change. At the same time
`kube-apiserver` can reload certificate inputs automatically from files
without restart.

So the inputs were split: the dynamic (for kube-apiserver) inputs don't
need to be reloaded, so its version is not tracked in static pod
annotation, so they don't cause a reload. The previous non-dynamic
resource still causes a reload, but it doesn't get updated when e.g.
node addresses change.

There might be many more refactoring done, the resource chain is a bit
of a mess there, but I wanted to keep number of changes minimal to keep
this backportable.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-05-05 16:59:09 +04:00
Artem Chernyshev
07c3c5d59e
feat: return disk subsystem in the Disks API
Fixes: https://github.com/siderolabs/talos/issues/7017

Should allow external services to detect which user block devices might
need to be wiped during reset.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2023-03-31 16:10:59 +03:00
Andrey Smirnov
aa14993539
feat: introduce network probes
Network probes are configured with the specs, and provide their output
as a status.

At the moment only platform code can configure network probes.

If any network probes are configured, they affect network.Status
'Connectivity' flag.

Example, create the probe:

```
talosctl -n 172.20.0.3 meta write 0xa '{"probes": [{"interval": "1s", "tcp": {"endpoint": "google.com:80", "timeout": "10s"}}]}'
```

Watch probe status:

```
$ talosctl -n 172.20.0.3 get probe
NODE         NAMESPACE   TYPE          ID                  VERSION   SUCCESS
172.20.0.3   network     ProbeStatus   tcp:google.com:80   5         true
```

With failing probes:

```
$ talosctl -n 172.20.0.3 get probe
NODE         NAMESPACE   TYPE          ID                  VERSION   SUCCESS
172.20.0.3   network     ProbeStatus   tcp:google.com:80   4         true
172.20.0.3   network     ProbeStatus   tcp:google.com:81   1         false
$ talosctl -n 172.20.0.3 get networkstatus
NODE         NAMESPACE   TYPE            ID       VERSION   ADDRESS   CONNECTIVITY   HOSTNAME   ETC
172.20.0.3   network     NetworkStatus   status   5         true      true           true       true

```

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-31 15:20:21 +04:00
Nico Berlee
0af8fe2fb5
feat: netstat pod support
talosctl netstat -k show all host and non-hostnetwork pods sockets/connections.
talosctl netstat namespace/pod shows sockets/connections of a specific pod +
autocompletes in the shell.

Signed-off-by: Nico Berlee <nico.berlee@on2it.net>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-30 23:39:38 +04:00
Dennis Marttinen
45c5b47a57
feat: dhcpv4: send current hostname, fix spec compliance of renewals
This adds support for automatically registering node hostnames in DNS by
sending the current hostname to DHCP via option 12. If the current hostname is
updated, issue a new DISCOVER to propagate the update to DHCP (updating the
hostname on lease renewals is not universally supported by DHCP servers). This
addition maintains the previous functionality where the node can also request
its hostname from the DHCP server. The received hostname will be processed and
prioritized as usual by the `network.HostnameSpecController`.

This change set also contains fixes to make DHCP renewals compliant with RFC
2131, specifically avoiding sending the server identifier and requested IP
address when issuing renewals using a previous offer. This also uncovered
issues and missing features in the upstream `insomniacslk/dhcp` library, the
fixes and improvements for which are now finally merged.

Sending hostname updates have been tested against `dnsmasq` and the built-in
DHCP + DNS services in Windows Server. Hostname retrieval from DHCP and edge
cases with overridden hostnames from different configuration layers have been
extensively tested against `dnsmasq`.

Signed-off-by: Dennis Marttinen <twelho@welho.tech>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-29 21:04:32 +04:00
Andrey Smirnov
442cb9c1b0
feat: implement APIs to write to META
This allows to put keys to META partition.

META contents can be viewed with `talosctl get metakeys`.

There is not real usecase for it yet, but the next PRs will introduce
two special keys which can be written:

* platform network config for `metal`
* `${code}` variable

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-15 22:17:52 +04:00
Utku Ozdemir
9e07832db9
feat: implement summary dashboard
Implement the new summary dashboard with node info and logs.
Replace the previous metrics dashboard with the new dashboard which has multiple screens for node summary, metrics and editing network config.

Port the old metrics dashboard to the tview library and assign it to be a screen in the new dashboard, accessible by F2 key.

Add a new resource, infos.cluster.talos.dev which contains the cluster name and id of a node.

Disable the network config editor screen in the new dashboard until it is fully implemented with its backend.

Closes siderolabs/talos#4790.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2023-03-15 13:13:28 +01:00
Andrey Smirnov
1df841bb54
refactor: change the interface of META
Use a global instance, handle loading/saving META in global context.

Deprecate legacy syslinux ADV, provide an easier interface for
consumers.

Expose META as resources.

Fix the bootloader revert process (it was completely broken for quite a
while :sad:).

This is a first step which mostly does preparation work, real changes
will come in the next PRs:

* add APIs to write to META
* consume META keys for platform network config for `metal`
* custom key for URL `${code}`

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-15 15:43:16 +04:00
Nico Berlee
97048f7c37
feat: netstat in API and client
Implements netstat in Talos API and client (talosctl).

Signed-off-by: Nico Berlee <nico.berlee@on2it.net>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-09 15:48:30 +04:00
Artem Chernyshev
b520710810
feat: introduce new flag in reset API that makes Talos reset user disks
Fixes: https://github.com/siderolabs/talos/issues/6815

Additionally, make it possible to run reset in maintenance mode: to
enable a way for resetting system disk and remove all traces of Talos
from it.

The new reset flow works in a separate sequence, changed disk probe
lookup to check the boot partition instead of the ephemeral one.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2023-02-28 15:10:41 +03:00
Dmitriy Matrenichev
8e9fc13d7c
feat: implement enum generator for proto files
`structprotogen` now supports generating enums directly instead of using predeclared file and hardcoded types. To use this functionality, simply put `structprotogen:gen_enum` in the comment above const block, you want to have the proto definitions for.

Closes #6215

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-01-11 16:02:21 +03:00
Andrey Smirnov
96629d5ba6
feat: implement etcd maintenance commands
This allows to safely recover out of space quota issues, and perform
degragmentation as needed.

`talosctl etcd status` command provides lots of information about the
cluster health.

See docs for more details.

Fixes #4889

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-01-03 23:25:28 +04:00
Andrey Smirnov
89dbb0ecf0
release(v1.4.0-alpha.0): prepare release
This is the official v1.4.0-alpha.0 release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-12-23 22:32:09 +04:00
Andrey Smirnov
bbb56840e4
chore: update protobuf API descriptors for 1.3.0
Set the API descriptors for v1.3.0.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-11-29 14:41:43 +04:00
Serge Logvinov
e432579d48
feat: kubespan node endpoints filter
This feature allows us to use only IPv4 or IPv6 stack to reach the peers.
Also, it can help to not share the node-specific IPs,
which cannot be accessible at all.

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
2022-11-18 19:55:42 +04:00
Philipp Sauter
e1e340bdd9
feat: expose Talos node labels as a machine configuration field
We add the `nodeLabels` key to the machine config to allow users to add
node labels to the kubernetes Node object. A controller
reads the nodeLabels from the machine config and applies them via the
kubernetes API.
Older versions of talosctl will throw an unknown keys error if `edit mc`
 is called on a node with this change.

Fixes #6301

Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-11-15 21:25:40 +04:00
Philipp Sauter
4e114ca120
feat: use the etcd member id for etcd operations instead of hostname
We add a controller that provides the etcd member id as a resource
and change the etcd related commands to support member ids next to
hostnames.

Fixes: #6223

Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>
2022-11-10 19:17:56 +04:00
Serge Logvinov
06fea24414
feat: expand platform metadata resources
* add IPv6 to the ExternalIPs resource.
* platformMetadata can define Spot instances.

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-11-07 18:57:17 +04:00
Andrey Smirnov
96aa9638f7
chore: rename talos-systems/talos to siderolabs/talos
There's a cyclic dependency on siderolink library which imports talos
machinery back. We will fix that after we get talos pushed under a new
name.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-11-03 16:50:32 +04:00
Andrey Smirnov
30bbf6463a
refactor: use siderolabs/net version with netip.Addr
Replace most of `net.IP` usage in Talos with `netip.Addr`, refactor code
accordingly.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-11-02 14:21:03 +04:00
Serge Logvinov
8bfa7ac1d6
feat: platform metadata resource
This resource stores common platform metadata information.
Such as:

* Hostname
* Region
* Zone
* InstanceType (SKU)
* InstanceID
* ProviderID (CCM cloud native magic string)

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-10-28 14:32:39 +04:00
Philipp Sauter
23842114f0
feat: support encryption with secretbox
We add support for encryption with secretbox. While AESCBC is still
supported secretbox will take precedence if both are configured.
Secretbox is not the default encryption for new clusters.

Fixes: #6362

Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>
2022-10-26 19:06:53 +02:00
Philipp Sauter
c6e1702eca
feat: use URL-based manifests to present static pods to the kubelet
Previously static pod manifests were written to and read from a folder
on the disk. We add a controller that cleans up the default static pod
manifests on the disk and serves them as a PodList manifest via HTTP.
The to the manifest is injected into the kubelet. File based static pod
manifests are still supported and may be enabled by setting the key
kubelet -> enableManifestsDirectory in the machine config.

Fixes #5494

Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>
2022-10-25 14:30:19 +02:00
Serge Logvinov
dc70d892a3
fix: support setting KubeSpan link MTU
Kubespan creates package size more than MTU external interface size.

This PR adds capabilities to change MTU size through machine config.
And sets MTU of the default kubespan route.

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-10-17 14:39:15 +04:00
Andrey Smirnov
993743f634
fix: skip hostname via DHCP on OpenStack platform
Introduce new DHCP operator option to skip hostname request/response,
and use that in OpenStack platform.

OpenStack configures interface with DHCP, while providing dummy hostname
over DHCP and proper hostname over metadata. As operators override
platform settings, DHCP hostname takes over OpenStack hostname. As a
fix, ignore DHCP hostname while on OpenStack.

Fixes #6350

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-10-10 14:18:46 +04:00
Noel Georgi
48dee48057
feat: support mtu for routes
Support setting MTU for routes.

Fixes: #6324

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-09-30 16:38:22 +05:30
Serge Logvinov
18c377a4d1
feat: customize audit policy
Add resource `AuditPolicyConfigs.kubernetes.talos.dev`.
It can be changed through machine config `cluster.apiServer.auditPolicy`

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-28 13:46:44 +04:00
Andrey Smirnov
0b2767c164
feat: implement 'permanent addr' in link statuses
Permanent address is only available for physical links, and it might be
different from the 'hardware address': when bonding, 'hardware address'
gets overridden from the bond master, while 'permanent address' still
shows MAC of the interface.

This part of the fix for incorrect bonding issue on Equinix Metal.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-26 14:45:46 +04:00
Andrey Smirnov
ce12c7b380
chore: update COSI runtime to v0.2.0-alpha.1
This adds metadata annotations and fixes some hanging watch loops.

There should be no functional changes for Talos.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-20 22:02:57 +04:00
Noel Georgi
5e21cca52d
feat: support setting kernel parameters
Support setting kernel parameters via machine config.

Fixes: #6206

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-09-05 23:45:51 +05:30
Dmitriy Matrenichev
bd56621cdf
feat: add structprotogen tool
This commit adds structprotogen tool which is used to generate proto file from Go structs.

Closes #6078.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-09-05 16:54:00 +03:00
Andrey Smirnov
7e527777e8
chore: update API descriptors
Re-generate protobuf API descriptors in preparation for 1.2.0-beta.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-15 17:35:09 +04:00
Andrey Smirnov
9baca49662
refactor: implement COSI resource API for Talos
Overview: deprecate existing Talos resource API, and introduce new COSI
API.

Consequences:

* COSI API can only go via one-2-one proxy (`client.WithNode`)
* client-side API access is way easier with `state.State` wrappers
* lots of small changes on the client side to use new APIs

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-12 22:31:54 +04:00
Dmitriy Matrenichev
e422ea63d0
chore: add proto definitions for common types
This commit adds proto definitions for this types;
- *url.URL
- netaddr.IP
- netaddr.IPPort
- netaddr.IPPrefix
- *x509.PEMEncodedKey
- *x509.PEMEncodedCertificateAndKey

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-08-12 15:38:31 +03:00
Utku Ozdemir
b5da686a7b
feat: add actor ID to events & emit an initial empty event
Add a new field `actorID` to the events and populate it with a UUID for the lifecycle actions `reboot`, `reset`, `upgrade` and `shutdown`. This actor ID will be present on all events emitted by this triggered action. We can use this ID later on the client side to be able to track triggered actions.

We also emit an event with an empty payload on the events streaming GRPC endpoint when a client connects. The purpose of this event is to signal to the client that the event streaming has actually started.

Server-side part of siderolabs/talos#5499.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-08-11 15:14:11 +02:00
Noel Georgi
b62b18a972
feat: bump k8s to v1.25.0-beta.0
Bump k8s to v1.25.0-beta.0

Update most kubernetes `master` references to `controlplane`

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-08-10 22:17:53 +05:30
Dmitriy Matrenichev
7b80a747bc
feat: add protobuf encoding/decoding for Go structs
This commit adds the support for encoding/decoding Go structs with `protobuf:<n>` tags.

Closes #5940

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-08-10 16:04:08 +03:00
Andrey Smirnov
92314e47bf
refactor: use controllers/resources to feed trustd with data
This is mostly same as the way `apid` consumes certificates generated by
`machined` via COSI API connection.

Service `trustd` consumes two resources:

* `secrets.Trustd` which contains `trustd` server TLS certificates and
  it gets refreshed as e.g. node IP changes
* `secrets.OSRoot` which contains Talos API CA and join token

This PR fixes an issue with `trustd` certs not always including all IPs
of the node, as previously `trustd` certs will only capture addresses of
the node at the moment of `trustd` startup.

Another thing is that refactoring allows to dynamically change API CA
and join token. This needs more work, but `trustd` should now pick up
changes without any additional changes.

Fixes #5863

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-04 23:45:34 +04:00
Andrey Smirnov
fe2ee3b100
feat: implement MachineStatus resource
Fixes #5789

Example:

```yaml
spec:
    stage: running
    status:
        ready: false
        unmetConditions:
            - name: staticPods
              reason: kube-system/kube-controller-manager-talos-default-master-1 not ready, kube-system/kube-scheduler-talos-default-master-1 not ready
```

As events (CLI doesn't show full contents):

```
172.20.0.2   cbhf2l6f9lrs738hehfg   talos/runtime/machine.MachineStatusEvent   BOOTING   ready: false, unmet conditions: [time network services]
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-01 18:36:10 +04:00
Artem Chernyshev
ae1bec59e9
feat: allow running only one sequence at a time
Fix `Talos` sequencer to run only a single sequence at the same time.
Sequences priority was updated. To match the table:

| what is running (columns) what is requested (rows) | boot | reboot | reset | upgrade |
|----------------------------------------------------|------|--------|-------|---------|
| reboot                                             | Y    | Y      | Y     | N       |
| reset                                              | Y    | N      | N     | N       |
| upgrade                                            | Y    | N      | N     | N       |

With a small addition that `WithTakeover` is still there.
If set, priority is ignored.

This is mainly used for `Shutdown` sequence invokation.
And if doing apply config with reboot enabled.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-07-27 17:21:36 +03:00
Andrey Smirnov
065b59276c
feat: implement packet capture API
This uses the `go-packet` library with native bindings for the packet
capture (without `libpcap`). This is not the most performant way, but it
allows us to avoid CGo.

There is a problem with converting network filter expressions (like
`tcp port 3222`) into BPF instructions, it's only available in C
libraries, but there's a workaround with `tcpdump`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-19 01:23:09 +04:00
Andrey Smirnov
022581d809
release(v1.2.0-alpha.0): prepare release
This is the official v1.2.0-alpha.0 release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-06-30 19:01:07 +04:00
Andrey Smirnov
f2997c0f22
chore: bump dependencies
dependabot + go-mod-outdated

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-06-06 23:27:17 +04:00
Artem Chernyshev
2b03057b91
feat: implement a new mode try in the config manipulation commands
The new mode allows changing the config for a period of time, which
allows trying the configuration and automatically rolling it back in case
if it doesn't work for example.

The mode can only be used with changes that can be applied without a
reboot.

When changed it doesn't write the configuration to disk, only changes it
in memory.
`--timeout` parameter can be used to customize the rollback delay.
The default timeout is 1 minute.

Any consequent configuration change will abort try mode and the last
applied configuration will be used.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-04-21 20:31:45 +03:00
Artem Chernyshev
2b9722d1f5
feat: add dry-run flag in apply-config and edit commands
Dry run prints out config diff, selected application mode without
changing the configuration.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-04-14 19:12:57 +03:00
Andrey Smirnov
25d19131d3
release(v1.1.0-alpha.0): prepare release
This is the official v1.1.0-alpha.0 release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-01 18:23:19 +03:00
Tomasz Zurkowski
cc7719c9d0
docs: improve comments in security proto
The existing comments did not match the service definition (they look
like copy paste from another service). I also added a little bit more
comments for the fields in the request and response.

Signed-off-by: Tomasz Zurkowski <zurkowski@google.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-03-16 14:18:48 +03:00