This is controlled with a feature flag which gets enabled automatically
for Talos 1.5+.
Fixes #7181
If enabled, configures kubelet to use project quotas to track xfs volume
usage, which is much more efficient than doing `du` periodically.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
* drop old resources API, which was deprecated a long time ago
* use bootstrapped event in `talosctl get --watch` to better align
columns in the table output
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Introduce a new resource, `SiderolinkConfig`, to store SideroLink connection configuration (api endpoint for now).
Introduce a controller for this resource which populates it from the Kernel cmdline.
Rework the SideroLink `ManagerController` to take this new resource as input and reconfigure the link on changes.
Additionally, if the siderolink connection is lost, reconnect to it and reconfigure the links/addresses.
Closes siderolabs/talos#7142, siderolabs/talos#7143.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Fixes #7159
The change looks big, but it's actually pretty simple inside: the static
pods had an annotation which tracks a version of the secrets which
forced control plane pods to reload on a change. At the same time
`kube-apiserver` can reload certificate inputs automatically from files
without restart.
So the inputs were split: the dynamic inputs (for kube-apiserver) don't
need a reload, so their version is not tracked in the static pod
annotation and changes to them don't cause a reload. The previous
non-dynamic resource still causes a reload, but it no longer gets
updated when e.g. node addresses change.
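A minimal sketch of the idea, not the actual controller code: the pod
annotation is derived only from the version of the non-dynamic inputs, so
a bump of the dynamic inputs leaves the annotation unchanged. The names
`Inputs`, `ANNOTATION_KEY`, `pod_annotations` and `needs_restart` are
hypothetical and used only for illustration.

```python
from dataclasses import dataclass

# Hypothetical annotation key, used only for this illustration.
ANNOTATION_KEY = "example.invalid/secrets-version"


@dataclass
class Inputs:
    static_version: int = 1   # inputs that still require a pod reload
    dynamic_version: int = 1  # inputs kube-apiserver reloads from files


def pod_annotations(inputs: Inputs) -> dict[str, str]:
    # Only the static version is tracked; the dynamic one is left out on purpose.
    return {ANNOTATION_KEY: str(inputs.static_version)}


def needs_restart(old: Inputs, new: Inputs) -> bool:
    # A static pod is recreated only when its rendered annotations change.
    return pod_annotations(old) != pod_annotations(new)


print(needs_restart(Inputs(1, 1), Inputs(1, 2)))  # dynamic inputs changed
print(needs_restart(Inputs(1, 1), Inputs(2, 1)))  # static inputs changed
```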
Much more refactoring could be done here, as the resource chain is a bit
of a mess, but I wanted to keep the number of changes minimal to keep
this backportable.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Describe scaling down a Talos cluster.
Signed-off-by: Steve Francis <steve.francis@talos-systems.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Updated documentation, what's new, etc.
Also fix some minor UI issues in the dashboard.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes: https://github.com/siderolabs/talos/issues/7017
Should allow external services to detect which user block devices might
need to be wiped during reset.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Network probes are configured with the specs, and provide their output
as a status.
At the moment only platform code can configure network probes.
If any network probes are configured, they affect the `Connectivity` flag
of network.Status.
For example, create the probe:
```
talosctl -n 172.20.0.3 meta write 0xa '{"probes": [{"interval": "1s", "tcp": {"endpoint": "google.com:80", "timeout": "10s"}}]}'
```
Watch probe status:
```
$ talosctl -n 172.20.0.3 get probe
NODE         NAMESPACE   TYPE          ID                  VERSION   SUCCESS
172.20.0.3   network     ProbeStatus   tcp:google.com:80   5         true
```
With failing probes:
```
$ talosctl -n 172.20.0.3 get probe
NODE         NAMESPACE   TYPE          ID                  VERSION   SUCCESS
172.20.0.3   network     ProbeStatus   tcp:google.com:80   4         true
172.20.0.3   network     ProbeStatus   tcp:google.com:81   1         false
$ talosctl -n 172.20.0.3 get networkstatus
NODE         NAMESPACE   TYPE            ID       VERSION   ADDRESS   CONNECTIVITY   HOSTNAME   ETC
172.20.0.3   network     NetworkStatus   status   5         true      true           true       true
```
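To illustrate what a TCP probe spec describes, here is a rough sketch
(not the Talos implementation): it derives the probe ID in the
`tcp:<endpoint>` form seen in the output above and attempts a single TCP
connection within the timeout. The helper names `probe_id`,
`parse_duration` and `tcp_probe` are assumptions, and the duration parser
only understands the `s` and `ms` suffixes.

```python
import socket


def probe_id(spec: dict) -> str:
    # IDs in the `get probe` output look like "tcp:google.com:80".
    return "tcp:" + spec["tcp"]["endpoint"]


def parse_duration(text: str) -> float:
    # Simplified parser: only "<n>ms" and "<n>s" are supported in this sketch.
    if text.endswith("ms"):
        return float(text[:-2]) / 1000
    if text.endswith("s"):
        return float(text[:-1])
    raise ValueError(f"unsupported duration: {text!r}")


def tcp_probe(endpoint: str, timeout: float) -> bool:
    # Success means the TCP handshake completed within the timeout.
    host, _, port = endpoint.rpartition(":")
    try:
        with socket.create_connection((host.strip("[]"), int(port)), timeout=timeout):
            return True
    except OSError:
        return False


spec = {"interval": "1s", "tcp": {"endpoint": "google.com:80", "timeout": "10s"}}
print(probe_id(spec), parse_duration(spec["tcp"]["timeout"]))
```

Calling `tcp_probe(spec["tcp"]["endpoint"], parse_duration(spec["tcp"]["timeout"]))`
once per `interval` would approximate the probe loop.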
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Update NVIDIA install docs and add an example of setting `nvidia` as the
default runtimeclass.
NVIDIA hasn't published vectoradd images for CUDA 12, so replace the
example with one that runs the `nvidia-smi` command.
Signed-off-by: Noel Georgi <git@frezbo.dev>
`talosctl netstat -k` shows all sockets/connections of the host and of
non-hostnetwork pods.
`talosctl netstat namespace/pod` shows sockets/connections of a specific pod
and autocompletes in the shell.
Signed-off-by: Nico Berlee <nico.berlee@on2it.net>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This adds support for automatically registering node hostnames in DNS by
sending the current hostname to DHCP via option 12. If the current hostname is
updated, issue a new DISCOVER to propagate the update to DHCP (updating the
hostname on lease renewals is not universally supported by DHCP servers). This
addition maintains the previous functionality where the node can also request
its hostname from the DHCP server. The received hostname will be processed and
prioritized as usual by the `network.HostnameSpecController`.
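As an illustration of what is sent, here is a minimal sketch of the wire
encoding of DHCP option 12 (Host Name): a code byte, a length byte, then
the hostname bytes. The function names are hypothetical, and this is not
the code path used by the `insomniacslk/dhcp` library.

```python
OPT_HOST_NAME = 12  # DHCP "Host Name" option code


def encode_hostname_option(hostname: str) -> bytes:
    # Options are encoded as: code (1 byte), length (1 byte), payload.
    data = hostname.encode("ascii")
    if not 1 <= len(data) <= 255:
        raise ValueError("hostname must be 1..255 bytes long")
    return bytes([OPT_HOST_NAME, len(data)]) + data


def needs_new_discover(previous: str, current: str) -> bool:
    # A changed hostname triggers a fresh DISCOVER instead of waiting for a renewal.
    return previous != current


print(encode_hostname_option("worker-1").hex())
```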
This change set also contains fixes to make DHCP renewals compliant with RFC
2131, specifically avoiding sending the server identifier and requested IP
address when issuing renewals using a previous offer. This also uncovered
issues and missing features in the upstream `insomniacslk/dhcp` library, the
fixes and improvements for which are now finally merged.
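The RFC 2131 rule in question can be sketched as follows: a DHCPREQUEST
sent while selecting an offer carries the server identifier (option 54)
and requested IP address (option 50), while a renewal carries neither and
puts the leased address into `ciaddr` instead. The `request_fields`
helper and the state names are assumptions made for this illustration.

```python
import ipaddress

OPT_REQUESTED_IP = 50
OPT_MESSAGE_TYPE = 53
OPT_SERVER_ID = 54
DHCPREQUEST = 3


def request_fields(state: str, client_ip: str, server_id: str):
    """Return (ciaddr, options) for a DHCPREQUEST in the given client state."""
    options = {OPT_MESSAGE_TYPE: bytes([DHCPREQUEST])}
    if state == "selecting":
        # Answering an OFFER: identify the server and the address being requested.
        options[OPT_REQUESTED_IP] = ipaddress.IPv4Address(client_ip).packed
        options[OPT_SERVER_ID] = ipaddress.IPv4Address(server_id).packed
        return "0.0.0.0", options
    if state in ("renewing", "rebinding"):
        # Renewal: neither option may be sent; the address goes into ciaddr.
        return client_ip, options
    raise ValueError(f"unknown state: {state!r}")


ciaddr, opts = request_fields("renewing", "172.20.0.3", "172.20.0.1")
print(ciaddr, sorted(opts))
```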
Sending hostname updates has been tested against `dnsmasq` and the built-in
DHCP + DNS services in Windows Server. Hostname retrieval from DHCP and edge
cases with overridden hostnames from different configuration layers have been
extensively tested against `dnsmasq`.
Signed-off-by: Dennis Marttinen <twelho@welho.tech>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Unify getting environment variables, support passing environment
variables via kernel args.
Fixes #6984
See #6999
For META this will be used to pass environment variables to the
installer for ISO images (or PXE booting).
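A rough sketch of reading environment variables from the kernel command
line, assuming they are passed as repeated `talos.environment=KEY=value`
arguments (the parameter name is an assumption here, and
`env_from_cmdline` is a hypothetical helper, not the Talos code):

```python
def env_from_cmdline(cmdline: str, param: str = "talos.environment") -> dict[str, str]:
    # Collect KEY=value pairs from every occurrence of the parameter.
    env: dict[str, str] = {}
    prefix = param + "="
    for arg in cmdline.split():
        if not arg.startswith(prefix):
            continue
        key, sep, value = arg[len(prefix):].partition("=")
        if sep and key:  # skip malformed entries without "=" or with an empty key
            env[key] = value
    return env


cmdline = (
    "console=ttyS0 talos.environment=http_proxy=http://proxy:3128 "
    "talos.environment=no_proxy=10.0.0.0/8"
)
print(env_from_cmdline(cmdline))
```

On a running Linux system the input string would come from `/proc/cmdline`.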
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This allows writing keys to the META partition.
META contents can be viewed with `talosctl get metakeys`.
There is no real use case for it yet, but the next PRs will introduce
two special keys which can be written:
* platform network config for `metal`
* `${code}` variable
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Implement the new summary dashboard with node info and logs.
Replace the previous metrics dashboard with the new dashboard which has multiple screens for node summary, metrics and editing network config.
Port the old metrics dashboard to the tview library and make it a screen in the new dashboard, accessible via the F2 key.
Add a new resource, `infos.cluster.talos.dev`, which contains the cluster name and ID of a node.
Disable the network config editor screen in the new dashboard until it is fully implemented with its backend.
Closes siderolabs/talos#4790.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Use a global instance, handle loading/saving META in global context.
Deprecate legacy syslinux ADV, provide an easier interface for
consumers.
Expose META as resources.
Fix the bootloader revert process (it was completely broken for quite a
while :sad:).
This is a first step which mostly does preparation work, real changes
will come in the next PRs:
* add APIs to write to META
* consume META keys for platform network config for `metal`
* custom key for URL `${code}`
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Adds a default image URL and ensures that an empty image URL is not
sent when calling `talosctl upgrade`.
Fixes #6912
Signed-off-by: Seán C McCord <ulexus@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This PR takes the first 12 characters of the container ID and adds them to each container's entry in the `talosctl -k containers` output.
That way we can ensure that we get the logs from the proper container even if there is a newer one.
Closes #6886
Co-authored-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Add cilium e2e tests. The existing cilium check was very old; update to
the latest cilium version and also add a test for KPR strict mode.
Signed-off-by: Noel Georgi <git@frezbo.dev>
This introduces a new role for Talos API which fills the gap between
`os:reader` and `os:admin` roles.
Fixes #6898
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes: https://github.com/siderolabs/talos/issues/6815
Additionally, make it possible to run reset in maintenance mode, which
provides a way to reset the system disk and remove all traces of Talos
from it.
The new reset flow works in a separate sequence; the disk probe lookup
was changed to check the boot partition instead of the ephemeral one.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>