This PR adds the support for CoreDNS forwarding to host DNS. We try to bind on 9th address on the first element from
`serviceSubnets` and create a simple service so k8s will not attempt to rebind it.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Co-authored-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
This allows to roll all nodes to use a new CA, to refresh it, or e.g.
when the `talosconfig` was exposed accidentally.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This controller combines kobject events, and scan of `/sys/block` to
build a consistent list of available block devices, updating resources
as the blockdevice changes.
Based on these resources the next step can run probe on the blockdevices
as they change to present a consistent view of filesystems/partitions.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This PR adds a new controller - `DNSServerController` that starts tcp and udp dns servers locally. Just like `EtcFileController` it monitors `ResolverStatusType` and updates the list of destinations from there.
Most of the caching logic is in our "lobotomized" "`CoreDNS` fork. We need this fork because default `CoreDNS` carries
full Caddy server and various other modules that we don't need in Talos. On our side we implement
random selection of the actual dns and request forwarding.
Closes#7693
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
It works well for small clusters, but with bigger clusters it puts too
much load on the discovery service, as it has quadratic complexity in
number of endpoints discovered/reported from each member.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
We already have the code which supports custom enums, so let's extend it to support custom enums in slices and
fix the NfTablesConntrackStateMatch proto definition.
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Many changes to the nftables backend which will be used in the follow-up
PR with #4421.
1. Add support for chain policy: drop/accept.
2. Properly handle match on all IPs in the set (`0.0.0.0/0` like).
3. Implement conntrack state matching.
4. Implement multiple ifname matching in a single rule.
5. Implement anonymous counters.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Implement initial set of backend controllers/resources to handle
nftables chains/rules etc.
Replace the KubeSpan nftables operations with controller-based.
See #4421
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This PR does those things:
- It allows API calls `MetaWrite` and `MetaRead` in maintenance mode.
- SystemInformation resource now waits for available META
- SystemInformation resource now overwrites UUID from META if there is an override
- META now supports "UUID override" and "unique token" keys
- ProvisionRequest now includes unique token and Talos version
For #7694
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Add security state resource that describes the state of Talos SecureBoot
and PCR signing key fingerprints.
The UKI fingerprint is currently not populated.
Fixes: #7514
Signed-off-by: Noel Georgi <git@frezbo.dev>
Fixes#7430
Introduce a set of resources which look similar to other API
implementations: CA, certs, cert SANs, etc.
Introduce a controller which manages the service based on resource
state.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes#7379
Add possibility to configure the controlplane static pod resources via
APIServer, ControllerManager and Scheduler configs.
Signed-off-by: LukasAuerbeck <17929465+LukasAuerbeck@users.noreply.github.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Talos now supports new type of encryption keys which rely on Sealing/Unsealing randomly generated bytes with a KMS server:
```
systemDiskEncryption:
ephemeral:
keys:
- kms:
endpoint: https://1.2.3.4:443
slot: 0
```
gRPC API definitions and a simple reference implementation of the KMS server can be found in this
[repository](https://github.com/siderolabs/kms-client/blob/main/cmd/kms-server/main.go).
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
This commit adds support for API load balancer. Quick way to enable it is during cluster creation using new `api-server-balancer-port` flag (0 by default - disabled). When enabled all API request will be routed across
cluster control plane endpoints.
Closes#7191
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Fixes#7233
Waiting for node readiness now happens in the `MachineStatus` controller
which won't mark the node as ready until Kubernetes `Node` is ready.
Handling cordoning/uncordining happens with help of additional resource
in `NodeApplyController`.
New controller provides reactive `NodeStatus` resource to see current
status of Kubernetes `Node`.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
See #7233
The controlplane label is simply injected into existing controller-based
node label flow.
For controlplane taint default NoScheduleTaint, additional controller &
resource was implemented to handle node taints.
This also fixes a problem with `allowSchedulingOnControlPlanes` not
being reactive to config changes - now it is.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
I ended up completely rewriting the controller, simplifying the flow
(somewhat) so that there's just a single control flow in the controller,
while reading from v1alpha1 events is converted to reading from a
channel.
Fixes#7227
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes#7226
This follows same flow as other similar changes - split out logging
configuration as a separate resource, source it for now in the cmdline.
Rewrite the controller to allow multiple log outputs, add send retries.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This PR adds support for creating a list of API endpoints (each is pair of host and port).
It gets them from
- Machine config cluster endpoint.
- Localhost with LocalAPIServerPort if machine is control panel.
- netip.Addr[0] and port from affiliates if they are control panels.
For #7191
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Use `udevd` rules to create stable interface names.
Link controllers should wait for `udevd` to settle down, otherwise link
rename will fail (interface should not be UP).
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This is controlled with a feature flag which gets enabled automatically
for Talos 1.5+.
Fixes#7181
If enabled, configures kubelet to use project quotas to track xfs volume
usage, which is much more efficient than doing `du` periodically.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
* drop old resources API, which was deprecated long time ago
* use bootstrapped event in `talosctl get --watch` to better align
columns in the table output
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Introduce a new resource, `SiderolinkConfig`, to store SideroLink connection configuration (api endpoint for now).
Introduce a controller for this resource which populates it from the Kernel cmdline.
Rework the SideroLink `ManagerController` to take this new resource as input and reconfigure the link on changes.
Additionally, if the siderolink connection is lost, reconnect to it and reconfigure the links/addresses.
Closessiderolabs/talos#7142, siderolabs/talos#7143.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Fixes#7159
The change looks big, but it's actually pretty simple inside: the static
pods had an annotation which tracks a version of the secrets which
forced control plane pods to reload on a change. At the same time
`kube-apiserver` can reload certificate inputs automatically from files
without restart.
So the inputs were split: the dynamic (for kube-apiserver) inputs don't
need to be reloaded, so its version is not tracked in static pod
annotation, so they don't cause a reload. The previous non-dynamic
resource still causes a reload, but it doesn't get updated when e.g.
node addresses change.
There might be many more refactoring done, the resource chain is a bit
of a mess there, but I wanted to keep number of changes minimal to keep
this backportable.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Network probes are configured with the specs, and provide their output
as a status.
At the moment only platform code can configure network probes.
If any network probes are configured, they affect network.Status
'Connectivity' flag.
Example, create the probe:
```
talosctl -n 172.20.0.3 meta write 0xa '{"probes": [{"interval": "1s", "tcp": {"endpoint": "google.com:80", "timeout": "10s"}}]}'
```
Watch probe status:
```
$ talosctl -n 172.20.0.3 get probe
NODE NAMESPACE TYPE ID VERSION SUCCESS
172.20.0.3 network ProbeStatus tcp:google.com:80 5 true
```
With failing probes:
```
$ talosctl -n 172.20.0.3 get probe
NODE NAMESPACE TYPE ID VERSION SUCCESS
172.20.0.3 network ProbeStatus tcp:google.com:80 4 true
172.20.0.3 network ProbeStatus tcp:google.com:81 1 false
$ talosctl -n 172.20.0.3 get networkstatus
NODE NAMESPACE TYPE ID VERSION ADDRESS CONNECTIVITY HOSTNAME ETC
172.20.0.3 network NetworkStatus status 5 true true true true
```
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This adds support for automatically registering node hostnames in DNS by
sending the current hostname to DHCP via option 12. If the current hostname is
updated, issue a new DISCOVER to propagate the update to DHCP (updating the
hostname on lease renewals is not universally supported by DHCP servers). This
addition maintains the previous functionality where the node can also request
its hostname from the DHCP server. The received hostname will be processed and
prioritized as usual by the `network.HostnameSpecController`.
This change set also contains fixes to make DHCP renewals compliant with RFC
2131, specifically avoiding sending the server identifier and requested IP
address when issuing renewals using a previous offer. This also uncovered
issues and missing features in the upstream `insomniacslk/dhcp` library, the
fixes and improvements for which are now finally merged.
Sending hostname updates have been tested against `dnsmasq` and the built-in
DHCP + DNS services in Windows Server. Hostname retrieval from DHCP and edge
cases with overridden hostnames from different configuration layers have been
extensively tested against `dnsmasq`.
Signed-off-by: Dennis Marttinen <twelho@welho.tech>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Implement the new summary dashboard with node info and logs.
Replace the previous metrics dashboard with the new dashboard which has multiple screens for node summary, metrics and editing network config.
Port the old metrics dashboard to the tview library and assign it to be a screen in the new dashboard, accessible by F2 key.
Add a new resource, infos.cluster.talos.dev which contains the cluster name and id of a node.
Disable the network config editor screen in the new dashboard until it is fully implemented with its backend.
Closessiderolabs/talos#4790.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Use a global instance, handle loading/saving META in global context.
Deprecate legacy syslinux ADV, provide an easier interface for
consumers.
Expose META as resources.
Fix the bootloader revert process (it was completely broken for quite a
while :sad:).
This is a first step which mostly does preparation work, real changes
will come in the next PRs:
* add APIs to write to META
* consume META keys for platform network config for `metal`
* custom key for URL `${code}`
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
`structprotogen` now supports generating enums directly instead of using predeclared file and hardcoded types. To use this functionality, simply put `structprotogen:gen_enum` in the comment above const block, you want to have the proto definitions for.
Closes#6215
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
This feature allows us to use only IPv4 or IPv6 stack to reach the peers.
Also, it can help to not share the node-specific IPs,
which cannot be accessible at all.
Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
We add the `nodeLabels` key to the machine config to allow users to add
node labels to the kubernetes Node object. A controller
reads the nodeLabels from the machine config and applies them via the
kubernetes API.
Older versions of talosctl will throw an unknown keys error if `edit mc`
is called on a node with this change.
Fixes#6301
Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
We add a controller that provides the etcd member id as a resource
and change the etcd related commands to support member ids next to
hostnames.
Fixes: #6223
Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>
There's a cyclic dependency on siderolink library which imports talos
machinery back. We will fix that after we get talos pushed under a new
name.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>