102 Commits

Author SHA1 Message Date
Dmitriy Matrenichev
6eade3d5ef
chore: add ability to rewrite uuids and set unique tokens for Talos
This PR does those things:
- It allows API calls `MetaWrite` and `MetaRead` in maintenance mode.
- SystemInformation resource now waits for available META
- SystemInformation resource now overwrites UUID from META if there is an override
- META now supports "UUID override" and "unique token" keys
- ProvisionRequest now includes unique token and Talos version

For #7694

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-11-10 18:17:54 +03:00
Andrey Smirnov
a52d3cda3b
chore: update gen and COSI runtime
No actual changes, adapting to use new APIs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-22 12:13:13 +04:00
Andrey Smirnov
0f1920bdda
chore: provide a resource to peek into Linux clock adjustments
This is a follow-up for #7567, which won't be backported to 1.5.

This allows to get an output like:

```
$ talosctl -n 172.20.0.5 get adjtimestatus -w
NODE         *   NAMESPACE TYPE            ID     VERSION   OFFSET        ESTERROR   MAXERROR   STATUS               SYNC
172.20.0.5   +   runtime   AdjtimeStatus   node   47        -18.14306ms   0s         191.5ms    STA_PLL | STA_NANO   true
172.20.0.5       runtime   AdjtimeStatus   node   48        -17.109555ms  0s         206.5ms    STA_NANO | STA_PLL   true
172.20.0.5       runtime   AdjtimeStatus   node   49        -16.134923ms  0s         221.5ms    STA_NANO | STA_PLL   true
172.20.0.5       runtime   AdjtimeStatus   node   50        -15.21581ms   0s         236.5ms    STA_PLL | STA_NANO   true
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-08-03 22:06:53 +04:00
Andrey Smirnov
87fe8f1a2a
feat: implement image generation profiles
Support full configuration for image generation, including image
outputs, support most features (where applicable) for all image output
types, unify image generation process.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-08-02 19:13:44 +04:00
Noel Georgi
68e6b98f7d
feat: add security state resource
Add security state resource that describes the state of Talos SecureBoot
and PCR signing key fingerprints.

The UKI fingerprint is currently not populated.

Fixes: #7514

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-07-31 22:02:08 +05:30
Andrey Smirnov
544cb4fe7d
refactor: accept partial machine configuration
This refactors code to handle partial machine config - only multi-doc
without v1alpha1 config.

This uses improvements from
https://github.com/cosi-project/runtime/pull/300:

* where possible, use `TransformController`
* use integrated tracker to reduce boilerplate

Sometimes fix/rewrite tests where applicable.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-27 17:00:42 +04:00
Andrey Smirnov
786e86f5b8
refactor: rewrite the way Talos acquires the machine configuration
Fixes #7453

The goal is to make it possible to load some multi-doc configuration
from the platform source (or persisted in STATE) before machine acquires
full configuration.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-24 14:26:42 +04:00
Dmitriy Matrenichev
5f34f5b41f
chore: rename api load balancer to KubePrism
Closes #7432

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-07-14 15:23:53 +03:00
Andrey Smirnov
bdb96189fa
refactor: make maintenance service controller-based
Fixes #7430

Introduce a set of resources which look similar to other API
implementations: CA, certs, cert SANs, etc.

Introduce a controller which manages the service based on resource
state.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-10 15:41:52 +04:00
Dmitriy Matrenichev
445f5ad542
feat: support API server load balancer
This commit adds support for API load balancer. Quick way to enable it is during cluster creation using new `api-server-balancer-port` flag (0 by default - disabled). When enabled all API request will be routed across
cluster control plane endpoints.

Closes #7191

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-06-16 10:09:20 -04:00
Andrey Smirnov
0a99965efb
refactor: replace uncordonNode with controllers
Fixes #7233

Waiting for node readiness now happens in the `MachineStatus` controller
which won't mark the node as ready until Kubernetes `Node` is ready.

Handling cordoning/uncordining happens with help of additional resource
in `NodeApplyController`.

New controller provides reactive `NodeStatus` resource to see current
status of Kubernetes `Node`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-13 21:48:42 +04:00
Andrey Smirnov
dbaf5c6997
refactor: task labelControlPlane into controllers
See #7233

The controlplane label is simply injected into existing controller-based
node label flow.

For controlplane taint default NoScheduleTaint, additional controller &
resource was implemented to handle node taints.

This also fixes a problem with `allowSchedulingOnControlPlanes` not
being reactive to config changes - now it is.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-12 15:25:13 +04:00
Andrey Smirnov
f5e3272fce
refactor: task 'updateBootLoader' as controller
Fixes #7232

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-09 15:27:48 +04:00
Andrey Smirnov
e7be6ee7c3
refactor: make event log streaming fully reactive
I ended up completely rewriting the controller, simplifying the flow
(somewhat) so that there's just a single control flow in the controller,
while reading from v1alpha1 events is converted to reading from a
channel.

Fixes #7227

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-08 23:13:33 +04:00
Andrey Smirnov
5dab45e869
refactor: allow kmsg log streaming to be reconfigured on the fly
Fixes #7226

This follows same flow as other similar changes - split out logging
configuration as a separate resource, source it for now in the cmdline.

Rewrite the controller to allow multiple log outputs, add send retries.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-06 15:56:24 +04:00
Dmitriy Matrenichev
8a02ecd4cb
chore: add endpoints balancer controller
This PR adds support for creating a list of API endpoints (each is pair of host and port).

It gets them from
- Machine config cluster endpoint.
- Localhost with LocalAPIServerPort if machine is control panel.
- netip.Addr[0] and port from affiliates if they are control panels.

For #7191

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-06-05 20:47:52 -04:00
Andrey Smirnov
bab484a405
feat: use stable network interface names
Use `udevd` rules to create stable interface names.

Link controllers should wait for `udevd` to settle down, otherwise link
rename will fail (interface should not be UP).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-01 21:29:12 +04:00
Andrey Smirnov
0bb7e8a5cf
refactor: split config.Provider into Config & Container
See #7230

This is a step towards preparing for multi-doc config.

Split the `config.Provider` interface into parts which have different
implementation:

* `config.Config` accesses the config itself, it might be implemented by
  `v1alpha1.Config` for example
* `config.Container` will be a set of config documents, which implement
  validation, encoding, etc.

`Version()` method dropped, as it makes little sense and it was almost
not used.

`Raw()` method renamed to `RawV1Alpha1()` to support legacy direct
access to `v1alpha1.Config`, next PR will refactor more to make it
return proper type.

There will be many more changes coming up.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-05-23 16:05:16 +04:00
Utku Ozdemir
62c6e9655c
feat: introduce siderolink config resource & reconnect
Introduce a new resource, `SiderolinkConfig`, to store SideroLink connection configuration (api endpoint for now).

Introduce a controller for this resource which populates it from the Kernel cmdline.

Rework the SideroLink `ManagerController` to take this new resource as input and reconfigure the link on changes.

Additionally, if the siderolink connection is lost, reconnect to it and reconfigure the links/addresses.

Closes siderolabs/talos#7142, siderolabs/talos#7143.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2023-05-05 17:04:34 +02:00
Andrey Smirnov
860002c735
fix: don't reload control plane pods on cert SANs changes
Fixes #7159

The change looks big, but it's actually pretty simple inside: the static
pods had an annotation which tracks a version of the secrets which
forced control plane pods to reload on a change. At the same time
`kube-apiserver` can reload certificate inputs automatically from files
without restart.

So the inputs were split: the dynamic (for kube-apiserver) inputs don't
need to be reloaded, so its version is not tracked in static pod
annotation, so they don't cause a reload. The previous non-dynamic
resource still causes a reload, but it doesn't get updated when e.g.
node addresses change.

There might be many more refactoring done, the resource chain is a bit
of a mess there, but I wanted to keep number of changes minimal to keep
this backportable.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-05-05 16:59:09 +04:00
Andrey Smirnov
08ec66c55c
feat: clean up (garbage collect) system images which are not referenced
Fixes #7121

Talos pulls some images on its own (without CRI/kubelet) to the `system`
namespace of the CRI containerd. These images are not visible to the
CRI/kubelet, so we need to clean them up manually.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-04-26 16:47:49 +04:00
Andrey Smirnov
aa14993539
feat: introduce network probes
Network probes are configured with the specs, and provide their output
as a status.

At the moment only platform code can configure network probes.

If any network probes are configured, they affect network.Status
'Connectivity' flag.

Example, create the probe:

```
talosctl -n 172.20.0.3 meta write 0xa '{"probes": [{"interval": "1s", "tcp": {"endpoint": "google.com:80", "timeout": "10s"}}]}'
```

Watch probe status:

```
$ talosctl -n 172.20.0.3 get probe
NODE         NAMESPACE   TYPE          ID                  VERSION   SUCCESS
172.20.0.3   network     ProbeStatus   tcp:google.com:80   5         true
```

With failing probes:

```
$ talosctl -n 172.20.0.3 get probe
NODE         NAMESPACE   TYPE          ID                  VERSION   SUCCESS
172.20.0.3   network     ProbeStatus   tcp:google.com:80   4         true
172.20.0.3   network     ProbeStatus   tcp:google.com:81   1         false
$ talosctl -n 172.20.0.3 get networkstatus
NODE         NAMESPACE   TYPE            ID       VERSION   ADDRESS   CONNECTIVITY   HOSTNAME   ETC
172.20.0.3   network     NetworkStatus   status   5         true      true           true       true

```

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-31 15:20:21 +04:00
Utku Ozdemir
9e07832db9
feat: implement summary dashboard
Implement the new summary dashboard with node info and logs.
Replace the previous metrics dashboard with the new dashboard which has multiple screens for node summary, metrics and editing network config.

Port the old metrics dashboard to the tview library and assign it to be a screen in the new dashboard, accessible by F2 key.

Add a new resource, infos.cluster.talos.dev which contains the cluster name and id of a node.

Disable the network config editor screen in the new dashboard until it is fully implemented with its backend.

Closes siderolabs/talos#4790.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2023-03-15 13:13:28 +01:00
Andrey Smirnov
80fed31940
feat: include Kubernetes controlplane endpoint as one of the endpoints
These endpoints are used for workers to find the addresses of the
controlplane nodes to connect to `trustd` to issue certificates of
`apid`.

These endpoints today come from two sources:

* discovery service data
* Kubernetes API server endpoints

This PR adds to the list static entry based on the Kubernetes control
plane endpoint in the machine config.

E.g. if the loadbalancer is used for the controlplane endpoint, and that
loadbalancer also proxies requests for port 50001 (trustd), this static
endpoint will provide workers with connectivity to trustd even if the
discovery service is disabled, and Kubernetes API is not up.

If this endpoint doesn't provide any trustd API, Talos will still try
other endpoints.

Talos does server certificate validation when calling trustd,
so including malicious endpoints doesn't cause any harm, as malicious
endpoint can't provider proper server certificate.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-01-03 21:33:18 +04:00
Andrey Smirnov
2ebe410e93
feat: update COSI to v0.2.0
This brings many fixes, including a new Watch with support for
Bootstapped and Errored event types.

`talosctl` from before this change is still compatible, as there's gRPC
API level backwards compatibility versioning.

New client doesn't yet depend on new event types, so it will work
against Talos 1.2.x.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-11-29 21:21:59 +04:00
Philipp Sauter
e1e340bdd9
feat: expose Talos node labels as a machine configuration field
We add the `nodeLabels` key to the machine config to allow users to add
node labels to the kubernetes Node object. A controller
reads the nodeLabels from the machine config and applies them via the
kubernetes API.
Older versions of talosctl will throw an unknown keys error if `edit mc`
 is called on a node with this change.

Fixes #6301

Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-11-15 21:25:40 +04:00
Philipp Sauter
4e114ca120
feat: use the etcd member id for etcd operations instead of hostname
We add a controller that provides the etcd member id as a resource
and change the etcd related commands to support member ids next to
hostnames.

Fixes: #6223

Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>
2022-11-10 19:17:56 +04:00
Andrey Smirnov
96aa9638f7
chore: rename talos-systems/talos to siderolabs/talos
There's a cyclic dependency on siderolink library which imports talos
machinery back. We will fix that after we get talos pushed under a new
name.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-11-03 16:50:32 +04:00
Andrey Smirnov
343c55762e
chore: replace talos-systems Go modules with siderolabs
This the first step towards replacing all import paths to be based on
`siderolabs/` instead of `talos-systems/`.

All updates contain no functional changes, just refactorings to adapt to
the new path structure.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-11-01 12:55:40 +04:00
Philipp Sauter
c6e1702eca
feat: use URL-based manifests to present static pods to the kubelet
Previously static pod manifests were written to and read from a folder
on the disk. We add a controller that cleans up the default static pod
manifests on the disk and serves them as a PodList manifest via HTTP.
The to the manifest is injected into the kubelet. File based static pod
manifests are still supported and may be enabled by setting the key
kubelet -> enableManifestsDirectory in the machine config.

Fixes #5494

Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>
2022-10-25 14:30:19 +02:00
Andrey Smirnov
8b2235c3b6
fix: lookup Equinix Metal bond slaves using 'permanent addr'
See #6333

Using permanent address fixes issues with mis-matching the links after
they got bonded.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-26 18:10:39 +04:00
Dmitriy Matrenichev
fc48849d00
chore: move maps/slices/ordered to gen module
Use github.com/siderolabs/gen

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-09-21 20:22:43 +03:00
Andrey Smirnov
da4cd34ef5
feat: update etcd advertised peer addresses on the fly
This allows to update the member information (for the current node) with
new advertised peer URLs as the config changes.

E.g. if the node IP changes, this will update the peer URLs for the
member accordingly.

At the same time any member update requires quorum, so changing IPs can
only be done on node-by-node basis.

If there are no changes to advertised peer URLs, controller does
nothing.

Talos node might still need a reboot to update the listen addresses, as
these are not handled automatically for now.

Fixes #6080

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-22 19:49:51 +04:00
Utku Ozdemir
84e712a9f1
feat: introduce Talos API access from Kubernetes
We add a new CRD, `serviceaccounts.talos.dev` (with `tsa` as short name), and its controller which allows users to get a `Secret` containing a short-lived Talosconfig in their namespaces with the roles they need. Additionally, we introduce the `talosctl inject serviceaccount` command to accept a YAML file with Kubernetes manifests and inject them with Talos service accounts so that they can be directly applied to Kubernetes afterwards. If Talos API access feature is enabled on Talos side, the injected workloads will be able to talk to Talos API.

Closes siderolabs/talos#4422.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-08-08 18:27:26 +02:00
Andrey Smirnov
92314e47bf
refactor: use controllers/resources to feed trustd with data
This is mostly same as the way `apid` consumes certificates generated by
`machined` via COSI API connection.

Service `trustd` consumes two resources:

* `secrets.Trustd` which contains `trustd` server TLS certificates and
  it gets refreshed as e.g. node IP changes
* `secrets.OSRoot` which contains Talos API CA and join token

This PR fixes an issue with `trustd` certs not always including all IPs
of the node, as previously `trustd` certs will only capture addresses of
the node at the moment of `trustd` startup.

Another thing is that refactoring allows to dynamically change API CA
and join token. This needs more work, but `trustd` should now pick up
changes without any additional changes.

Fixes #5863

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-04 23:45:34 +04:00
Andrey Smirnov
7795de313a
fix: use controllers/resources for etcd configuration
This extracts etcd configuration and finalized run arguments as
resources managed by controllers.

The biggest change in terms of UX is that Talos now waits for the etcd
configured subnet to be actually available before starting etcd.
Previously etcd quickly failed if the requested subnet was not available
on the host.

Coupled with other fixes (#5951, #5988), this should bring etcd
join/promote sequence back into proper shape.

I also reverted all temporary measures for discovering etcd endpoints,
now etcd join doesn't depend on Kubernetes (once again).

Fixes #5889

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-04 21:14:43 +04:00
Andrey Smirnov
fe2ee3b100
feat: implement MachineStatus resource
Fixes #5789

Example:

```yaml
spec:
    stage: running
    status:
        ready: false
        unmetConditions:
            - name: staticPods
              reason: kube-system/kube-controller-manager-talos-default-master-1 not ready, kube-system/kube-scheduler-talos-default-master-1 not ready
```

As events (CLI doesn't show full contents):

```
172.20.0.2   cbhf2l6f9lrs738hehfg   talos/runtime/machine.MachineStatusEvent   BOOTING   ready: false, unmet conditions: [time network services]
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-01 18:36:10 +04:00
Noel Georgi
5ac4947b63
feat: enable default seccomp profile for kubelet
Enable the default seccomp profile provided by the container runtime

Fixes: #5293

Ref: https://kubernetes.io/docs/tutorials/security/seccomp/

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-07-28 21:45:49 +05:30
Andrey Smirnov
3addea83b9
feat: introduce support for Talos API access from Kubernetes
This is a first step: providing a service to access Talos API.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-26 00:02:19 +04:00
Andrey Smirnov
a2aea97263
fix: write etcd PKI files in a controller
Instead of writing PKI "once" around the startup time, keep writing PKI
files as the certificates get updated. `etcd` is able to reload
certificates, so we should keep updating them e.g. if the hostname/IPs
change over time.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-21 18:37:45 +04:00
Noel Georgi
184e113f35
chore: disable systeminfo controller in container
Disable systeminfo controller in container mode

Fixes: #5849

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-07-01 19:22:58 +05:30
Dmitriy Matrenichev
4dbbf4ac50
chore: add generic methods and use them part #2
Use things from #5702.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-06-09 23:10:02 +08:00
Andrey Smirnov
c1aed62405
fix: wait for /var to be mounted in kubelet service controller
This is a cosmetic fix: when `KubeletServiceController` tries to write
files to `/etc/kubernetes` before `/var` mounted, it would fail.
Controller will be restarted, but each restart involves a backoff on
each restart which gets longer with each restart.

On the first boot, or when EPHEMERAL is encrypted, mounting might take
considerable time (seconds), so during that time controller might enter
such long backoff timeout that it will delay whole boot sequence - it
won't finish before `kubelet` is started.

By waiting for `EPHEMERAL` to be mounted before starting the controller
we eliminate long backoff cycles.

Also fix a bug when `StartAllServices` task might start a kubelet early
(before `KubeletServiceController` is actually going to start it).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-06-02 01:00:47 +04:00
Noel Georgi
20286c9082
feat: add cpu/ram info as resource
Expose processor and memory information from SMBIOS as Talos resources.

Output from QEMU:

```bash
❯ talosctl -n 10.5.0.2 get cpu
NODE       NAMESPACE   TYPE        ID      VERSION   MANUFACTURER   MODEL        CORES   THREADS
10.5.0.2   hardware    Processor   CPU-0   1         QEMU           pc-q35-6.2   4       1

❯ talosctl -n 10.5.0.2 get ram
NODE       NAMESPACE   TYPE     ID       VERSION   MANUFACTURER   MODEL   SIZE
10.5.0.2   hardware    Memory   DIMM-0   1         QEMU                   2048

❯ talosctl -n 10.5.0.2 get cpu CPU-0 -o yaml
node: 10.5.0.2
metadata:
    namespace: hardware
    type: Processors.hardware.talos.dev
    id: CPU-0
    version: 1
    owner: hardware.SystemInfoController
    phase: running
    created: 2022-05-19T13:58:12Z
    updated: 2022-05-19T13:58:12Z
spec:
    socket: CPU 0
    manufacturer: QEMU
    productName: pc-q35-6.2
    maxSpeed: 2000
    bootSpeed: 2000
    status: 65
    coreCount: 4
    coreEnabled: 4
    threadCount: 1

❯ talosctl -n 10.5.0.2 get ram DIMM-0 -o yaml
node: 10.5.0.2
metadata:
    namespace: hardware
    type: Memories.hardware.talos.dev
    id: DIMM-0
    version: 1
    owner: hardware.SystemInfoController
    phase: running
    created: 2022-05-19T13:58:12Z
    updated: 2022-05-19T13:58:12Z
spec:
    size: 2048
    deviceLocator: DIMM 0
    bankLocator: ""
    speed: 0
    manufacturer: QEMU
```

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-05-19 22:48:27 +05:30
Artem Chernyshev
396e1386cf
feat: implement network device selector
Fixes: https://github.com/siderolabs/talos/issues/4203

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-05-18 13:46:52 +03:00
Andrey Smirnov
c156580a38
fix: split regular network operation configuration and virtual IP
The problem is that Virtual IP operator configuration might require
accessing platform metadata server (e.g. on Equinix Metal), while
regular operator sets up critical operators like DHCP.

The issue observed on Equinix Metal without the split:

* on initial boot, DHCP is set up on `eth2`
* platform network configuration is fetched and `bond0` configuration is
created
* node IP is assigned both to `eth2` and `bond0`, while `eth2` is a
slave to `bond0`
* networking is broken
* operator config controller is stuck trying to fetch EM VIP
configuration, as the network is broken, it fails to do so, but retries
for 3 minutes (in `download.Download`)
* network is broken for 3 minutes until `OperatorConfig` controller is
unblocked and cleans up DHCP operator for `eth2` as it should

The issue here is that DHCP operator setup is much more tricky on one
hand (depends on link status, other configuration items, etc.), while
VIP operator depends on DHCP operator setup, as it needs outbound
networking.

By splitting the controllers, we split the flows and remove
dependencies.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-03-18 20:24:47 +03:00
Andrey Smirnov
7917b1aca0
feat: support admission control configuration and Pod Security admission
Fixes #5003

This implements a way to configure API server admission plugins via
Talos machine configuration.

If Pod Security admission is enabled, default cluster-wide policy is
generated which enforces baseline policy.

Policy can be overridden per-namespace.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-02-24 16:18:15 +03:00
Andrey Smirnov
b2bf3117ff
feat: implement extension services
Fixes #4694

User services run alongside with Talos system services.
Every user service container root filesystem should be already present
in the Talos root filesystem.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-02-22 23:11:20 +03:00
Seán C McCord
a5fb271ac8
feat: enable protectKernelDefaults in kubelet_spec
Enable the kubelet's builtin kernel configuration checks.
Also limits streaming connection timeout.

Fixes #5002
Fixes #4990

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2022-02-18 11:03:06 -05:00
Andrey Smirnov
492b156dab
feat: implement static pods via machine configuration
Fixes #4727

On worker nodes, static pods are injected, but status can't be monitored
by Talos. On control plane nodes full status is available via
`StaticPodStatus`.

Pod definition is left as `Unstructured` in the machine configuration,
and no specific validation is performed to avoid pulling in Kubernetes
libraries into Talos machinery package.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-02-10 18:37:19 +03:00