6 Commits

Author SHA1 Message Date
Brad Beam
d8249c8779
refactor(init): Allow kubeadm init on controlplane (#658)
* refactor(init): Allow kubeadm init on controlplane

This shifts the cluster formation from init(bootstrap) and join(control plane)
to init(control plane).

This makes use of the previously implemented initToken to provide a TTL for
cluster initialization to take place and allows us to mostly treat all control
plane nodes equally. This also sets up the path for us to handle master upgrades
and not be concerned with odd behavior when upgrading the previously defined
init node.

To facilitate kubeadm init across all control plane nodes, we make use of the
initToken to run the `kubeadm init phase certs` command to generate any missing
certificates once. All other control plane nodes will attempt to sync the
necessary certs/files via all defined trustd endpoints and begin the startup
process.

* feat(init): Add service runner context to PreFunc

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-05-24 16:05:49 -05:00
Andrey Smirnov
a0188aff73
feat(init): implement service dependencies, correct start and shutdown (#680)
This PR introduces dependencies between the services. Now each service
has two virtual events associated with it: 'up' (running and healthy)
and 'down' (finished or failed). These events are used to establish
the correct order via the conditions abstraction.

Service image unpacking was moved into the 'pre' stage, simplifying
`init/main.go`; service images are now closer to the code which runs the
service itself.

Step 'pre' now runs after 'wait' step, and service dependencies are now
mixed into other conditions of 'wait' step on startup.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-05-24 19:17:52 +03:00
Andrey Smirnov
06bff97a3f
refactor: change conditions to be interface, add descriptions (#677)
Conditions are now implemented as an interface with two methods: `Wait`,
which waits for the condition to be true (cancelable via context), and
`String`, which describes what the condition is waiting for.

Generic 'WaitForAll' was implemented to wait for multiple conditions at
once.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-05-21 21:25:08 +03:00
Andrey Smirnov
75b2ce7fd2
feat(init): implement services list API and osctl service CLI (#662)
This returns a list of all registered services, with their current
status, past events, health state, etc.

New CLI is `osctl service [<id>]`: without `<id>` it prints list of all
the services, with specific `<id>` it provides details for a service.

I decided to create "parallel" data structures in protobuf as Go
structures don't map nicely onto what protoc generates: pointers vs.
values, additional fields like mutexes, etc. There's probably a better
approach, and I'm open to it.

For the CLI, I tried to keep CLI code in the `cmd/` package, and I also
created a simple wrapper to remove duplicated code which sets up the client
for each command.

Examples:

```
$ osctl service
SERVICE      STATE     HEALTH   LAST CHANGE   LAST EVENT
containerd   Running   OK       21s ago       Health check successful
kubeadm      Running   ?        2s ago        Started task kubeadm (PID 280) for container kubeadm
kubelet      Running   ?        0s ago        Started task kubelet (PID 383) for container kubelet
ntpd         Running   ?        14s ago       Started task ntpd (PID 129) for container ntpd
osd          Running   ?        14s ago       Started task osd (PID 126) for container osd
proxyd       Waiting   ?        14s ago       Waiting for conditions
trustd       Running   ?        14s ago       Started task trustd (PID 125) for container trustd
udevd        Running   ?        14s ago       Started task udevd (PID 130) for container udevd
```

```
$ osctl service proxyd
ID       proxyd
STATE    Running
HEALTH   ?
EVENTS   [Preparing]: Running pre state (22s ago)
         [Waiting]: Waiting for conditions (22s ago)
         [Preparing]: Creating service runner (6s ago)
         [Running]: Started task proxyd (PID 461) for container proxyd (6s ago)
```

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-05-17 18:01:12 +03:00
Andrey Smirnov
1dde9f8cc0
feat(init): implement health checks for services (#656)
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-05-15 09:30:35 -07:00
Andrey Smirnov
505b5022c4
feat(init): implement graceful shutdown of 'init' (#562)
The most crucial changes are in `init/main.go`: on shutdown, Talos now tries
to gracefully stop all the services. All the shutdown paths are unified,
including poweroff, reboot and panic handling on startup.

While I was at it, I also fixed a bug with containers failing to start
when an old snapshot is still around.

Service lifecycle is now wrapped in a `ServiceRunner` object, which
handles state transitions and captures events related to state changes.
Every change goes to the log as well.
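
A simplified sketch of what such a wrapper might look like. Only the `ServiceRunner` name comes from this commit message; the fields, the `UpdateState` method, and the set of states are assumptions for illustration.

```go
package main

import (
	"fmt"
	"log"
	"sync"
	"time"
)

// State is an illustrative set of lifecycle states (assumed, not exhaustive).
type State int

const (
	Preparing State = iota
	Waiting
	Running
	Finished
	Failed
)

func (s State) String() string {
	return [...]string{"Preparing", "Waiting", "Running", "Finished", "Failed"}[s]
}

// Event records a single state change.
type Event struct {
	State     State
	Message   string
	Timestamp time.Time
}

// ServiceRunner wraps a service's lifecycle (hypothetical layout).
type ServiceRunner struct {
	mu     sync.Mutex
	id     string
	state  State
	events []Event
}

func NewServiceRunner(id string) *ServiceRunner {
	return &ServiceRunner{id: id}
}

// UpdateState performs a transition, captures it as an event, and logs it.
func (r *ServiceRunner) UpdateState(s State, format string, args ...interface{}) {
	ev := Event{State: s, Message: fmt.Sprintf(format, args...), Timestamp: time.Now()}

	r.mu.Lock()
	r.state = s
	r.events = append(r.events, ev)
	r.mu.Unlock()

	log.Printf("service[%s](%s): %s", r.id, s, ev.Message)
}

// State returns the current state.
func (r *ServiceRunner) State() State {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.state
}

// Events returns a copy of the captured events, oldest first.
func (r *ServiceRunner) Events() []Event {
	r.mu.Lock()
	defer r.mu.Unlock()
	return append([]Event(nil), r.events...)
}

func main() {
	r := NewServiceRunner("proxyd")
	r.UpdateState(Preparing, "Running pre state")
	r.UpdateState(Waiting, "Waiting for conditions")
	r.UpdateState(Running, "Started task %s", "proxyd")

	for _, ev := range r.Events() {
		fmt.Printf("[%s]: %s\n", ev.State, ev.Message)
	}
}
```

Keeping the events in memory is what would later allow the state to be exposed through an API, as the next paragraph anticipates.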

There's no way to capture service state yet, but that is planned to be
implemented as RPC API for `init` which is exposed via `osd` to `osctl`.

Future steps:

1. Implement service dependencies for correct startup order and
shutdown order.

2. Implement service health, so that we can say "start trustd when
containerd is up and healthy".

3. Implement gRPC API for init, expose via osd (service status, restart,
poweroff, ...)

4. Implement 'String()' for conditions, so that we can see what a service
is waiting on right now.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-04-26 16:53:19 +03:00