This reverts commit f200eb7a8a0b7c2d29710f695000eb7680ce8b7d.
gRPC can't send back both a response and an error.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
For #711, this should be a complete fix: wait for the container to be
started.
For #712, this is more of a workaround: the timeouts are adjusted to
make the failure less likely to hit. The idea of the test is that the
health check should be aborted on timeout (1ms), while the same health
check succeeds if it is not aborted within 50ms. Before the fix the
values were 1ms/10ms, but as both run concurrently, there was still a
chance that the goroutine exited successfully after 10ms before the 1ms
context deadline took effect.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes: #689, #690
Refactor container inspection code into a package of its own with some
rudimentary tests. Use this package consistently in osd commands dealing
with containers.
Improvements for the next PRs:
* implement an API to fetch info about a container by ID (to avoid
  fetching the full list)
* handle and display errors on the client side instead of logging them
  on the server
* more tests, including k8s containers (how can we do that?)
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This suppresses extra messages when the user presses ^C to stop osctl.
The message is still printed on the second ^C, and the process is
aborted on the third.
For the `logs` command, which is streaming, suppress the "context
canceled" error (before the context changes, the process crashed before
printing an error).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
If some of the conditions are already satisfied, we can update the
description to reflect that, e.g.:
```
EVENTS  [Waiting]: Waiting for service "containerd" to be "up", service "networkd" to be "up", service "trustd" to be "up" (14m17s ago)
        [Waiting]: Waiting for service "trustd" to be "up" (14m16s ago)
```
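One way to produce such a description is to keep only the conditions that are still pending. A sketch, assuming a hypothetical `waitDescription` tracker (the real code may compute the string differently):

```go
package main

import (
	"strings"
	"sync"
)

// waitDescription is a hypothetical tracker of pending condition descriptions.
type waitDescription struct {
	mu      sync.Mutex
	pending []string // descriptions of conditions not yet satisfied
}

// satisfied removes desc from the pending list once its condition holds.
func (w *waitDescription) satisfied(desc string) {
	w.mu.Lock()
	defer w.mu.Unlock()

	for i, d := range w.pending {
		if d == desc {
			w.pending = append(w.pending[:i], w.pending[i+1:]...)
			return
		}
	}
}

// String describes only what is still being waited for.
func (w *waitDescription) String() string {
	w.mu.Lock()
	defer w.mu.Unlock()

	return "Waiting for " + strings.Join(w.pending, ", ")
}
```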
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This fixes a race condition where kubeadm never starts because it waits
for networkd forever.
If one service enters the 'Waiting' state after another service has
already finished running, we should consider the 'Finished' service to
be 'up' in the sense that it has finished successfully (vs. the 'Failed'
state, which is not 'up').
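The rule above can be sketched as a predicate over service states; `ServiceState`, its constants and `isUp` are hypothetical names for illustration:

```go
package main

// ServiceState is a hypothetical subset of service states.
type ServiceState int

const (
	StateWaiting ServiceState = iota
	StateRunning
	StateFinished
	StateFailed
)

// isUp reports whether a service satisfies an "up" dependency.
func isUp(state ServiceState, healthy bool) bool {
	switch state {
	case StateRunning:
		return healthy
	case StateFinished:
		// A service that already finished successfully counts as "up",
		// so dependents that start waiting late don't block forever.
		return true
	default:
		// Waiting, Failed, etc. do not satisfy the condition.
		return false
	}
}
```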
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This provides slightly better handling for hanging gRPC requests (or
just slow requests):
```
$ osctl-linux-amd64 --talosconfig talosconfig version
Client:
Tag: ad410fb-dirty
SHA: ad410fb-dirty
Built:
Go version: go1.12.5
OS/Arch: linux/amd64
^CSignal received, aborting, press Ctrl+C once again to abort immediately...
error getting version: rpc error: code = Canceled desc = context canceled
```
For now we catch `SIGINT` and `SIGTERM`. A second signal kills the
process immediately, as the signal handler is removed after the first
one.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This moves CLI code (rendering output, etc.) out of the 'client'
package, so that the client package is usable outside of the CLI.
Consistently accept a context as the first parameter of API methods, so
that we can build graceful request cancellation.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Replace duplicated code which was setting up the gRPC client with a
common method. There should be no functional changes otherwise.
Add argument count checks where missing.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This adds a generic goroutine runner which simply wraps a service as a
goroutine within the process. It supports log redirection and basic
panic handling.
The DHCP-related part of the network package was slightly adjusted to
run as a service, with logging updates (to redirect logs to a file) and
context cancellation.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
* refactor(init): Allow kubeadm init on controlplane
This shifts the cluster formation from init(bootstrap) and join(control plane)
to init(control plane).
This makes use of the previously implemented initToken to provide a TTL for
cluster initialization to take place, and allows us to mostly treat all control
plane nodes equally. This also sets up the path for us to handle master upgrades
without being concerned with odd behavior when upgrading the previously defined
init node.
To facilitate kubeadm init across all control plane nodes, we make use of the
initToken to run the `kubeadm init phase certs` command once to generate any
missing certificates. All other control plane nodes will attempt to sync the
necessary certs/files via all defined trustd endpoints and then begin the
startup process.
* feat(init): Add service runner context to PreFunc
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
This PR introduces dependencies between the services. Now each service
has two virtual events associated with it: 'up' (running and healthy)
and 'down' (finished or failed). These events are used to establish the
correct order via the conditions abstraction.
Service image unpacking was moved into the 'pre' stage, simplifying
`init/main.go`; service images are now closer to the code which runs the
service itself.
The 'pre' step now runs after the 'wait' step, and service dependencies
are now mixed into the other conditions of the 'wait' step on startup.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Conditions are now implemented as an interface with two methods: `Wait`
for the condition to be true (cancelable via context) and `String`, which
describes what the condition is waiting for.
A generic `WaitForAll` was implemented to wait for multiple conditions at
once.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
The stream chunker should be cancelable at any point of execution, and
it should also stop chunking on EOF.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This started as a simple unit test for the file chunker, but the first
test hung immediately, so I started looking into the code.
One problem was that once inside the inotify() code, context
cancellation wasn't considered. Another problem is that the fsnotify
remove event was never triggered, but I only saw that later with a unit
test.
A small nit was that inotify() was initialized every time we got to EOF,
which is not efficient for "follow" mode.
So I moved inotify into the main loop, and plugged the context cancel
watch into the place where a chunk is delivered. The chunker code is
supposed to block in two places: when it tries to deliver the next chunk
(as the client might be slow to receive buffers) or when there's no new
data (waiting on inotify). So it makes sense to check for context
cancellation in both cases.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This returns a list of all registered services, with their current
status, past events, health state, etc.
The new CLI is `osctl service [<id>]`: without `<id>` it prints a list of
all the services; with a specific `<id>` it provides details for that
service.
I decided to create "parallel" data structures in protobuf, as Go
structures don't map nicely onto what protoc generates: pointers vs.
values, additional fields like mutexes, etc. There's probably a better
approach; I'm open to it.
For the CLI, I tried to keep CLI-specific code in the `cmd/` package, and
I also created a simple wrapper to remove duplicated code which sets up
the client for each command.
Examples:
```
$ osctl service
SERVICE      STATE     HEALTH   LAST CHANGE   LAST EVENT
containerd   Running   OK       21s ago       Health check successful
kubeadm      Running   ?        2s ago        Started task kubeadm (PID 280) for container kubeadm
kubelet      Running   ?        0s ago        Started task kubelet (PID 383) for container kubelet
ntpd         Running   ?        14s ago       Started task ntpd (PID 129) for container ntpd
osd          Running   ?        14s ago       Started task osd (PID 126) for container osd
proxyd       Waiting   ?        14s ago       Waiting for conditions
trustd       Running   ?        14s ago       Started task trustd (PID 125) for container trustd
udevd        Running   ?        14s ago       Started task udevd (PID 130) for container udevd
```
```
$ osctl service proxyd
ID      proxyd
STATE   Running
HEALTH  ?
EVENTS  [Preparing]: Running pre state (22s ago)
        [Waiting]: Waiting for conditions (22s ago)
        [Preparing]: Creating service runner (6s ago)
        [Running]: Started task proxyd (PID 461) for container proxyd (6s ago)
```
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>