Service `osd` doesn't have access to rootfs, as it is running in a
container, so move API to `init` which has unconstrained access to
rootfs. (This is in line with another API, `osctl cp`).
Fixes: #752
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Actual API is implemented in the `init`, as it has access to root
filesystem. `osd` proxies API back to `init` with some tricks to support
grpc streaming.
Given some absolute path, `init` produces and streams back .tar.gz
archive with filesystem contents.
`osctl cp` works in two modes. First mode streams data to stdout, so
that we can do e.g.: `osctl cp /etc - | tar tz`. Second mode extracts
archive to specified location, dropping ownership info and adjusting
permissions a bit. Timestamps are not preserved.
If full dump with owner/permisisons is required, it's better to stream
data to `tar xz`, for quick and dirty look into filesystem contents
under unprivileged user it's easier to use in-place extraction.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
We run tests in parallel mode (`go test -p 4`), default is to run in
parallel in fact. But tests are not isolated, as some of them launch
containerd on a fixed file socket (as socket path is hardcoded in
Talos), and that might lead to any weirdness when tests try to
launch containerd concurrently on the same file socket.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Instead of pulling a full list of containers, implement inspector query
for a single container following the spec to build display name.
Also adds many more tests for the container inspector.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
I couldn't find any use for the `timeout` flag nor the value passed in
the API, but it block much more useful and present in other commands
flag 'target'.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This reverts commit f200eb7a8a0b7c2d29710f695000eb7680ce8b7d.
grpc can't send back both response and an error.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
For #711, this should be a complete fix - waiting for container to be
started.
For #712, this should be more of a workaround - playing with timeouts to
hit the failure less likely. Idea of the test is that health check
should be aborted on timeout (1ms) while health check succeeds if not
aborted in 50ms. Before the fix it was 1ms/10ms, but still concurrently
there was a chance that goroutine exits successfully after 10ms while
1ms context deadline is not reached.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes: #689, #690
Refactor container inspection code into a package of its own with some
rudimentary tests. Use this package consistently in osd commands dealing
with containers.
Improvements for the next PRs:
* implement API to fetch info about container by ID (to avoid fetching
full list)
* handle and display errors on client side, not to the log of the
server
* more tests, including k8s containers (how can we do that?)
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This resolves extra messages when user does ^C to stop osctl. Message is
still printed on the second ^C and process is aborted on the third.
For the `logs` command, as it is streaming, suppress context canceled
error (before context changes process was crashing before printing an error).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
If some of the conditions are already satisfied, we can update the
description to reflect that, e.g.:
```
EVENTS [Waiting]: Waiting for service "containerd" to be "up", service "networkd" to be "up", service "trustd" to be "up" (14m17s ago)
[Waiting]: Waiting for service "trustd" to be "up" (14m16s ago)
```
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This fixes a race condition when kubeadm doesn't start waiting for
networkd forever.
If one service enters 'Waiting' state after another service already
finishes running, we should consider 'Finished' service to be 'up' in
the sense it has finished successfully (vs. 'Failed' state which is not
'up').
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This provides a bit better handling for the handing grpc
requests (or just slow requests):
```
$ osctl-linux-amd64 --talosconfig talosconfig version
Client:
Tag: ad410fb-dirty
SHA: ad410fb-dirty
Built:
Go version: go1.12.5
OS/Arch: linux/amd64
^CSignal received, aborting, press Ctrl+C once again to abort immediately...
error getting version: rpc error: code = Canceled desc = context canceled
```
For now we catch `SIGINT` & `SIGTERM`. Second signal kills process
immediately as signal handler is removed.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This moves cli code (rendering output, etc.) out of 'client' package, so
that client package is usable outside of cli.
Consistently accept context as first param to API methods, so that we
can build graceful request cancellation.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Remove duplicated code which was setting up grpc client with common
method. Should have no functional changes otherwise.
Add args len check where missing.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This adds generic goroutine runner which simply wraps service as process
goroutine. It supports log redirection and basic panic handling.
DHCP-related part of the network package was slightly adjusted to run as
service with logging updates (to redirect logs to a file) and context
canceling.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
* refactor(init): Allow kubeadm init on controlplane
This shifts the cluster formation from init(bootstrap) and join(control plane)
to init(control plane).
This makes use of the previously implemented initToken to provide a TTL for
cluster initialization to take place and allows us to mostly treat all control
plane nodes equal. This also sets up the path for us to handle master upgrades
and not be concerned with odd behavior when upgrading the previously defined
init node.
To facilitate kubeadm init across all control plane nodes, we make use of the
initToken to run `kubeadm init phase certs` command to generate any missing
certificates once. All other control plane nodes will attempt to sync the
necessary certs/files via all defined trustd endpoints and being the startup
process.
* feat(init): Add service runner context to PreFunc
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
This PR introduces dependencies between the services. Now each service
has two virtual events associated with it: 'up' (running and healthy)
and 'down' (finished or failed). These events are used to establish
correct order via conditions abstraction.
Service image unpacking was moved into 'pre' stage simplifying
`init/main.go`, service images are now closer to the code which runs the
service itself.
Step 'pre' now runs after 'wait' step, and service dependencies are now
mixed into other conditions of 'wait' step on startup.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>