49 Commits

Author SHA1 Message Date
Andrew Rynhard
bf8fc1dcbd chore: lint protobuf definitions
This adds linting to our protobuf definitions via prototool.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-27 18:12:36 -07:00
Brad Beam
692571bdec feat(networkd): Add grpc endpoint
Allows us to list routes and interface details

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-25 19:48:08 -07:00
Brad Beam
d36007fb29 feat(osd): Add ntpd client
Allows us to access ntp api

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-25 13:38:34 -07:00
Andrew Rynhard
794c7231f5 feat: run dedicated instance of containerd for system services
In order to facilitate upgrades and resets that are capable of
manipulating the system block device, we need to run an instance of
containerd that has zero dependencies on the disk. We run containerd
purely in memory for running system services.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-19 12:32:59 -07:00
Andrew Rynhard
90c91807bd refactor: restructure the project layout
This change moves packages into more appropriate places.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-01 22:19:42 -07:00
Andrey Smirnov
9c63f4ed0a feat(init): implement complete API for service lifecycle (start/stop)
It is now possible to `start`/`stop`/`restart` any service via `osctl`
commands.

There are some changes in `ServiceRunner` to support re-use (re-entering
running state). `Services` singleton now tracks service running state to
avoid calling `Start()` on already running `ServiceRunner` instance.
Method `Start()` was renamed to `LoadAndStart()` to break up service
loading (adding to the list of service) and actual service start.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-08-01 11:16:57 -07:00
Andrew Rynhard
b4383e35db feat: move df API to init
This change allows for more accurate mount reporting as /proc/mounts is
a symlink to /proc/self/mounts and contains mounts that are relative to
the running process. In our case this was osd. This caused inaccurate
reporting of mounts since they were relative to osd when we really
wanted mounts relative to machined.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-24 15:28:37 -07:00
Andrew Rynhard
8e8aae98dd feat: add machined
This commit splits our current init into init and machined.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-16 13:12:21 -07:00
Brad Beam
58537faa8b fix(init): Fix routes endpoint
Temporary workaround while we get more information on the
specifics for what is failing.

Ref: #795
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-07-15 07:35:56 -07:00
Andrew Rynhard
5d8ee0a3a5 fix: use existing logic to perform reset
This PR moves the reset API to the init API definition.
It leverages the same code we use for upgrades.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-04 18:26:14 -07:00
Andrey Smirnov
5d91d762ce feat(osd): implement container metrics for CRI inspector (#824)
This refactors metrics interface to remove containerd-specific stuff and
make it common for CRI & containerd.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-07-04 11:25:15 -07:00
Andrey Smirnov
237e903f91 feat(osd): implement CRI inspector for containers (#817)
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-07-02 15:48:00 -07:00
Seán C. McCord
91d5e7e6ef TLS renew (#807)
Signed-off-by: Seán C McCord <ulexus@gmail.com>
2019-07-02 15:35:27 -07:00
Andrey Smirnov
0662af19d1 chore: seed math.rand PRNG on startup in every service (#801)
This is important as otherwise `math/rand` outputs predictable sequence
each time.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-28 11:03:15 -07:00
Andrey Smirnov
17f28d3461 feat(osctl): improve output of stats and ps commands (#788)
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-26 15:37:54 -07:00
Seán C. McCord
81163cefb4 feat(osd): extend Routes API (#756)
Signed-off-by: Seán C McCord <ulexus@gmail.com>
2019-06-22 08:03:13 -07:00
Andrey Smirnov
76071abbb8
feat(init): move 'ls' API to init from osd (#755)
Service `osd` doesn't have access to rootfs, as it is running in a
container, so move API to `init` which has unconstrained access to
rootfs. (This is in line with another API, `osctl cp`).

Fixes: #752

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-21 22:29:39 +03:00
Andrey Smirnov
9ed45f7090 feat(osctl): implement 'cp' to copy files out of the Talos node (#740)
Actual API is implemented in the `init`, as it has access to root
filesystem. `osd` proxies API back to `init` with some tricks to support
grpc streaming.

Given some absolute path, `init` produces and streams back .tar.gz
archive with filesystem contents.

`osctl cp` works in two modes. First mode streams data to stdout, so
that we can do e.g.: `osctl cp /etc - | tar tz`. Second mode extracts
archive to specified location, dropping ownership info and adjusting
permissions a bit. Timestamps are not preserved.

If full dump with owner/permisisons is required, it's better to stream
data to `tar xz`, for quick and dirty look into filesystem contents
under unprivileged user it's easier to use in-place extraction.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-20 17:02:58 -07:00
Andrey Smirnov
070cbc9d60
refactor(osd): implement container inspector for a single container (#720)
Instead of pulling a full list of containers, implement inspector query
for a single container following the spec to build display name.

Also adds many more tests for the container inspector.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-17 17:54:28 +03:00
Andrey Smirnov
0c0a0340b2
fix(osctl): allow '-target' flag for osctl restart (#732)
I couldn't find any use for the `timeout` flag nor the value passed in
the API, but it block much more useful and present in other commands
flag 'target'.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-14 21:37:57 +03:00
Andrey Smirnov
fb320a894b
fix(osctl): Revert "display non-fatal errors from ps/stats in osctl (#724)" (#727)
This reverts commit f200eb7a8a0b7c2d29710f695000eb7680ce8b7d.

grpc can't send back both response and an error.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-07 22:50:05 +03:00
Seán C. McCord
532a53bfaf feat(init): Implement 'ls' command (#721)
Fixes #719

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2019-06-07 10:19:20 -07:00
Andrey Smirnov
f200eb7a8a
fix(osctl): display non-fatal errors from ps/stats in osctl (#724)
Logging those errors in osd makes them hard to discover.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-07 17:10:18 +03:00
Andrey Smirnov
d9f4f378c2 fix(osd): consistent container ids in stats, ps and reset (#707)
Fixes: #689, #690

Refactor container inspection code into a package of its own with some
rudimentary tests. Use this package consistently in osd commands dealing
with containers.

Improvements for the next PRs:

* implement API to fetch info about container by ID (to avoid fetching
full list)

* handle and display errors on client side, not to the log of the
server

* more tests, including k8s containers (how can we do that?)

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-03 20:51:01 -05:00
Brad Beam
b0dab6e021
fix(osd): Sanitize request.id for log streams (#673)
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-05-20 14:46:05 -05:00
Andrey Smirnov
75b2ce7fd2
feat(init): implement services list API and osctl service CLI (#662)
This returns list of all the services registered, with their current
status, past events, health state, etc.

New CLI is `osctl service [<id>]`: without `<id>` it prints list of all
the services, with specific `<id>` it provides details for a service.

I decided to create "parallel" data structures in protobuf as Go
structures don't map nicely onto what protoc generates: pointers vs.
values, additional fields like mutexes, etc. Probably there's a better
approach, I'm open for it.

For CLI, I tried to keep CLI stuff in `cmd/` package, and I also created
simple wrapper to remove duplicated code which sets up client for each
command.

Examples:

```
$ osctl service
SERVICE      STATE     HEALTH   LAST CHANGE   LAST EVENT
containerd   Running   OK       21s ago       Health check successful
kubeadm      Running   ?        2s ago        Started task kubeadm (PID 280) for container kubeadm
kubelet      Running   ?        0s ago        Started task kubelet (PID 383) for container kubelet
ntpd         Running   ?        14s ago       Started task ntpd (PID 129) for container ntpd
osd          Running   ?        14s ago       Started task osd (PID 126) for container osd
proxyd       Waiting   ?        14s ago       Waiting for conditions
trustd       Running   ?        14s ago       Started task trustd (PID 125) for container trustd
udevd        Running   ?        14s ago       Started task udevd (PID 130) for container udevd
```

```
$ osctl service proxyd
ID       proxyd
STATE    Running
HEALTH   ?
EVENTS   [Preparing]: Running pre state (22s ago)
         [Waiting]: Waiting for conditions (22s ago)
         [Preparing]: Creating service runner (6s ago)
         [Running]: Started task proxyd (PID 461) for container proxyd (6s ago)
```

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-05-17 18:01:12 +03:00
Brad Beam
dd3d3fac9c
fix(osd): Read talos service logs from file (#663)
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-05-16 20:05:23 -05:00
Brad Beam
0b33280915
feat(init): Add upgrade endpoint (#623)
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-05-13 15:15:25 -05:00
Brad Beam
a6989db1d1
fix(osd): Use correct context in stats endpoint (#644)
Without this we never set the namespace for the context which prevents it from functioning at all

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-05-11 14:26:23 -05:00
Andrey Smirnov
ab2917e833
feat(init): implement init gRPC API, forward reboot to init (#579)
This implements insecure over-file-socket gRPC API for init with two
first simplest APIs: reboot and shutdown (poweroff).

File socket is mounted only to `osd` service, so it is the only service
which can access init API. Osd forwards reboot/shutdown already
implemented APIs to init which actually executes these.

This enables graceful shutdown/reboot with service shutdown, sync, etc.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-04-26 23:04:24 +03:00
Andrew Rynhard
fc05224b4f
feat: add shutdown command (#577)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-04-26 08:53:12 -07:00
Andrew Rynhard
a8fa1f5cd1
feat(osctl): add df command (#569)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-04-26 08:24:31 -07:00
Andrey Smirnov
505b5022c4
feat(init): implement graceful shutdown of 'init' (#562)
Most crucial changes in `init/main.go`: on shutdown now Talos tries
to stop gracefully all the services. All the shutdown paths are unified,
including poweroff, reboot and panic handling on startup.

While I was at it, I also fixed bug with containers failing to start
when old snapshot is still around.

Service lifecycle is wrapped with `ServiceRunner` object now which
handles state transitions and captures events related to state changes.
Every change goes to the log as well.

There's no way to capture service state yet, but that is planned to be
implemented as RPC API for `init` which is exposed via `osd` to `osctl`.

Future steps:

1. Implement service dependencies for correct startup order and
shutdown order.

2. Implement service health, so that we can say "start trustd when
containerd is up and healthy".

3. Implement gRPC API for init, expose via osd (service status, restart,
poweroff, ...)

4. Impement 'String()' for conditions, so that we can see what service
is waiting on right now.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-04-26 16:53:19 +03:00
Brad Beam
3f358b12ae
feat(osctl): Add osctl top (#560)
Also adds pkg/proc as the backing package for top data

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-04-23 21:25:41 -05:00
Andrey Smirnov
a858cb4986
refactor: extract 'restart' piece of the runners into wrapper runner (#559)
This changes `runner.Runner` API to support more methods to allow for
containerd runner to create container object only once, and start/stop
tasks to implement restarts.

New API: `Open()` (initialize), `Run()` (run once until exits), `Stop()`
(stop running instance), `Close()` (free resource, no longer available
for new `Run()`).

So the sequence might be: `Open`, `Run`, `Stop`, `Run`, `Stop`, `Close`.

Process and containerd runners were updated for the new API, and
'restart' part was removed, now both runners only run the task once.

Restart piece was implemented in an abstract way for any wrapped
`runner.Runner` in the `runner/restart` package. Restart supports three
restart policies: `Once`, `UntilSuccess` and `Forever`.

Service API was changed slightly to return the `runner.Runner`
interface, and `system.Services` now handles running the service.

For all the services, code was adjusted to either return runner (run
once), or was wrapped with `restart` runner to provide restart policy.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-04-23 01:25:26 +03:00
Brad Beam
271d28244b fix(osd): Fix k8s.io namespace logs (#557)
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-04-18 08:49:33 -07:00
Brad Beam
46bdf2371c
fix(osd): Fix osctl ps output (#554)
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-04-17 08:51:19 -05:00
Andrey Smirnov
d29e27ee33 refactor: containerd runner refactoring and unit-tests (#551)
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-04-16 13:56:52 -07:00
Andrew Rynhard
e18b5086a9
chore: update org to new name (#480)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-04-03 18:29:21 -07:00
Andrew Rynhard
455aeb742c
chore: expose userdata and osctl client packages (#471)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-04-02 17:11:17 -07:00
Andrew Rynhard
9e947c3fa5
feat: add automated PKI for joining nodes (#406)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-02-23 23:17:56 -08:00
Spencer Smith
a2704eeaca feat: add route printing to osctl (#404)
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-02-22 06:16:01 -08:00
Andrew Rynhard
62bb226c0b
feat(osctl): add stats command (#314)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-01-16 17:17:15 -08:00
Andrew Rynhard
b410ff35cc
chore: add nolint annotation (#313)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-01-16 07:28:33 -08:00
Andrew Rynhard
3c5f99fede
feat(osctl): output namespace (#312)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-01-15 23:50:56 -08:00
Andrew Rynhard
94b011c724
refactor: use containerd exported defaults (#310)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-01-15 20:29:13 -08:00
Andrew Rynhard
25fca3d68d
feat: import core service containers from local store (#309)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-01-15 18:46:41 -08:00
Andrew Rynhard
ee226dddac
chore: enforce commit and license policies (#304)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-01-13 16:10:49 -08:00
Andrew Rynhard
72eb1b34f5
chore: use buildkit for builds (#295) 2018-12-19 22:22:05 -08:00