talos

mirror of https://github.com/siderolabs/talos.git synced 2025-08-31 11:31:10 +02:00

Author	SHA1	Message	Date
Andrew Rynhard	bf8fc1dcbd	chore: lint protobuf definitions This adds linting to our protobuf definitions via prototool. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-08-27 18:12:36 -07:00
Brad Beam	692571bdec	feat(networkd): Add grpc endpoint Allows us to list routes and interface details Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-08-25 19:48:08 -07:00
Brad Beam	d36007fb29	feat(osd): Add ntpd client Allows us to access ntp api Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-08-25 13:38:34 -07:00
Andrew Rynhard	794c7231f5	feat: run dedicated instance of containerd for system services In order to facilitate upgrades and resets that are capable of manipulating the system block device, we need to run an instance of containerd that has zero dependencies on the disk. We run containerd purely in memory for running system services. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-08-19 12:32:59 -07:00
Andrew Rynhard	90c91807bd	refactor: restructure the project layout This change moves packages into more appropriate places. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-08-01 22:19:42 -07:00
Andrey Smirnov	9c63f4ed0a	feat(init): implement complete API for service lifecycle (start/stop) It is now possible to `start`/`stop`/`restart` any service via `osctl` commands. There are some changes in `ServiceRunner` to support re-use (re-entering running state). `Services` singleton now tracks service running state to avoid calling `Start()` on an already running `ServiceRunner` instance. Method `Start()` was renamed to `LoadAndStart()` to break up service loading (adding to the list of services) and actual service start. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-08-01 11:16:57 -07:00
Andrew Rynhard	b4383e35db	feat: move df API to init This change allows for more accurate mount reporting as /proc/mounts is a symlink to /proc/self/mounts and contains mounts that are relative to the running process. In our case this was osd. This caused inaccurate reporting of mounts since they were relative to osd when we really wanted mounts relative to machined. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-07-24 15:28:37 -07:00
Andrew Rynhard	8e8aae98dd	feat: add machined This commit splits our current init into init and machined. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-07-16 13:12:21 -07:00
Brad Beam	58537faa8b	fix(init): Fix routes endpoint Temporary workaround while we get more information on the specifics for what is failing. Ref: #795 Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-07-15 07:35:56 -07:00
Andrew Rynhard	5d8ee0a3a5	fix: use existing logic to perform reset This PR moves the reset API to the init API definition. It leverages the same code we use for upgrades. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-07-04 18:26:14 -07:00
Andrey Smirnov	5d91d762ce	feat(osd): implement container metrics for CRI inspector (#824) This refactors metrics interface to remove containerd-specific stuff and make it common for CRI & containerd. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-07-04 11:25:15 -07:00
Andrey Smirnov	237e903f91	feat(osd): implement CRI inspector for containers (#817) Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-07-02 15:48:00 -07:00
Seán C. McCord	91d5e7e6ef	TLS renew (#807) Signed-off-by: Seán C McCord <ulexus@gmail.com>	2019-07-02 15:35:27 -07:00
Andrey Smirnov	0662af19d1	chore: seed math.rand PRNG on startup in every service (#801) This is important as otherwise `math/rand` outputs predictable sequence each time. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-06-28 11:03:15 -07:00
Andrey Smirnov	17f28d3461	feat(osctl): improve output of `stats` and `ps` commands (#788) Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-06-26 15:37:54 -07:00
Seán C. McCord	81163cefb4	feat(osd): extend Routes API (#756) Signed-off-by: Seán C McCord <ulexus@gmail.com>	2019-06-22 08:03:13 -07:00
Andrey Smirnov	76071abbb8	feat(init): move 'ls' API to init from osd (#755) Service `osd` doesn't have access to rootfs, as it is running in a container, so move API to `init` which has unconstrained access to rootfs. (This is in line with another API, `osctl cp`). Fixes: #752 Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-06-21 22:29:39 +03:00
Andrey Smirnov	9ed45f7090	feat(osctl): implement 'cp' to copy files out of the Talos node (#740) Actual API is implemented in the `init`, as it has access to root filesystem. `osd` proxies API back to `init` with some tricks to support grpc streaming. Given some absolute path, `init` produces and streams back .tar.gz archive with filesystem contents. `osctl cp` works in two modes. First mode streams data to stdout, so that we can do e.g.: `osctl cp /etc - | tar tz`. Second mode extracts archive to specified location, dropping ownership info and adjusting permissions a bit. Timestamps are not preserved. If a full dump with owner/permissions is required, it's better to stream data to `tar xz`; for a quick and dirty look into filesystem contents under an unprivileged user, it's easier to use in-place extraction. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-06-20 17:02:58 -07:00
Andrey Smirnov	070cbc9d60	refactor(osd): implement container inspector for a single container (#720) Instead of pulling a full list of containers, implement inspector query for a single container following the spec to build display name. Also adds many more tests for the container inspector. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-06-17 17:54:28 +03:00
Andrey Smirnov	0c0a0340b2	fix(osctl): allow '-target' flag for `osctl restart` (#732) I couldn't find any use for the `timeout` flag nor the value passed in the API, but it blocks the much more useful 'target' flag, which is present in other commands. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-06-14 21:37:57 +03:00
Andrey Smirnov	fb320a894b	fix(osctl): Revert "display non-fatal errors from ps/stats in osctl (#724)" (#727) This reverts commit f200eb7a8a0b7c2d29710f695000eb7680ce8b7d. grpc can't send back both response and an error. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-06-07 22:50:05 +03:00
Seán C. McCord	532a53bfaf	feat(init): Implement 'ls' command (#721) Fixes #719 Signed-off-by: Seán C McCord <ulexus@gmail.com>	2019-06-07 10:19:20 -07:00
Andrey Smirnov	f200eb7a8a	fix(osctl): display non-fatal errors from ps/stats in osctl (#724) Logging those errors in osd makes them hard to discover. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-06-07 17:10:18 +03:00
Andrey Smirnov	d9f4f378c2	fix(osd): consistent container ids in stats, ps and reset (#707) Fixes: #689, #690 Refactor container inspection code into a package of its own with some rudimentary tests. Use this package consistently in osd commands dealing with containers. Improvements for the next PRs: * implement API to fetch info about container by ID (to avoid fetching full list) * handle and display errors on client side, not to the log of the server * more tests, including k8s containers (how can we do that?) Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-06-03 20:51:01 -05:00
Brad Beam	b0dab6e021	fix(osd): Sanitize request.id for log streams (#673) Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-05-20 14:46:05 -05:00
Andrey Smirnov	75b2ce7fd2	feat(init): implement services list API and osctl service CLI (#662 ) This returns list of all the services registered, with their current status, past events, health state, etc. New CLI is `osctl service [<id>]`: without `<id>` it prints list of all the services, with specific `<id>` it provides details for a service. I decided to create "parallel" data structures in protobuf as Go structures don't map nicely onto what protoc generates: pointers vs. values, additional fields like mutexes, etc. Probably there's a better approach, I'm open for it. For CLI, I tried to keep CLI stuff in `cmd/` package, and I also created simple wrapper to remove duplicated code which sets up client for each command. Examples: ``` $ osctl service SERVICE STATE HEALTH LAST CHANGE LAST EVENT containerd Running OK 21s ago Health check successful kubeadm Running ? 2s ago Started task kubeadm (PID 280) for container kubeadm kubelet Running ? 0s ago Started task kubelet (PID 383) for container kubelet ntpd Running ? 14s ago Started task ntpd (PID 129) for container ntpd osd Running ? 14s ago Started task osd (PID 126) for container osd proxyd Waiting ? 14s ago Waiting for conditions trustd Running ? 14s ago Started task trustd (PID 125) for container trustd udevd Running ? 14s ago Started task udevd (PID 130) for container udevd ``` ``` $ osctl service proxyd ID proxyd STATE Running HEALTH ? EVENTS [Preparing]: Running pre state (22s ago) [Waiting]: Waiting for conditions (22s ago) [Preparing]: Creating service runner (6s ago) [Running]: Started task proxyd (PID 461) for container proxyd (6s ago) ``` Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-05-17 18:01:12 +03:00
Brad Beam	dd3d3fac9c	fix(osd): Read talos service logs from file (#663) Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-05-16 20:05:23 -05:00
Brad Beam	0b33280915	feat(init): Add upgrade endpoint (#623) Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-05-13 15:15:25 -05:00
Brad Beam	a6989db1d1	fix(osd): Use correct context in stats endpoint (#644) Without this we never set the namespace for the context which prevents it from functioning at all Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-05-11 14:26:23 -05:00
Andrey Smirnov	ab2917e833	feat(init): implement init gRPC API, forward reboot to init (#579) This implements insecure over-file-socket gRPC API for init with two first simplest APIs: reboot and shutdown (poweroff). File socket is mounted only to `osd` service, so it is the only service which can access init API. Osd forwards reboot/shutdown already implemented APIs to init which actually executes these. This enables graceful shutdown/reboot with service shutdown, sync, etc. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-04-26 23:04:24 +03:00
Andrew Rynhard	fc05224b4f	feat: add shutdown command (#577) Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-04-26 08:53:12 -07:00
Andrew Rynhard	a8fa1f5cd1	feat(osctl): add df command (#569) Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-04-26 08:24:31 -07:00
Andrey Smirnov	505b5022c4	feat(init): implement graceful shutdown of 'init' (#562) Most crucial changes in `init/main.go`: on shutdown now Talos tries to stop gracefully all the services. All the shutdown paths are unified, including poweroff, reboot and panic handling on startup. While I was at it, I also fixed bug with containers failing to start when old snapshot is still around. Service lifecycle is wrapped with `ServiceRunner` object now which handles state transitions and captures events related to state changes. Every change goes to the log as well. There's no way to capture service state yet, but that is planned to be implemented as RPC API for `init` which is exposed via `osd` to `osctl`. Future steps: 1. Implement service dependencies for correct startup order and shutdown order. 2. Implement service health, so that we can say "start trustd when containerd is up and healthy". 3. Implement gRPC API for init, expose via osd (service status, restart, poweroff, ...) 4. Implement 'String()' for conditions, so that we can see what service is waiting on right now. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-04-26 16:53:19 +03:00
Brad Beam	3f358b12ae	feat(osctl): Add osctl top (#560) Also adds pkg/proc as the backing package for top data Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-04-23 21:25:41 -05:00
Andrey Smirnov	a858cb4986	refactor: extract 'restart' piece of the runners into wrapper runner (#559) This changes `runner.Runner` API to support more methods to allow for containerd runner to create container object only once, and start/stop tasks to implement restarts. New API: `Open()` (initialize), `Run()` (run once until exits), `Stop()` (stop running instance), `Close()` (free resource, no longer available for new `Run()`). So the sequence might be: `Open`, `Run`, `Stop`, `Run`, `Stop`, `Close`. Process and containerd runners were updated for the new API, and 'restart' part was removed, now both runners only run the task once. Restart piece was implemented in an abstract way for any wrapped `runner.Runner` in the `runner/restart` package. Restart supports three restart policies: `Once`, `UntilSuccess` and `Forever`. Service API was changed slightly to return the `runner.Runner` interface, and `system.Services` now handles running the service. For all the services, code was adjusted to either return runner (run once), or was wrapped with `restart` runner to provide restart policy. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-04-23 01:25:26 +03:00
Brad Beam	271d28244b	fix(osd): Fix k8s.io namespace logs (#557) Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-04-18 08:49:33 -07:00
Brad Beam	46bdf2371c	fix(osd): Fix osctl ps output (#554) Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-04-17 08:51:19 -05:00
Andrey Smirnov	d29e27ee33	refactor: containerd runner refactoring and unit-tests (#551) Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-04-16 13:56:52 -07:00
Andrew Rynhard	e18b5086a9	chore: update org to new name (#480) Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-04-03 18:29:21 -07:00
Andrew Rynhard	455aeb742c	chore: expose userdata and osctl client packages (#471) Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-04-02 17:11:17 -07:00
Andrew Rynhard	9e947c3fa5	feat: add automated PKI for joining nodes (#406) Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-02-23 23:17:56 -08:00
Spencer Smith	a2704eeaca	feat: add route printing to osctl (#404) Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>	2019-02-22 06:16:01 -08:00
Andrew Rynhard	62bb226c0b	feat(osctl): add stats command (#314) Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-01-16 17:17:15 -08:00
Andrew Rynhard	b410ff35cc	chore: add nolint annotation (#313) Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-01-16 07:28:33 -08:00
Andrew Rynhard	3c5f99fede	feat(osctl): output namespace (#312) Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-01-15 23:50:56 -08:00
Andrew Rynhard	94b011c724	refactor: use containerd exported defaults (#310) Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-01-15 20:29:13 -08:00
Andrew Rynhard	25fca3d68d	feat: import core service containers from local store (#309) Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-01-15 18:46:41 -08:00
Andrew Rynhard	ee226dddac	chore: enforce commit and license policies (#304) Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-01-13 16:10:49 -08:00
Andrew Rynhard	72eb1b34f5	chore: use buildkit for builds (#295)	2018-12-19 22:22:05 -08:00

49 Commits
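
Commit a858cb4986 describes a four-method runner lifecycle (`Open`, `Run`, `Stop`, `Close`) and a wrapper that applies one of three restart policies (`Once`, `UntilSuccess`, `Forever`). The following is a minimal sketch of that idea, not the code from the repository: the method signatures, the `restartRunner` and `flakyRunner` types, and the `maxRuns` cap (added only so that the `Forever` case terminates in a demo) are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Runner follows the lifecycle named in the commit message.
// The signatures here are assumed, not taken from the Talos source.
type Runner interface {
	Open() error  // initialize
	Run() error   // run once until the task exits
	Stop() error  // stop the running instance
	Close() error // free resources; no further Run() calls
}

// Policy is a hypothetical enum for the three restart policies.
type Policy int

const (
	Once Policy = iota
	UntilSuccess
	Forever
)

// restartRunner wraps any Runner and re-runs it according to policy.
// maxRuns is a hypothetical safety cap so this sketch always terminates.
type restartRunner struct {
	inner   Runner
	policy  Policy
	maxRuns int
}

func (r *restartRunner) Open() error  { return r.inner.Open() }
func (r *restartRunner) Stop() error  { return r.inner.Stop() }
func (r *restartRunner) Close() error { return r.inner.Close() }

// Run calls the wrapped Run() repeatedly until the policy says to stop
// or the cap is reached, and returns the error from the last run.
func (r *restartRunner) Run() error {
	var err error
	for i := 0; i < r.maxRuns; i++ {
		err = r.inner.Run()
		switch r.policy {
		case Once:
			return err
		case UntilSuccess:
			if err == nil {
				return nil
			}
		case Forever:
			// Restart regardless of the outcome.
		}
	}
	return err
}

// flakyRunner is a stand-in task that fails a fixed number of times.
type flakyRunner struct {
	failures int
	runs     int
}

func (f *flakyRunner) Open() error  { return nil }
func (f *flakyRunner) Stop() error  { return nil }
func (f *flakyRunner) Close() error { return nil }

func (f *flakyRunner) Run() error {
	f.runs++
	if f.runs <= f.failures {
		return errors.New("task exited with error")
	}
	return nil
}

func main() {
	task := &flakyRunner{failures: 2}
	r := &restartRunner{inner: task, policy: UntilSuccess, maxRuns: 10}

	if err := r.Open(); err != nil {
		fmt.Println("open failed:", err)
		return
	}
	defer r.Close()

	err := r.Run()
	fmt.Printf("runs=%d err=%v\n", task.runs, err)
}
```

Because the wrapper itself satisfies `Runner`, a caller can treat a run-once task and a restarting task the same way, which matches the commit's note that services either return a plain runner or one wrapped with a restart policy.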
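
Commit 0662af19d1 notes that an unseeded `math/rand` produces a predictable sequence on every start. The sketch below illustrates why seeding matters; it is not the change made in that commit. The helpers `firstValues` and `sameSequence` are hypothetical names introduced here, and the time-based seed is just one common choice. Note that the concern applies to the Go versions of that era: since Go 1.20 the top-level `math/rand` functions are seeded randomly by default.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// firstValues returns the first n values drawn from a generator
// created with the given seed.
func firstValues(seed int64, n int) []int64 {
	rng := rand.New(rand.NewSource(seed))
	out := make([]int64, n)
	for i := range out {
		out[i] = rng.Int63()
	}
	return out
}

// sameSequence reports whether two slices hold identical values.
func sameSequence(a, b []int64) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	// A fixed seed yields the same sequence on every run, which is
	// the predictable behavior the commit message warns about.
	fmt.Println("fixed seed repeats:", sameSequence(firstValues(1, 3), firstValues(1, 3)))

	// Seeding from the clock at startup makes each run differ.
	fmt.Println("time-seeded values:", firstValues(time.Now().UnixNano(), 3))
}
```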