talos

mirror of https://github.com/siderolabs/talos.git synced 2025-10-17 10:31:19 +02:00

Author	SHA1	Message	Date
Brad Beam	c88b6fc422	fix(proxyd): Fix backend deletion (#729) Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-06-07 14:34:47 -07:00
Andrey Smirnov	fb320a894b	fix(osctl): Revert "display non-fatal errors from ps/stats in osctl (#724)" (#727) This reverts commit f200eb7a8a0b7c2d29710f695000eb7680ce8b7d. gRPC can't send back both a response and an error. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-06-07 22:50:05 +03:00
Seán C. McCord	532a53bfaf	feat(init): Implement 'ls' command (#721) Fixes #719 Signed-off-by: Seán C McCord <ulexus@gmail.com>	2019-06-07 10:19:20 -07:00
Andrey Smirnov	f5969d2c6c	fix(osctl): avoid panic on empty 'talosconfig' (#725) When talosconfig doesn't exist, `osctl` creates an empty one behind the scenes, but that leads to an immediate panic if the command tries to build the osd client:
```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x11d6786]

goroutine 1 [running]:
github.com/talos-systems/talos/cmd/osctl/pkg/client.NewDefaultClientCredentials(0x7ffd720f5100, 0xb, 0xc000559ce8, 0x757014, 0xc0000d5500)
	/src/cmd/osctl/pkg/client/client.go:50 +0xa6
github.com/talos-systems/talos/cmd/osctl/cmd.setupClient(0x16ca3f0)
	/src/cmd/osctl/cmd/root.go:100 +0x3d
github.com/talos-systems/talos/cmd/osctl/cmd.glob..func22(0x24ad7c0, 0xc00058c240, 0x0, 0x3)
	/src/cmd/osctl/cmd/ps.go:32 +0x37
github.com/spf13/cobra.(*Command).execute(0x24ad7c0, 0xc0005f8a00, 0x3, 0x4, 0x24ad7c0, 0xc0005f8a00)
	/toolchain/gopath/pkg/mod/github.com/spf13/cobra@v0.0.3/command.go:766 +0x2ae
github.com/spf13/cobra.(*Command).ExecuteC(0x24ae140, 0x2507030, 0x162f2d7, 0xb)
	/toolchain/gopath/pkg/mod/github.com/spf13/cobra@v0.0.3/command.go:852 +0x2ec
github.com/spf13/cobra.(*Command).Execute(...)
	/toolchain/gopath/pkg/mod/github.com/spf13/cobra@v0.0.3/command.go:800
github.com/talos-systems/talos/cmd/osctl/cmd.Execute()
	/src/cmd/osctl/cmd/root.go:93 +0x24f
main.main()
	/src/cmd/osctl/main.go:10 +0x20
```
Fix that by returning an explicit error:
```
error getting client credentials: 'context' key is not set in the config
```
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-06-07 17:40:28 +03:00
Andrey Smirnov	f200eb7a8a	fix(osctl): display non-fatal errors from ps/stats in osctl (#724) Logging those errors in osd makes them hard to discover. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-06-07 17:10:18 +03:00
Brad Beam	0d5f521291	feat(init): Add support for kubeadm reset during upgrade (#714) Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-06-06 22:41:22 -05:00
Spencer Smith	95b107d884	chore(ci): modularize integration test (#722) Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>	2019-06-06 09:28:53 -04:00
Brad Beam	8a5acff119	fix: Add gitmeta as dependency for push (#718) Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-06-05 16:01:46 -05:00
Brad Beam	d68e303f27	feat(init): Add service stop api (#708) Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-06-05 14:49:03 -05:00
Andrew Rynhard	ecfa945fc8	chore: download official gitmeta to BINDIR (#717) Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-06-05 11:44:15 -07:00
Andrey Smirnov	7a4a677f04	fix(init): use 127.0.0.1 IP in healthchecks to avoid resolver weirdness (#715) Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-06-05 19:30:28 +03:00
Spencer Smith	921114dd99	fix: ensure index remains in bounds for ud gen (#710) Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>	2019-06-04 17:37:54 -04:00
Brad Beam	1a01440482	feat(init): Add support for stopping individual services (#706) Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-06-04 15:51:30 -05:00
Andrey Smirnov	bf6ef7043c	chore: address flaky tests instability (#713) For #711, this should be a complete fix: waiting for the container to be started. For #712, this is more of a workaround: adjusting timeouts to make the failure less likely. The idea of the test is that the health check should be aborted on timeout (1ms), while the health check succeeds if not aborted within 50ms. Before the fix it was 1ms/10ms, but with concurrent execution there was still a chance that the goroutine exited successfully after 10ms while the 1ms context deadline had not yet been reached. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-06-04 23:22:05 +03:00
Andrew Rynhard	84616b48b3	chore: prepare release v0.1.0-alpha.28 (#687) Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com> v0.1.0-alpha.28	2019-06-03 20:07:18 -07:00
Andrey Smirnov	d9f4f378c2	fix(osd): consistent container ids in stats, ps and reset (#707) Fixes: #689, #690 Refactor container inspection code into a package of its own with some rudimentary tests. Use this package consistently in osd commands dealing with containers. Improvements for the next PRs:
* implement an API to fetch info about a container by ID (to avoid fetching the full list)
* handle and display errors on the client side, not in the log of the server
* more tests, including k8s containers (how can we do that?)
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-06-03 20:51:01 -05:00
Spencer Smith	16530db722	fix: don't set BUILDKIT_CACHE to empty string in Makefile (#705) Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>	2019-06-03 13:59:39 -04:00
Andrey Smirnov	ee297da1a2	chore: enable GOPROXY for go modules (#703) Announcement: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/golang-announce/0wo8cOhGuAI/v96KeTYtBwAJ This should improve module download time as `go` no longer needs to clone full repositories. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-05-31 23:59:52 +03:00
Andrey Smirnov	f96d3ce7cb	fix(osctl): don't print message on first ^C (#704) This removes the extra messages printed when the user presses ^C to stop osctl. The message is still printed on the second ^C, and the process is aborted on the third. For the `logs` command, as it is streaming, suppress the context canceled error (before the context changes, the process was crashing before printing an error). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-05-31 23:37:57 +03:00
Andrew Rynhard	b330d3b778	feat: leave etcd before upgrading (#702) Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-05-31 10:59:12 -07:00
Brad Beam	8537e7eeb6	feat(init): Add support for control plane join config (#700) Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-05-31 12:21:00 -05:00
Andrey Smirnov	dc79b0ad05	refactor(init): use 'switch' instead of long condition (#701) Based on feedback from #699 Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-05-31 17:39:38 +03:00
Andrey Smirnov	32826e3d14	feat(init): update 'waiting' state description when conditions change (#698) If some of the conditions are already satisfied, we can update the description to reflect that, e.g.:
```
EVENTS   [Waiting]: Waiting for service "containerd" to be "up", service "networkd" to be "up", service "trustd" to be "up" (14m17s ago)
         [Waiting]: Waiting for service "trustd" to be "up" (14m16s ago)
```
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-05-31 16:40:32 +03:00
Spencer Smith	313a988292	fix: ensure shebang at top of userdata (#695) Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>	2019-05-31 09:02:55 -04:00
Andrew Rynhard	f95f8f87a4	feat: upgrade Kubernetes to v1.15.0-beta.1 (#696) Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-05-30 18:56:30 -07:00
Andrey Smirnov	7b7f4d4484	fix(init): consider 'finished' services to be 'up' (#699) This fixes a race condition where kubeadm doesn't start, waiting for networkd forever. If one service enters the 'Waiting' state after another service has already finished running, we should consider the 'Finished' service to be 'up' in the sense that it has finished successfully (vs. the 'Failed' state, which is not 'up'). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-05-30 20:02:03 -05:00
Brad Beam	a1e635a4b2	feat(init): Prioritize usage of local userdata (#694) Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-05-30 09:56:14 -05:00
Andrey Smirnov	ca95469247	feat(osctl): handle ^C by aborting context (#693) This provides somewhat better handling for hanging gRPC requests (or just slow requests):
```
$ osctl-linux-amd64 --talosconfig talosconfig version
Client:
	Tag:         ad410fb-dirty
	SHA:         ad410fb-dirty
	Built:
	Go version:  go1.12.5
	OS/Arch:     linux/amd64
^CSignal received, aborting, press Ctrl+C once again to abort immediately...
error getting version: rpc error: code = Canceled desc = context canceled
```
For now we catch `SIGINT` & `SIGTERM`. A second signal kills the process immediately, as the signal handler is removed. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-05-30 00:11:58 +03:00
Andrey Smirnov	ad410fb7f2	refactor(osctl): move cli code out of 'client' package (#692) This moves cli code (rendering output, etc.) out of the 'client' package, so that the client package is usable outside of the cli. Consistently accept context as the first param to API methods, so that we can build graceful request cancellation. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-05-29 01:10:25 +03:00
Andrey Smirnov	20f4d77d39	fix(init): move directory creation to kubeadm pre-func (#688) Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-05-28 09:51:38 -07:00
Brad Beam	6cf260c5af	fix(osctl): Generate correct config with master IPs (#681) Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-05-27 18:59:41 -07:00
Andrey Smirnov	f704cb2cc3	refactor(osctl): DRY up osctl sources by using common client setup (#686) Replace duplicated code which was setting up the gRPC client with a common method. Should have no functional changes otherwise. Add args len check where missing. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-05-27 22:55:20 +03:00
Andrew Rynhard	90b5b83b8d	chore: improve the basic integration test (#685) This PR ensures that we have 3 master nodes. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-05-27 10:44:09 -07:00
Andrey Smirnov	40a5b7c177	feat(init): expose networkd as goroutine-based server (#682) This adds a generic goroutine runner which simply wraps a service as a process goroutine. It supports log redirection and basic panic handling. The DHCP-related part of the network package was slightly adjusted to run as a service, with logging updates (to redirect logs to a file) and context canceling. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-05-27 17:07:28 +03:00
Brad Beam	d8249c8779	refactor(init): Allow kubeadm init on controlplane (#658)
* refactor(init): Allow kubeadm init on controlplane
This shifts the cluster formation from init (bootstrap) and join (control plane) to init (control plane). This makes use of the previously implemented initToken to provide a TTL for cluster initialization to take place and allows us to mostly treat all control plane nodes equally. This also sets up the path for us to handle master upgrades and not be concerned with odd behavior when upgrading the previously defined init node. To facilitate kubeadm init across all control plane nodes, we make use of the initToken to run the `kubeadm init phase certs` command once to generate any missing certificates. All other control plane nodes will attempt to sync the necessary certs/files via all defined trustd endpoints and begin the startup process.
* feat(init): Add service runner context to PreFunc
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-05-24 16:05:49 -05:00
Andrey Smirnov	a0188aff73	feat(init): implement service dependencies, correct start and shutdown (#680) This PR introduces dependencies between the services. Now each service has two virtual events associated with it: 'up' (running and healthy) and 'down' (finished or failed). These events are used to establish the correct order via the conditions abstraction. Service image unpacking was moved into the 'pre' stage, simplifying `init/main.go`; service images are now closer to the code which runs the service itself. The 'pre' step now runs after the 'wait' step, and service dependencies are now mixed into the other conditions of the 'wait' step on startup. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-05-24 19:17:52 +03:00
Andrew Rynhard	ecad4e35a9	docs: change meeting times to 24 hour format (#675) Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-05-21 13:56:35 -07:00
Andrey Smirnov	06bff97a3f	refactor: change conditions to be interface, add descriptions (#677) Conditions are now implemented as an interface with two methods: `Wait` for the condition to be true (cancelable via context) and `String`, which describes what the condition is waiting for. A generic `WaitForAll` was implemented to wait for multiple conditions at once. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-05-21 21:25:08 +03:00
Brad Beam	b6a01d6e5b	fix: Address lint warning for unknown linter (#676) Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-05-21 10:59:13 -05:00
Andrew Rynhard	0487099243	docs: add Zoom meeting schedule to README (#674) Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-05-20 14:38:35 -07:00
Andrey Smirnov	e35a910da2	refactor: fix stream chunker & provide some tests (#672) The stream chunker should be cancellable at any point of execution, and it should stop chunking on EOF. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-05-20 23:53:05 +03:00
Brad Beam	b0dab6e021	fix(osd): Sanitize request.id for log streams (#673) Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-05-20 14:46:05 -05:00
Brad Beam	a64de7ed51	feat(init): Add initToken parameter to userdata (#664) Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-05-20 14:23:38 -05:00
Andrew Rynhard	bbbd1f70d1	chore: prepare release v0.1.0-alpha.27 (#671) Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com> v0.1.0-alpha.27	2019-05-20 10:24:18 -07:00
Andrey Smirnov	204873e257	refactor: fix filechunker not exiting on context cancel (#668) This started as a simple unit test for the file chunker, but the first test hung immediately, so I started looking into the code. One problem was that when entering the inotify() code, ctx cancel wasn't considered. Another problem is that remove fsnotify was never triggered, but I saw that with the unit test later. A small nit was that inotify() was initialized every time we got to EOF, which is not efficient for "follow" mode. So I moved inotify into the main loop, and plugged a context cancel watch into the place where the chunk is delivered. Chunker code is supposed to block in two places: when it tries to deliver the next chunk (as the client might be slow to receive buffers) or when there's no new data (on inotify). So it makes sense to assert the context canceled condition in both cases. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-05-20 18:00:40 +03:00
Andrew Rynhard	496bb83078	feat: add plural alias of service command (#670) Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-05-18 09:17:09 -07:00
Andrey Smirnov	54168cef1c	feat(init): implement healthchecks for the services (#667) Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-05-18 08:44:56 -07:00
Andrey Smirnov	75b2ce7fd2	feat(init): implement services list API and osctl service CLI (#662) This returns a list of all the registered services, with their current status, past events, health state, etc. New CLI is `osctl service [<id>]`: without `<id>` it prints a list of all the services; with a specific `<id>` it provides details for that service. I decided to create "parallel" data structures in protobuf, as Go structures don't map nicely onto what protoc generates: pointers vs. values, additional fields like mutexes, etc. There's probably a better approach; I'm open to it. For CLI, I tried to keep CLI stuff in the `cmd/` package, and I also created a simple wrapper to remove duplicated code which sets up the client for each command. Examples:
```
$ osctl service
SERVICE      STATE     HEALTH   LAST CHANGE   LAST EVENT
containerd   Running   OK       21s ago       Health check successful
kubeadm      Running   ?        2s ago        Started task kubeadm (PID 280) for container kubeadm
kubelet      Running   ?        0s ago        Started task kubelet (PID 383) for container kubelet
ntpd         Running   ?        14s ago       Started task ntpd (PID 129) for container ntpd
osd          Running   ?        14s ago       Started task osd (PID 126) for container osd
proxyd       Waiting   ?        14s ago       Waiting for conditions
trustd       Running   ?        14s ago       Started task trustd (PID 125) for container trustd
udevd        Running   ?        14s ago       Started task udevd (PID 130) for container udevd
```
```
$ osctl service proxyd
ID       proxyd
STATE    Running
HEALTH   ?
EVENTS   [Preparing]: Running pre state (22s ago)
         [Waiting]: Waiting for conditions (22s ago)
         [Preparing]: Creating service runner (6s ago)
         [Running]: Started task proxyd (PID 461) for container proxyd (6s ago)
```
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-05-17 18:01:12 +03:00
Andrew Rynhard	d36d4404bd	fix(osctl): output config without localAPIEndpoint (#665) Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-05-16 21:18:42 -07:00
Brad Beam	dd3d3fac9c	fix(osd): Read talos service logs from file (#663) Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-05-16 20:05:23 -05:00


566 Commits