This PR will update k8s to the latest 1.18 release and bump sonobuoy to
help resolve some e2e flakes. It also adds retry logic around the
sonobuoy run.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
Since the `--once` option of `extlinux` seems to only work with BIOS, we
needed to remove any reliance on that option. Instead of booting the
upgraded version once, and then making it the default after a successful
boot, we now make it the default, and then revert on any boot error.
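A minimal sketch of the new policy, with hypothetical `setDefault` and
`bootSequence` helpers standing in for the real bootloader handling
(compressed into one function for illustration; in reality the revert
happens on the next boot):

```go
package main

// upgradeBoot makes the upgraded entry the bootloader default up front,
// then points the default back at the known-good entry if bringing the
// system up fails.
func upgradeBoot(setDefault func(entry string) error, bootSequence func() error) error {
	if err := setDefault("upgrade"); err != nil {
		return err
	}

	if err := bootSequence(); err != nil {
		// boot error: restore the previous default entry
		if revertErr := setDefault("previous"); revertErr != nil {
			return revertErr
		}

		return err
	}

	return nil
}
```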
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This PR will pull in the latest release of k8s 1.18 so we can start
validating it through our test suite.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This PR introduces a new strategy for upgrades. Instead of attempting to
zap the partition table, create a new one, and then format the
partitions, this change only updates the `vmlinuz` and `initramfs.xz`
used to boot. It introduces an A/B-style upgrade
process, which will allow for easy rollbacks. One deviation from our
original intention with upgrades is that this change does not completely
reset a node. It falls just short of that and does not reset the
partition table. This forces us to keep the current partition scheme in
mind as we make changes in the future, because an upgrade assumes a
specific partition scheme. We can improve upgrades further in the
future, but this will at least make them more dependable. Finally, one
more feature in this PR is the ability to keep state. This enables
single-node clusters to upgrade, since we keep the etcd data around.
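Conceptually the A/B flip reduces to something like this (slot names are
illustrative, not the actual partition labels):

```go
package main

import "fmt"

// nextSlot returns the inactive slot that the new `vmlinuz` and
// `initramfs.xz` are installed into; rolling back is just flipping
// the bootloader default back to the other slot.
func nextSlot(current string) (string, error) {
	switch current {
	case "A":
		return "B", nil
	case "B":
		return "A", nil
	default:
		return "", fmt.Errorf("unknown boot slot %q", current)
	}
}
```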
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
BREAKING CHANGE: This PR fixes a bug where we were only passing `cluster.local` to the
kubelet configuration. It will also pull in a new version of the
bootkube fork to ensure that custom domains get propagated down to the
API Server certs, as well as the CoreDNS configuration for a cluster.
Existing users should be aware that, if they were previously trying to
use this option in machine configs, an upgrade may break
their cluster. It will update the kubelet flag with the new domain, but
CoreDNS and API Server certs will not change since bootkube has already
run. One option may be to change these values manually inside the
Kubernetes cluster. However, it may prove easier to rebuild the cluster
if necessary.
Additionally, this PR exposes a flag on `osctl config generate`
to allow tweaking this domain value as well.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This PR allows Talos to respect the `panic=0` flag if users pass that in
their kernel args. Doing this makes it easier to catch kernel panics in
debug scenarios and allows the user to manually trigger a restart with
ctrl+alt+del when they're ready.
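For context, `panic=0` tells the kernel to wait indefinitely after a
panic instead of rebooting after a timeout. A rough sketch of detecting
it (Talos has its own kernel-arg parsing; this only shows the idea):

```go
package main

import (
	"io/ioutil"
	"strings"
)

// panicRebootDisabled reports whether the kernel command line contains
// panic=0, i.e. the user asked the kernel not to reboot on panic.
func panicRebootDisabled() (bool, error) {
	raw, err := ioutil.ReadFile("/proc/cmdline")
	if err != nil {
		return false, err
	}

	for _, arg := range strings.Fields(string(raw)) {
		if arg == "panic=0" {
			return true, nil
		}
	}

	return false, nil
}
```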
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This PR will clean up bootkube assets regardless of whether bootkube
succeeds. This will allow a failed bootkube deployment to retry on
reboot.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This PR will allow users to set the `persist: true` value in their
config data to tell Talos not to re-pull the config at each reboot.
The default will still remain a "pull every time" methodology in order
to encourage immutability by default.
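The boot-time decision reduces to something like this (the path and
helper are hypothetical, not the actual implementation):

```go
package main

import "io/ioutil"

const configPath = "/system/state/config.yaml" // illustrative location

// acquireConfig reuses the on-disk config when persistence is enabled
// and the file exists; otherwise it falls back to the usual pull from
// the platform source.
func acquireConfig(persist bool, fetchFromPlatform func() ([]byte, error)) ([]byte, error) {
	if persist {
		if b, err := ioutil.ReadFile(configPath); err == nil {
			return b, nil
		}
	}

	return fetchFromPlatform()
}
```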
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
The new service `routerd` performs exactly one task: based on the
service name of the incoming API call, it routes the request to the
appropriate Talos service (`networkd`, `osd`, etc.). `routerd` listens
on a file socket and routes requests to file sockets.
Service `apid` now does a single task as well:
* it either fans out requests to the `apid` services running on other
nodes and aggregates responses
* or it forwards requests to the local `routerd` (when the request
destination is the local node)
Cons:
* one more proxying layer on request path
Pros:
* clearer service roles
* `routerd` is part of core Talos, services should register with it to
expose their API; no auth in the service (not exposed to the world)
* `apid` might be replaced with another implementation; it depends on TLS
infra, auth, etc.
* `apid` is better segregated from other Talos services (can only access
`routerd`, can't talk to other Talos services directly, so less exposure
in case of a bug)
This change is a no-op for end users.
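For illustration, the routing rule reduces to roughly this (socket paths
are illustrative, not the actual Talos socket locations):

```go
package main

import (
	"fmt"
	"strings"
)

// backends maps a gRPC package name to the owning service's file socket.
var backends = map[string]string{
	"os":      "/system/run/osd.sock",
	"network": "/system/run/networkd.sock",
}

// backendFor resolves a full gRPC method name such as
// "/network.Network/Routes" to the file socket of the owning service.
func backendFor(fullMethod string) (string, error) {
	parts := strings.SplitN(strings.TrimPrefix(fullMethod, "/"), ".", 2)

	sock, ok := backends[parts[0]]
	if !ok {
		return "", fmt.Errorf("no backend for %q", fullMethod)
	}

	return sock, nil
}
```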
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This extracts admin kubeconfig generation out of bootkube; it is now
based on the Talos x509 library. On each API request for `kubeconfig`,
the config is generated on the fly and sent back on the wire.
This fixes two issues:
* any master node can now generate `kubeconfig` (worker nodes can do
that too, but that should probably change in the future)
* after an upgrade-and-wipe-the-disk scenario, `osctl kubeconfig` still
works
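A condensed sketch of the idea using only the standard library (the real
code uses the Talos x509 helpers; names here are illustrative, and the
CA is assumed to be an EC key in SEC1 PEM form):

```go
package kubecfg

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/base64"
	"encoding/pem"
	"fmt"
	"math/big"
	"time"
)

// Admin issues a fresh admin client certificate signed by the cluster CA
// and renders it into a kubeconfig on the fly.
func Admin(caCrtPEM, caKeyPEM []byte, endpoint string) ([]byte, error) {
	caBlock, _ := pem.Decode(caCrtPEM)
	keyBlock, _ := pem.Decode(caKeyPEM)
	if caBlock == nil || keyBlock == nil {
		return nil, fmt.Errorf("invalid CA PEM")
	}

	caCrt, err := x509.ParseCertificate(caBlock.Bytes)
	if err != nil {
		return nil, err
	}

	caKey, err := x509.ParseECPrivateKey(keyBlock.Bytes)
	if err != nil {
		return nil, err
	}

	// fresh client key + certificate in the system:masters group
	clientKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}

	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(time.Now().UnixNano()),
		Subject:      pkix.Name{CommonName: "admin", Organization: []string{"system:masters"}},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(8760 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	}

	crtDER, err := x509.CreateCertificate(rand.Reader, tmpl, caCrt, &clientKey.PublicKey, caKey)
	if err != nil {
		return nil, err
	}

	keyDER, err := x509.MarshalECPrivateKey(clientKey)
	if err != nil {
		return nil, err
	}

	// kubeconfig *-data fields are base64-encoded PEM
	b64 := func(typ string, der []byte) string {
		return base64.StdEncoding.EncodeToString(pem.EncodeToMemory(&pem.Block{Type: typ, Bytes: der}))
	}

	cfg := fmt.Sprintf(`apiVersion: v1
kind: Config
clusters:
- name: talos
  cluster:
    server: %s
    certificate-authority-data: %s
users:
- name: admin
  user:
    client-certificate-data: %s
    client-key-data: %s
contexts:
- name: admin@talos
  context: {cluster: talos, user: admin}
current-context: admin@talos
`, endpoint, b64("CERTIFICATE", caBlock.Bytes), b64("CERTIFICATE", crtDER), b64("EC PRIVATE KEY", keyDER))

	return []byte(cfg), nil
}
```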
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This class of tests is included/excluded by build tags, but as it is
pretty different from other integration tests, we build it as a separate
executable. Provision tests provision a cluster for the test run,
perform some actions, and verify the results (could be upgrade, reset,
scale up/down, etc.).
There's now a framework for implementing upgrade tests; the first test
upgrades from the latest 0.3 release (0.3.2 at the moment) to the
current version of Talos (as built in CI). The test starts by booting
the 0.3 kernel/initramfs, runs the 0.3 installer to install a 0.3.2
cluster, waits for bootstrap, and then upgrades to 0.4 in a rolling
fashion. As Firecracker supports a bootloader, the 0.4 system is booted
from the boot disk (as installed by the installer).
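The shape of such a test, with a hypothetical provisioner API (the build
tag and all helpers below are illustrative stand-ins for the framework):

```go
// +build integration_provision

package provision_test

import "testing"

// Skeleton only: provisionCluster, upgradeNode, and the wait helpers are
// hypothetical, not the actual framework API.
func TestUpgradeLatest03ToCurrent(t *testing.T) {
	cluster := provisionCluster(t, "v0.3.2") // boot 0.3 kernel/initramfs, run 0.3 installer
	defer cluster.Destroy()

	cluster.WaitForBootstrap(t)

	for _, node := range cluster.Nodes() { // rolling upgrade, one node at a time
		upgradeNode(t, node, "v0.4")
		cluster.WaitHealthy(t)
	}
}
```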
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This implements `osctl cluster destroy` for Firecracker and adds a
new utility command, `osctl cluster show`.
Firecracker mode now has a control process for firecracker VMs, allowing
clean reboots and background operations.
Lots of small fixes to Firecracker mode: clean CNI shutdown, cleaning up
netns, etc.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR allows the pod checkpointer and coredns images to be customized
for bootkube. We can already customize the hyperkube image, and all
other images used by bootkube are CNI-related, so they can be customized
with the "custom" CNI setup.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This PR pushes the initial code; it has several known
problems which are going to be addressed in follow-up PRs:
1. there's no "cluster destroy", so the only way to stop the VMs is to
`pkill firecracker`
2. provisioner creates state in `/tmp` and never deletes it, which is
required to keep the cluster running after `osctl cluster create` finishes
3. doesn't run any controller process around firecracker to support
reboots/CNI cleanup (veth interfaces linger on the host as they're never
cleaned up)
The plan is to create some structure in `~/.talos` to manage cluster
state, e.g. `~/.talos/clusters/<name>` which will contain all the
required files (disk images, file sockets, VM logs, etc.). This
directory structure will also work as a way to detect running clusters
and clean them up.
For point number 3, `osctl cluster create` is going to exec a
lightweight process to control the firecracker VM process and to
simulate VM reboots when firecracker finishes cleanly (which is what
happens when a VM reboots).
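A sketch of that control loop: a clean exit from firecracker means the
guest asked to reboot, so the supervisor simply starts it again.

```go
package main

import "os/exec"

// superviseVM restarts firecracker on clean exits (the VM rebooted) and
// stops on errors (crash or kill), so tearing the cluster down later
// only requires terminating the supervisor.
func superviseVM(args ...string) error {
	for {
		cmd := exec.Command("firecracker", args...)

		if err := cmd.Run(); err != nil {
			return err // real failure: do not restart
		}

		// clean exit: the guest rebooted, launch a fresh instance
	}
}
```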
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This adds a help menu to the Makefile. It documents all build
dependencies and how to get started.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
Primarily doc/constant changes.
Added additional bits to the `docs` target in the Makefile to generate
osctl docs as well as config files. Explicitly define a HOME variable so we
get consistent home directories for talosconfig variables in our docs.
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
When we upgrade a node, we kill off all pods before performing a fresh
install. The issue with this is that we run the risk of killing the CNI
pod before we finish killing all other pods, leaving the CRI unable to
tear down the pod's networking. This works around that by first killing
any pods running without host networking, so that the CNI can do its
job, and then removing the remaining pods.
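The ordering rule is simple to state in code (the pod type below is an
illustrative stand-in for the CRI pod sandbox info):

```go
package main

// pod is a minimal illustrative stand-in for a CRI pod sandbox.
type pod struct {
	name        string
	hostNetwork bool
}

// teardownOrder splits pods into two waves: pods with their own network
// namespace go first, while host-networking pods (the CNI among them)
// stay up to tear the others' sandboxes down, then go last.
func teardownOrder(pods []pod) (first, last []pod) {
	for _, p := range pods {
		if p.hostNetwork {
			last = append(last, p)
		} else {
			first = append(first, p)
		}
	}

	return first, last
}
```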
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This PR will allow users to specify one or many URLs for CNI so that
they can bypass bootkube deploying flannel and bring their own.
Closes #1593
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This replaces codegen version of apid proxying with
talos-systems/grpc-proxy based version. Proxying is transparent, it
doesn't require exact information about methods and response types. It
only requires responses to share a common layout so that they can be
enhanced with node metadata or errors.
There should be no significant changes to the API compared with the
previous version, but a few changes are worth mentioning:
1. `grpc.ClientConn` is established just once per upstream (either a
local service or a remote apid instance).
2. When called without `-t` (`targets`), apid proxies immediately down
to the local service, skipping the proxy hop to itself (as before); this
results in empty node metadata in the response (before, it had the local
node IP). We might revert this later to proxy to itself (?).
3. Streaming APIs are now fully supported by the proxy with multiple
targets, but the message definitions don't contain `ResponseMetadata`,
so streaming APIs are broken for now with targets (needs a fix).
4. Errors are now returned as responses with the `Error` field set in
`ResponseMetadata`; this requires a client library update and `osctl`
changes to handle it properly.
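For flavor, this is roughly how a transparent proxy slots into a gRPC
server (sketched against the upstream mwitkow/grpc-proxy API; the
talos-systems fork extends the director for one-to-many fan-out, and the
backend address here is illustrative):

```go
package main

import (
	"context"

	"github.com/mwitkow/grpc-proxy/proxy"
	"google.golang.org/grpc"
)

func newProxy() *grpc.Server {
	// the director picks an upstream per call; the real apid caches one
	// grpc.ClientConn per upstream instead of re-dialing every time
	director := func(ctx context.Context, fullMethodName string) (context.Context, *grpc.ClientConn, error) {
		conn, err := grpc.DialContext(ctx, "unix:///system/run/routerd.sock",
			grpc.WithInsecure(), grpc.WithCodec(proxy.Codec()))

		return ctx, conn, err
	}

	// no generated stubs required: every unknown method is streamed through
	return grpc.NewServer(
		grpc.CustomCodec(proxy.Codec()),
		grpc.UnknownServiceHandler(proxy.TransparentHandler(director)),
	)
}
```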
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Push versions into constants, and introduce 'platform' in the version
API to discover the node mode. Check the kernel version when not running
in a container.
A bit of refactoring of the version package to expose something closer
to a single response.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This introduces the notion of metadata for a node. In this initial pass
there are only two fields: a timestamp to indicate when the install was
performed, and a field to indicate whether the install was performed as
part of an upgrade.
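In other words, something of this shape (field names are illustrative):

```go
package main

import "time"

// InstallMetadata records how a node was installed.
type InstallMetadata struct {
	Timestamp time.Time // when the install was performed
	Upgrade   bool      // whether the install was part of an upgrade
}
```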
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
Since bootkube should only be run once, we need a way to determine if it
has already been run. This makes use of etcd to store a key-value pair
indicating that the cluster has been initialized.
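A sketch of that check with the etcd v3 client (the key name is
illustrative, and the real implementation may differ): a transaction
ensures only the first node to create the key proceeds to run bootkube.

```go
package main

import (
	"context"

	"go.etcd.io/etcd/clientv3"
)

// markInitialized atomically records cluster initialization; it returns
// true only for the caller that created the key, i.e. the one that
// should run bootkube.
func markInitialized(ctx context.Context, cli *clientv3.Client) (bool, error) {
	const key = "talos/cluster/initialized" // illustrative key name

	resp, err := cli.Txn(ctx).
		If(clientv3.Compare(clientv3.CreateRevision(key), "=", 0)).
		Then(clientv3.OpPut(key, "true")).
		Commit()
	if err != nil {
		return false, err
	}

	return resp.Succeeded, nil
}
```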
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This PR introduces APId. This service replaces the frontend functionality
previously provided by OSD. The main driver for this is twofold:
1. Create a single-purpose application to expose the Talos API
2. Make use of code generation to DRY API changes
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
This brings in a few minor improvements to the metal platform. The first
is to use `talos.config=metal-iso` to indicate that the machine's config
can be found in an ISO image. The second is a fix to ensure that `/mnt`
exists.
This adds support for creating more than one node using the qemu-boot.sh
script.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
We have decided that proxyd is not the best architecture for HA
Kubernetes. Our recommendation to users will be to create a load
balancer instead.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
Memory usage reduced around 8-10x: now it stays stable at 1GB.
I disabled some of the new linters, and one rule which is violated a
lot.
It might make sense to go back and enable `wsl`, fixing all the issues
(leaving that for another PR).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This adds more methods to the Cluster interface that allow for more
granular control of the cluster network settings.
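Illustratively, the interface grows accessors along these lines (the
method set and types are hypothetical, not the actual additions):

```go
package main

import "net"

// Cluster is sketched with the kind of network accessors described above.
type Cluster interface {
	// ...existing methods elided...

	Network() NetworkInfo
}

// NetworkInfo carries per-cluster network settings.
type NetworkInfo struct {
	CIDR        net.IPNet
	GatewayAddr net.IP
	MTU         int
}
```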
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>