talos

mirror of https://github.com/siderolabs/talos.git synced 2025-10-07 13:41:20 +02:00

Author	SHA1	Message	Date
Andrey Smirnov	0af7624c7d	fix: resolve race condition in createNodes. Due to the race, the main goroutine might consume all the errors from `errCh` and close `nodesCh`, so a node goroutine might panic on a send to the closed channel. ``` panic: send on closed channel goroutine 40 [running]: github.com/talos-systems/talos/internal/pkg/provision/providers/firecracker.(*provisioner).createNodes.func1(0x26ab668, 0xc00025a000, 0xc0005a83c0, 0xc00029d540, 0xc000536120, 0xc000464540, 0xc000041d80, 0x18, 0xc0006d406c, 0x4, ...) /src/internal/pkg/provision/providers/firecracker/node.go:55 +0x1fa created by github.com/talos-systems/talos/internal/pkg/provision/providers/firecracker.(*provisioner).createNodes /src/internal/pkg/provision/providers/firecracker/node.go:50 +0x1ca ``` Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-04-10 14:15:41 -07:00
Andrew Rynhard	6fe5fed6f9	fix: make upgrades work with UEFI. Since the `--once` option of `extlinux` seems to work only with BIOS, we needed to remove any reliance on this option. Instead of booting the upgraded version once and then making it the default after a successful boot, we now make it the default immediately and revert on any boot error. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2020-03-26 13:34:00 -07:00
Andrew Rynhard	5dbc26c7a3	feat: rename osctl to talosctl. This renames the osctl binary; we decided that talosctl is a better name for the Talos CLI. This does not break any APIs, but it does mean that older documentation is accurate only for previous versions of Talos. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2020-03-20 19:07:39 -07:00
Andrew Rynhard	69fa63a7b2	refactor: perform upgrade upon reboot. This PR introduces a new strategy for upgrades. Instead of zapping the partition table, creating a new one, and then formatting the partitions, this change updates only the `vmlinuz` and `initramfs.xz` used to boot. It introduces an A/B-style upgrade process, which allows easy rollbacks. One deviation from our original intention for upgrades is that this change does not completely reset a node: it stops just short of that and does not reset the partition table. This forces us to keep the current partition scheme in mind as we make future changes, because an upgrade assumes a specific partition scheme. We can improve upgrades further in the future, but this will at least make them more dependable. Finally, this PR adds the ability to keep state, which enables single-node clusters to upgrade because the etcd data is kept around. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2020-03-20 17:32:18 -07:00
Andrey Smirnov	d5f80858dd	test: add 'reset' integration test for Reset() API. Every node except the init node is reset, rebooted, and comes back up again; the init node is skipped due to known issues with it bootstrapping the etcd cluster from scratch when metadata is missing (because the node was wiped). The planned workaround is to prohibit resetting the init node (should be coming next). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-03-06 23:05:46 +03:00
Andrey Smirnov	bbe2c53d29	feat: generate kubeconfig on the fly on request. This extracts admin kubeconfig generation out of bootkube; it is now based on the Talos x509 library. On each API request for `kubeconfig`, the config is generated on the fly and sent back over the wire. This fixes two issues: * any master node can now generate `kubeconfig` (worker nodes can do that too, but that should probably change in the future) * after an upgrade that wipes the disk, `osctl kubeconfig` still works Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-02-28 21:00:52 +03:00
Andrey Smirnov	d5d3035c8c	test: enable upgrade tests 0.4.x -> latest. With fix #1904, it's now possible to upgrade 0.4.x with `machine.File` extra files (introduced by the registry mirror for registry.ci.svc). Bump resources for upgrade tests in an attempt to speed them up. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-02-26 00:09:32 +03:00
Andrew Rynhard	64b5b32732	refactor: use go-procfs. This makes use of the external procfs package, which is based on the package we are removing here. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2020-02-19 15:58:57 -08:00
Andrey Smirnov	afea21bc5a	fix: stop firecracker launcher on signal. When the inner function was added, `return nil` no longer aborted the launch sequence and instead led to a VM restart. `cluster destroy` still worked fine, as it removes the state directory and the launcher exits on failure. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-02-19 18:04:48 +03:00
Andrey Smirnov	33332f4c74	chore: support bootloader emulation in firecracker provisioner. Before every boot, the Firecracker launcher tries to open the VM disk image, parses the partition table, finds the boot partition, tries to read it as a FAT32 filesystem, extracts the uncompressed kernel from `bzImage` (Firecracker doesn't support `bzImage` yet), extracts the initramfs, and passes both to the firecracker binary. This flow allows for extended tests, e.g. installer, upgrade, and downgrade tests. Bootloader emulation is disabled by default for now; it can be enabled via the `--with-bootloader-emulation` flag to `osctl cluster create`. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-02-13 23:21:37 +03:00
Andrey Smirnov	76c2038b13	chore: implement loadbalancer for firecracker provisioner. This PR contains generic, simple TCP load balancer code and glue code for the firecracker provisioner to use it. The K8s control plane is passed through the load balancer, while the Talos API is passed only to the init node (for now, as some APIs, including kubeconfig, don't work with non-init nodes). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-02-13 23:07:13 +03:00
Andrey Smirnov	fae5e6915d	chore: rework firecracker code around upstream Go SDK + PRs. This removes the use of a private fork with custom `ip=` kernel argument handling and switches fully to the upstream version. The Firecracker Go SDK version is `master` + the following PRs: * https://github.com/firecracker-microvm/firecracker-go-sdk/pull/167 * https://github.com/firecracker-microvm/firecracker-go-sdk/pull/177 * https://github.com/firecracker-microvm/firecracker-go-sdk/pull/178 MTU handling support was implemented as well. Changes: * each node's hostname is passed via the `talos.hostname=` kernel arg * IP configuration is generated by the SDK from the CNI result * fixed bugs with the wrong netmask * nameservers & MTU are passed via Talos config Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-01-29 02:35:15 +03:00
Andrey Smirnov	cdfc0b8099	chore: remove Firecracker bridge interface in osctl cluster destroy. This cleans things up so that the IP network can be re-used with another network name (and interface name). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-01-28 17:18:45 +03:00
Andrey Smirnov	9da687d2a3	test: firecracker provisioner fixes, implement cluster destroy. This implements `osctl cluster destroy` for Firecracker and adds a new utility command, `osctl cluster show`. Firecracker mode now has a control process for firecracker VMs, allowing clean reboots and background operation. Also includes lots of small fixes to Firecracker mode: clean CNI shutdown, netns cleanup, etc. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-01-21 17:11:06 -08:00
Andrey Smirnov	2bf8540855	test: provision Talos clusters via Firecracker VMs. This PR pushes the initial code; it has several known problems that are going to be addressed in follow-up PRs: 1. there's no "cluster destroy", so the only way to stop the VMs is `pkill firecracker` 2. the provisioner creates state in `/tmp` and never deletes it, which is required to keep the cluster running after `osctl cluster create` finishes 3. no controller process runs around firecracker to support reboots/CNI cleanup (vethxyz interfaces linger on the host because they're never cleaned up) The plan is to create some structure in `~/.talos` to manage cluster state, e.g. `~/.talos/clusters/<name>`, which will contain all the required files (disk images, file sockets, VM logs, etc.). This directory structure will also serve as a way to detect running clusters and clean them up. For point 3, `osctl cluster create` is going to exec a lightweight process to control the firecracker VM process and to simulate VM reboots when firecracker finishes cleanly (i.e. when the VM reboots). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2020-01-16 00:27:08 +03:00

15 Commits
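Commit fae5e6915d ("rework firecracker code around upstream Go SDK") passes each node's hostname via the `talos.hostname=` kernel argument, while `ip=` is now generated by the upstream SDK from the CNI result. The sketch below shows what such arguments look like and how a guest could read one back; it is not the SDK's implementation. `kernelArgs` and `cmdlineValue` are hypothetical, only static IPv4 is handled, and the `ip=` field order follows the Linux kernel's `client:server:gateway:netmask:hostname:device:autoconf` syntax. Deriving the dotted netmask from the CIDR prefix is one way to avoid the kind of wrong-netmask bug the commit mentions.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// kernelArgs builds a static `ip=` argument plus `talos.hostname=` for one VM.
func kernelArgs(cidr, gateway, hostname, device string) (string, error) {
	ip, ipNet, err := net.ParseCIDR(cidr)
	if err != nil {
		return "", err
	}
	if ip.To4() == nil {
		return "", fmt.Errorf("%s: only IPv4 is handled in this sketch", cidr)
	}
	// Derive the dotted-decimal mask from the prefix length, e.g. /24 -> 255.255.255.0.
	ones, bits := ipNet.Mask.Size()
	mask := net.IP(net.CIDRMask(ones, bits)).String()
	// client:server:gateway:netmask:hostname:device:autoconf (server left empty).
	ipArg := strings.Join([]string{ip.String(), "", gateway, mask, hostname, device, "off"}, ":")
	return fmt.Sprintf("ip=%s talos.hostname=%s", ipArg, hostname), nil
}

// cmdlineValue returns the value of key=value from a kernel command line,
// as a guest might read talos.hostname from /proc/cmdline.
func cmdlineValue(cmdline, key string) (string, bool) {
	prefix := key + "="
	for _, field := range strings.Fields(cmdline) {
		if strings.HasPrefix(field, prefix) {
			return strings.TrimPrefix(field, prefix), true
		}
	}
	return "", false
}
```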