talos

mirror of https://github.com/siderolabs/talos.git synced 2025-10-11 07:31:18 +02:00

Author	SHA1	Message	Date
Andrew Rynhard	85638f5d90	fix: pass x509 options to NewCertificateFromCSR This ensures that certificates are generated with the supplied options. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-11-06 19:43:56 -08:00
Andrey Smirnov	cdda81df66	test: add k8s integration tests Once again, mostly groundwork and one simple test for node versions. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-11-06 17:08:44 -08:00
Brad Beam	6519c575f8	feat: Add support for setting container output to stdout This allows the config.Debug setting to control container output to allow better troubleshooting. Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-11-06 10:15:49 -06:00
Andrey Smirnov	27235b9ae1	chore: add simple health check for etcd service Fixes #1419 This is required to avoid later startup failures while trying to connect to etcd if it hasn't actually bootstrapped. This health check does just a connectivity check, no quorum/leader checks, as they should depend on cluster state in general. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-11-06 05:50:40 -08:00
Andrey Smirnov	e2d9cc5438	fix: remove global variable in bootkube Just a small nit: as all the services share the same package, a global variable with a generic name might lead to fun collisions. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-11-06 05:50:25 -08:00
Andrew Rynhard	8ca4d49347	fix: conditionally create a new etcd cluster This fixes a long standing issue with upgrading the init node. We currently have no way of knowing whether the init node should join an existing etcd cluster, or create a new one. This makes use of the node's metadata to determine if the node has already created the etcd cluster. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-11-05 19:05:02 -08:00
Andrew Rynhard	17cce5468f	feat: add metadata file to boot partition This introduces the notion of metadata for a node. In this initial pass there are only two fields. A timestamp to indicate when the install was performed, and a field to indicate if the install was performed as part of an upgrade. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-11-05 17:59:45 -08:00
Andrey Smirnov	551fa45d33	test: add CLI integration test This starts with a very simple test for `osctl version` using regexps as output of the command depends a lot on current version. We might use more of 'gold' matches for other commands potentially. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-11-05 17:59:23 -08:00
Spencer Smith	ce7a0e36cc	chore: re-enable e2e testing This PR will re-enable e2e testing by using the new cluster api bootstrap provider and various infra providers. Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>	2019-11-05 16:53:38 -05:00
Brad Beam	988acfee51	docs: Add machine.env section Adds information about supported environment variables. Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-11-05 12:41:49 -08:00
Andrew Rynhard	7419281fa5	chore: prepare release v0.3.0-alpha.6 This is the official v0.3.0-alpha.6 release. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com> v0.3.0-alpha.6	2019-11-05 11:21:49 -08:00
Brad Beam	4b3cc34ab0	fix: Disable support for proxy variables for apid. Since APId/gRPC connections should never go through a proxy, we will explicitly exclude these environment variables from apid. Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-11-05 10:34:33 -08:00
Andrew Rynhard	06009f66c8	fix: sleep in NTP query loop We need to sleep between successful queries so we don't hit the NTP server too often. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-11-05 09:45:53 -08:00
Brad Beam	db00c83207	fix: Add host network namespace to networkd and ntpd Without host network namespace, networkd and ntpd didn't work properly. NTP failed to start up because it couldn't reach the NTP servers and networkd failed to configure the interfaces and display interface information. Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-11-05 09:45:15 -08:00
Andrey Smirnov	b0aef2cf22	test: add integration test framework This is just first steps and core foundation. It can be used like: `make integration.test`, `osctl cluster create`, `build/integration.test -test.v`. This should run the test against the Docker instance. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-11-05 17:21:38 +03:00
Andrew Rynhard	03a09c2294	refactor: rename Helper to Client The name helper isn't very good. This renames it to Client. A new func was also added, NewForConfig, that will allow for the creation of the helper client from an arbitrary Kubernetes REST config. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-11-04 19:31:27 -08:00
Andrew Rynhard	c9732458c1	fix: verify that all etcd members are running before upgrading This verifies that all etcd members are running before performing an upgrade. Without this we run the risk of destroying the etcd cluster. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-11-04 18:17:13 -08:00
Andrew Rynhard	33468f4d6a	fix: don't use 127.0.0.1 for etcd client We should use 127.0.0.1 only in special cases (like when bootstrapping the cluster). There is the potential that the local etcd member is unhealthy and/or not responsive. This adds a function for creating an etcd client configured with all control plane node IPs in order to better handle this case. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-11-04 16:54:15 -08:00
Andrew Rynhard	2febace0a4	chore: remove bind mounts from OSD Now that the APIs have been moved, we no longer need these bind mounts. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-11-04 15:27:18 -08:00
Andrew Rynhard	a82ed0c5b7	fix: add etcd member conditionally We should add an etcd member only if it has not already been added. When a control plane node is rebooted, or down for whatever reason, when it comes back up it will attempt to add itself again. When it does so, the cluster is unhealthy due to the fact that the node was down. A feature of etcd called "strict-reconfig-check" prevents any member adds when the cluster is unhealthy since doing so would cause the cluster to lose quorum. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-11-04 15:24:30 -08:00
Brad Beam	41a4741bca	refactor: Move logs to machined This moves Logs endpoint to machined to reduce the mount footprint of osd. Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-11-04 15:04:13 -08:00
Brad Beam	a4e1479b07	refactor: Move kubeconfig to machined This moves the Kubeconfig api endpoint to machined and consolidates the "read a file" code into machined. This also changes Kubeconfig to use the CopyOut method which changes Kubeconfig to a streaming grpc call. Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-11-04 14:45:23 -08:00
Brad Beam	3fd8abf426	chore: Move data messages to common proto This allows reuse across multiple APIs. Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-11-04 14:24:41 -06:00
Andrew Rynhard	18f5c50a32	fix: stop etcd and remove data-dir We need to stop etcd earlier in the upgrade sequence to prevent machined from trying to restart it after leaving the etcd cluster. We also need to remove the data-dir since all the data becomes invalid once we leave the etcd cluster. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-11-04 11:48:28 -08:00
Andrew Rynhard	8f10462795	fix: use CRI to stop containers Using the CRI seems to be more dependable in ensuring that we don't hit EBUSY when trying to reset the system disk after stopping all containers. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-11-04 04:55:37 -08:00
Andrew Rynhard	7eb5b6b748	fix: verify system disk not in use This adds an extra phase to the upgrade sequence that ensures we don't hit EBUSY when attempting to delete the ephemeral partition. This is crucial because if we fail to do so, the disk does not have a bootloader and we effectively destroy the machine. It works by attempting to open the block device with O_EXCL: if the block device is in use by the system (e.g., mounted), open() fails with the error EBUSY. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-11-04 04:46:01 -08:00
Andrew Rynhard	f43e42d845	chore: install customization requirements with ONBUILD There is no need for these packages to be in the base image. This moves to installing them using ONBUILD. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-11-03 22:51:05 -08:00
Andrew Rynhard	eb75d1fb47	refactor: use retry package in ntpd This moves to using the retry package for retrying NTP queries. It also adds some additional logging that is useful when NTP queries fail. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-11-03 19:21:39 -08:00
Andrew Rynhard	e9296bed6e	fix: retry BLKPG operations There are cases where we can see EBUSY when attempting to use the BLKPG ioctl. The recommendation seems to be to retry when this happens. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-11-03 18:20:54 -08:00
Andrew Rynhard	22f073b32e	refactor: unify service stop on upgrade This simplifies service shutdown tasks. Shutdown and upgrade events now use the same task. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-11-03 18:05:46 -08:00
Andrew Rynhard	f411491484	fix: stop leaking file descriptors This ensures that probed block devices are closed. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-11-03 17:15:54 -08:00
Andrew Rynhard	e81b3d11a8	feat: output machined logs to /dev/kmsg and file Since dmesg is not streamed, it becomes difficult to debug issues with machined. This fixes that by setting up the logging of machined to go to /dev/kmsg and to a log file. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-11-03 12:53:13 -08:00
Andrew Rynhard	3887b1e5b6	chore: force overwrite of output file This adds the force option to gzip. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-11-03 10:31:17 -08:00
Andrew Rynhard	eb0c8e9e4b	refactor: use constants.SystemContainerdNamespace This replaces hardcoded instances of "system" with constants.SystemContainerdNamespace. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-11-03 10:14:43 -08:00
Andrew Rynhard	3ce6f34995	feat: add timestamp to installed file This adds a timestamp to /boot/installed. It can be useful for determining the last known successful install. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-11-03 10:06:26 -08:00
Andrew Rynhard	45a3406fba	fix: send SIGKILL to hanging containers This addresses an issue caused by containers that refuse to exit with SIGTERM. After sending SIGTERM, we send SIGKILL after a timeout of one minute. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-11-03 09:44:49 -08:00
Andrew Rynhard	d15e226998	fix: be explicit about installs Trying to be smart about whether or not an install is being performed as part of an upgrade has proven to be error-prone. This moves to performing installs with explicit args. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-11-03 09:31:14 -08:00
Andrew Rynhard	38692847b3	refactor: pass runtime to initializer By passing the runtime to the initializer we can flex on install options better. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-11-03 09:23:37 -08:00
Andrew Rynhard	326702925a	refactor: align platform names with kernel args This aligns platform names with the talos.platform kernel arg. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-11-03 09:13:07 -08:00
Andrew Rynhard	ce911c02da	refactor: use etcd package This DRYs things up by using the etcd package for client creation. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-11-01 21:02:44 -07:00
Brad Beam	4653745acd	fix(osd): Add additional capabilities for osd This adds `CAP_DAC_READ_SEARCH`, `CAP_DAC_OVERRIDE`, and `CAP_SYSLOG` capabilities to osd. This fixes the ability to read dmesg and kubeconfig. Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-11-01 20:45:43 -07:00
Andrew Rynhard	5abbb9b041	fix: Avoid running bootkube on reboots Since bootkube should only be run once, we need a way to determine if it has already been run. This makes use of etcd to store a key-value pair indicating that the cluster has been initialized. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-11-01 15:20:43 -07:00
Tim Gerla	c3a0302f17	docs: various layout and responsiveness fixes - adjust ul margin to keep the bullets inside the content area - fix a few docs page responsiveness problems on small screens - adjust the layout of the logo relative to the docs sidebar - clean up some vestigial CSS classes Signed-off-by: Tim Gerla <tim@gerla.net>	2019-11-01 05:58:15 -07:00
Andrew Rynhard	dc3870453b	feat: create cluster with default PSP This adds a default PSP that is applied upon bootstrapping the cluster. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-10-31 22:50:35 -07:00
Andrew Rynhard	a3dc6adec1	chore: remove unused files This removes unused files in hack. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-10-31 22:46:38 -07:00
Andrew Rynhard	03a26f5836	chore: prepare release v0.3.0-alpha.5 This is the official v0.3.0-alpha.5 release. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com> v0.3.0-alpha.5	2019-10-31 15:35:41 -07:00
Andrew Rynhard	7cd9ba588c	chore: remove RAW disk We need to remove this so that it is not published in a release. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-10-31 15:35:06 -07:00
Andrew Rynhard	6764170d1a	docs: remove v0.2 docs The v0.2 docs are inaccurate, and in general just bad. Since we made so many breaking changes in v0.3 I think it's better we just hit the reset button and stick to v0.3 going forward. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-10-31 14:59:17 -07:00
Andrew Rynhard	96513ac397	docs: fix list-style-position This sets the list-style-position to inside by default, and overrides the landing page to use outside. This way we only need to maintain the CSS for the landing page and not all the other potential places we would want inside in the future. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-10-31 14:57:08 -07:00
Andrew Rynhard	2cad745292	docs: add customization guide This adds a section on customizing Talos. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-10-31 14:47:17 -07:00

1178 Commits