This PR introduces a new strategy for upgrades. Instead of attempting to
zap the partition table, create a new one, and then format the
partitions, this change will only update the `vmlinuz`, and
`initramfs.xz` being used to boot. It introduces an A/B style upgrade
process, which will allow for easy rollbacks. One deviation from our
original intention with upgrades is that this change does not completely
reset a node. It falls just short of that and does not reset the
partition table. This forces us to keep the current partition scheme in
mind as we make changes in the future, because an upgrade assumes a
specific partition scheme. We can improve upgrades further in the
future, but this will at least make them more dependable. Finally, one
more feature in this PR is the ability to keep state. This enables
single node clusters to upgrade since we keep the etcd data around.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This PR will allow users to set the `persist: true` value in their
config data to tell talos not to re-pull the config data at each reboot.
The default will still remain as a "pull every time" methodolgy in order
to encourage immutability by default.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This PR adds some simple logic to bail early in the upgrade process if
there only seems to be a single etcd node present in the cluster. This
allows us to keep from blowing up non-HA clusters if users issue an
upgrade command.
Will close#1770.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
We have been using two packages that define a config type and a machine
type, when really they are one and the same. This unifies the types down
to one set.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This is a simple refactor that reduces the number of arguments required
by `NewTemporaryClientFromPKI`.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This verifies that all etcd members are running before performing an
upgrade. Without this we run the risk of destroying the etcd cluster.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
We should use 127.0.0.1 only in special cases (like when bootstrapping
the cluster). There is the potential that the local etcd member is
unhealthy and/or not responsive. This adds function for creating an etcd
client configured with all control plane node IPs in order to better
handle this case.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
Since bootkube should only be ran once, we need a way to determine if it
has already been ran. This makes use of etcd to store a key-value pair
indicating that the cluster has been initialized.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>