19 Commits

Author SHA1 Message Date
Andrew Rynhard
4eeef28e90 feat: add etcd API
This adds RPCs for basic etcd management tasks.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-10-06 11:30:04 -07:00
Andrew Rynhard
d4f103ffcb fix: pass config via stdin
In order to perform upgrades the way we would like, it is important that
we avoid any bind mounts into containers. This change ensures that all
system services get their config via stdin.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-08-20 15:26:13 -07:00
Andrey Smirnov
bddd4f1bf6 refactor: move external API packages into machinery/
This moves `pkg/config`, `pkg/client` and `pkg/constants`
under `pkg/machinery` umbrella.

And `pkg/machinery` is published as Go module inside Talos repository.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-17 09:56:14 -07:00
Andrey Smirnov
2697b99b7d refactor: extract pkg/net as github.com/talos-systems/net
This extracts common package as new module/repository.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-14 11:04:50 -07:00
Andrey Smirnov
52c5911fcd chore: extract pkg/crypto as external module
Package `pkg/crypto` was extracted as `github.com/talos-systems/crypto`
repository and Go module.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-14 06:33:30 -07:00
Andrey Smirnov
7474b8ba52 feat: upgrade etcd to 3.4.10
This upgrades etcd to latest v3.4.x version as smooth upgrade from
version 3.3.22 in 0.6.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-13 07:33:51 -07:00
Andrey Smirnov
47608fb874 refactor: make pkg/config not rely on machined/../internal/runtime
This makes `pkg/config` directly importable from other projects.

There should be no functional changes.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-29 12:40:12 -07:00
Andrey Smirnov
ddbe9cfc2f fix: update timeouts on service startup to match boot timeout
There's a global timeout for all services to be up: it's 5 minutes. We
need to make sure each service startup takes less than that, otherwise
boot sequence is aborted and there's no way to see the error message for
each particular service.

Also propagate contexts correctly and set some default timeouts to make
sure API operations are not hanging forever.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-08 07:39:36 -07:00
Seán C McCord
8ba8742f2e fix: wrap etcd address URLs with formatting
Make certain that all etcd address URLs are properly wrapped to handle
IPv6 addresses.

Fixes #2120

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2020-05-18 10:30:16 -07:00
Andrew Rynhard
49307d554d refactor: improve machined
This is a rewrite of machined. It addresses some of the limitations and
complexity in the implementation. This introduces the idea of a
controller. A controller is responsible for managing the runtime, the
sequencer, and a new state type introduced in this PR.

A few highlights are:

- no more event bus
- functional approach to tasks (no more types defined for each task)
  - the task function definition now offers a lot more context, like
    access to raw API requests, the current sequence, a logger, the new
    state interface, and the runtime interface.
- no more panics to handle reboots
- additional initialize and reboot sequences
- graceful gRPC server shutdown on critical errors
- config is now stored at install time to avoid having to download it at
  install time and at boot time
- upgrades now use the local config instead of downloading it
- the upgrade API's preserve option takes precedence over the config's
  install force option

Additionally, this pulls various packes in under machined to make the
code easier to navigate.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-04-28 08:20:55 -07:00
Andrew Rynhard
69fa63a7b2 refactor: perform upgrade upon reboot
This PR introduces a new strategy for upgrades. Instead of attempting to
zap the partition table, create a new one, and then format the
partitions, this change will only update the `vmlinuz`, and
`initramfs.xz` being used to boot. It introduces an A/B style upgrade
process, which will allow for easy rollbacks. One deviation from our
original intention with upgrades is that this change does not completely
reset a node. It falls just short of that and does not reset the
partition table. This forces us to keep the current partition scheme in
mind as we make changes in the future, because an upgrade assumes a
specific partition scheme. We can improve upgrades further in the
future, but this will at least make them more dependable. Finally, one
more feature in this PR is the ability to keep state. This enables
single node clusters to upgrade since we keep the etcd data around.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-03-20 17:32:18 -07:00
Spencer Smith
12bfd8dd94 feat: allow for persistence of config data
This PR will allow users to set the `persist: true` value in their
config data to tell talos not to re-pull the config data at each reboot.
The default will still remain as a "pull every time" methodolgy in order
to encourage immutability by default.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-03-06 11:42:00 -05:00
Spencer Smith
7719a67834 fix: refuse to upgrade if single master
This PR adds some simple logic to bail early in the upgrade process if
there only seems to be a single etcd node present in the cluster. This
allows us to keep from blowing up non-HA clusters if users issue an
upgrade command.

Will close #1770.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-01-13 15:31:01 -05:00
Andrew Rynhard
898cf01f0a refactor: unify generate type and machine type
We have been using two packages that define a config type and a machine
type, when really they are one and the same. This unifies the types down
to one set.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-01-10 16:46:28 -08:00
Andrew Rynhard
3e5ca30aa5 refactor: simplify NewTemporaryClientFromPKI
This is a simple refactor that reduces the number of arguments required
by `NewTemporaryClientFromPKI`.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-03 09:10:24 -08:00
Andrew Rynhard
c9732458c1 fix: verify that all etcd members are running before upgrading
This verifies that all etcd members are running before performing an
upgrade. Without this we run the risk of destroying the etcd cluster.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-11-04 18:17:13 -08:00
Andrew Rynhard
33468f4d6a fix: don't use 127.0.0.1 for etcd client
We should use 127.0.0.1 only in special cases (like when bootstrapping
the cluster). There is the potential that the local etcd member is
unhealthy and/or not responsive. This adds function for creating an etcd
client configured with all control plane node IPs in order to better
handle this case.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-11-04 16:54:15 -08:00
Andrew Rynhard
ce911c02da refactor: use etcd package
This DRYs things up by using the etcd package for client creation.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-11-01 21:02:44 -07:00
Andrew Rynhard
5abbb9b041 fix: Avoid running bootkube on reboots
Since bootkube should only be ran once, we need a way to determine if it
has already been ran. This makes use of etcd to store a key-value pair
indicating that the cluster has been initialized.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-11-01 15:20:43 -07:00