Commit Graph

29 Commits

Author SHA1 Message Date
Andrey Smirnov
d7f5de62c3 feat: colorize output of cluster health checks
It only gets enabled if output is a terminal. Failures which resolve
themselves are removed from the final output. Small spinner to indicate
progress.

While I was at it, I fixed client-side `talosctl health` when init node
is missing.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-06 07:59:30 -07:00
Andrey Smirnov
16eb47a1a3 feat: use kubeconfig merge in talosctl kubeconfig by default
Kubeconfig merge was completely rewritten to be "smarter":

* automatically apply renames done at previous stages to avoid asking
over and over again (in general should ask just once)

* skip checks if parts of the config match exactly

* allow overwrite as an option

* flexible way to control the output

* activating context in the end

* custom merged context name

Fixes #2578

Fixes #2587

Fixes #2577

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-03 05:36:15 -07:00
Andrey Smirnov
b9bfe00b88 feat: support custom filename for talosctl kubeconfig
This also refactors much of the CLI code for the `talosctl kubeconfig`:

1. Do all the checks before fetching kubeconfig from the server: as
kubeconfig generation takes a few seconds, it doesn't make sense to
generate it if it's not going to be used.

2. Unify most of merge & write directly features.

3. Don't use ExtractTarGz method to be more flexible.

4. Allow custom paths for kubeconfig, whether it is a directory or full
path to the file to be created.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-30 12:05:50 -07:00
Andrew Rynhard
c693e556d2 feat: add images command
This adds a command that lists all of the images used by Talos. This is
useful in the case of airgap installs, so that users will know which images
to pull.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-09-18 12:55:08 -07:00
Andrey Smirnov
bddd4f1bf6 refactor: move external API packages into machinery/
This moves `pkg/config`, `pkg/client` and `pkg/constants`
under `pkg/machinery` umbrella.

And `pkg/machinery` is published as Go module inside Talos repository.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-17 09:56:14 -07:00
Andrey Smirnov
47608fb874 refactor: make pkg/config not rely on machined/../internal/runtime
This makes `pkg/config` directly importable from other projects.

There should be no functional changes.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-29 12:40:12 -07:00
Andrey Smirnov
3d8418a689 feat: force nodes to be set in talosctl commands using the API
With load-balancing enabled by default running `talosctl` without
`--nodes` is risky, as it might hit any control plane by default without
`--nodes`.

Only two commands do not enforce this check, as they do their own node
contexts: `crashdump` and `health` (client-side).

Integration tests were updated to always supply `--nodes` cli argument,
while doing that I refactored the storage for discovered nodes to use
existing `cluster.Info` interface.

The downside is that with e2e CAPI tests CLI tests will be mostly
skipped as we don't support discovery in CLI tests at the momemnt. This
can be fixed by using `talosctl kubeconfig` + `kubectl get nodes` for
node discovery.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-21 12:17:43 -07:00
Andrey Smirnov
1a0e1bc393 chore: update module dependencies
Fixes #2316

Simply update dependencies we don't track on version level to be
compatible with Talos components (like etcd or k8s).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-16 12:00:50 -07:00
Andrey Smirnov
c54639e541 feat: implement server-side API for cluster health checks
This implements existing server-side health checks as defined in
`internal/pkg/cluster/checks` in Talos API.

Summary of changes:

* new `cluster` API

* `apid` now listens without auth on local file socket

* `cluster` API is for now implemented in `machined`, but we can move it
to the new service if we find it more appropriate

* `talosctl health` by default now does server-side health check

UX: `talosctl health` without arguments does health check for the
cluster if it has healthy K8s to return master/worker nodes. If needed,
node list can be overridden with flags.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-15 13:52:13 -07:00
Andrey Smirnov
cbb7ca8390 refactor: merge osd into machined
This merges `osd` API into `machined`. API was copied from `osd` into
`machined`, and `osd` API was deprecated.

For backwards compatibility, `machined` still implements `osd` API, so
older Talos API clients can still talk to the node without changes.

Docs were updated. No functional changes.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-13 12:50:00 -07:00
Artem Chernyshev
8fc352ec4f feat: merge mode in talosctl kubeconfig
New flag `-m` will enable merge mechanism in `talosctl kubeconfig`

Command examples:

```
talosctl kubeconfig -m

talosctl kubeconfig -m ~/.kube/config
```

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-07-10 12:39:30 -07:00
Andrey Smirnov
97d18b1c43 test: fix cli tests after load-balancing got enabled
There were three problems:

* cli tests did commands in sequence assuming they all hit the same
node, but with load-balancing it's no longer true

* restart test was affected, as it hit different node for check after
restart, and it succeeded immediately, while on original node process
was still starting which resulted in failure in the next tests; replace
the check to make sure service is up and healthy, so that test leaves
cluster in a good state

* restart API response had wrong format (no message returned) which
resulted in failures with apid proxy (when used with `-n`)

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-09 14:06:30 -07:00
Andrey Smirnov
a6b3bd2ff6 feat: implement service events
This implements service events, adds test for events API based on
service events as they're the easiest to generate on demand.

Disabled validate test for 'metal' as it validates disk device against
local system which doesn't make much sense.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-03 13:52:53 -07:00
Andrey Smirnov
81d1c2bfe7 chore: enable godot linter
Issues were fixed automatically.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-06-30 10:39:56 -07:00
Andrey Smirnov
0a4645fe80 feat: implement circular buffer for system logs
This replaces logging to files with inotify following to pure in-memory
circular buffer which grows on demand capped at specified maximum
capacity.

The concern with previous approach was that logs on tmpfs were growing
without any bound potentially consuming all the node memory.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-06-26 15:33:54 -07:00
Andrew Rynhard
d0d2ac3c74 test: default to using the bootstrap API
This moves our test scripts to using the bootstrap API. Some
automation around invoking the bootstrap API was also added
to give the same ease of use when creating clusters with the
CLI.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-06-24 08:46:10 -07:00
Andrew Rynhard
49307d554d refactor: improve machined
This is a rewrite of machined. It addresses some of the limitations and
complexity in the implementation. This introduces the idea of a
controller. A controller is responsible for managing the runtime, the
sequencer, and a new state type introduced in this PR.

A few highlights are:

- no more event bus
- functional approach to tasks (no more types defined for each task)
  - the task function definition now offers a lot more context, like
    access to raw API requests, the current sequence, a logger, the new
    state interface, and the runtime interface.
- no more panics to handle reboots
- additional initialize and reboot sequences
- graceful gRPC server shutdown on critical errors
- config is now stored at install time to avoid having to download it at
  install time and at boot time
- upgrades now use the local config instead of downloading it
- the upgrade API's preserve option takes precedence over the config's
  install force option

Additionally, this pulls various packes in under machined to make the
code easier to navigate.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-04-28 08:20:55 -07:00
Andrey Smirnov
55dcbbc8d0 feat: add commands talosctl health/crashdump
This extracts health & crashdump features which were specific to
provisioning code into separate package which can be used standalone.

Everything else is just new glue.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-04-27 20:43:10 -07:00
Andrey Smirnov
b94be4f6a1 test: mark long tests as !short
This skips long-running integration tests if `-test.short` mode is
enabled.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-03-27 22:34:26 +03:00
Andrew Rynhard
5dbc26c7a3 feat: rename osctl to talosctl
This is a rename of the osctl binary. We decided that talosctl is a
better name for the Talos CLI. This does not break any APIs, but does
make older documentation only accurate for previous versions of Talos.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-03-20 19:07:39 -07:00
Andrey Smirnov
0babc39653 feat: split osctl commands into Talos API and cluster management
This keeps backwards compatibility with `osctl` CLI binary with the
exception of `osctl config generate` which was renamed to `osctl
gen config` to avoid confusion with other `osctl config`
commands which operate on client config, not Talos server config.

Command implementation and helpers were split into subpackages for
cleaner code and more visible boundaries. The resulting binary still
combines commands from both sections into a single binary.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-03-20 22:45:04 +03:00
Andrey Smirnov
0afd0f651b chore: provide provisioned cluster info to integration test
Integration test can optionally consume cluster state as generated by
the call to `osctl cluster create` and use it to discover nodes in
integration tests.

This means that now CLI tests can use that as discovery source, and
API/K8s tests by default as well.

Flat list of nodes is to be replaced by something more complex in the
next iteration, but it's good for this PR.

As a demo, add CLI test with multiple nodes (dmesg).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-01-31 18:21:30 +03:00
Andrey Smirnov
3a021e4579 test: add integration tests for (most) CLI commands
I added tests for all the commands which work reliably in container mode.

Some tests are naive, some are more sophisticated. While going
through the tests, I think I found a small bug in `osctl gen keypair`.

When we get reliable KVM tests, I can revisit and add missing
tests for time, reboot, shutdown and friends.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-20 23:33:35 +03:00
Andrey Smirnov
f3dff87957 fix: fail on muliple nodes for commands which don't support it
Fixes #1663

(I believe it's 0.3 backport strong candidate).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-18 18:51:40 +03:00
Andrey Smirnov
6e05dd70c4 feat: add support for tailing logs
Fixes #1564

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-17 22:35:47 +03:00
Andrey Smirnov
1fbf40796f feat: implement streaming mode of dmesg, parse messages
Fixes #1563

This implements dmesg reading via `/dev/kmsg`, with message parsing and
formatting. Kernel log facility and severity are parsed, timestamp is
calculated relative to boot time (it's accurate unless time jumps a
lot during node lifetime).

New flags to follow dmesg was added, tail flag allows to stream only new
message (ignoring old messages). We could try to implement tailing last
N messages, just a bit more work, open to suggestions (for symmetry with
regular logs).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-16 17:40:15 +03:00
Andrey Smirnov
edb40437ec feat: add support for osctl logs -f
Now default is not to follow the logs (which is similar to `kubectl logs`).

Integration test was added for `Logs()` API and `osctl logs` command.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-05 13:58:52 -08:00
Andrey Smirnov
96a7289f06 test: fix integration version test as 'NODE:' might be missing
When invoked without `-t`, `osctl` shouldn't print `NODE:` anymore.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-03 07:45:41 -08:00
Andrey Smirnov
551fa45d33 test: add CLI integration test
This starts with a very simple test for `osctl version` using regexps as
output of the command depends a lot on current version.

We might use more of 'gold' matches for other commands potentially.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-11-05 17:59:23 -08:00