Commit Graph

47 Commits

Author SHA1 Message Date
Andrey Smirnov
b0209fd29d refactor: move networkd, timed APIs to machined, remove routerd
This moves the implementation of the user-facing APIs to machined. As all
the APIs are now implemented by machined, routerd is removed and apid is
adjusted to proxy to machined.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-24 00:00:28 -07:00
Artem Chernyshev
6ffabe5169 feat: add ability to find disk by disk properties
Fixes: https://github.com/talos-systems/talos/issues/3323

This doesn't exactly match the udevd-generated `by-<id>` symlinks, but it
should provide enough property selectors to pick a specific disk of any
kind: SD card, HDD, SSD or NVMe.
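
As an illustration of property-based selection, here is a minimal sketch that assumes a reduced set of properties: model glob, serial, minimum size and type. `Disk`, `Selector` and `FindDisk` are hypothetical names. They are not the Talos machine config schema or its code.

```go
package main

import (
	"fmt"
	"path"
)

// Disk is a hypothetical, reduced view of the properties a disk exposes.
type Disk struct {
	DevPath string
	Model   string
	Serial  string
	Size    uint64 // bytes
	Type    string // "sd", "hdd", "ssd" or "nvme"
}

// Selector is a hypothetical disk selector; empty/zero fields match anything.
type Selector struct {
	Model   string // glob pattern matched with path.Match, e.g. "Samsung*"
	Serial  string
	MinSize uint64 // bytes
	Type    string
}

// Matches reports whether d satisfies every non-empty field of s.
func (s Selector) Matches(d Disk) bool {
	if s.Model != "" {
		if ok, err := path.Match(s.Model, d.Model); err != nil || !ok {
			return false
		}
	}
	if s.Serial != "" && s.Serial != d.Serial {
		return false
	}
	if s.MinSize > 0 && d.Size < s.MinSize {
		return false
	}
	if s.Type != "" && s.Type != d.Type {
		return false
	}
	return true
}

// FindDisk returns the first disk matching s.
func FindDisk(disks []Disk, s Selector) (Disk, bool) {
	for _, d := range disks {
		if s.Matches(d) {
			return d, true
		}
	}
	return Disk{}, false
}

func main() {
	disks := []Disk{
		{DevPath: "/dev/mmcblk0", Model: "SD32G", Size: 32_000_000_000, Type: "sd"},
		{DevPath: "/dev/nvme0n1", Model: "Samsung SSD 970", Size: 500_000_000_000, Type: "nvme"},
	}
	if d, ok := FindDisk(disks, Selector{Model: "Samsung*", Type: "nvme"}); ok {
		fmt.Println(d.DevPath) // /dev/nvme0n1
	}
}
```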

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-03-23 14:23:02 -07:00
Andrey Smirnov
ac8764702f refactor: move apid, routerd, timed and trustd to single executable
This removes the container images for the aforementioned services; they
are now built into the `machined` executable, which launches one service
or another based on `argv[0]`.
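
Here is a minimal sketch of dispatching on `argv[0]`. It assumes the shared binary appears in each container's rootfs under the service's name. `entrypoints` and `serviceFor` are hypothetical, and the placeholder entrypoints only print their own name.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// entrypoints maps a service name to its (placeholder) main function.
var entrypoints = map[string]func() error{
	"apid":    func() error { fmt.Println("apid"); return nil },
	"routerd": func() error { fmt.Println("routerd"); return nil },
	"timed":   func() error { fmt.Println("timed"); return nil },
	"trustd":  func() error { fmt.Println("trustd"); return nil },
}

// serviceFor picks the service from the base name of argv[0];
// unknown names fall back to "machined".
func serviceFor(argv0 string) string {
	name := filepath.Base(argv0)
	if _, ok := entrypoints[name]; ok {
		return name
	}
	return "machined"
}

func main() {
	run, ok := entrypoints[serviceFor(os.Args[0])]
	if !ok {
		fmt.Println("running as machined")
		return
	}
	if err := run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

For example, when the binary is started as `/apid`, `filepath.Base` yields `apid`. Any unknown name falls back to running as machined.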

Containers are started with a rootfs directory which contains only the
single executable file for the service.

This creates a rootfs on squashfs for each container in
`/opt/<container>`.

Service `networkd` is not touched as it's handled in #3350.

This removes all the image imports, snapshots and other things which
were associated with the existing way to run containers.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-23 09:48:11 -07:00
Alexey Palazhchenko
df52c13581 chore: fix //nolint directives
That's the recommended syntax:
https://golangci-lint.run/usage/false-positives/

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-03-05 05:58:33 -08:00
Andrey Smirnov
60aa011c7a feat: rename namespaces, resources, types etc
See https://github.com/talos-systems/os-runtime/pull/12 for the new naming
conventions.

No functional changes.

Additionally, this implements printing extra columns in `talosctl get xyz`.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-02 13:34:15 -08:00
Andrey Smirnov
c7ee239087 fix: show stopped/exited containers via CRI inspector
This fixes the output of `talosctl containers` to show failed/exited
containers, so that it's possible to see e.g. the `kube-apiserver`
container when it fails to start. This also enables using the ID from the
container list to see the logs of failing containers, which makes it easy
to debug issues when control plane pods don't start because of a wrong
configuration.

This also removes the option to choose between the CRI and containerd
inspectors: containerd is used by default for the system namespace, and
CRI for the kubernetes namespace.

The only side effect is that we can't see the `kubelet` container in the
output of `talosctl containers -k`, but `kubelet` itself is available in
`talosctl services` and `talosctl logs kubelet`.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-26 14:45:13 -08:00
Artem Chernyshev
041620c852 feat: implement talosctl edit and patch config commands
Fixes: https://github.com/talos-systems/talos/issues/3209

This uses parts of the `kubectl` package to run the editor, and follows
the same approach as the `kubectl edit` command:
- add a commented section with the description at the top of the file.
- if the config has errors, display the validation errors in the commented
section at the top of the file.
- retry applying the config until it succeeds.
- abort if no changes were detected or if the edited file is empty.
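
The abort conditions in the list above could look roughly like this. The sketch assumes the header consists of `#` comment lines. It would also drop legitimate YAML comments when comparing. `decide` and `stripComments` are hypothetical names, not the kubectl or talosctl code.

```go
package main

import (
	"fmt"
	"strings"
)

type outcome int

const (
	abortEmpty     outcome = iota // nothing left after removing comments
	abortUnchanged                // only comment lines differ
	applyChanges
)

// stripComments drops lines starting with "#" (e.g. the added header)
// and trims surrounding whitespace.
func stripComments(s string) string {
	var kept []string
	for _, line := range strings.Split(s, "\n") {
		if strings.HasPrefix(strings.TrimSpace(line), "#") {
			continue
		}
		kept = append(kept, line)
	}
	return strings.TrimSpace(strings.Join(kept, "\n"))
}

// decide compares the original and edited documents.
func decide(original, edited string) outcome {
	e := stripComments(edited)
	switch {
	case e == "":
		return abortEmpty
	case e == stripComments(original):
		return abortUnchanged
	default:
		return applyChanges
	}
}

func main() {
	fmt.Println(decide("a: 1\n", "# edit below\na: 2\n") == applyChanges) // true
}
```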

Patch currently supports only JSON Patch, which can be read either from a
file or from an inline argument.

https://asciinema.org/a/wPawpctjoCFbJZKo2z2ATDXeC

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-02-26 02:00:20 +03:00
Andrey Smirnov
254e0e91e1 fix: correctly unwrap responses for etcd commands
This uses wrappers which help to unwrap errors from proxied apid
responses.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-17 11:33:54 -08:00
Andrey Smirnov
7f3dca8e4c test: add support for IPv6 in talosctl cluster create
This modifies the provision library to support multiple IPs, CIDRs and
gateways, which can be IPv4 or IPv6. Based on the IP types, DHCPv4/DHCPv6
services are enabled for the cluster in the test environment.

There's an outstanding bug with routes not being properly set up in the
cluster, so IPs are not properly routable, but DHCPv6 works and IPs are
allocated (which validates the DHCPv6 client).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-09 13:28:53 -08:00
Artem Chernyshev
d515613bb7 fix: list command unlimited recursion default behavior
Revert to the old behavior.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-01-15 05:06:41 -08:00
Artem Chernyshev
a83e8758db feat: add commands to manage/query etcd cluster
This uses the already existing protobufs.

Commands:
`talosctl etcd members -n <node>`
`talosctl etcd leave -n <node>`
`talosctl etcd forfeit-leadership -n <node>`

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-12-22 11:49:10 -08:00
Artem Chernyshev
68dd5b9add feat: add talosctl merge config command
Allows merging two Talos configs into one. The config is merged into the
file set by `TALOSCONFIG`, or into `~/.talos/config` by default.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-12-09 13:07:45 -08:00
Artem Chernyshev
d7ce831465 feat: add talosctl config contexts
A bonus to `talosctl config merge`.
I got the idea after using talosctl for a weekend.
I feel it would be a good addition to have a command that lists existing
contexts in a table view, similar to what `kubectl config get-contexts`
does, to avoid going through the file which has all the certs and such.

I called it just `contexts` to align with what we have now (to switch
context you need to use `talosctl config context`).

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-12-09 12:19:10 -08:00
Andrey Smirnov
e4ebc4ab95 feat: suggest fixed control plane endpoints in talosctl gen config
Ex.:

```
$ talosctl gen config foo 192.168.0.1
no scheme and port specified for the cluster endpoint URL
try: "https://192.168.0.1:6443"
```
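
Here is a sketch of how such a suggestion could be derived. It assumes the default port 6443 and the `https` scheme, both taken from the example output. `suggestEndpoint` is a hypothetical helper, not the actual talosctl validation code.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// suggestEndpoint returns a corrected endpoint URL for a bare host or
// host:port, or "" if the input already carries a scheme.
func suggestEndpoint(endpoint string) string {
	if strings.Contains(endpoint, "://") {
		return ""
	}
	hostPort := endpoint
	if _, _, err := net.SplitHostPort(endpoint); err != nil {
		// No port given: assume the default Kubernetes API server port.
		// JoinHostPort adds brackets for IPv6 literals.
		hostPort = net.JoinHostPort(strings.Trim(endpoint, "[]"), "6443")
	}
	return "https://" + hostPort
}

func main() {
	if s := suggestEndpoint("192.168.0.1"); s != "" {
		fmt.Printf("try: %q\n", s) // try: "https://192.168.0.1:6443"
	}
}
```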

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-02 13:16:30 -08:00
Andrey Smirnov
8560fb9662 chore: enable nlreturn linter
Most of the fixes were automatically applied.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-09 06:48:07 -08:00
Artem Chernyshev
061b296530 feat: allow specifying user-disks in talosctl cluster create
User disks are supported by the QEMU and Firecracker providers.
They can be defined using the following parameter:
```
--user-disk /mount/path:1GB
```

More than one user disk can be specified.
The same set of user disks will be created for all master and worker nodes.

Additionally, enable user disks in the QEMU e2e test.
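
Here is a minimal sketch of parsing a `--user-disk` value of the form `<mount path>:<size>`. The accepted units are an assumption: MB and GB, treated here as 1024-based. `parseUserDisk` is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// UserDisk is a hypothetical parsed form of a --user-disk value.
type UserDisk struct {
	MountPath string
	SizeBytes uint64
}

// sizeUnits assumes binary (1024-based) units; the real flag may differ.
var sizeUnits = map[string]uint64{"MB": 1 << 20, "GB": 1 << 30}

// parseUserDisk splits "<mount path>:<size>" at the last colon.
func parseUserDisk(s string) (UserDisk, error) {
	i := strings.LastIndex(s, ":")
	if i <= 0 || i == len(s)-1 {
		return UserDisk{}, fmt.Errorf("invalid user disk %q: expected <path>:<size>", s)
	}
	mountPath, size := s[:i], s[i+1:]
	for suffix, mult := range sizeUnits {
		if strings.HasSuffix(size, suffix) {
			n, err := strconv.ParseUint(strings.TrimSuffix(size, suffix), 10, 64)
			if err != nil {
				return UserDisk{}, fmt.Errorf("invalid size %q: %w", size, err)
			}
			return UserDisk{MountPath: mountPath, SizeBytes: n * mult}, nil
		}
	}
	return UserDisk{}, fmt.Errorf("invalid size %q: expected MB or GB suffix", size)
}

func main() {
	d, err := parseUserDisk("/var/lib/extra:1GB")
	fmt.Println(d, err) // {/var/lib/extra 1073741824} <nil>
}
```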

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-10-30 08:44:08 -07:00
Andrey Smirnov
773912833e test: clean up integration test code, fix flakes
This enables golangci-lint via build tags for integration tests (this
should have been done long ago!), and fixes the linting errors.

Two tests were updated to reduce flakiness:

* apply config: wait for nodes to issue "boot done" sequence event
before proceeding
* recover: kill pods even if they appear after the initial set gets
killed (potential race condition with previous test).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-19 15:44:14 -07:00
Artem Chernyshev
e7e99cf1b3 feat: support disk usage command in talosctl
Usage example:

```bash
talosctl du --nodes 10.5.0.2 /var -H -d 2
NODE       NAME
10.5.0.2   8.4 kB   etc
10.5.0.2   1.3 GB   lib
10.5.0.2   16 MB    log
10.5.0.2   25 kB    run
10.5.0.2   4.1 kB   tmp
10.5.0.2   1.3 GB   .
```

Supported flags:
- `-a` writes counts for all files, not just directories.
- `-d` sets the recursion depth.
- `-H` humanizes size outputs.
- `-t` sets the size threshold (skip files if < size or > size).
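
Here is a sketch of the size formatting and threshold filtering. The example output looks like SI (1000-based) units, so `humanize` uses those, though its rounding may differ from talosctl's. `keep` assumes GNU du-style semantics: a positive threshold skips smaller entries and a negative one skips larger entries. Both helpers are hypothetical.

```go
package main

import "fmt"

// humanize formats a byte count with SI (1000-based) units,
// e.g. 8400 -> "8.4 kB", 16000000 -> "16 MB".
func humanize(n int64) string {
	const unit = 1000
	if n < unit {
		return fmt.Sprintf("%d B", n)
	}
	div, exp := int64(unit), 0
	for m := n / unit; m >= unit; m /= unit {
		div *= unit
		exp++
	}
	val := float64(n) / float64(div)
	format := "%.0f %cB"
	if val < 10 {
		format = "%.1f %cB"
	}
	return fmt.Sprintf(format, val, "kMGTPE"[exp])
}

// keep applies a du-style threshold (assumed GNU du semantics):
// threshold > 0 skips entries smaller than it, threshold < 0 skips
// entries larger than -threshold, and 0 keeps everything.
func keep(size, threshold int64) bool {
	switch {
	case threshold > 0:
		return size >= threshold
	case threshold < 0:
		return size <= -threshold
	default:
		return true
	}
}

func main() {
	for _, size := range []int64{8400, 1_300_000_000, 16_000_000, 512} {
		if keep(size, 1000) {
			fmt.Println(humanize(size))
		}
	}
}
```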

Fixes: https://github.com/talos-systems/talos/issues/2504

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-10-13 09:30:31 -07:00
Andrey Smirnov
d7f5de62c3 feat: colorize output of cluster health checks
Colors are only enabled if the output is a terminal. Failures which
resolve themselves are removed from the final output. A small spinner
indicates progress.

While I was at it, I fixed client-side `talosctl health` when the init
node is missing.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-06 07:59:30 -07:00
Andrey Smirnov
16eb47a1a3 feat: use kubeconfig merge in talosctl kubeconfig by default
Kubeconfig merge was completely rewritten to be "smarter":

* automatically apply renames done at previous stages to avoid asking
over and over again (in general should ask just once)

* skip checks if parts of the config match exactly

* allow overwrite as an option

* flexible way to control the output

* activate the context at the end

* custom merged context name

Fixes #2578

Fixes #2587

Fixes #2577

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-03 05:36:15 -07:00
Andrey Smirnov
b9bfe00b88 feat: support custom filename for talosctl kubeconfig
This also refactors much of the CLI code for the `talosctl kubeconfig`:

1. Do all the checks before fetching kubeconfig from the server: as
kubeconfig generation takes a few seconds, it doesn't make sense to
generate it if it's not going to be used.

2. Unify most of the merge and direct-write code paths.

3. Don't use the `ExtractTarGz` method, to be more flexible.

4. Allow custom paths for the kubeconfig, either a directory or the full
path to the file to be created.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-30 12:05:50 -07:00
Andrew Rynhard
c693e556d2 feat: add images command
This adds a command that lists all of the images used by Talos. This is
useful in the case of airgap installs, so that users will know which images
to pull.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-09-18 12:55:08 -07:00
Andrey Smirnov
bddd4f1bf6 refactor: move external API packages into machinery/
This moves `pkg/config`, `pkg/client` and `pkg/constants`
under `pkg/machinery` umbrella.

`pkg/machinery` is published as a Go module inside the Talos repository.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-17 09:56:14 -07:00
Andrey Smirnov
47608fb874 refactor: make pkg/config not rely on machined/../internal/runtime
This makes `pkg/config` directly importable from other projects.

There should be no functional changes.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-29 12:40:12 -07:00
Andrey Smirnov
3d8418a689 feat: force nodes to be set in talosctl commands using the API
With load-balancing enabled by default, running `talosctl` without
`--nodes` is risky, as the request might hit any control plane node.

Only two commands do not enforce this check, as they manage their own
node contexts: `crashdump` and `health` (client-side).

Integration tests were updated to always supply the `--nodes` CLI
argument; while doing that, I refactored the storage for discovered nodes
to use the existing `cluster.Info` interface.

The downside is that in e2e CAPI tests, CLI tests will be mostly skipped,
as we don't support discovery in CLI tests at the moment. This can be
fixed by using `talosctl kubeconfig` + `kubectl get nodes` for node
discovery.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-21 12:17:43 -07:00
Andrey Smirnov
1a0e1bc393 chore: update module dependencies
Fixes #2316

This simply updates the dependencies we don't track at the version level
to be compatible with Talos components (like etcd or k8s).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-16 12:00:50 -07:00
Andrey Smirnov
c54639e541 feat: implement server-side API for cluster health checks
This implements the existing health checks, as defined in
`internal/pkg/cluster/checks`, as a server-side Talos API.

Summary of changes:

* new `cluster` API

* `apid` now listens without auth on a local file socket

* `cluster` API is implemented in `machined` for now, but we can move it
to a new service if we find it more appropriate

* `talosctl health` by default now does server-side health check

UX: without arguments, `talosctl health` checks the health of the cluster,
provided K8s is healthy enough to return the master/worker nodes. If
needed, the node list can be overridden with flags.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-15 13:52:13 -07:00
Andrey Smirnov
cbb7ca8390 refactor: merge osd into machined
This merges the `osd` API into `machined`: the API was copied from `osd`
into `machined`, and the `osd` API was deprecated.

For backwards compatibility, `machined` still implements `osd` API, so
older Talos API clients can still talk to the node without changes.

Docs were updated. No functional changes.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-13 12:50:00 -07:00
Artem Chernyshev
8fc352ec4f feat: merge mode in talosctl kubeconfig
A new flag, `-m`, enables the merge mechanism in `talosctl kubeconfig`.

Command examples:

```
talosctl kubeconfig -m

talosctl kubeconfig -m ~/.kube/config
```

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-07-10 12:39:30 -07:00
Andrey Smirnov
97d18b1c43 test: fix cli tests after load-balancing got enabled
There were three problems:

* CLI tests ran commands in sequence assuming they all hit the same
node, but with load-balancing this is no longer true

* the restart test was affected, as the check after the restart hit a
different node and succeeded immediately, while on the original node the
process was still starting, which caused failures in the next tests; the
check was replaced to make sure the service is up and healthy, so that
the test leaves the cluster in a good state

* the restart API response had the wrong format (no message returned),
which resulted in failures with the apid proxy (when used with `-n`)

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-09 14:06:30 -07:00
Andrey Smirnov
a6b3bd2ff6 feat: implement service events
This implements service events and adds a test for the events API based
on service events, as they're the easiest to generate on demand.

The validate test for 'metal' was disabled, as it validates the disk
device against the local system, which doesn't make much sense.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-03 13:52:53 -07:00
Andrey Smirnov
81d1c2bfe7 chore: enable godot linter
Issues were fixed automatically.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-06-30 10:39:56 -07:00
Andrey Smirnov
0a4645fe80 feat: implement circular buffer for system logs
This replaces logging to files (followed via inotify) with a pure
in-memory circular buffer which grows on demand, capped at a specified
maximum capacity.

The concern with the previous approach was that logs on tmpfs were
growing without any bound, potentially consuming all of the node memory.
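
Here is a minimal sketch of a buffer that grows on demand and then wraps. It assumes doubling growth and a single writer. It omits locking and streaming readers, and it is not the Talos implementation.

```go
package main

import "fmt"

// Buffer is an in-memory circular byte buffer that starts small, doubles its
// storage on demand, and once it reaches maxCap overwrites the oldest data.
type Buffer struct {
	buf     []byte // storage; len(buf) is the current capacity
	maxCap  int
	written int64 // total bytes ever written
}

// NewBuffer returns a buffer with the given initial and maximum capacity.
func NewBuffer(initial, maxCap int) *Buffer {
	if maxCap < 1 {
		maxCap = 1
	}
	if initial < 1 {
		initial = 1
	}
	if initial > maxCap {
		initial = maxCap
	}
	return &Buffer{buf: make([]byte, initial), maxCap: maxCap}
}

// grow doubles the storage until need bytes fit or maxCap is reached.
// Growth only happens before the first wrap-around, so data is contiguous.
func (b *Buffer) grow(need int64) {
	for need > int64(len(b.buf)) && len(b.buf) < b.maxCap {
		newCap := 2 * len(b.buf)
		if newCap > b.maxCap {
			newCap = b.maxCap
		}
		nb := make([]byte, newCap)
		copy(nb, b.buf[:int(b.written)])
		b.buf = nb
	}
}

// Write appends p, overwriting the oldest bytes once maxCap is reached.
func (b *Buffer) Write(p []byte) (int, error) {
	n := len(p)
	b.grow(b.written + int64(n))
	// If p alone exceeds the storage, only its tail can be retained.
	if len(p) > len(b.buf) {
		skip := len(p) - len(b.buf)
		b.written += int64(skip)
		p = p[skip:]
	}
	for len(p) > 0 {
		pos := int(b.written % int64(len(b.buf)))
		c := copy(b.buf[pos:], p)
		p = p[c:]
		b.written += int64(c)
	}
	return n, nil
}

// Bytes returns a copy of the retained data, oldest first.
func (b *Buffer) Bytes() []byte {
	if b.written <= int64(len(b.buf)) {
		return append([]byte(nil), b.buf[:int(b.written)]...)
	}
	pos := int(b.written % int64(len(b.buf)))
	out := make([]byte, 0, len(b.buf))
	out = append(out, b.buf[pos:]...)
	return append(out, b.buf[:pos]...)
}

func main() {
	b := NewBuffer(2, 8)
	for _, s := range []string{"abc", "defgh", "ij"} {
		b.Write([]byte(s))
	}
	fmt.Printf("%q (capacity %d)\n", b.Bytes(), len(b.buf)) // "cdefghij" (capacity 8)
}
```

Because storage only grows while the data is still contiguous, resizing is a plain copy. Once the cap is reached, memory use stays bounded no matter how much is logged.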

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-06-26 15:33:54 -07:00
Andrew Rynhard
d0d2ac3c74 test: default to using the bootstrap API
This moves our test scripts to using the bootstrap API. Some
automation around invoking the bootstrap API was also added
to give the same ease of use when creating clusters with the
CLI.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-06-24 08:46:10 -07:00
Andrew Rynhard
49307d554d refactor: improve machined
This is a rewrite of machined. It addresses some of the limitations and
complexity in the implementation. This introduces the idea of a
controller. A controller is responsible for managing the runtime, the
sequencer, and a new state type introduced in this PR.

A few highlights are:

- no more event bus
- functional approach to tasks (no more types defined for each task)
  - the task function definition now offers a lot more context, like
    access to raw API requests, the current sequence, a logger, the new
    state interface, and the runtime interface.
- no more panics to handle reboots
- additional initialize and reboot sequences
- graceful gRPC server shutdown on critical errors
- config is now stored at install time to avoid having to download it at
  install time and at boot time
- upgrades now use the local config instead of downloading it
- the upgrade API's preserve option takes precedence over the config's
  install force option

Additionally, this pulls various packages in under machined to make the
code easier to navigate.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-04-28 08:20:55 -07:00
Andrey Smirnov
55dcbbc8d0 feat: add commands talosctl health/crashdump
This extracts the health & crashdump features, which were specific to the
provisioning code, into a separate package which can be used standalone.

Everything else is just new glue.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-04-27 20:43:10 -07:00
Andrey Smirnov
b94be4f6a1 test: mark long tests as !short
This skips long-running integration tests if `-test.short` mode is
enabled.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-03-27 22:34:26 +03:00
Andrew Rynhard
5dbc26c7a3 feat: rename osctl to talosctl
This is a rename of the osctl binary. We decided that talosctl is a
better name for the Talos CLI. This does not break any APIs, but does
make older documentation only accurate for previous versions of Talos.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-03-20 19:07:39 -07:00
Andrey Smirnov
0babc39653 feat: split osctl commands into Talos API and cluster management
This keeps backwards compatibility with the `osctl` CLI binary, with the
exception of `osctl config generate`, which was renamed to
`osctl gen config` to avoid confusion with other `osctl config`
commands which operate on the client config, not the Talos server config.

Command implementation and helpers were split into subpackages for
cleaner code and more visible boundaries. The resulting binary still
combines commands from both sections into a single binary.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-03-20 22:45:04 +03:00
Andrey Smirnov
0afd0f651b chore: provide provisioned cluster info to integration test
The integration test can optionally consume the cluster state generated
by the call to `osctl cluster create` and use it to discover nodes.

This means that CLI tests can now use it as a discovery source, and
API/K8s tests do so by default as well.

The flat list of nodes is to be replaced by something more complex in the
next iteration, but it's good enough for this PR.

As a demo, add a CLI test with multiple nodes (dmesg).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-01-31 18:21:30 +03:00
Andrey Smirnov
3a021e4579 test: add integration tests for (most) CLI commands
I added tests for all the commands which work reliably in container mode.

Some tests are naive, some are more sophisticated. While going
through the tests, I think I found a small bug in `osctl gen keypair`.

When we get reliable KVM tests, I can revisit and add missing
tests for time, reboot, shutdown and friends.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-20 23:33:35 +03:00
Andrey Smirnov
f3dff87957 fix: fail on multiple nodes for commands which don't support it
Fixes #1663

(I believe it's a strong candidate for a 0.3 backport.)

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-18 18:51:40 +03:00
Andrey Smirnov
6e05dd70c4 feat: add support for tailing logs
Fixes #1564

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-17 22:35:47 +03:00
Andrey Smirnov
1fbf40796f feat: implement streaming mode of dmesg, parse messages
Fixes #1563

This implements dmesg reading via `/dev/kmsg`, with message parsing and
formatting. Kernel log facility and severity are parsed, timestamp is
calculated relative to boot time (it's accurate unless time jumps a
lot during node lifetime).

A new flag to follow dmesg was added; the tail flag allows streaming only
new messages (ignoring old messages). We could try to implement tailing
the last N messages (for symmetry with regular logs); it's just a bit more
work, open to suggestions.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-16 17:40:15 +03:00
Andrey Smirnov
edb40437ec feat: add support for osctl logs -f
Now the default is not to follow the logs (similar to `kubectl logs`).

Integration test was added for `Logs()` API and `osctl logs` command.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-05 13:58:52 -08:00
Andrey Smirnov
96a7289f06 test: fix integration version test as 'NODE:' might be missing
When invoked without `-t`, `osctl` shouldn't print `NODE:` anymore.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-03 07:45:41 -08:00
Andrey Smirnov
551fa45d33 test: add CLI integration test
This starts with a very simple test for `osctl version` using regexps, as
the output of the command depends a lot on the current version.

We might potentially use more 'golden' matches for other commands.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-11-05 17:59:23 -08:00