Commit Graph

57 Commits

Author SHA1 Message Date
Artem Chernyshev
5d48bd5f6a feat: allow disabling NoSchedule taint on masters using TUI installer
I think this should come handy for setting up single node SBC clusters.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-12-07 07:31:54 -08:00
Artem Chernyshev
63e0d02aa9 feat: add TUI for configuring network interfaces settings
Allows configuring:
- cidr.
- dhcp enable/disable.
- MTU.
- Ignore.
- Dhcp metric.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-12-03 11:05:55 -08:00
Artem Chernyshev
c7062e3f4d feat: make GenerateConfiguration accept current time as a parameter
If the node time is out of sync, it can generate incorrect
configuration. And maintenance mode does not allow us starting ntp,
because there is no containerd.

By providing current UTC time of the machine where talosctl client is
running, it is possible to force GenerateConfiguration use correct time.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-12-03 08:28:11 -08:00
Artem Chernyshev
f96cffd2b2 feat: add ability to choose CNI config
Initial version which only allows setting CNI using preset, no custom
CNI urls are supported at the moment. Still need to figure out what kind
of UI can be used for that.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-11-26 06:49:54 -08:00
Andrey Smirnov
9a32e34cb1 feat: implement apply configuration without reboot
This allows config to be written to disk without being applied
immediately.

Small refactoring to extract common code paths.

At first, I tried to implement this via the sequencer, but looks like
it's too hard to get it right, as sequencer lacks context and config to
be written is not applied to the runtime.

Fixes #2828

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-23 12:42:44 -08:00
Artem Chernyshev
8513123d22 feat: return client config as the second value in GenerateConfiguration
To be used in interactive installer to output the node client
configuration to a file.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-11-17 07:20:05 -08:00
Artem Chernyshev
0f924b5122 feat: add generate config gRPC API
Fixes: https://github.com/talos-systems/talos/issues/2766

This API is implemented in Maintenance and Machine services.
Can be used to generate configuration on the node, instead of using
talosctl to generate it locally.

To be used in interactive installer and talosctl gen config.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-11-13 08:07:32 -08:00
Artem Chernyshev
93e30a1738 chore: remove maintenance service interface and use machine service
Now maintenance service implements `MachineService` interface, stubbing
all not implemented methods.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-11-11 12:33:44 -08:00
Andrew Rynhard
71321214a1 feat: add storage API
This is the initial implementation of a storage API.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-11 10:12:25 -08:00
Andrey Smirnov
026244097a refactor: drop osd compatibility layer
Fixes #2761

Service `osd` was merged into machined on Jul, 13th, before 0.6 release.

It's time to drop the backwards compatibility with clients before 0.6.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-11 09:38:19 -08:00
Andrew Rynhard
562f816526 refactor: use gRPC for interactive installation
Instead of hosting a web service, we decided to implement a gRPC service
that exposes APIs that can be used in a client-side interactive installer.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-03 08:36:44 -08:00
Artem Chernyshev
e7e99cf1b3 feat: support disk usage command in talosctl
Usage example:

```bash
talosctl du --nodes 10.5.0.2 /var -H -d 2
NODE       NAME
10.5.0.2   8.4 kB   etc
10.5.0.2   1.3 GB   lib
10.5.0.2   16 MB    log
10.5.0.2   25 kB    run
10.5.0.2   4.1 kB   tmp
10.5.0.2   1.3 GB   .
```

Supported flags:
- `-a` writes counts for all files, not just directories.
- `-d` recursion depth
- '-H' humanize size outputs.
- '-t' size threshold (skip files if < size or > size).

Fixes: https://github.com/talos-systems/talos/issues/2504

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-10-13 09:30:31 -07:00
Andrew Rynhard
4eeef28e90 feat: add etcd API
This adds RPCs for basic etcd management tasks.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-10-06 11:30:04 -07:00
Seán C McCord
ff92d2a14b feat: add ApplyConfiguration API
Adds the ability to apply (replace) an existing node configuration with
a new one via the Machine API.

Fixes #2345

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2020-09-29 14:44:06 -07:00
Andrey Smirnov
bddd4f1bf6 refactor: move external API packages into machinery/
This moves `pkg/config`, `pkg/client` and `pkg/constants`
under `pkg/machinery` umbrella.

And `pkg/machinery` is published as Go module inside Talos repository.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-17 09:56:14 -07:00
Andrey Smirnov
74413b1393 fix: ignore sequence lock errors in machined
This prevents reboots when some actions triggers sequence while another
sequence is still running.

Fixes #2209

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-20 14:36:06 -07:00
Andrey Smirnov
ad99cb6421 feat: implement talosctl dashboard command
This builds a simple CLI UI for Talos cluster monitoring.

Some new APIs were added for monitoring based on Prometheus procfs
package.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-16 14:24:04 -07:00
Andrey Smirnov
c54639e541 feat: implement server-side API for cluster health checks
This implements existing server-side health checks as defined in
`internal/pkg/cluster/checks` in Talos API.

Summary of changes:

* new `cluster` API

* `apid` now listens without auth on local file socket

* `cluster` API is for now implemented in `machined`, but we can move it
to the new service if we find it more appropriate

* `talosctl health` by default now does server-side health check

UX: `talosctl health` without arguments does health check for the
cluster if it has healthy K8s to return master/worker nodes. If needed,
node list can be overridden with flags.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-15 13:52:13 -07:00
Andrey Smirnov
cbb7ca8390 refactor: merge osd into machined
This merges `osd` API into `machined`. API was copied from `osd` into
`machined`, and `osd` API was deprecated.

For backwards compatibility, `machined` still implements `osd` API, so
older Talos API clients can still talk to the node without changes.

Docs were updated. No functional changes.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-13 12:50:00 -07:00
Andrey Smirnov
4cc074cdba feat: implement API access to event history
1. Add [xid-based](https://github.com/rs/xid) event IDs. Xids
are sortable and unique enough. Xids also encode event publishing
time with a second precision.

2. Add three ways to look back into event history: based on number of
events, on time and ID. Lookup via ID might be used to restart event
polling in case of broken API connection from the same moment.

3. Reimplement core event buffer with positions which are always
incremented instead of generation+index, this implementation is much
more simple (idea from circular buffer).

4. By default, Events API works the same - it shows no history and
starts streaming new events only.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-08 10:54:50 -07:00
Andrey Smirnov
a6b3bd2ff6 feat: implement service events
This implements service events, adds test for events API based on
service events as they're the easiest to generate on demand.

Disabled validate test for 'metal' as it validates disk device against
local system which doesn't make much sense.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-03 13:52:53 -07:00
Andrew Rynhard
a5a2d959ed feat: upgrade runc to v1.0.0-rc90
This updates runc to the same version vendored by containerd.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-07-02 13:19:33 -07:00
Andrew Rynhard
11ad2a5ea8 feat: add rollback API
This adds an API for rolling back the version of Talos loaded by
the bootloader.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-06-09 16:18:40 -07:00
Andrey Smirnov
1739439674 fix: update Events API response type to match proxying conventions
Streaming APIs are not supposed to wrap response into `repeated`
container, as streaming allows to send as many responses back as
required.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-05-15 11:57:47 -07:00
Andrew Rynhard
7915c73a86 fix: register event service with router
This adds the events streaming RPC to routerd.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-05-15 07:33:32 -07:00
Andrew Rynhard
1902519727 feat: add events API
This adds an event stream to the runtime, and the ability to stream
events via the API.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-05-13 12:18:10 -07:00
Andrew Rynhard
8e07b1bab3 feat: add bootstrap API
This adds the ability to bootstrap a cluster using the API.
The API simply starts the bootkube service.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-05-07 16:47:28 -07:00
Andrew Rynhard
56d7bf19fe feat: add recovery API
This adds an API for recovering the self-hosted control plane.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-05-04 19:38:30 -07:00
Andrew Rynhard
49307d554d refactor: improve machined
This is a rewrite of machined. It addresses some of the limitations and
complexity in the implementation. This introduces the idea of a
controller. A controller is responsible for managing the runtime, the
sequencer, and a new state type introduced in this PR.

A few highlights are:

- no more event bus
- functional approach to tasks (no more types defined for each task)
  - the task function definition now offers a lot more context, like
    access to raw API requests, the current sequence, a logger, the new
    state interface, and the runtime interface.
- no more panics to handle reboots
- additional initialize and reboot sequences
- graceful gRPC server shutdown on critical errors
- config is now stored at install time to avoid having to download it at
  install time and at boot time
- upgrades now use the local config instead of downloading it
- the upgrade API's preserve option takes precedence over the config's
  install force option

Additionally, this pulls various packes in under machined to make the
code easier to navigate.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-04-28 08:20:55 -07:00
Andrew Rynhard
69fa63a7b2 refactor: perform upgrade upon reboot
This PR introduces a new strategy for upgrades. Instead of attempting to
zap the partition table, create a new one, and then format the
partitions, this change will only update the `vmlinuz`, and
`initramfs.xz` being used to boot. It introduces an A/B style upgrade
process, which will allow for easy rollbacks. One deviation from our
original intention with upgrades is that this change does not completely
reset a node. It falls just short of that and does not reset the
partition table. This forces us to keep the current partition scheme in
mind as we make changes in the future, because an upgrade assumes a
specific partition scheme. We can improve upgrades further in the
future, but this will at least make them more dependable. Finally, one
more feature in this PR is the ability to keep state. This enables
single node clusters to upgrade since we keep the etcd data around.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-03-20 17:32:18 -07:00
Andrew Rynhard
fe7847e0b8 feat: add reboot flag to reset API
This adds the ability to automatically reboot a machine after a reboot.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-02-19 05:10:58 -08:00
Spencer Smith
8092362098 fix: fix reset command
This PR will fix the reset command to actually wipe the system disk as
expected.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-02-18 16:18:43 -05:00
Brad Beam
88df1b50b8 feat(networkd): Add health api
This introduces a health/ready api for networkd. This
will allow us to better determine the state of networkd
and allow for some level of monitoring.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2020-01-29 09:09:27 -06:00
Andrey Smirnov
6e05dd70c4 feat: add support for tailing logs
Fixes #1564

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-17 22:35:47 +03:00
Andrew Rynhard
ad863a7f92 refactor: rename protobuf services, RPCs, and messages
This PR brings our protobuf files into conformance with the protobuf
style guide, and community conventions. It is purely renames, along with
generated docs.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-11 11:41:40 -08:00
Andrey Smirnov
3a93e65b54 feat: make osd.Dmesg API streaming
This is to prepare for upcoming switch to reading `/dev/kmsg` which
should allow following logs, doing some kind of tail, etc.

The output is far from being perfect, as `dmesg` data is delivered as
single chunk (not as lines), but once server side updates, client side
should match it.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-09 23:52:35 +03:00
Andrey Smirnov
edb40437ec feat: add support for osctl logs -f
Now default is not to follow the logs (which is similar to `kubectl logs`).

Integration test was added for `Logs()` API and `osctl logs` command.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-05 13:58:52 -08:00
Andrey Smirnov
5b7bea2471 feat: use grpc-proxy in apid
This replaces codegen version of apid proxying with
talos-systems/grpc-proxy based version. Proxying is transparent, it
doesn't require exact information about methods and response types. It
requires some common layout response to enhance it properly with node
metadata or errors.

There should be no signifcant changes to the API with the previous
version, but it's worth mentioning a few changes:

1. grpc.ClientConn is established just once per upstream (either local
service or remote apid instance).

2. When called without `-t` (`targets`), apid proxies immediately down
to local service skipping proxying to itself (as before), which results
in empty node metadata in response (before it had local node IP). Might
revert this later to proxy to itself (?).

3. Streaming APIs are now fully supported with multiple targets, but
message definition doesn't contain `ResponseMetadata`, so streaming APIs
are broken now with targets (needs a fix).

4. Errors are now returned as responses with `Error` field set in
`ResponseMetadata`, this requires client library update and `osctl` to
handle it properly.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-11-29 22:57:25 +03:00
Andrew Rynhard
ac089dc330 feat: add read API
This adds an API for reading arbitrary files.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-11-25 10:46:50 -08:00
Brad Beam
28ee910899 chore: Fix formatting ( make fmt )
Not sure if there was an update in the fmt code path, but these are the
results after running `make fmt`.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-11-23 13:50:52 -08:00
Andrey Smirnov
63212ab17e test: fix integration test for k8s version
Push versions to constants, introduce 'platform' to version API to
discover node mode. Check kernel version for non-containers.

A bit of refactoring on version package to expose something closer to a
single response.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-11-11 13:42:21 -08:00
Brad Beam
531e7d8144 feat: Add meminfo api
Add ability to retrieve node memory stats ( /proc/meminfo ).

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-11-10 21:02:43 -06:00
Brad Beam
7897374ff1 feat: Add support for streaming apis in apid
This brings in the recent updates to protoc-gen-proxy to allow support
for proxying streaming api requests. We artificially limit it to only the first
target specified in the list while we work through what multi target stream
support looks like.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-11-08 14:22:30 -06:00
Brad Beam
41a4741bca refactor: Move logs to machined
This moves Logs endpoint to machined to reduce the mount footprint of osd.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-11-04 15:04:13 -08:00
Brad Beam
a4e1479b07 refactor: Move kubeconfig to machined
This moves the Kubeconfig api endpoint to machined and consolidates the
"read a file" code into machined. This also changes Kubeconfig to
use the CopyOut method which changes Kubeconfig to a streaming grpc call.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-11-04 14:45:23 -08:00
Brad Beam
3fd8abf426 chore: Move data messages to common proto
This is to allows reuse across multiple apis.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-11-04 14:24:41 -06:00
Brad Beam
457c6416a6 feat: Add network api to apid
This extends apid to include the network api

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-10-28 04:21:48 -07:00
Brad Beam
ee24e42319 feat: Add time api to apid
This extends apid to cover the time api.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-10-25 14:35:14 -07:00
Brad Beam
6de32dd30b fix: Fix osctl version output
This broke when we introduced the apid changes.

```
Client:
Tag:         v0.3.0-alpha.3-3-gc3e353aa-dirty
SHA:         c3e353a-dirty
Built:
Go version:  go1.13.3
OS/Arch:     linux/amd64

Server:
NODE:        10.5.0.3
Tag:         v0.3.0-alpha.3-3-gc3e353aa-dirty
SHA:         c3e353a-dirty
Built:
Go version:  go1.13.3
OS/Arch:     linux/amd64

NODE:        10.5.0.2
Tag:         v0.3.0-alpha.3-3-gc3e353aa-dirty
SHA:         c3e353a-dirty
Built:
Go version:  go1.13.3
OS/Arch:     linux/amd64
```

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-10-25 15:41:06 -05:00
Brad Beam
573cce8d18 feat: Add APId
This PR introduces APId. This service replaces the frontend functionality
previously provided by OSD. The main driver for this is two fold:

1. Create a single purpose application to expose the talos api

2. Make use of code generation to DRY api changes

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-10-25 13:02:33 -05:00