58 Commits

Author SHA1 Message Date
Andrey Smirnov
bddd4f1bf6 refactor: move external API packages into machinery/
This moves `pkg/config`, `pkg/client` and `pkg/constants`
under the `pkg/machinery` umbrella.

`pkg/machinery` is published as a Go module inside the Talos repository.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-17 09:56:14 -07:00
Andrey Smirnov
4ad4511b38 chore: enable nolintlint linter
It makes sure our `//nolint:` directives are not redundant.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-06-30 07:39:19 -07:00
Andrey Smirnov
0a4645fe80 feat: implement circular buffer for system logs
This replaces logging to files (followed via inotify) with a pure
in-memory circular buffer which grows on demand, capped at a specified
maximum capacity.

The concern with the previous approach was that logs on tmpfs grew
without any bound, potentially consuming all of the node's memory.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-06-26 15:33:54 -07:00
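A minimal sketch of the idea behind 0a4645fe80, for illustration only: a byte buffer that doubles on demand until it reaches a maximum capacity and then overwrites its oldest data. The names `CircularBuffer`, `NewCircularBuffer`, `Write` and `Bytes` are assumptions made for this example and are not taken from the Talos code.

```go
package main

import (
	"errors"
	"fmt"
)

// CircularBuffer is a hypothetical in-memory log buffer. It grows on demand
// up to maxCap bytes; after that, new writes overwrite the oldest bytes.
type CircularBuffer struct {
	data   []byte
	maxCap int
	off    int64 // total number of bytes ever written
}

// NewCircularBuffer allocates initialCap bytes and caps growth at maxCap.
func NewCircularBuffer(initialCap, maxCap int) (*CircularBuffer, error) {
	if initialCap <= 0 || initialCap > maxCap {
		return nil, errors.New("invalid buffer capacity")
	}
	return &CircularBuffer{data: make([]byte, initialCap), maxCap: maxCap}, nil
}

// Write appends p, growing the buffer first if it has not reached maxCap.
// The buffer only starts wrapping once it is at maxCap, so growing is a
// plain copy of the bytes written so far.
func (b *CircularBuffer) Write(p []byte) (int, error) {
	need := b.off + int64(len(p))
	if need > int64(len(b.data)) && len(b.data) < b.maxCap {
		size := len(b.data)
		for int64(size) < need && size < b.maxCap {
			size *= 2
		}
		if size > b.maxCap {
			size = b.maxCap
		}
		grown := make([]byte, size)
		copy(grown, b.data[:b.off])
		b.data = grown
	}
	for _, c := range p {
		b.data[b.off%int64(len(b.data))] = c
		b.off++
	}
	return len(p), nil
}

// Bytes returns a copy of the retained data, oldest byte first.
func (b *CircularBuffer) Bytes() []byte {
	size := int64(len(b.data))
	if b.off <= size {
		return append([]byte(nil), b.data[:b.off]...)
	}
	start := b.off % size // position of the oldest retained byte
	out := append([]byte(nil), b.data[start:]...)
	return append(out, b.data[:start]...)
}

func main() {
	buf, err := NewCircularBuffer(4, 8)
	if err != nil {
		panic(err)
	}
	buf.Write([]byte("hello"))
	buf.Write([]byte(" world!"))
	fmt.Printf("%q\n", buf.Bytes()) // prints "o world!"
}
```

With a cap of 8 bytes, only the most recent 8 bytes survive, which is the bounded-memory property the commit message is after.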
Andrew Rynhard
49307d554d refactor: improve machined
This is a rewrite of machined. It addresses some of the limitations and
complexity in the implementation. This introduces the idea of a
controller. A controller is responsible for managing the runtime, the
sequencer, and a new state type introduced in this PR.

A few highlights are:

- no more event bus
- functional approach to tasks (no more types defined for each task)
  - the task function definition now offers a lot more context, like
    access to raw API requests, the current sequence, a logger, the new
    state interface, and the runtime interface.
- no more panics to handle reboots
- additional initialize and reboot sequences
- graceful gRPC server shutdown on critical errors
- config is now stored at install time to avoid having to download it at
  install time and at boot time
- upgrades now use the local config instead of downloading it
- the upgrade API's preserve option takes precedence over the config's
  install force option

Additionally, this pulls various packages in under machined to make the
code easier to navigate.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-04-28 08:20:55 -07:00
Andrey Smirnov
e38cde9b48 chore: update upgrade tests for new version, split into two tracks
This updates upgrade tests to run two flows with 3+1 clusters:

1. 0.3 -> current (testing upgrade with partition wiping)
2. 0.4-alpha.7 -> current (testing upgrade without partition wiping,
boot-a/boot-b)

There is also a small upgrade test with preserve enabled for a single-node cluster.

Provision tests are now split into two parallel tracks in Drone.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-03-24 15:30:00 -07:00
Spencer Smith
4d5c7e482c fix: ensure printing of panic message
This PR reworks the ordering of our recovery function. It will make sure
we actually show the user the recovery message prior to looking into
whether to auto-reboot.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-03-17 16:40:47 -04:00
Spencer Smith
853ce16df4 feat: respect panic kernel flag
This PR allows Talos to respect the panic=0 flag if users pass that in
their kernel args. Doing this makes it easier to catch kernel panics in
debug scenarios and allows the user to manually trigger a restart with
ctrl+alt+del when they're ready.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-03-10 13:21:34 -04:00
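A rough sketch of how a `panic=0` check like the one in 853ce16df4 might look. The helpers `kernelParam` and `autoReboot` are hypothetical names for this example, the command line is passed in as a string rather than read from `/proc/cmdline`, and quoted values are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// kernelParam returns the value of the first field in cmdline whose key
// matches. Hypothetical helper: it splits on whitespace only and does not
// understand quoted values.
func kernelParam(cmdline, key string) (string, bool) {
	for _, field := range strings.Fields(cmdline) {
		parts := strings.SplitN(field, "=", 2)
		if parts[0] != key {
			continue
		}
		if len(parts) == 2 {
			return parts[1], true
		}
		return "", true
	}
	return "", false
}

// autoReboot reports whether the node should reboot on its own after a
// panic. An explicit panic=0 disables the automatic reboot.
func autoReboot(cmdline string) bool {
	value, ok := kernelParam(cmdline, "panic")
	return !(ok && value == "0")
}

func main() {
	fmt.Println(autoReboot("console=ttyS0 panic=0")) // false
	fmt.Println(autoReboot("console=ttyS0"))         // true
}
```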
Andrew Rynhard
4efccd96ea refactor: rename virtual package to pseudo
This aligns the nomenclature for filesystems like /dev and /proc with
what is used in the kernel code.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-11-26 22:32:48 -08:00
Andrew Rynhard
e81b3d11a8 feat: output machined logs to /dev/kmsg and file
Since dmesg is not streamed, it becomes difficult to debug issues with
machined. This fixes that by setting up the logging of machined to go to
/dev/kmsg and to a log file.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-11-03 12:53:13 -08:00
Andrey Smirnov
d3d011c8d2 chore: replace /* */ comments with // comments in license header
This fixes issues with `// +build` directives not being recognized in
source files.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-25 14:15:17 -07:00
Andrew Rynhard
d430a37e46 refactor: use go 1.13 error wrapping
This removes the github.com/pkg/errors package in favor of the official
error wrapping in go 1.13.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-15 22:20:50 -07:00
Andrey Smirnov
c2cb0f9778 chore: enable 'wsl' linter and fix all the issues
I wish there were fewer of them :)

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-10 01:16:29 +03:00
Andrew Rynhard
5ee554128e chore: move from gofumpt to gofumports
gofumports does everything that gofumpt does and also formats imports.
This change proposes the use of the `-local` flag so that we can have
imports separated in the following order:

- standard library
- third party
- Talos specific

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-12 07:49:12 -07:00
Andrew Rynhard
90c91807bd refactor: restructure the project layout
This change moves packages into more appropriate places.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-01 22:19:42 -07:00
Andrew Rynhard
ca35b85300 refactor: improve installation reliability
This change aims to make installations more unified and reliable. It
introduces the concept of a mountpoint manager that is capable of
mounting, unmounting, and moving a set of mountpoints in the correct
order.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-01 11:44:40 -07:00
Andrew Rynhard
e63c882b89 refactor: split machined into phases
This change aims to standardize the boot process. It introduces the
concept of a phase, which is composed of tasks. Phases are run serially and
the tasks that make up a phase are run concurrently.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-29 12:40:03 -07:00
Andrew Rynhard
b7a9acbe88 refactor: move setup logic into machined
The responsibility of init should only be to mount the rootfs. This
change moves Talos specific logic into machined. This will allow us to
define a version of Talos in a single binary instead of split across
two. This will enable cleaner upgrades and helps make the codebase
easier to reason about.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-26 07:48:49 -07:00
Andrew Rynhard
8e8aae98dd feat: add machined
This commit splits our current init into init and machined.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-16 13:12:21 -07:00
Andrew Rynhard
1e9548d149 feat: use new pkgs for initramfs and rootfs
This brings in the newly compiled libraries and binaries from our new
pkg builds.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-15 10:32:29 -07:00
Andrey Smirnov
0662af19d1 chore: seed math.rand PRNG on startup in every service (#801)
This is important as otherwise `math/rand` outputs the same predictable
sequence each time.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-28 11:03:15 -07:00
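To illustrate the concern in 0662af19d1: a `math/rand` generator created from a fixed seed produces the same sequence on every run, while seeding from the clock varies it. The helper `sequence` is a name invented for this example. Note that Go 1.20 and later seed the top-level `math/rand` functions randomly by default, so the explicit generator is used here to make the effect visible on any Go version.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// sequence returns n pseudo-random numbers in [0, 100) drawn from a
// generator created with the given seed.
func sequence(seed int64, n int) []int {
	r := rand.New(rand.NewSource(seed))
	out := make([]int, n)
	for i := range out {
		out[i] = r.Intn(100)
	}
	return out
}

func main() {
	// Same seed, same sequence: these two lines always match each other.
	fmt.Println(sequence(1, 5))
	fmt.Println(sequence(1, 5))

	// Seeding from the current time makes the sequence differ between runs.
	fmt.Println(sequence(time.Now().UnixNano(), 5))
}
```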
Andrew Rynhard
85afe4f828 feat: use eudev for udevd (#780)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-06-25 19:25:57 -07:00
Andrew Rynhard
ebc725afa6 feat: add support for upgrading init nodes (#761)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-06-24 15:25:32 -07:00
Brad Beam
a1e635a4b2 feat(init): Prioritize usage of local userdata (#694)
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-05-30 09:56:14 -05:00
Andrey Smirnov
40a5b7c177 feat(init): expose networkd as goroutine-based server (#682)
This adds a generic goroutine runner which simply wraps a service as a
process goroutine. It supports log redirection and basic panic handling.

The DHCP-related part of the network package was slightly adjusted to run
as a service, with logging updates (to redirect logs to a file) and
context cancellation.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-05-27 17:07:28 +03:00
Andrey Smirnov
a0188aff73 feat(init): implement service dependencies, correct start and shutdown (#680)
This PR introduces dependencies between the services. Each service now
has two virtual events associated with it: 'up' (running and healthy)
and 'down' (finished or failed). These events are used to establish
the correct order via the conditions abstraction.

Service image unpacking was moved into the 'pre' stage, simplifying
`init/main.go`; service images are now closer to the code which runs the
service itself.

The 'pre' step now runs after the 'wait' step, and service dependencies
are now mixed into the other conditions of the 'wait' step on startup.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-05-24 19:17:52 +03:00
Brad Beam
a64de7ed51 feat(init): Add initToken parameter to userdata (#664)
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-05-20 14:23:38 -05:00
Andrew Rynhard
ff58642d93 feat: improve package for /proc/cmdline parsing and management (#645)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-05-12 09:05:29 -07:00
Andrew Rynhard
0df1d9ca70 feat(init): run udevd as a container (#601)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-04-30 08:48:48 -07:00
Tim Jones
4341411c16 refactor(init): add helper for getting specific kernel parameters (#596)
Signed-off-by: Tim Jones <timniverse@gmail.com>
2019-04-29 10:58:51 -07:00
Tim Jones
7127998f56 feat(init): Add support for hostname kernel parameter (#591)
Signed-off-by: Tim Jones <timniverse@gmail.com>
2019-04-29 09:50:43 -07:00
Andrew Rynhard
020d11d4ba feat(init): enforce KSPP kernel parameters (#585)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-04-28 13:12:07 -07:00
Andrew Rynhard
9b4fec0fa8 feat(osctl): add ability to create docker based clusters (#584)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-04-28 12:06:03 -07:00
Andrew Rynhard
2a4b56d4a1 feat(init): load only the images required by the node type (#582)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-04-26 20:13:48 -07:00
Andrey Smirnov
ab2917e833 feat(init): implement init gRPC API, forward reboot to init (#579)
This implements an insecure gRPC API over a file socket for init,
starting with the two simplest APIs: reboot and shutdown (poweroff).

The file socket is mounted only into the `osd` service, so it is the only
service which can access the init API. Osd forwards its already
implemented reboot/shutdown APIs to init, which actually executes them.

This enables graceful shutdown/reboot with service shutdown, sync, etc.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-04-26 23:04:24 +03:00
Andrey Smirnov
505b5022c4 feat(init): implement graceful shutdown of 'init' (#562)
The most crucial changes are in `init/main.go`: on shutdown, Talos now
tries to stop all the services gracefully. All the shutdown paths are
unified, including poweroff, reboot and panic handling on startup.

While I was at it, I also fixed a bug with containers failing to start
when an old snapshot is still around.

The service lifecycle is now wrapped in a `ServiceRunner` object, which
handles state transitions and captures events related to state changes.
Every change goes to the log as well.

There's no way to capture service state yet, but that is planned to be
implemented as RPC API for `init` which is exposed via `osd` to `osctl`.

Future steps:

1. Implement service dependencies for correct startup order and
shutdown order.

2. Implement service health, so that we can say "start trustd when
containerd is up and healthy".

3. Implement gRPC API for init, expose via osd (service status, restart,
poweroff, ...)

4. Implement 'String()' for conditions, so that we can see what a service
is waiting on right now.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-04-26 16:53:19 +03:00
Andrew Rynhard
a817e744c7 feat: remove blockd (#536)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-04-14 16:57:37 -07:00
Andrew Rynhard
2faf36bd67 feat: add support for extra disk management (#524)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-04-13 22:41:03 -07:00
Brad Beam
853bbfaf5b refactor(initramfs): clean up network code (#507)
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-04-09 15:33:17 -07:00
Brad Beam
7d4db80da7 feat: add network configuration support (#476)
Signed-off-by: Brad Beam <brad.beam@b-rad.info>
2019-04-05 20:33:25 -07:00
Andrew Rynhard
e18b5086a9 chore: update org to new name (#480)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-04-03 18:29:21 -07:00
Andrew Rynhard
455aeb742c chore: expose userdata and osctl client packages (#471)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-04-02 17:11:17 -07:00
Andrew Rynhard
2e9a7ec0c5 feat: add power off functionality (#462)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-03-24 20:21:41 -07:00
Brad Beam
3693cff14f feat: add basic ntp implementation (#459)
Signed-off-by: Brad Beam <brad.beam@b-rad.info>
2019-03-23 15:58:13 -07:00
Brad Beam
75d1d89291 feat(initramfs): add support for refreshing dhcp lease (#454)
Signed-off-by: Brad Beam <brad.beam@b-rad.info>
2019-03-13 06:43:36 -07:00
Andrew Rynhard
1f0896123c feat: log to stdout when in container mode (#450)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-03-10 20:05:53 -07:00
Andrew Rynhard
b5f398d3dd feat: add container based deploy support to init (#447)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-03-09 20:53:32 -08:00
Spencer Smith
ee232b8f9a feat: add DHCP client (#427)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-02-27 07:58:37 -08:00
Andrew Rynhard
9e947c3fa5 feat: add automated PKI for joining nodes (#406)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-02-23 23:17:56 -08:00
Brad Beam
cd2ffa54a7 refactor(init): make baremetal consume install package (#414)
Allows for a single way to install Talos to a node.

Signed-off-by: Brad Beam <brad.beam@b-rad.info>
2019-02-23 14:07:05 -08:00
Spencer Smith
8e30f95f9c fix: output userdata fails, ignore numcpu for kubeadm (#398)
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-02-20 08:48:54 -08:00