This moves `pkg/config`, `pkg/client` and `pkg/constants`
under `pkg/machinery` umbrella.
And `pkg/machinery` is published as Go module inside Talos repository.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This replaces logging to files with inotify following to pure in-memory
circular buffer which grows on demand capped at specified maximum
capacity.
The concern with previous approach was that logs on tmpfs were growing
without any bound potentially consuming all the node memory.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This is a rewrite of machined. It addresses some of the limitations and
complexity in the implementation. This introduces the idea of a
controller. A controller is responsible for managing the runtime, the
sequencer, and a new state type introduced in this PR.
A few highlights are:
- no more event bus
- functional approach to tasks (no more types defined for each task)
- the task function definition now offers a lot more context, like
access to raw API requests, the current sequence, a logger, the new
state interface, and the runtime interface.
- no more panics to handle reboots
- additional initialize and reboot sequences
- graceful gRPC server shutdown on critical errors
- config is now stored at install time to avoid having to download it at
install time and at boot time
- upgrades now use the local config instead of downloading it
- the upgrade API's preserve option takes precedence over the config's
install force option
Additionally, this pulls various packes in under machined to make the
code easier to navigate.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This updates upgrade tests to run two flows with 3+1 clusters:
1. 0.3 -> current (testing upgrade with partition wiping)
2. 0.4-alpha.7 -> current (testing upgrade without partition wiping,
boot-a/boot-b)
And small upgrade with preserve enabled for single-node cluster.
Provision tests are now split into two parallel tracks in Drone.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR reworks the ordering of our recovery function. It will make sure
we actually show the user the recovery message prior to looking into
whether to auto-reboot.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This PR allows Talos to respect the panic=0 flag if users pass that in
their kernel args. Doing this makes it easier to catch kernel panics in
debug scenarios and allows the user to manually trigger a restart with
ctrl+alt+del when they're ready.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This aligns the nomenclature for filesystems like /dev and /proc with
what is used in the kernel code.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
Since dmesg is not streamed, it becomes difficult to debug issues with
machined. This fixes that by setting up the logging of machine to go to
/dev/kmsg and to a log file.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This removes the github.com/pkg/errors package in favor of the official
error wrapping in go 1.13.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
The gofumports does everything that gofumpt does with the addition of
formatting imports. This change proposes the use of the `-local` flag so
that we can have imports separated in the following order:
- standard library
- third party
- Talos specific
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This change aims to make installations more unified and reliable. It
introduces the concept of a mountpoint manager that is capable of
mounting, unmounting, and moving a set of mountpoints in the correct
order.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This change aims to standardize the boot process. It introduces the
concept of a phase, which is comprised of tasks. Phases are ran in serial and
the tasks that make up a phase are ran concurrently.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
The responsibility of init should only be to mount the rootfs. This
change moves Talos specific logic into machined. This will allow us to
define a version of Talos in a single binary instead of split across
two. This will enable cleaner upgrades and helps make the codebase
easier to reason about.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This adds generic goroutine runner which simply wraps service as process
goroutine. It supports log redirection and basic panic handling.
DHCP-related part of the network package was slightly adjusted to run as
service with logging updates (to redirect logs to a file) and context
canceling.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR introduces dependencies between the services. Now each service
has two virtual events associated with it: 'up' (running and healthy)
and 'down' (finished or failed). These events are used to establish
correct order via conditions abstraction.
Service image unpacking was moved into 'pre' stage simplifying
`init/main.go`, service images are now closer to the code which runs the
service itself.
Step 'pre' now runs after 'wait' step, and service dependencies are now
mixed into other conditions of 'wait' step on startup.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This implements insecure over-file-socket gRPC API for init with two
first simplest APIs: reboot and shutdown (poweroff).
File socket is mounted only to `osd` service, so it is the only service
which can access init API. Osd forwards reboot/shutdown already
implemented APIs to init which actually executes these.
This enables graceful shutdown/reboot with service shutdown, sync, etc.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Most crucial changes in `init/main.go`: on shutdown now Talos tries
to stop gracefully all the services. All the shutdown paths are unified,
including poweroff, reboot and panic handling on startup.
While I was at it, I also fixed bug with containers failing to start
when old snapshot is still around.
Service lifecycle is wrapped with `ServiceRunner` object now which
handles state transitions and captures events related to state changes.
Every change goes to the log as well.
There's no way to capture service state yet, but that is planned to be
implemented as RPC API for `init` which is exposed via `osd` to `osctl`.
Future steps:
1. Implement service dependencies for correct startup order and
shutdown order.
2. Implement service health, so that we can say "start trustd when
containerd is up and healthy".
3. Implement gRPC API for init, expose via osd (service status, restart,
poweroff, ...)
4. Impement 'String()' for conditions, so that we can see what service
is waiting on right now.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>