19 Commits

Author SHA1 Message Date
Andrey Smirnov
c0698c1815 chore(machined): implement process reaper for PID 1 machined process
In UNIX, any zombies without parent process get re-parented to process
with PID 1 (usually running init), and PID 1 process should take care of
them (usually simply clean them up). Cleaning up zombies is important,
as they still take kerner resources, and having enormous amount of
zombie processes signifcantly degrades system performance.

For Talos, PID 1 process is machined, and machined itself forks to run
other processes in process runner and `pkg/cmd` one-time commands. Naive
solution of running `wait()` loop doesn't work as it might race with
`Process.Wait()` and clean up zombie which wasn't re-parented which
leads to process execution false failure.

After considering other solutions, we decided to go with the simple
approach: machined runs global zombie process reaper which publishes
information about reaped zombies. Any call to `Process.Wait()` (or
`Command.Wait()` which calls it) should be replaced with listening to
reaper's channel for notifications to catch info about the process which
was created in this call.

There are several changes in this PR:

1. Reaper implementation itself, started from machined.

2. Process runner and `pkg/cmd` can either use regular `Command.Wait()`
or use reaper notifications depending on reaper status (running/not
running). This allows using this code outside of machined.

3. Small bug fixes with process log which was affecting the tests.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-09-05 10:01:02 -07:00
Andrew Rynhard
d4770d41ad feat: run installs via container
This moves to performing installs via a container.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-27 15:01:20 -05:00
Andrew Rynhard
0bdaff1a90 feat: perform upgrades via container
This moves to performing upgrades via a container.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-27 09:44:50 -07:00
Andrew Rynhard
43e20217e8 feat: add ability to pass data on event bus
We need to support eventing with associated data. This moves the event
bus to an observer design pattern that allows observers to register for
specific events, and to receive the associated data.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-26 13:27:02 -07:00
Andrew Rynhard
9eaa2d8140 feat: add sequencer interface
This adds an interface that can be used to descibe boot, shutdown, and
upgrade events in a set of phases.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-25 12:59:42 -07:00
Andrew Rynhard
be8f58c15d feat: add overlay task
This adds a well defined task for handling all overlay mount points that
are required by the system.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-25 10:47:54 -07:00
Brad Beam
313c118ad0 refactor(networkd): Replace networkd with a standalone app
This is a major rewrite of our network subsystem.

- This changes networkd to run as a standalone app versus internal goroutine
- This changes out the netlink package with the more idiomatic netlink/rtnetlink
  packages
- This changes the initial network bootstrap/discovery from using a single
  interface to attempting to bring up all interfaces
- This moves us back on to the upstream dhcp library

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-21 13:24:51 -05:00
Andrew Rynhard
2e65cff3ce feat: mount /sys/fs/bpf
The BPF filesystem is required to pin BPF objects.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-18 07:37:08 -07:00
Brad Beam
da1f73249f fix(machined): Clean up installation process
This also includes a fix for #955 which had the unintended side effect
of breaking image creation ( since it would attempt to grow the filesystem
always ).

The refactor standardizes around looking for the DATA and ESP labels to
discover any existing installations/filesystems. If none are found, an
installation will proceed -- for both image creation and bare metal.
During bootup, the DATA partition will always attempt to expand/grow.

This also introduces a new phase to verify the installation through the
existance of /boot/installed ( migrated from install stage ).

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-08 22:10:14 -05:00
Andrey Smirnov
71640662e0 chore(init): rearrange phase handling to push shutdown to main
This re-arranges phases a bit so that shutdown actions are pushed back
to the top-level main.go of machined.

Small rudimentary event.Bus is introduce to facilitate event passing
(shutdown/restart) between various machined components and main.go. This
might be not the best implementation, just something to allow this
message passing without global variables or such.

Machined API was refactored to run as goroutine service.

ACPI & signal handlers re-built as phase tasks, and activated for
non-container, container modes respectively.

As part of the fix, now `docker stop` triggers correct shutdown of Talos
(not a big deal, but good for testing).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-08-02 08:42:12 -07:00
Andrew Rynhard
90c91807bd refactor: restructure the project layout
This change moves packages into more appropriate places.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-01 22:19:42 -07:00
Andrew Rynhard
ca35b85300 refactor: improve installation reliability
This change aims to make installations more unified and reliable. It
introduces the concept of a mountpoint manager that is capable of
mounting, unmounting, and moving a set of mountpoints in the correct
order.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-01 11:44:40 -07:00
Andrew Rynhard
835d72b74a fix: create overlay mounts after install
Without running the install task first, /var is read-only. This causes
the overlay phase to fail as it tries to create /var/system.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-01 06:35:12 -07:00
Andrey Smirnov
084378ac04 fix(init): flip concurrency of tasks/services, fix small issues
Phases should run sequentially, while tasks concurrently in a phase.

There are two potential issues fixed:

1. `result` multierror was updated inside goroutine without any
synchronization, so this is a data race
2. panic inside task/phase runner might happen and as unhandled panic in a
goroutine aborts whole process, this might lead to a system halt as
as the 'machined' exits

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-07-31 14:21:07 -07:00
Andrew Rynhard
e63c882b89 refactor: split machined into phases
This change aims to standardize the boot process. It introduces the
concept of a phase, which is comprised of tasks. Phases are ran in serial and
the tasks that make up a phase are ran concurrently.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-29 12:40:03 -07:00
Andrew Rynhard
b7a9acbe88 refactor: move setup logic into machined
The responsibility of init should only be to mount the rootfs. This
change moves Talos specific logic into machined. This will allow us to
define a version of Talos in a single binary instead of split across
two. This will enable cleaner upgrades and helps make the codebase
easier to reason about.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-26 07:48:49 -07:00
Andrew Rynhard
0ec17e4169 feat: run rootfs from squashfs
This change moves the rootfs to a squashfs image.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-25 08:38:31 -07:00
Andrew Rynhard
88bdedf3e6 fix: make /etc/resolv.conf writable
We need /etc/resolv.conf to be writable so that networkd can update it.
This change achieves this by creating a symlink at /etc/resolv.conf that
points to /var/resolv.conf.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-19 20:37:00 -07:00
Andrew Rynhard
8e8aae98dd feat: add machined
This commit splits our current init into init and machined.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-16 13:12:21 -07:00