The new release comes with bugfixes (we had already picked up some of
them via an untagged revision) and a few interesting new assertions,
including `Eventually` for polling.
See: https://github.com/stretchr/testify/milestone/2?closed=1
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This moves optional interface checks to unit-tests, removing type checks
via global variable assignment.
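
As a rough illustration of the pattern (not the actual Talos code), the
sketch below shows a run-time check of an optional interface, the kind
of check a unit test can make in place of a package-level assignment.
`Stopper`, `demoService` and `implementsStopper` are hypothetical names
made up for this example.

```go
package main

import "fmt"

// Stopper is a hypothetical optional interface a service may implement.
type Stopper interface {
	Stop() error
}

// demoService is a hypothetical service used only for this sketch.
type demoService struct{}

// Stop makes *demoService (the pointer type) satisfy Stopper.
func (*demoService) Stop() error { return nil }

// The package-level form that such a check replaces would be:
//
//	var _ Stopper = (*demoService)(nil)

// implementsStopper reports whether v satisfies the optional interface.
// In a real unit test this would be an assertion inside a Test function.
func implementsStopper(v interface{}) bool {
	_, ok := v.(Stopper)
	return ok
}

func main() {
	fmt.Println("implements Stopper:", implementsStopper(&demoService{}))
}
```

The trade-off is that a mismatch is reported when tests run rather than
at compile time, in exchange for keeping the global assignments out of
the non-test code.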
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
It looks like containerd creates shim sockets in the Linux abstract
namespace. Their names are fixed (they don't depend on the containerd
root directory) and depend only on the container namespace and ID. So if
two containerd instances on the same host run the same namespace/ID
pair, they are going to conflict on that shim socket.
Avoid that by randomizing the namespace name. CRI tests should be fine:
the namespace is fixed, but the container ID is random.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This implements a 'default deny' policy for service operations via the
API: by default, services do not allow any operations.
A service whitelists itself for stop/start/restart by implementing the
interface and returning a boolean flag which might depend on userdata.
Machined APIs `Stop/Start` were renamed to `ServiceStop`/`ServiceStart`
to avoid confusion with osd API `Restart` which is not related to
services. Old APIs are deprecated and compatibility code forwards old
APIs to the new code.
`ServiceRestart` API was introduced to distinguish restart action from
stop/start (previously restart was implemented as stop+start in the
CLI).
Service udevd-trigger was whitelisted for all operations (this allows
stopping a hanging run and restarting to trigger once again).
Services proxyd & ntpd were whitelisted for restart and start (start is
whitelisted to help with a service stuck in the stopped state while
restarting).
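
A minimal sketch of the 'default deny' idea, assuming an opt-in
interface: a service that does not implement it is denied every
operation. The names `Operation`, `APIControllable`, `restartable`,
`lockedDown` and `allowed` are hypothetical and do not reflect the real
interface; the userdata dependency is left out for brevity.

```go
package main

import "fmt"

// Operation is a hypothetical enum of API-triggered service operations.
type Operation int

const (
	OpStart Operation = iota
	OpStop
	OpRestart
)

// APIControllable is a hypothetical opt-in interface: a service
// implements it to whitelist itself for specific operations.
type APIControllable interface {
	APIAllowed(op Operation) bool
}

// restartable allows start and restart, but not stop.
type restartable struct{}

func (restartable) APIAllowed(op Operation) bool {
	return op == OpStart || op == OpRestart
}

// lockedDown does not implement the interface at all.
type lockedDown struct{}

// allowed applies the default-deny rule: no interface, no operations.
func allowed(svc interface{}, op Operation) bool {
	c, ok := svc.(APIControllable)
	if !ok {
		return false
	}
	return c.APIAllowed(op)
}

func main() {
	fmt.Println("restartable/restart:", allowed(restartable{}, OpRestart))
	fmt.Println("restartable/stop:", allowed(restartable{}, OpStop))
	fmt.Println("lockedDown/start:", allowed(lockedDown{}, OpStart))
}
```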
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
gofumports does everything that gofumpt does, with the addition of
formatting imports. This change proposes the use of the `-local` flag so
that we can have imports separated in the following order:
- standard library
- third party
- Talos specific
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This provides a target that can be useful for developers. It will format
code according to our standards.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
New linter 'funlen' was disabled as too many functions break the default
limit, but might be considered for the future.
To limit peak memory usage, `GOGC=50` was added to the golangci-lint run
to make Go's garbage collector more aggressive. With this setting, peak
usage seems to be around 8 GB.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This moves the default log path to /var/log. An exception is made for
machined-api and system-containerd since they must have zero
dependencies on the ephemeral disk. In the case of machined-api, we
cannot stop the service since it is required to perform an upgrade. As
for system-containerd, it starts before any ephemeral disk is mounted so
we will fail to start the service since /var/log is a read-only file system.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
The gofumpt linter is a stricter drop-in replacement for gofmt. The
rules are ones that I strongly agree with and I think it would be better
if we added this linter instead of nit picking every PR.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This TODO no longer applies. We have settled on a fixed boot size. This
also removes variables no longer needed.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This PR will make sure that each platform gets the console settings it
needs by setting them as extra flags in the makefile. This should ensure
that we have console logs flowing properly for each cloud.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
We need to remove an existing AMI, if it exists, in order to create a new
one with the same name.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This moves to aligning AWS releases with Azure and GCP. We no
longer need packer since we will now release an artifact that users can
import.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
There was an issue where the hostname was getting set too early in the
boot. This caused the hostname retrieved from platform.Hostname() to be
ignored.
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
This PR moves to using v1alpha1 as the initial node config version, so
we can graduate these configs a little more cleanly later on.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
Add translation for v1 to v0 machine networking. Also adds "Ignore"
property to v1 network interfaces.
Fixes #1134
Signed-off-by: Seán C McCord <ulexus@gmail.com>
Broke the binding between the discrete IP addresses of the control plane
elements and the ControlPlaneEndpoint. This allows the specification of
a canonical controlplane address which may optionally be a DNS name.
Fixes #1131
Signed-off-by: Seán C McCord <ulexus@gmail.com>
This modifies `osctl install` to use the provided userdata as the source
for default installation values. This allows such things as
userdata-supplied extra kernel parameters to be automatically
included in the bootloader.
Fixes #1102
Signed-off-by: Seán C McCord <ulexus@gmail.com>
Added a decomposition option to the kernel.NewDefaultCmdline() so that
the Defaults can be added _after_ constructing a custom commandline.
This is then implemented for `osctl install`.
Fixes #1128
Signed-off-by: Seán C McCord <ulexus@gmail.com>
Added a property to userdata to allow a network interface to be ignored,
such that Talos will perform no operations on it (including DHCP).
Also added kernel commandline parameter (talos.network.interface.ignore)
to specify a network interface should be ignored.
Also allows chaining `Contains()` on a kernel cmdline parameter even
when the parameter in question does not exist.
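
One way such chaining can be made safe in Go is a nil-tolerant method
receiver. The sketch below assumes that a lookup returns a nil pointer
for a missing parameter; `Cmdline`, `Parameter` and `Get` are
hypothetical stand-ins, not the real kernel package API.

```go
package main

import "fmt"

// Parameter is a hypothetical kernel cmdline parameter with its values.
type Parameter struct {
	values []string
}

// Contains is safe to call on a nil *Parameter, so a lookup of a
// missing parameter can be chained without a nil check.
func (p *Parameter) Contains(s string) bool {
	if p == nil {
		return false
	}
	for _, v := range p.values {
		if v == s {
			return true
		}
	}
	return false
}

// Cmdline is a hypothetical parsed kernel command line.
type Cmdline map[string]*Parameter

// Get returns nil when the key is absent.
func (c Cmdline) Get(key string) *Parameter {
	return c[key]
}

func main() {
	c := Cmdline{
		"talos.network.interface.ignore": {values: []string{"eth1"}},
	}
	fmt.Println(c.Get("talos.network.interface.ignore").Contains("eth1"))
	fmt.Println(c.Get("not.present").Contains("eth1"))
}
```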
Fixes #1124
Signed-off-by: Seán C McCord <ulexus@gmail.com>
This PR updates our e2e tests with the provider-components file that's
generated by our capi v0.1.9 update.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This was breaking e2e testing, as we depend on it for applying CAPI and
launching VMs from there.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
Reading /proc/mounts while simultaneously unmounting mountpoints
prevents unmounting all submounts under /var. This is due to the fact
that /proc/mounts will change as we perform unmounts, and that causes a
read of the file to become inaccurate. We now read /proc/mounts into
memory to get a snapshot of all submounts under /var, and then we
proceed with unmounting them.
This also adds some additional logging that I found to be useful while
debugging this. It also adds logic to skip DaemonSet-managed pods.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
- Use the Validate method to ensure we get an appropriate time back
- Hard set the clock initially, adjust clock by offsets afterwards
- Introduce functional opts to configure ntp client
- Add additional test coverage
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
In UNIX, any zombie without a parent process gets re-parented to the
process with PID 1 (usually running init), and the PID 1 process should
take care of it (usually by simply cleaning it up). Cleaning up zombies
is important, as they still take kernel resources, and having an
enormous number of zombie processes significantly degrades system
performance.
For Talos, the PID 1 process is machined, and machined itself forks to
run other processes in the process runner and `pkg/cmd` one-time
commands. The naive solution of running a `wait()` loop doesn't work, as
it might race with `Process.Wait()` and clean up a zombie which wasn't
re-parented, which leads to a false process execution failure.
After considering other solutions, we decided to go with the simple
approach: machined runs global zombie process reaper which publishes
information about reaped zombies. Any call to `Process.Wait()` (or
`Command.Wait()` which calls it) should be replaced with listening to
reaper's channel for notifications to catch info about the process which
was created in this call.
There are several changes in this PR:
1. Reaper implementation itself, started from machined.
2. Process runner and `pkg/cmd` can either use regular `Command.Wait()`
or use reaper notifications depending on reaper status (running/not
running). This allows using this code outside of machined.
3. Small bug fixes in the process log, which were affecting the tests.
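
The notification side of such a reaper can be sketched as below. This is
a simplified illustration under stated assumptions, not the actual
implementation: the real reaper would obtain exit information from the
kernel, whereas here `publish` is called by hand. `Reaper`,
`ProcessInfo`, `Subscribe`, `publish` and `WaitFor` are hypothetical
names.

```go
package main

import (
	"fmt"
	"sync"
)

// ProcessInfo is a hypothetical record describing a reaped process.
type ProcessInfo struct {
	Pid      int
	ExitCode int
}

// Reaper fans out information about reaped processes to subscribers.
type Reaper struct {
	mu   sync.Mutex
	subs map[chan ProcessInfo]struct{}
}

// NewReaper returns a reaper with no subscribers.
func NewReaper() *Reaper {
	return &Reaper{subs: make(map[chan ProcessInfo]struct{})}
}

// Subscribe must be called before the process of interest is started,
// otherwise its exit notification could be missed.
func (r *Reaper) Subscribe() chan ProcessInfo {
	ch := make(chan ProcessInfo, 16)
	r.mu.Lock()
	r.subs[ch] = struct{}{}
	r.mu.Unlock()
	return ch
}

// Unsubscribe stops delivery to ch and closes it.
func (r *Reaper) Unsubscribe(ch chan ProcessInfo) {
	r.mu.Lock()
	delete(r.subs, ch)
	r.mu.Unlock()
	close(ch)
}

// publish delivers info to every subscriber. In this sketch a full
// subscriber buffer drops the notification instead of blocking.
func (r *Reaper) publish(info ProcessInfo) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for ch := range r.subs {
		select {
		case ch <- info:
		default:
		}
	}
}

// WaitFor plays the role of Process.Wait(): it skips notifications for
// other processes and returns the one matching pid. The boolean is
// false if the channel was closed before a match arrived.
func WaitFor(ch <-chan ProcessInfo, pid int) (ProcessInfo, bool) {
	for info := range ch {
		if info.Pid == pid {
			return info, true
		}
	}
	return ProcessInfo{}, false
}

func main() {
	r := NewReaper()
	ch := r.Subscribe()

	r.publish(ProcessInfo{Pid: 100, ExitCode: 0}) // unrelated zombie
	r.publish(ProcessInfo{Pid: 42, ExitCode: 3})  // the process we started

	info, ok := WaitFor(ch, 42)
	fmt.Println(info.Pid, info.ExitCode, ok)
	r.Unsubscribe(ch)
}
```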
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This sets the default install image just before installation. It was
erroneously placed in the boot verification.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This cache was more important back when builds of Talos took upwards of
40 minutes. Since this is no longer the case, and I have seen
performance issues by mounting a host path into the container, I think
we should drop this.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
Increased retry count to 6 for DHCP. In my testing, this worked
reliably in my setup, where the default (3) did not.
Ultimately, this should probably be configurable from the userdata.
Instead, this just makes it work for me.
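
A bounded retry of this kind can be sketched as follows. This is an
illustration only: `retry` and `flaky` are hypothetical helpers, and the
simulated failure stands in for a DHCP request that does not get a
lease.

```go
package main

import (
	"errors"
	"fmt"
)

// retry calls fn up to attempts times and stops at the first success.
func retry(attempts int, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
	}
	return fmt.Errorf("failed after %d attempts: %w", attempts, err)
}

// flaky returns a function that fails the first `failures` calls,
// simulating a lease that only arrives after a few tries.
func flaky(failures int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= failures {
			return errors.New("no lease yet")
		}
		return nil
	}
}

func main() {
	fmt.Println("3 attempts:", retry(3, flaky(4)))
	fmt.Println("6 attempts:", retry(6, flaky(4)))
}
```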
Fixes #1099
Signed-off-by: Seán C McCord <ulexus@gmail.com>
Fixes #1010
Wait for the containerd shim socket to be removed before running the
container a second time.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>