This PR moves to using v1alpha1 as the initial node config version, so
we can graduate these configs a little more cleanly later on.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
Add translation from v1 to v0 machine networking. Also add an "Ignore"
property to v1 network interfaces.
Fixes #1134
Signed-off-by: Seán C McCord <ulexus@gmail.com>
Broke the binding between the discrete IP addresses of the control plane
elements and the ControlPlaneEndpoint. This allows the specification of
a canonical control plane address, which may optionally be a DNS name.
Fixes #1131
Signed-off-by: Seán C McCord <ulexus@gmail.com>
This modifies `osctl install` to use the provided userdata as the source
for default installation values. This allows such things as
userdata-supplied extra kernel parameters to be automatically
included in the bootloader.
Fixes #1102
Signed-off-by: Seán C McCord <ulexus@gmail.com>
Added a decomposition option to kernel.NewDefaultCmdline() so that
the defaults can be added _after_ constructing a custom command line.
This is then implemented for `osctl install`.
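To illustrate "build the custom command line first, add defaults afterwards", here is a minimal sketch. The `Cmdline`, `Param`, `NewCmdline`, and `AppendDefaults` names are hypothetical stand-ins, not the real `kernel` package API, and the "user value wins" rule per key is an assumption (the real kernel accepts some repeated keys, such as multiple `console=` entries).

```go
package main

import (
	"fmt"
	"strings"
)

// Param is one kernel parameter (hypothetical type for illustration).
type Param struct {
	Key   string
	Value string
}

// Cmdline is a hypothetical ordered kernel command line.
type Cmdline struct {
	params []Param
}

// NewCmdline parses a space-separated command line such as "console=ttyS0 quiet".
func NewCmdline(s string) *Cmdline {
	c := &Cmdline{}
	for _, field := range strings.Fields(s) {
		key, value, _ := strings.Cut(field, "=")
		c.params = append(c.params, Param{Key: key, Value: value})
	}
	return c
}

// has reports whether key is already present.
func (c *Cmdline) has(key string) bool {
	for _, p := range c.params {
		if p.Key == key {
			return true
		}
	}
	return false
}

// AppendDefaults adds each default only if its key is not already set,
// so user-supplied values (e.g. from userdata) take precedence.
func (c *Cmdline) AppendDefaults(defaults []Param) {
	for _, d := range defaults {
		if !c.has(d.Key) {
			c.params = append(c.params, d)
		}
	}
}

// String renders the command line back into kernel format.
func (c *Cmdline) String() string {
	parts := make([]string, 0, len(c.params))
	for _, p := range c.params {
		if p.Value == "" {
			parts = append(parts, p.Key)
		} else {
			parts = append(parts, p.Key+"="+p.Value)
		}
	}
	return strings.Join(parts, " ")
}

func main() {
	// Custom parameters first, defaults afterwards (default values are placeholders).
	c := NewCmdline("console=ttyS0 talos.platform=metal")
	c.AppendDefaults([]Param{{Key: "console", Value: "tty0"}, {Key: "page_poison", Value: "1"}})
	fmt.Println(c.String())
}
```

Adding defaults last, and skipping keys already present, is what lets userdata-supplied parameters survive into the bootloader configuration.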
Fixes #1128
Signed-off-by: Seán C McCord <ulexus@gmail.com>
Added a property to userdata to allow a network interface to be ignored,
such that Talos will perform no operations on it (including DHCP).
Also added a kernel command-line parameter (talos.network.interface.ignore)
to specify that a network interface should be ignored.
Also allows chaining the kernel command-line parameter's Contains() even
when the parameter in question does not exist.
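The nil-safe chaining described above can be sketched in Go with a method on a pointer receiver that tolerates `nil`. This is a minimal illustration under assumptions: the `Cmdline`/`Parameter` types, `Parse`, `Get`, and the `ignored` helper are hypothetical, and the parameter format `talos.network.interface.ignore=eth1` (repeatable once per interface) is assumed rather than taken from the real implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// Parameter holds every value given for one key; keys may repeat.
type Parameter struct {
	values []string
}

// Contains reports whether v is one of the values. It is safe to call on a
// nil *Parameter, so Get("missing").Contains("x") returns false instead of panicking.
func (p *Parameter) Contains(v string) bool {
	if p == nil {
		return false
	}
	for _, x := range p.values {
		if x == v {
			return true
		}
	}
	return false
}

// Cmdline maps parameter names to their values (hypothetical type).
type Cmdline map[string]*Parameter

// Parse splits a kernel command line into repeatable key=value parameters.
func Parse(s string) Cmdline {
	c := Cmdline{}
	for _, field := range strings.Fields(s) {
		key, value, _ := strings.Cut(field, "=")
		p, ok := c[key]
		if !ok {
			p = &Parameter{}
			c[key] = p
		}
		p.values = append(p.values, value)
	}
	return c
}

// Get returns nil when the key is absent (the map's zero value).
func (c Cmdline) Get(key string) *Parameter { return c[key] }

const ignoreParam = "talos.network.interface.ignore"

// ignored reports whether Talos should leave iface alone (no DHCP, no config).
func ignored(c Cmdline, iface string) bool {
	return c.Get(ignoreParam).Contains(iface)
}

func main() {
	c := Parse("console=tty0 talos.network.interface.ignore=eth1")
	fmt.Println(ignored(c, "eth1"), ignored(c, "eth0"))
	// No ignore parameter at all: the chained call still works.
	fmt.Println(ignored(Parse("quiet"), "eth1"))
}
```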
Fixes #1124
Signed-off-by: Seán C McCord <ulexus@gmail.com>
This PR updates our e2e tests with the provider-components file that's
generated by our capi v0.1.9 update.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This was breaking e2e testing, as we depend on it for applying CAPI and
launching VMs from there.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
Reading /proc/mounts while simultaneously unmounting mountpoints
prevents unmounting all submounts under /var. This is due to the fact
that /proc/mounts will change as we perform unmounts, and that causes a
read of the file to become inaccurate. We now read /proc/mounts into
memory to get a snapshot of all submounts under /var, and then we
proceed with unmounting them.
This also adds some additional logging that I found useful while
debugging this. It also adds logic to skip DaemonSet-managed pods.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
- Use the Validate method to ensure we get an appropriate time back
- Hard set the clock initially, adjust clock by offsets afterwards
- Introduce functional opts to configure ntp client
- Add additional test coverage
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
In UNIX, any zombie without a parent process gets re-parented to the process
with PID 1 (usually running init), and the PID 1 process should take care of
them (usually by simply cleaning them up). Cleaning up zombies is important,
as they still consume kernel resources, and an enormous number of
zombie processes significantly degrades system performance.
In Talos, the PID 1 process is machined, and machined itself forks to run
other processes in the process runner and `pkg/cmd` one-time commands. The
naive solution of running a `wait()` loop doesn't work, as it might race with
`Process.Wait()` and reap a child that wasn't re-parented, which
leads to false failures in process execution.
After considering other solutions, we decided to go with the simple
approach: machined runs a global zombie process reaper which publishes
information about reaped zombies. Any call to `Process.Wait()` (or
`Command.Wait()`, which calls it) should be replaced with listening to the
reaper's channel for notifications about the process which
was created in this call.
There are several changes in this PR:
1. Reaper implementation itself, started from machined.
2. The process runner and `pkg/cmd` can use either regular `Command.Wait()`
or reaper notifications, depending on reaper status (running/not
running). This allows using this code outside of machined.
3. Small bug fixes in the process log which were affecting the tests.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This sets the default install image just before installation. It was
erroneously placed in the boot verification.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This cache was more important back when builds of Talos took upwards of
40 minutes. Since this is no longer the case, and I have seen
performance issues by mounting a host path into the container, I think
we should drop this.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
Increased the DHCP retry count to 6. In my testing, this worked
reliably in my setup, where the default (3) did not.
Ultimately, this should probably be configurable from the userdata.
For now, this just makes it work for me.
Fixes #1099
Signed-off-by: Seán C McCord <ulexus@gmail.com>
Fixes #1010
Wait for the containerd shim socket to be removed before running the
container a second time.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes #1018 #1020
Add more wait loops to address cases where unit tests run
extremely slowly under high load on the build machine.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Use a circular buffer instead of an (unlimited) `bytes.Buffer` to limit the
amount of stderr output captured. If the command being run produces too much
output on stderr, capturing all of it might consume too much RAM.
Use `pkg/cmd` to run the command in the `udevd` service. This should allow
easier udevd integration.
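A circular buffer that retains only the last N bytes can be sketched as an `io.Writer`, which plugs directly into `exec.Cmd.Stderr`. This is a minimal illustration, not the `pkg/cmd` implementation; `RingBuffer`, `NewRingBuffer`, and the 8-byte size in the demo are hypothetical choices.

```go
package main

import (
	"fmt"
	"os/exec"
)

// RingBuffer keeps only the most recent len(buf) bytes written to it.
// It is not safe for concurrent use by multiple writers.
type RingBuffer struct {
	buf  []byte
	pos  int  // next write position
	full bool // true once the buffer has wrapped at least once
}

// NewRingBuffer allocates a buffer that retains at most size bytes.
func NewRingBuffer(size int) *RingBuffer { return &RingBuffer{buf: make([]byte, size)} }

// Write implements io.Writer. It never fails and always reports len(p)
// written, so the child process is never blocked by a full capture buffer.
func (r *RingBuffer) Write(p []byte) (int, error) {
	n := len(p)
	if len(r.buf) == 0 {
		return n, nil
	}
	if len(p) >= len(r.buf) {
		// Only the tail of p can survive; it fills the whole buffer.
		copy(r.buf, p[len(p)-len(r.buf):])
		r.pos = 0
		r.full = true
		return n, nil
	}
	for len(p) > 0 {
		c := copy(r.buf[r.pos:], p)
		p = p[c:]
		r.pos += c
		if r.pos == len(r.buf) {
			r.pos = 0
			r.full = true
		}
	}
	return n, nil
}

// Bytes returns a copy of the retained data, oldest byte first.
func (r *RingBuffer) Bytes() []byte {
	if !r.full {
		return append([]byte(nil), r.buf[:r.pos]...)
	}
	out := make([]byte, 0, len(r.buf))
	out = append(out, r.buf[r.pos:]...)
	return append(out, r.buf[:r.pos]...)
}

func main() {
	rb := NewRingBuffer(8)
	cmd := exec.Command("/bin/sh", "-c", "echo 0123456789abcdef >&2")
	cmd.Stderr = rb // only the last 8 bytes of stderr are kept
	_ = cmd.Run()
	fmt.Printf("%q\n", rb.Bytes())
}
```

Keeping the tail rather than the head is the useful choice for error reporting, since the final lines of stderr usually carry the actual failure.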
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
By default, Go caches unit test results via the build cache, so if the source
code doesn't change, test results are cached at the package level.
As our unit tests are not that pure and depend on the environment, it
is more helpful to make sure all the unit tests run during each build.
Setting the number of test runs to one (`go test -count=1`) disables the test
result cache (the build cache is still used).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Since the command's name is 'generate', the 'gen' prefix is not needed
in the version flag. The flag is scoped under the generate command so
it should be very clear that the '--version' flag is used to control the
config version.
We also move to defaulting to v0 since v1 is new and still needs to be
tested in the real world. We can default to v1 in the next release.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This step is based on the `golang` image, so `GOCACHE` is set in a slightly
different way.
No big deal, but should speed up subsequent runs a bit.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
If multiple interfaces exist on a node, but the first interface was unsuccessful
in getting a DHCP response, we would segfault when trying to retrieve the hostname
for that interface. This was due to d.Ack being nil and us having no guard around it.
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
- Use the az/gcloud CLIs bundled with the container
- Use consistent spacing in scripts (2 spaces vs. tab)
- Updated count functions to handle the count inline
- Made platform kubeconfig the default
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
Since the cluster create command and the upgrade command shared a common
variable, and the upgrade defaults to an empty string, we get an invalid
reference format error when attempting to create a cluster. This makes
the variables unique to avoid that.
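The failure mode is easy to reproduce with Go's standard `flag` package, used here purely as a stand-in for the CLI's flag library: registering a flag writes its default into the bound variable immediately, so a second command sharing that variable overwrites the first command's default. The image reference and the function names below are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
)

// Placeholder default image reference (hypothetical).
const defaultImage = "example.com/talos/installer:latest"

// sharedVariable reproduces the bug: both commands bind the SAME variable.
// StringVar stores the default at registration time, so upgrade's ""
// clobbers create's default before any arguments are parsed.
func sharedVariable() string {
	var image string
	create := flag.NewFlagSet("create", flag.ContinueOnError)
	create.StringVar(&image, "image", defaultImage, "image to boot")
	upgrade := flag.NewFlagSet("upgrade", flag.ContinueOnError)
	upgrade.StringVar(&image, "image", "", "image to upgrade to")
	_ = create.Parse(nil) // run "create" with no flags
	return image
}

// separateVariables is the fix: each command owns its variable.
func separateVariables() (string, string) {
	var createImage, upgradeImage string
	create := flag.NewFlagSet("create", flag.ContinueOnError)
	create.StringVar(&createImage, "image", defaultImage, "image to boot")
	upgrade := flag.NewFlagSet("upgrade", flag.ContinueOnError)
	upgrade.StringVar(&upgradeImage, "image", "", "image to upgrade to")
	_ = create.Parse(nil)
	return createImage, upgradeImage
}

func main() {
	fmt.Printf("shared variable: %q\n", sharedVariable())
	c, u := separateVariables()
	fmt.Printf("separate variables: create=%q upgrade=%q\n", c, u)
}
```

The empty string left behind in the shared case is what then fails image reference parsing with an "invalid reference format" error.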
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
We have this flag missing in a number of places. This ensures that all
commands in the future will have this flag. A potential cleanup would
be to hide this flag in commands where it does not make sense. For now I
think it's best to have it everywhere.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
We have no need for this anymore since installs and upgrades are now
completely handled in a container.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This PR will upgrade to the latest beta of v1.16 in order to get us
closer to catching the v1.16.0 release as soon as it drops.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
We've seen some instances where the initial delay is not long enough (containerd).
In addition, a period of one second inflates the log size for services like
proxyd, which log incoming connections.
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
This PR will implement the v1 machine config proposal. This will allow
for a streamlined config for Talos nodes.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
We need to support eventing with associated data. This moves the event
bus to an observer design pattern that allows observers to register for
specific events, and to receive the associated data.
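An observer-style bus where observers register for specific event types and receive each event's data can be sketched as below. `EventType`, its constants, `Event`, `Observer`, `Bus`, and `recorder` are hypothetical names for illustration; the real event types and payloads are not described here.

```go
package main

import (
	"fmt"
	"sync"
)

// EventType identifies a kind of event (values are hypothetical).
type EventType int

const (
	Shutdown EventType = iota
	Reboot
	Upgrade
)

// Event carries its type plus arbitrary associated data.
type Event struct {
	Type EventType
	Data interface{}
}

// Observer is notified of the events it registered for.
type Observer interface {
	OnEvent(Event)
}

// Bus dispatches each event only to observers registered for its type.
type Bus struct {
	mu        sync.RWMutex
	observers map[EventType][]Observer
}

func NewBus() *Bus { return &Bus{observers: map[EventType][]Observer{}} }

// Register subscribes o to the given event types.
func (b *Bus) Register(o Observer, types ...EventType) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, t := range types {
		b.observers[t] = append(b.observers[t], o)
	}
}

// Notify delivers e synchronously. The observer list is copied first so
// observers may call Register from inside OnEvent without deadlocking.
func (b *Bus) Notify(e Event) {
	b.mu.RLock()
	obs := append([]Observer(nil), b.observers[e.Type]...)
	b.mu.RUnlock()
	for _, o := range obs {
		o.OnEvent(e)
	}
}

// recorder is an example observer that keeps what it receives.
type recorder struct{ got []Event }

func (r *recorder) OnEvent(e Event) { r.got = append(r.got, e) }

func main() {
	bus := NewBus()
	rec := &recorder{}
	bus.Register(rec, Upgrade)
	bus.Notify(Event{Type: Reboot})                                    // not delivered
	bus.Notify(Event{Type: Upgrade, Data: "installer image reference"}) // delivered with data
	fmt.Println(len(rec.got), rec.got[0].Data)
}
```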
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This PR will move to using the external kubeadm v1beta2 structs for our
code base. This will hopefully allow for more stable integrations with
kubeadm in the long term, as well as solve some needs we have in the
machine config rewrite.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>