When running with cgroupsv2 in our deeply nested CI environment, we
need to take extra steps to make sure the tests work correctly.
Some tests were disabled under cgroupsv2, as I couldn't make them work.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
The structure of the controllers is very similar to the address and
route controllers:
* `LinkSpec` resource describes desired link state
* `LinkConfig` controller generates `LinkSpecs` based on machine
configuration and kernel cmdline
* `LinkMerge` controller merges multiple configuration sources into a
single `LinkSpec` paying attention to the config layer priority
* `LinkSpec` controller applies the specs to the kernel state
The `LinkStatus` controller (implemented earlier) watches the
kernel state and publishes the current link status.
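To make the layer-priority rule concrete, here is a minimal Go sketch of the merge step. The `ConfigLayer` values, the trimmed-down `LinkSpec` struct and `mergeLinkSpecs` are hypothetical stand-ins, not the actual Talos resource types; the only idea taken from the text is that, for each link, the spec from the highest-priority layer wins.

```go
package main

import (
	"fmt"
	"sort"
)

// ConfigLayer is a hypothetical priority ordering of configuration sources:
// a higher value wins when two sources describe the same link.
type ConfigLayer int

const (
	LayerDefault ConfigLayer = iota
	LayerCmdline
	LayerMachineConfig
)

// LinkSpec is a simplified, hypothetical stand-in for the `LinkSpec` resource.
type LinkSpec struct {
	Name  string
	Up    bool
	MTU   uint32
	Layer ConfigLayer
}

// mergeLinkSpecs collapses specs from several layers into a single spec per
// link name, keeping the one from the highest-priority layer.
func mergeLinkSpecs(specs []LinkSpec) []LinkSpec {
	merged := map[string]LinkSpec{}

	for _, spec := range specs {
		if existing, ok := merged[spec.Name]; !ok || spec.Layer > existing.Layer {
			merged[spec.Name] = spec
		}
	}

	result := make([]LinkSpec, 0, len(merged))
	for _, spec := range merged {
		result = append(result, spec)
	}

	// Sort by name for deterministic output.
	sort.Slice(result, func(i, j int) bool { return result[i].Name < result[j].Name })

	return result
}

func main() {
	specs := []LinkSpec{
		{Name: "eth0", Up: true, MTU: 1500, Layer: LayerDefault},
		{Name: "eth0", Up: true, MTU: 9000, Layer: LayerMachineConfig},
		{Name: "eth1", Up: false, MTU: 1500, Layer: LayerCmdline},
	}

	for _, spec := range mergeLinkSpecs(specs) {
		fmt.Printf("%s up=%v mtu=%d\n", spec.Name, spec.Up, spec.MTU)
	}
}
```

A map keyed by link name keeps the merge linear in the number of specs; ties between equal layers keep the first spec seen.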
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes #3538
See also talos-systems/pkgs#276
As the new containerd is now a Go module, importing it in `go.mod`
pulls in many more dependencies, so I replaced the reference in
`pkg/machinery/` to the `containerd` volume constant with a plain value
to avoid pulling Kubernetes dependencies into `pkg/machinery`.
Also updates the kernel to include PR talos-systems/pkgs#275 for AES-NI
support.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This removes container images for the aforementioned services; they
are now built into the `machined` executable, which launches the
appropriate service based on `argv[0]`.
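Dispatching on `argv[0]` is the usual multi-call binary pattern. A minimal sketch, assuming hypothetical service names (`service-a`, `service-b`) and assuming that any other name falls back to running `machined` itself:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// entrypoints maps a binary name to the code it should run. The service
// names are hypothetical placeholders, not the actual list built into machined.
var entrypoints = map[string]func(){
	"machined":  func() { fmt.Println("running machined") },
	"service-a": func() { fmt.Println("running service-a") },
	"service-b": func() { fmt.Println("running service-b") },
}

// dispatch picks the entrypoint from the base name of argv[0], so the same
// executable behaves differently depending on the name it was invoked under.
func dispatch(argv0 string) string {
	name := filepath.Base(argv0)

	if _, ok := entrypoints[name]; ok {
		return name
	}

	// Assumption: any other name runs machined itself.
	return "machined"
}

func main() {
	entrypoints[dispatch(os.Args[0])]()
}
```

With this layout, each container rootfs only needs the single executable, placed under the service's name.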
Containers are started with a rootfs directory that contains only a
single executable file for the service.
A squashfs-backed rootfs is created for each container in
`/opt/<container>`.
Service `networkd` is not touched as it's handled in #3350.
This removes all the image imports, snapshots and other things which
were associated with the existing way to run containers.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This fixes a problem where Talos pulls the `etcd` image on every
reboot, as `etcd` was running in the system containerd, which is
completely ephemeral (backed by `tmpfs`).
Also skip pulling if the image is already present and unpacked (the
same fix is applied to the `kubelet` image).
Fixes #3229
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This fixes the output of `talosctl containers` to show failed/exited
containers, so that it's possible to see e.g. the `kube-apiserver`
container when it fails to start. This also enables using the ID from
the container list to see logs of failing containers, which makes it
easy to debug issues when control plane pods don't start because of
wrong configuration.
Also remove the option to choose between the CRI and containerd
inspectors: containerd is now used for the system namespace and CRI for
the Kubernetes namespace.
The only side effect is that the `kubelet` container is no longer
visible in the output of `talosctl containers -k`, but `kubelet` itself
is available via `talosctl services` and `talosctl logs kubelet`.
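Choosing the inspector from the namespace, rather than from a user option, could look like this. The namespace names (`system`, `k8s.io`) and `inspectorFor` are assumptions for illustration:

```go
package main

import "fmt"

// Assumed namespace names: "system" for Talos system containers and
// "k8s.io" for containers created through the CRI plugin.
const (
	systemNamespace = "system"
	criNamespace    = "k8s.io"
)

// inspectorFor derives the inspector backend from the namespace alone.
func inspectorFor(namespace string) (string, error) {
	switch namespace {
	case systemNamespace:
		return "containerd", nil
	case criNamespace:
		return "cri", nil
	default:
		return "", fmt.Errorf("no inspector for namespace %q", namespace)
	}
}

func main() {
	for _, ns := range []string{systemNamespace, criNamespace} {
		backend, _ := inspectorFor(ns)
		fmt.Println(ns, "->", backend)
	}
}
```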
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Instead of running `PreFunc` in a goroutine, which might leak beyond
the lifetime of the service, add more clauses to correctly abort the
sequence when the context is canceled.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes were applied automatically.
Import ordering might be questionable, but it's strict:
* stdlib
* other packages
* same package imports
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This bug is sometimes reproducible with QEMU/arm64, as it runs really
slowly. It looks like multiple concurrent image unpacks that share some
layers might fail unexpectedly.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This uses a go-retry feature
(https://github.com/talos-systems/go-retry/pull/3) to print the errors
being retried.
If the image is not found in the index, abort retries immediately.
Don't pull the installer image twice (if it was already pulled by the
validation code).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This moves `pkg/config`, `pkg/client` and `pkg/constants`
under the `pkg/machinery` umbrella.
`pkg/machinery` is published as a Go module inside the Talos repository.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Package `pkg/crypto` was extracted as `github.com/talos-systems/crypto`
repository and Go module.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This change is only moving packages and updating import paths.
Goal: expose `internal/pkg/provision` as `pkg/provision` to enable other
projects to import Talos provisioning library.
As cluster checks are almost always required as part of provisioning
process, package `internal/pkg/cluster` was also made public as
`pkg/cluster`.
The remaining changes are direct dependencies, discovered by `importvet`,
that were updated accordingly.
Public packages (useful, general purpose packages with stable API):
* `internal/pkg/conditions` -> `pkg/conditions`
* `internal/pkg/tail` -> `pkg/tail`
Private packages (used only internally by the provisioning library):
* `internal/pkg/inmemhttp` -> `pkg/provision/internal/inmemhttp`
* `internal/pkg/kernel/vmlinuz` -> `pkg/provision/internal/vmlinuz`
* `internal/pkg/cniutils` -> `pkg/provision/internal/cniutils`
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This makes the config provider a pure interface definition by
replacing all concrete internal types with interfaces.
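A pure interface definition means accessors return interfaces rather than concrete structs. A minimal sketch with hypothetical, heavily trimmed interfaces (`Provider`, `MachineConfig`, `ClusterConfig`) and a private `v1*` implementation:

```go
package main

import "fmt"

// Hypothetical, heavily trimmed interfaces: every accessor returns an
// interface, so no concrete type leaks out of the implementation.
type Provider interface {
	Machine() MachineConfig
	Cluster() ClusterConfig
}

type MachineConfig interface {
	Type() string
}

type ClusterConfig interface {
	Name() string
}

// Concrete types stay private to the implementing package.
type v1Machine struct{ typ string }

func (m v1Machine) Type() string { return m.typ }

type v1Cluster struct{ name string }

func (c v1Cluster) Name() string { return c.name }

type v1Config struct {
	machine v1Machine
	cluster v1Cluster
}

func (c v1Config) Machine() MachineConfig { return c.machine }
func (c v1Config) Cluster() ClusterConfig { return c.cluster }

// describe depends only on the interfaces, never on v1Config itself.
func describe(p Provider) string {
	return fmt.Sprintf("%s node in cluster %s", p.Machine().Type(), p.Cluster().Name())
}

func main() {
	fmt.Println(describe(v1Config{v1Machine{"controlplane"}, v1Cluster{"demo"}}))
}
```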
Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
This makes `pkg/config` directly importable from other projects.
There should be no functional changes.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes #2272
`gofumpt` is now included in `golangci-lint`, but `gofumports` is
not, so we keep using it as a separate binary, keeping its version in
sync with `golangci-lint`.
This contains fixes from:
* `gofumpt` (automated, mostly around octal constants)
* `exhaustive` in `switch` statements
* `noctx` (adding context with default timeout to http requests)
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
I had to copy over some OCI code from a newer package version, but
as we have been using the newer OCI package for a long time, the copy is
no longer needed.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
All log flows (reading and writing) were refactored to use this
`LoggingManager`. The `LoggingManager` interface should now be generic
enough to replace log handling with almost any implementation: log
rotation, sending logs to a remote destination, keeping logs in memory,
etc.
There should be no functional changes.
As part of these changes, `follow.Reader` was implemented, which
makes an appending file behave like a stream. `file.NewChunker` was
refactored to use `follow.Reader` and `stream.NewChunker` to do the
actual work. So now there is basically only a single chunker
implementation, the stream chunker, as everything is represented as a
stream.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
It looks like a goroutine launched from the suite setup might race
while calling methods that end up loading the `testing.T` value, as that
value changes while each individual test is running.
This leaves us with fewer diagnostics, but eliminates the race.
Sample:
```
WARNING: DATA RACE
Write at 0x00c00035e418 by goroutine 56:
  github.com/stretchr/testify/suite.(*Suite).SetT()
      /go/pkg/mod/github.com/stretchr/testify@v1.5.1/suite/suite.go:37 +0x12d
  github.com/talos-systems/talos/internal/pkg/containers/containerd_test.(*ContainerdSuite).SetT()
      <autogenerated>:1 +0x4d
  github.com/stretchr/testify/suite.Run.func2()
      /go/pkg/mod/github.com/stretchr/testify@v1.5.1/suite/suite.go:119 +0x10f
  testing.tRunner()
      /toolchain/go/src/testing/testing.go:991 +0x1eb

Previous read at 0x00c00035e418 by goroutine 40:
  github.com/stretchr/testify/suite.(*Suite).Require()
      /go/pkg/mod/github.com/stretchr/testify@v1.5.1/suite/suite.go:42 +0xdc
  github.com/talos-systems/talos/internal/pkg/containers/containerd_test.(*ContainerdSuite).SetupSuite.func1()
      /src/internal/pkg/containers/containerd/containerd_test.go:119 +0x101
```
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This is a rewrite of machined. It addresses some of the limitations and
complexity of the previous implementation. It introduces the idea of a
controller, which is responsible for managing the runtime, the
sequencer, and a new state type introduced in this PR.
A few highlights are:
- no more event bus
- functional approach to tasks (no more types defined for each task)
- the task function definition now offers a lot more context, like
access to raw API requests, the current sequence, a logger, the new
state interface, and the runtime interface.
- no more panics to handle reboots
- additional initialize and reboot sequences
- graceful gRPC server shutdown on critical errors
- config is now stored at install time to avoid having to download it at
install time and at boot time
- upgrades now use the local config instead of downloading it
- the upgrade API's preserve option takes precedence over the config's
install force option
Additionally, this moves various packages under machined to make the
code easier to navigate.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This PR does the following:
- updates the conform config
- cleans up conform scopes
- moves slash commands to the talos-bot
- adds a check list to the pull request template
- disables codecov comments
- uses `BOT_TOKEN` so all actions are performed as the talos-bot user
- adds a `make conformance` target to make it easy for contributors to
check their commit before creating a PR
- bumps golangci-lint to v1.24.0
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
In some places `.Close()` was clearly missing; in others I wanted
to be 200% sure it gets called on every code path.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes #1901
This is the same fix as #1680 and #1690, but applied to the image
resolver code. The default HTTP client can't be used here, as a custom
TLS client config might be set on the transport to authenticate to the
registry.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
When images are pulled by Talos or via the CRI plugin, per-registry
configuration is applied. Mirrors allow redirecting pull requests to
either a local registry or a caching registry. Auth and TLS settings
enable authentication and TLS authentication for non-public registries.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
`gomnd` was disabled, as it complains about every number used in the
code, and `wsl` has become much more thorough.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
The kubelet fails to start if the machine's hostname is not set, which
happens when networkd doesn't set it in time.
Additionally, this adds retries to container image pulls to ensure that
temporary network failures don't cause fatal errors when images can't
be pulled.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
The default is now not to follow the logs (similar to `kubectl logs`).
An integration test was added for the `Logs()` API and the `osctl logs` command.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
I can't say exactly how those conflicts happen in the tests, but I
randomized more container IDs and namespace names (both of which feed
into the final abstract unix socket path).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
The problem was that if a container fails to start, it never reaches
`StateRunning`, and the test hangs waiting for that state. An assertion
doesn't abort the whole test (it only aborts the goroutine it was
called from), so it doesn't help here.
Fix that by signalling back if any container fails to start.
This doesn't fix the root cause, but it should expose the actual
failure happening in this test.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This removes the github.com/pkg/errors package in favor of the
official error wrapping introduced in Go 1.13.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This moves to using a single revision of pkgs. It includes a few
changes:
- kernel with KVM host support
- containerd v1.3.0
This change brings in a kernel with host KVM support. This will allow
us to use VMs within Talos for things like integration tests. It also
allows users to use KVM as they see fit.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
Memory usage is reduced roughly 8-10x: it now stays stable at 1GB.
I disabled some of the new linters, and one rule that is violated a
lot.
It might make sense to go back and enable `wsl`, fixing all the issues
(leaving that for another PR).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This moves from translating a config into an internal config
representation to using an interface. The idea is that an interface
gives us stronger compile-time checks and saves us from having to copy
from one struct to another. As long as a concrete type implements the
Configurator interface, it can be used to provide instructions to Talos.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
It looks like containerd creates shim sockets in the Linux abstract
namespace at fixed paths (they don't depend on the containerd root
directory) which depend on the container namespace and ID. So if two
containerd instances on the same host run the same namespace/ID pair,
they conflict on that shim socket.
Avoid that by randomizing the namespace name. CRI tests should be fine,
as the namespace is fixed but the container ID is random.
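Randomizing the namespace can be as simple as appending a random hex suffix. `randomNamespace` and the `talostest` prefix are hypothetical:

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// randomNamespace returns prefix plus 16 random hex characters, so two test
// containerd instances on one host don't end up with the same
// namespace/ID pair (and thus the same shim socket path).
func randomNamespace(prefix string) (string, error) {
	suffix := make([]byte, 8)
	if _, err := rand.Read(suffix); err != nil {
		return "", err
	}

	return prefix + hex.EncodeToString(suffix), nil
}

func main() {
	a, _ := randomNamespace("talostest")
	b, _ := randomNamespace("talostest")

	fmt.Println(a, b, a != b)
}
```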
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>