75 Commits

Author SHA1 Message Date
Andrey Smirnov
f62d17125b
chore: update crypto to use new import path siderolabs/crypto
No functional changes in this PR, just updating import paths.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-07 23:02:50 +04:00
Andrey Smirnov
4e9c322564
fix: correctly render hosts.toml with multiple endpoints
We should preserve the order of keys in generated `hosts.toml`, but
go-toml library has no real way to do that on marshaling, so fix the
previous workaround, as it was generating invalid TOML.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-29 15:34:42 +04:00
Dmitriy Matrenichev
0fe4492e72
chore: bump golangci-lint from 1.47.2 to 1.48.0
Patch version linter upgrade.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-08-15 18:11:30 +03:00
Andrey Smirnov
f9b664c947
fix: reload trusted CA list when client is recreated
Fixes #5652

This reworks and unifies HTTP client/transport management in Talos:

* cleanhttp is used everywhere consistently
* DefaultClient is using pooled client, other clients use regular
  transport
* like before, Proxy vars are inspected on each request (but now
  consistently)
* manifest download functions now recreate the client on each run to
  pick up latest changes
* system CA list is picked up from a fixed locations, and supports
  reloading on changes

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-04 20:01:35 +04:00
Eng Zer Jun
fb058a7c92
test: use T.TempDir to create temporary test directory
This commit replaces `ioutil.TempDir` with `t.TempDir` in tests. The
directory created by `t.TempDir` is automatically removed when the test
and all its subtests complete.

Prior to this commit, temporary directory created using `ioutil.TempDir`
needs to be removed manually by calling `os.RemoveAll`, which is omitted
in some tests. The error handling boilerplate e.g.
	defer func() {
		if err := os.RemoveAll(dir); err != nil {
			t.Fatal(err)
		}
	}
is also tedious, but `t.TempDir` handles this for us nicely.

Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-03 16:31:55 +04:00
Andrey Smirnov
86a0a7bdf7
refactor: use pointer types more in machine config structs
There should be no functional change with this PR.

The primary driver is supporting strategic merge configuration patches.
For such type of patches machine config should be loaded from incomplete
fragments, so it becomes critically important to distinguish between a
field having zero value vs. field being set in YAML.

E.g. with following struct:

```go
struct { AEnabled *bool `yaml:"a"` }
```

It's possible to distinguish between:

```yaml
a: false
```

and no metion of `a` in YAML.

Merging process trewats zero values as "not set" (skips them when
merging), so it's important to allow overriding value to explicit
`false`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-01 17:27:11 +04:00
Andrey Smirnov
1973095d14
feat: update containerd to 1.6.3
This includes a fix for image pull slowness from
https://github.com/containerd/containerd/pull/6702.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-26 21:43:28 +03:00
Noel Georgi
37f868e374
fix: validate empty TLS config for registries
Validate empty TLS config for registries

Fixes: #5262

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-03-31 14:31:59 +05:30
Dmitriy Matrenichev
e06e1473b0
feat: update golangci-lint to 1.45.0 and gofumpt to 0.3.0
- Update golangci-lint to 1.45.0
- Update gofumpt to 0.3.0
- Fix gofumpt errors
- Add goimports and format imports since gofumports is removed
- Update Dockerfile
- Fix .golangci.yml configuration
- Fix linting errors

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-03-24 08:14:04 +04:00
Andrey Smirnov
d4b8445935
feat: support CRI configuration merging and reimplement registry config
Containerd doesn't support merging plugin configuration from multiple
sources, and Talos has several pieces which configure CRI plugin:
(see https://github.com/containerd/containerd/issues/5837)

* base config
* registry mirror config
* system extensions
* ...

So we implement our own simple way of merging config parts (by simply
concatenating text files) to build a final `cri.toml`.

At the same time containerd migrated to a new format to specify registry
mirror configuration, while old way (via CRI config) is going to be
removed in 1.7.0. New way also allows to apply most of registry
configuration (except for auth) on the fly.

Also, containerd was updated to 1.6.0-rc.0 and runc to 1.1.0.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-01-20 23:05:20 +03:00
Andrey Smirnov
8f3e1a4ad6
fix: drop unpacked layers from containerd image store
See https://github.com/containerd/cri/pull/1543

Fixes #4274

Fix is applied on two levels:

* for Talos-initiated pulls, update API call
* for Kubernetes-initiated pulls, update CRI plugin config

Comparison of `/var` usage before/after, as reported by
`talosctl mounts` (in GiB):

|              | before | after |
|--------------|:------:|------:|
| controlplane |  1.98  |  1.74 |
| worker       |  1.17  |  1.01 |

It's hard to measure effect on pulls to system containerd, like
`installer` image, as it's ephemeral, but it should also reduce space
usage in `tmpfs`.

Also fixes output of `talosctl mounts`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-12-06 20:41:48 +03:00
Alexey Palazhchenko
e6f90bb41a
chore: remove unused parameters
That context is not actually used.

Discovered by new golangci-lint.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
2021-11-09 12:47:35 +00:00
Andrey Smirnov
9bb0b79709
test: adapt tests to the cgroupsv2
When running with cgroupsv2 and the deeply nested nature of our CI, we
need to take extra steps to make sure tests are working fine.

Some tests were disabled under cgroupsv2 as I can't make them work.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-08-19 23:32:00 +03:00
Serge Logvinov
b9d04928d9 feat: move system processes to cgroups
* use cgroup v2
* cgroups: /init, /system, /system/runtime
* kubelet cgroup metrics

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
2021-08-04 09:00:38 -07:00
Serge Logvinov
d8602025c8 chore: update containerd config version 2
* Rename key cri -> io.containerd.grpc.v1.cri
* Disable plugins aufs,zfs,devmapper,btrfs (less warning messages on
  boot time)

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
2021-07-01 09:08:54 -07:00
Andrey Smirnov
5811f4dda1 feat: implement link (interface) controllers
The structure of the controllers is really similar to addresses and
routes:

* `LinkSpec` resource describes desired link state
* `LinkConfig` controller generates `LinkSpecs` based on machine
configuration and kernel cmdline
* `LinkMerge` controller merges multiple configuration sources into a
single `LinkSpec` paying attention to the config layer priority
* `LinkSpec` controller applies the specs to the kernel state

Controller `LinkStatus` (which was implemented before) watches the
kernel state and publishes current link status.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-06-01 09:36:25 -07:00
Andrey Smirnov
95c656fb72 feat: update containerd to 1.5.0, runc to 1.0.0-rc94
Fixes #3538

See also talos-systems/pkgs#276

As new containerd is now Go module-based, it pulls many more
dependencies if simply imported in `go.mod`, so I had to replace the
reference to the constant in `pkg/machinery/` to `containerd` volume
with simple value to avoid pulling Kubernetes dependencies into
`pkg/machinery`.

Also updates the kernel to include PR talos-systems/pkgs#275 for AES-NI
support.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-05-11 14:43:27 -07:00
Andrey Smirnov
ac8764702f refactor: move apid, routerd, timed and trustd to single executable
This removes container images for the aforementioned services, they are
now built into `machined` executable which launches one or another
service based on `argv[0]`.

Containers are started with rootfs directory which contains only a
single executable file for the service.

This creates rootfs on squashfs for each container in
`/opt/<container>`.

Service `networkd` is not touched as it's handled in #3350.

This removes all the image imports, snapshots and other things which
were associated with the existing way to run containers.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-23 09:48:11 -07:00
Artem Chernyshev
22f375300c chore: update golanci-lint to 1.38.0
Fix all discovered issues.
Detected couple bugs, fixed them as well.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-03-12 06:50:02 -08:00
Andrey Smirnov
8e57fc4f52 fix: move containerd CRI config files under /var/
Talos user files task prohibits file creation under `/etc`.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-12 02:22:17 -08:00
Alexey Palazhchenko
df52c13581 chore: fix //nolint directives
That's the recommended syntax:
https://golangci-lint.run/usage/false-positives/

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-03-05 05:58:33 -08:00
Andrey Smirnov
561f8aa15e fix: move etcd to cri containerd runner
This fixes a problem when Talos pulls `etcd` image one every reboot, as
`etcd` was running in the system containerd which is completely
ephemeral (backed by `tmpfs`).

Also skip pulling if image is already present and unpacked (same fix for
the `kubelet` image).

Fixes #3229

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-02 07:58:07 -08:00
Andrey Smirnov
c7ee239087 fix: show stopped/exited containers via CRI inspector
This fixes output of `talosctl containers` to show failed/exited
containers so that it's possible to see e.g. `kube-apiserver` container
when it fails to start. This also enables using ID from the container
list to see logs of failing containers, so it's easy to debug issues
when control plane pods don't start because of wrong configuration.

Also remove option to use either CRI or containerd inspector, default to
containerd for system namespace and to CRI for kubernetes namespace.

The only side effect is that we can't see `kubelet` container in the
output of `talosctl containers -k`, but `kubelet` itself is available in
`talosctl services` and `talosctl logs kubelet`.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-26 14:45:13 -08:00
Andrey Smirnov
b2f6ce65ef refactor: remove setup goroutine in etcd service
Instead of running `PreFunc` in goroutine which might leak behind the
lifetime of the service `PreFunc`, add more clauses to correctly abort
sequence on context canceled.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-13 11:27:37 -08:00
Andrey Smirnov
a2efa44663 chore: enable gci linter
Fixes were applied automatically.

Import ordering might be questionable, but it's strict:

* stdlib
* other packages
* same package imports

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-09 08:09:48 -08:00
Andrey Smirnov
8560fb9662 chore: enable nlreturn linter
Most of the fixes were automatically applied.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-09 06:48:07 -08:00
Andrew Rynhard
562ab1d572 chore: update golangci-lint
Brings in the latest version of golangci-lint and addresses errors.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-02 20:34:05 -08:00
Andrey Smirnov
98443cd0e9 fix: retry container image import
This bug is sometimes reproducible with QEMU/arm64, as it runs really
slow. Looks like multiple concurrent image unpacks sharing some layers
might fail unexpectedly.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-28 08:58:47 -07:00
Andrey Smirnov
8236822c90 fix: retry image pulling, stop on 404, no duplicate pulls
This uses go-retry feature
(https://github.com/talos-systems/go-retry/pull/3) to print errors being
retried.

If image is not found in the index, abort retries immediately.

Don't pull installer image twice (if already pulled by the validation
code before).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-22 07:07:45 -07:00
Andrey Smirnov
f6ecf000c9 refactor: extract packages loadbalancer and retry
This removes in-tree packages in favor of:

* github.com/talos-systems/go-retry
* github.com/talos-systems/go-loadbalancer

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-02 13:46:22 -07:00
Andrey Smirnov
bddd4f1bf6 refactor: move external API packages into machinery/
This moves `pkg/config`, `pkg/client` and `pkg/constants`
under `pkg/machinery` umbrella.

And `pkg/machinery` is published as Go module inside Talos repository.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-17 09:56:14 -07:00
Andrey Smirnov
52c5911fcd chore: extract pkg/crypto as external module
Package `pkg/crypto` was extracted as `github.com/talos-systems/crypto`
repository and Go module.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-14 06:33:30 -07:00
Andrey Smirnov
9379cf9ee1 refactor: expose provision as public package
This change is only moving packages and updating import paths.

Goal: expose `internal/pkg/provision` as `pkg/provision` to enable other
projects to import Talos provisioning library.

As cluster checks are almost always required as part of provisioning
process, package `internal/pkg/cluster` was also made public as
`pkg/cluster`.

Other changes were direct dependencies discovered by `importvet` which
were updated.

Public packages (useful, general purpose packages with stable API):

* `internal/pkg/conditions` -> `pkg/conditions`
* `internal/pkg/tail` -> `pkg/tail`

Private packages (used only on provisioning library internally):

* `internal/pkg/inmemhttp` -> `pkg/provision/internal/inmemhttp`
* `internal/pkg/kernel/vmlinuz` -> `pkg/provision/internal/vmlinuz`
* `internal/pkg/cniutils` -> `pkg/provision/internal/cniutils`

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-12 05:12:05 -07:00
Andrew Rynhard
92523bc422 refactor: remove structs from config provider
This make the config provider a pure interface definition by removing
all concrete internal types, and making them an interface.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-08-06 13:21:41 -07:00
Andrey Smirnov
47608fb874 refactor: make pkg/config not rely on machined/../internal/runtime
This makes `pkg/config` directly importable from other projects.

There should be no functional changes.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-29 12:40:12 -07:00
Andrey Smirnov
41d5f7859a chore: update golangci-lint to 1.28.3
Fixes #2272

`gofumpt` is now included into `golangci-lint`, but not the
`gofumports`, so we keep it using it as separate binary, but we keep
versions in sync with `golangci-lint`.

This contains fixes from:

* `gofumpt` (automated, mostly around octal constants)
* `exhaustive` in `switch` statements
* `noctx` (adding context with default timeout to http requests)

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-16 08:05:42 -07:00
Andrey Smirnov
219425f629 test: resolve old TODO item
I had to copy over some oci stuff from newer package version, but as we
for a long time use newer oic, we don't need a copy anymore.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-02 11:09:58 -07:00
Andrey Smirnov
81d1c2bfe7 chore: enable godot linter
Issues were fixed automatically.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-06-30 10:39:56 -07:00
Andrey Smirnov
4ad4511b38 chore: enable nolintlint linter
It makes sure our `//nolint:` directives are not redundant.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-06-30 07:39:19 -07:00
Andrey Smirnov
a9766d31bc refactor: implement LoggingManager as central log flow processor
Using this `LoggingManager` all the log flows (reading and writing) were
refactored. Inteface of `LoggingManager` should be now generic enough to
replace log handling with almost any implementation - log rotation,
sending logs to remote destination, keeping logs in memory, etc.

There should be no functional changes.

As part of changes, `follow.Reader` was implemented which makes
appending file feel like a stream. `file.NewChunker` was refactored to
use `follow.Reader` and `stream.NewChunker` to do the actual work. So
basically now we have only a single instance of chunker - stream
chunker, as everything is represented as a stream.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-06-10 14:30:36 -07:00
Andrey Smirnov
c06095f904 test: fix race in some tests caused by SetT
Looks like goroutine launched from suite setup might have a race while
trying to access methods which in the end try to load `testing.T` value,
as it changes while each individual test is running.

This leaves us with less diagnostics, but eliminates the race.

Sample:

```
WARNING: DATA RACE
Write at 0x00c00035e418 by goroutine 56:
  github.com/stretchr/testify/suite.(*Suite).SetT()
        /go/pkg/mod/github.com/stretchr/testify@v1.5.1/suite/suite.go:37
        +0x12d
          github.com/talos-systems/talos/internal/pkg/containers/containerd_test.(*ContainerdSuite).SetT()
        <autogenerated>:1 +0x4d
          github.com/stretchr/testify/suite.Run.func2()
        /go/pkg/mod/github.com/stretchr/testify@v1.5.1/suite/suite.go:119
        +0x10f
          testing.tRunner()
        /toolchain/go/src/testing/testing.go:991 +0x1eb

        Previous read at 0x00c00035e418 by goroutine 40:
          github.com/stretchr/testify/suite.(*Suite).Require()
        /go/pkg/mod/github.com/stretchr/testify@v1.5.1/suite/suite.go:42
        +0xdc
          github.com/talos-systems/talos/internal/pkg/containers/containerd_test.(*ContainerdSuite).SetupSuite.func1()
        /src/internal/pkg/containers/containerd/containerd_test.go:119
        +0x101
```

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-06-09 15:22:58 -07:00
Andrew Rynhard
49307d554d refactor: improve machined
This is a rewrite of machined. It addresses some of the limitations and
complexity in the implementation. This introduces the idea of a
controller. A controller is responsible for managing the runtime, the
sequencer, and a new state type introduced in this PR.

A few highlights are:

- no more event bus
- functional approach to tasks (no more types defined for each task)
  - the task function definition now offers a lot more context, like
    access to raw API requests, the current sequence, a logger, the new
    state interface, and the runtime interface.
- no more panics to handle reboots
- additional initialize and reboot sequences
- graceful gRPC server shutdown on critical errors
- config is now stored at install time to avoid having to download it at
  install time and at boot time
- upgrades now use the local config instead of downloading it
- the upgrade API's preserve option takes precedence over the config's
  install force option

Additionally, this pulls various packes in under machined to make the
code easier to navigate.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-04-28 08:20:55 -07:00
Andrew Rynhard
a10acd592a chore: address random CI nits
This PR does the following:

- updates the conform config
- cleans up conform scopes
- moves slash commands to the talos-bot
- adds a check list to the pull request template
- disables codecov comments
- uses `BOT_TOKEN` so all actions are performed as the talos-bot user
- adds a `make conformance` target to make it easy for contributors to
check their commit before creating a PR
- bumps golangci-lint to v1.24.0

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-04-13 13:01:14 -07:00
Andrey Smirnov
5255883034 fix: make sure Close() is called on every path
For some places `.Close()` was clearly missing, for some of them I wanted
to be 200% sure it gets called on every code path.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-04-03 19:16:01 -04:00
Andrey Smirnov
cafd33acd8 fix: refresh proxy settings from environment in image resolver
Fixes #1901

This is same fix as #1680, #1690, but applied to image resolver code.
Default HTTP client can't be used here, as custom TLS client config
might be set on the transport to authenticate to the registry.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-20 09:46:08 -05:00
Andrey Smirnov
e1779ac77c feat: implement registry mirror & config for image pull
When images are pulled by Talos or via CRI plugin, configuration
for each registry is applied. Mirrors allow to redirect pull request to
either local registry or cached registry. Auth & TLS enable
authentication and TLS authentication for non-public registries.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-14 00:28:59 +03:00
Andrey Smirnov
01d696ed10 chore: update golangci-lint-1.23.3
`gomnd` disabled, as it complains about every number used in the code,
and `wsl` became much more thorough.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-04 08:56:39 -08:00
Andrew Rynhard
28782c2d46 fix: stop race condition between kubelet and networkd
The kubelet fails to start if a machine's hostname is not set. If
networkd doesn't set it in time, the kubelet service fails to start.
Addionally, this adds retries to container pulls to ensure that any
temporary network failures don't cause fatal errors if we can't pull
images.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-01-20 10:52:53 -05:00
Andrey Smirnov
6e05dd70c4 feat: add support for tailing logs
Fixes #1564

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-17 22:35:47 +03:00
Andrey Smirnov
edb40437ec feat: add support for osctl logs -f
Now default is not to follow the logs (which is similar to `kubectl logs`).

Integration test was added for `Logs()` API and `osctl logs` command.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-05 13:58:52 -08:00