51 Commits

Author SHA1 Message Date
Andrey Smirnov
a2efa44663 chore: enable gci linter
Fixes were applied automatically.

Import ordering might be questionable, but it's strict:

* stdlib
* other packages
* same package imports
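
The three groups above can be illustrated with a small classifier. This is only a sketch under simplifying assumptions: the `classify` helper and its dot-in-first-element heuristic are hypothetical and are not how `gci` itself is implemented.

```go
package main

import (
	"fmt"
	"strings"
)

// importGroup mirrors the three groups listed in the commit message.
type importGroup int

const (
	groupStdlib importGroup = iota // e.g. "fmt", "net/http"
	groupOther                     // third-party modules
	groupLocal                     // imports from the project itself
)

var groupNames = [...]string{"stdlib", "other", "local"}

// classify is a simplified heuristic (an assumption, not gci's algorithm):
// a path under localPrefix is local, a path whose first element contains a
// dot is treated as third-party, and everything else is assumed to be stdlib.
func classify(path, localPrefix string) importGroup {
	if path == localPrefix || strings.HasPrefix(path, localPrefix+"/") {
		return groupLocal
	}

	first := strings.SplitN(path, "/", 2)[0]
	if strings.Contains(first, ".") {
		return groupOther
	}

	return groupStdlib
}

func main() {
	const local = "github.com/talos-systems/talos"

	for _, p := range []string{"fmt", "github.com/stretchr/testify/suite", local + "/pkg/config"} {
		fmt.Println(p, groupNames[classify(p, local)])
	}
}
```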

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-09 08:09:48 -08:00
Andrey Smirnov
8560fb9662 chore: enable nlreturn linter
Most of the fixes were automatically applied.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-09 06:48:07 -08:00
Andrew Rynhard
562ab1d572 chore: update golangci-lint
Brings in the latest version of golangci-lint and addresses errors.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-02 20:34:05 -08:00
Andrey Smirnov
98443cd0e9 fix: retry container image import
This bug is sometimes reproducible with QEMU/arm64, as it runs really
slowly. It looks like multiple concurrent image unpacks sharing some
layers might fail unexpectedly.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-28 08:58:47 -07:00
Andrey Smirnov
8236822c90 fix: retry image pulling, stop on 404, no duplicate pulls
This uses a go-retry feature
(https://github.com/talos-systems/go-retry/pull/3) to print the errors
being retried.

If the image is not found in the index, abort retries immediately.

Don't pull the installer image twice (if it was already pulled by the
validation code before).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-22 07:07:45 -07:00
Andrey Smirnov
f6ecf000c9 refactor: extract packages loadbalancer and retry
This removes in-tree packages in favor of:

* github.com/talos-systems/go-retry
* github.com/talos-systems/go-loadbalancer

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-02 13:46:22 -07:00
Andrey Smirnov
bddd4f1bf6 refactor: move external API packages into machinery/
This moves `pkg/config`, `pkg/client` and `pkg/constants`
under `pkg/machinery` umbrella.

And `pkg/machinery` is published as Go module inside Talos repository.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-17 09:56:14 -07:00
Andrey Smirnov
52c5911fcd chore: extract pkg/crypto as external module
Package `pkg/crypto` was extracted as `github.com/talos-systems/crypto`
repository and Go module.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-14 06:33:30 -07:00
Andrey Smirnov
9379cf9ee1 refactor: expose provision as public package
This change is only moving packages and updating import paths.

Goal: expose `internal/pkg/provision` as `pkg/provision` to enable other
projects to import Talos provisioning library.

As cluster checks are almost always required as part of provisioning
process, package `internal/pkg/cluster` was also made public as
`pkg/cluster`.

Other changes were direct dependencies discovered by `importvet` which
were updated.

Public packages (useful, general purpose packages with stable API):

* `internal/pkg/conditions` -> `pkg/conditions`
* `internal/pkg/tail` -> `pkg/tail`

Private packages (used only on provisioning library internally):

* `internal/pkg/inmemhttp` -> `pkg/provision/internal/inmemhttp`
* `internal/pkg/kernel/vmlinuz` -> `pkg/provision/internal/vmlinuz`
* `internal/pkg/cniutils` -> `pkg/provision/internal/cniutils`

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-12 05:12:05 -07:00
Andrew Rynhard
92523bc422 refactor: remove structs from config provider
This makes the config provider a pure interface definition by removing
all concrete internal types and making them interfaces.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-08-06 13:21:41 -07:00
Andrey Smirnov
47608fb874 refactor: make pkg/config not rely on machined/../internal/runtime
This makes `pkg/config` directly importable from other projects.

There should be no functional changes.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-29 12:40:12 -07:00
Andrey Smirnov
41d5f7859a chore: update golangci-lint to 1.28.3
Fixes #2272

`gofumpt` is now included in `golangci-lint`, but `gofumports` is not,
so we keep using it as a separate binary and keep its version in sync
with `golangci-lint`.

This contains fixes from:

* `gofumpt` (automated, mostly around octal constants)
* `exhaustive` in `switch` statements
* `noctx` (adding context with default timeout to http requests)

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-16 08:05:42 -07:00
Andrey Smirnov
219425f629 test: resolve old TODO item
I had to copy over some oci stuff from a newer package version, but as we
have been using the newer oci for a long time, we don't need the copy anymore.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-02 11:09:58 -07:00
Andrey Smirnov
81d1c2bfe7 chore: enable godot linter
Issues were fixed automatically.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-06-30 10:39:56 -07:00
Andrey Smirnov
4ad4511b38 chore: enable nolintlint linter
It makes sure our `//nolint:` directives are not redundant.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-06-30 07:39:19 -07:00
Andrey Smirnov
a9766d31bc refactor: implement LoggingManager as central log flow processor
Using this `LoggingManager` all the log flows (reading and writing) were
refactored. The interface of `LoggingManager` should now be generic enough to
replace log handling with almost any implementation - log rotation,
sending logs to remote destination, keeping logs in memory, etc.

There should be no functional changes.

As part of changes, `follow.Reader` was implemented which makes
appending file feel like a stream. `file.NewChunker` was refactored to
use `follow.Reader` and `stream.NewChunker` to do the actual work. So
basically now we have only a single instance of chunker - stream
chunker, as everything is represented as a stream.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-06-10 14:30:36 -07:00
Andrey Smirnov
c06095f904 test: fix race in some tests caused by SetT
Looks like goroutine launched from suite setup might have a race while
trying to access methods which in the end try to load `testing.T` value,
as it changes while each individual test is running.

This leaves us with less diagnostics, but eliminates the race.

Sample:

```
WARNING: DATA RACE
Write at 0x00c00035e418 by goroutine 56:
  github.com/stretchr/testify/suite.(*Suite).SetT()
      /go/pkg/mod/github.com/stretchr/testify@v1.5.1/suite/suite.go:37 +0x12d
  github.com/talos-systems/talos/internal/pkg/containers/containerd_test.(*ContainerdSuite).SetT()
      <autogenerated>:1 +0x4d
  github.com/stretchr/testify/suite.Run.func2()
      /go/pkg/mod/github.com/stretchr/testify@v1.5.1/suite/suite.go:119 +0x10f
  testing.tRunner()
      /toolchain/go/src/testing/testing.go:991 +0x1eb

Previous read at 0x00c00035e418 by goroutine 40:
  github.com/stretchr/testify/suite.(*Suite).Require()
      /go/pkg/mod/github.com/stretchr/testify@v1.5.1/suite/suite.go:42 +0xdc
  github.com/talos-systems/talos/internal/pkg/containers/containerd_test.(*ContainerdSuite).SetupSuite.func1()
      /src/internal/pkg/containers/containerd/containerd_test.go:119 +0x101
```

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-06-09 15:22:58 -07:00
Andrew Rynhard
49307d554d refactor: improve machined
This is a rewrite of machined. It addresses some of the limitations and
complexity in the implementation. This introduces the idea of a
controller. A controller is responsible for managing the runtime, the
sequencer, and a new state type introduced in this PR.

A few highlights are:

- no more event bus
- functional approach to tasks (no more types defined for each task)
  - the task function definition now offers a lot more context, like
    access to raw API requests, the current sequence, a logger, the new
    state interface, and the runtime interface.
- no more panics to handle reboots
- additional initialize and reboot sequences
- graceful gRPC server shutdown on critical errors
- config is now stored at install time to avoid having to download it at
  install time and at boot time
- upgrades now use the local config instead of downloading it
- the upgrade API's preserve option takes precedence over the config's
  install force option

Additionally, this pulls various packages in under machined to make the
code easier to navigate.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-04-28 08:20:55 -07:00
Andrew Rynhard
a10acd592a chore: address random CI nits
This PR does the following:

- updates the conform config
- cleans up conform scopes
- moves slash commands to the talos-bot
- adds a check list to the pull request template
- disables codecov comments
- uses `BOT_TOKEN` so all actions are performed as the talos-bot user
- adds a `make conformance` target to make it easy for contributors to
check their commit before creating a PR
- bumps golangci-lint to v1.24.0

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-04-13 13:01:14 -07:00
Andrey Smirnov
5255883034 fix: make sure Close() is called on every path
For some places `.Close()` was clearly missing, for some of them I wanted
to be 200% sure it gets called on every code path.
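
One common way to guarantee `Close()` on every path is a `defer` placed right after the resource is acquired. The sketch below uses a hypothetical `resource` type; it illustrates the pattern and is not code from the commit.

```go
package main

import (
	"errors"
	"fmt"
)

// resource is a hypothetical stand-in for a client, file, or connection.
type resource struct{ closed bool }

func (r *resource) Close() error {
	r.closed = true

	return nil
}

// use closes r on every return path. The deferred function also surfaces a
// Close error when the main body itself succeeded.
func use(r *resource, fail bool) (err error) {
	defer func() {
		if cerr := r.Close(); cerr != nil && err == nil {
			err = cerr
		}
	}()

	if fail {
		return errors.New("early failure")
	}

	return nil
}

func main() {
	for _, fail := range []bool{false, true} {
		r := &resource{}
		err := use(r, fail)
		fmt.Println("closed:", r.closed, "err:", err)
	}
}
```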

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-04-03 19:16:01 -04:00
Andrey Smirnov
cafd33acd8 fix: refresh proxy settings from environment in image resolver
Fixes #1901

This is the same fix as #1680 and #1690, but applied to the image resolver
code. The default HTTP client can't be used here, as a custom TLS client
config might be set on the transport to authenticate to the registry.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-20 09:46:08 -05:00
Andrey Smirnov
e1779ac77c feat: implement registry mirror & config for image pull
When images are pulled by Talos or via the CRI plugin, configuration
for each registry is applied. Mirrors allow pull requests to be redirected
to either a local registry or a cached registry. Auth & TLS enable
authentication and TLS authentication for non-public registries.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-14 00:28:59 +03:00
Andrey Smirnov
01d696ed10 chore: update golangci-lint-1.23.3
`gomnd` disabled, as it complains about every number used in the code,
and `wsl` became much more thorough.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-04 08:56:39 -08:00
Andrew Rynhard
28782c2d46 fix: stop race condition between kubelet and networkd
The kubelet fails to start if a machine's hostname is not set. If
networkd doesn't set it in time, the kubelet service fails to start.
Additionally, this adds retries to container pulls to ensure that any
temporary network failures don't cause fatal errors if we can't pull
images.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-01-20 10:52:53 -05:00
Andrey Smirnov
6e05dd70c4 feat: add support for tailing logs
Fixes #1564

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-17 22:35:47 +03:00
Andrey Smirnov
edb40437ec feat: add support for osctl logs -f
Now default is not to follow the logs (which is similar to `kubectl logs`).

Integration test was added for `Logs()` API and `osctl logs` command.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-05 13:58:52 -08:00
Andrew Rynhard
66f1355b10 chore: update containerd client version
This aligns the containerd version we use as a client with the version
of the daemon.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-05 13:48:03 -08:00
Andrew Rynhard
d4c202438c refactor: set CRI config to /etc/cri/containerd.toml
This changes the CRI specific containerd instance's config to a
different path.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-04 19:32:00 -08:00
Andrew Rynhard
1d3cc0038b feat: use containerd-shim-runc-v2
This configures the CRI containerd to use containerd-shim-runc-v2.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-04 14:36:18 -08:00
Andrey Smirnov
d3d011c8d2 chore: replace /* */ comments with // comments in license header
This fixes issues with `// +build` directives not being recognized in
source files.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-25 14:15:17 -07:00
Andrey Smirnov
f48830e7db chore: attempt to avoid containerd shim socket conflicts in tests
I can't say how exactly those conflicts happen in the tests, but I tried
to randomize more container IDs and namespace names (which both feed
into final abstract unix socket path).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-23 19:33:55 +03:00
Andrey Smirnov
8a80712d9a chore: fix containerd test hanging
The problem was that if a container fails to start, it never reaches
'StateRunning' and the test hangs waiting for that state. An assertion
doesn't abort the whole test (it only aborts the goroutine it was called
from), so this doesn't help.

Fix that by signalling back if some containers fail to start.

This is not a fix, but it should expose the actual failure happening in
this test.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-22 16:59:21 -07:00
Andrew Rynhard
d430a37e46 refactor: use go 1.13 error wrapping
This removes the github.com/pkg/errors package in favor of the official
error wrapping in go 1.13.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-15 22:20:50 -07:00
Andrew Rynhard
fef151748b feat: use the unified pkgs repo artifacts
This moves to using a single revision of pkgs. It includes a few
changes:

- kernel with KVM host support
- containerd v1.3.0

This change brings in a kernel with host KVM support. This will allow us
to use VMs within Talos for things like integration tests. This also
allows users to do things with KVM as they see fit.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-14 07:18:17 -07:00
Andrey Smirnov
c2cb0f9778 chore: enable 'wsl' linter and fix all the issues
I wish there were less of them :)

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-10 01:16:29 +03:00
Andrey Smirnov
bb5f5cc754 chore: bump golangci-lint to 1.20
Memory usage reduced around 8-10x: now it stays stable at 1GB.

I disabled some of the new linters, and one rule which is violated a
lot.

It might make sense to go back and enable `wsl`, fixing all the issues
(leaving that for another PR).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-09 22:21:08 +03:00
Andrew Rynhard
4ae8186107 feat: add configurator interface
This moves from translating a config into an internal config
representation, to using an interface. The idea is that an interface
gives us stronger compile time checks, and will prevent us from having to copy
from one struct to another. As long as a concrete type implements the
Configurator interface, it can be used to provide instructions to Talos.
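
A small sketch of the pattern, assuming made-up method names: consumers depend only on the interface, and any concrete type that implements it can be plugged in. The `Hostname` and `InstallDisk` methods and the `staticConfig` type are hypothetical and do not reflect the real Configurator definition.

```go
package main

import "fmt"

// Configurator is a hypothetical, trimmed-down interface; the method names
// are assumptions for illustration only.
type Configurator interface {
	Hostname() string
	InstallDisk() string
}

// staticConfig is one possible concrete implementation.
type staticConfig struct {
	hostname string
	disk     string
}

func (c *staticConfig) Hostname() string    { return c.hostname }
func (c *staticConfig) InstallDisk() string { return c.disk }

// Compile-time check that staticConfig satisfies the interface.
var _ Configurator = (*staticConfig)(nil)

// describe consumes the interface directly, with no copying into an
// intermediate struct.
func describe(c Configurator) string {
	return fmt.Sprintf("install %s on %s", c.Hostname(), c.InstallDisk())
}

func main() {
	fmt.Println(describe(&staticConfig{hostname: "node-1", disk: "/dev/sda"}))
}
```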

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-04 07:53:09 -07:00
Andrey Smirnov
7d8c40e3aa chore: randomize containerd namespace in tests
Looks like containerd creates shim file sockets in Linux abstract
namespace which are fixed (don't depend on containerd root directory)
and depend on container namespace and id. So if two containerd instances
on the same host run same namespace/id pair, that is going to create a
conflict on that shim filesocket.

Avoid that by randomizing namespace name. CRI tests should be fine as
namespace is fixed, but container ID is random.
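
Randomizing the namespace could look roughly like the sketch below. The `randomNamespace` helper and the `talos-test` prefix are hypothetical; the tests may generate names differently.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// randomNamespace returns prefix plus 16 random hex characters, so that two
// containerd instances on one host are very unlikely to share a
// namespace/id pair.
func randomNamespace(prefix string) (string, error) {
	buf := make([]byte, 8)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}

	return prefix + "-" + hex.EncodeToString(buf), nil
}

func main() {
	ns, err := randomNamespace("talos-test")
	fmt.Println(ns, err)
}
```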

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-09-13 23:56:40 +03:00
Andrew Rynhard
2955428850 chore: format code with gofumpt
The gofumpt linter is a stricter drop-in replacement for gofmt. The
rules are ones that I strongly agree with and I think it would be better
if we added this linter instead of nit picking every PR.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-11 11:03:29 -07:00
Andrew Rynhard
90c91807bd refactor: restructure the project layout
This change moves packages into more appropriate places.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-01 22:19:42 -07:00
Andrey Smirnov
f56a9d5b96 chore: implement first version of CRI runner
It runs containers via CRI interface in a pod sandbox. This is the very
first version: I tried not to introduce any changes to common runner
interface.

There should be some CRI-specific options for the runner (like polling
interval, as it doesn't have nice `Wait()` API), plus my plan so far is
to use OCI as the common layer for container options, so that we can
analyze OCI and translate to CRI (when possible, return errors when
option is not implemented).

CRI interface doesn't have a concept of 'unpacking' an image, so we
probably need to unpack via containerd API (or any other
runtime-specific API) by targeting CRI namespace.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-07-26 21:07:46 +03:00
Andrew Rynhard
8e8aae98dd feat: add machined
This commit splits our current init into init and machined.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-16 13:12:21 -07:00
Andrew Rynhard
1e9548d149 feat: use new pkgs for initramfs and rootfs
This brings in the newly compiled libraries and binaries from our new
pkg builds.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-15 10:32:29 -07:00
Andrey Smirnov
c10ef0f15a chore: extract CRI client as separate package
This is preparation for implementing CRI runner.

CRI client moved into its own package, I split it into multiple files
and added rudimentary tests for it.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-07-11 01:52:19 +03:00
Andrey Smirnov
82fe5b55e5 chore: make unit-tests use isolated instances of containerd
This makes tests launch their own isolated instance of containerd with
its own root/state directories and listening socket address. Each test
brings this instance up/down on its own.

Add options to override containerd address in the code (used only in the
tests).

Enable parallel go test runs once again.

P.S. I wish I could share that 'SetupSuite' phase across the tests, but
afaik there's no way in Go to share `_test.go` code across packages. If
we put it as normal package, this might pull in test dependencies (like
`testify`) into production code, which I don't like.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-07-10 19:46:32 +03:00
Andrey Smirnov
5d91d762ce feat(osd): implement container metrics for CRI inspector (#824)
This refactors metrics interface to remove containerd-specific stuff and
make it common for CRI & containerd.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-07-04 11:25:15 -07:00
Andrey Smirnov
237e903f91 feat(osd): implement CRI inspector for containers (#817)
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-07-02 15:48:00 -07:00
Andrey Smirnov
89b876c676 fix: containers test by locking image to specific tag (#734)
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-17 18:59:57 +03:00
Andrey Smirnov
070cbc9d60 refactor(osd): implement container inspector for a single container (#720)
Instead of pulling a full list of containers, implement inspector query
for a single container following the spec to build display name.

Also adds many more tests for the container inspector.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-17 17:54:28 +03:00
Andrey Smirnov
bf6ef7043c chore: address flaky tests instability (#713)
For #711, this should be a complete fix - waiting for container to be
started.

For #712, this should be more of a workaround - adjusting timeouts to
make the failure less likely. The idea of the test is that the health
check should be aborted on timeout (1ms), while the health check succeeds
if not aborted within 50ms. Before the fix it was 1ms/10ms, but with
concurrency there was still a chance that the goroutine exits
successfully after 10ms while the 1ms context deadline is not yet reached.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-04 23:22:05 +03:00