23 Commits

Author SHA1 Message Date
Andrey Smirnov
457c97248d fix: ignore 'not found' errors when stopping/removing CRI pods
Fixes #2806

Also skips stopping pods which are already stopped from the previous run
with modes `POD`/`CONTAINER`.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-18 22:41:23 +03:00
Andrey Smirnov
75817a58ef fix: stop CRI pods on upgrade with preserve
Talos always stops and removes CRI pods before stopping CRI containerd
when upgrading with wipe (force), but on "preserve" code paths pods were
never stopped (we can't remove them to keep preserve guarantees). This
PR makes sure pods are stopped on upgrade in any case.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-26 16:05:48 -07:00
Andrey Smirnov
bddd4f1bf6 refactor: move external API packages into machinery/
This moves `pkg/config`, `pkg/client` and `pkg/constants`
under `pkg/machinery` umbrella.

And `pkg/machinery` is published as Go module inside Talos repository.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-17 09:56:14 -07:00
Andrey Smirnov
41d5f7859a chore: update golangci-lint to 1.28.3
Fixes #2272

`gofumpt` is now included into `golangci-lint`, but not the
`gofumports`, so we keep it using it as separate binary, but we keep
versions in sync with `golangci-lint`.

This contains fixes from:

* `gofumpt` (automated, mostly around octal constants)
* `exhaustive` in `switch` statements
* `noctx` (adding context with default timeout to http requests)

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-16 08:05:42 -07:00
Andrey Smirnov
81d1c2bfe7 chore: enable godot linter
Issues were fixed automatically.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-06-30 10:39:56 -07:00
Andrey Smirnov
4ad4511b38 chore: enable nolintlint linter
It makes sure our `//nolint:` directives are not redundant.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-06-30 07:39:19 -07:00
Andrey Smirnov
a9766d31bc refactor: implement LoggingManager as central log flow processor
Using this `LoggingManager` all the log flows (reading and writing) were
refactored. Inteface of `LoggingManager` should be now generic enough to
replace log handling with almost any implementation - log rotation,
sending logs to remote destination, keeping logs in memory, etc.

There should be no functional changes.

As part of changes, `follow.Reader` was implemented which makes
appending file feel like a stream. `file.NewChunker` was refactored to
use `follow.Reader` and `stream.NewChunker` to do the actual work. So
basically now we have only a single instance of chunker - stream
chunker, as everything is represented as a stream.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-06-10 14:30:36 -07:00
Andrey Smirnov
c06095f904 test: fix race in some tests caused by SetT
Looks like goroutine launched from suite setup might have a race while
trying to access methods which in the end try to load `testing.T` value,
as it changes while each individual test is running.

This leaves us with less diagnostics, but eliminates the race.

Sample:

```
WARNING: DATA RACE
Write at 0x00c00035e418 by goroutine 56:
  github.com/stretchr/testify/suite.(*Suite).SetT()
        /go/pkg/mod/github.com/stretchr/testify@v1.5.1/suite/suite.go:37
        +0x12d
          github.com/talos-systems/talos/internal/pkg/containers/containerd_test.(*ContainerdSuite).SetT()
        <autogenerated>:1 +0x4d
          github.com/stretchr/testify/suite.Run.func2()
        /go/pkg/mod/github.com/stretchr/testify@v1.5.1/suite/suite.go:119
        +0x10f
          testing.tRunner()
        /toolchain/go/src/testing/testing.go:991 +0x1eb

        Previous read at 0x00c00035e418 by goroutine 40:
          github.com/stretchr/testify/suite.(*Suite).Require()
        /go/pkg/mod/github.com/stretchr/testify@v1.5.1/suite/suite.go:42
        +0xdc
          github.com/talos-systems/talos/internal/pkg/containers/containerd_test.(*ContainerdSuite).SetupSuite.func1()
        /src/internal/pkg/containers/containerd/containerd_test.go:119
        +0x101
```

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-06-09 15:22:58 -07:00
Andrew Rynhard
fcaed8b0dd fix: don't proxy gRPC unix connections
The default gRPC dialer honors proxy environment variables, which causes
local unix socket connections to attempt to go through the proxy. This
fixes that by using a custom dialer.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-02-10 05:37:50 -08:00
Andrew Rynhard
fa515b8117 fix: kill POD network mode pods first on upgrades
When we upgrade a node, we kill off all pods before performing a fresh
install. The issue with this is that we run the risk of killing the CNI
pod before we finish killing all other pods, leaving the CRI unable to
teardown the pod's networking. This works around that by first killing
any pods running without host networking so that the CNI can do its'
job, and then removing the remaining pods.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-09 13:45:31 -08:00
Andrew Rynhard
d4c202438c refactor: set CRI config to /etc/cri/containerd.toml
This changes the CRI specific containerd instance's config to a
different path.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-04 19:32:00 -08:00
Andrew Rynhard
1d3cc0038b feat: use containerd-shim-runc-v2
This configures the CRI containerd to use containerd-shim-runc-v2.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-04 14:36:18 -08:00
Andrey Smirnov
5b7bea2471 feat: use grpc-proxy in apid
This replaces codegen version of apid proxying with
talos-systems/grpc-proxy based version. Proxying is transparent, it
doesn't require exact information about methods and response types. It
requires some common layout response to enhance it properly with node
metadata or errors.

There should be no signifcant changes to the API with the previous
version, but it's worth mentioning a few changes:

1. grpc.ClientConn is established just once per upstream (either local
service or remote apid instance).

2. When called without `-t` (`targets`), apid proxies immediately down
to local service skipping proxying to itself (as before), which results
in empty node metadata in response (before it had local node IP). Might
revert this later to proxy to itself (?).

3. Streaming APIs are now fully supported with multiple targets, but
message definition doesn't contain `ResponseMetadata`, so streaming APIs
are broken now with targets (needs a fix).

4. Errors are now returned as responses with `Error` field set in
`ResponseMetadata`, this requires client library update and `osctl` to
handle it properly.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-11-29 22:57:25 +03:00
Andrey Smirnov
d3d011c8d2 chore: replace /* */ comments with // comments in license header
This fixes issues with `// +build` directives not being recognized in
source files.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-25 14:15:17 -07:00
Andrew Rynhard
d430a37e46 refactor: use go 1.13 error wrapping
This removes the github.com/pkg/errors package in favor of the official
error wrapping in go 1.13.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-15 22:20:50 -07:00
Andrey Smirnov
c2cb0f9778 chore: enable 'wsl' linter and fix all the issues
I wish there were less of them :)

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-10 01:16:29 +03:00
Andrew Rynhard
4ae8186107 feat: add configurator interface
This moves from translating a config into an internal config
representation, to using an interface. The idea is that an interface
gives us stronger compile time checks, and will prevent us from having to copy
from on struct to another. As long as a concrete type implements the
Configurator interface, it can be used to provide instructions to Talos.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-04 07:53:09 -07:00
Andrew Rynhard
2955428850 chore: format code with gofumpt
The gofumpt linter is a stricter drop-in replacement for gofmt. The
rules are ones that I strongly agree with and I think it would be better
if we added this linter instead of nit picking every PR.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-11 11:03:29 -07:00
Andrew Rynhard
90c91807bd refactor: restructure the project layout
This change moves packages into more appropriate places.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-01 22:19:42 -07:00
Andrey Smirnov
f56a9d5b96 chore: implement first version of CRI runner
It runs containers via CRI interface in a pod sandbox. This is the very
first version:  I tried not to introduce any changes to common runner
interface.

There should be some CRI-speficic options for the runner (like polling
interval, as it doesn't have nice `Wait()` API), plus my plan so far is
to use OCI as the common layer for container options, so that we can
analyze OCI and translate to CRI (when possible, return errors when
option is not implemented).

CRI interface doesn't have a concept of 'unpacking' an image, so we
probably need to unpack via containerd API (or any other
runtime-specific API) by targeting CRI namespace.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-07-26 21:07:46 +03:00
Andrew Rynhard
8e8aae98dd feat: add machined
This commit splits our current init into init and machined.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-16 13:12:21 -07:00
Andrew Rynhard
1e9548d149 feat: use new pkgs for initramfs and rootfs
This brings in the newly compiled libraries and binaries from our new
pkg builds.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-15 10:32:29 -07:00
Andrey Smirnov
c10ef0f15a chore: extract CRI client as separate package
This is preparation for implementing CRI runner.

CRI client moved into its own package, I split it into multiple files
and added rudimentary tests for it.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-07-11 01:52:19 +03:00