9 Commits

Author SHA1 Message Date
Andrey Smirnov
1a85c14a51 fix: avoid data race on CRI pod stop
Message:

```
WARNING: DATA RACE
Write at 0x00c000a9d4d0 by goroutine 154:
  github.com/talos-systems/talos/internal/pkg/cri.stopAndRemove.func1()
      /src/internal/pkg/cri/pods.go:188 +0x312
  golang.org/x/sync/errgroup.(*Group).Go.func1()
      /.cache/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57 +0x94

Previous write at 0x00c000a9d4d0 by goroutine 276:
  github.com/talos-systems/talos/internal/pkg/cri.stopAndRemove.func1()
      /src/internal/pkg/cri/pods.go:188 +0x312
  golang.org/x/sync/errgroup.(*Group).Go.func1()
      /.cache/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57 +0x94

```

At worst, this bug could lead to incorrect handling of pod stops.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-05-06 07:35:29 -07:00
Alexey Palazhchenko
df52c13581 chore: fix //nolint directives
That's the recommended syntax:
https://golangci-lint.run/usage/false-positives/

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-03-05 05:58:33 -08:00
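
As an illustration of the directive form that commit df52c13581 refers to, the sketch below uses `//nolint:<linter>` with no space after `//`, followed by a short explanation. The `closeQuietly` helper, the `recorder` type, and the choice of the `errcheck` linter are assumptions made for the example, not code from the Talos repository.

```go
package main

import (
	"fmt"
	"io"
)

// recorder is a hypothetical io.Closer that remembers whether it was closed.
type recorder struct{ closed bool }

func (r *recorder) Close() error {
	r.closed = true

	return nil
}

// closeQuietly closes c and deliberately drops the returned error.
func closeQuietly(c io.Closer) {
	c.Close() //nolint:errcheck // best-effort cleanup, the error is deliberately ignored
}

func main() {
	r := &recorder{}
	closeQuietly(r)
	fmt.Println("closed:", r.closed)
}
```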
Andrey Smirnov
f1964aab53 fix: ignore errors on stopping/removing pod sandboxes
Talos stops CRI pods and containers before an upgrade to make sure the
ephemeral partition is no longer mounted. With some CNIs, however,
stopping or removing a pod sandbox frequently fails because of CNI
teardown issues (a dependency on the API server being up, for example).
As the upgrade only depends on volume mounts and doesn't require the
CNI to be stopped, we can ignore such errors.

In addition, the installer checks mounts across all mount namespaces
anyway, so it will abort if the ephemeral partition is still mounted.

Fixes #2974

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-22 06:33:14 -08:00
Andrey Smirnov
457c97248d fix: ignore 'not found' errors when stopping/removing CRI pods
Fixes #2806

Also skips stopping pods which are already stopped from the previous run
with modes `POD`/`CONTAINER`.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-18 22:41:23 +03:00
Andrey Smirnov
75817a58ef fix: stop CRI pods on upgrade with preserve
Talos always stops and removes CRI pods before stopping CRI containerd
when upgrading with wipe (force), but on "preserve" code paths pods were
never stopped (we can't remove them to keep preserve guarantees). This
PR makes sure pods are stopped on upgrade in any case.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-26 16:05:48 -07:00
Andrew Rynhard
fa515b8117 fix: kill POD network mode pods first on upgrades
When we upgrade a node, we kill off all pods before performing a fresh
install. The issue with this is that we run the risk of killing the CNI
pod before we finish killing all other pods, leaving the CRI unable to
tear down the pod's networking. This works around that by first killing
any pods running without host networking so that the CNI can do its
job, and then removing the remaining pods.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-09 13:45:31 -08:00
Andrey Smirnov
d3d011c8d2 chore: replace /* */ comments with // comments in license header
This fixes issues with `// +build` directives not being recognized in
source files.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-25 14:15:17 -07:00
Andrew Rynhard
d430a37e46 refactor: use go 1.13 error wrapping
This removes the github.com/pkg/errors package in favor of the official
error wrapping in go 1.13.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-15 22:20:50 -07:00
Andrey Smirnov
c10ef0f15a chore: extract CRI client as separate package
This is preparation for implementing CRI runner.

The CRI client moved into its own package; I split it into multiple
files and added rudimentary tests for it.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-07-11 01:52:19 +03:00