Message:
```
WARNING: DATA RACE
Write at 0x00c000a9d4d0 by goroutine 154:
  github.com/talos-systems/talos/internal/pkg/cri.stopAndRemove.func1()
      /src/internal/pkg/cri/pods.go:188 +0x312
  golang.org/x/sync/errgroup.(*Group).Go.func1()
      /.cache/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57 +0x94

Previous write at 0x00c000a9d4d0 by goroutine 276:
  github.com/talos-systems/talos/internal/pkg/cri.stopAndRemove.func1()
      /src/internal/pkg/cri/pods.go:188 +0x312
  golang.org/x/sync/errgroup.(*Group).Go.func1()
      /.cache/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57 +0x94
```
At worst, this bug could cause pod stops to be handled incorrectly.
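The trace shows two goroutines started through errgroup writing the same address from the closure at pods.go:188. The actual fix is not shown here; a common cause of this signature is a closure assigning to a variable shared across goroutines (such as an outer `err`) or capturing a loop variable. A minimal sketch of the race-free shape, using a hypothetical stdlib-only stand-in for `errgroup.Group` and a hypothetical `stopPod` helper:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// group is a minimal, hypothetical stand-in for errgroup.Group:
// it runs functions concurrently and remembers the first non-nil error.
type group struct {
	wg      sync.WaitGroup
	errOnce sync.Once
	err     error
}

func (g *group) Go(f func() error) {
	g.wg.Add(1)
	go func() {
		defer g.wg.Done()
		if err := f(); err != nil {
			// Only the first error is stored; sync.Once makes this write race-free.
			g.errOnce.Do(func() { g.err = err })
		}
	}()
}

// Wait blocks until all goroutines finish; wg.Wait orders the read of g.err
// after every write to it.
func (g *group) Wait() error {
	g.wg.Wait()
	return g.err
}

// stopPod is a hypothetical placeholder for the per-pod CRI call.
func stopPod(id string) error {
	if id == "bad" {
		return fmt.Errorf("stopping pod %q: %w", id, errors.New("sandbox teardown failed"))
	}
	return nil
}

// stopAll stops every pod concurrently without sharing mutable state
// between the goroutines.
func stopAll(ids []string) error {
	var g group
	for _, id := range ids {
		id := id // per-iteration copy (needed before Go 1.22 loop semantics)
		g.Go(func() error {
			// err is declared inside the closure, so each goroutine writes its own variable.
			err := stopPod(id)
			return err
		})
	}
	return g.Wait()
}

func main() {
	fmt.Println(stopAll([]string{"a", "b"}))
	fmt.Println(stopAll([]string{"a", "bad"}))
}
```

Running such code under `go test -race` or `go run -race` is what produces reports like the one above when a closure does write to shared state.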
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Talos stops CRI pods and containers before an upgrade to make sure the
ephemeral partition is no longer mounted. However, with some CNIs,
stopping or removing a pod sandbox frequently fails because of CNI
teardown issues (for example, a dependency on the API server being up).
Since the upgrade depends only on volume mounts and does not require the
CNI to be torn down, such errors can be ignored.
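A minimal sketch of this best-effort approach, assuming a hypothetical `sandboxStopper` interface rather than the real CRI client. For simplicity the sketch treats every sandbox stop error as non-fatal: each failure is logged and collected instead of aborting the upgrade.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errCNITeardown is a hypothetical error representing a CNI teardown failure,
// e.g. the CNI plugin cannot reach the API server.
var errCNITeardown = errors.New("CNI teardown failed")

// sandboxStopper is a hypothetical subset of a CRI client.
type sandboxStopper interface {
	StopPodSandbox(id string) error
}

// fakeStopper fails for the IDs listed in failing and records the rest.
type fakeStopper struct {
	failing map[string]bool
	stopped []string
}

func (f *fakeStopper) StopPodSandbox(id string) error {
	if f.failing[id] {
		return fmt.Errorf("stop sandbox %q: %w", id, errCNITeardown)
	}
	f.stopped = append(f.stopped, id)
	return nil
}

// stopSandboxesBestEffort tries to stop every sandbox and only logs failures,
// because the upgrade depends on volume mounts, not on CNI state.
func stopSandboxesBestEffort(s sandboxStopper, ids []string) (failed []string) {
	for _, id := range ids {
		if err := s.StopPodSandbox(id); err != nil {
			log.Printf("ignoring error: %v", err)
			failed = append(failed, id)
		}
	}
	return failed
}

func main() {
	s := &fakeStopper{failing: map[string]bool{"calico-node": true}}
	failed := stopSandboxesBestEffort(s, []string{"nginx", "calico-node", "coredns"})
	fmt.Println("stopped:", s.stopped, "failed:", failed)
}
```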
In addition, the installer checks mounts across all mount namespaces
anyway, so it will abort if the ephemeral partition is still mounted.
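One way to check mounts across all mount namespaces is to scan `/proc/<pid>/mountinfo` for every process, since each process sees the mount table of its own namespace. The sketch below is an illustration, not the installer's code; `mountedIn` and `mountedAnywhere` are hypothetical names, and the target path is assumed to be wherever the ephemeral partition is mounted (`/var` in the usage example).

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// mountedIn reports whether target appears as a mount point in the contents
// of a /proc/<pid>/mountinfo file. The mount point is the fifth
// space-separated field (see proc(5)); octal escapes such as \040 are not
// decoded in this sketch.
func mountedIn(mountinfo, target string) bool {
	sc := bufio.NewScanner(strings.NewReader(mountinfo))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 5 && fields[4] == target {
			return true
		}
	}
	return false
}

// mountedAnywhere checks the mountinfo of every process, which covers every
// mount namespace that has at least one running process.
func mountedAnywhere(target string) (bool, error) {
	paths, err := filepath.Glob("/proc/[0-9]*/mountinfo")
	if err != nil {
		return false, err
	}
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			continue // the process may have exited between Glob and ReadFile
		}
		if mountedIn(string(data), target) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	mounted, err := mountedAnywhere("/var")
	fmt.Println(mounted, err)
}
```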
Fixes #2974
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes #2806
This also skips stopping pods that were already stopped by a previous
run with the `POD`/`CONTAINER` modes.
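A sketch of the skip logic, assuming an already-stopped pod is recognized by the CRI sandbox state `SANDBOX_NOTREADY`; the `sandbox` type, its state constants, and the `needsStop` helper are hypothetical and only mirror the CRI states.

```go
package main

import "fmt"

// sandboxState mirrors the two CRI pod sandbox states (SANDBOX_READY and
// SANDBOX_NOTREADY); the type itself is hypothetical.
type sandboxState int

const (
	sandboxReady sandboxState = iota
	sandboxNotReady
)

type sandbox struct {
	ID    string
	State sandboxState
}

// needsStop returns the sandboxes that still have to be stopped, skipping
// those a previous run already left in the NOTREADY state.
func needsStop(pods []sandbox) []sandbox {
	var out []sandbox
	for _, p := range pods {
		if p.State == sandboxNotReady {
			continue
		}
		out = append(out, p)
	}
	return out
}

func main() {
	pods := []sandbox{{"a", sandboxReady}, {"b", sandboxNotReady}, {"c", sandboxReady}}
	for _, p := range needsStop(pods) {
		fmt.Println("stopping", p.ID)
	}
}
```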
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Talos always stops and removes CRI pods before stopping the CRI
containerd instance when upgrading with wipe (force), but on the
"preserve" code paths pods were never stopped (they can't be removed
without breaking the preserve guarantees). This PR makes sure pods are
stopped on upgrade in either case.
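The resulting behavior can be sketched as a choice of action in which stopping always happens and only removal depends on the upgrade mode; `stopAction`, its constants, and `podActionForUpgrade` are hypothetical names for illustration.

```go
package main

import "fmt"

// stopAction is a hypothetical enum describing what happens to CRI pods.
type stopAction int

const (
	stopOnly      stopAction = iota // preserve: pods are stopped but kept
	stopAndRemove                   // wipe (force): pods are stopped and removed
)

func (a stopAction) String() string {
	if a == stopOnly {
		return "stop only"
	}
	return "stop and remove"
}

// podActionForUpgrade picks the action; pods are stopped in every case,
// only removal depends on whether data is preserved.
func podActionForUpgrade(preserve bool) stopAction {
	if preserve {
		return stopOnly
	}
	return stopAndRemove
}

func main() {
	fmt.Println(podActionForUpgrade(true))
	fmt.Println(podActionForUpgrade(false))
}
```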
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
When we upgrade a node, we kill all pods before performing a fresh
install. The problem is that we risk killing the CNI pod before we
finish killing all other pods, leaving the CRI unable to tear down the
remaining pods' networking. This works around that by first killing any
pods running without host networking, so that the CNI can do its job,
and then removing the remaining pods.
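A sketch of the two-phase ordering, assuming the CNI pod itself runs with host networking (as is typical for CNI daemonsets); the `pod` type and `stopInTwoPhases` are hypothetical, and the `stop` callback stands in for the real CRI stop/remove call.

```go
package main

import "fmt"

// pod is a hypothetical, simplified view of a CRI pod sandbox; HostNetwork
// corresponds to the pod sharing the node's network namespace.
type pod struct {
	ID          string
	HostNetwork bool
}

// stopInTwoPhases stops all pods without host networking before any
// host-networked pod, so a host-networked CNI pod is still running while
// the CRI tears down pod networking.
func stopInTwoPhases(pods []pod, stop func(pod) error) error {
	for _, hostPhase := range []bool{false, true} {
		for _, p := range pods {
			if p.HostNetwork != hostPhase {
				continue
			}
			if err := stop(p); err != nil {
				return fmt.Errorf("stopping pod %s: %w", p.ID, err)
			}
		}
	}
	return nil
}

func main() {
	pods := []pod{{"kube-flannel", true}, {"nginx", false}, {"coredns", false}}
	_ = stopInTwoPhases(pods, func(p pod) error {
		fmt.Println("stopping", p.ID)
		return nil
	})
}
```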
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This removes the github.com/pkg/errors package in favor of the standard
library error wrapping introduced in Go 1.13.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This is preparation for implementing a CRI runner.
The CRI client moved into its own package; I split it into multiple
files and added rudimentary tests for it.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>