Package `pkg/crypto` was extracted as `github.com/talos-systems/crypto`
repository and Go module.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
While waiting for node ready condition, API server endpoint might return
networking errors (e.g. if endpoint is a RR DNS record).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This implements existing server-side health checks as defined in
`internal/pkg/cluster/checks` in Talos API.
Summary of changes:
* new `cluster` API
* `apid` now listens without auth on local file socket
* `cluster` API is for now implemented in `machined`, but we can move it
to the new service if we find it more appropriate
* `talosctl health` by default now does server-side health check
UX: `talosctl health` without arguments does health check for the
cluster if it has healthy K8s to return master/worker nodes. If needed,
node list can be overridden with flags.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
1. Increase retry timeout.
2. Use timeout per attempt.
3. Check for node readiness as a gate to succeed.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Talos will mark node as schedulable if it was previously cordoned by
Talos (for upgrade, reset, etc.)
If user marked node as not schedulable, Talos won't change it on boot.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
There's a global timeout for all services to be up: it's 5 minutes. We
need to make sure each service startup takes less than that, otherwise
boot sequence is aborted and there's no way to see the error message for
each particular service.
Also propagate contexts correctly and set some default timeouts to make
sure API operations are not hanging forever.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR will pull in the latest release of k8s 1.18 so we can start
validating it through our test suite.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This PR cleans up the formatting for various package imports as they
were causing the linter to throw errors.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
`gomnd` disabled, as it complains about every number used in the code,
and `wsl` became much more thorough.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
The `client.Creds` struct was not used very often, and made using the
`client.NewClient` function impossible to use in combination with the
`RemoteRenewingFileCertificateProvider`. This modifies
`client.NewClient` to accept a `tls.Config` instead of `client.Creds`,
allowing for the use of `RemoteRenewingFileCertificateProvider` with
`client.NewClient`.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This is a simple refactor that reduces the number of arguments required
by `NewTemporaryClientFromPKI`.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
When implementing the controller-manager I found a race condition between it
and the cordon operation. The controller-manager annotates the node to
indicate that an upgrade is in progress, and Talos tries to mark the
node as unschedulable at nearly the same time. This leads to a race
condition. The fix is to simply retry the cordon.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
The name helper isn't very good. This renames it to Client. A new func
was also added, NewForConfig, that will allow for the creation of the helper
client from an arbitrary Kubernetes REST config.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This PR moves to using the full URL for endpoint instead of trying to
hardcode 6443 in various places like we were doing.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This removes the github.com/pkg/errors package in favor of the official
error wrapping in go 1.13.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This package provides a consistent way for us to retry arbitrary logic.
It provides the following backoff algorithms:
- exponential
- linear
- constant
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This change allows for discovery of the control plane IPs. The
motivation behind this is to remove the static IP requirement. The
endpoints are discovered by machined, and passed into OSD as arguments
in order to avoid the need to mount /var/lib/kubelet/pki.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
Reading /proc/mounts while simultaneously unmounting mountpoints
prevents unmounting all submounts under /var. This is due to the fact
that /proc/mounts will change as we perform unmounts, and that causes a
read of the file to become inaccurate. We now read /proc/mounts into
memory to get a snapshot of all submounts under /var, and then we
proceed with unmounting them.
This also adds some additional logging that I found to be useful while
debugging this. It also adds logic to skip of DaemonSet managed pods.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>