32 Commits

Author SHA1 Message Date
Andrey Smirnov
28ba6e416e feat: update Kubernetes to v1.20.0-beta.2
Talos 0.8 is going to ship with K8s 1.20.x.

Changes to support the new `control-plane` label;
upgrade-k8s supports automated fixups for 1.20.

See also: https://github.com/talos-systems/bootkube-plugin/pull/22

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-25 06:39:14 -08:00
Andrey Smirnov
a2efa44663 chore: enable gci linter
Fixes were applied automatically.

Import ordering might be questionable, but it's strict:

* stdlib
* other packages
* same package imports

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-09 08:09:48 -08:00
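The three-group import ordering listed in a2efa44663 can be sketched as a small classifier. This is only an illustration, not the linter's implementation: the helper names `importGroup` and `sortImports`, the "no dot in the first path element means stdlib" heuristic, and the module prefix `github.com/talos-systems/talos` are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// importGroup returns 0 for standard library imports, 1 for other packages,
// and 2 for imports that live under localPrefix (the project's own module).
// The stdlib check is a heuristic: stdlib paths have no dot in the first element.
func importGroup(path, localPrefix string) int {
	switch {
	case path == localPrefix || strings.HasPrefix(path, localPrefix+"/"):
		return 2
	case !strings.Contains(strings.SplitN(path, "/", 2)[0], "."):
		return 0
	default:
		return 1
	}
}

// sortImports returns a copy of paths ordered by group, then alphabetically.
func sortImports(paths []string, localPrefix string) []string {
	sorted := append([]string(nil), paths...)

	sort.SliceStable(sorted, func(i, j int) bool {
		gi, gj := importGroup(sorted[i], localPrefix), importGroup(sorted[j], localPrefix)
		if gi != gj {
			return gi < gj
		}

		return sorted[i] < sorted[j]
	})

	return sorted
}

func main() {
	// Assumed module path, used only for illustration.
	const localPrefix = "github.com/talos-systems/talos"

	imports := []string{
		"github.com/talos-systems/talos/pkg/retry",
		"k8s.io/client-go/kubernetes",
		"fmt",
		"context",
	}

	for _, p := range sortImports(imports, localPrefix) {
		fmt.Println(importGroup(p, localPrefix), p)
	}
}
```

Note that a sibling module such as `github.com/talos-systems/go-retry` falls into the "other packages" group under this sketch, because only paths under the module prefix itself count as local.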
Andrey Smirnov
8560fb9662 chore: enable nlreturn linter
Most of the fixes were automatically applied.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-09 06:48:07 -08:00
Artem Chernyshev
9c969a4be5 feat: allow disabling NoSchedule on master nodes
Add a talosconfig parameter that allows disabling the NoSchedule taint on
master nodes.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-10-06 10:52:37 -07:00
Andrey Smirnov
788cd15c29 test: add e2e test to the provision (upgrade) tests
Add sonobuoy runner code with log fetching on failure. Use hand-picked
set of e2e tests to run: verify basic pod functionality, verify service
connectivity.

Add option `--run-e2e` to the `talosctl health` to run quick e2e test to
verify cluster health.

Add option to run provision tests with custom CNI, run one track of
provision tests with Cilium.

Bump Cilium to 1.8.2.

Talos 0.6 won't uncordon the node automatically after an upgrade from 0.5, as
0.5 doesn't set the annotation. Work around that in upgrade tests.

Bump upgrade test version to 0.6.0 release.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-08 13:26:31 -07:00
Andrey Smirnov
f6ecf000c9 refactor: extract packages loadbalancer and retry
This removes in-tree packages in favor of:

* github.com/talos-systems/go-retry
* github.com/talos-systems/go-loadbalancer

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-02 13:46:22 -07:00
Andrey Smirnov
bddd4f1bf6 refactor: move external API packages into machinery/
This moves `pkg/config`, `pkg/client` and `pkg/constants`
under `pkg/machinery` umbrella.

`pkg/machinery` is also published as a Go module inside the Talos repository.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-17 09:56:14 -07:00
Andrey Smirnov
52c5911fcd chore: extract pkg/crypto as external module
Package `pkg/crypto` was extracted as `github.com/talos-systems/crypto`
repository and Go module.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-14 06:33:30 -07:00
Andrey Smirnov
b110a9fa4d fix: retry non-HTTP errors from API server
While waiting for node ready condition, API server endpoint might return
networking errors (e.g. if endpoint is a RR DNS record).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-10 07:26:52 -07:00
Andrey Smirnov
3926442704 feat: taint master nodes with NoSchedule taint
Fixes #2350

This also brings in a fix for `coredns` tolerations from
https://github.com/talos-systems/bootkube-plugin/pull/19.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-29 14:02:41 -07:00
Andrey Smirnov
c54639e541 feat: implement server-side API for cluster health checks
This implements existing server-side health checks as defined in
`internal/pkg/cluster/checks` in Talos API.

Summary of changes:

* new `cluster` API

* `apid` now listens without auth on local file socket

* `cluster` API is for now implemented in `machined`, but we can move it
to the new service if we find it more appropriate

* `talosctl health` by default now does server-side health check

UX: `talosctl health` without arguments runs the health check for the
cluster, provided Kubernetes is healthy enough to return the list of
master/worker nodes. If needed, the node list can be overridden with flags.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-15 13:52:13 -07:00
Andrey Smirnov
804f162756 fix: improve node uncordon tasks
1. Increase retry timeout.

2. Use timeout per attempt.

3. Check for node readiness as a gate to succeed.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-10 09:26:47 -07:00
Andrey Smirnov
a4a2a3c83a feat: uncordon nodes automatically on boot
Talos will mark node as schedulable if it was previously cordoned by
Talos (for upgrade, reset, etc.)

If user marked node as not schedulable, Talos won't change it on boot.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-09 15:32:36 -07:00
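The rule in a4a2a3c83a (uncordon only what Talos itself cordoned) comes down to a small decision function. The sketch below assumes Talos records its own cordon with a node annotation; the annotation key, the `node` type, and `shouldUncordon` are hypothetical and only illustrate the logic.

```go
package main

import "fmt"

// cordonedByTalosAnnotation is a hypothetical annotation key; the real key
// used by Talos is not given in the commit message.
const cordonedByTalosAnnotation = "example.talos.dev/cordoned"

// node is a simplified stand-in for the Kubernetes Node object.
type node struct {
	Unschedulable bool
	Annotations   map[string]string
}

// shouldUncordon reports whether the node should be marked schedulable on boot:
// only when it is cordoned and the cordon was placed by Talos.
func shouldUncordon(n node) bool {
	if !n.Unschedulable {
		return false
	}

	_, cordonedByTalos := n.Annotations[cordonedByTalosAnnotation]

	return cordonedByTalos
}

func main() {
	byTalos := node{
		Unschedulable: true,
		Annotations:   map[string]string{cordonedByTalosAnnotation: "true"},
	}
	byUser := node{Unschedulable: true}

	fmt.Println(shouldUncordon(byTalos)) // cordoned by Talos: uncordon
	fmt.Println(shouldUncordon(byUser))  // cordoned by the user: leave as is
}
```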
Andrey Smirnov
ddbe9cfc2f fix: update timeouts on service startup to match boot timeout
There's a global timeout for all services to be up: it's 5 minutes. We
need to make sure each service startup takes less than that, otherwise
boot sequence is aborted and there's no way to see the error message for
each particular service.

Also propagate contexts correctly and set some default timeouts to make
sure API operations are not hanging forever.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-08 07:39:36 -07:00
Spencer Smith
3a4eaeeef0 feat: upgrade kubernetes to 1.18
This PR will pull in the latest release of k8s 1.18 so we can start
validating it through our test suite.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-03-26 14:59:43 -04:00
Spencer Smith
fa82454be4 chore: fix formatting of imports
This PR cleans up the formatting for various package imports as they
were causing the linter to throw errors.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-03-19 15:06:05 -04:00
Andrey Smirnov
01d696ed10 chore: update golangci-lint-1.23.3
`gomnd` disabled, as it complains about every number used in the code,
and `wsl` became much more thorough.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-04 08:56:39 -08:00
Andrew Rynhard
f3623d22b0 refactor: use tls.Config as client credentials
The `client.Creds` struct was not used very often, and it made it
impossible to use `client.NewClient` in combination with the
`RemoteRenewingFileCertificateProvider`. This modifies
`client.NewClient` to accept a `tls.Config` instead of `client.Creds`,
allowing for the use of `RemoteRenewingFileCertificateProvider` with
`client.NewClient`.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-01-21 17:10:07 -08:00
Andrew Rynhard
3e5ca30aa5 refactor: simplify NewTemporaryClientFromPKI
This is a simple refactor that reduces the number of arguments required
by `NewTemporaryClientFromPKI`.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-03 09:10:24 -08:00
Andrew Rynhard
6a1a9fc8d9 fix: retry cordon and uncordon
When implementing the controller-manager I found a race condition between it
and the cordon operation. The controller-manager annotates the node to
indicate that an upgrade is in progress, and Talos tries to mark the
node as unschedulable at nearly the same time. This leads to a race
condition. The fix is to simply retry the cordon.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-11-16 11:15:22 -08:00
Andrew Rynhard
03a09c2294 refactor: rename Helper to Client
The name helper isn't very good. This renames it to Client. A new func
was also added, NewForConfig, that will allow for the creation of the helper
client from an arbitrary Kubernetes REST config.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-11-04 19:31:27 -08:00
Andrey Smirnov
d3d011c8d2 chore: replace /* */ comments with // comments in license header
This fixes issues with `// +build` directives not being recognized in
source files.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-25 14:15:17 -07:00
Spencer Smith
d0111fe617 feat: allow specification of full url for endpoint
This PR moves to using the full URL for endpoint instead of trying to
hardcode 6443 in various places like we were doing.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-10-16 13:45:05 -04:00
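Treating the endpoint as a full URL, as d0111fe617 does, means the port comes from the URL itself instead of a hardcoded 6443. A minimal sketch of such validation with the standard `net/url` package follows; `parseEndpoint` and the example hostnames are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// parseEndpoint validates that raw is a full URL (scheme and host present)
// and returns the parsed form, so callers can read the host and port from it.
func parseEndpoint(raw string) (*url.URL, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, fmt.Errorf("invalid endpoint %q: %w", raw, err)
	}

	if u.Scheme == "" || u.Host == "" {
		return nil, fmt.Errorf("endpoint %q must be a full URL, e.g. https://host:port", raw)
	}

	return u, nil
}

func main() {
	u, err := parseEndpoint("https://cluster.example.com:443")
	if err != nil {
		fmt.Println("error:", err)

		return
	}

	fmt.Println(u.Hostname(), u.Port())

	// A bare host:port is rejected, since it is not a full URL.
	_, err = parseEndpoint("10.0.0.1:6443")
	fmt.Println(err != nil)
}
```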
Andrew Rynhard
d430a37e46 refactor: use go 1.13 error wrapping
This removes the github.com/pkg/errors package in favor of the official
error wrapping in go 1.13.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-15 22:20:50 -07:00
Brad Beam
d3f20db0aa fix: Use correct names for kubelet config
With the change to bootkube, kubelet.conf has changed names and is now kubelet-kubeconfig.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-10-11 07:42:32 -07:00
Andrew Rynhard
92de30715e feat: add retry package
This package provides a consistent way for us to retry arbitrary logic.
It provides the following backoff algorithms:

- exponential
- linear
- constant

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-10 13:11:02 -07:00
Andrey Smirnov
c2cb0f9778 chore: enable 'wsl' linter and fix all the issues
I wish there were fewer of them :)

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-10 01:16:29 +03:00
Andrew Rynhard
b29391f0be feat: use bootkube for cluster creation
This replaces kubeadm with bootkube.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-07 17:17:57 -07:00
Andrew Rynhard
e8dbf108e2 feat: add etcd service
This allows users to create an etcd service using the host init system.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-03 12:54:19 -07:00
Andrew Rynhard
9e9154b8f5 feat: discover control plane endpoints via Kubernetes
This change allows for discovery of the control plane IPs. The
motivation behind this is to remove the static IP requirement. The
endpoints are discovered by machined, and passed into OSD as arguments
in order to avoid the need to mount /var/lib/kubelet/pki.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-30 11:39:24 -07:00
Andrew Rynhard
37a8ce78ae fix: prevent EBUSY when unmounting system disk
Reading /proc/mounts while simultaneously unmounting mountpoints
prevents unmounting all submounts under /var. This is due to the fact
that /proc/mounts will change as we perform unmounts, and that causes a
read of the file to become inaccurate. We now read /proc/mounts into
memory to get a snapshot of all submounts under /var, and then we
proceed with unmounting them.

This also adds some additional logging that I found to be useful while
debugging this. It also adds logic to skip DaemonSet-managed pods.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-06 05:05:59 -07:00
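The snapshot approach from 37a8ce78ae can be sketched as: read the mount table once, collect the submounts under `/var`, and only then start unmounting. In the sketch below the snapshot is a sample string so the example stays portable; on a real system it would come from a single read of `/proc/mounts`. The helper name `submountsUnder` and the choice to unmount in reverse mount order are assumptions, not details confirmed by the commit.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// sampleMounts stands in for the content of /proc/mounts read once into memory.
const sampleMounts = `/dev/sda6 /var xfs rw 0 0
tmpfs /var/lib/kubelet/pods/abc/volumes/token tmpfs rw 0 0
overlay /var/lib/containerd/rootfs overlay rw 0 0
proc /proc proc rw 0 0
`

// submountsUnder parses a mount table snapshot and returns the mountpoints
// strictly below parent, most recently mounted first.
func submountsUnder(snapshot, parent string) []string {
	var mountpoints []string

	scanner := bufio.NewScanner(strings.NewReader(snapshot))
	for scanner.Scan() {
		// Each line is: device mountpoint fstype options dump pass.
		fields := strings.Fields(scanner.Text())
		if len(fields) < 2 {
			continue
		}

		if strings.HasPrefix(fields[1], parent+"/") {
			mountpoints = append(mountpoints, fields[1])
		}
	}

	// Reverse so that later mounts are handled before earlier ones.
	for i, j := 0, len(mountpoints)-1; i < j; i, j = i+1, j-1 {
		mountpoints[i], mountpoints[j] = mountpoints[j], mountpoints[i]
	}

	return mountpoints
}

func main() {
	// The list is fixed before any unmount happens, so changes to the live
	// mount table cannot cause entries to be skipped.
	for _, mountpoint := range submountsUnder(sampleMounts, "/var") {
		fmt.Println("would unmount", mountpoint)
	}
}
```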
Andrew Rynhard
90c91807bd refactor: restructure the project layout
This change moves packages into more appropriate places.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-01 22:19:42 -07:00