156 Commits

Author SHA1 Message Date
Brad Beam
e6bf92ce31 feat(osd): Enable hitting multiple OSD endpoints
This enables the ability to specify additional <talos> endpoints to connect to
to pull back data.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-10-16 15:30:25 -05:00
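
As a rough illustration of what querying several endpoints can look like (commit e6bf92ce31), here is a minimal Go sketch. The names `queryAll`, `fetchFunc`, and `result`, and the IP addresses, are hypothetical and not taken from the Talos code base. The sketch assumes a concurrent fan-out, which the commit message does not confirm.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// fetchFunc is a hypothetical stand-in for a per-endpoint API call.
type fetchFunc func(endpoint string) (string, error)

// result pairs an endpoint with the data or error it produced.
type result struct {
	Endpoint string
	Data     string
	Err      error
}

// queryAll calls fetch once per endpoint concurrently and returns the
// results sorted by endpoint so the output order is deterministic.
func queryAll(endpoints []string, fetch fetchFunc) []result {
	results := make([]result, len(endpoints))

	var wg sync.WaitGroup

	for i, ep := range endpoints {
		wg.Add(1)

		go func(i int, ep string) {
			defer wg.Done()

			data, err := fetch(ep)
			// Each goroutine writes only to its own slot, so no lock is needed.
			results[i] = result{Endpoint: ep, Data: data, Err: err}
		}(i, ep)
	}

	wg.Wait()

	sort.Slice(results, func(a, b int) bool {
		return results[a].Endpoint < results[b].Endpoint
	})

	return results
}

func main() {
	// fakeFetch simulates one reachable and one unreachable node.
	fakeFetch := func(ep string) (string, error) {
		if ep == "10.0.0.3" {
			return "", fmt.Errorf("endpoint %s unreachable", ep)
		}

		return "ok from " + ep, nil
	}

	for _, r := range queryAll([]string{"10.0.0.2", "10.0.0.3"}, fakeFetch) {
		if r.Err != nil {
			fmt.Printf("%s: error: %v\n", r.Endpoint, r.Err)
			continue
		}

		fmt.Printf("%s: %s\n", r.Endpoint, r.Data)
	}
}
```

Keeping the error next to each endpoint's data means one unreachable node does not hide the answers from the others.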
Andrew Rynhard
6c33547452 fix: add slub_debug=P to ISO kernel args
This option is required by KSPP.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-14 10:57:56 -07:00
Andrew Rynhard
792a35e8ae fix: use talos.config instead of talos.userdata
The new kernel parameter talos.config should be used instead of
talos.userdata.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-14 10:44:59 -07:00
Andrew Rynhard
80e3876df5 feat: remove proxyd
We have decided that proxyd is not the best architecture for HA
Kubernetes. Our recommendation to users will be to create a load
balancer instead.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-14 08:11:00 -07:00
Andrew Rynhard
fef151748b feat: use the unified pkgs repo artifacts
This moves to using a single revision of pkgs. It includes a few
changes:

- kernel with KVM host support
- containerd v1.3.0

This change brings in a kernel with host KVM support. This will allow us
to use VMs within Talos for things like integration tests. This also
allows users to do things with KVM as they see fit.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-14 07:18:17 -07:00
Spencer Smith
5d5f530bb0 chore: update sonobuoy for conformance tests
This PR updates the sonobuoy version. We're currently running
conformance tests with 0.15.x.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-10-10 18:26:05 -07:00
Spencer Smith
313ca2cb23 chore: re-enable end to end tests
This PR will add the bits necessary to make use of changes to our
v1alpha1 cluster api provider for CI testing. This is needed since we've
had machine config changes.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-10-10 17:32:44 -04:00
Andrey Smirnov
c2cb0f9778 chore: enable 'wsl' linter and fix all the issues
I wish there were fewer of them :)

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-10 01:16:29 +03:00
Andrey Smirnov
bb5f5cc754 chore: bump golangci-lint to 1.20
Memory usage reduced around 8-10x: now it stays stable at 1GB.

I disabled some of the new linters, and one rule which is violated a
lot.

It might make sense to go back and enable `wsl` fixing all the issues
(leaving that for another PR).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-09 22:21:08 +03:00
Andrew Rynhard
4454afef2f feat: default docker based cluster to 1 master
The goal with the docker based cluster is to provide developers with an
easy way to run Kubernetes on their local machines. Most of the time,
they won't need more than 1 master. This defaults cluster creation to 1
master.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-08 19:04:54 -07:00
Andrew Rynhard
b29391f0be feat: use bootkube for cluster creation
This replaces kubeadm with bootkube.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-07 17:17:57 -07:00
Andrew Rynhard
4ae8186107 feat: add configurator interface
This moves from translating a config into an internal config
representation, to using an interface. The idea is that an interface
gives us stronger compile time checks, and will prevent us from having to copy
from one struct to another. As long as a concrete type implements the
Configurator interface, it can be used to provide instructions to Talos.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-04 07:53:09 -07:00
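
The interface approach described in commit 4ae8186107 can be sketched roughly as follows. This is only an illustration: the method set (`Hostname`, `InstallDisk`) and the `v1Config` type are invented here and do not reflect the real Configurator interface in Talos.

```go
package main

import "fmt"

// Configurator is a hypothetical, trimmed-down interface. The methods are
// invented for illustration only.
type Configurator interface {
	Hostname() string
	InstallDisk() string
}

// v1Config is one hypothetical concrete config format.
type v1Config struct {
	Host string
	Disk string
}

func (c *v1Config) Hostname() string    { return c.Host }
func (c *v1Config) InstallDisk() string { return c.Disk }

// Compile-time assertion: the build fails if *v1Config ever stops
// satisfying Configurator.
var _ Configurator = (*v1Config)(nil)

// describe consumes the interface directly, so no field-by-field copy into
// an internal struct is needed.
func describe(c Configurator) string {
	return fmt.Sprintf("install %s on %s", c.Hostname(), c.InstallDisk())
}

func main() {
	fmt.Println(describe(&v1Config{Host: "master-1", Disk: "/dev/sda"}))
}
```

A second config format would only need to implement the same methods; callers such as `describe` stay unchanged.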
Seán C McCord
5686ba2db3 feat: Allow env override of hack/qemu image location
This fixes #1220

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2019-09-29 07:10:20 -07:00
Andrew Rynhard
27adda4d9d chore: use the official Drone git plugin
The changes we needed in the clone plugin have been merged. We should
use the official plugin to minimize what we have to maintain.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-23 22:45:31 -07:00
Andrew Rynhard
82c706a0fb feat: upgrade Kubernetes to v1.16.0
Brings in Kubernetes v1.16.0.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-19 20:19:29 -07:00
Andrew Rynhard
6efd6fbe08 chore: move gRPC API to public
In order for other projects to make use of our APIs, they must not
reside underneath the internal directory. This moves the protobuf
definitions to a top-level "api" directory and scopes them according to
their domain. This change also removes generated code from the gitignore
file so that users don't have to generate the code themselves.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-19 08:55:13 -07:00
Andrew Rynhard
20302eb8f6 chore: fix AWS image dependency
We no longer need to wait for the installer image to be pushed before
creating the AWS image.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-17 21:12:03 -07:00
Andrew Rynhard
472f1aa6e8 chore: upgrade Sonobuoy to v0.15.4
This version has a fix for a bug that is affecting us.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-17 14:52:10 -07:00
Andrew Rynhard
3e62973b2c chore: upgrade conformance image
This upgrades the kube-conformance image used by sonobuoy to
v1.16.0-rc.2.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-16 16:05:24 -07:00
Andrew Rynhard
ab4e058489 feat: upgrade Kubernetes to v1.16.0-rc.2
This brings in the release candidate for Kubernetes v1.16.0.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-16 14:56:55 -07:00
Andrew Rynhard
75746266ce feat: upgrade Kubernetes to v1.16.0-rc.1
This brings in the latest RC of 1.16.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-12 20:20:48 -07:00
Andrey Smirnov
980829708e chore: upgrade golangci-lint to 1.18.0
New linter 'funlen' was disabled as too many functions break the default
limit, but might be considered for the future.

To limit peak memory usage, `GOGC=50` was added to the golangci-lint run
to make Go's garbage collector more aggressive. With this setting peak
seems to be around 8 GB.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-09-11 15:18:57 -07:00
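
For context on the `GOGC=50` setting in commit 980829708e: the variable asks the Go runtime to start a collection once the heap has grown 50% beyond the live heap left by the previous cycle, instead of the default 100%. The commit sets it through the environment; the sketch below shows the programmatic equivalent from the standard library, with `setGCPercent` being a hypothetical wrapper name.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// setGCPercent applies a new GC target percentage and returns the previous
// one. It is a thin, hypothetical wrapper around debug.SetGCPercent, which
// has the same effect as the GOGC environment variable.
func setGCPercent(percent int) int {
	return debug.SetGCPercent(percent)
}

func main() {
	prev := setGCPercent(50)
	fmt.Printf("GC percent changed from %d to 50\n", prev)
}
```

A lower percentage trades more frequent collections, and therefore more CPU time, for a smaller peak heap.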
Andrew Rynhard
298ddc8f49 fix: enable slub_debug=P
This is the last KSPP kernel parameter we need to be compliant with KSPP
guidelines.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-10 10:53:19 -07:00
Andrew Rynhard
38690d72df chore: remove unneeded packages
This removes packages we don't need anymore.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-10 08:12:07 -07:00
Andrew Rynhard
e48cee6343 chore: remove existing AMI
We need to remove an existing AMI, if it exists, in order to create a new
one with the same name.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-10 04:52:43 -07:00
Andrew Rynhard
44dd2fc7c9 chore: remove packer from installer
This moves to making AWS releases align with Azure and GCP. We no
longer need packer since we will now release an artifact that users can
import.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-09 18:54:37 -07:00
Brad Beam
f21d1244bd test(ci): Add aws for e2e and conformance targets
Add additional scripts and steps to enable doing tests against aws.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-09-09 13:56:19 -05:00
Brad Beam
be4f7e1e6a chore: Rename maintainers channel
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-09-09 10:59:48 -05:00
Spencer Smith
8b019d8f33 chore: update provider-components for capi v0.1.9
This PR updates our e2e tests with the provider-components file that's
generated by our capi v0.1.9 update.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-09-06 22:45:44 -04:00
Spencer Smith
71cddfd30b fix: remove basic integration teardown
This was breaking e2e testing, as we depend on it for applying CAPI and
launching VMs from there.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-09-06 15:15:24 -05:00
Brad Beam
f03975bdc3 chore: Retry check for HA control plane
This was likely causing some of the flakiness in this test.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-09-05 22:04:38 -05:00
Andrey Smirnov
7ab0f8a7f2 chore: enable unit-tests-race
This is an experiment to see how stable they are.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-09-02 19:02:38 -07:00
Brad Beam
1373806165 fix(init): Enable containerd subreaper
Should take care of our issue with zombie processes.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-30 14:32:13 -07:00
Andrey Smirnov
029374f07d chore: disable go test result cache
Go by default caches unit-tests results via build cache, so if source
code doesn't have any changes, test results are cached on package level.
As our unit-tests are not that pure and depend on the environment, it
would be more helpful to make sure all the unit-tests run during each build.

Setting the number of test runs to one disables the test result cache
(but the build cache is still used).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-08-30 22:03:00 +03:00
Brad Beam
b1dc400fea chore: Fix azure image upload
Single quotes prevented the variable from being evaluated.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-28 20:38:30 -05:00
Brad Beam
9b91cd4511 chore: Clean up e2e scripts
- Use az/gcloud cli bundled with container
- Use consistent spacing in scripts ( 2 spaces vs tab )
- Updated count functions to handle the count inline
- Made platform kubeconfig the default

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-28 08:31:47 -05:00
Andrew Rynhard
bf8fc1dcbd chore: lint protobuf definitions
This adds linting to our protobuf definitions via prototool.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-27 18:12:36 -07:00
Andrew Rynhard
fd25c019bf chore: fix qemu-boot.sh
Fixes a typo that caused the switch statement to not match Linux
environments.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-24 13:24:24 -07:00
Andrew Rynhard
f5f6c29e99 chore: add QEMU script
This script will help in low-level development.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-24 00:56:12 -07:00
Brad Beam
313c118ad0 refactor(networkd): Replace networkd with a standalone app
This is a major rewrite of our network subsystem.

- This changes networkd to run as a standalone app versus internal goroutine
- This changes out the netlink package with the more idiomatic netlink/rtnetlink
  packages
- This changes the initial network bootstrap/discovery from using a single
  interface to attempting to bring up all interfaces
- This moves us back on to the upstream dhcp library

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-21 13:24:51 -05:00
Andrew Rynhard
0af1eba159 refactor: add more runtime modes
In order to DRY up all installation methods and mount methods, this PR
introduces a few more runtime modes. The modes are then used to
determine the strategy for creating and/or mounting the partitions.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-19 20:23:45 -07:00
Andrew Rynhard
060498ec87 chore: disable CIS benchmarks
These are failing with false positives. Disable for now so that we can
run our conformance tests.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-19 11:04:15 -07:00
Brad Beam
af47edf1ad chore: Make losetup atomic during installation
This should fix a race condition where two independent image creation steps
run `losetup -f` and discover the same 'next available' loopback device and
attempt to use it.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-17 15:23:42 -05:00
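
The race described in commit af47edf1ad is a classic check-then-act problem: finding a free device and claiming it are two separate steps. The commit message does not say which mechanism the fix uses, so the sketch below is only an in-process analogy, with a hypothetical `devicePool` type standing in for the set of loopback devices. Between separate processes a mutex would not help; something like a file lock would be needed instead.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// devicePool is a hypothetical in-memory stand-in for the set of loopback
// devices on a host.
type devicePool struct {
	mu    sync.Mutex
	inUse []bool
}

// claim finds the first free device and marks it used inside one critical
// section, so two callers can never be handed the same index.
func (p *devicePool) claim() (int, error) {
	p.mu.Lock()
	defer p.mu.Unlock()

	for i, used := range p.inUse {
		if !used {
			p.inUse[i] = true
			return i, nil
		}
	}

	return -1, errors.New("no free device")
}

func main() {
	pool := &devicePool{inUse: make([]bool, 8)}
	claimed := make([]int, 8)

	var wg sync.WaitGroup

	for w := 0; w < 8; w++ {
		wg.Add(1)

		go func(w int) {
			defer wg.Done()

			// Errors are ignored here because the pool has one device per worker.
			idx, _ := pool.claim()
			claimed[w] = idx
		}(w)
	}

	wg.Wait()

	seen := map[int]bool{}
	for _, idx := range claimed {
		seen[idx] = true
	}

	fmt.Println("distinct devices claimed:", len(seen))
}
```

Because lookup and claim happen under the same lock, every worker ends up with a distinct index.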
Andrew Rynhard
7970f977b7 chore: add markdownlint
This will give us a standard tool for linting Markdown files.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-17 03:53:52 -07:00
Spencer Smith
9d759df9bd chore: move to smaller azure instance type
This PR will save us a little dinero over the course of running e2e
builds in azure. It's only a couple cents per hour difference, but will
shave off a fair amount over the course of a month.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-08-16 09:46:17 -07:00
Andrew Rynhard
92452ab981 chore: remove sonobuoy spinner
This is only slowing down the build since we use a remote DB for drone.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-15 05:15:20 -07:00
Andrew Rynhard
48109e9757 chore: apply manifests when init node is ready
If we wait for all masters to check in before applying the PSP, we run
the risk of kube-proxy failing to start for a long period of time.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-14 20:28:34 -07:00
Andrew Rynhard
f18ecca50c chore: use go runner in sonobuoy
This is the recommended fix for waiting on conformance results. Sonobuoy
is returning early even though the --wait flag is specified.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-13 22:26:03 -07:00
Spencer Smith
57d22ef1bb chore: enable floating IP creation in e2e tests
This PR will edit the manifests for e2e so that we can take advantage of https://github.com/talos-systems/cluster-api-provider-talos/pull/47

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-08-13 15:23:28 -07:00
Andrew Rynhard
caa0354fe9 chore: fix drone clone
In order to use promotion against pull requests to trigger things like
E2E, we need to update the default clone logic. The issue is that a
promotion is assumed to be run against a build that has been merged. In
our case, we need to promote builds that are not necessarily merged.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-12 20:33:29 -07:00