129 Commits

Author SHA1 Message Date
Brad Beam
be4f7e1e6a chore: Rename maintainers channel
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-09-09 10:59:48 -05:00
Spencer Smith
8b019d8f33 chore: update provider-components for capi v0.1.9
This PR updates our e2e tests with the provider-components file that's
generated by our capi v0.1.9 update.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-09-06 22:45:44 -04:00
Spencer Smith
71cddfd30b fix: remove basic integration teardown
This was breaking e2e testing, as we depend on it for applying CAPI and
launching VMs from there.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-09-06 15:15:24 -05:00
Brad Beam
f03975bdc3 chore: Retry check for HA control plane
This was likely causing some of the flakiness in this test.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-09-05 22:04:38 -05:00
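
Commit f03975bdc3 above retries a readiness check instead of failing on the first attempt. The log does not show how the retry was written, so the following is only a minimal Python sketch of the general pattern. The names `retry` and `control_plane_ready`, the attempt count, and the delay are all hypothetical.

```python
import time
from typing import Callable


def retry(check: Callable[[], bool], attempts: int = 5, delay: float = 0.0) -> bool:
    """Call check() until it returns True or the attempts are exhausted."""
    for attempt in range(1, attempts + 1):
        if check():
            return True
        if attempt < attempts:
            time.sleep(delay)  # wait before polling again
    return False


# Hypothetical flaky check: reports the control plane ready on the third poll.
polls = {"count": 0}


def control_plane_ready() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3
```

Calling `retry(control_plane_ready)` polls three times and then succeeds, whereas a single unretried check would have failed the test.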
Andrey Smirnov
7ab0f8a7f2 chore: enable unit-tests-race
This is an experiment to see how stable they are.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-09-02 19:02:38 -07:00
Brad Beam
1373806165 fix(init): Enable containerd subreaper
Should take care of our issue with Zombies

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-30 14:32:13 -07:00
Andrey Smirnov
029374f07d chore: disable go test result cache
By default, Go caches unit test results in the build cache, so if the source
code has not changed, test results are reused at the package level.
Because our unit tests are not pure and depend on the environment, it
is more helpful to run all the unit tests during each build.

Setting the number of test runs to one disables the test result cache (the
build cache is still used).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-08-30 22:03:00 +03:00
Brad Beam
b1dc400fea chore: Fix azure image upload
Single quotes prevented the variable from being expanded.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-28 20:38:30 -05:00
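
The quoting problem behind commit b1dc400fea above can be reproduced in isolation. This is a small Python sketch that runs two one-line scripts through POSIX `sh`. The variable name `IMAGE_NAME` and its value are hypothetical and are not taken from the upload script.

```python
import os
import subprocess


def sh(script: str, env: dict) -> str:
    """Run a script with POSIX sh and return stdout without the trailing newline."""
    result = subprocess.run(
        ["sh", "-c", script],
        env={**os.environ, **env},
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


env = {"IMAGE_NAME": "talos.vhd"}  # hypothetical variable name and value

single = sh("echo '$IMAGE_NAME'", env)  # single quotes: text is passed through literally
double = sh('echo "$IMAGE_NAME"', env)  # double quotes: the shell expands the variable
```

Here `single` holds the literal text `$IMAGE_NAME`, while `double` holds `talos.vhd`.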
Brad Beam
9b91cd4511 chore: Clean up e2e scripts
- Use az/gcloud cli bundled with container
- Use consistent spacing in scripts ( 2 spaces vs tab )
- Updated count functions to handle the count inline
- Made platform kubeconfig the default

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-28 08:31:47 -05:00
Andrew Rynhard
bf8fc1dcbd chore: lint protobuf definitions
This adds linting to our protobuf definitions via prototool.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-27 18:12:36 -07:00
Andrew Rynhard
fd25c019bf chore: fix qemu-boot.sh
Fixes a typo that caused the switch statement to not match Linux
environments.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-24 13:24:24 -07:00
Andrew Rynhard
f5f6c29e99 chore: add QEMU script
This script will help in low-level development.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-24 00:56:12 -07:00
Brad Beam
313c118ad0 refactor(networkd): Replace networkd with a standalone app
This is a major rewrite of our network subsystem.

- This changes networkd to run as a standalone app versus internal goroutine
- This changes out the netlink package with the more idiomatic netlink/rtnetlink
  packages
- This changes the initial network bootstrap/discovery from using a single
  interface to attempting to bring up all interfaces
- This moves us back on to the upstream dhcp library

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-21 13:24:51 -05:00
Andrew Rynhard
0af1eba159 refactor: add more runtime modes
In order to DRY up all installation methods and mount methods, this PR
introduces a few more runtime modes. The modes are then used to
determine the strategy for creating and/or mounting the partitions.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-19 20:23:45 -07:00
Andrew Rynhard
060498ec87 chore: disable CIS benchmarks
These are failing with false positives. Disable for now so that we can
run our conformance tests.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-19 11:04:15 -07:00
Brad Beam
af47edf1ad chore: Make losetup atomic during installation
This should fix a race condition where two independent image creation steps
run `losetup -f`, discover the same 'next available' loopback device, and
both attempt to use it.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-17 15:23:42 -05:00
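
The race described in commit af47edf1ad above is a check-then-act problem: finding a free device and claiming it are two separate steps. The Python sketch below simulates this with a toy device pool. `LoopPool` and its methods are hypothetical stand-ins, not bindings to `losetup`, and the log does not say how the commit itself achieves atomicity.

```python
import threading


class LoopPool:
    """Toy stand-in for a set of loop devices (not a real losetup binding)."""

    def __init__(self, size: int) -> None:
        self._in_use = [False] * size
        self._lock = threading.Lock()

    def find_free(self) -> int:
        """Like `losetup -f`: report the next free device without claiming it."""
        return self._in_use.index(False)

    def attach(self, index: int) -> None:
        """Claim a device; fail if someone else claimed it first."""
        with self._lock:
            if self._in_use[index]:
                raise RuntimeError(f"loop{index} is already in use")
            self._in_use[index] = True

    def find_and_attach(self) -> int:
        """Atomic variant: find and claim a device under one lock."""
        with self._lock:
            index = self._in_use.index(False)
            self._in_use[index] = True
            return index


# Racy: both builders look before either one attaches, so both pick device 0.
racy = LoopPool(4)
first, second = racy.find_free(), racy.find_free()

# Atomic: each builder finds and claims in one step and gets its own device.
safe = LoopPool(4)
claimed = [safe.find_and_attach(), safe.find_and_attach()]
```

In the racy case, whichever builder calls `attach` second fails with a collision. In the atomic case the two builders receive different devices.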
Andrew Rynhard
7970f977b7 chore: add markdownlint
This will give us a standard tool for linting Markdown files.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-17 03:53:52 -07:00
Spencer Smith
9d759df9bd chore: move to smaller azure instance type
This PR will save us a little dinero over the course of running e2e
builds in azure. It's only a couple cents per hour difference, but will
shave off a fair amount over the course of a month.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-08-16 09:46:17 -07:00
Andrew Rynhard
92452ab981 chore: remove sonobuoy spinner
This is only slowing down the build since we use a remote DB for drone.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-15 05:15:20 -07:00
Andrew Rynhard
48109e9757 chore: apply manifests when init node is ready
If we wait for all masters to check in before applying the PSP, we run
the risk of kube-proxy failing to start for a long period of time.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-14 20:28:34 -07:00
Andrew Rynhard
f18ecca50c chore: use go runner in sonobuoy
This is the recommended fix for waiting on conformance results. Sonobuoy
is returning early even though the --wait flag is specified.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-13 22:26:03 -07:00
Spencer Smith
57d22ef1bb chore: enable floating IP creation in e2e tests
This PR will edit the manifests for e2e so that we can take advantage of https://github.com/talos-systems/cluster-api-provider-talos/pull/47

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-08-13 15:23:28 -07:00
Andrew Rynhard
caa0354fe9 chore: fix drone clone
In order to use promotion against pull requests to trigger things like
E2E, we need to update the default clone logic. The issue is that a
promotion is assumed to be run against a build that has been merged. In
our case, we need to promote builds that are not necessarily merged.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-12 20:33:29 -07:00
Andrew Rynhard
1956504bd4 chore: fix default pipeline
This prevents the default pipeline from running on releases. It also
ensures that the push step is executed on a release.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-12 17:45:26 -07:00
Andrew Rynhard
e8355f07a0 chore: fix release pipeline
We should only use the "tag" event and remove the promotion event. It
seems like we can't have both.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-12 17:24:12 -07:00
Andrew Rynhard
a420b85b07 chore: run unique E2E tests
In order to run more than one instance of E2E testing at a time, we need
to ensure that all resources are unique to the run.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-12 14:14:08 -07:00
Andrew Rynhard
57db8a77b7 chore: exclude promotion event
We need to exclude the promotion event in a number of places.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-12 11:43:38 -07:00
Andrew Rynhard
ac54a3cb86 chore: add ability to promote to a release
Although the GitHub release plugin requires a tag and will fail on a
promotion, this is still useful as it will allow us to mimic a release
before we tag.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-11 11:51:53 -07:00
Andrew Rynhard
2ee769d19e chore: add image test step
Instead of building platform specific images in the default pipeline, we
should build just one image as part of our basic testing to make sure
installations work as expected.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-11 10:51:33 -07:00
Andrew Rynhard
c34ce3a4ed chore: reenable AMI publishing
This was removed during the refactor of our Drone file.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-11 10:07:57 -07:00
Andrew Rynhard
817380bad6 chore: refactor the Jsonnet file
This change improves the drone jsonnet file by making it more DRY and
structuring it in a way that makes it much easier to follow.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-11 09:23:30 -07:00
Andrew Rynhard
620efe52ef chore: fix push step dependencies
We should wait until basic integration is done.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-10 03:52:29 -07:00
Andrey Smirnov
ae54f7e40d fix: stalls in local Docker cluster boot
The problem was triggered by the udevd trigger. The root cause is not
clear, but the workaround is to disable it in container mode.

Implement CPU/mem limits for `osctl cluster create`, apply defaults,
bump defaults for cicd.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-08-10 13:31:47 +03:00
Andrew Rynhard
b965239672 chore: fix clone logic
This is another attempt at fixing the clone logic to make it work when
building the master branch.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-09 23:04:43 -07:00
Andrew Rynhard
217b7e2f9d chore: fix broken clone
This fixes an issue with cloning the master branch caused by git
refusing to fetch into the current branch.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-09 22:44:54 -07:00
Andrew Rynhard
8786916fd0 chore: build drone YAML via jsonnet
This PR aims to DRY the drone config file by using Jsonnet to generate
it.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-09 22:30:37 -07:00
Brad Beam
e60a57e186 chore: Fix up adhoc e2e tests
- Wait a little after cluster comes up
- Change interaction with CONFORMANCE variable to work around
  set -eou pipefail restrictions
- Set sonobuoy runner version to latest to work with alpha
  version

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-09 13:55:14 -05:00
Brad Beam
bfc1646cd9 chore(ci): Add e2e promotion pipeline
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-08 11:27:57 -05:00
Spencer Smith
eea33a2254 chore: enable CIS testing in conformance runs
This PR will run through the kube-bench tests as part of our nightly
conformance runs

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-08-07 17:06:03 -04:00
Spencer Smith
902577b4dc feat: upgrade kubernetes to v1.16.0-alpha.3
This PR updates the kubernetes version constant, as well as pulls in the
new kubeadm image with the last alpha of v1.16.0 baked in. Additionally,
moves the CNI daemon sets to apps/v1, since they're now out of beta.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-08-07 16:05:07 -04:00
Spencer Smith
9e02c77c0a chore: add azure e2e testing
This PR will allow us to run an azure e2e test in parallel with our
current GCE implementation.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-08-07 12:16:32 -04:00
Andrey Smirnov
71640662e0 chore(init): rearrange phase handling to push shutdown to main
This re-arranges phases a bit so that shutdown actions are pushed back
to the top-level main.go of machined.

A small, rudimentary event.Bus is introduced to facilitate event passing
(shutdown/restart) between various machined components and main.go. This
might not be the best implementation, just something to allow this
message passing without global variables or the like.

Machined API was refactored to run as goroutine service.

ACPI and signal handlers were rebuilt as phase tasks and are activated for
non-container and container modes, respectively.

As part of the fix, now `docker stop` triggers correct shutdown of Talos
(not a big deal, but good for testing).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-08-02 08:42:12 -07:00
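
Commit 71640662e0 above describes a small event bus that lets components signal shutdown or restart to the top level without global variables. The actual `event.Bus` API is not shown in this log, so the following is only a generic publish/subscribe sketch in Python. The class `Bus`, its methods, and the topic strings are hypothetical.

```python
from typing import Callable, Dict, List

Handler = Callable[[str], None]


class Bus:
    """Minimal in-process publish/subscribe bus."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = {}

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler to be called for every event on a topic."""
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, payload: str) -> None:
        """Deliver the payload to each handler subscribed to the topic."""
        for handler in list(self._subscribers.get(topic, [])):
            handler(payload)


# The top level subscribes once; any component holding the bus can publish.
received: List[str] = []
bus = Bus()
bus.subscribe("shutdown", received.append)

bus.publish("shutdown", "requested by API")
bus.publish("restart", "nobody is listening")  # no subscriber, silently dropped
```

Passing the `bus` object to each component is what replaces the global variable: the publisher needs no reference to whoever handles the event.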
Spencer Smith
38dfddbab3 feat: break up osctl cluster create and basic/e2e tests
This PR will break cluster create apart from the other steps in
integration tests. It will allow us to run the cluster create, then use
it for parallel e2e builds in different cloud environments.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-08-01 10:55:24 -04:00
Andrey Smirnov
587011e250 chore: remove hack/dev/ scripts & docker-compose
They are outdated, `osctl cluster` implements cluster up/down in a
better way. K8s manifests are left intact, they are used in integration
tests.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-07-30 00:47:58 +03:00
Andrew Rynhard
e63c882b89 refactor: split machined into phases
This change aims to standardize the boot process. It introduces the
concept of a phase, which is composed of tasks. Phases are run in serial and
the tasks that make up a phase are run concurrently.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-29 12:40:03 -07:00
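
The execution model described in commit e63c882b89 above (phases in serial, tasks within a phase concurrently) can be sketched as follows. This is an illustrative Python sketch, not the machined implementation. `run_phases` and the task names in the usage note are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Task = Callable[[], str]
Phase = Sequence[Task]


def run_phases(phases: Sequence[Phase]) -> List[List[str]]:
    """Run phases one after another; run the tasks inside each phase concurrently."""
    results: List[List[str]] = []
    for tasks in phases:
        # A fresh pool per phase: leaving the with-block waits for every task,
        # so the next phase cannot start until this one has finished.
        with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
            futures = [pool.submit(task) for task in tasks]
            # result() re-raises any exception raised by a task.
            results.append([future.result() for future in futures])
    return results
```

For example, `run_phases([[mount_disks, start_network], [start_services]])` would run the two hypothetical first-phase tasks together and start `start_services` only after both have returned.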
Andrew Rynhard
6852fa969f chore: create raw image as sparse file
This change reduces the size of raw disk significantly by creating it as
a sparse file.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-25 11:28:07 -07:00
Andrew Rynhard
0ec17e4169 feat: run rootfs from squashfs
This change moves the rootfs to a squashfs image.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-25 08:38:31 -07:00
Andrey Smirnov
8c59adb9dc chore: allow to run tests only for specified packages
This allows running `make test TESTPKGS=./internal/app/machined`.

Also update the Dockerfile slug, as
https://github.com/moby/buildkit/pull/1081 was merged into master.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-07-23 22:17:22 +03:00
Spencer Smith
089890f36b chore: setup gce for e2e builds
This PR will provide a basis for running e2e tests on GCE several times
a day. We'll need to add a cron event to the drone repo once merged.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-07-22 12:46:02 -04:00
Andrey Smirnov
9f9acf1f05 chore: run tests in the buildkit itself
This relies on two PRs to the buildkit:

* https://github.com/moby/buildkit/pull/1081
* https://github.com/moby/buildkit/pull/1085

The sysfs fix was merged upstream, so the tag is updated. By using the
`Dockerfile` slug, I can switch to dockerfile2llb with support for
`--security=insecure`.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-07-19 07:53:49 -07:00