979 Commits

Author SHA1 Message Date
Brad Beam
bfc1646cd9 chore(ci): Add e2e promotion pipeline
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-08 11:27:57 -05:00
Spencer Smith
c03e4f850c chore: re-add github actions
This PR will hopefully re-enable the GitHub Actions so that conform works
as expected. 🤞

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-08-08 11:56:54 -04:00
Spencer Smith
e88d908f07 chore: delete github actions temporarily
This PR will drop the .github directory in an effort to clean things up
so we can add it back and get conform acting correctly.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-08-08 11:51:58 -04:00
Andrew Rynhard
1df4690db3 chore: set docker server entrypoint to dockerd to avoid TLS generation
As of the latest DIND images, TLS certificates are generated by default.
This change bypasses the TLS generation.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-07 15:10:57 -07:00
Spencer Smith
eea33a2254 chore: enable CIS testing in conformance runs
This PR will run the kube-bench tests as part of our nightly
conformance runs.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-08-07 17:06:03 -04:00
Spencer Smith
902577b4dc feat: upgrade kubernetes to v1.16.0-alpha.3
This PR updates the Kubernetes version constant and pulls in the
new kubeadm image with the last alpha of v1.16.0 baked in. It also
moves the CNI daemon sets to apps/v1, since they're now out of beta.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-08-07 16:05:07 -04:00
Spencer Smith
9e02c77c0a chore: add azure e2e testing
This PR will allow us to run an Azure e2e test in parallel with our
current GCE implementation.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-08-07 12:16:32 -04:00
Brad Beam
53b1330c44 fix(initramfs): Allow data partition to grow
This fix ensures that we always grow the data partition during an installation.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-07 09:11:02 -05:00
Spencer Smith
ec3c77d863 feat: bump k8s version to v1.15.2
This PR bumps the hyperkube version to pick up fixes for two
critical CVEs: CVE-2019-11247 and CVE-2019-11249.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-08-06 15:56:18 -04:00
Andrey Smirnov
80f2d62958 chore: stabilize one more health test
Same approach: attempt more retries to fight general slowness/resource
starvation.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-08-06 02:45:00 +03:00
Andrew Rynhard
719afb56bd chore: prepare release v0.2.0-alpha.5
This is the official v0.2.0-alpha.5 release.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
v0.2.0-alpha.5
2019-08-05 12:04:45 -07:00
Andrey Smirnov
2f0698def2 chore: stabilize health test
It was failing intermittently because the Sleep was not always long
enough for the desired condition to be reached.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-08-02 14:04:03 -07:00
Andrey Smirnov
8362f58e7a chore: fix data race in goroutine runner
Discovered with `go test -race`:

```
WARNING: DATA RACE
Read at 0x00c0000cf2f8 by goroutine 25:
  github.com/talos-systems/talos/internal/app/machined/pkg/system/runner/goroutine.(*goroutineRunner).Stop()
      /home/smira/Documents/autonomy/talos/internal/app/machined/pkg/system/runner/goroutine/goroutine.go:111 +0x3e
  github.com/talos-systems/talos/internal/app/machined/pkg/system/runner/goroutine_test.(*GoroutineSuite).TestStop()
      /home/smira/Documents/autonomy/talos/internal/app/machined/pkg/system/runner/goroutine/goroutine_test.go:115 +0x345
  runtime.call32()
      /usr/local/go/src/runtime/asm_amd64.s:519 +0x3a
  reflect.Value.Call()
      /usr/local/go/src/reflect/value.go:308 +0xc0
  github.com/stretchr/testify/suite.Run.func2()
      /home/smira/Documents/go/pkg/mod/github.com/stretchr/testify@v1.3.1-0.20190311161405-34c6fa2dc709/suite/suite.go:133 +0x2ec
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:865 +0x163

Previous write at 0x00c0000cf2f8 by goroutine 26:
  github.com/talos-systems/talos/internal/app/machined/pkg/system/runner/goroutine.(*goroutineRunner).Run()
      /home/smira/Documents/autonomy/talos/internal/app/machined/pkg/system/runner/goroutine/goroutine.go:65 +0xcb
  github.com/talos-systems/talos/internal/app/machined/pkg/system/runner/goroutine_test.(*GoroutineSuite).TestStop.func3()
      /home/smira/Documents/autonomy/talos/internal/app/machined/pkg/system/runner/goroutine/goroutine_test.go:104 +0x4a
```

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-08-02 14:03:18 -07:00
Andrey Smirnov
37c1703f06 chore: add tests for event.Bus
Small tests to make sure code works as expected.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-08-02 14:02:18 -07:00
Andrey Smirnov
71640662e0 chore(init): rearrange phase handling to push shutdown to main
This rearranges phases a bit so that shutdown actions are pushed back
to the top-level main.go of machined.

A small, rudimentary event.Bus is introduced to facilitate event passing
(shutdown/restart) between various machined components and main.go. This
might not be the best implementation, just something to allow this
message passing without global variables or such.
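
A minimal Go sketch of such an in-process bus, assuming a channel-based fan-out design. The `Event` type, its constants, and the `Subscribe`/`Notify` method names are hypothetical and not taken from the actual machined code.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical event type; the real event.Bus may differ.
type Event int

const (
	Shutdown Event = iota
	Restart
)

// Bus fans out each notified event to every subscriber channel.
type Bus struct {
	mu   sync.Mutex
	subs []chan Event
}

// Subscribe returns a buffered channel that receives future events.
func (b *Bus) Subscribe() <-chan Event {
	b.mu.Lock()
	defer b.mu.Unlock()
	ch := make(chan Event, 1)
	b.subs = append(b.subs, ch)
	return ch
}

// Notify delivers e to all subscribers, dropping it for any subscriber
// whose buffer is already full so that the publisher never blocks.
func (b *Bus) Notify(e Event) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, ch := range b.subs {
		select {
		case ch <- e:
		default:
		}
	}
}

func main() {
	var bus Bus
	ch := bus.Subscribe()
	bus.Notify(Shutdown)
	fmt.Println(<-ch == Shutdown) // true
}
```

Dropping events for slow subscribers is a design choice of this sketch; a real bus might prefer blocking or larger buffers for shutdown signals.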

The machined API was refactored to run as a goroutine service.

The ACPI and signal handlers were rebuilt as phase tasks, activated in
non-container and container modes, respectively.

As part of the fix, now `docker stop` triggers correct shutdown of Talos
(not a big deal, but good for testing).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-08-02 08:42:12 -07:00
Andrew Rynhard
90c91807bd refactor: restructure the project layout
This change moves packages into more appropriate places.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-01 22:19:42 -07:00
Andrew Rynhard
a9c4a95a4b fix: mount the owned partitions in cloud platforms
This adds the logic for mounting the owned block device and resizing the
ephemeral partition for cloud platforms.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-01 21:48:23 -07:00
Andrew Rynhard
ca35b85300 refactor: improve installation reliability
This change aims to make installations more unified and reliable. It
introduces the concept of a mountpoint manager that is capable of
mounting, unmounting, and moving a set of mountpoints in the correct
order.
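
A minimal Go sketch of the ordering idea, covering mount and unmount only (not moves). The `MountPoint` and `Manager` types and the injected mount/unmount functions are hypothetical; a real implementation would wrap calls such as `unix.Mount`/`unix.Unmount` from `golang.org/x/sys/unix`.

```go
package main

import "fmt"

// MountPoint is a hypothetical description of one mount; a real one
// would also carry the filesystem type, flags, and options.
type MountPoint struct {
	Source, Target string
}

// Manager mounts points in declaration order and unmounts them in reverse,
// so a parent (e.g. /var) is mounted before its children and unmounted after.
type Manager struct {
	points    []MountPoint
	mountFn   func(MountPoint) error // hypothetical hook around the mount syscall
	unmountFn func(MountPoint) error // hypothetical hook around the unmount syscall
}

// MountAll mounts every point in order, stopping at the first failure.
func (m *Manager) MountAll() error {
	for _, p := range m.points {
		if err := m.mountFn(p); err != nil {
			return fmt.Errorf("mount %s: %w", p.Target, err)
		}
	}
	return nil
}

// UnmountAll unmounts every point in reverse order.
func (m *Manager) UnmountAll() error {
	for i := len(m.points) - 1; i >= 0; i-- {
		p := m.points[i]
		if err := m.unmountFn(p); err != nil {
			return fmt.Errorf("unmount %s: %w", p.Target, err)
		}
	}
	return nil
}

func main() {
	var order []string
	rec := func(op string) func(MountPoint) error {
		return func(p MountPoint) error {
			order = append(order, op+" "+p.Target)
			return nil
		}
	}
	m := &Manager{
		points:    []MountPoint{{"/dev/sda3", "/var"}, {"tmpfs", "/var/run"}},
		mountFn:   rec("mount"),
		unmountFn: rec("umount"),
	}
	_ = m.MountAll()
	_ = m.UnmountAll()
	fmt.Println(order)
}
```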

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-01 11:44:40 -07:00
Andrey Smirnov
9c63f4ed0a feat(init): implement complete API for service lifecycle (start/stop)
It is now possible to `start`/`stop`/`restart` any service via `osctl`
commands.

There are some changes in `ServiceRunner` to support re-use (re-entering
running state). `Services` singleton now tracks service running state to
avoid calling `Start()` on already running `ServiceRunner` instance.
Method `Start()` was renamed to `LoadAndStart()` to separate service
loading (adding to the list of services) from the actual service start.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-08-01 11:16:57 -07:00
Andrew Rynhard
91ac1d7a8c chore: run CI jobs on CI nodes
This adds a node selector to our drone jobs that runs the jobs on
dedicated CI nodes.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-01 08:33:02 -07:00
Spencer Smith
38dfddbab3 feat: break up osctl cluster create and basic/e2e tests
This PR will break cluster create apart from the other steps in
integration tests. It will allow us to run the cluster create, then use
it for parallel e2e builds in different cloud environments.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-08-01 10:55:24 -04:00
Andrew Rynhard
835d72b74a fix: create overlay mounts after install
Without running the install task first, /var is read-only. This causes
the overlay phase to fail as it tries to create /var/system.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-01 06:35:12 -07:00
Andrey Smirnov
3024c26a55 chore: update dockerfile/buildkit versions
New buildkit release: https://github.com/moby/buildkit/releases/tag/v0.6.0

New release was published for buildkit's dockerfile:
https://github.com/moby/buildkit/releases/tag/dockerfile%2F1.1.2-experimental,
so we can stick to release version now.

These releases include fixes/implementation for `RUN --security=insecure`.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-08-01 01:05:42 +03:00
Andrey Smirnov
084378ac04 fix(init): flip concurrency of tasks/services, fix small issues
Phases should run sequentially, while tasks within a phase run concurrently.

There are two potential issues fixed:

1. The `result` multierror was updated inside a goroutine without any
synchronization, which is a data race.
2. A panic inside a task/phase runner is possible, and since an unhandled
panic in a goroutine aborts the whole process, this could lead to a system
halt as 'machined' exits.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-07-31 14:21:07 -07:00
Spencer Smith
bc5fe085bd fix: set mtu value regardless of interface state
This PR will fix a bug we encountered in GCE, where the interface was
already up and the MTU value wasn't getting set.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-07-31 15:02:02 -04:00
Andrey Smirnov
ac963ad7e1 feat(osctl): allow configurable number of masters to cluster create
This allows running tiny Talos clusters (which is sometimes nice for
local testing), e.g. with just a single master and zero workers.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-07-30 15:32:16 -07:00
Andrew Rynhard
e2e5236f62 chore: prepare release v0.1.0
This is the official v0.1.0 release.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-29 21:17:10 -07:00
Andrew Rynhard
12486ef0e2 chore: remove rootfs output param
This removes the `--output` flag from the rootfs target. With the output
specified, the build was writing the rootfs file and directory structure
to the build directory.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
v0.2.0-alpha.4
2019-07-29 20:26:54 -07:00
Andrew Rynhard
f0c469c558 chore: prepare release v0.2.0-alpha.4
This is the official v0.2.0-alpha.4 release.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-29 19:31:59 -07:00
Andrew Rynhard
92b72311c7 chore: add AMI build
This will build AMIs and publish them to our official account on a
release.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-29 18:56:33 -07:00
Andrey Smirnov
587011e250 chore: remove hack/dev/ scripts & docker-compose
They are outdated: `osctl cluster` implements cluster up/down in a
better way. The K8s manifests are left intact, as they are used in
integration tests.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-07-30 00:47:58 +03:00
Andrew Rynhard
e63c882b89 refactor: split machined into phases
This change aims to standardize the boot process. It introduces the
concept of a phase, which is composed of tasks. Phases are run serially, and
the tasks that make up a phase are run concurrently.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-29 12:40:03 -07:00
Andrey Smirnov
f56a9d5b96 chore: implement first version of CRI runner
It runs containers via the CRI interface in a pod sandbox. This is the very
first version: I tried not to introduce any changes to the common runner
interface.

There should be some CRI-specific options for the runner (like a polling
interval, as CRI doesn't have a nice `Wait()` API). Also, my plan so far is
to use OCI as the common layer for container options, so that we can
analyze the OCI spec and translate it to CRI (when possible, returning
errors when an option is not implemented).

The CRI interface doesn't have a concept of 'unpacking' an image, so we
probably need to unpack via the containerd API (or any other
runtime-specific API) by targeting the CRI namespace.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-07-26 21:07:46 +03:00
Andrey Smirnov
3e6993c648 chore: fix build cache
Remove the `-a` flag from `go build`, which caused the cache to be missed
every time. Add the cache mount where missing, and update the path to match
the Go build cache exactly.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-07-26 10:38:00 -07:00
Andrew Rynhard
b7a9acbe88 refactor: move setup logic into machined
The responsibility of init should only be to mount the rootfs. This
change moves Talos specific logic into machined. This will allow us to
define a version of Talos in a single binary instead of split across
two. This will enable cleaner upgrades and helps make the codebase
easier to reason about.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-26 07:48:49 -07:00
Andrew Rynhard
a7d76b9410 fix: Run cleanup script earlier in rootfs build
This change fixes a bug that caused the API server to fail due to a
missing directory at /usr/share/ca-certificates.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-25 14:51:13 -07:00
Andrew Rynhard
6852fa969f chore: create raw image as sparse file
This change significantly reduces the size of the raw disk image by
creating it as a sparse file.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-25 11:28:07 -07:00
Andrew Rynhard
0ec17e4169 feat: run rootfs from squashfs
This change moves the rootfs to a squashfs image.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-25 08:38:31 -07:00
Andrew Rynhard
0b8778d772 feat: enable missing KSPP sysctls
These were disabled in previous versions of Talos since BPF was
completely disabled. With this change, we now implement all recommended
sysctls.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-24 22:41:43 -07:00
Andrew Rynhard
5a68b8b371 fix: mount cgroups properly
This change mounts cgroups properly.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-24 22:10:15 -07:00
Andrew Rynhard
b4383e35db feat: move df API to init
This change allows for more accurate mount reporting, as /proc/mounts is
a symlink to /proc/self/mounts and lists mounts as seen by the reading
process. In our case, that process was osd. This caused inaccurate
reporting, since mounts were reported relative to osd when we really
wanted them relative to machined.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-24 15:28:37 -07:00
Seán C McCord
8884b85905 fix(trustd): allow hostnames for trustd endpoints
Fixes #666

Also adds IPv6 cases to the tests for trustd endpoints.

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2019-07-24 15:28:03 -07:00
Andrey Smirnov
b1c184b616 chore: fix GOCACHE dir location
`go env GOCACHE` reports that it is actually `/.cache` in our build
environment (probably because `$HOME` is not set?).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-07-25 00:44:57 +03:00
Spencer Smith
2208eb5924 fix: check proper value of parseip in dhcp
This PR fixes a small bug where we weren't properly checking the result
of a net.ParseIP() call, which caused the hostname to be set to the first
octet of an IP address.
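
The commit does not include the fixed code. The Go sketch below is a hypothetical illustration of the check, assuming the hostname is normally shortened to its first DNS label; `shortHostname` is an illustrative name. `net.ParseIP` returns nil when its input is not a valid IP literal.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// shortHostname keeps the first label of a DNS name but leaves IP
// addresses untouched. The nil check on net.ParseIP is what tells the
// two cases apart; without it, "10.0.0.5" would become "10".
func shortHostname(name string) string {
	if net.ParseIP(name) != nil {
		return name
	}
	return strings.SplitN(name, ".", 2)[0]
}

func main() {
	fmt.Println(shortHostname("node-1.example.com")) // node-1
	fmt.Println(shortHostname("10.0.0.5"))           // 10.0.0.5
}
```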

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-07-24 15:11:27 -04:00
Andrey Smirnov
8c59adb9dc chore: allow to run tests only for specified packages
This makes it possible to run `make test TESTPKGS=./internal/app/machined`.

Also update Dockerfile slug as
https://github.com/moby/buildkit/pull/1081 was merged into master.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-07-23 22:17:22 +03:00
Spencer Smith
45def0a242 feat: attempt to connect to all trustd endpoints when downloading PKI
This PR will connect to each trustd endpoint specified, returning once
successful. Closes #891.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-07-22 19:40:12 -04:00
Andrew Rynhard
6fa7c1fcbd chore: compress Azure image
The image needs to be compressed in order to publish it to GitHub.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
v0.2.0-alpha.3
2019-07-22 14:48:24 -07:00
Andrew Rynhard
32961efbe0 chore: remove the raw disk after Azure build
The raw disk causes the release to fail.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-22 14:20:50 -07:00
Andrew Rynhard
fdb48c981a chore: fix release
This change makes the release step wait for the Azure artifact.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-22 13:49:04 -07:00
Andrew Rynhard
0e2b5f9227 chore: fix image builds on tags
The GCE and Azure steps need to run in serial since they both output the
same artifact name.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-22 13:45:32 -07:00