979 Commits

Author SHA1 Message Date
Andrey Smirnov
d49c4baf62 chore: make health tests more robust
Fixes #1018 #1020

Add more wait loops to address cases when unit-tests are running
extremely slow under high load on the build machine.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-09-02 19:01:33 -07:00
Andrey Smirnov
3012851208 fix(machined): limit max stderr output, use pkg/cmd consistently
Use circular buffer instead of (unlimited) `bytes.Buffer` to limit
amount of stderr output captured. If command being run produces too much
output on stderr, this might consume too much RAM.

Use `pkg/cmd` to run command in `udevd` service. This should allow
easier udevd integration.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-09-02 19:01:15 -07:00
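
The stderr limiting described in commit 3012851208 can be illustrated with a minimal sketch. It assumes a simple "keep the last N bytes" writer as a stand-in for a circular buffer; `tailBuffer` and `newTailBuffer` are hypothetical names and not the project's `pkg/cmd` code.

```go
package main

import "fmt"

// tailBuffer is a hypothetical io.Writer that retains only the last
// `limit` bytes written to it, so memory use stays bounded.
type tailBuffer struct {
	data  []byte
	limit int
}

func newTailBuffer(limit int) *tailBuffer {
	return &tailBuffer{data: make([]byte, 0, limit), limit: limit}
}

// Write appends p and drops the oldest bytes once the limit is exceeded.
func (b *tailBuffer) Write(p []byte) (int, error) {
	n := len(p)
	if n >= b.limit {
		// The new chunk alone fills the buffer: keep only its tail.
		b.data = append(b.data[:0], p[n-b.limit:]...)
		return n, nil
	}
	if overflow := len(b.data) + n - b.limit; overflow > 0 {
		// Shift out the oldest bytes to make room for p.
		b.data = append(b.data[:0], b.data[overflow:]...)
	}
	b.data = append(b.data, p...)
	return n, nil
}

func (b *tailBuffer) String() string { return string(b.data) }

func main() {
	buf := newTailBuffer(16)
	// In real use the buffer would be assigned to exec.Cmd.Stderr.
	for i := 0; i < 1000; i++ {
		fmt.Fprintf(buf, "warning %d\n", i)
	}
	fmt.Printf("kept %d bytes: %q\n", len(buf.String()), buf.String())
}
```

However much the command writes, the buffer never holds more than `limit` bytes, which is the property the commit message is after.
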
Brad Beam
1373806165 fix(init): Enable containerd subreaper
Should take care of our issue with Zombies

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-30 14:32:13 -07:00
Andrey Smirnov
029374f07d chore: disable go test result cache
Go by default caches unit-tests results via build cache, so if source
code doesn't have any changes, test results are cached on package level.
As our unit-tests are not that pure and depend on the environment, it
would be more helpful to make sure all the unit-tests run during each build.

Setting the number of test runs to one disables the test result cache (but build
cache is still being used).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-08-30 22:03:00 +03:00
Andrew Rynhard
ef2154745d fix: leave etcd when upgrading control plane node
We need to remove the current node from etcd when upgrading.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-30 07:16:56 -07:00
Andrew Rynhard
1bbed6907b chore: fix generate version flag and mark v0 as deprecated
Since the command's name is 'generate' the 'gen' prefix is not needed
in the version flag. The flag is scoped under the generate command so
it should be very clear that the '--version' flag is used to control the
config version.

We also move to defaulting to v0 since v1 is new and still needs to be
tested in the real world. We can default to v1 in the next release.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-30 06:59:54 -07:00
Andrey Smirnov
de49903a5f chore: fix location of Go build cache mount for unit-tests-race
This step is based on `golang` image, so `GOCACHE` is set in a bit of a
different way.

No big deal, but should speed up subsequent runs a bit.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-08-29 16:35:14 -07:00
Brad Beam
a6ba81bf4e fix(networkd): Fix hostname retrieval
If multiple interfaces exist on a node, but the first interface was unsuccessful
in getting a dhcp response, we would seg fault when trying to retrieve the hostname
for that interface. This was due to d.Ack being nil and us having no guard around it.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-28 21:25:15 -05:00
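
The kind of guard described in commit a6ba81bf4e might look like the sketch below. The `ack` and `dhcpClient` types and the `firstHostname` helper are hypothetical stand-ins; only the `Ack` field name comes from the commit message.

```go
package main

import "fmt"

// ack is a hypothetical stand-in for a DHCP acknowledgement.
type ack struct {
	hostname string
}

// dhcpClient is a hypothetical per-interface DHCP state.
// Ack stays nil when the interface never received a response.
type dhcpClient struct {
	Ack *ack
}

// Hostname returns the DHCP-provided hostname, or "" when no response exists.
func (d *dhcpClient) Hostname() string {
	if d == nil || d.Ack == nil {
		return "" // guard: nothing to dereference
	}
	return d.Ack.hostname
}

// firstHostname returns the first non-empty hostname across all interfaces.
func firstHostname(clients []*dhcpClient) string {
	for _, c := range clients {
		if h := c.Hostname(); h != "" {
			return h
		}
	}
	return ""
}

func main() {
	clients := []*dhcpClient{
		{},                             // first interface: no DHCP response
		{Ack: &ack{hostname: "node-1"}}, // second interface: got a lease
	}
	fmt.Println(firstHostname(clients))
}
```
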
Brad Beam
b1dc400fea chore: Fix azure image upload
Single quote causes variable to not be evaluated

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-28 20:38:30 -05:00
Brad Beam
9b91cd4511 chore: Clean up e2e scripts
- Use az/gcloud cli bundled with container
- Use consistent spacing in scripts ( 2 spaces vs tab )
- Updated count functions to handle the count inline
- Made platform kubeconfig the default

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-28 08:31:47 -05:00
Andrew Rynhard
d89b199825 chore: change upgrade request "url" to "image"
This aligns the nomenclature used throughout the codebase.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-27 21:43:20 -07:00
Andrew Rynhard
2e8f393fc5 chore: remove unused init token
This removes a token that we never used. Right now it's just noise, so
let's remove it.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-27 21:36:52 -07:00
Andrew Rynhard
1b8bf0d3aa fix: use unique variables for CLI flags
Since the cluster create command and the upgrade command shared a common
variable, and the upgrade defaults to an empty string, we get an invalid
reference format error when attempting to create a cluster. This makes
the variables unique to avoid that.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-27 19:33:30 -07:00
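
The shared-variable problem described in commit 1b8bf0d3aa can be reproduced in a small sketch. It uses the standard library `flag` package as a stand-in for the project's CLI library, and the flag name and default image are hypothetical. `StringVar` writes the default into the target variable at registration time, so the flag registered last wins.

```go
package main

import (
	"flag"
	"fmt"
)

// sharedVariable registers two flags against one variable. The second
// registration overwrites the first default with the empty string.
func sharedVariable() string {
	var image string
	create := flag.NewFlagSet("create", flag.ContinueOnError)
	create.StringVar(&image, "image", "example/image:latest", "image used to create the cluster")
	upgrade := flag.NewFlagSet("upgrade", flag.ContinueOnError)
	upgrade.StringVar(&image, "image", "", "image to upgrade to")
	_ = create.Parse(nil) // no arguments given: the default should apply
	return image
}

// uniqueVariables gives each command its own variable, so defaults stay intact.
func uniqueVariables() string {
	var createImage, upgradeImage string
	create := flag.NewFlagSet("create", flag.ContinueOnError)
	create.StringVar(&createImage, "image", "example/image:latest", "image used to create the cluster")
	upgrade := flag.NewFlagSet("upgrade", flag.ContinueOnError)
	upgrade.StringVar(&upgradeImage, "image", "", "image to upgrade to")
	_ = create.Parse(nil)
	return createImage
}

func main() {
	fmt.Printf("shared: %q\n", sharedVariable())
	fmt.Printf("unique: %q\n", uniqueVariables())
}
```

With the shared variable, the create command sees an empty image reference even though its own default was non-empty.
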
Andrew Rynhard
295cbf9dc6 chore: remove generated raw disk
This was mistakenly committed.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-27 19:08:51 -07:00
Andrew Rynhard
66c848cc0d fix: make --target persistent across all commands
We have this flag missing in a number of places. This ensures that all
commands in the future will have this flag. A potential cleanup would
be to hide this flag in commands where it does not make sense. For now I
think it's best to have it everywhere.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-27 18:57:53 -07:00
Andrew Rynhard
d098785a17 chore: remove local upgrade functionality
We have no need for this anymore since installs and upgrades are now
completely handled in a container.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-27 18:44:18 -07:00
Andrew Rynhard
bf8fc1dcbd chore: lint protobuf definitions
This adds linting to our protobuf definitions via prototool.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-27 18:12:36 -07:00
Andrew Rynhard
4247b1befc chore: output top header in all caps
This changes the top output to be consistent with the rest of the CLI
output.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-27 18:04:39 -07:00
Andrew Rynhard
83b978c983 chore: prepare release v0.2.0-alpha.7
This is the official v0.2.0-alpha.7 release.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
v0.2.0-alpha.7
2019-08-27 15:00:30 -07:00
Andrew Rynhard
d4770d41ad feat: run installs via container
This moves to performing installs via a container.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-27 15:01:20 -05:00
Spencer Smith
739e232896 feat: upgrade kubernetes to v1.16.0-beta.1
This PR will upgrade to the latest beta of v1.16 in order to get us
closer to catching the v1.16.0 release as soon as it drops.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-08-27 13:25:33 -04:00
Brad Beam
f028d29d31 chore: Increase timers for healthchecks
We've seen some instances where the initial delay is not long enough (containerd).
In addition, a period of one second increases the log size for services like
proxyd, which log incoming connections.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-27 09:54:05 -07:00
Andrew Rynhard
0bdaff1a90 feat: perform upgrades via container
This moves to performing upgrades via a container.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-27 09:44:50 -07:00
Spencer Smith
f85750cdca feat: generate and use v1 machine configs
This PR will implement the v1 machine config proposal. This will allow
for a streamlined config for talos nodes.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-08-26 19:36:14 -04:00
Andrew Rynhard
15cfd42168 chore: upgrade tools
This brings in Go v1.12.9 to address CVEs and bugs.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-26 15:57:30 -07:00
Andrew Rynhard
43e20217e8 feat: add ability to pass data on event bus
We need to support eventing with associated data. This moves the event
bus to an observer design pattern that allows observers to register for
specific events, and to receive the associated data.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-26 13:27:02 -07:00
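
The observer-style bus described in commit 43e20217e8 could be sketched roughly as follows. All names (`Bus`, `Event`, `EventType`, `Register`, `Notify`) are hypothetical and chosen for illustration; the project's actual types may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// EventType identifies the kind of event an observer can subscribe to.
type EventType int

const (
	Shutdown EventType = iota
	Upgrade
)

// Event carries its type plus arbitrary associated data.
type Event struct {
	Type EventType
	Data interface{}
}

// Bus dispatches events to observers registered for the event's type.
type Bus struct {
	mu        sync.RWMutex
	observers map[EventType][]func(Event)
}

func NewBus() *Bus {
	return &Bus{observers: make(map[EventType][]func(Event))}
}

// Register subscribes fn to events of type t.
func (b *Bus) Register(t EventType, fn func(Event)) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.observers[t] = append(b.observers[t], fn)
}

// Notify delivers e to every observer registered for e.Type.
func (b *Bus) Notify(e Event) {
	b.mu.RLock()
	// Copy the slice so observers run without holding the lock.
	fns := append([]func(Event){}, b.observers[e.Type]...)
	b.mu.RUnlock()
	for _, fn := range fns {
		fn(e)
	}
}

func main() {
	bus := NewBus()
	bus.Register(Upgrade, func(e Event) {
		fmt.Printf("upgrade requested with data: %v\n", e.Data)
	})
	bus.Notify(Event{Type: Shutdown})                           // no observer registered
	bus.Notify(Event{Type: Upgrade, Data: "example/image:v2"}) // delivered with its data
}
```
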
Spencer Smith
6f8e089271 chore: use kubeadm v1beta2 structs everywhere
This PR will move to using the external kubeadm v1beta2 structs for our
code base. This will hopefully allow for more stable integrations with
kubeadm in the long term, as well as solve some needs we have in the
machine config rewrite.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-08-26 12:07:36 -04:00
Brad Beam
692571bdec feat(networkd): Add grpc endpoint
Allows us to list routes and interface details

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-25 19:48:08 -07:00
Brad Beam
d36007fb29 feat(osd): Add ntpd client
Allows us to access ntp api

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-25 13:38:34 -07:00
Andrew Rynhard
9eaa2d8140 feat: add sequencer interface
This adds an interface that can be used to describe boot, shutdown, and
upgrade events in a set of phases.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-25 12:59:42 -07:00
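
One possible shape for the interface described in commit 9eaa2d8140 is sketched below. The `Sequencer` and `Phase` definitions, the `RunPhases` helper, and the `recorder` implementation are all assumptions made for illustration and are not taken from the project.

```go
package main

import (
	"errors"
	"fmt"
)

// Phase is a named unit of work within a sequence.
type Phase struct {
	Name string
	Run  func() error
}

// Sequencer describes boot, shutdown, and upgrade as ordered sets of phases.
type Sequencer interface {
	Boot() []Phase
	Shutdown() []Phase
	Upgrade() []Phase
}

// RunPhases executes phases in order and stops at the first failure.
func RunPhases(phases []Phase) error {
	for _, p := range phases {
		if err := p.Run(); err != nil {
			return fmt.Errorf("phase %q failed: %w", p.Name, err)
		}
	}
	return nil
}

// recorder is a toy Sequencer that records which phases ran.
type recorder struct {
	ran []string
}

func (r *recorder) step(name string, err error) Phase {
	return Phase{Name: name, Run: func() error {
		r.ran = append(r.ran, name)
		return err
	}}
}

func (r *recorder) Boot() []Phase {
	return []Phase{r.step("boot-1", nil), r.step("boot-2", nil)}
}

func (r *recorder) Shutdown() []Phase {
	return []Phase{r.step("shutdown-1", nil)}
}

func (r *recorder) Upgrade() []Phase {
	return []Phase{r.step("upgrade-1", errors.New("disk busy")), r.step("upgrade-2", nil)}
}

func main() {
	var s Sequencer = &recorder{}
	fmt.Println("boot:", RunPhases(s.Boot()))
	fmt.Println("upgrade:", RunPhases(s.Upgrade()))
}
```
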
Andrew Rynhard
be8f58c15d feat: add overlay task
This adds a well defined task for handling all overlay mount points that
are required by the system.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-25 10:47:54 -07:00
Andrew Rynhard
1eb02875c2 feat: use BLKPG ioctl for partition events
This moves to using BLKPG ioctl instead of BLKRRPART. BLKRRPART is older
and more sensitive to EBUSY errors. BLKPG has the potential to minimize
the chances of encountering an EBUSY error when manipulating partition
tables.

In looking at a comparison between BLKPG and BLKRRPART, it seems that
both have their pros and cons. Eventually a combination of the two may
serve us better, but for now I think BLKPG will get us further.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-25 07:55:24 -07:00
Andrew Rynhard
fd25c019bf chore: fix qemu-boot.sh
Fixes a typo that caused the switch statement to not match Linux
environments.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-24 13:24:24 -07:00
Andrew Rynhard
f5f6c29e99 chore: add QEMU script
This script will help in low-level development.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-24 00:56:12 -07:00
Seán C McCord
7b217c79d7 feat: allow specification of additional API SANs
Adds a handler for specification of additional subject alt names (SANs) for
the API Server when generating a new cluster configuration using
`osctl`.

Fixes #800

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2019-08-21 16:25:54 -07:00
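
How additional SANs end up in a certificate (commit 7b217c79d7) can be sketched with the Go standard library. This is not the `osctl` implementation; the function names, the common name, and the self-signed setup are assumptions for illustration.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net"
	"time"
)

// splitSANs sorts user-supplied SANs into IP addresses and DNS names.
func splitSANs(sans []string) (ips []net.IP, dns []string) {
	for _, s := range sans {
		if ip := net.ParseIP(s); ip != nil {
			ips = append(ips, ip)
		} else {
			dns = append(dns, s)
		}
	}
	return ips, dns
}

// selfSignedWithSANs creates a throwaway self-signed certificate that
// carries the given SANs. A real API server certificate would be signed by a CA.
func selfSignedWithSANs(sans []string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	ips, dns := splitSANs(sans)
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "example-apiserver"}, // hypothetical name
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		IPAddresses:  ips,
		DNSNames:     dns,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cert, err := selfSignedWithSANs([]string{"10.0.0.5", "api.example.com"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("DNS SANs:", cert.DNSNames)
	fmt.Println("IP SANs:", cert.IPAddresses)
}
```
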
Brad Beam
cdc989ddda refactor(networkd): Switch from rtnetlink to rtnl
Gives a better abstraction on rtnetlink interaction

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-21 13:24:51 -05:00
Brad Beam
313c118ad0 refactor(networkd): Replace networkd with a standalone app
This is a major rewrite of our network subsystem.

- This changes networkd to run as a standalone app versus internal goroutine
- This changes out the netlink package with the more idiomatic netlink/rtnetlink
  packages
- This changes the initial network bootstrap/discovery from using a single
  interface to attempting to bring up all interfaces
- This moves us back on to the upstream dhcp library

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-21 13:24:51 -05:00
Andrew Rynhard
0af1eba159 refactor: add more runtime modes
In order to DRY up all installation methods and mount methods, this PR
introduces a few more runtime modes. The modes are then used to
determine the strategy for creating and/or mounting the partitions.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-19 20:23:45 -07:00
Andrew Rynhard
794c7231f5 feat: run dedicated instance of containerd for system services
In order to facilitate upgrades and resets that are capable of
manipulating the system block device, we need to run an instance of
containerd that has zero dependencies on the disk. We run containerd
purely in memory for running system services.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-19 12:32:59 -07:00
Andrew Rynhard
060498ec87 chore: disable CIS benchmarks
These are failing with false positives. Disable for now so that we can
run our conformance tests.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-19 11:04:15 -07:00
Andrew Rynhard
2e65cff3ce feat: mount /sys/fs/bpf
The BPF filesystem is required to pin BPF objects.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-18 07:37:08 -07:00
Seán C McCord
cb1210719a fix: enclose target in quotes
Fixes issue #1049

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2019-08-17 21:19:10 -07:00
Brad Beam
ec0f188309 fix(machined): Remove host mounts for specific CNI providers
We shouldn't need these anymore

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-17 20:20:45 -07:00
Brad Beam
03228c7401 chore(ci): Only push latest tags if branch is master.
Should prevent flakes when we merge fixes on release branches where they unintentionally
get tagged as `latest`.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-17 16:03:01 -05:00
Brad Beam
af47edf1ad chore: Make losetup atomic during installation
This should fix a race condition where two independent image creation steps
run `losetup -f` and discover the same 'next available' loopback device and
attempt to use it.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-17 15:23:42 -05:00
Brad Beam
046a8a4ba5 chore: Fix reread error value on retry
Should prevent a flake with returning an error when
it actually succeeded.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-17 12:58:22 -07:00
Andrew Rynhard
8c73c38b8a chore: enforce one sentence per line in Markdown files
This is widely considered best practice, so we should enforce it.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-17 10:15:27 -07:00
Andrew Rynhard
7970f977b7 chore: add markdownlint
This will give us a standard tool for linting Markdown files.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-17 03:53:52 -07:00
Andrew Rynhard
e305acac20 feat: add standardized command runner
This adds a command runner function that can be used everywhere we need
to exec a binary. It adds additional logic around error handling that
will allow for viewing errors in the case of a failed command.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-17 03:38:36 -07:00
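
A command runner in the spirit of commit e305acac20 might look like the sketch below, built on `os/exec` from the standard library. The `Run` name and signature are assumptions, not the project's actual API. The point is that a failed command returns an error containing whatever the command wrote to stderr.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// Run executes name with args and returns its stdout. On failure the
// returned error includes the captured stderr so the cause is visible.
func Run(name string, args ...string) (string, error) {
	var stdout, stderr bytes.Buffer
	cmd := exec.Command(name, args...)
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr // unbounded here; a size-limited writer would cap memory use
	if err := cmd.Run(); err != nil {
		return stdout.String(), fmt.Errorf("%s failed: %w: %s", name, err, stderr.String())
	}
	return stdout.String(), nil
}

func main() {
	// Assumes a POSIX shell is available on PATH.
	if _, err := Run("sh", "-c", "echo 'device not found' >&2; exit 3"); err != nil {
		fmt.Println(err)
	}
}
```
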
Brad Beam
a68cac0a94 chore: Retry reread partition table if EBUSY
Should help make it more robust

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-17 03:27:13 -07:00
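
The retry behaviour from commit a68cac0a94, together with the error-value fix from commit 046a8a4ba5, can be sketched as follows. `retryOnBusy` and `busyThenOK` are hypothetical helpers; the actual partition-table reread is replaced by a stub operation.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
	"time"
)

// retryOnBusy calls op up to `attempts` times, retrying only while it
// fails with EBUSY. Any other error is returned immediately.
func retryOnBusy(attempts int, delay time.Duration, op func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		// err is reassigned on every attempt, so a later success
		// never leaks an error from an earlier failed attempt.
		if err = op(); err == nil {
			return nil
		}
		if !errors.Is(err, syscall.EBUSY) {
			return err
		}
		time.Sleep(delay)
	}
	return err
}

// busyThenOK returns a stub operation that fails with EBUSY n times
// and then succeeds, standing in for a partition-table reread.
func busyThenOK(n int) func() error {
	return func() error {
		if n > 0 {
			n--
			return syscall.EBUSY
		}
		return nil
	}
}

func main() {
	fmt.Println("recovers:", retryOnBusy(5, 10*time.Millisecond, busyThenOK(2)))
	fmt.Println("gives up:", retryOnBusy(2, 10*time.Millisecond, busyThenOK(5)))
}
```
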