979 Commits

Author SHA1 Message Date
Spencer Smith
e01bc3be05
chore: update toolchain images (#754)
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-06-21 16:24:23 -04:00
Andrey Smirnov
76071abbb8
feat(init): move 'ls' API to init from osd (#755)
Service `osd` doesn't have access to the rootfs because it runs in a
container, so move the API to `init`, which has unconstrained access to
the rootfs. (This is in line with another API, `osctl cp`.)

Fixes: #752

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-21 22:29:39 +03:00
Andrew Rynhard
1f36f0e7df
refactor(osctl): use UserHomeDir to detect user home directory (#749)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-06-20 17:57:57 -07:00
Andrey Smirnov
9ed45f7090
feat(osctl): implement 'cp' to copy files out of the Talos node (#740)
The actual API is implemented in `init`, as it has access to the root
filesystem. `osd` proxies the API back to `init` with some tricks to
support gRPC streaming.

Given an absolute path, `init` produces a .tar.gz archive of the
filesystem contents and streams it back.

`osctl cp` works in two modes. The first mode streams data to stdout, so
that we can do e.g.: `osctl cp /etc - | tar tz`. The second mode extracts
the archive to the specified location, dropping ownership info and
adjusting permissions a bit. Timestamps are not preserved.

If a full dump with owner/permissions is required, it's better to stream
data to `tar xz`; for a quick and dirty look at filesystem contents as an
unprivileged user, in-place extraction is easier.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-20 17:02:58 -07:00
Andrey Smirnov
e86ef87fe8
chore: don't run tests in parallel across packages (#748)
We run tests in parallel mode (`go test -p 4`; running packages in
parallel is in fact the default). But tests are not isolated: some of
them launch containerd on a fixed file socket (the socket path is
hardcoded in Talos), which can lead to unpredictable failures when tests
try to launch containerd concurrently on the same socket.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-20 16:30:21 -07:00
Spencer Smith
8a89ecd679
fix: we don't need no stinkin' localapiendpoint (#741)
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-06-19 20:36:47 -04:00
Spencer Smith
11788966af
fix: run basic-integration on nightly cron (#735)
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-06-17 17:00:01 -04:00
Spencer Smith
4ba12fecd8
feat(ci): enable nightly e2e tests (#716)
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-06-17 12:36:48 -04:00
Andrey Smirnov
89b876c676
fix: containers test by locking image to specific tag (#734)
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-17 18:59:57 +03:00
Andrey Smirnov
070cbc9d60
refactor(osd): implement container inspector for a single container (#720)
Instead of pulling a full list of containers, implement inspector query
for a single container following the spec to build display name.

Also adds many more tests for the container inspector.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-17 17:54:28 +03:00
Andrey Smirnov
854395517f
chore: improve test stability for containerd tests (#733)
This should be a no-op, but it makes the tests depend less on timing for
concurrent operations.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-15 00:00:06 +03:00
Andrey Smirnov
0c0a0340b2
fix(osctl): allow '-target' flag for osctl restart (#732)
I couldn't find any use for the `timeout` flag or the value passed in
the API, but it blocked the much more useful 'target' flag, which is
present in other commands.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-14 21:37:57 +03:00
Spencer Smith
4168111ae5
chore: add google group to readme (#730)
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-06-12 11:53:38 -07:00
Brad Beam
c88b6fc422
fix(proxyd): Fix backend deletion (#729)
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-06-07 14:34:47 -07:00
Andrey Smirnov
fb320a894b
fix(osctl): Revert "display non-fatal errors from ps/stats in osctl (#724)" (#727)
This reverts commit f200eb7a8a0b7c2d29710f695000eb7680ce8b7d.

gRPC can't send back both a response and an error.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-07 22:50:05 +03:00
Seán C. McCord
532a53bfaf
feat(init): Implement 'ls' command (#721)
Fixes #719

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2019-06-07 10:19:20 -07:00
Andrey Smirnov
f5969d2c6c
fix(osctl): avoid panic on empty 'talosconfig' (#725)
When talosconfig doesn't exist, `osctl` creates an empty one behind the
scenes, but that leads to an immediate panic if the command tries to
build an osd client:

```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x11d6786]

goroutine 1 [running]:
github.com/talos-systems/talos/cmd/osctl/pkg/client.NewDefaultClientCredentials(0x7ffd720f5100, 0xb, 0xc000559ce8, 0x757014, 0xc0000d5500)
	/src/cmd/osctl/pkg/client/client.go:50 +0xa6
github.com/talos-systems/talos/cmd/osctl/cmd.setupClient(0x16ca3f0)
	/src/cmd/osctl/cmd/root.go:100 +0x3d
github.com/talos-systems/talos/cmd/osctl/cmd.glob..func22(0x24ad7c0, 0xc00058c240, 0x0, 0x3)
	/src/cmd/osctl/cmd/ps.go:32 +0x37
github.com/spf13/cobra.(*Command).execute(0x24ad7c0, 0xc0005f8a00, 0x3, 0x4, 0x24ad7c0, 0xc0005f8a00)
	/toolchain/gopath/pkg/mod/github.com/spf13/cobra@v0.0.3/command.go:766 +0x2ae
github.com/spf13/cobra.(*Command).ExecuteC(0x24ae140, 0x2507030, 0x162f2d7, 0xb)
	/toolchain/gopath/pkg/mod/github.com/spf13/cobra@v0.0.3/command.go:852 +0x2ec
github.com/spf13/cobra.(*Command).Execute(...)
	/toolchain/gopath/pkg/mod/github.com/spf13/cobra@v0.0.3/command.go:800
github.com/talos-systems/talos/cmd/osctl/cmd.Execute()
	/src/cmd/osctl/cmd/root.go:93 +0x24f
main.main()
	/src/cmd/osctl/main.go:10 +0x20
```

Fix that by returning an explicit error:

```
error getting client credentials: 'context' key is not set in the config
```
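A minimal sketch of the fix pattern, assuming a simplified, hypothetical `Config`/`Context` model of talosconfig (the real structure, the `activeContext` name and the second error message are assumptions): validate the lookup and return an error instead of letting a missing entry reach the client constructor.

```go
package main

import (
	"errors"
	"fmt"
)

// Context is a hypothetical, trimmed-down talosconfig context entry.
type Context struct {
	Target string
}

// Config is a hypothetical, trimmed-down model of talosconfig.
type Config struct {
	Context  string
	Contexts map[string]*Context
}

// activeContext returns the selected context or an explicit error, instead of
// dereferencing a missing entry further down in client setup.
func activeContext(c *Config) (*Context, error) {
	if c == nil || c.Context == "" {
		return nil, errors.New("'context' key is not set in the config")
	}
	entry, ok := c.Contexts[c.Context]
	if !ok || entry == nil {
		// Hypothetical wording for the second failure mode.
		return nil, fmt.Errorf("context %q is not defined in the config", c.Context)
	}
	return entry, nil
}

func main() {
	if _, err := activeContext(&Config{}); err != nil {
		// prints: error getting client credentials: 'context' key is not set in the config
		fmt.Println("error getting client credentials:", err)
	}
}
```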

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-07 17:40:28 +03:00
Andrey Smirnov
f200eb7a8a
fix(osctl): display non-fatal errors from ps/stats in osctl (#724)
Logging those errors in osd makes them hard to discover.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-07 17:10:18 +03:00
Brad Beam
0d5f521291
feat(init): Add support for kubeadm reset during upgrade (#714)
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-06-06 22:41:22 -05:00
Spencer Smith
95b107d884
chore(ci): modularize integration test (#722)
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-06-06 09:28:53 -04:00
Brad Beam
8a5acff119
fix: Add gitmeta as dependency for push (#718)
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-06-05 16:01:46 -05:00
Brad Beam
d68e303f27
feat(init): Add service stop api (#708)
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-06-05 14:49:03 -05:00
Andrew Rynhard
ecfa945fc8
chore: download official gitmeta to BINDIR (#717)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-06-05 11:44:15 -07:00
Andrey Smirnov
7a4a677f04
fix(init): use 127.0.0.1 IP in healthchecks to avoid resolver weirdness (#715)
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-05 19:30:28 +03:00
Spencer Smith
921114dd99
fix: ensure index remains in bounds for ud gen (#710)
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-06-04 17:37:54 -04:00
Brad Beam
1a01440482
feat(init): Add support for stopping individual services (#706)
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-06-04 15:51:30 -05:00
Andrey Smirnov
bf6ef7043c
chore: address flaky tests instability (#713)
For #711, this should be a complete fix: wait for the container to be
started.

For #712, this is more of a workaround: adjust timeouts to make the
failure less likely. The idea of the test is that the health check
should be aborted on timeout (1ms), while the health check succeeds if
not aborted within 50ms. Before the fix it was 1ms/10ms, but due to
concurrency there was still a chance that the goroutine exited
successfully after 10ms before the 1ms context deadline took effect.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-04 23:22:05 +03:00
Andrew Rynhard
84616b48b3
chore: prepare release v0.1.0-alpha.28 (#687)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
v0.1.0-alpha.28
2019-06-03 20:07:18 -07:00
Andrey Smirnov
d9f4f378c2
fix(osd): consistent container ids in stats, ps and reset (#707)
Fixes: #689, #690

Refactor container inspection code into a package of its own with some
rudimentary tests. Use this package consistently in osd commands dealing
with containers.

Improvements for the next PRs:

* implement API to fetch info about container by ID (to avoid fetching
full list)

* handle and display errors on the client side, instead of logging them
on the server

* more tests, including k8s containers (how can we do that?)

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-03 20:51:01 -05:00
Spencer Smith
16530db722
fix: don't set BUILDKIT_CACHE to empty string in Makefile (#705)
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-06-03 13:59:39 -04:00
Andrey Smirnov
ee297da1a2
chore: enable GOPROXY for go modules (#703)
Announcement: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/golang-announce/0wo8cOhGuAI/v96KeTYtBwAJ

This should improve module download time as `go` no longer needs
to clone full repositories.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-05-31 23:59:52 +03:00
Andrey Smirnov
f96d3ce7cb
fix(osctl): don't print message on first ^C (#704)
This removes the extra message printed when the user presses ^C to stop
osctl. The message is still printed on the second ^C, and the process is
aborted on the third.

For the `logs` command, which is streaming, suppress the "context
canceled" error (before the context changes, the process was crashing
before it could print an error).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-05-31 23:37:57 +03:00
Andrew Rynhard
b330d3b778
feat: leave etcd before upgrading (#702)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-05-31 10:59:12 -07:00
Brad Beam
8537e7eeb6
feat(init): Add support for control plane join config (#700)
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-05-31 12:21:00 -05:00
Andrey Smirnov
dc79b0ad05
refactor(init): use 'switch' instead of long condition (#701)
Based on feedback from #699

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-05-31 17:39:38 +03:00
Andrey Smirnov
32826e3d14
feat(init): update 'waiting' state description when conditions change (#698)
If some of the conditions are already satisfied, we can update the
description to reflect that, e.g.:

```
EVENTS   [Waiting]: Waiting for service "containerd" to be "up", service "networkd" to be "up", service "trustd" to be "up" (14m17s ago)
         [Waiting]: Waiting for service "trustd" to be "up" (14m16s ago)
```
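A minimal sketch of rebuilding the description from only the still-unsatisfied conditions. The `condition` type and the all-satisfied text are hypothetical; the rendered format mirrors the event log above.

```go
package main

import (
	"fmt"
	"strings"
)

// condition is a hypothetical dependency on another service's state.
type condition struct {
	service   string
	state     string
	satisfied bool
}

// describe renders a 'Waiting' description listing only unsatisfied conditions.
func describe(conds []condition) string {
	var pending []string
	for _, c := range conds {
		if !c.satisfied {
			pending = append(pending, fmt.Sprintf("service %q to be %q", c.service, c.state))
		}
	}
	if len(pending) == 0 {
		return "All conditions satisfied" // hypothetical text
	}
	return "Waiting for " + strings.Join(pending, ", ")
}

func main() {
	conds := []condition{
		{"containerd", "up", false},
		{"networkd", "up", false},
		{"trustd", "up", false},
	}
	fmt.Println(describe(conds))
	conds[0].satisfied = true
	conds[1].satisfied = true
	fmt.Println(describe(conds)) // Waiting for service "trustd" to be "up"
}
```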

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-05-31 16:40:32 +03:00
Spencer Smith
313a988292
fix: ensure shebang at top of userdata (#695)
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-05-31 09:02:55 -04:00
Andrew Rynhard
f95f8f87a4
feat: upgrade Kubernetes to v1.15.0-beta.1 (#696)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-05-30 18:56:30 -07:00
Andrey Smirnov
7b7f4d4484
fix(init): consider 'finished' services to be 'up' (#699)
This fixes a race condition where kubeadm never starts, waiting for
networkd forever.

If one service enters the 'Waiting' state after another service has
already finished running, we should consider the 'Finished' service to
be 'up' in the sense that it has finished successfully (vs. the 'Failed'
state, which is not 'up').
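A sketch of the rule, using a hypothetical `ServiceState` enum rather than the real init types: 'Finished' counts as 'up', 'Failed' never does, and 'Running' counts only when healthy (the 'up' definition from the service dependencies work in #680).

```go
package main

import "fmt"

// ServiceState is a hypothetical subset of service lifecycle states.
type ServiceState int

const (
	StateWaiting ServiceState = iota
	StateRunning
	StateFinished
	StateFailed
)

// isUp reports whether a dependent service may proceed. A service that has
// finished successfully counts as 'up'; a failed one never does.
func isUp(state ServiceState, healthy bool) bool {
	switch state {
	case StateRunning:
		return healthy
	case StateFinished:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(isUp(StateFinished, false)) // true
	fmt.Println(isUp(StateFailed, false))   // false
}
```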

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-05-30 20:02:03 -05:00
Brad Beam
a1e635a4b2
feat(init): Prioritize usage of local userdata (#694)
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-05-30 09:56:14 -05:00
Andrey Smirnov
ca95469247
feat(osctl): handle ^C by aborting context (#693)
This provides somewhat better handling for hanging gRPC requests (or
just slow requests):

```
$ osctl-linux-amd64 --talosconfig talosconfig version
Client:
	Tag:         ad410fb-dirty
	SHA:         ad410fb-dirty
	Built:
	Go version:  go1.12.5
	OS/Arch:     linux/amd64

^CSignal received, aborting, press Ctrl+C once again to abort immediately...
error getting version: rpc error: code = Canceled desc = context canceled
```

For now we catch `SIGINT` and `SIGTERM`. A second signal kills the
process immediately, as the signal handler is removed after the first.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-05-30 00:11:58 +03:00
Andrey Smirnov
ad410fb7f2
refactor(osctl): move cli code out of 'client' package (#692)
This moves CLI code (rendering output, etc.) out of the 'client'
package, so that the client package is usable outside of the CLI.

Consistently accept a context as the first param to API methods, so
that we can build graceful request cancellation.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-05-29 01:10:25 +03:00
Andrey Smirnov
20f4d77d39
fix(init): move directory creation to kubeadm pre-func (#688)
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-05-28 09:51:38 -07:00
Brad Beam
6cf260c5af
fix(osctl): Generate correct config with master IPs (#681)
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-05-27 18:59:41 -07:00
Andrey Smirnov
f704cb2cc3
refactor(osctl): DRY up osctl sources by using common client setup (#686)
Replace duplicated code that set up the gRPC client with a common
method. There should be no functional changes otherwise.

Add argument length checks where missing.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-05-27 22:55:20 +03:00
Andrew Rynhard
90b5b83b8d
chore: improve the basic integration test (#685)
This PR ensures that we have 3 master nodes.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-05-27 10:44:09 -07:00
Andrey Smirnov
40a5b7c177
feat(init): expose networkd as goroutine-based server (#682)
This adds a generic goroutine runner, which simply wraps a service so
that it runs as a goroutine instead of a process. It supports log
redirection and basic panic handling.

The DHCP-related part of the network package was slightly adjusted to
run as a service, with logging updates (to redirect logs to a file) and
context cancellation.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-05-27 17:07:28 +03:00
Brad Beam
d8249c8779
refactor(init): Allow kubeadm init on controlplane (#658)
* refactor(init): Allow kubeadm init on controlplane

This shifts the cluster formation from init(bootstrap) and join(control plane)
to init(control plane).

This makes use of the previously implemented initToken to provide a TTL for
cluster initialization to take place and allows us to mostly treat all control
plane nodes equally. This also sets up the path for us to handle master upgrades
and not be concerned with odd behavior when upgrading the previously defined
init node.

To facilitate kubeadm init across all control plane nodes, we make use of the
initToken to run the `kubeadm init phase certs` command once, generating any
missing certificates. All other control plane nodes will attempt to sync the
necessary certs/files via all defined trustd endpoints and begin the startup
process.

* feat(init): Add service runner context to PreFunc

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-05-24 16:05:49 -05:00
Andrey Smirnov
a0188aff73
feat(init): implement service dependencies, correct start and shutdown (#680)
This PR introduces dependencies between the services. Now each service
has two virtual events associated with it: 'up' (running and healthy)
and 'down' (finished or failed). These events are used to establish the
correct order via the conditions abstraction.

Service image unpacking was moved into the 'pre' stage, simplifying
`init/main.go`; service images are now closer to the code which runs the
service itself.

The 'pre' step now runs after the 'wait' step, and service dependencies
are now mixed into the other conditions of the 'wait' step on startup.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-05-24 19:17:52 +03:00
Andrew Rynhard
ecad4e35a9
docs: change meeting times to 24 hour format (#675)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-05-21 13:56:35 -07:00