49 Commits

Author SHA1 Message Date
Andrey Smirnov
01d696ed10 chore: update golangci-lint-1.23.3
`gomnd` disabled, as it complains about every number used in the code,
and `wsl` became much more thorough.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-04 08:56:39 -08:00
Brad Beam
a39cd81b8f chore(networkd): Report on errors during interface configuration
This DRYs up the interface configuration and adds in an error channel to capture
any issues that come up from interface configuration. These errors are still
treated as non-fatal, but should provide some additional insight.
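
A minimal sketch of the error-channel pattern described above, not the actual networkd code. The names `configureAll`, `configure`, and `errLinkDown` are hypothetical; the only behavior taken from the commit message is that errors are collected and treated as non-fatal.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errLinkDown is a hypothetical sentinel used only for this illustration.
var errLinkDown = errors.New("link down")

// configureAll runs configure for every interface name and collects the
// failures instead of aborting on the first one.
func configureAll(names []string, configure func(string) error) []error {
	// Buffered so that every goroutine can report without blocking.
	errCh := make(chan error, len(names))

	var wg sync.WaitGroup
	for _, name := range names {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			if err := configure(name); err != nil {
				errCh <- fmt.Errorf("%s: %w", name, err)
			}
		}(name)
	}

	wg.Wait()
	close(errCh)

	var errs []error
	for err := range errCh {
		errs = append(errs, err)
	}
	return errs
}

func main() {
	errs := configureAll([]string{"eth0", "eth1"}, func(name string) error {
		if name == "eth1" {
			return errLinkDown
		}
		return nil
	})
	// Errors are reported but do not stop the program.
	for _, err := range errs {
		fmt.Println("non-fatal:", err)
	}
}
```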

Signed-off-by: Brad Beam <brad.beam@b-rad.info>
2020-02-03 12:46:37 -08:00
Brad Beam
e9113537f9 feat(networkd): Make healthcheck perform a check
This implements an actual health check for networkd. We use the arp table ( ip neighbors )
to determine if the machine is actively sending traffic. We should see at least one entry
with a REACHABLE/STALE/DELAY state during normal operating conditions.
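
The check can be illustrated roughly as follows. This sketch assumes the neighbor table is available as text in the style of `ip neighbor` output, with the state as the last field of each line; the commit does not say how networkd reads the table, and `hasActiveNeighbor` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// hasActiveNeighbor reports whether at least one neighbor entry is in a
// REACHABLE, STALE, or DELAY state. It assumes one entry per line with
// the state as the final whitespace-separated field.
func hasActiveNeighbor(output string) bool {
	for _, line := range strings.Split(output, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue
		}
		switch fields[len(fields)-1] {
		case "REACHABLE", "STALE", "DELAY":
			return true
		}
	}
	return false
}

func main() {
	sample := "10.0.0.1 dev eth0 lladdr aa:bb:cc:dd:ee:ff STALE\n" +
		"10.0.0.9 dev eth0 FAILED\n"
	fmt.Println("healthy:", hasActiveNeighbor(sample))
}
```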

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2020-02-03 11:01:00 -08:00
Brad Beam
4593c4f727 fix(networkd): fix ticker leak
Call ticker.Stop() to prevent leak.
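
A short illustration of the pattern, assuming a simple polling loop; `pollN` is a hypothetical helper, not code from networkd. A `time.Ticker` keeps its resources until `Stop` is called, so deferring `Stop` right after creation is the usual guard.

```go
package main

import (
	"fmt"
	"time"
)

// pollN calls fn on every tick until it has run n times and returns the
// number of calls made.
func pollN(interval time.Duration, n int, fn func()) int {
	ticker := time.NewTicker(interval)
	// Stop releases the ticker's resources once the loop exits.
	defer ticker.Stop()

	calls := 0
	for calls < n {
		<-ticker.C
		fn()
		calls++
	}
	return calls
}

func main() {
	count := pollN(time.Millisecond, 3, func() {})
	fmt.Println("ticks handled:", count)
}
```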

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2020-01-29 10:05:47 -08:00
Brad Beam
88df1b50b8 feat(networkd): Add health api
This introduces a health/ready api for networkd. This
will allow us to better determine the state of networkd
and allow for some level of monitoring.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2020-01-29 09:09:27 -06:00
Andrey Smirnov
cebd88f77c fix: correctly parse kernel command line missing DNS config
If nameserver is missing, `net.ParseIP` parses it as `nil` `net.IP` and
later on this `<nil>` address is pushed to `resolv.conf`.
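
The behavior is easy to reproduce: `net.ParseIP("")` returns `nil`, and a `nil` `net.IP` formats as `<nil>`. The sketch below shows one way to guard against it; `parseNameservers` is a hypothetical helper and not necessarily how the fix was implemented.

```go
package main

import (
	"fmt"
	"net"
)

// parseNameservers converts the given strings to IP addresses and skips
// anything that net.ParseIP cannot parse, including the empty string.
func parseNameservers(values []string) []net.IP {
	var out []net.IP
	for _, v := range values {
		ip := net.ParseIP(v)
		if ip == nil {
			continue
		}
		out = append(out, ip)
	}
	return out
}

func main() {
	// A missing nameserver parses to a nil net.IP, which prints as <nil>.
	fmt.Println(net.ParseIP(""))

	for _, ip := range parseNameservers([]string{"", "10.0.0.1"}) {
		fmt.Printf("nameserver %s\n", ip)
	}
}
```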

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-01-27 21:14:05 +03:00
Brad Beam
93218687ec fix(networkd): Fix incorrect resolver settings
This modifies the way the hostname gets set. Previously, we would run
through the entire addressing and resolver configuration and then set the
hostname. This is problematic because the resolver depends on the functionality
of Hostname() ( resolver configuration relies on the domainname of the host ).

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2020-01-17 09:47:54 -05:00
Brad Beam
3dff2b234d fix(networkd): Set hostname properly for dhcp when no hostname option is returned
This fixes a condition where a dhcp response does not provide a hostname. Previously
this would cause the default hostname ( talos-127-0-1-1 ) to be used. This catches
the condition and changes it to compute the hostname via talos-ip.
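
As a rough illustration, the `talos-ip` form can be derived from an IPv4 address by replacing dots with dashes, matching the shape of the default `talos-127-0-1-1`. The function name `hostnameFromAddr` and the IPv4-only restriction are assumptions made for this sketch.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// hostnameFromAddr builds a hostname of the form talos-a-b-c-d from an
// IPv4 address string. Non-IPv4 input is rejected in this sketch.
func hostnameFromAddr(addr string) (string, error) {
	ip := net.ParseIP(addr)
	if ip == nil || ip.To4() == nil {
		return "", fmt.Errorf("not an IPv4 address: %q", addr)
	}
	return "talos-" + strings.ReplaceAll(ip.To4().String(), ".", "-"), nil
}

func main() {
	name, err := hostnameFromAddr("10.5.0.2")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(name)
}
```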

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2020-01-16 09:03:47 -06:00
Andrew Rynhard
2a449aea2f fix: fix error format
Minor fix to error string format that also uses %q instead of %s. The
quoted format helps when there are hidden characters.
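
For illustration, the difference between the two verbs with a value that carries a trailing newline. The message text and the `describe` helper are made up for this example.

```go
package main

import "fmt"

// describe formats an error using %q so that hidden characters in name
// show up as escape sequences inside quotes.
func describe(name string) error {
	return fmt.Errorf("unknown interface %q", name)
}

func main() {
	name := "eth0\n"

	// With %s the newline is emitted as-is and is easy to miss in logs.
	fmt.Println(fmt.Errorf("unknown interface %s", name))

	// With %q this prints: unknown interface "eth0\n"
	fmt.Println(describe(name))
}
```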

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-30 20:56:23 -08:00
Andrew Rynhard
5a7eb631b2 feat: add installer command to installer container
This replaces the entrypoint.sh shell script with a go binary.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-26 06:41:25 -08:00
Brad Beam
da88d7bcb3 fix(networkd): Make better route scoping decisions
This brings in an updated library along with some tweaks on our side to allow for
better decision making when it comes to the scope of routes. This also fixes an
issue where multiple configuration definitions for an interface were not properly
merged and instead were overwritten.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-12-23 09:43:14 -08:00
Andrew Rynhard
0fae1bc92d fix: fix output formats
This fixes random log issues.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-23 09:09:55 -08:00
Brad Beam
64a7eeb0e1 fix(networkd): Check for IFF_RUNNING on link up
This should allow us to correctly differentiate between IFF_UP ( admin up ) and
IFF_RUNNING ( link ready ). This means that we should now wait for the link to
be up and running before proceeding with addressing which should allow for more
reliable results in the dhcp configuration and avoid any race issues in static
configuration.
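
A minimal sketch of the flag check, assuming the interface flags are available as a bitmask. The constant values are the Linux ones (`IFF_UP` is `0x1`, `IFF_RUNNING` is `0x40`); `linkReady` is a hypothetical helper.

```go
package main

import "fmt"

const (
	iffUp      = 0x1  // IFF_UP: interface is administratively up
	iffRunning = 0x40 // IFF_RUNNING: link is operational
)

// linkReady reports whether the interface is both admin up and running.
func linkReady(flags uint32) bool {
	return flags&iffUp != 0 && flags&iffRunning != 0
}

func main() {
	fmt.Println("admin up only:", linkReady(iffUp))
	fmt.Println("up and running:", linkReady(iffUp|iffRunning))
}
```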

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-12-21 15:01:39 -06:00
Andrey Smirnov
de35b4d5af fix: issues discovered by lgtm tool
Using `SafePath` function from `runc` (but had to create local copy as
`runc` doesn't build on OS X).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-18 21:43:59 +03:00
Andrew Rynhard
ad863a7f92 refactor: rename protobuf services, RPCs, and messages
This PR brings our protobuf files into conformance with the protobuf
style guide, and community conventions. It is purely renames, along with
generated docs.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-11 11:41:40 -08:00
Brad Beam
9f69733d74 chore: Remove increased timeouts for dhcp addressing.
These timeouts were initially increased to handle long times for links to be ready. I think
with the updated link ready check in networkd these timers are unnecessary.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-12-09 13:18:01 -08:00
Seán C McCord
d8caa5316a fix: append domainname to DHCP-sourced hostname
If we receive a domain name from the DHCP server, append it to the
received hostname.
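
One way this could look, as a sketch: `fqdn` is a hypothetical helper, and the check that avoids appending a domain the hostname already ends with is an addition of this example, not something the commit states.

```go
package main

import (
	"fmt"
	"strings"
)

// fqdn appends domain to hostname when a domain is present and the
// hostname does not already end with it.
func fqdn(hostname, domain string) string {
	if domain == "" || strings.HasSuffix(hostname, "."+domain) {
		return hostname
	}
	return hostname + "." + domain
}

func main() {
	fmt.Println(fqdn("node1", "example.com"))
	fmt.Println(fqdn("node1", ""))
}
```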

Fixes #1628

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2019-12-08 19:33:26 -08:00
Seán C McCord
b597306989 feat: add domain search line to resolv.conf
If we have a domainname, add it as a search root in resolv.conf.
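
A sketch of what the rendered file could look like, using the standard `nameserver` and `search` keywords of `resolv.conf`. The function `renderResolvConf` and the line ordering are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// renderResolvConf writes one nameserver line per address and, when a
// domain name is known, a search line for it.
func renderResolvConf(nameservers []string, domain string) string {
	var b strings.Builder
	for _, ns := range nameservers {
		fmt.Fprintf(&b, "nameserver %s\n", ns)
	}
	if domain != "" {
		fmt.Fprintf(&b, "search %s\n", domain)
	}
	return b.String()
}

func main() {
	fmt.Print(renderResolvConf([]string{"10.0.0.1", "10.0.0.2"}, "example.com"))
}
```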

Fixes #1626

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2019-12-08 18:57:07 -08:00
Brad Beam
653100dc3b fix(networkd): Ignore loopback interface during hostname decision.
This should disregard the loopback from the hostname decision since it will always be hardcoded
to the default talos hostname.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-12-07 12:50:00 -08:00
Brad Beam
06d537b6ce chore: Add link name to dhcp addressing error
Minor fix to print out the interface name along with the failure message.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-12-06 14:34:16 -06:00
Brad Beam
e1651a8a98 fix: Add hostname setting to networkd
This moves all the hostname aggregation and setting into networkd so we can
get a correct response.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-12-05 11:02:12 -08:00
Seán C McCord
9d9b958fba fix: reverse preference order of network config
Kernel config should always play second to a file-based config.

Fixes #1588

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2019-12-04 17:01:05 -08:00
Brad Beam
05c1659126 feat(networkd): Add support for kernel nfsroot arguments.
This adds support for parsing/honoring the `ip=` kernel argument that can
be supplied to configure an interface on the host.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-12-02 09:57:05 -08:00
Brad Beam
a6ab1ec2a5 chore(networkd): Ignore bonded interfaces without config
This change sets bonded devices to ignored if there is no user supplied
configuration. Without configuration, a bonded interface doesn't provide
any value. This should speed up initial boot times by preventing address
discovery on this interface.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-11-27 08:10:07 -08:00
Brad Beam
119bf3e7bb feat(networkd): Add support for bonding
This includes a healthy refactor of the networkd code as well.
- Move netlink functionality to nic package
- Networkd facilitates the orchestration of the underlying interface configuration
- Networkd now stores the state of each interface configuration. This
  should allow us to expose this information via api in the future.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-11-26 20:08:31 -08:00
Brad Beam
d67fbf269b feat: Add support for resetting the network during bootup
This introduces the ability to reset the network interface during the bootup sequence.
This allows for user defined static networking to be the only configuration on the
network interface instead of potentially dhcp+static.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-11-12 10:48:28 -08:00
Andrey Smirnov
e658c442a6 feat: implement grpc request logging
Logging is pretty simple and bare minimum is being logged. I believe
better logging can be provided for apid when it does fan-out, but that
is beyond the scope for the first PR.

Sample logs:

```
$ osctl-linux-amd64 logs machined-api
machined 2019/11/11 21:16:43 OK [/machine.Machine/ServiceList] 0.000ms unary Success (:authority=unix:/run/system/machined/machine.sock;content-type=application/grpc;user-agent=grpc-go/1.23.0)
machined 2019/11/11 21:17:09 Unknown [/machine.Machine/Logs] 0.000ms stream open /run/system/log/machined.log: no such file or directory (:authority=unix:/run/system/machined/machine.sock;content-type=application/grpc;user-agent=grpc-go/1.23.0)
```

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-11-11 13:42:08 -08:00
Brad Beam
8988c1c6a0 feat: Disable networkd configuration if ip kernel parameter is specified
This allows the kernel argument `ip` to take precedence over networking configuration. Documentation for
this parameter can be found here https://www.kernel.org/doc/Documentation/filesystems/nfs/nfsroot.txt

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-11-10 12:07:01 -08:00
Brad Beam
32fe6297fe feat(networkd): Add support for custom nameservers
This adds support for specifying nameservers in the config.

When I was adding tests I noticed the netconf code for setting
the MTU caused a panic. Given how we retrieve the data ( device centric )
in the static addressing method, I think this is safe to remove.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-11-07 13:57:02 -06:00
Brad Beam
457c6416a6 feat: Add network api to apid
This extends apid to include the network api

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-10-28 04:21:48 -07:00
Andrey Smirnov
d3d011c8d2 chore: replace /* */ comments with // comments in license header
This fixes issues with `// +build` directives not being recognized in
source files.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-25 14:15:17 -07:00
Andrew Rynhard
d430a37e46 refactor: use go 1.13 error wrapping
This removes the github.com/pkg/errors package in favor of the official
error wrapping in go 1.13.
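
For reference, the standard-library form of wrapping uses the `%w` verb with `fmt.Errorf` and `errors.Is` to inspect the chain. The sentinel `errNoLease` and the helper names below are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoLease is a hypothetical sentinel error for this illustration.
var errNoLease = errors.New("no lease")

// acquire wraps the sentinel with context using the %w verb.
func acquire(link string) error {
	return fmt.Errorf("dhcp on %s failed: %w", link, errNoLease)
}

// isNoLease reports whether errNoLease is anywhere in err's chain.
func isNoLease(err error) bool {
	return errors.Is(err, errNoLease)
}

func main() {
	err := acquire("eth0")
	fmt.Println(err)
	fmt.Println("is errNoLease:", isNoLease(err))
}
```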

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-15 22:20:50 -07:00
Andrew Rynhard
94c28657d3 feat: add config validation task
This should provide a better UX around misconfigured Talos nodes. It is
just the start of something we can expand on.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-15 20:26:26 -07:00
Andrey Smirnov
c2cb0f9778 chore: enable 'wsl' linter and fix all the issues
I wish there were less of them :)

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-10 01:16:29 +03:00
Andrey Smirnov
bb5f5cc754 chore: bump golangci-lint to 1.20
Memory usage reduced around 8-10x: now it stays stable at 1GB.

I disabled some of the new linters, and one rule which is violated a
lot.

It might make sense to go back and enable `wsl` fixing all the issues
(leaving that for another PR).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-09 22:21:08 +03:00
Andrew Rynhard
4ae8186107 feat: add configurator interface
This moves from translating a config into an internal config
representation, to using an interface. The idea is that an interface
gives us stronger compile time checks, and will prevent us from having to copy
from one struct to another. As long as a concrete type implements the
Configurator interface, it can be used to provide instructions to Talos.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-04 07:53:09 -07:00
Andrew Rynhard
3a92537a30 refactor: rename RPCs
The following RPCs have been renamed:

- ps to containers
- top to processes
- df to mounts

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-20 14:33:51 -07:00
Andrew Rynhard
6efd6fbe08 chore: move gRPC API to public
In order for other projects to make use of our APIs, they must not
reside underneath the internal directory. This moves the protobuf
definitions to a top-level "api" directory and scopes them according to
their domain. This change also removes generated code from the gitignore
file so that users don't have to generate the code themselves.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-19 08:55:13 -07:00
Andrew Rynhard
5ee554128e chore: move from gofumpt to gofumports
The gofumports does everything that gofumpt does with the addition of
formatting imports. This change proposes the use of the `-local` flag so
that we can have imports separated in the following order:

- standard library
- third party
- Talos specific

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-12 07:49:12 -07:00
Andrew Rynhard
2955428850 chore: format code with gofumpt
The gofumpt linter is a stricter drop-in replacement for gofmt. The
rules are ones that I strongly agree with and I think it would be better
if we added this linter instead of nit picking every PR.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-11 11:03:29 -07:00
Seán C McCord
f7ad24ec4f feat: allow network interface to be ignored
Added a property to userdata to allow a network interface to be ignored,
such that Talos will perform no operations on it (including DHCP).

Also added kernel commandline parameter (talos.network.interface.ignore)
to specify a network interface should be ignored.

Also allows chaining of kernel cmdline parameter Contains() where the
parameter in question does not exist.

Fixes #1124

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2019-09-07 16:33:52 -07:00
Andrew Rynhard
9337dcdfcd feat: configure interfaces concurrently
This uses a wait group to configure interfaces concurrently.
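
A minimal sketch of the wait group pattern, assuming each interface can be configured independently. `configureConcurrently` and its callback are hypothetical names; each goroutine writes to its own slice index, so no extra locking is needed.

```go
package main

import (
	"fmt"
	"sync"
)

// configureConcurrently runs configure for every name in its own
// goroutine and returns the results in input order.
func configureConcurrently(names []string, configure func(string) string) []string {
	results := make([]string, len(names))

	var wg sync.WaitGroup
	for i, name := range names {
		wg.Add(1)
		go func(i int, name string) {
			defer wg.Done()
			results[i] = configure(name)
		}(i, name)
	}

	// Block until every interface has been handled.
	wg.Wait()
	return results
}

func main() {
	out := configureConcurrently([]string{"eth0", "eth1"}, func(name string) string {
		return name + ": configured"
	})
	for _, line := range out {
		fmt.Println(line)
	}
}
```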

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-05 14:45:42 -07:00
Seán C McCord
845cd92e5d fix: increase retries for DHCP
Increased retry count to 6 for DHCP.  In my testing, this worked
reliably in my setup, where the default (3) did not.

Ultimately, this should probably be configurable from the userdata.
Instead, this just makes it work for me.

Fixes #1099

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2019-09-02 19:02:53 -07:00
Brad Beam
a6ba81bf4e fix(networkd): Fix hostname retrieval
If multiple interfaces exist on a node, but the first interface was unsuccessful
in getting a dhcp response, we would seg fault when trying to retrieve the hostname
for that interface. This was due to d.Ack being nil and us having no guard around it.
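
The guard can be sketched as below. The types `ack` and `dhcpResult` are stand-ins invented for this example; only the field name `Ack` and the nil condition come from the commit message.

```go
package main

import "fmt"

// ack is a hypothetical stand-in for a DHCP acknowledgement.
type ack struct {
	hostname string
}

// dhcpResult is a hypothetical stand-in for the per-interface DHCP state.
type dhcpResult struct {
	Ack *ack
}

// Hostname returns the hostname from the DHCP response, or an empty
// string when no response was received.
func (d *dhcpResult) Hostname() string {
	if d == nil || d.Ack == nil {
		return ""
	}
	return d.Ack.hostname
}

func main() {
	failed := &dhcpResult{} // no DHCP response, Ack is nil
	ok := &dhcpResult{Ack: &ack{hostname: "node1"}}

	fmt.Printf("failed: %q\n", failed.Hostname())
	fmt.Printf("ok: %q\n", ok.Hostname())
}
```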

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-28 21:25:15 -05:00
Andrew Rynhard
bf8fc1dcbd chore: lint protobuf definitions
This adds linting to our protobuf definitions via prototool.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-27 18:12:36 -07:00
Andrew Rynhard
0bdaff1a90 feat: perform upgrades via container
This moves to performing upgrades via a container.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-27 09:44:50 -07:00
Brad Beam
692571bdec feat(networkd): Add grpc endpoint
Allows us to list routes and interface details

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-25 19:48:08 -07:00
Brad Beam
cdc989ddda refactor(networkd): Switch from rtnetlink to rtnl
Gives a better abstraction on rtnetlink interaction

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-21 13:24:51 -05:00
Brad Beam
313c118ad0 refactor(networkd): Replace networkd with a standalone app
This is a major rewrite of our network subsystem.

- This changes networkd to run as a standalone app versus internal goroutine
- This changes out the netlink package with the more idiomatic netlink/rtnetlink
  packages
- This changes the initial network bootstrap/discovery from using a single
  interface to attempting to bring up all interfaces
- This moves us back on to the upstream dhcp library

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-21 13:24:51 -05:00