90 Commits

Author SHA1 Message Date
Aaron U'Ren
4fd7bc4d19 fix(sync_routes): add deletion / immediate syncing
Added the following items to the original logic:
* Added map route entry deletion on withdrawl so that the system doesn't
  incorrectly sync it back to the kernel's routing table
* Added an immediate route sync upon BGP path receive
* Added a mutex to ensure that deleted routes aren't accidentally synced
  back to the system
* Added stopCh and wg (wait group) handling
* Increase default sync time from 15 seconds to 1 minute since this
  scenario is unlikely and netlink calls could potentially be burdensome
  in large clusters.
2022-03-18 15:02:02 -05:00
RusoX89
23ac78cf94 Routes Synchronization Routine 2022-03-18 15:02:02 -05:00
Tamihiro Lee
1db19931a2 skip binding device to ipip tunnel if node's interface is loopback 2022-03-11 16:41:14 -06:00
Tamihiro Lee
184976a536 start peering connection to neighbors from node's advertise-ip 2022-03-11 16:19:00 -06:00
Aaron U'Ren
b9a9246e8e fix(lint): don't error on deprecated protobuf funcs 2021-12-02 12:13:31 +01:00
Xiang Liu
8e7d585217 fix(bgp): use PeerState_ESTABLISHED logic like function name(#1184) 2021-11-08 15:14:01 -06:00
Aaron U'Ren
5e1d033a44 fix(sysctl): revert is fatal check for some conditions 2021-09-13 17:39:28 -05:00
Aaron U'Ren
8f3861de40 fact(sysctl): consolidate sysctl usage into utils 2021-09-11 16:20:07 -05:00
Aaron U'Ren
1d90e215e9 feat(.golangci.yml): enable stylecheck linter and remediate 2021-09-11 16:20:07 -05:00
Aaron U'Ren
85f28411dc feat(.golangci.yml): enable long lines linter and remediate 2021-09-11 16:20:07 -05:00
Aaron U'Ren
874a746e30 feat(.golangci.yml): enable gosec and remediate 2021-09-11 16:20:07 -05:00
Aaron U'Ren
6208bfac46 feat(.golangci.yml): enable gomnd and remediate 2021-09-11 16:20:07 -05:00
Aaron U'Ren
f52fddddee feat(.golangci.yml): enable gocritic and remediate 2021-09-11 16:20:07 -05:00
Aaron U'Ren
d6ccc22519 feat(.golangci.yml): enable goconst and remediate 2021-09-11 16:20:07 -05:00
Aaron U'Ren
35d334ca96 fix: add sleeps between iptables and ipset cleanup
I found that without taking a brief pause between iptables cleanup and
ipset deletion, sometimes the system still thought that there were
iptables references to the ipsets and would error instead of cleaning
the ipsets.
2021-08-05 16:39:28 -05:00
Aaron U'Ren
cafd69dfaf fix(NRC): reduce logging for egress cleanup errors
Errors can happen here for a lot of reasons, the user may not have been
running the controller, the definitions may have already been deleted,
the ipset may not be around to be referenced because the user already
cleaned up before.

Reduced the logging to trim user confusion over error statements in the
logs.
2021-08-05 16:39:28 -05:00
Aaron U'Ren
06e246ff30 fix(NRC): PR feedback fixes 2021-07-30 12:59:32 -05:00
Aaron U'Ren
445ad9a1b5 fix(injectRoute): process withdrawls first
Avoid extra and unneeded work by processing withdrawls first. Also makes
the logic a lot more clear.
2021-07-30 12:59:32 -05:00
Aaron U'Ren
2e590a4185 fix(NRC): consolidate route delete logic
This also makes the call that happens upon path withdrawl safer, by
checking to see if the route exists before deleting it.

One departure here is that we used to only log errors, now we return
errors as soon as they are encountered, this may cause some routes to
persist even if they had been cleaned before by stopping at the first
error. However, I think that it makes for more consistent and expected
behavior if this needs to be called in another place.
2021-07-30 12:59:32 -05:00
Aaron U'Ren
d0501c0763 fix(injectRoute): cleanup tunnels & routes when peer drops 2021-07-30 12:59:32 -05:00
Aaron U'Ren
94640acf81 doc(injectRoute): improve comments on logic flow 2021-07-30 12:59:32 -05:00
Aaron U'Ren
4959da43a4 feat(NRC): reduce verbosity of log messages for common overlay cases 2021-07-30 12:59:32 -05:00
Aaron U'Ren
38222a350b fact(injectRoute): extract setupOverlayTunnel() and cleanupTunnels() 2021-07-30 12:59:32 -05:00
Aaron U'Ren
63c3b90e05 fact(injectRoute): extract parseBGPPath method to simplify 2021-07-30 12:59:32 -05:00
Aaron U'Ren
e9be04ef2f
fix: add nil checking to ipsetMutex cleanup actions (#1129) 2021-07-20 01:22:48 +05:30
Aaron U'Ren
fa8d69edd8 fix: add locking around ipset invocations 2021-06-01 10:42:08 -05:00
Aaron U'Ren
a610596277 fact(GetMTUFromNodeIP): move up a layer of abstraction
This function is useful for more than just the NRC, move it up a layer
into the global utils so it can be used from multiple controllers.
2021-05-17 16:33:15 -05:00
Aaron U'Ren
9cbc3763b3 feat(bgp): add BGP communities support via node annotation 2021-05-17 12:08:36 -05:00
Aaron U'Ren
ef827d3dbf fix: protect uint32 conversion
See the following for more details:
https://github.com/cloudnativelabs/kube-router/security/code-scanning?query=ref%3Arefs%2Fpull%2F1065%2Fmerge+tool%3ACodeQL
2021-04-14 16:23:59 -05:00
Aaron U'Ren
1816886cb4 fix: remove possible BGP password leak via logs
See:
https://github.com/cloudnativelabs/kube-router/security/code-scanning/1?query=ref%3Arefs%2Fpull%2F1065%2Fmerge
2021-04-14 16:23:59 -05:00
Aaron U'Ren
be01f317c7 fact: other misc cleanups 2021-04-14 16:23:59 -05:00
Aaron U'Ren
53cfbe30eb fix: return early when we might be holding nil references 2021-04-14 16:23:59 -05:00
Aaron U'Ren
96675e620b fix: don't capitalize error messages
It is standard practice in Go to not capitalize error messages:
https://github.com/golang/go/wiki/CodeReviewComments#error-strings
2021-04-14 16:23:59 -05:00
Manuel Rüger
7d47aefe7d Replace github.com/golang/glog with k8s.io/klog/v2
glog is effectively unmaintained and the kubernetes ecosystem is mainly
using its fork klog

Fixes: #1051
2021-04-11 13:16:03 -05:00
Murali Reddy
40512f104a serialize the iptables changes by NRC and NPC while starting 2021-03-18 09:21:22 -05:00
yydzhou
49b9add056
Making IPIP/tunnel and override-nexthop independent (#1025)
* enable tunnel plus override-nexthop config

* add docs

* feedback integration

Co-authored-by: deng.zhou <deng.zhou@bytedance.com>
2021-02-09 18:44:56 +05:30
Murali Reddy
54b921f1f8 Merge remote-tracking branch 'iamakulov/master' 2021-01-04 16:56:41 +05:30
Murali Reddy
92b914e7fd review comments 2020-10-01 23:00:36 -05:00
Murali Reddy
7904b7c950 addressing review comments 2020-10-01 23:00:36 -05:00
Murali Reddy
947bb246e4 fix lint error 2020-10-01 23:00:36 -05:00
Murali Reddy
db1bd5611e set mtu in cni spec to auto configure MTU's of the pod's veth's and kube-bridge interfaces
Fixes #165
2020-10-01 23:00:36 -05:00
Aaron U'Ren
824614d162
Add Support for Reading Peer Passwords via a File (#986)
* Add support for reading peer passwords via a file

Syntax of the file is the same as for --peer-router-passwords, that is,
a comma separated list of base64 encoded passwords.

Passwords specified with --peer-router-passwords have precedence over
passwords read from peer-router-passwords-file.

* fix(options): peer password file linting and doc

Co-authored-by: Jean Raby <jean@raby.sh>
2020-09-08 16:16:21 -05:00
Murali Reddy
3c734fb96a
merge gobgp-update into master (#982)
* merge gobgp-update into master

* update travis.yaml go version:

* go get github.com/osrg/gobgp to build gobgp

* install git as go get needs it
2020-09-07 10:27:58 +05:30
Ivan Akulov
1a487d2140
Remove options passed to .Refresh()
To match the existing code behavior that existed for at least two years
2020-08-19 21:50:37 +03:00
Murali Reddy
a33089d292
[testing] run go linters (#943)
* run go linters for static code checking

* fix(lint): fix all goimports linting errors

* fix(lint): fix all golint errors

* fix(lint): fix all spelling errors

Co-authored-by: Aaron U'Ren <aauren@gmail.com>
2020-07-28 23:52:41 +05:30
Aaron U'Ren
031a9926d6
Merge pull request #786 from jdrahos/rr_ipv4_785
Allow to configure RR cluster id using IPv4 strings
2020-07-16 09:41:13 -05:00
CloudNativer
1c184624d1
The bgp-holdtime function parameter of setting holdtime is added to adjust the holdtime of BGP negotiation with the connected network devices. (#921)
The bgp-holdtime function parameter of setting holdtime is added to adjust the holdtime of BGP negotiation with the connected network devices.
2020-07-13 09:10:31 -05:00
Jean Raby
1c594b2827 Allow setting BGP Graceful restart time from CLI
Default value remains the same as GoBGP (90s)
2020-07-10 13:57:04 -05:00
Manuel Rüger
12674d5f8b
Add golangci-lint support (#895)
* Makefile: Add lint using golangci-lint

* build/travis-test.sh: Run lint step

* metrics_controller: Lint

pkg/metrics/metrics_controller.go:150:2: `mu` is unused (structcheck)
        mu          sync.Mutex
        ^
pkg/metrics/metrics_controller.go:151:2: `nodeIP` is unused (structcheck)
        nodeIP      net.IP
        ^

* network_service_graceful: Lint

pkg/controllers/proxy/network_service_graceful.go:21:6: `gracefulQueueItem` is unused (deadcode)
type gracefulQueueItem struct {
     ^
pkg/controllers/proxy/network_service_graceful.go:22:2: `added` is unused (structcheck)
        added   time.Time
        ^
pkg/controllers/proxy/network_service_graceful.go:23:2: `service` is unused (structcheck)
        service *ipvs.Service
        ^

* network_services_controller_test: Lint

pkg/controllers/proxy/network_services_controller_test.go:80:6: func `logf` is unused (unused)

* ecmp_vip: Lint

pkg/controllers/routing/ecmp_vip.go:208:4: S1023: redundant `return` statement (gosimple)
                        return
                        ^

* bgp_peers: Lint

pkg/controllers/routing/bgp_peers.go:331:4: S1023: redundant `return` statement (gosimple)
                        return
                        ^

* bgp_policies: Lint

pkg/controllers/routing/bgp_policies.go:80:3: S1011: should replace loop with `externalBgpPeers = append(externalBgpPeers, nrc.nodePeerRouters...)` (gosimple)
                for _, peer := range nrc.nodePeerRouters {
                ^
pkg/controllers/routing/bgp_policies.go:23:20: ineffectual assignment to `err` (ineffassign)
        podCidrPrefixSet, err := table.NewPrefixSet(config.PrefixSet{
                          ^
pkg/controllers/routing/bgp_policies.go:42:22: ineffectual assignment to `err` (ineffassign)
        clusterIPPrefixSet, err := table.NewPrefixSet(config.PrefixSet{
                            ^
pkg/controllers/routing/bgp_policies.go:33:30: Error return value of `nrc.bgpServer.AddDefinedSet` is not checked (errcheck)
                nrc.bgpServer.AddDefinedSet(podCidrPrefixSet)
                                           ^
pkg/controllers/routing/bgp_policies.go:48:30: Error return value of `nrc.bgpServer.AddDefinedSet` is not checked (errcheck)
                nrc.bgpServer.AddDefinedSet(clusterIPPrefixSet)
                                           ^
pkg/controllers/routing/bgp_policies.go:69:31: Error return value of `nrc.bgpServer.AddDefinedSet` is not checked (errcheck)
                        nrc.bgpServer.AddDefinedSet(iBGPPeerNS)
                                                   ^
pkg/controllers/routing/bgp_policies.go:108:31: Error return value of `nrc.bgpServer.AddDefinedSet` is not checked (errcheck)
                        nrc.bgpServer.AddDefinedSet(ns)
                                                   ^
pkg/controllers/routing/bgp_policies.go:120:30: Error return value of `nrc.bgpServer.AddDefinedSet` is not checked (errcheck)
                nrc.bgpServer.AddDefinedSet(ns)
                                           ^
                                                   ^

* network_policy_controller: Lint

pkg/controllers/netpol/network_policy_controller.go:35:2: `networkPolicyAnnotation` is unused (deadcode)
        networkPolicyAnnotation      = "net.beta.kubernetes.io/network-policy"
        ^
pkg/controllers/netpol/network_policy_controller.go:1047:4: SA9003: empty branch (staticcheck)
                        if err != nil {
                        ^
pkg/controllers/netpol/network_policy_controller.go:969:10: SA4006: this value of `err` is never used (staticcheck)
        chains, err := iptablesCmdHandler.ListChains("filter")
                ^
pkg/controllers/netpol/network_policy_controller.go:1568:4: SA4006: this value of `err` is never used (staticcheck)
                        err = iptablesCmdHandler.Delete("filter", "FORWARD", strconv.Itoa(i-realRuleNo))
                        ^
pkg/controllers/netpol/network_policy_controller.go:1584:4: SA4006: this value of `err` is never used (staticcheck)
                        err = iptablesCmdHandler.Delete("filter", "OUTPUT", strconv.Itoa(i-realRuleNo))
                        ^

* network_services_controller: Lint

pkg/controllers/proxy/network_services_controller.go:66:2: `h` is unused (deadcode)
        h      *ipvs.Handle
        ^
pkg/controllers/proxy/network_services_controller.go:879:23: SA1019: client.NewEnvClient is deprecated: use NewClientWithOpts(FromEnv)  (staticcheck)
        dockerClient, err := client.NewEnvClient()
                             ^
pkg/controllers/proxy/network_services_controller.go:944:5: unreachable: unreachable code (govet)
                                glog.V(3).Infof("Waiting for tunnel interface %s to come up in the pod, retrying", KUBE_TUNNEL_IF)
                                ^
pkg/controllers/proxy/network_services_controller.go:1289:5: S1002: should omit comparison to bool constant, can be simplified to `!hasHairpinChain` (gosimple)
        if hasHairpinChain != true {
           ^
pkg/controllers/proxy/network_services_controller.go:1237:43: S1019: should use make(map[string][]string) instead (gosimple)
        rulesNeeded := make(map[string][]string, 0)
                                                 ^
pkg/controllers/proxy/network_services_controller.go:1111:4: S1023: redundant break statement (gosimple)
                        break
                        ^
pkg/controllers/proxy/network_services_controller.go:1114:4: S1023: redundant break statement (gosimple)
                        break
                        ^
pkg/controllers/proxy/network_services_controller.go:1117:4: S1023: redundant break statement (gosimple)
                        break
                        ^
pkg/controllers/proxy/network_services_controller.go:445:21: Error return value of `nsc.publishMetrics` is not checked (errcheck)
                nsc.publishMetrics(nsc.serviceMap)
                                  ^
pkg/controllers/proxy/network_services_controller.go:1609:9: Error return value of `h.Write` is not checked (errcheck)
        h.Write([]byte(ip + "-" + protocol + "-" + port))
               ^
pkg/controllers/proxy/network_services_controller.go:912:13: Error return value of `netns.Set` is not checked (errcheck)
                        netns.Set(hostNetworkNamespaceHandle)
                                 ^
pkg/controllers/proxy/network_services_controller.go:926:13: Error return value of `netns.Set` is not checked (errcheck)
                        netns.Set(hostNetworkNamespaceHandle)
                                 ^
pkg/controllers/proxy/network_services_controller.go:950:13: Error return value of `netns.Set` is not checked (errcheck)
                        netns.Set(hostNetworkNamespaceHandle)
                                 ^
pkg/controllers/proxy/network_services_controller.go:641:9: SA4006: this value of `err` is never used (staticcheck)
        addrs, err := getAllLocalIPs()
               ^

* network_routes_controller: Lint

pkg/controllers/routing/network_routes_controller.go:340:2: S1000: should use for range instead of for { select {} } (gosimple)
        for {
        ^
pkg/controllers/routing/network_routes_controller.go:757:22: Error return value of `nrc.bgpServer.Stop` is not checked (errcheck)
                        nrc.bgpServer.Stop()
                                          ^
pkg/controllers/routing/network_routes_controller.go:770:22: Error return value of `nrc.bgpServer.Stop` is not checked (errcheck)
                        nrc.bgpServer.Stop()
                                          ^
pkg/controllers/routing/network_routes_controller.go:782:23: Error return value of `nrc.bgpServer.Stop` is not checked (errcheck)
                                nrc.bgpServer.Stop()
                                                  ^
pkg/controllers/routing/network_routes_controller.go:717:12: Error return value of `g.Serve` is not checked (errcheck)
        go g.Serve()

* ipset: Lint

pkg/utils/ipset.go:243:23: Error return value of `entry.Set.Parent.Save` is not checked (errcheck)
        entry.Set.Parent.Save()
                             ^

* pkg/cmd/kube-router: Lint

pkg/cmd/kube-router.go:214:26: SA1006: printf-style function with dynamic format string and no further arguments should use print-style function instead (staticcheck)
                fmt.Fprintf(os.Stderr, output)
                                       ^
pkg/cmd/kube-router.go:184:15: SA1017: the channel used with signal.Notify should be buffered (staticcheck)
        signal.Notify(ch, syscall.SIGINT, syscall.SIGTERM)
                     ^
pkg/cmd/kube-router.go:94:17: Error return value of `hc.RunServer` is not checked (errcheck)
        go hc.RunServer(stopCh, &wg)
                       ^
pkg/cmd/kube-router.go:112:16: Error return value of `hc.RunCheck` is not checked (errcheck)
        go hc.RunCheck(healthChan, stopCh, &wg)
                      ^
pkg/cmd/kube-router.go:121:12: Error return value of `mc.Run` is not checked (errcheck)
                go mc.Run(healthChan, stopCh, &wg)
                         ^

* cmd/kube-router/kube-router: Lint

cmd/kube-router/kube-router.go:31:24: Error return value of `flag.CommandLine.Parse` is not checked (errcheck)
        flag.CommandLine.Parse([]string{})
                              ^
cmd/kube-router/kube-router.go:33:10: Error return value of `flag.Set` is not checked (errcheck)
        flag.Set("logtostderr", "true")
                ^
cmd/kube-router/kube-router.go:34:10: Error return value of `flag.Set` is not checked (errcheck)
        flag.Set("v", config.VLevel)
                ^
cmd/kube-router/kube-router.go:62:27: SA1006: printf-style function with dynamic format string and no further arguments should use print-style function instead (staticcheck)
                        fmt.Fprintf(os.Stdout, http.ListenAndServe("0.0.0.0:6060", nil).Error())
                                               ^

* kube-router_test: Lint

cmd/kube-router/kube-router_test.go:21:10: Error return value of `io.Copy` is not checked (errcheck)
                io.Copy(stderrBuf, stderrR)
                       ^
cmd/kube-router/kube-router_test.go:40:17: Error return value of `docBuf.ReadFrom` is not checked (errcheck)
        docBuf.ReadFrom(docF)
                       ^

* service_endpoints_sync: Lint

pkg/controllers/proxy/service_endpoints_sync.go:460:2: ineffectual assignment to `ipvsSvcs` (ineffassign)
        ipvsSvcs, err := nsc.ln.ipvsGetServices()
        ^
pkg/controllers/proxy/service_endpoints_sync.go:311:5: SA4006: this value of `err` is never used (staticcheck)
                                err = nsc.ln.ipAddrDel(dummyVipInterface, externalIP)
                                ^

* node: Lint

pkg/utils/node.go:19:16: SA1019: clientset.Core is deprecated: please explicitly pick a version if possible.  (staticcheck)
                node, err := clientset.Core().Nodes().Get(nodeName, metav1.GetOptions{})
                             ^
pkg/utils/node.go:27:15: SA1019: clientset.Core is deprecated: please explicitly pick a version if possible.  (staticcheck)
        node, err := clientset.Core().Nodes().Get(hostName, metav1.GetOptions{})
                     ^
pkg/utils/node.go:34:15: SA1019: clientset.Core is deprecated: please explicitly pick a version if possible.  (staticcheck)
                node, err = clientset.Core().Nodes().Get(hostnameOverride, metav1.GetOptions{})
                            ^

* aws: Lint

pkg/controllers/routing/aws.go:31:8: SA4006: this value of `err` is never used (staticcheck)
                URL, err := url.Parse(providerID)
                     ^

* health_controller: Lint

pkg/healthcheck/health_controller.go:54:10: Error return value of `w.Write` is not checked (errcheck)
                w.Write([]byte("OK\n"))
                       ^
pkg/healthcheck/health_controller.go:68:10: Error return value of `w.Write` is not checked (errcheck)
                w.Write([]byte("Unhealthy"))
                       ^
pkg/healthcheck/health_controller.go:159:2: S1000: should use a simple channel send/receive instead of `select` with a single case (gosimple)
        select {
        ^

* network_routes_controller_test: Lint

pkg/controllers/routing/network_routes_controller_test.go:1113:37: Error return value of `testcase.nrc.bgpServer.Stop` is not checked (errcheck)
                        defer testcase.nrc.bgpServer.Stop()
                                                         ^
pkg/controllers/routing/network_routes_controller_test.go:1314:37: Error return value of `testcase.nrc.bgpServer.Stop` is not checked (errcheck)
                        defer testcase.nrc.bgpServer.Stop()
                                                         ^
pkg/controllers/routing/network_routes_controller_test.go:2327:37: Error return value of `testcase.nrc.bgpServer.Stop` is not checked (errcheck)
                        defer testcase.nrc.bgpServer.Stop()
                                                         ^

* .golangci.yml: Increase timeout

Default is 1m, increase to 5m otherwise travis might fail

* Makefile: Update golangci-lint to 1.27.0

* kube-router_test.go: defer waitgroup

Co-authored-by: Aaron U'Ren <aauren@users.noreply.github.com>

* network_routes_controller: Incorporate review

* bgp_policies: Incorporate review

* network_routes_controller: Incorporate review

* bgp_policies: Log error instead

* network_services_controller: Incorporate review

Co-authored-by: Aaron U'Ren <aauren@users.noreply.github.com>
2020-06-03 22:29:06 +02:00
Aaron U'Ren
cb48a7f87b
fix(network_routes): missing node ip -> error log (#904)
Before we used to raise an error when a node was missing an IP, but it
turns out that this is not a required attribute of a node. And while it
is rare that a node would be missing an IP address, a node doesn't
require an IP address or a name or really much of anything in order to
exist.

This brings us to stronger conformance with the Kubernetes API and makes
it so that kube-router logs errors rather than changing it's health
status and potentially causing cascading failures across the fleet if a
user adds a node like this.
2020-05-26 00:18:21 +05:30