Added the following items to the original logic:
* Added map route entry deletion on withdrawal so that the system doesn't
incorrectly sync it back to the kernel's routing table
* Added an immediate route sync upon BGP path receive
* Added a mutex to ensure that deleted routes aren't accidentally synced
back to the system
* Added stopCh and wg (wait group) handling
* Increase default sync time from 15 seconds to 1 minute since this
scenario is unlikely and netlink calls could potentially be burdensome
in large clusters.
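The items above can be sketched roughly as follows. This is a minimal
illustration, not the actual kube-router code: `routeSyncer`, `addRoute`,
`delRoute`, `syncAll`, `run`, and the `inject` callback (standing in for the
netlink call) are all hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// routeSyncer is a hypothetical stand-in for the route sync logic.
type routeSyncer struct {
	mu     sync.Mutex
	routes map[string]string                // destination CIDR -> next hop
	inject func(dst, nextHop string) error // assumed helper that writes one route to the kernel
}

func newRouteSyncer(inject func(dst, nextHop string) error) *routeSyncer {
	return &routeSyncer{routes: make(map[string]string), inject: inject}
}

// addRoute records a received path and syncs it to the kernel immediately.
func (rs *routeSyncer) addRoute(dst, nextHop string) error {
	rs.mu.Lock()
	defer rs.mu.Unlock()
	rs.routes[dst] = nextHop
	return rs.inject(dst, nextHop)
}

// delRoute drops a withdrawn path so a later sync cannot re-add it.
func (rs *routeSyncer) delRoute(dst string) {
	rs.mu.Lock()
	defer rs.mu.Unlock()
	delete(rs.routes, dst)
}

// syncAll re-injects every known route while holding the mutex, so a
// concurrent withdrawal cannot interleave with the sync.
func (rs *routeSyncer) syncAll() error {
	rs.mu.Lock()
	defer rs.mu.Unlock()
	for dst, nextHop := range rs.routes {
		if err := rs.inject(dst, nextHop); err != nil {
			return err
		}
	}
	return nil
}

// run syncs periodically until stopCh is closed, then signals the wait group.
func (rs *routeSyncer) run(interval time.Duration, stopCh <-chan struct{}, wg *sync.WaitGroup) {
	defer wg.Done()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			_ = rs.syncAll()
		case <-stopCh:
			return
		}
	}
}

func main() {
	rs := newRouteSyncer(func(dst, nextHop string) error {
		fmt.Println("inject", dst, "via", nextHop)
		return nil
	})
	stopCh := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(1)
	go rs.run(time.Minute, stopCh, &wg)

	_ = rs.addRoute("10.1.0.0/24", "192.0.2.1")
	rs.delRoute("10.1.0.0/24") // withdrawn: will not be synced back

	close(stopCh)
	wg.Wait()
}
```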
* fact(NSC): consolidate constants to top
* fix(NSC): increase IPVS add service logging
* fix(NSC): improve logging for FWMark IPVS entries
* fix(NSC): add missing parameter to logging
* feat(NSC): generate unique FW marks
Because we trim the 32-bit FNV-1a hash to 16 bits, there is the potential
for FW marks to collide with each other even for unique inputs of IP,
protocol, and port. This avoids such collisions, up to the 16-bit maximum
number of marks, by keeping track of which FW marks we've already
allocated and which IP, protocol, and port combination each was allocated
for.
Fixes #1045
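A minimal sketch of the idea, assuming a linear probe on collision (the
real change may resolve collisions differently). `fwMarkAllocator` and
`allocate` are hypothetical names, and skipping mark 0 is an assumption
made here because an unmarked packet carries mark 0.

```go
package main

import (
	"errors"
	"fmt"
	"hash/fnv"
)

// fwMarkAllocator is a hypothetical helper, not kube-router's actual type.
type fwMarkAllocator struct {
	byKey  map[string]uint32 // "ip-protocol-port" -> allocated mark
	byMark map[uint32]string // allocated mark -> "ip-protocol-port"
}

func newFWMarkAllocator() *fwMarkAllocator {
	return &fwMarkAllocator{
		byKey:  make(map[string]uint32),
		byMark: make(map[uint32]string),
	}
}

// allocate returns a 16-bit mark that is unique among all marks handed out
// so far. The same (ip, protocol, port) always gets the same mark back.
func (a *fwMarkAllocator) allocate(ip, protocol, port string) (uint32, error) {
	key := ip + "-" + protocol + "-" + port
	if mark, ok := a.byKey[key]; ok {
		return mark, nil
	}
	h := fnv.New32a()
	_, _ = h.Write([]byte(key)) // writes to a hash.Hash never return an error
	base := h.Sum32() & 0xFFFF  // trim the 32-bit FNV-1a hash to 16 bits

	// On collision, probe the following values (assumed strategy).
	for i := uint32(0); i <= 0xFFFF; i++ {
		mark := (base + i) & 0xFFFF
		if mark == 0 {
			continue // assumption: reserve 0, which means "no mark"
		}
		if _, taken := a.byMark[mark]; !taken {
			a.byKey[key] = mark
			a.byMark[mark] = key
			return mark, nil
		}
	}
	return 0, errors.New("no free 16-bit FW marks left")
}

func main() {
	a := newFWMarkAllocator()
	m1, _ := a.allocate("10.0.0.1", "tcp", "80")
	m2, _ := a.allocate("10.0.0.1", "tcp", "80")
	m3, _ := a.allocate("10.0.0.2", "tcp", "80")
	fmt.Println(m1 == m2, m1 != m3) // prints "true true"
}
```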
* fact(NSC): move utility funcs to utils
* fix(NSC): reduce IPVS service shell outs
This also aligns it more closely with ipvsAddService(), the almost
identical function used for non-FWmarked services, which is also called
from setupExternalIPServices and passes in this same list of ipvsServices.
* fix(NSC): fix & consolidate DSR cleanup code
A lot of this is refactor work, but it's important to know why the DSR
mangle tables were not being cleaned up in the first place. When we
transitioned to iptables-save to look over the mangle rules, we didn't
realize that iptables-save changes the format of the marks from integer
values (which is what the CLI works with) to hexadecimal.
This made it so that we were never actually matching on a mangle rule,
which left them all behind. When these mangle rules were left, it meant
that IPs that used to be part of a DSR service were essentially
black-holed on the system and were no longer route-able.
Fixes #1167
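To illustrate the mismatch described above, here is a small sketch that
matches a mark numerically instead of by string. `ruleHasMark` is a
hypothetical helper, and the sample rule line is only an assumed example
of iptables-save output, not taken from a real system.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ruleHasMark reports whether an iptables-save rule line refers to the given
// mark. It parses the value after a mark option with base 0, so both the
// hexadecimal form (0x3039) and the decimal form (12345) are recognized.
func ruleHasMark(rule string, mark uint32) bool {
	fields := strings.Fields(rule)
	for i := 0; i+1 < len(fields); i++ {
		switch fields[i] {
		case "--mark", "--set-mark", "--set-xmark":
		default:
			continue
		}
		// A mark may be followed by "/mask"; only the value part matters here.
		value, _, _ := strings.Cut(fields[i+1], "/")
		parsed, err := strconv.ParseUint(value, 0, 32)
		if err == nil && uint32(parsed) == mark {
			return true
		}
	}
	return false
}

func main() {
	// Assumed example line; 12345 decimal is 0x3039 hexadecimal.
	rule := "-A PREROUTING -d 192.0.2.10/32 -p tcp -m tcp --dport 80 -j MARK --set-xmark 0x3039/0xffffffff"

	fmt.Println(strings.Contains(rule, "12345")) // naive decimal string match: false
	fmt.Println(ruleHasMark(rule, 12345))        // numeric comparison: true
}
```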
* doc(dsr): expand DSR documentation
Fixes #1055
* ensure active service map is updated for non DSR services
Co-authored-by: Murali Reddy <muralimmreddy@gmail.com>
* feat: simple CRI implementation in addition to Docker, required for DSR functionality. Supports CRI-compliant runtimes (e.g. containerd, cri-o)
* upd: dependencies
* cleanup
* feat: clean up gRPC connections once the job is done
* upd: go.sum
* Add support for reading peer passwords via a file
The syntax of the file is the same as for --peer-router-passwords, that is,
a comma-separated list of base64-encoded passwords.
Passwords specified with --peer-router-passwords take precedence over
passwords read from --peer-router-passwords-file.
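A rough sketch of the format and precedence described above, assuming
standard base64 encoding. `decodePeerPasswords` and `resolvePeerPasswords`
are hypothetical names, not the functions added by this change.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// decodePeerPasswords decodes a comma-separated list of base64-encoded
// passwords. An empty or whitespace-only list yields no passwords.
func decodePeerPasswords(list string) ([]string, error) {
	list = strings.TrimSpace(list)
	if list == "" {
		return nil, nil
	}
	var passwords []string
	for i, item := range strings.Split(list, ",") {
		decoded, err := base64.StdEncoding.DecodeString(strings.TrimSpace(item))
		if err != nil {
			return nil, fmt.Errorf("password %d is not valid base64: %w", i, err)
		}
		passwords = append(passwords, string(decoded))
	}
	return passwords, nil
}

// resolvePeerPasswords prefers the flag value and only falls back to the
// file contents when the flag is empty.
func resolvePeerPasswords(flagValue, fileContents string) ([]string, error) {
	if strings.TrimSpace(flagValue) != "" {
		return decodePeerPasswords(flagValue)
	}
	return decodePeerPasswords(fileContents)
}

func main() {
	// File contents as they might be read from disk, trailing newline included.
	fromFile := "cGFzc3dvcmQx,cGFzc3dvcmQy\n"
	passwords, err := resolvePeerPasswords("", fromFile)
	fmt.Println(passwords, err) // prints "[password1 password2] <nil>"
}
```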
* fix(options): peer password file linting and doc
Co-authored-by: Jean Raby <jean@raby.sh>
* fact(network_policy): validate ClusterIP CIDR
Ensure that --service-cluster-ip-range is a valid CIDR while controller
is starting up.
* fix(network_policy): parse/validate NodePort
Validate the NodePort range that is passed, and allow it to be specified
with hyphens. This is what the previous example used to show, and it is
more consistent with the way NodePort ranges are specified when passed to
the kube-apiserver.
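A minimal sketch of this kind of validation. `parseNodePortRange` is a
hypothetical name, and also accepting a colon as the separator is an
assumption made here for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseNodePortRange parses a range such as "30000-32767" and returns its
// lower and upper bounds. A colon separator is also accepted (assumption).
func parseNodePortRange(s string) (uint16, uint16, error) {
	normalized := strings.Replace(strings.TrimSpace(s), ":", "-", 1)
	parts := strings.Split(normalized, "-")
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("invalid NodePort range %q: expected <low>-<high>", s)
	}
	var bounds [2]uint16
	for i, part := range parts {
		// bitSize 16 rejects anything above 65535.
		port, err := strconv.ParseUint(part, 10, 16)
		if err != nil || port == 0 {
			return 0, 0, fmt.Errorf("invalid port %q in NodePort range %q", part, s)
		}
		bounds[i] = uint16(port)
	}
	if bounds[0] > bounds[1] {
		return 0, 0, fmt.Errorf("invalid NodePort range %q: low is greater than high", s)
	}
	return bounds[0], bounds[1], nil
}

func main() {
	low, high, err := parseNodePortRange("30000-32767")
	fmt.Println(low, high, err) // prints "30000 32767 <nil>"
}
```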
* test(network_policy): add tests for input validation
* feat(network_policy): permit ExternalIP on input
Fixes #934
* fix(network_policy): ensure pos with index offset
Because iptables list function now appears to be returning -N and -P
items in the chain results, we need to account for them when taking into
consideration the rule position.
* fix(network_policy): add uuid to comments on ensure
iptables list no longer preserves the order of parameters, which means
that we can't compare string to string. In the absence of a better way
to handle this, this adds a UUID to the comment string, which can then be
looked for when determining what position a rule occupies.
* whitelist traffic to cluster IP and node ports in INPUT chain to bypass
network policy enforcement
Fixes #905
* fix unit test failure
* ensure netpol firewall rules are configured after service proxy firewall rules
* Added flag and condition for open input on iptables #797
* Adding flag to docs.
* Updated to remove INPUT/CHAIN entirely. Name changed to IpvsDenyAll.
* Updated README.
* Updated docstring on ipvs-deny-all
* ipvsDenyAll -> ipvsPermitAll
* Updating user guide.
* Descriptions updates per review
GoBGP's default value for deferral time is 360 seconds.
That means that routes are not sent to the BGP peer until
this timer has elapsed, so a server is unreachable for 360
seconds when kube-router restarts.
The new parameter is --bgp-graceful-restart-deferral-time duration_with_unit
For example '--bgp-graceful-restart-deferral-time 10s'
* update netlink
* update libnetwork to get ipvs stats
* update gopkg.lock for libnetwork update
* update libnetwork
* add cli options
* make endpoints delete gracefully
* move conntrack flusher
* get some order in the mainloop
* update to alpine 3.9 & go 1.11.1
* revert to 1.10.3 just update alpine
* and revert travis.yml
* lock version
* test 1.12
* test
* Introduces the option --full-overlay, to always generate IPIP tunnels regardless of node subnets
* Use --overlay-type={subnet,full} instead of --full-overlay={true,false}
* add unit tests for implementing #75
Signed-off-by: Steven Armstrong <steven.armstrong@id.ethz.ch>
* integration tests for #75
Signed-off-by: Steven Armstrong <steven.armstrong@id.ethz.ch>
* update docs for #75
Signed-off-by: Steven Armstrong <steven.armstrong@id.ethz.ch>
* define new kube-router.io/service.advertise.* annotations
Signed-off-by: Steven Armstrong <steven.armstrong@id.ethz.ch>
* Implement per service annotations for advertising IPs.
Signed-off-by: Steven Armstrong <steven.armstrong@id.ethz.ch>
* more consistent annotation names
Signed-off-by: Steven Armstrong <steven.armstrong@id.ethz.ch>
* remove redundant tests
Signed-off-by: Steven Armstrong <steven.armstrong@id.ethz.ch>
When the number of nodes in a cluster is high enough, the
`disableSourceDestinationCheck()` logic creates a high number
of requests to EC2, resulting in throttling and subsequent
problems, such as the inability to attach EBS volumes. This is
not necessarily mitigated by the `ec2IamAuthorized` attribute
which was added to overcome this issue, as the number of
requests can still be high enough to reach Amazon's request
limits. In addition, it is not necessary to run this multiple
times in a loop for all the nodes in a cluster, as it is
sufficient to set it once when an instance boots.
This CLI option allows an administrator to turn off this
feature for kube-router so they can use some other means of
setting the attribute.
* Introduced new cmdline flag --bgp-port, which controls BGP Server listening port and remote port of in-cluster node peers
* Introduced new cmdline flag --peer-router-ports, which controls remote BGP port for external peers
* Introduced new node annotation kube-router.io/peer.ports with same effect as --peer-router-ports
* Introduces the option --override-nexthop. Setting it to true makes
kube-router automatically select an appropriate, reachable local IP as
the next hop advertised to each peer. This overrides any next hop set
for the routes in the RIB. By default kube-router sets the next hop to
the `node IP`, which is not correct for nodes with multiple interfaces
that use different interfaces for different external peers.
Fixes #480
* add next-hop-self documentation