183 Commits

Author SHA1 Message Date
Aaron U'Ren
8afdee87d9 fact(NSC): differentiate headless services
Differentiate headless services from ClusterIP being none, in
preparation for handling the service.kubernetes.io/headless label. One
might think that handling these is similar, which it sort of is and sort
of isn't. ClusterIP is an immutable field, whereas labels are mutable.
This separates our handling of ClusterIP none-ness from the presence of
the headless label.

When we consider what to do with ClusterIP being none, that is
fundamentally different, because once it is None, the k8s API guarantees
that the service won't ever change.

Whereas the label can be added and removed.
2024-01-05 10:27:23 -06:00
Martin -nexus- Mlynář
66890d5f12 feat: Disable binding overlay tunnels to specific device 2023-10-30 08:05:26 -05:00
Aaron U'Ren
1a4896f465 feat(lint): upgrade golangci-lint v1.50.1 -> v1.54.2 2023-10-07 14:20:28 -05:00
Aaron U'Ren
678b7129c3 fix(ecmp_vip.go): non-local service advertisement
With advertiseService set to false by default, it means that it won't
ever get re-evaluated if the service isn't a local host and will ALWAYS
result in withdrawing the VIPs which is incorrect. It needs to default
to true, and only override the boolean if serviceLocal is set to true.
2023-10-07 08:52:31 -05:00
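The defaulting logic described in this fix can be sketched roughly as follows. This is a minimal illustration, not the code from ecmp_vip.go: the function name `shouldAdvertise` and the `hasLocalEndpoints` parameter are hypothetical, and the assumption that a local service is advertised only when this node has endpoints for it is mine.

```go
package main

import "fmt"

// shouldAdvertise is a hypothetical helper (not kube-router's actual API).
// It starts from "advertise" and only lets endpoint locality override that
// decision when the service is marked as local.
func shouldAdvertise(serviceLocal, hasLocalEndpoints bool) bool {
	advertiseService := true // default to true so non-local services are never withdrawn by accident
	if serviceLocal {
		// Assumption: a local service is advertised only from nodes with local endpoints.
		advertiseService = hasLocalEndpoints
	}
	return advertiseService
}

func main() {
	fmt.Println(shouldAdvertise(false, false)) // non-local service
	fmt.Println(shouldAdvertise(true, false))  // local service, no local endpoints
	fmt.Println(shouldAdvertise(true, true))   // local service, local endpoints
}
```

With a default of false, the first case would fall through to a withdrawal, which is the behavior the commit message describes as incorrect.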
Aaron U'Ren
4c6e19f2e1 feat(ipset): consolidate ipset usage across controllers
Before this, we had 2 different ways to interact with ipsets, through
the handler interface which had the best handling for IPv6 because NPC
heavily utilizes it, and through the ipset struct which mostly repeated
the handler logic, but didn't handle some key things.

NPC utilized the handler functions and NSC / NRC mostly utilized the old
ipset struct functions. This caused a lot of duplication between the two
groups of functions and also caused issues with proper IPv6 handling.

This commit consolidates the two sets of usage into just the handler
interface. This greatly simplifies how the controllers interact with
ipsets and it also reduces the logic complexity on the ipset side.

This also fixes up some inconsistency with how we handled IPv6 ipset
names. ipset likes them to be prefixed with inet6:, but we weren't
always doing this in a way that made sense and was consistent across all
functions in the ipset struct.
2023-10-07 08:52:31 -05:00
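One way to keep the inet6: naming consistent is to route every set name through a single helper. The sketch below assumes that approach; `ipSetName` and the example set name are hypothetical and are not taken from the kube-router ipset handler.

```go
package main

import (
	"fmt"
	"strings"
)

// ipv6SetPrefix is the prefix the commit message says IPv6 set names should carry.
const ipv6SetPrefix = "inet6:"

// ipSetName is a hypothetical helper: it returns the set name to use for the
// given family, adding the IPv6 prefix exactly once.
func ipSetName(base string, isIPv6 bool) string {
	if isIPv6 && !strings.HasPrefix(base, ipv6SetPrefix) {
		return ipv6SetPrefix + base
	}
	return base
}

func main() {
	fmt.Println(ipSetName("example-pod-subnets", false))
	fmt.Println(ipSetName("example-pod-subnets", true))
	fmt.Println(ipSetName("inet6:example-pod-subnets", true)) // already prefixed, left unchanged
}
```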
Brad Davidson
aa107d6376 Make metrics registerer/gatherer replaceable
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-10-07 08:52:31 -05:00
Aaron U'Ren
5cf1265fb7 fix(NRC): prevent adding routes with mixed families 2023-10-07 08:52:31 -05:00
Aaron U'Ren
bab0d4ff83 feat(bgp_policies.go): don't override-nexthop for internal peers
Previously when a user selected to override the next-hop via GoBGP's
NextHopActions: Self functionality, we did it for all exported routes.
However, in a dual-stack use-case this causes problems for internal pod
IP routes that are spread via BGP advertisements.

Currently, kube-router only peers with an internal peer once over
whatever its primary IP is according to its Kubernetes node
information. This means that when overriding next-hop the IP is either
an IPv4 or IPv6 address depending on how the node has configured itself.
Therefore when it attempts to add a route for an IPv6 address and
override next-hop is configured, if the node's primary IP was an IPv4
address this will not succeed as a next-hop for an IPv6 address cannot
be an IPv4 gateway.

Rather than making the code base overly complicated with both an IPv4
and IPv6 peering for internal nodes, this change presents a bit of a
middle ground. By choosing not to override the next-hop for pod subnet
advertisements to internal (Kubernetes node) peers, we eliminate this
problem.

This does change the functionality of kube-router a bit, but one of the
foundational aspects to Kubernetes networking is that all nodes should
be able to contact each other. So I cannot currently think of a good
use-case where overriding the next-hop for pod subnets of internal peers
would be necessary, so I think that this is an ok concession to make.
2023-10-07 08:52:31 -05:00
Aaron U'Ren
944ab91725 fix(FoU): make more robust
FoU implementation now properly handles a whole host of things:

* It now actually handles IPv6 by changing the encapsulation protocol to
  GUE instead of generic FoU. I worked with generic FoU tunnels for
  several days and couldn't get it to support IPv4 and IPv6 at all even
  when using it with the IPv6 proto and with iproute2 in IPv6
  mode (-6)
* It now handles converting between the two tunnel types seamlessly and
  without leaving legacy tunnel artifacts behind. Previously, you could
  change the encap type but it wouldn't change the tunnels
* Abstracted constants
2023-10-07 08:52:31 -05:00
Aaron U'Ren
bac4ae6299 fix(FoU): add docs, sanity checking, and logic reduction 2023-10-07 08:52:31 -05:00
Kartik Raval
2a57d6c163 Adding FoU encapsulation over IPIP tunnel : added checks for restart and multi-node cases 2023-10-07 08:52:31 -05:00
Kartik Raval
6ce37e6167 Support for FoU encapsulation for IPIP tunnel 2023-10-07 08:52:31 -05:00
Aaron U'Ren
384ed97a76 fix(bgp_policy): allow for statement add / remove
The previous version of the bgp_policies code only allowed for creating
a policy when the policy didn't exist already. However, with the advent
of dual-stack we need to be able to add / remove statements if we add or
lose a specific IP family (e.g. IPv4 or IPv6) since they are handled in
different statements.

Given that the owner of GoBGP has let us know that policies are
idempotent, this now involves quite a bit of work. We need to follow the
following procedure:

add statements if missing -> add them to a policy -> if policy doesn't
  equal the one already in GoBGP -> create the new policy and associate
  it -> de-associate the old policy -> remove the old policy
2023-10-07 08:52:31 -05:00
Aaron U'Ren
1d5c9ce25c fix(ecmp_vip): update VIPs based on svc change
Previously we used to do an idempotent sync of all active VIPs any time we
got a service or endpoint update. However, this only worked when we
assumed a single-stack deployment model where IPs were never deleted
unless the whole service was deleted.

In a dual-stack model, we can add / remove LoadBalancer IPs and Cluster
IPs on updates. Given this, we need to take into account the finite
change that happens, and not just revert to sync-all because we'll never
stop advertising IPs that should be removed.

As a fall-back, we still have the outer Run loop that syncs all active
routes every X amount of seconds (configured by user CLI parameter). So
on that timer we'll still have something that syncs all active VIPs and
works as an outer control loop to ensure that desired state eventually
becomes active state if we accidentally remove a VIP that should have
been there.
2023-10-07 08:52:31 -05:00
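The "finite change" on a service update can be thought of as a set difference between the old and new VIP lists. The following is a minimal sketch under that assumption; `diffVIPs` is a hypothetical name and the real ecmp_vip.go logic may be organized differently.

```go
package main

import "fmt"

// diffVIPs is a hypothetical helper. Given the VIPs of a service before and
// after an update, it returns the IPs that are newly present (to advertise)
// and the IPs that disappeared (to withdraw).
func diffVIPs(oldIPs, newIPs []string) (toAdvertise, toWithdraw []string) {
	oldSet := make(map[string]struct{}, len(oldIPs))
	for _, ip := range oldIPs {
		oldSet[ip] = struct{}{}
	}
	newSet := make(map[string]struct{}, len(newIPs))
	for _, ip := range newIPs {
		newSet[ip] = struct{}{}
		if _, existed := oldSet[ip]; !existed {
			toAdvertise = append(toAdvertise, ip)
		}
	}
	for _, ip := range oldIPs {
		if _, kept := newSet[ip]; !kept {
			toWithdraw = append(toWithdraw, ip)
		}
	}
	return toAdvertise, toWithdraw
}

func main() {
	// A dual-stack service drops its IPv6 ClusterIP on update.
	adv, wd := diffVIPs([]string{"10.96.0.10", "fd00::10"}, []string{"10.96.0.10"})
	fmt.Println("advertise:", adv)
	fmt.Println("withdraw:", wd)
}
```

A sync-all pass only ever looks at the new list, so it has no way to notice the withdrawn address; the periodic outer loop remains the safety net for anything this diff misses.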
Aaron U'Ren
f5ac980b23 fix(bgp_policies.go): return -> continue on family set evaluation
When a single IP family's set looks to be equal, switch to continue
instead of return so that other families can still be evaluated as those
might have changes.
2023-10-07 08:52:31 -05:00
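The difference between return and continue in a per-family loop is easy to see in a small sketch. Everything here is hypothetical (function names, the string-based prefix lists, the assumption that lists are pre-sorted); it only illustrates the control flow the commit describes.

```go
package main

import "fmt"

// equalPrefixes compares two prefix lists element by element.
// Assumption: both lists are already sorted.
func equalPrefixes(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// familiesNeedingUpdate is a hypothetical helper that returns the IP families
// whose desired set differs from the current one.
func familiesNeedingUpdate(families []string, current, desired map[string][]string) []string {
	var changed []string
	for _, fam := range families {
		if equalPrefixes(current[fam], desired[fam]) {
			// A return here would skip every remaining family;
			// continue moves on to evaluate the next one.
			continue
		}
		changed = append(changed, fam)
	}
	return changed
}

func main() {
	current := map[string][]string{"ipv4": {"10.244.0.0/24"}, "ipv6": {"fd00:1::/64"}}
	desired := map[string][]string{"ipv4": {"10.244.0.0/24"}, "ipv6": {"fd00:2::/64"}}
	fmt.Println(familiesNeedingUpdate([]string{"ipv4", "ipv6"}, current, desired))
}
```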
Erik Larsson
3387f5c1c6 use JoinHostPort for GRPC listen address
Signed-off-by: Erik Larsson <who+github@cnackers.org>
2023-10-07 08:52:31 -05:00
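For context, net.JoinHostPort from the Go standard library brackets IPv6 literals, which plain string concatenation does not. A short sketch, where `listenAddress` is a hypothetical wrapper and the port number is only an example value:

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// listenAddress is a hypothetical wrapper that builds a host:port string
// which is valid for both IPv4 and IPv6 hosts.
func listenAddress(host string, port uint16) string {
	return net.JoinHostPort(host, strconv.Itoa(int(port)))
}

func main() {
	fmt.Println(listenAddress("127.0.0.1", 50051)) // 127.0.0.1:50051
	fmt.Println(listenAddress("::1", 50051))       // [::1]:50051
}
```

Concatenating "::1" and ":50051" directly would yield an address that cannot be parsed unambiguously, which is presumably the issue this commit addresses.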
Erik Larsson
76ffcbdb13 add generation of router id based on hash of primary IP
When enabled, generate the router id by hashing the primary IP.
With this no explicit router id has to be provided on IPv6-only clusters.

Signed-off-by: Erik Larsson <who+github@cnackers.org>
2023-10-07 08:52:31 -05:00
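A BGP router ID is a 32-bit value, so an IPv6 primary address has to be reduced to four bytes somehow. The sketch below assumes an FNV-1a 32-bit hash rendered in dotted-quad form; the choice of hash and the name `routerIDFromAddr` are assumptions for illustration and may not match what the commit implements.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"net"
)

// routerIDFromAddr is a hypothetical helper. It hashes the 16-byte form of
// the given IP with FNV-1a (an assumed choice) and formats the 32-bit result
// like an IPv4 address, which is the usual notation for a BGP router ID.
func routerIDFromAddr(addr string) (string, error) {
	ip := net.ParseIP(addr)
	if ip == nil {
		return "", fmt.Errorf("invalid IP address %q", addr)
	}
	h := fnv.New32a()
	h.Write(ip.To16()) // Write on a hash.Hash never returns an error
	sum := h.Sum32()
	id := net.IPv4(byte(sum>>24), byte(sum>>16), byte(sum>>8), byte(sum))
	return id.String(), nil
}

func main() {
	id, err := routerIDFromAddr("2001:db8::1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("router id:", id)
}
```

The result is deterministic for a given primary IP, so a node keeps the same router ID across restarts, but distinct nodes are not guaranteed collision-free IDs.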
Aaron U'Ren
57c9b08643 fix(ecmp_vip.go): ClusterIP -> ClusterIPs
Use ClusterIPs rather than ClusterIP so that we get all of the possible
IP addresses rather than just one.

Fixes #1443
2023-10-07 08:52:31 -05:00
Aaron U'Ren
fe939782c6 feat(bgp_policies_test.go): use different IP ranges
Use different IP ranges in BGP Policies unit test so that it becomes
more obvious when there are unit test failures resulting from
multi-processing of unit tests.
2023-10-07 08:52:31 -05:00
Aaron U'Ren
31c22ff634 fix(bgp_policies.go): don't get BGP peers twice
Fixes a problem where a user would end up with redundant external peers
in their BGP policies because getting peers is IP family agnostic and
yet is run twice on the same list.

This also ruined unit test consistency.
2023-10-07 08:52:31 -05:00
Aaron U'Ren
06f5f8babf feat(go): update package version to /v2
Do the necessary to update kube-router to a new major version following
upstream documentation: https://go.dev/doc/modules/major-version
2023-10-07 08:52:31 -05:00
Aaron U'Ren
367aedf846 fix(bgp_policies): add empty DS set checking
Without this logic, it appears that sometimes GoBGP is inclined to match
unintentional routes in policy because of the MATCHSET_ANY declaration
and the way that it interacts with empty sets.

In my testing, without this logic I found that it often resulted in
various routes not being advertised correctly and not even showing up in
GoBGP itself. My current guess is that policy keeps GoBGP from importing
the route into the RIB even from the Protobuf socket connection that
kube-router establishes directly.
2023-10-07 08:52:31 -05:00
Aaron U'Ren
aeb51ba697 fact(bgp_policies): rename clusterIPPrefixSet -> serviceVIPIPPrefixSet 2023-10-07 08:52:31 -05:00
Aaron U'Ren
6e03836081 fact(bgp_policies): abstract get DS for GoBGP
We do a lot of getting defined sets for GoBGP and are planning to do
more of it in the future. This commit centralizes the logic for this and
reduces repetition.
2023-10-07 08:52:31 -05:00
Aaron U'Ren
67254ad22d fix(ecmp_vip): handle ipv4 & ipv6 protocols 2023-10-07 08:52:31 -05:00
Aaron U'Ren
5f952e0f28 test(bgp_policies_test): add local address 2023-10-07 08:52:31 -05:00
Aaron U'Ren
5d7f62c5b3 fix(NRC): ensure local addr IP is bindable early 2023-10-07 08:52:31 -05:00
Aaron U'Ren
67abc4b80e fix(bgp_peers): adv. AfiSafi based on capability
Advertise IPv4 / IPv6 AfiSafi capability based upon node's capabilities
rather than limiting to the node's configured protocol.
2023-10-07 08:52:31 -05:00
Aaron U'Ren
c491bcb48d fix(bgp_peers): do peer only if IP protos match
For configured BGP peers only attempt peering if IP protos match,
otherwise skip and log warning
2023-10-07 08:52:31 -05:00
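The family check described here can be sketched with net/netip. The helper name `sameFamily` and the warning text are hypothetical; the actual bgp_peers code may perform the comparison differently.

```go
package main

import (
	"fmt"
	"net/netip"
)

// sameFamily is a hypothetical helper that reports whether two IP addresses
// belong to the same family. IPv4-mapped IPv6 addresses count as IPv4.
func sameFamily(localIP, peerIP string) (bool, error) {
	local, err := netip.ParseAddr(localIP)
	if err != nil {
		return false, fmt.Errorf("invalid local IP %q: %w", localIP, err)
	}
	peer, err := netip.ParseAddr(peerIP)
	if err != nil {
		return false, fmt.Errorf("invalid peer IP %q: %w", peerIP, err)
	}
	return local.Unmap().Is4() == peer.Unmap().Is4(), nil
}

func main() {
	local := "192.168.1.10"
	for _, peer := range []string{"192.168.1.1", "2001:db8::1"} {
		ok, err := sameFamily(local, peer)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		if !ok {
			// Skip and warn instead of attempting a peering that cannot work.
			fmt.Printf("warning: skipping peer %s, IP family differs from %s\n", peer, local)
			continue
		}
		fmt.Println("peering with", peer)
	}
}
```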
Aaron U'Ren
0023dedc4d fix(NRC): error when nec. host IP not found
If we can't find an appropriate IP to add for nextHop to injectRoute or
overlay tunnel, raise error rather than trying to continue.
2023-10-07 08:52:31 -05:00
Aaron U'Ren
4f284be53e fix(NRC): add IPv6 logic to bgp-local-addresses 2023-10-07 08:52:31 -05:00
Aaron U'Ren
ddb0e63c46 feat(NRC): make NRC dual stack 2023-10-07 08:52:31 -05:00
Aaron U'Ren
01f2ff2aa1 fact(NRC): convert BGP set names to const
Convert all BGP set names to constants and then refer to them via the
constant across the code base so that we reduce the effect of typos.
2023-10-07 08:52:31 -05:00
Aaron U'Ren
85cecb6e61 feat(pod_cidr): handle multiple pod CIDRs 2023-10-07 08:52:31 -05:00
Aaron U'Ren
5d7189734e fix(NRC): withdraw advertised VIPs based on annotation
Annotations were taken into account during startup, but after they were
advertised the effect of annotations was only additive because we
were only tracking current state of VIPs that should be advertised and
not taking into account VIPs that should be withdrawn for anything other
than service locality.

Fixes #1491
2023-07-17 08:20:05 -05:00
Kevin Sauter
4c751b0904 Register BGP sent metric 2023-01-31 17:24:22 -06:00
Kevin Sauter
4c7ca8afe6 Add sent metric to vip bgp announcement. To distinguish between the different sent counters, the new "type" label can be used. 2023-01-31 17:24:22 -06:00
Richard Kojedzinszky
e6fd1b2519 Support for kube-router.io/peer.localips annotation (#1392)
* Support for kube-router.io/peer.localips annotation

* Fix checking for valid addresses in kube-router.io/peer.localips
2022-11-15 15:19:29 -06:00
Tamihiro Lee
efd100154f fix invalid MTU in CNI config file 2022-10-20 08:48:36 -05:00
Manuel Rüger
1d37130447 Fix linting 2022-10-17 11:37:07 -05:00
Aaron U'Ren
4615e85496 fix(bgp): set graceful restart on enabled family
Rather than setting BGP Graceful Restart on both IPv4 and IPv6
regardless of which family is enabled, check the current mode via
nrc.isIpv6 and only set on appropriate family.

Note, this mode is exclusive as the current portions of NRC kube-router
code are only meant to work with IPv4 or IPv6 not both at the same time.

Fixes #1323
2022-07-12 19:44:15 -05:00
Aaron U'Ren
f97eb7cc1a fix: remove multiple MTU reductions
fixes cloudnativelabs#1033
2022-06-24 17:51:49 -05:00
Aaron U'Ren
e370cb018d gobgp: update to 3.X 2022-06-11 12:03:27 -05:00
Xiang Liu
8fcebb3106 fix(constant): use constant from resourcelock package 2022-05-26 22:55:40 -05:00
Aaron U'Ren
3771745872 fix(customimportreject): reject all in subnet
Changes the custom import reject annotation support to not only block
the given subnet exactly, but also all subnets of the subnet given.

For example, this change blocks 10.100.100.0/24 when customimportreject
annotation has 10.100.0.0/16 in it.
2022-03-23 09:27:38 -05:00
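The "reject everything inside the given subnet" rule amounts to a prefix containment test. Below is a minimal sketch using net/netip; `isRejected` is a hypothetical name, and kube-router expresses the rule through GoBGP policy rather than through a function like this.

```go
package main

import (
	"fmt"
	"net/netip"
)

// isRejected is a hypothetical helper. It reports whether route equals the
// reject prefix or is a more specific subnet of it.
func isRejected(reject, route string) (bool, error) {
	r, err := netip.ParsePrefix(reject)
	if err != nil {
		return false, fmt.Errorf("invalid reject prefix %q: %w", reject, err)
	}
	p, err := netip.ParsePrefix(route)
	if err != nil {
		return false, fmt.Errorf("invalid route prefix %q: %w", route, err)
	}
	// The route is covered when it is at least as specific as the reject
	// prefix and its network address falls inside the reject prefix.
	return r.Bits() <= p.Bits() && r.Masked().Contains(p.Addr()), nil
}

func main() {
	for _, route := range []string{"10.100.100.0/24", "10.100.0.0/16", "10.101.0.0/24", "10.0.0.0/8"} {
		rejected, err := isRejected("10.100.0.0/16", route)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-16s rejected=%v\n", route, rejected)
	}
}
```

Note that a less specific route such as 10.0.0.0/8 is not rejected by this check, since it is a supernet rather than a subnet of the annotated prefix.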
Lucas Mundim
badf8645be feat(bgp): add custom BGP import rejection policy support via node annotation 2022-03-23 09:27:38 -05:00
Aaron U'Ren
2d9fb92547 test(sync_routes): add unit testing 2022-03-18 15:02:02 -05:00
Aaron U'Ren
4fd7bc4d19 fix(sync_routes): add deletion / immediate syncing
Added the following items to the original logic:
* Added map route entry deletion on withdrawal so that the system doesn't
  incorrectly sync it back to the kernel's routing table
* Added an immediate route sync upon BGP path receive
* Added a mutex to ensure that deleted routes aren't accidentally synced
  back to the system
* Added stopCh and wg (wait group) handling
* Increase default sync time from 15 seconds to 1 minute since this
  scenario is unlikely and netlink calls could potentially be burdensome
  in large clusters.
2022-03-18 15:02:02 -05:00
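The items listed above can be sketched as a small mutex-guarded route table with a periodic sync loop. This is an illustration under stated assumptions, not the code from sync_routes: the `routeSyncer` type, its method names, and the `inject` callback (which stands in for the netlink call) are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// routeSyncer is a hypothetical type that tracks routes learned via BGP and
// re-applies them periodically. inject stands in for the netlink route add.
type routeSyncer struct {
	mu     sync.Mutex
	routes map[string]string // destination prefix -> next hop
	inject func(dst, nextHop string)
}

func newRouteSyncer(inject func(dst, nextHop string)) *routeSyncer {
	return &routeSyncer{routes: make(map[string]string), inject: inject}
}

// addRoute records the route and syncs it immediately on path receive.
func (s *routeSyncer) addRoute(dst, nextHop string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.routes[dst] = nextHop
	s.inject(dst, nextHop)
}

// delRoute removes the entry on withdrawal so a later sync cannot restore it.
func (s *routeSyncer) delRoute(dst string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.routes, dst)
}

// syncAll re-applies every tracked route while holding the mutex, so a
// concurrent delRoute cannot race with the sync.
func (s *routeSyncer) syncAll() {
	s.mu.Lock()
	defer s.mu.Unlock()
	for dst, nextHop := range s.routes {
		s.inject(dst, nextHop)
	}
}

// run syncs on every tick until stopCh is closed.
func (s *routeSyncer) run(interval time.Duration, stopCh <-chan struct{}, wg *sync.WaitGroup) {
	defer wg.Done()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			s.syncAll()
		case <-stopCh:
			return
		}
	}
}

func main() {
	s := newRouteSyncer(func(dst, nextHop string) {
		fmt.Println("inject", dst, "via", nextHop)
	})
	stopCh := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(1)
	go s.run(time.Minute, stopCh, &wg) // 1 minute matches the new default sync time

	s.addRoute("10.244.1.0/24", "192.168.1.11")
	s.delRoute("10.244.1.0/24")
	s.syncAll() // nothing to inject: the withdrawn route is gone

	close(stopCh)
	wg.Wait()
}
```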
RusoX89
23ac78cf94 Routes Synchronization Routine 2022-03-18 15:02:02 -05:00
Tamihiro Lee
1db19931a2 skip binding device to ipip tunnel if node's interface is loopback 2022-03-11 16:41:14 -06:00