The previous version of the bgp_policies code only allowed for creating
a policy when the policy didn't exist already. However, with the advent
of dual-stack we need to be able to add / remove statements if we add or
lose a specific IP family (e.g. IPv4 or IPv6) since they are handled in
different statements.
Given that the owner of GoBGP has let us know that policies are
idempotent, this now involves quite a bit of work. We need to follow the
following procedure:
add statements if missing -> add them to a policy -> if policy doesn't
equal the one already in GoBGP -> create the new policy and associate
it -> de-associate the old policy -> remove the old policy
Previously we used to do an idempotent sync all active VIPs any time we
got a service or endpoint update. However, this only worked when we
assumed a single-stack deployment model where IPs were never deleted
unless the whole service was deleted.
In a dual-stack model, we can add / remove LoadBalancer IPs and Cluster
IPs on updates. Given this, we need to take into account the finite
change that happens, and not just revert to sync-all because we'll never
stop advertising IPs that should be removed.
As a fall-back, we still have the outer Run loop that syncs all active
routes every X amount of seconds (configured by user CLI parameter). So
on that timer we'll still have something that syncs all active VIPs and
works as an outer control loop to ensure that desired state eventually
becomes active state if we accidentally remove a VIP that should have
been there.
When a single IP family's set looks to be equal, switch to continue
instead of return so that other families can still be evaluated as those
might have changes.
When enabled, generate the router id by hashing the primary IP.
With this no explicit router id has to be provided on IPv6-only clusters.
Signed-off-by: Erik Larsson <who+github@cnackers.org>
Use different IP ranges in BGP Policies unit test so that it becomes
more obvious when there are unit test failures resulting from
multi-processing of unit tests.
Fixes a problem where a user would end up with redundant external peers
in their BGP policies because getting peers is IP family agnostic and
yet is run twice on the same list.
This also ruined unit test consistency.
Without this logic, it appears that sometimes GoBGP is inclined to match
unintentional routes in policy because of the MATCHSET_ANY declaration
and the way that it interacts with empty sets.
In my testing, without this logic I found that it often resulted in
various routes not being advertised correctly and not even showing up in
GoBGP itself. My current guess is that policy keeps GoBGP from importing
the route into the RIB even from the Protobuf socket connection that
kube-router establishes directly.
We do a lot of getting defined sets for GoBGP and are planning to do
more of it in the future. This commit centralizes the logic for this and
reduces repetition.
Annotations were taken into account during startup, but after they were
advertised the affect of annotations was only additive because we
were only tracking current state of VIPs that should be advertised and
not taking into account VIPs that should be withdrawn for anything other
than service locality.
Fixes#1491
Rather than setting BGP Graceful Restart on both IPv4 and IPv6
regardless of which family is enabled, check the current mode via
nrc.isIpv6 and only set on appropriate family.
Note, this mode is exclusive as the current portions of NRC kube-router
code are only meant to work with IPv4 or IPv6 not both at the same time.
Fixes#1323
Changes the custom import reject annotation support to not only block
the given subnet exactly, but also all subnets of the subnet given.
For example, this change blocks 10.100.100.0/24 when customimportreject
annotation has 10.100.0.0/16 in it.
Added the following items to the original logic:
* Added map route entry deletion on withdrawl so that the system doesn't
incorrectly sync it back to the kernel's routing table
* Added an immediate route sync upon BGP path receive
* Added a mutex to ensure that deleted routes aren't accidentally synced
back to the system
* Added stopCh and wg (wait group) handling
* Increase default sync time from 15 seconds to 1 minute since this
scenario is unlikely and netlink calls could potentially be burdensome
in large clusters.