Includes a workaround for musl's hardcoded protocol table, which is
missing SCTP support, by mapping protocol names to their numeric values
in ipset entries.
closes: https://github.com/cloudnativelabs/kube-router/issues/1019
Signed-off-by: Roman Kuzmitskii <roman@damex.org>
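The name-to-number mapping described above could look roughly like the
following. This is a minimal sketch, not the code from this change: the
`protocolNumbers` table and the `ipsetProtocol` helper are hypothetical names,
and only the IANA protocol numbers themselves are taken as fact.

```go
package main

import (
	"fmt"
	"strings"
)

// protocolNumbers maps protocol names to their IANA-assigned numbers.
// Hypothetical table for illustration only.
var protocolNumbers = map[string]int{
	"tcp":  6,
	"udp":  17,
	"sctp": 132,
}

// ipsetProtocol returns the numeric form of a known protocol so that the
// result does not depend on the C library's protocol table. Unknown names
// are returned unchanged.
func ipsetProtocol(name string) string {
	if num, ok := protocolNumbers[strings.ToLower(name)]; ok {
		return fmt.Sprintf("%d", num)
	}
	return name
}

func main() {
	fmt.Println(ipsetProtocol("sctp")) // prints 132
}
```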
When we were using iproute2's CLI, we needed the fwmark as a hex number,
so we passed it around as a string in that format. However, now that we
use the netlink library directly, we already have the fwmark in the form
that we need. So instead of doing all of these string <-> int
conversions, let's just keep this simpler.
This prepares the way for broader refactors in the way that we handle
nodes by:
* Separating frequently used node logic from the controller creation
steps
* Keeping reused code DRY-er
* Adding interface abstractions for key groups of node data and starting
to rely on those more rather than concrete types
* Separating node data from the rest of the controller data structure so
that smaller definitions of data can be passed around to the functions
that need them, rather than always passing the entire controller, which
contains more data / surface area than most functions need.
Only attempt to set up DSR inside containers for local endpoints. Setting
up DSR inside the container's network namespace requires local pods /
endpoints.
Upgrades to Go 1.21.7 now that Go 1.20 is no longer being maintained.
It also resolves the race conditions that we were seeing with BGP
server tests when we upgraded from 1.20 -> 1.21. This appears to be
because some efficiency change in 1.21 caused BGP to write to the
events at the same time that the test harness was trying to read from
them. Solved this in a coarse manner by adding surrounding mutexes to
the test code.
Additionally, upgraded dependencies.
NodePort Health Check has long been part of the Kubernetes API, but
kube-router hasn't implemented it in the past. This is meant to be a
port that is assigned by the kube-controller-manager for LoadBalancer
services that have a traffic policy of `externalTrafficPolicy=Local`.
When set, the k8s networking implementation is meant to open a port and
provide HTTP responses that inform parties external to the Kubernetes
cluster about whether or not a local endpoint exists on the node. It
should return a 200 status if the node contains a local endpoint and
return a 503 status if the node does not contain a local endpoint.
This allows applications outside the cluster to choose their endpoint in
such a way that their source IP could be preserved. For more details
see:
https://kubernetes.io/docs/tutorials/services/source-ip/#source-ip-for-services-with-type-loadbalancer
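The 200 / 503 behaviour described above can be sketched with a plain
net/http handler. This is a minimal illustration, not kube-router's
implementation: `healthHandler`, `probe`, and the `/healthz` path are
hypothetical, and the local-endpoint check is passed in as a callback.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// healthHandler answers 200 when the node has a local endpoint for the
// service and 503 otherwise. hasLocalEndpoint is a hypothetical callback.
func healthHandler(hasLocalEndpoint func() bool) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		if hasLocalEndpoint() {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	})
}

// probe sends one request through the handler and returns the status code.
func probe(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/healthz", nil))
	return rec.Code
}

func main() {
	fmt.Println(probe(healthHandler(func() bool { return true })))  // prints 200
	fmt.Println(probe(healthHandler(func() bool { return false }))) // prints 503
}
```

In a real deployment the handler would be served on the port from the
service's spec.healthCheckNodePort, for example via http.ListenAndServe.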
Adds support for spec.internalTrafficPolicy and fixes support for
spec.externalTrafficPolicy so that it only affects external traffic.
Keeps existing support for kube-router.io/service-local annotation which
overrides both to local when set to true. Any other value in this
annotation is ignored.
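The precedence described above can be sketched as follows. The `policies`
type and `resolvePolicies` function are hypothetical names for illustration;
only the annotation key and the "Local" policy value come from the text and
the Kubernetes API.

```go
package main

import "fmt"

const serviceLocalAnnotation = "kube-router.io/service-local"

// policies records whether internal and external traffic should be
// restricted to node-local endpoints. Hypothetical type for illustration.
type policies struct {
	internalLocal bool
	externalLocal bool
}

// resolvePolicies applies the annotation override first and otherwise
// evaluates each traffic policy independently.
func resolvePolicies(internalPolicy, externalPolicy string, annotations map[string]string) policies {
	// Only the exact value "true" overrides; any other value is ignored.
	if annotations[serviceLocalAnnotation] == "true" {
		return policies{internalLocal: true, externalLocal: true}
	}
	return policies{
		internalLocal: internalPolicy == "Local",
		externalLocal: externalPolicy == "Local",
	}
}

func main() {
	fmt.Printf("%+v\n", resolvePolicies("Cluster", "Local", nil))
	fmt.Printf("%+v\n", resolvePolicies("Cluster", "Cluster",
		map[string]string{serviceLocalAnnotation: "true"}))
}
```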
With the advent of IPv6 integrated into the NSC we no longer get all IPs
from endpoints, but rather just the primary IP of the pod (which is
often, but not always the IPv4 address).
In order to get all possible endpoint addresses for a given service we
need to switch to using EndpointSlice which also nicely groups addresses
into IPv4 and IPv6 by AddressType and also gives us more information
about the endpoint status by giving us attributes for serving and
terminating, instead of just ready or not ready.
This does mean that users will need to add another permission to their
RBAC in order for kube-router to access these objects.
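As a rough illustration of why EndpointSlice helps, the sketch below groups
addresses by address type and uses the serving condition. The types here are
simplified, hypothetical mirrors of the discovery.k8s.io/v1 objects (the real
API uses pointer booleans and richer structs), and filtering on serving alone
is an assumption rather than a statement of what kube-router does.

```go
package main

import "fmt"

// endpoint is a simplified stand-in for a discovery.k8s.io/v1 Endpoint.
type endpoint struct {
	addresses   []string
	ready       bool
	serving     bool
	terminating bool
}

// endpointSlice is a simplified stand-in for an EndpointSlice; each slice
// carries addresses of exactly one address type.
type endpointSlice struct {
	addressType string // "IPv4" or "IPv6"
	endpoints   []endpoint
}

// servingAddresses collects the addresses of serving endpoints, keyed by
// the address type of the slice they came from.
func servingAddresses(slices []endpointSlice) map[string][]string {
	out := make(map[string][]string)
	for _, s := range slices {
		for _, ep := range s.endpoints {
			if !ep.serving {
				continue
			}
			out[s.addressType] = append(out[s.addressType], ep.addresses...)
		}
	}
	return out
}

func main() {
	slices := []endpointSlice{
		{addressType: "IPv4", endpoints: []endpoint{
			{addresses: []string{"10.0.0.5"}, ready: true, serving: true},
		}},
		{addressType: "IPv6", endpoints: []endpoint{
			{addresses: []string{"fd00::5"}, ready: true, serving: true},
			{addresses: []string{"fd00::6"}, terminating: true},
		}},
	}
	fmt.Println(servingAddresses(slices))
}
```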
lookupFWMarkByService() was previously returning an error when no fwmark
was found in the tracking map for a given service. However, this isn't
really an error condition and shouldn't be treated as such. When it was
treated as an error condition users got a lot of confusing errors in the
logs.
* fact(NSC): consolidate constants to top
* fix(NSC): increase IPVS add service logging
* fix(NSC): improve logging for FWMark IPVS entries
* fix(NSC): add missing parameter to logging
* feat(NSC): generate unique FW marks
Because we trim the 32-bit FNV-1a hash to 16 bits there is the potential
for FW marks to collide with each other even for unique inputs of IP,
protocol, and port. This reduces that chance up to the 16-bit max by
keeping track of which FW marks we've already allocated and what IP,
protocol, port combo they've been allocated for.
Fixes #1045
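The allocation tracking described above can be sketched like this. It is a
minimal illustration under stated assumptions: `fwMarkTable`, `serviceKey`,
`hashMark`, and the linear-probing collision strategy are all hypothetical,
and the 16-bit mask follows the description in this message rather than the
actual source. The `lookup` method also shows a miss reported as a boolean
rather than an error.

```go
package main

import (
	"errors"
	"fmt"
	"hash/fnv"
)

const fwMarkMask = 0xffff // keep the low 16 bits of the 32-bit hash

// fwMarkTable remembers which mark belongs to which service key.
type fwMarkTable struct {
	byMark map[uint32]string
	byKey  map[string]uint32
}

func newFWMarkTable() *fwMarkTable {
	return &fwMarkTable{byMark: map[uint32]string{}, byKey: map[string]uint32{}}
}

func serviceKey(ip, protocol, port string) string {
	return ip + "-" + protocol + "-" + port
}

// hashMark trims the 32-bit FNV-1a hash of the key to 16 bits.
func hashMark(key string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return h.Sum32() & fwMarkMask
}

// allocate returns the existing mark for a service or claims a free one,
// probing upwards from the hash when the preferred mark is taken.
func (t *fwMarkTable) allocate(ip, protocol, port string) (uint32, error) {
	key := serviceKey(ip, protocol, port)
	if mark, ok := t.byKey[key]; ok {
		return mark, nil
	}
	start := hashMark(key)
	for i := uint32(0); i <= fwMarkMask; i++ {
		mark := (start + i) & fwMarkMask
		if mark == 0 {
			continue // a mark of 0 means "unmarked", so never hand it out
		}
		if _, taken := t.byMark[mark]; !taken {
			t.byMark[mark] = key
			t.byKey[key] = mark
			return mark, nil
		}
	}
	return 0, errors.New("no free FW marks left")
}

// lookup reports whether a mark exists; a miss is not an error.
func (t *fwMarkTable) lookup(ip, protocol, port string) (uint32, bool) {
	mark, ok := t.byKey[serviceKey(ip, protocol, port)]
	return mark, ok
}

func main() {
	table := newFWMarkTable()
	mark, err := table.allocate("10.0.0.1", "tcp", "80")
	fmt.Println(mark, err)
	_, found := table.lookup("10.0.0.2", "udp", "53")
	fmt.Println(found) // prints false
}
```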
* fact(NSC): move utility funcs to utils
* fix(NSC): reduce IPVS service shell outs
This also aligns it more with the almost identical function used for
non-FWmarked services ipvsAddService() which is also called from
setupExternalIPServices and passes in this same list of ipvsServices.
* fix(NSC): fix & consolidate DSR cleanup code
A lot of this is refactor work, but it's important to know why the DSR
mangle tables were not being cleaned up in the first place. When we
transitioned to iptables-save to look over the mangle rules, we didn't
realize that iptables-save changes the format of the marks from integer
values (which is what the CLI works with) to hexadecimal.
This made it so that we were never actually matching on a mangle rule,
which left them all behind. When these mangle rules were left, it meant
that IPs that used to be part of a DSR service were essentially
black-holed on the system and were no longer routable.
Fixes #1167
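To illustrate the integer vs. hexadecimal mismatch described above, the
sketch below compares marks numerically instead of as strings. The
`parseMark` and `ruleSetsMark` helpers and the sample rule line are
hypothetical and only approximate what iptables-save output looks like.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMark accepts both decimal ("123") and hexadecimal ("0x7b") marks;
// base 0 lets ParseUint infer the base from the prefix.
func parseMark(s string) (uint32, error) {
	v, err := strconv.ParseUint(s, 0, 32)
	return uint32(v), err
}

// ruleSetsMark reports whether a rule line sets the given mark, regardless
// of whether the mark is written in decimal or in hex with a "/mask" suffix.
func ruleSetsMark(rule string, mark uint32) bool {
	fields := strings.Fields(rule)
	for i, f := range fields {
		if (f == "--set-mark" || f == "--set-xmark") && i+1 < len(fields) {
			value := strings.SplitN(fields[i+1], "/", 2)[0]
			got, err := parseMark(value)
			return err == nil && got == mark
		}
	}
	return false
}

func main() {
	// Illustrative rule line; real output may differ in detail.
	saved := "-A PREROUTING -d 10.0.0.1/32 -p tcp -m tcp --dport 80 -j MARK --set-xmark 0x7b/0xffffffff"
	fmt.Println(strings.Contains(saved, "--set-xmark 123")) // prints false: a naive string match misses it
	fmt.Println(ruleSetsMark(saved, 123))                   // prints true
}
```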
* doc(dsr): expand DSR documentation
fixes #1055
* ensure active service map is updated for non DSR services
Co-authored-by: Murali Reddy <muralimmreddy@gmail.com>
* remove IPVS metrics
Remove metrics for IPVS services when the IPVS service is deleted so
that the number of metrics does not grow without bound.
Fixes #734
* delete metricsMap key when IPVS service is removed
Delete the key in NetworkServicesController.metricsMap when the
respective IPVS configuration is removed.
Remove a period from a comment to conform to kube-router norms
* cleanup stale metrics in a distinct method
* remove unnecessary error return value on cleanupStaleMetrics