125 Commits

Author SHA1 Message Date
Aaron U'Ren
8afdee87d9 fact(NSC): differentiate headless services
Differentiate headless services from ClusterIP being None, in
preparation for handling the service.kubernetes.io/headless label. One
might think that handling these is similar, which it sort of is and sort
of isn't. ClusterIP is an immutable field, whereas labels are mutable.
This separates our handling of ClusterIP none-ness from our handling of
the headless label's presence.

When we consider what to do with ClusterIP being None, that is
fundamentally different, because once it is None, the k8s API guarantees
that it won't ever change.

The label, on the other hand, can be added and removed at any time.
2024-01-05 10:27:23 -06:00
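A minimal Go sketch of the distinction, assuming a hypothetical serviceInfo type that stands in for the Kubernetes object and treating the label as present regardless of its value (also an assumption). Because ClusterIP is immutable, clusterIPIsNone could be evaluated once, while hasHeadlessLabel has to be re-checked on every sync. Both function names are illustrative, not kube-router's actual API.

```go
package main

import "fmt"

// headlessLabel is the label key named in the commit message.
const headlessLabel = "service.kubernetes.io/headless"

// serviceInfo is a hypothetical, trimmed-down stand-in for a Kubernetes
// object; only the fields needed for this sketch are included.
type serviceInfo struct {
	ClusterIP string
	Labels    map[string]string
}

// clusterIPIsNone reports whether the service was created without a
// ClusterIP. ClusterIP is immutable, so the answer never changes.
func clusterIPIsNone(svc serviceInfo) bool {
	return svc.ClusterIP == "None"
}

// hasHeadlessLabel reports whether the headless label is present right now.
// Labels are mutable, so this has to be re-evaluated on every sync.
func hasHeadlessLabel(svc serviceInfo) bool {
	_, found := svc.Labels[headlessLabel]
	return found
}

func main() {
	svc := serviceInfo{ClusterIP: "10.96.0.10", Labels: map[string]string{}}
	fmt.Println(clusterIPIsNone(svc), hasHeadlessLabel(svc)) // false false

	// The label can be added later, while the ClusterIP stays the same.
	svc.Labels[headlessLabel] = ""
	fmt.Println(clusterIPIsNone(svc), hasHeadlessLabel(svc)) // false true
}
```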
Aaron U'Ren
30d37695d6 fact(NSC): update Errorf syntax 2024-01-05 10:27:23 -06:00
Aaron U'Ren
a0fe844a93 feat(NSC): honor service-proxy-name label
Abide by the service.kubernetes.io/service-proxy-name label as defined by
the upstream standard here:
https://github.com/kubernetes-sigs/kpng/blob/master/doc/service-proxy.md#ignored-servicesendpoints

Resolves the failing e2e test:
should implement service.kubernetes.io/service-proxy-name

Fixes: #979
2024-01-05 10:27:23 -06:00
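A sketch of the filtering this implies, assuming kube-router behaves like a default service proxy that ignores any service carrying the label, whatever its value. The helper name shouldSkipService is hypothetical.

```go
package main

import "fmt"

// serviceProxyNameLabel is the label key defined by the upstream standard
// referenced in the commit message.
const serviceProxyNameLabel = "service.kubernetes.io/service-proxy-name"

// shouldSkipService is a hypothetical helper. Assumption: a default proxy
// ignores any service that names a proxy via this label, regardless of value.
func shouldSkipService(labels map[string]string) bool {
	_, found := labels[serviceProxyNameLabel]
	return found
}

func main() {
	fmt.Println(shouldSkipService(map[string]string{"app": "web"}))                  // false
	fmt.Println(shouldSkipService(map[string]string{serviceProxyNameLabel: "other"})) // true
}
```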
Aaron U'Ren
ced5102d99 feat(NSC): add IPVS service timeouts
This is a feature that has been requested a few times over the years and
would bring us closer to feature parity with other k8s network
implementations for service proxy.
2023-12-26 14:26:11 -06:00
Aaron U'Ren
aebaa48ea1 fix(NSC): handle endpoint slice ready nil
In some cases it is possible for Endpoint.Conditions.Ready to be nil
during the early stages of initialization. When this happens it causes
kube-router to segfault. This fix tests for nil before testing for
Ready.
2023-12-08 14:38:50 -06:00
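In the EndpointSlice API the conditions are optional *bool fields, so dereferencing Ready without a nil check panics. A minimal sketch of the guard, using a hypothetical endpointConditions stand-in. Treating nil as "not ready" is an assumption for this sketch; the upstream EndpointSlice documentation describes a nil value as an unknown state that most consumers should interpret as ready, so the right default depends on the caller.

```go
package main

import "fmt"

// endpointConditions is a hypothetical stand-in for the EndpointSlice
// EndpointConditions type, where each condition is an optional *bool.
type endpointConditions struct {
	Ready *bool
}

// isReady checks for nil before dereferencing Ready.
// Assumption: nil is treated as "not ready" in this sketch.
func isReady(c endpointConditions) bool {
	return c.Ready != nil && *c.Ready
}

func main() {
	ready, notReady := true, false
	fmt.Println(isReady(endpointConditions{Ready: &ready}))    // true
	fmt.Println(isReady(endpointConditions{Ready: &notReady})) // false
	fmt.Println(isReady(endpointConditions{}))                 // false, no nil dereference
}
```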
Aaron U'Ren
0f3714b9b7 fix(hairpin): set hairpin_mode for veth iface
It used to be that the kubelet handled setting hairpin mode for us:
https://github.com/kubernetes/kubernetes/pull/13628

Then this functionality moved to the dockershim:
https://github.com/kubernetes/kubernetes/pull/62212

Then the functionality was removed entirely:
https://github.com/kubernetes/kubernetes/commit/83265c9171f

Unfortunately, the fact that our hairpin implementation depended on this
in order to work was lost along the way, if we ever knew it at all.
Additionally, I suspect that containerd and cri-o implementations never
worked correctly with hairpinning.

Without this, the NAT rules that we implement for hairpinning don't work
correctly. Because hairpin_mode isn't enabled on the virtual
interface of the container on the host, the packet bubbles up to the
kube-bridge. At some point in the traffic flow, the route back to the
pod gets resolved to the mac address inside the container; at that
point, the packet's source mac and destination mac don't match the
kube-bridge interface and the packet is black-holed.

This can also be fixed by putting the kube-bridge interface into
promiscuous mode so that it accepts all mac addresses, but I think that
going back to the original functionality of enabling hairpin_mode on the
veth interface of the container is likely the lesser of two evils here
as putting the kube-bridge interface into promiscuous mode will likely
have unintentional consequences.
2023-12-07 12:44:51 -06:00
Aaron U'Ren
4cd6d94826 fix(NSC): only run for enabled families
Don't run iptables or ipset logic for disabled families

Fixes #1558
2023-10-19 16:51:21 -05:00
Aaron U'Ren
1a891c33ee fix(dsr): add family specific link inside pod
For IPv6 we need to have family-specific links inside the pod to receive
the ip6ip6 and ipip traffic that we are sending.
2023-10-07 08:52:31 -05:00
Aaron U'Ren
514a8af7ed fix(dsr): add family for fwmark 2023-10-07 08:52:31 -05:00
Aaron U'Ren
9abe20d581 fix(NSC): compare all pod IPs for endpoint check
Don't just compare the primary IP according to k8s, but all IPs that the
pod contains.
2023-10-07 08:52:31 -05:00
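A minimal sketch of comparing an endpoint address against every pod IP rather than only the primary one. The helper name is hypothetical; podIPs plays the role of pod.Status.PodIPs. Parsing both sides with net.ParseIP means differently written forms of the same IPv6 address still match.

```go
package main

import (
	"fmt"
	"net"
)

// endpointBelongsToPod reports whether endpointIP equals any of the pod's IPs.
func endpointBelongsToPod(endpointIP string, podIPs []string) bool {
	ep := net.ParseIP(endpointIP)
	if ep == nil {
		return false
	}
	for _, podIP := range podIPs {
		if ip := net.ParseIP(podIP); ip != nil && ip.Equal(ep) {
			return true
		}
	}
	return false
}

func main() {
	podIPs := []string{"10.244.1.7", "fd00:10:244:1::7"}
	fmt.Println(endpointBelongsToPod("fd00:10:244:1::7", podIPs)) // true
	fmt.Println(endpointBelongsToPod("10.244.1.8", podIPs))       // false
}
```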
Aaron U'Ren
4c6e19f2e1 feat(ipset): consolidate ipset usage across controllers
Before this, we had 2 different ways to interact with ipsets, through
the handler interface which had the best handling for IPv6 because NPC
heavily utilizes it, and through the ipset struct which mostly repeated
the handler logic, but didn't handle some key things.

NPC utilized the handler functions and NSC / NRC mostly utilized the old
ipset struct functions. This caused a lot of duplication between the two
groups of functions and also caused issues with proper IPv6 handling.

This commit consolidates the two sets of usage into just the handler
interface. This greatly simplifies how the controllers interact with
ipsets and it also reduces the logic complexity on the ipset side.

This also fixes up some inconsistency with how we handled IPv6 ipset
names. ipset likes them to be prefixed with inet6:, but we weren't
always doing this in a way that made sense and was consistent across all
functions in the ipset struct.
2023-10-07 08:52:31 -05:00
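One way to keep IPv6 set naming consistent is to apply the inet6: prefix in exactly one place and make it idempotent, so callers can pass either form. A sketch with a hypothetical familySetName helper and an example set name.

```go
package main

import (
	"fmt"
	"strings"
)

const ipv6SetPrefix = "inet6:"

// familySetName applies the "inet6:" prefix to IPv6 set names exactly once.
func familySetName(name string, ipv6 bool) string {
	if !ipv6 || strings.HasPrefix(name, ipv6SetPrefix) {
		return name
	}
	return ipv6SetPrefix + name
}

func main() {
	fmt.Println(familySetName("example-set", false))      // example-set
	fmt.Println(familySetName("example-set", true))       // inet6:example-set
	fmt.Println(familySetName("inet6:example-set", true)) // inet6:example-set
}
```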
Aaron U'Ren
da73dea69b feat(NSC): use EndpointSlice instead of Endpoints
Now that IPv6 is integrated into the NSC, Endpoints no longer give us all
the IPs we need: they only contain the primary IP of the pod (which is
often, but not always, the IPv4 address).

In order to get all possible endpoint addresses for a given service we
need to switch to using EndpointSlice which also nicely groups addresses
into IPv4 and IPv6 by AddressType and also gives us more information
about the endpoint status by giving us attributes for serving and
terminating, instead of just ready or not ready.

This does mean that users will need to add another permission to their
RBAC in order for kube-router to access these objects.
2023-10-07 08:52:31 -05:00
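A sketch of grouping addresses by an EndpointSlice's AddressType, whose values are "IPv4", "IPv6", and "FQDN". The endpointSlice struct is a hypothetical, flattened stand-in for the discovery.k8s.io/v1 type, and skipping FQDN slices is an assumption of this sketch.

```go
package main

import "fmt"

// addressType mirrors the EndpointSlice AddressType values.
type addressType string

const (
	addressTypeIPv4 addressType = "IPv4"
	addressTypeIPv6 addressType = "IPv6"
)

// endpointSlice is a hypothetical stand-in; Addresses is flattened from
// Endpoints[].Addresses for brevity.
type endpointSlice struct {
	AddressType addressType
	Addresses   []string
}

// splitByFamily groups addresses by the slice's AddressType and ignores
// other types such as FQDN.
func splitByFamily(slices []endpointSlice) (v4, v6 []string) {
	for _, s := range slices {
		switch s.AddressType {
		case addressTypeIPv4:
			v4 = append(v4, s.Addresses...)
		case addressTypeIPv6:
			v6 = append(v6, s.Addresses...)
		}
	}
	return v4, v6
}

func main() {
	v4, v6 := splitByFamily([]endpointSlice{
		{AddressType: addressTypeIPv4, Addresses: []string{"10.244.1.7"}},
		{AddressType: addressTypeIPv6, Addresses: []string{"fd00:10:244:1::7"}},
	})
	fmt.Println(v4, v6) // [10.244.1.7] [fd00:10:244:1::7]
}
```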
Aaron U'Ren
81bc9e20ef fix(nsc): don't modify netmask during flag setup
There is absolutely no reason that we should ever assume netmasks, and
even if we did, we shouldn't modify them as a side-effect of a
completely different operation. No idea why this was ever coded this
way. Netmask is now set upstream to the appropriate mask for the IP
family.
2023-10-07 08:52:31 -05:00
Aaron U'Ren
903466b745 fix(nsc): fail fast during init
During our initial run, fail fatally when we encounter problems rather
than just continuing on and causing subsequent problems and potentially
burying the real error.
2023-10-07 08:52:31 -05:00
Aaron U'Ren
25ecb098c6 feat(nsc): add dualstack capabilities 2023-10-07 08:52:31 -05:00
Brad Davidson
aa107d6376 Make metrics registerer/gatherer replaceable
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-10-07 08:52:31 -05:00
Aaron U'Ren
06f5f8babf feat(go): update package version to /v2
Make the changes necessary to move kube-router to a new major version,
following upstream documentation: https://go.dev/doc/modules/major-version
2023-10-07 08:52:31 -05:00
Aaron U'Ren
ddb0e63c46 feat(NRC): make NRC dual stack 2023-10-07 08:52:31 -05:00
Aaron U'Ren
85cecb6e61 feat(pod_cidr): handle multiple pod CIDRs 2023-10-07 08:52:31 -05:00
Michal Rostecki
5d04a9fd97 netpol: Add dual-stack support
This change allows defining two cluster CIDRs for compatibility with
Kubernetes dual-stack, with the assumption that the two CIDRs are usually
IPv4 and IPv6.

Signed-off-by: Michal Rostecki <vadorovsky@gmail.com>
2023-10-07 08:52:31 -05:00
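A sketch of parsing up to two cluster CIDRs from a comma-separated value. Note that this sketch is stricter than the commit: it rejects two CIDRs of the same family, whereas the commit only assumes they are usually IPv4 and IPv6. The function name and input format are hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseClusterCIDRs parses at most two comma-separated CIDRs and, when two
// are given, requires one IPv4 and one IPv6 CIDR.
func parseClusterCIDRs(value string) ([]*net.IPNet, error) {
	parts := strings.Split(value, ",")
	if len(parts) > 2 {
		return nil, fmt.Errorf("expected at most 2 cluster CIDRs, got %d", len(parts))
	}
	var cidrs []*net.IPNet
	for _, p := range parts {
		_, ipNet, err := net.ParseCIDR(strings.TrimSpace(p))
		if err != nil {
			return nil, fmt.Errorf("invalid cluster CIDR %q: %w", p, err)
		}
		cidrs = append(cidrs, ipNet)
	}
	// Same family on both sides means this is not a dual-stack pair.
	if len(cidrs) == 2 && (cidrs[0].IP.To4() != nil) == (cidrs[1].IP.To4() != nil) {
		return nil, fmt.Errorf("dual-stack cluster CIDRs must be one IPv4 and one IPv6")
	}
	return cidrs, nil
}

func main() {
	cidrs, err := parseClusterCIDRs("10.244.0.0/16,fd00:10:244::/56")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, c := range cidrs {
		fmt.Println(c.String())
	}
}
```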
Aaron U'Ren
1d1ff0599a fix(NSC): add check for podCidr before use
Fixes #1434
2023-01-31 12:05:57 -06:00
Manuel Rüger
1d37130447 Fix linting 2022-10-17 11:37:07 -05:00
Aaron U'Ren
f97eb7cc1a fix: remove multiple MTU reductions
fixes cloudnativelabs#1033
2022-06-24 17:51:49 -05:00
Manuel Rüger
9d315229c8 Ignore gosec linting issue 2022-05-29 13:37:32 -05:00
Xiang Liu
8fcebb3106 fix(constant): use constant from resourcelock package 2022-05-26 22:55:40 -05:00
Aaron U'Ren
4b6cf6c896 fact(protocol): standardize protocol conversions 2022-02-11 17:34:10 -06:00
Aaron U'Ren
28aab6ea20 fact(service_endpoints_sync): simplify external IP logic
This is an attempt to make the external IP logic easier to follow and
more straightforward for future changes like consolidating the iptables
logic.
2022-02-11 17:34:10 -06:00
noillir
d27c317891 change to account for internet headers also when setting MSS (#1232)
Co-authored-by: noillir <miq@noillir.eu>
2022-02-02 15:46:18 -06:00
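The arithmetic behind accounting for headers when deriving an MSS from an MTU, as a sketch assuming minimum-size headers without options (20 bytes for IPv4, 40 for IPv6, 20 for TCP). Whatever additional overhead kube-router subtracts, such as tunnel headers, is not stated in the commit and is left out here.

```go
package main

import "fmt"

const (
	ipv4HeaderLen = 20 // minimum IPv4 header, no options
	ipv6HeaderLen = 40 // fixed IPv6 header
	tcpHeaderLen  = 20 // minimum TCP header, no options
)

// tcpMSS returns the largest TCP payload that fits in one packet for the
// given MTU, subtracting both the IP header and the TCP header.
func tcpMSS(mtu int, ipv6 bool) int {
	ipHeader := ipv4HeaderLen
	if ipv6 {
		ipHeader = ipv6HeaderLen
	}
	return mtu - ipHeader - tcpHeaderLen
}

func main() {
	fmt.Println(tcpMSS(1500, false)) // 1460
	fmt.Println(tcpMSS(1500, true))  // 1440
}
```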
Aaron U'Ren
b74689785a feat(nsc): only hairpin endpoints on local node (#1208) 2021-12-10 23:19:20 +05:30
Aaron U'Ren
2ca39f14f8 fix(nsc): properly check hairpinning rule
Previously, we would iterate over rulesFromNode, but then check it
against the entirety of the rulesNeeded hash. This resulted in the loop
breaking as soon as it found any matching rule from the host, rather than
only when it matched the rule that we were currently processing.
2021-12-03 11:02:55 -06:00
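A sketch of the corrected check: for each rule we need, look for that exact rule among the rules already on the node, instead of stopping at the first node rule that matches any needed rule. The names rulesNeeded and rulesFromNode come from the commit; representing rules as plain strings and the rulesToAdd helper are assumptions.

```go
package main

import "fmt"

// rulesToAdd returns the needed rules that are not already on the node.
func rulesToAdd(rulesNeeded []string, rulesFromNode []string) []string {
	existing := make(map[string]struct{}, len(rulesFromNode))
	for _, r := range rulesFromNode {
		existing[r] = struct{}{}
	}
	var missing []string
	for _, r := range rulesNeeded {
		// Check this specific rule, not whether any needed rule exists.
		if _, ok := existing[r]; !ok {
			missing = append(missing, r)
		}
	}
	return missing
}

func main() {
	needed := []string{"rule-a", "rule-b", "rule-c"}
	onNode := []string{"rule-a", "rule-x"}
	fmt.Println(rulesToAdd(needed, onNode)) // [rule-b rule-c]
}
```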
Aaron U'Ren
146786ad8a fix(nsc): sync hairpinning on service modification
When we receive service or endpoint updates from Kubernetes we process a
type of partial sync because the service and endpoints have already been
updated by the handler. However, previously, this partial update did not
include updating the hairpinning rules for services.

This would cause hairpinning changes to be delayed until the next full
sync or until a kube-router restart. This change adds hairpinning into
the partial service sync flow.
2021-12-03 11:02:55 -06:00
Aaron U'Ren
8f13f069b6 fix(nsc): don't overwrite err & add comments 2021-12-03 11:02:55 -06:00
Kailun
bee2c2089f fix bug when adding ip rule for fwmark (#1178)
Co-authored-by: Kailun Shi <kailun.shi@bytedance.com>
2021-11-05 18:42:24 +05:30
Aaron U'Ren
c3f90c54b3 Fix Misc DSR Issues (#1174)
* fact(NSC): consolidate constants to top

* fix(NSC): increase IPVS add service logging

* fix(NSC): improve logging for FWMark IPVS entries

* fix(NSC): add missing parameter to logging

* feat(NSC): generate unique FW marks

Because we trim the 32-bit FNV-1a hash to 16 bits, there is the potential
for FW marks to collide with each other even for unique inputs of IP,
protocol, and port. This avoids such collisions, up to the 16-bit
maximum, by keeping track of which FW marks we've already allocated and
which IP, protocol, and port combination each was allocated for.

Fixes #1045

* fact(NSC): move utility funcs to utils

* fix(NSC): reduce IPVS service shell outs

This also aligns it more closely with ipvsAddService(), the almost
identical function used for non-FWmarked services, which is also called
from setupExternalIPServices with this same list of ipvsServices.

* fix(NSC): fix & consolidate DSR cleanup code

A lot of this is refactor work, but it's important to know why the DSR
mangle tables were not being cleaned up in the first place. When we
transitioned to iptables-save to look over the mangle rules, we didn't
realize that iptables-save changes the format of the marks from integer
values (which is what the CLI works with) to hexadecimal.

This made it so that we were never actually matching on a mangle rule,
which left them all behind. When these mangle rules were left, it meant
that IPs that used to be part of a DSR service were essentially
black-holed on the system and were no longer routable.

Fixes #1167

* doc(dsr): expand DSR documentation

fixes #1055

* ensure active service map is updated for non DSR services

Co-authored-by: Murali Reddy <muralimmreddy@gmail.com>
2021-10-14 16:14:05 +05:30
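Two of the changes in this PR can be sketched together. The allocator shows one way to hand out unique 16-bit FW marks from a 32-bit FNV-1a hash trimmed to 16 bits, remembering which IP/protocol/port tuple owns each mark; the key format, linear probing on collision, and reserving mark 0 are assumptions, not necessarily what kube-router does. ruleHasMark shows why matching on iptables-save output needs the hexadecimal form of the mark (for example, mark 1000 appears as --set-xmark 0x3e8/0xffffffff). All names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"hash/fnv"
	"strings"
)

// fwMarkAllocator remembers which mark belongs to which service tuple.
type fwMarkAllocator struct {
	byKey  map[string]uint32
	byMark map[uint32]string
}

func newFWMarkAllocator() *fwMarkAllocator {
	return &fwMarkAllocator{byKey: map[string]uint32{}, byMark: map[uint32]string{}}
}

// mark returns the FW mark for a tuple, allocating a unique one if needed.
func (a *fwMarkAllocator) mark(ip, protocol string, port int) (uint32, error) {
	key := fmt.Sprintf("%s-%s-%d", ip, protocol, port)
	if m, ok := a.byKey[key]; ok {
		return m, nil
	}
	h := fnv.New32a()
	_, _ = h.Write([]byte(key))
	start := h.Sum32() & 0xffff // trim the 32-bit hash to 16 bits
	// Probe forward from the hashed value until a free mark is found.
	for i := uint32(0); i <= 0xffff; i++ {
		candidate := (start + i) & 0xffff
		if candidate == 0 {
			continue // assumption: 0 is kept to mean "no mark"
		}
		if _, taken := a.byMark[candidate]; !taken {
			a.byKey[key] = candidate
			a.byMark[candidate] = key
			return candidate, nil
		}
	}
	return 0, errors.New("all 16-bit FW marks are in use")
}

// ruleHasMark reports whether an iptables-save rule references the mark.
// iptables-save prints marks in hex, so the decimal form never matches.
func ruleHasMark(rule string, mark uint32) bool {
	want := fmt.Sprintf("0x%x", mark)
	for _, field := range strings.Fields(rule) {
		value, _, _ := strings.Cut(field, "/") // drop an optional "/mask"
		if value == want {
			return true
		}
	}
	return false
}

func main() {
	alloc := newFWMarkAllocator()
	m, err := alloc.mark("10.0.0.1", "tcp", 80)
	if err != nil {
		fmt.Println(err)
		return
	}
	// Roughly the shape of a mangle rule as printed by iptables-save.
	rule := fmt.Sprintf("-A PREROUTING -d 10.0.0.1/32 -p tcp -m tcp --dport 80 -j MARK --set-xmark 0x%x/0xffffffff", m)
	fmt.Println(m, ruleHasMark(rule, m))
}
```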
Aaron U'Ren
8572f3a17f fact(hairpin): remove one last direct ref of KUBE-ROUTER-HAIRPIN 2021-09-13 17:39:28 -05:00
Aaron U'Ren
5e1d033a44 fix(sysctl): revert is fatal check for some conditions 2021-09-13 17:39:28 -05:00
Aaron U'Ren
8f3861de40 fact(sysctl): consolidate sysctl usage into utils 2021-09-11 16:20:07 -05:00
Aaron U'Ren
da5f8e0044 fix: address minor PR feedback and misspellings 2021-09-11 16:20:07 -05:00
Aaron U'Ren
1d90e215e9 feat(.golangci.yml): enable stylecheck linter and remediate 2021-09-11 16:20:07 -05:00
Aaron U'Ren
85f28411dc feat(.golangci.yml): enable long lines linter and remediate 2021-09-11 16:20:07 -05:00
Aaron U'Ren
874a746e30 feat(.golangci.yml): enable gosec and remediate 2021-09-11 16:20:07 -05:00
Aaron U'Ren
6208bfac46 feat(.golangci.yml): enable gomnd and remediate 2021-09-11 16:20:07 -05:00
Aaron U'Ren
f52fddddee feat(.golangci.yml): enable gocritic and remediate 2021-09-11 16:20:07 -05:00
Aaron U'Ren
d6ccc22519 feat(.golangci.yml): enable goconst and remediate 2021-09-11 16:20:07 -05:00
Aaron U'Ren
c5f4c00d63 feat(.golangci.yml): enable dupl and remediate 2021-09-11 16:20:07 -05:00
Aaron U'Ren
35d334ca96 fix: add sleeps between iptables and ipset cleanup
I found that without taking a brief pause between iptables cleanup and
ipset deletion, sometimes the system still thought that there were
iptables references to the ipsets and would error instead of cleaning
the ipsets.
2021-08-05 16:39:28 -05:00
Aaron U'Ren
fb070265a2 fix(NSC): actually remove IPVS definitions 2021-08-05 16:39:28 -05:00
Aaron U'Ren
bbc0666a4c fix(NSC): add exists checking to Cleanup() 2021-08-05 16:39:28 -05:00
Billie Cleek
d5a18cac67 remove IPVS metrics (#1133)
* remove IPVS metrics

Remove metrics for IPVS services when the IPVS service is deleted so
that the number of metrics does not grow without bound.

Fixes #734

* delete metricsMap key when IPVS service is removed

Delete the key in NetworkServicesController.metricsMap when the
respective IPVS configuration is removed.

Remove a period from a comment to conform to kube-router norms

* cleanup stale metrics in a distinct method

* remove unnecessary error return value on cleanupStaleMetrics
2021-07-31 01:25:58 +05:30
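A sketch of the cleanupStaleMetrics idea: drop metricsMap entries whose IPVS service no longer exists, so the map cannot grow without bound. The map's value type, the key format, and the activeServices set are assumptions; a real implementation would presumably also remove the series from the Prometheus collectors (for example with DeleteLabelValues on a metric vector).

```go
package main

import "fmt"

// cleanupStaleMetrics deletes entries for services not in activeServices.
// Deleting from a map while ranging over it is safe in Go.
func cleanupStaleMetrics(metricsMap map[string][]string, activeServices map[string]bool) {
	for key := range metricsMap {
		if !activeServices[key] {
			delete(metricsMap, key)
		}
	}
}

func main() {
	metricsMap := map[string][]string{
		"10.96.0.10-tcp-80":  {"default", "web"},
		"10.96.0.11-tcp-443": {"default", "old"},
	}
	active := map[string]bool{"10.96.0.10-tcp-80": true}
	cleanupStaleMetrics(metricsMap, active)
	fmt.Println(len(metricsMap)) // 1
}
```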
Aaron U'Ren
e9be04ef2f fix: add nil checking to ipsetMutex cleanup actions (#1129) 2021-07-20 01:22:48 +05:30