Add isReady, isServing, and isTerminating to the internal EndpointSlice
struct so that downstream consumers have more information about the
service's endpoints to make decisions later on.
In order to be compliant with upstream network implementation
expectations, we choose to proxy an endpoint as long as it is either
ready OR serving. This means that endpoints that are terminating will
still be proxied, which keeps kube-router conformant with the upstream
e2e tests.
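A rough sketch of that decision (the field and helper names below are illustrative, not the actual kube-router internals):

```go
package main

import "fmt"

// endpointSliceInfo mirrors the kind of per-endpoint condition data described
// above; the struct and field names are illustrative.
type endpointSliceInfo struct {
	isReady       bool
	isServing     bool
	isTerminating bool
}

// shouldProxy returns true when an endpoint is either ready or still serving,
// so terminating-but-serving endpoints keep receiving traffic.
func shouldProxy(ep endpointSliceInfo) bool {
	return ep.isReady || ep.isServing
}

func main() {
	terminating := endpointSliceInfo{isReady: false, isServing: true, isTerminating: true}
	fmt.Println(shouldProxy(terminating)) // true: still proxied while draining
}
```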
Adds support for spec.internalTrafficPolicy and fixes support for
spec.externalTrafficPolicy so that it only affects external traffic.
Keeps existing support for the kube-router.io/service-local annotation,
which overrides both to local when set to true. Any other value in this
annotation is ignored.
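A minimal sketch of how that precedence could be expressed, with hypothetical helper and parameter names:

```go
package main

import "fmt"

// resolveLocalPolicies decides whether internal and external traffic should be
// restricted to local endpoints. Names and signature are illustrative.
func resolveLocalPolicies(annotations map[string]string, internalPolicy, externalPolicy string) (internalLocal, externalLocal bool) {
	// kube-router.io/service-local=true overrides both policies to local;
	// any other value of the annotation is ignored.
	if annotations["kube-router.io/service-local"] == "true" {
		return true, true
	}
	return internalPolicy == "Local", externalPolicy == "Local"
}

func main() {
	i, e := resolveLocalPolicies(map[string]string{}, "Cluster", "Local")
	fmt.Println(i, e) // false true: externalTrafficPolicy only affects external traffic
}
```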
Prepare for upcoming changes by increasing unit test coverage to ensure
that we correctly handle different boundary conditions when we change
how service local / traffic policies work.
Abide by the service.kubernetes.io/headless label as defined by the
upstream standard.
Resolves the failing e2e test:
should implement service.kubernetes.io/headless
Differentiate headless services from ClusterIP being None, in
preparation for handling the service.kubernetes.io/headless label. One
might think that handling these is similar, which it sort of is and sort
of isn't. ClusterIP is an immutable field, whereas labels are mutable.
This separates our handling of ClusterIP none-ness from the presence of
the headless label.
When we consider what to do with ClusterIP being None, that is
fundamentally different, because once it is None, the k8s API guarantees
that it will never change, whereas the label can be added and removed at
any time.
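A small sketch of keeping the two signals separate (helper names are illustrative):

```go
package main

import "fmt"

const headlessLabel = "service.kubernetes.io/headless"

// hasHeadlessLabel checks the mutable upstream label, which must be re-checked
// on every service update because it can come and go.
func hasHeadlessLabel(labels map[string]string) bool {
	_, ok := labels[headlessLabel]
	return ok
}

// isClusterIPNone checks the immutable ClusterIP: None case, which is
// permanent for the lifetime of the service.
func isClusterIPNone(clusterIP string) bool {
	return clusterIP == "None"
}

func main() {
	labels := map[string]string{headlessLabel: ""}
	fmt.Println(hasHeadlessLabel(labels), isClusterIPNone("10.96.0.10")) // true false
}
```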
This is a feature that has been requested a few times over the years and
would bring us closer to feature parity with other k8s network
implementations for service proxy.
In some cases it is possible for Endpoint.Conditions.Ready to be nil
during the early stages of initialization. When this happens it causes
kube-router to segfault. This fix tests for nil before testing for
Ready.
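A minimal sketch of the nil-safe check, with an illustrative helper name:

```go
package main

import "fmt"

// isConditionTrue is a nil-safe reading of a *bool condition such as
// EndpointConditions.Ready; a nil pointer (seen during early initialization)
// is treated as false instead of being dereferenced and panicking.
func isConditionTrue(cond *bool) bool {
	return cond != nil && *cond
}

func main() {
	var ready *bool // nil during the early stages of initialization
	fmt.Println(isConditionTrue(ready)) // false, no segfault
}
```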
It used to be that the kubelet handled setting hairpin mode for us:
https://github.com/kubernetes/kubernetes/pull/13628
Then this functionality moved to the dockershim:
https://github.com/kubernetes/kubernetes/pull/62212
Then the functionality was removed entirely:
https://github.com/kubernetes/kubernetes/commit/83265c9171f
Unfortunately, the fact that we ever depended on this for our hairpin
implementation to work was lost, if we ever knew it at all.
Additionally, I suspect that containerd and cri-o implementations never
worked correctly with hairpinning.
Without this, the NAT rules that we implement for hairpinning don't work
correctly. Because hairpin_mode isn't enabled on the virtual
interface of the container on the host, the packet bubbles up to the
kube-bridge. At some point in the traffic flow, the route back to the
pod gets resolved to the MAC address inside the container; at that
point, the packet's source MAC and destination MAC don't match the
kube-bridge interface and the packet is black-holed.
This can also be fixed by putting the kube-bridge interface into
promiscuous mode so that it accepts all MAC addresses, but I think that
going back to the original functionality of enabling hairpin_mode on the
veth interface of the container is the lesser of two evils here, as
putting the kube-bridge interface into promiscuous mode will likely
have unintended consequences.
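A sketch of enabling hairpin_mode through the standard Linux bridge-port sysfs knob; the veth name and helper are placeholders:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// enableHairpinMode writes "1" to the bridge-port sysfs knob for the given
// host-side veth interface. The path below is the standard Linux bridge-port
// location; the helper name and error handling are illustrative.
func enableHairpinMode(vethName string) error {
	hairpinPath := filepath.Join("/sys/class/net", vethName, "brport", "hairpin_mode")
	if err := os.WriteFile(hairpinPath, []byte("1"), 0644); err != nil {
		return fmt.Errorf("failed to enable hairpin mode on %s: %w", vethName, err)
	}
	return nil
}

func main() {
	// Example: host-side veth of a pod attached to kube-bridge (name is hypothetical).
	if err := enableHairpinMode("veth1234abcd"); err != nil {
		fmt.Println(err)
	}
}
```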
With advertiseService set to false by default, it will never get
re-evaluated if the service isn't local to the host, and it will ALWAYS
result in withdrawing the VIPs, which is incorrect. It needs to default
to true, and the boolean should only be overridden when serviceLocal is
set to true.
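A sketch of the corrected defaulting logic, with illustrative names:

```go
package main

import "fmt"

// shouldAdvertiseService defaults to advertising the VIPs and only flips the
// decision when the service is local-only and this node has no local
// endpoints. Names are illustrative of the logic described above.
func shouldAdvertiseService(serviceLocal, hasLocalEndpoints bool) bool {
	advertiseService := true
	if serviceLocal {
		advertiseService = hasLocalEndpoints
	}
	return advertiseService
}

func main() {
	fmt.Println(shouldAdvertiseService(false, false)) // true: non-local services keep their VIPs
	fmt.Println(shouldAdvertiseService(true, false))  // false: local service with no local endpoints withdraws
}
```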
Before this, we had 2 different ways to interact with ipsets: through
the handler interface, which had the best handling for IPv6 because NPC
heavily utilizes it, and through the ipset struct, which mostly repeated
the handler logic but didn't handle some key things.
NPC utilized the handler functions and NSC / NRC mostly utilized the old
ipset struct functions. This caused a lot of duplication between the two
groups of functions and also caused issues with proper IPv6 handling.
This commit consolidates the two sets of usage into just the handler
interface. This greatly simplifies how the controllers interact with
ipsets and it also reduces the logic complexity on the ipset side.
This also fixes up some inconsistency with how we handled IPv6 ipset
names. ipset likes them to be prefixed with inet6:, but we weren't
always doing this in a way that made sense and was consistent across all
functions in the ipset struct.
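A sketch of the kind of consistent prefix handling this aims for; the helper name is illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

// ipSetName returns the name ipset expects for a given family: IPv6 sets get
// the "inet6:" prefix exactly once, IPv4 sets never get it.
func ipSetName(base string, isIPv6 bool) string {
	const v6Prefix = "inet6:"
	if isIPv6 {
		if strings.HasPrefix(base, v6Prefix) {
			return base
		}
		return v6Prefix + base
	}
	return strings.TrimPrefix(base, v6Prefix)
}

func main() {
	fmt.Println(ipSetName("kube-router-svip", true))       // inet6:kube-router-svip
	fmt.Println(ipSetName("inet6:kube-router-svip", true)) // inet6:kube-router-svip (no double prefix)
	fmt.Println(ipSetName("kube-router-svip", false))      // kube-router-svip
}
```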
Adds more logging information (in the form of warnings) when we come
across common errors that are not big enough to stop processing, but
will still confuse users when the error gets bubbled up to NSC.
With the advent of IPv6 integrated into the NSC, we no longer get all
IPs from endpoints, but rather just the primary IP of the pod (which is
often, but not always, the IPv4 address).
In order to get all possible endpoint addresses for a given service we
need to switch to using EndpointSlice which also nicely groups addresses
into IPv4 and IPv6 by AddressType and also gives us more information
about the endpoint status by giving us attributes for serving and
terminating, instead of just ready or not ready.
This does mean that users will need to add another permission to their
RBAC in order for kube-router to access these objects.
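A sketch of fetching endpoint addresses via EndpointSlice with client-go and grouping them by AddressType; the namespace, service name, and helper name are placeholders:

```go
package main

import (
	"context"
	"fmt"

	discoveryv1 "k8s.io/api/discovery/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// endpointAddressesByFamily lists the EndpointSlices that belong to a service
// (via the kubernetes.io/service-name label) and groups their addresses by
// address type, so IPv4 and IPv6 endpoints are available separately.
func endpointAddressesByFamily(ctx context.Context, client kubernetes.Interface, namespace, svcName string) (map[discoveryv1.AddressType][]string, error) {
	slices, err := client.DiscoveryV1().EndpointSlices(namespace).List(ctx, metav1.ListOptions{
		LabelSelector: discoveryv1.LabelServiceName + "=" + svcName,
	})
	if err != nil {
		return nil, err
	}
	byFamily := make(map[discoveryv1.AddressType][]string)
	for _, slice := range slices.Items {
		for _, ep := range slice.Endpoints {
			// Proxy endpoints that are ready or still serving (see above).
			ready := ep.Conditions.Ready != nil && *ep.Conditions.Ready
			serving := ep.Conditions.Serving != nil && *ep.Conditions.Serving
			if !ready && !serving {
				continue
			}
			byFamily[slice.AddressType] = append(byFamily[slice.AddressType], ep.Addresses...)
		}
	}
	return byFamily, nil
}

func main() {
	// RBAC note: the ServiceAccount needs list/watch on endpointslices in the
	// discovery.k8s.io API group for this to work.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		fmt.Println(err)
		return
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	addrs, err := endpointAddressesByFamily(context.Background(), client, "default", "my-service")
	fmt.Println(addrs, err)
}
```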
There is absolutely no reason that we should ever assume netmasks, and
even if we did, we shouldn't modify them as a side effect of a
completely different operation. No idea why this was ever coded this
way. The netmask is now set upstream to the appropriate mask for the IP
family.
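A minimal sketch of deriving the mask from the IP family instead of assuming it:

```go
package main

import (
	"fmt"
	"net"
)

// maskForIP returns the full-length host mask for the IP's family (/32 for
// IPv4, /128 for IPv6) rather than assuming or mutating a netmask elsewhere.
func maskForIP(ip net.IP) net.IPMask {
	if ip.To4() != nil {
		return net.CIDRMask(32, 32)
	}
	return net.CIDRMask(128, 128)
}

func main() {
	fmt.Println(maskForIP(net.ParseIP("10.0.0.1")))    // ffffffff
	fmt.Println(maskForIP(net.ParseIP("2001:db8::1"))) // ffffffffffffffffffffffffffffffff
}
```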
During our initial run, fail fatally when we encounter problems rather
than just continuing on and causing subsequent problems and potentially
burying the real error.
Deferring these will end up making the end times match for both families
as the variables aren't tracked separately. Since these are the same
metrics, it should be safe to emit them at the time of generation.
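A sketch of emitting the duration at generation time per family rather than in a deferred call at the end of the outer function; the metric name is hypothetical:

```go
package main

import (
	"fmt"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// syncDurationSeconds stands in for a per-family sync duration metric;
// the metric name is hypothetical.
var syncDurationSeconds = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{Name: "controller_family_sync_duration_seconds"},
	[]string{"family"},
)

// syncFamily observes the duration at the point the per-family work finishes,
// instead of deferring the observation to the end of the outer function, where
// both families would report the same end time.
func syncFamily(family string) {
	start := time.Now()
	// ... per-family work would happen here ...
	syncDurationSeconds.WithLabelValues(family).Observe(time.Since(start).Seconds())
}

func main() {
	prometheus.MustRegister(syncDurationSeconds)
	syncFamily("ipv4")
	syncFamily("ipv6")
	fmt.Println("per-family durations recorded independently")
}
```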
Previously when a user selected to override the next-hop via GoBGP's
NextHopActions: Self functionality, we did it for all exported routes.
However, in a dual-stack use-case this causes problems for internal pod
IP routes that are spread via BGP advertisements.
Currently, kube-router only peers with an internal peer once, over
whatever its primary IP is according to its Kubernetes node
information. This means that when overriding next-hop, the IP is either
an IPv4 or IPv6 address depending on how the node has configured itself.
Therefore, when override next-hop is configured and kube-router attempts
to add a route for an IPv6 address, it will not succeed if the node's
primary IP is an IPv4 address, because the next-hop for an IPv6 route
cannot be an IPv4 gateway.
Rather than making the code base overly complicated with both an IPv4
and IPv6 peering for internal nodes, this change presents a bit of a
middle ground. By choosing not to override the next-hop for pod subnet
advertisements to internal (Kubernetes node) peers, we eliminate this
problem.
This does change the functionality of kube-router a bit, but one of the
foundational aspects of Kubernetes networking is that all nodes should
be able to contact each other. I cannot currently think of a good
use-case where overriding the next-hop for pod subnets of internal peers
would be necessary, so I think that this is an acceptable concession to
make.
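A sketch of the resulting decision logic, with illustrative names:

```go
package main

import "fmt"

// shouldOverrideNextHop sketches the decision described above: next-hop self
// is applied to routes exported to external peers, but not to pod-subnet
// advertisements sent to internal (Kubernetes node) peers, which may be of a
// different IP family than the node's primary IP.
func shouldOverrideNextHop(overrideEnabled, peerIsInternal, isPodSubnetRoute bool) bool {
	if !overrideEnabled {
		return false
	}
	if peerIsInternal && isPodSubnetRoute {
		return false
	}
	return true
}

func main() {
	fmt.Println(shouldOverrideNextHop(true, true, true))  // false: internal peer, pod subnet
	fmt.Println(shouldOverrideNextHop(true, false, true)) // true: external peer still gets next-hop self
}
```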
This adds a simple controller that will watch for services of type
LoadBalancer and try to allocate addresses from the specified IPv4
and/or IPv6 ranges. It's assumed that kube-router (or another network
controller) will announce the addresses.
As the controller uses leases for leader election and updates the
service status, new RBAC permissions are required.
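A minimal sketch of the allocation step, assuming a hypothetical range and ignoring persistence and status updates:

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// allocateFromRange walks a CIDR range and returns the first address that is
// not already in use. A real controller would also persist allocations and
// update the service status; this sketch only shows the allocation step.
func allocateFromRange(cidr string, inUse map[netip.Addr]bool) (netip.Addr, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return netip.Addr{}, err
	}
	for addr := prefix.Addr(); prefix.Contains(addr); addr = addr.Next() {
		if !inUse[addr] {
			return addr, nil
		}
	}
	return netip.Addr{}, errors.New("range exhausted")
}

func main() {
	used := map[netip.Addr]bool{netip.MustParseAddr("10.1.2.0"): true}
	addr, err := allocateFromRange("10.1.2.0/30", used)
	fmt.Println(addr, err) // 10.1.2.1 <nil>
}
```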
The FoU implementation now properly handles a whole host of things:
* It now actually handles IPv6 by changing the encapsulation protocol to
  GUE instead of generic FoU. I worked with generic FoU tunnels for
  several days and could not get them to support IPv4 and IPv6 at all,
  even when using them with the IPv6 proto and with iproute2 in IPv6
  mode (-6)
* It now handles converting between the two tunnel types seamlessly and
  without leaving legacy tunnel artifacts behind. Previously, you could
  change the encap type but it wouldn't change the tunnels
* Abstracted constants
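A rough sketch of creating a GUE-encapsulated tunnel by shelling out to iproute2; the tunnel name, addresses, and port are placeholders, and a real implementation would also handle the receive-side FoU/GUE listener and clean up tunnels of the other encap type:

```go
package main

import (
	"fmt"
	"os/exec"
)

// createGUETunnel sketches creating a GUE-encapsulated tunnel with iproute2.
// Tunnel name, addresses, and port are placeholders.
func createGUETunnel(name, local, remote, port string, isIPv6 bool) error {
	args := []string{"link", "add", "name", name, "type", "ipip"}
	if isIPv6 {
		// GUE works for both families; only the tunnel type and family flag differ.
		args = []string{"-6", "link", "add", "name", name, "type", "ip6tnl"}
	}
	args = append(args, "remote", remote, "local", local,
		"encap", "gue", "encap-sport", "auto", "encap-dport", port)
	if out, err := exec.Command("ip", args...).CombinedOutput(); err != nil {
		return fmt.Errorf("failed to create tunnel %s: %v (%s)", name, err, out)
	}
	return nil
}

func main() {
	if err := createGUETunnel("tun-example", "2001:db8::1", "2001:db8::2", "5555", true); err != nil {
		fmt.Println(err)
	}
}
```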
Previously, IPBlocks (like srcIPBlocks) only contained a single IP
family, which meant that a len() > 0 check would indicate that an IP
block had been defined in the NetworkPolicy. However, now the IPBlocks
structs are IP family specific, which means that they will always
contain 2 entries, one for the IPv4 family and one for the IPv6 family.
This means that the old condition will evaluate to true for all
NetworkPolicies and waste system resources creating empty ipsets and bad
iptables rules.
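A sketch of checking each family's blocks individually; the data shape and helper are illustrative:

```go
package main

import "fmt"

// hasIPBlocks checks each family's block list individually instead of only
// checking that the outer structure is non-empty, which is now always true
// because it contains one entry per IP family.
func hasIPBlocks(blocksByFamily map[string][][]string) bool {
	for _, blocks := range blocksByFamily {
		if len(blocks) > 0 {
			return true
		}
	}
	return false
}

func main() {
	// Both families are present as keys, but neither has any blocks defined:
	empty := map[string][][]string{"ipv4": {}, "ipv6": {}}
	fmt.Println(hasIPBlocks(empty)) // false: no ipsets or iptables rules needed
}
```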
The previous version of the bgp_policies code only allowed for creating
a policy when the policy didn't exist already. However, with the advent
of dual-stack we need to be able to add / remove statements if we add or
lose a specific IP family (e.g. IPv4 or IPv6) since they are handled in
different statements.
Given that the owner of GoBGP has let us know that policies are
idempotent, this now involves quite a bit of work. We need to follow
this procedure:
add statements if missing -> add them to a policy -> if the policy
doesn't equal the one already in GoBGP -> create the new policy and
associate it -> de-associate the old policy -> remove the old policy
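A rough outline of that flow (the types and steps below are hypothetical and do not use the real GoBGP API):

```go
package main

import (
	"fmt"
	"reflect"
)

// policy is a simplified stand-in for a GoBGP policy definition; the type and
// helper below only outline the replace-then-clean-up flow described above.
type policy struct {
	name       string
	statements []string
}

// reconcilePolicy builds the desired policy from the current statements and,
// only when it differs from what is already applied, swaps it in: create new,
// associate new, de-associate old, remove old.
func reconcilePolicy(applied, desired policy, apply func(step string)) {
	if reflect.DeepEqual(applied, desired) {
		return // already up to date
	}
	apply("create policy " + desired.name)
	apply("associate policy " + desired.name)
	apply("de-associate policy " + applied.name)
	apply("remove policy " + applied.name)
}

func main() {
	applied := policy{name: "kube_router_export_v1", statements: []string{"allow-ipv4"}}
	desired := policy{name: "kube_router_export_v2", statements: []string{"allow-ipv4", "allow-ipv6"}}
	reconcilePolicy(applied, desired, func(step string) { fmt.Println(step) })
}
```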
Previously we used to do an idempotent sync of all active VIPs any time
we got a service or endpoint update. However, this only worked when we
assumed a single-stack deployment model where IPs were never deleted
unless the whole service was deleted.
In a dual-stack model, we can add / remove LoadBalancer IPs and Cluster
IPs on updates. Given this, we need to take into account the specific
change that happens, and not just revert to sync-all, because otherwise
we'll never stop advertising IPs that should be removed.
As a fall-back, we still have the outer Run loop that syncs all active
routes every X seconds (configured by a user CLI parameter). On that
timer we'll still have something that syncs all active VIPs and works as
an outer control loop to ensure that desired state eventually becomes
active state if we accidentally remove a VIP that should have been
there.
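A sketch of computing the per-update diff of VIPs to advertise and withdraw; names are illustrative:

```go
package main

import "fmt"

// diffVIPs computes which VIPs to newly advertise and which to withdraw on a
// service/endpoint update, instead of re-syncing everything.
func diffVIPs(oldVIPs, newVIPs []string) (toAdvertise, toWithdraw []string) {
	oldSet := make(map[string]bool, len(oldVIPs))
	for _, vip := range oldVIPs {
		oldSet[vip] = true
	}
	newSet := make(map[string]bool, len(newVIPs))
	for _, vip := range newVIPs {
		newSet[vip] = true
		if !oldSet[vip] {
			toAdvertise = append(toAdvertise, vip)
		}
	}
	for _, vip := range oldVIPs {
		if !newSet[vip] {
			toWithdraw = append(toWithdraw, vip)
		}
	}
	return toAdvertise, toWithdraw
}

func main() {
	add, del := diffVIPs([]string{"10.96.0.10", "fd00::10"}, []string{"10.96.0.10"})
	fmt.Println(add, del) // [] [fd00::10]: withdraw the removed IPv6 ClusterIP
}
```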