239 Commits

Author SHA1 Message Date
Cat C
a051e0eec7 Merge pull request #2017 from 1fabi0/feature_kep-1860
feat(services): support ipMode Proxy for LoadBalancer ingresses
2026-03-31 19:28:21 -07:00
Aprazors
20a2e034b9 test(NSC): reorganize test files per reviewer feedback
Move tests from hardening_test.go into the files requested by aauren:
- TestShuffleDoesNotPanicOnEmptySlice → network_services_controller_test.go
- TestSetupMangleTableRuleRejectsInvalidIP → network_services_controller_test.go
- TestNodePortHealthCheckConcurrentAccess → new nodeport_healthcheck_test.go

Delete hardening_test.go (now empty).
2026-03-31 20:30:15 -05:00
Aprazors
59c5ec69fd test(NSC): add tests for shuffle, healthcheck concurrency, and invalid IP handling
Table-driven tests following project conventions (testify assertions,
subtests) covering:
- shuffle: empty, single, and multi-element slices don't panic
- NodePort healthcheck: concurrent read/write with RWMutex is safe
- ParseIP: invalid IPs correctly return nil
2026-03-31 20:30:15 -05:00
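The RWMutex pattern exercised by the healthcheck concurrency test can be sketched roughly as follows. This is a minimal illustration, not kube-router's code: the `healthCheck` type, its `services` map, and the `update`/`activeEndpoints` methods are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// healthCheck is a hypothetical stand-in for a controller that shares a map
// between a sync goroutine (writer) and HTTP handlers (readers).
type healthCheck struct {
	mu       sync.RWMutex
	services map[string]int // service name -> number of active endpoints
}

// update replaces the shared map under the write lock.
func (h *healthCheck) update(m map[string]int) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.services = m
}

// activeEndpoints reads the shared map under the read lock, so many readers
// can run in parallel without racing against update.
func (h *healthCheck) activeEndpoints(svc string) (int, bool) {
	h.mu.RLock()
	defer h.mu.RUnlock()
	n, ok := h.services[svc]
	return n, ok
}

func main() {
	h := &healthCheck{}
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(2)
		go func(i int) { // writer
			defer wg.Done()
			h.update(map[string]int{"default/web": i})
		}(i)
		go func() { // reader
			defer wg.Done()
			h.activeEndpoints("default/web")
		}()
	}
	wg.Wait()
	_, ok := h.activeEndpoints("default/web")
	fmt.Println("entry present:", ok)
}
```

Running a sketch like this under `go test -race` or `go run -race` is one way to confirm that no unsynchronized access remains.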
Aprazors
aba49a9892 fix(NSC): harden Network Services Controller against panics, races, and sync errors
This combines five defensive fixes in the Network Services Controller:

1. shuffle(): check rand.Int error before dereferencing result
   - rand.Int returns (nil, err) on failure, but the result was
     dereferenced before the error check, causing a nil panic

2. NodePort healthcheck: add RWMutex to protect shared maps
   - UpdateServicesInfo writes serviceInfoMap/endpointsInfoMap from
     the sync goroutine while HTTP handlers read concurrently

3. setupIpvsFirewall: use continue instead of return in dual-stack loop
   - return nil after clearing one IP family's chain skipped the
     second family entirely on dual-stack nodes

4. setupMangleTableRule/cleanupMangleTableRule: add nil check for ParseIP
   - net.ParseIP result was used without nil check, causing panic
     on malformed IP strings from service annotations

5. synctypeIpvs: track errors across both sync steps for heartbeat
   - err from syncIpvsServices was overwritten by syncHairpinIptablesRules,
     masking IPVS failures from the health check system
2026-03-31 20:30:15 -05:00
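Fix 1 in the commit above (check the rand.Int error before touching its result) can be sketched as below. This is an illustration under assumptions, not the project's actual shuffle(): the `shuffleWith` helper, the `failingReader` type, and the string slice element type are hypothetical and exist only to make the error path visible.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"io"
	"math/big"
)

// failingReader is a hypothetical entropy source that always fails, used to
// show what happens when rand.Int cannot read random bytes.
type failingReader struct{}

func (failingReader) Read([]byte) (int, error) { return 0, io.ErrClosedPipe }

// shuffleWith performs a Fisher-Yates shuffle using random numbers from r.
// The error from rand.Int is checked before the *big.Int is dereferenced,
// because rand.Int returns a nil result when it fails.
func shuffleWith(r io.Reader, items []string) error {
	for i := len(items) - 1; i > 0; i-- {
		n, err := rand.Int(r, big.NewInt(int64(i+1)))
		if err != nil {
			return fmt.Errorf("shuffle: %w", err)
		}
		j := int(n.Int64())
		items[i], items[j] = items[j], items[i]
	}
	return nil
}

// shuffle uses the system entropy source.
func shuffle(items []string) error { return shuffleWith(rand.Reader, items) }

func main() {
	backends := []string{"10.0.0.1", "10.0.0.2", "10.0.0.3"}
	if err := shuffle(backends); err != nil {
		fmt.Println("shuffle failed:", err)
		return
	}
	fmt.Println(backends)

	// With a broken entropy source the caller gets an error, not a panic.
	fmt.Println(shuffleWith(failingReader{}, backends))
}
```

Empty and single-element slices never enter the loop, so they return nil without consuming any randomness.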
Kamp, Fabian
421fd43623 feat(services): support ipMode Proxy for LoadBalancer ingresses
This implements support for KEP-1860. When a LoadBalancer ingress has ipMode set to 'Proxy', kube-router will skip adding the IP to the local IPVS table and will not hijack the traffic. If ipMode is 'VIP' or unset, the current behavior is maintained.

Fixes #2014
2026-03-24 16:58:26 +01:00
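The ipMode behavior described in this commit can be illustrated with a small filter. This is only a sketch: the `ingress` struct is a simplified, hypothetical stand-in for the Kubernetes LoadBalancer ingress type, and `vipsToProgram` is an invented helper name, not a kube-router function.

```go
package main

import "fmt"

// ingress is a simplified, hypothetical mirror of a LoadBalancer ingress
// entry: an IP plus an optional ipMode ("VIP" or "Proxy").
type ingress struct {
	IP     string
	IPMode *string
}

// vipsToProgram returns the ingress IPs that should be added to the local
// IPVS table. Entries with ipMode "Proxy" are skipped; "VIP" or an unset
// ipMode keeps the existing behavior.
func vipsToProgram(ingresses []ingress) []string {
	var vips []string
	for _, ing := range ingresses {
		if ing.IP == "" {
			continue // nothing to program for an entry without an IP
		}
		if ing.IPMode != nil && *ing.IPMode == "Proxy" {
			continue // traffic must reach the external proxy, do not hijack it
		}
		vips = append(vips, ing.IP)
	}
	return vips
}

func main() {
	proxy, vip := "Proxy", "VIP"
	fmt.Println(vipsToProgram([]ingress{
		{IP: "192.0.2.10", IPMode: &proxy},
		{IP: "192.0.2.11", IPMode: &vip},
		{IP: "192.0.2.12"},
	}))
}
```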
Aaron U'Ren
a1f0b2eea3 fix: validate external IPs and LB IPs against configured ranges
Moves all Service VIP range configuration into pkg/svcip, which now owns
validation and querying of ranges, rather than passing each range to
each controller.

It also centralizes the validation logic since NRC and NSC need
basically equivalent logic. It additionally adds a RangeQuerier
interface for the NPC and LBC controllers which require knowing the
literal ranges.
2026-03-15 20:46:54 -05:00
Aaron U'Ren
070d9565db feat(lint): add basic typos checker to ensure fewer spelling mistakes in the future 2026-03-15 12:29:17 -07:00
Roman Kuzmitskii
39efb9230c feat: add support for SCTP
includes workaround for musl hardcoded protocol table that
  is missing SCTP support by using protocol name to
  numeric value mapping in ipset entries

closes: https://github.com/cloudnativelabs/kube-router/issues/1019
Signed-off-by: Roman Kuzmitskii <roman@damex.org>
2026-03-09 19:42:08 -10:00
Aaron U'Ren
a1e6de9f8f test(NSC): add endpoint checking to tests that need them 2026-02-01 11:07:13 -06:00
Aaron U'Ren
3157e85eb8 test(NSC): fix two DSR service tests to create pods 2026-02-01 11:07:13 -06:00
Aaron U'Ren
ca6b644d32 test(NSC): mock netlink calls - attempt 1 2026-02-01 11:07:13 -06:00
Aaron U'Ren
b9cd0de978 test(NSC): add DSR unit tests - series 1 2026-02-01 11:07:13 -06:00
Aaron U'Ren
10f366ace6 test(NSC): implement traffic policy unit testing
Logic errors and regressions relating to traffic policies account for
roughly 8 preventable historical issues with the project, so they are
prioritized here as a unit testing surface.
2026-02-01 11:07:13 -06:00
Aaron U'Ren
048680706c fix(NSC): cleanup historical bad IPv6 TCPMSS vals 2026-02-01 10:56:40 -06:00
Aaron U'Ren
8aaba6505e test(NSC): add comprehensive TCPMSS unit tests 2026-02-01 10:56:40 -06:00
Aaron U'Ren
d208307d43 fact(test): reuse existing ValToPtr functions 2026-01-31 12:15:35 -06:00
Aaron U'Ren
ae39f279a7 fact(NSC): use LinuxNetworkingMock instead of creating a new one 2026-01-31 12:15:35 -06:00
Aaron U'Ren
59814eb67b fix: convert ginkgo tests to standard go tests 2026-01-31 12:15:35 -06:00
Richard Kojedzinszky
ee0940b87c fix(dsr): set TCPMSS based on address family 2026-01-25 12:00:21 -06:00
Cat C
440ad4d0a1 fix: Replace all netlink functions that throw ErrDumpInterrupted with a retry wrapper 2026-01-09 09:17:43 -06:00
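A retry wrapper of the kind this commit title describes might look like the sketch below. It is an assumption-laden illustration: `errDumpInterrupted` is a local stand-in for the netlink library's interrupted-dump error, and `retryOnInterrupt` is a hypothetical helper name, not the wrapper used in kube-router.

```go
package main

import (
	"errors"
	"fmt"
)

// errDumpInterrupted is a local stand-in for the error a netlink dump
// returns when the kernel state changed while it was being listed.
var errDumpInterrupted = errors.New("dump interrupted")

// retryOnInterrupt calls fn up to attempts times, retrying only while fn
// reports an interrupted dump. Any other error, or success, returns at once.
func retryOnInterrupt[T any](attempts int, fn func() (T, error)) (T, error) {
	if attempts < 1 {
		attempts = 1
	}
	var (
		res T
		err error
	)
	for i := 0; i < attempts; i++ {
		res, err = fn()
		if !errors.Is(err, errDumpInterrupted) {
			return res, err
		}
	}
	return res, err // still interrupted after the last attempt
}

func main() {
	calls := 0
	links, err := retryOnInterrupt(3, func() ([]string, error) {
		calls++
		if calls < 2 {
			return nil, errDumpInterrupted // first dump is interrupted
		}
		return []string{"lo", "eth0"}, nil
	})
	fmt.Println(links, err, calls)
}
```

Making the wrapper generic keeps the call sites typed, so the same helper can wrap list functions that return links, routes, or addresses.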
ccoVeille
e06ddccabe feat(test): use safecast.RequireConvert as a replacement for safecast.Convert in tests 2025-11-21 21:20:44 -06:00
ccoVeille
1e8976bd79 build(deps): update github.com/ccoveille/go-safecast to v2.0.0 2025-11-08 01:13:51 +01:00
ccoVeille
e8a59fda2e build(deps): bump github.com/ccoveille/go-safecast to 1.8.1 2025-11-03 12:04:58 +01:00
Aaron U'Ren
846fbd8500 fix(ipset): don't strip inet6 prefixing of ipsets
The problem here stems from the fact that when netpol generates its list of expected ipsets, it includes the inet6:
prefix, however, when the proxy and routing controller sent their list of expected ipsets, they did not do so. This
meant that no matter how we handled it in ipset.go it was wrong for one or the other use-cases.

I decided to standardize on the netpol way of sending the list of expected ipset names so that BuildIPSetRestore() can
function in the same way for all invocations.
2025-10-27 21:25:33 -05:00
Aaron U'Ren
f44598bcb1 test(ipset): add unit tests for ipset regression testing 2025-10-27 21:25:33 -05:00
Bukal, Tomáš
720e2ca2bd fix(ipset): store kube-router-local-ips ipset 2025-10-11 08:26:43 -05:00
Aaron U'Ren
6c44013bc5 fix(ipset): ignore non-kube-router ipsets
Attempt to filter out sets that we are not authoritative for to avoid
race conditions with other operators (like Istio) that might be
attempting to modify ipsets at the same time.
2025-10-04 18:30:28 -05:00
Aaron U'Ren
a4fb70a095 feat(lint): update golangci-lint v2.0.2 -> v2.4.0 2025-09-20 16:30:54 -05:00
Aaron U'Ren
d7214cec4f feat(Endpoints): convert Endpoints -> EndpointSlices 2025-09-06 16:27:03 -05:00
Aaron U'Ren
732d7a72dc fix(nsc): add loadbalancer IPs to metrics 2025-09-01 21:04:49 -05:00
Richard Kojedzinszky
c2fd633373 fix(nsc): sync field name 2025-09-01 21:04:49 -05:00
Richard Kojedzinszky
b4a9ba70fd fix(nsc): rename network_services_metrics.go 2025-09-01 21:04:49 -05:00
Richard Kojedzinszky
7533c183a1 feat(nsc): getMetricsServiceMap() rebuilds only after services changed 2025-09-01 21:04:49 -05:00
Richard Kojedzinszky
5efb999169 feat(nsc): replace unsafe.Pointer with atomic.Pointer 2025-09-01 21:04:49 -05:00
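For readers unfamiliar with the change named in this commit title, the general shape of swapping unsafe.Pointer for atomic.Pointer is sketched below. The `serviceMap` and `collector` types and the `publish`/`snapshot` methods are hypothetical names for illustration, not the identifiers used in the controller.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// serviceMap is a hypothetical snapshot type shared between goroutines.
type serviceMap map[string]int

// collector holds the latest snapshot behind a typed atomic pointer, which
// removes the casts that an unsafe.Pointer-based version would need.
type collector struct {
	current atomic.Pointer[serviceMap]
}

// publish atomically replaces the snapshot seen by readers.
func (c *collector) publish(m serviceMap) {
	c.current.Store(&m)
}

// snapshot returns the most recently published map, or nil if none exists.
// Callers must treat the returned map as read-only.
func (c *collector) snapshot() serviceMap {
	if p := c.current.Load(); p != nil {
		return *p
	}
	return nil
}

func main() {
	c := &collector{}
	fmt.Println(c.snapshot() == nil)
	c.publish(serviceMap{"default/web": 2})
	fmt.Println(c.snapshot()["default/web"])
}
```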
Richard Kojedzinszky
d0163ab725 feat(nsc): move part of Collect() to getMetricsServiceMap() 2025-09-01 21:04:49 -05:00
Richard Kojedzinszky
4e8bb705b5 feat(nsc): move metrics logic to separate file 2025-09-01 21:04:49 -05:00
Richard Kojedzinszky
a224198c89 feat(nsc): optimize key in temporary serviceMap 2025-09-01 21:04:49 -05:00
Richard Kojedzinszky
4ed0cf4117 feat(nsc): improve Service statistics 2025-09-01 21:04:49 -05:00
Richard Kojedzinszky
1b4b6d6b2b feat(nsc): eliminate nested loops in Collect() 2025-09-01 21:04:49 -05:00
Richard Kojedzinszky
766627645e feat(nsc): collect service statistics on demand 2025-09-01 21:04:49 -05:00
Richard Kojedzinszky
4b4ebec81f feat(nsc): prepare serviceMap to be accessed by collector thread 2025-09-01 21:04:49 -05:00
Anupam Ghosh
5e397e50e7 fix failed message 2025-08-06 17:01:22 -07:00
Anupam Ghosh
bbb8f3b0d9 disable sloppy_tcp if there is no DSR & Maglev service 2025-08-06 17:01:22 -07:00
Anupam Ghosh
598fc86349 enable sloppy_tcp when DSR and Maglev is enabled 2025-08-06 17:01:22 -07:00
Aaron U'Ren
700620509f feat(DSR): disable routing DSR traffic via kube-bridge
This was originally added in PR #210, but it appears to cause more
problems in my testing scenarios than it solves. When this is enabled,
kube workers cannot reach DSR-enabled services whose traffic is routed
to other nodes in the cluster.
2025-06-29 17:42:18 -05:00
Aaron U'Ren
2ebcac62ec feat(linux_networking): add some additional logging 2025-06-29 17:42:18 -05:00
Aaron U'Ren
8504c52e80 fix(DSR): setup source routing for all external IPs
Previously, kube-router was only considering externalIPs when setting up
source routing policy, notably absent was consideration of LoadBalancer
IPs which are equally important for getting right with DSR.

This appears to be a long-standing use-case that has never been handled
correctly since kube-router added a LoadBalancer controller.
2025-06-29 17:42:18 -05:00
Aaron U'Ren
e6edc853fb fix(ipAddrDel): check to see if IP exists on interface before delete
Rather than yolo'ing a delete of the IP on the interface, check to see
if it exists and save the user some warning in their logs.
2025-06-29 17:42:18 -05:00
Aaron U'Ren
69e58eda04 feat(NSC): add some additional debugging to traffic director 2025-06-29 17:42:18 -05:00
Aaron U'Ren
94bfc0d9ba fix(ipAddrDel): check for routes before trying to delete
Instead of deleting and just hoping for the best, this change makes it
so that we check first whether or not a route exists. This helps to
reduce needless warnings that the user receives and is just all around
more accurate.
2025-06-29 17:42:18 -05:00