- Replace fmt.Errorf %s/%v + err.Error() with %w for proper error
wrapping and errors.Is/As chain support across all packages
- Replace errors.New("msg" + err.Error()) with fmt.Errorf("msg: %w", err)
- Replace strings.Contains(err.Error(), ...) with errors.Is(err,
syscall.EEXIST) and errors.Is(err, syscall.ESRCH) in linux_networking.go
- Remove now-unused IfaceHasAddr and IpvsServerExists string constants
- Replace sort.Strings with slices.Sort in bgp_policies.go, ipset.go,
and testhelpers
- Replace sort.SliceStable with slices.SortStableFunc in bgp_policies.go
- Replace reflect.DeepEqual on []string with slices.Equal in bgp_policies.go
(also fixes bug: was comparing map to slice instead of slice to slice)
- Replace reflect.DeepEqual on []*gobgpapi.Prefix with slices.EqualFunc
comparing exported fields to avoid protobuf internal state comparison
- Replace strings.Index + manual slicing with strings.Cut in docker.go
and classify.go
- Update cni_test.go to use assert.EqualError instead of assert.Equal
for wrapped error comparison
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Moves all Service VIP range configurations into pkg/svcip this is where
validation and querying of ranges goes rather than passing each range to
each controller.
It also centralizes the validation logic since NRC and NSC need
basically equivalent logic. It additionally adds a RangeQuerier
interface for the NPC and LBC controllers which require knowing the
literal ranges.
includes workaround for musl hardcoded protocol table that
is missing SCTP support by using protocol name to
numeric value mapping in ipset entries
closes: https://github.com/cloudnativelabs/kube-router/issues/1019
Signed-off-by: Roman Kuzmitskii <roman@damex.org>
Attempt to filter out sets that we are not authoritative for to avoid
race conditions with other operators (like Istio) that might be
attempting to modify ipsets at the same time.
This commit allows ICMP traffic always, not just on the case that
network policy is not applied in a particular direction, as was
originally the intention for KUBE-NWPLCY-DEFAULT.
This commit also consolidates common matching logic for established /
related & invalid traffic flows which hopefully reduces how much
iptables rules we have to make by a significant factor.
Make the health controller more robust and extensible by adding in
constants for heart beats instead of 3 character random strings that are
easy to get wrong.
This prepares the way for broader refactors in the way that we handle
nodes by:
* Separating frequently used node logic from the controller creation
steps
* Keeping reused code DRY-er
* Adding interface abstractions for key groups of node data and starting
to rely on those more rather than concrete types
* Separating node data from the rest of the controller data structure so
that it smaller definitions of data can be passed around to functions
that need it rather than always passing the entire controller which
contains more data / surface area than most functions need.
This fixes the problem where if network policy is applied before any
communication between two pods, all subsequent communication fails
because the two pods aren't able to discovery each other.
At the very end of a NPC full sync we call ipset.Save() during the ipset
cleanup stage. This causes all of the current IPv4 and IPv6 sets that
are defined on the system (ours or not) to enter into the handler's
state.
Since `ipset restore` is not implicitly destructive (e.g. it doesn't
remove sets that aren't defined like iptables-restore does) we don't
really need this previous state, and in some ways it may come back to
cause bugs if the state isn't purged.
So this is a fail safe to clean them out to ensure that they don't end
up building up cruft. It also makes the restores go faster as
kube-router is only defining it's own rules rather than defining all
rules.
kube-router v2.X introduced the idea of iptables and ipset handlers that
allow kube-router to be dual-stack capable. However, the cleanup logic
for the various controllers was not properly ported when this happened.
When the cleanup functions run, they often have not had their
controllers fully initialized as cleanup should not be dependant on
kube-router being able to reach a kube-apiserver.
As such, they were missing these handlers. And as such they either
silently ended up doing noops or worse, they would run into nil pointer
failures.
This corrects that, so that kube-router no longer fails this way and
cleans up as it had in v1.X.
Before the logic ran like the following in terms of preference:
1. Prefer environment var NODE_NAME
2. `Use os.Hostname()`
3. Fallback to `--hostname-override` passed by user
This didn't make a whole lot of sense, as `--hostname-override` is
directly, and supposedly intentionally set by the user, therefore it
should be the MOST preferred, not the least preferred. Additionally,
none of the errors encountered were passed back to the user so that
future conditions could be considered, so if there was an error at the
API level, that error was swallowed. Now the logic looks like:
1. Prefer `--hostname-override` if it is set. If it is set and we
weren't able to resolve to a node object, return the error
2. Use environment var NODE_NAME if it is set. If it is set and we
weren't able to resolve to a node object, return the error
3. Fallback to `os.Hostname()`. If we weren't able to resolve to a node
object then return the error and give the user options
Use ipSetName utility method to ensure that ipset names are generated
correctly when they are formulated. This feeds into the activeIPSets map
later on, so it is important that we get the name right from the start.
Naming ipsets with the advent of IPv6 gets tricky because IPv6 ipsets
have to be prefixed with inet6:. This commit adds additional utilities
that help users find the correct name of ipsets.
Upgrades to Go 1.21.7 now that Go 1.20 is no longer being maintained.
It also, resolves the race conditions that we were seeing with BGP
server tests when we upgraded from 1.20 -> 1.21. This appears to be
because some efficiency changed in 1.21 that caused BGP to write to the
events at the same time that the test harness was trying to read from
them. Solved this in a coarse manner by adding surrounding mutexes to
the test code.
Additionally, upgraded dependencies.
Before this, we had 2 different ways to interact with ipsets, through
the handler interface which had the best handling for IPv6 because NPC
heavily utilizes it, and through the ipset struct which mostly repeated
the handler logic, but didn't handle some key things.
NPC utilized the handler functions and NSC / NRC mostly utilized the old
ipset struct functions. This caused a lot of duplication between the two
groups of functions and also caused issues with proper IPv6 handling.
This commit consolidates the two sets of usage into just the handler
interface. This greatly simplifies how the controllers interact with
ipsets and it also reduces the logic complexity on the ipset side.
This also fixes up some inconsistency with how we handled IPv6 ipset
names. ipset likes them to be prefixed with inet6:, but we weren't
always doing this in a way that made sense and was consistent across all
functions in the ipset struct.
Deferring these will end up making the end times match for both families
as the variables aren't tracked separately. Since these are the same
metrics, it should be safe to emit them at time of generation.
This adds a simple controller that will watch for services of type LoadBalancer
and try to allocated addresses from the specified IPv4 and/or IPv6 ranges.
It's assumed that kube-router (or another network controller) will announce the addresses.
As the controller uses leases for leader election and updates the service status new
RBAC permissions are required.
Previously, IPBlocks (like srcIPBlocks) only contained a single IP
Family which meant that a len() > 0 would indicate that an IP block had
been defined in the NetworkPolicy. However, now the IPBlocks structs are
IP family specific which means that they will always contain 2 entries,
one for the IPv4 family and one of the IPv6 family. Which means that
this condition will evaluate to true for all NetworkPolicies and waste
system resources creating empty ipsets and bad iptables rules.
Rather than just silently not adding policies for controllers that don't
support a given address family, emit a warning so that it is more
obvious in the logs that kube-router isn't able to add a policy for a
given family when the controller doesn't have that family enabled.
On dual-stack nodes there can still be pods that are single stack. When
this happens there won't be a pod IP for a given family and if
kube-router tries to add rules with a missing pod IP the iptables rules
won't be formatted correctly (because it won't have a valid source or
destination for that family).
So rather than breaking the whole iptables-restore we warn in the logs
and skip the pod policy chains for that family.
Without this, kube-router would end up sharing the index between ipv4
and ipv6 which would cause it to error out when one incremented beyond
the number of rules that actually existed in the chain.