doc(tunnel): add information about tunnels

* Reflow existing documentation to fit markdown standards
* Adds caveats about Azure
* Gives information about tunnel types in kube-router
Aaron U'Ren 2023-06-30 23:48:41 -05:00 committed by Aaron U'Ren
parent 944ab91725
commit ddf857de3a
3 changed files with 267 additions and 98 deletions


@ -2,40 +2,63 @@
## More Information
For a more detailed explanation on how to use Direct Server Return (DSR) to build a highly scalable and available
ingress for Kubernetes see the following
[blog post](https://cloudnativelabs.github.io/post/2017-11-01-kube-high-available-ingress/)
## What is DSR?
When enabled, DSR allows the service endpoint to respond directly to the client request, bypassing the service proxy.
When DSR is enabled kube-router will use LVS's tunneling mode to achieve this (more on how later).
## Quick Start
You can enable DSR functionality on a per service basis.
Requirements:
* ClusterIP type service has an externalIP set on it or is a LoadBalancer type service
* kube-router has been started with `--service-external-ip-range` configured at least once. This option can be
specified multiple times for multiple ranges. The external IPs or LoadBalancer IPs must be included in these ranges.
* kube-router must be run in service proxy mode with `--run-service-proxy` (this option defaults to `true` if left
unspecified)
* If you are advertising the service outside the cluster `--advertise-external-ip` must be set
* If kube-router is deployed as a Kubernetes pod:
  * `hostIPC: true` must be set for the pod
  * `hostPID: true` must be set for the pod
  * The container runtime socket must be mounted into the kube-router pod via a `hostPath` volume mount.
* A pod network that allows for IPIP encapsulated traffic. The most notable exception to this is that Azure does not
transit IPIP encapsulated packets on their network. In this scenario, the end-user may be able to get around this
issue by enabling FoU (`--overlay-encap=fou`) and full overlay networking (`--overlay-type=full`) options in
kube-router. This hasn't been well tested, but it should allow the DSR encapsulated traffic to route correctly.
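As a rough illustration of how these requirements map to kube-router's CLI (a hedged sketch only; the IP ranges below
are placeholders and the rest of your configuration is omitted):

```sh
# Core DSR prerequisites: service proxy enabled, external IPs advertised,
# and the external/LoadBalancer IP ranges whitelisted (placeholder CIDRs).
kube-router \
  --run-service-proxy \
  --advertise-external-ip \
  --service-external-ip-range=203.0.113.0/24 \
  --service-external-ip-range=198.51.100.0/24

# On networks that do not transit IPIP (e.g. Azure), additionally switch to
# FoU encapsulation and a full overlay as described above (lightly tested).
kube-router \
  --run-service-proxy \
  --advertise-external-ip \
  --service-external-ip-range=203.0.113.0/24 \
  --overlay-encap=fou \
  --overlay-type=full
```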
To enable DSR you need to annotate the service with the `kube-router.io/service.dsr=tunnel` annotation:
```sh
kubectl annotate service my-service "kube-router.io/service.dsr=tunnel"
```
## Things to Look Out For
* In the current implementation, **DSR will only be available to the external IPs or LoadBalancer IPs**
* **The current implementation does not support port remapping.** So you need to use the same port and target port for
the service.
* In order for DSR to work correctly, an `ipip` tunnel to the pod is used. This reduces the
[MTU](https://en.wikipedia.org/wiki/Maximum_transmission_unit) for the packet by 20 bytes. Because of the way DSR
works it is not possible for clients to use [PMTU](https://en.wikipedia.org/wiki/Path_MTU_Discovery) to discover this
MTU reduction. In TCP based services, we mitigate this by using iptables to set the
[TCP MSS](https://en.wikipedia.org/wiki/Maximum_segment_size) value to 20 bytes less than kube-router's primary
interface MTU size. However, it is not possible to do this for UDP streams. Therefore, UDP streams that continuously
use large packets may see a performance impact due to packet fragmentation. Additionally, if clients set the `DF`
(Do Not Fragment) bit, services may see packet loss on UDP services.
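To make the MTU/MSS interaction concrete, here is a generic iptables MSS-clamping rule of the kind described above; it
is an illustration only, not the exact rule kube-router programs, and it assumes a 1500-byte primary interface MTU.

```sh
# Clamp the MSS of forwarded TCP SYN packets to 1480 (primary interface MTU of
# 1500 minus the 20-byte IPIP overhead, per the description above). UDP has no
# equivalent mechanism, which is why large UDP packets may fragment or be
# dropped when the DF bit is set.
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1480
```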
## Kubernetes Pod Examples
As mentioned previously, if kube-router is run as a Kubernetes deployment, there are a couple of things needed on the
deployment. Below is an example of what is necessary to get going (this is NOT a full deployment, it is just meant to
highlight the elements needed for DSR):
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
@ -74,18 +97,26 @@ spec:
...
```
For an example manifest please look at the
[kube-router all features manifest](../daemonset/kubeadm-kuberouter-all-features-dsr.yaml) with DSR requirements for
Docker enabled.
### DSR with containerd or cri-o
As of kube-router-1.2.X and later, kube-router's DSR mode now works with non-docker container runtimes. Officially only
containerd has been tested, but this solution should work with cri-o as well.
Most of what was said above also applies for non-docker container runtimes, however, there are some adjustments that
you'll need to make:
* You'll need to let kube-router know what container runtime socket to use via the `--runtime-endpoint` CLI parameter
* If running kube-router as a Kubernetes deployment you'll need to make sure that you expose the correct socket via
`hostPath` volume mount
Here is an example kube-router daemonset manifest with just the changes needed to enable DSR with containerd (this is
not a full manifest, it is just meant to highlight differences):
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
@ -115,11 +146,28 @@ spec:
In order to facilitate troubleshooting it is worthwhile to explain how kube-router accomplishes DSR functionality.
1. kube-router adds iptables rules to the `mangle` table which marks incoming packets destined for DSR based services
with a unique FW mark. This mark is then used in later stages to identify the packet and route it correctly.
Additionally, for TCP streams, there are rules that enable
[TCP MSS](https://en.wikipedia.org/wiki/Maximum_segment_size) since the packets will change MTU when traversing an
ipip tunnel later on.
2. kube-router adds the marks to an `ip rule` (see: [ip-rule(8)](https://man7.org/linux/man-pages/man8/ip-rule.8.html)).
This ip rule then forces the incoming DSR service packets to use a specific routing table.
3. kube-router adds a new `ip route` table (at the time of this writing the table number is `78`) which forces the
packet to route to the host even though there are no interfaces on the host that carry the DSR IP address
4. kube-router adds an IPVS server configured for the custom FW mark. When packets arrive on the localhost interface
because of the above `ip rule` and `ip route`, IPVS will intercept them based on their unique FW mark.
5. When pods selected by the DSR service become ready, kube-router adds endpoints configured for tunnel mode to the
above IPVS server. Each endpoint is configured in tunnel mode (as opposed to masquerade mode), which then
encapsulates the incoming packet in an ipip packet. It is at this point that the pod's destination IP is placed on
the ipip packet header so that a packet can be routed to the pod via the kube-bridge on either this host or the
destination host.
6. kube-router then finds the targeted pod and enters its local network namespace. Once inside the pod's Linux network
namespace, it sets up two new interfaces called `kube-dummy-if` and `ipip`. `kube-dummy-if` is configured with the
externalIP address of the service.
7. When the ipip packet arrives inside the pod, the original source packet with the externalIP is then extracted from
the ipip packet via the `ipip` interface and is accepted to the listening application via the `kube-dummy-if`
interface.
8. When the application sends its response back to the client, it responds to the client's public IP address (since
that is what it saw on the request's IP header) and the packet is returned directly to the client (as opposed to
traversing the Kubernetes internal network and potentially making multiple intermediate hops)
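When troubleshooting, each of the stages above can be inspected with standard tooling. The commands below are a
generic sketch, not kube-router output; the routing table number `78` comes from the description above, and `<pid>` is
a hypothetical placeholder for the host PID of a process inside the target pod.

```sh
# Steps 1-3: check for the fwmark-based policy rule and the dedicated routing table
ip rule show
ip route show table 78

# Steps 4-5: list IPVS virtual servers; DSR services appear as fwmark (FWM)
# services with endpoints using the "Tunnel" forwarding method
ipvsadm -Ln

# Steps 6-7: inspect the interfaces created inside the pod's network namespace
nsenter -t <pid> -n ip addr show kube-dummy-if
nsenter -t <pid> -n ip -d link show type ipip
```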

docs/tunnels.md Normal file

@ -0,0 +1,39 @@
# Tunnels in kube-router
There are several situations in which kube-router will use tunnels in order to perform certain forms of overlay /
underlay routing within the cluster. To accomplish this, kube-router makes use of
[IPIP](https://en.wikipedia.org/wiki/IP_in_IP) overlay tunnels that are built into the Linux kernel and instrumented
with iproute2.
## Scenarios for Tunnelling
By default, kube-router enables the option `--enable-overlay` which will perform overlay networking based upon the
`--overlay-type` setting (by default set to `subnet`). So out of the box, kube-router will create a tunnel for
pod-to-pod traffic any time it comes across a kube-router enabled node that is not within the subnet of its primary
interface.
Additionally, if `--overlay-type` is set to `full` kube-router will create a tunnel for all pod-to-pod traffic and
attempt to transit any pod traffic in the cluster via an IPIP overlay network between nodes.
Finally, kube-router also uses tunnels for DSR ([Direct Server Return](dsr.md)). In this case, the inbound traffic is
encapsulated in an IPIP packet by IPVS after it reaches the node and before it is sent to the pod for processing. This
allows the return IP address of the sender to be preserved at the pod level so that it can be sent directly back to the
requestor (rather than being routed in a synchronous fashion).
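A quick, generic way to see which tunnels kube-router has created on a node is to list the IPIP interfaces with
iproute2 (nothing below is kube-router specific):

```sh
# Show any IPIP tunnel interfaces on this node, with tunnel details
ip -d link show type ipip

# Show classic ipip tunnels known to the kernel
ip tunnel show
```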
## Encapsulation Types
* IPIP (IP in IP) - This is the default method of encapsulation that kube-router uses
* FoU (Foo over UDP) - This is an optional type of IPIP encapsulation that kube-router uses if the user enables it
### FoU Details
Specifically, kube-router uses GUE
([Generic UDP Encapsulation](https://developers.redhat.com/blog/2019/05/17/an-introduction-to-linux-virtual-interfaces-tunnels#gue))
in order to support both IPv4 and IPv6 FoU tunnels. This option can be enabled via the kube-router parameter
`--overlay-encap=fou`. Optionally, the user can also specify a desired port for this traffic via the
`--overlay-encap-port` parameter (by default set to `5555`).
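Put together, enabling FoU might look like the sketch below (not a complete command line); once tunnels are up, the
kernel's FoU/GUE receive ports can be listed with `ip fou show`.

```sh
# Enable overlay networking with GUE/FoU encapsulation on the default port
kube-router --enable-overlay --overlay-encap=fou --overlay-encap-port=5555

# Verify the FoU/GUE receive port configured in the kernel
ip fou show
```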
## IPIP with Azure
Unfortunately, Azure doesn't allow IPIP encapsulation on their network. So users that want to use an overlay network
will need to enable `fou` support in order to deploy kube-router in an Azure environment.


@ -5,23 +5,36 @@
The best way to get started is to deploy Kubernetes with Kube-router using a cluster installer.
### kops
Please see the [steps](https://github.com/cloudnativelabs/kube-router/blob/master/docs/kops.md) to deploy a Kubernetes
cluster with Kube-router using [Kops](https://github.com/kubernetes/kops)
### bootkube
Please see the [steps](https://github.com/cloudnativelabs/kube-router/tree/master/contrib/bootkube) to deploy a
Kubernetes cluster with Kube-router using [bootkube](https://github.com/kubernetes-incubator/bootkube)
### kubeadm
Please see the [steps](https://github.com/cloudnativelabs/kube-router/blob/master/docs/kubeadm.md) to deploy a Kubernetes
cluster with Kube-router using [Kubeadm](https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/)
### k0sproject
k0s by default uses kube-router as a CNI option.
Please see the [steps](https://docs.k0sproject.io/latest/install/) to deploy a Kubernetes cluster with Kube-router using
[k0s](https://docs.k0sproject.io/)
### generic
Please see the [steps](https://github.com/cloudnativelabs/kube-router/blob/master/docs/generic.md) to deploy kube-router
on manually installed clusters
### Amazon specific notes
When running in an AWS environment that requires an explicit proxy you need to inject the proxy server as an
[environment variable](https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/)
in your kube-router deployment
Example:
@ -29,12 +42,20 @@ Example:
- name: HTTP_PROXY
value: "http://proxy.example.com:80"
### Azure specific notes
Azure does not support IPIP packet encapsulation which is the default packet encapsulation that kube-router uses. If you
need to use an overlay network in an Azure environment with kube-router, please ensure that you set
`--overlay-encap=fou`. See [kube-router Tunnel Documentation](tunnels.md) for more information.
## deployment
Depending on what functionality of kube-router you want to use, multiple deployment options are possible. You can use
the flags `--run-firewall`, `--run-router`, `--run-service-proxy` to selectively enable only required functionality of
kube-router.
Also you can choose to run kube-router as an agent running on each cluster node. Alternatively, you can run kube-router
as a pod on each node through a daemonset.
## command line options
@ -104,73 +125,107 @@ Usage of kube-router:
## requirements
- Kube-router needs to access the kubernetes API server to get information on pods, services, endpoints, network
policies etc. The very minimum information it requires is the details on where to access the kubernetes API server.
This information can be passed as:
```sh
kube-router --master=http://192.168.1.99:8080/
# or
kube-router --kubeconfig=<path to kubeconfig file>
```
- If you run kube-router as an agent on the node, the ipset package must be installed on each of the nodes (when run as
a daemonset, the container image is prepackaged with ipset)
- If you choose to use kube-router for pod-to-pod network connectivity then the Kubernetes controller manager needs to
be configured to allocate pod CIDRs by passing the `--allocate-node-cidrs=true` flag and providing a `cluster-cidr`
(e.g. by passing `--cluster-cidr=10.1.0.0/16`); see the sketch after this list
- If you choose to run kube-router as daemonset in Kubernetes version below v1.15, both kube-apiserver and kubelet must
be run with `--allow-privileged=true` option. In later Kubernetes versions, only kube-apiserver must be run with
`--allow-privileged=true` option and if PodSecurityPolicy admission controller is enabled, you should create
PodSecurityPolicy, allowing privileged kube-router pods.
- Additionally, when run in daemonset mode, it is highly recommended that you keep netfilter related userspace host
tooling like `iptables`, `ipset`, and `ipvsadm` in sync with the versions that are distributed by Alpine inside the
kube-router container. This will help avoid conflicts that can potentially arise when both the host's userspace and
kube-router's userspace tooling modifies netfilter kernel definitions. See:
https://github.com/cloudnativelabs/kube-router/issues/1370 for more information.
- If you choose to use kube-router for pod-to-pod network connectivity then the Kubernetes cluster must be configured to
use CNI network plugins. On each node the CNI conf file is expected to be present as `/etc/cni/net.d/10-kuberouter.conf`.
The `bridge` CNI plugin and `host-local` for IPAM should be used. A sample conf file can be downloaded as:
```sh
wget -O /etc/cni/net.d/10-kuberouter.conf https://raw.githubusercontent.com/cloudnativelabs/kube-router/master/cni/10-kuberouter.conf
```
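For the pod CIDR requirement above, the controller manager flags look roughly like the sketch below (the cluster CIDR
is just an example value):

```sh
# kube-controller-manager must hand out per-node pod CIDRs for kube-router to route;
# 10.1.0.0/16 is an example cluster CIDR
kube-controller-manager --allocate-node-cidrs=true --cluster-cidr=10.1.0.0/16
```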
## running as daemonset
This is the quickest way to deploy kube-router in Kubernetes v1.8+ (**don't forget to ensure the requirements above**).
Just run:
```sh
kubectl apply -f https://raw.githubusercontent.com/cloudnativelabs/kube-router/master/daemonset/kube-router-all-service-daemonset.yaml
```
The above will run kube-router as a pod on each node automatically. You can change the arguments in the daemonset
definition as required to suit your needs. Some samples can be found at
https://github.com/cloudnativelabs/kube-router/tree/master/daemonset with different arguments to select the set of
services kube-router should run.
## running as agent
You can choose to run kube-router as an agent running on each node. For example, if you just want kube-router to
provide ingress firewall for the pods then you can start kube-router as:
```sh
kube-router --master=http://192.168.1.99:8080/ --run-firewall=true --run-service-proxy=false --run-router=false
```
## cleanup configuration
Please delete the kube-router daemonset and then clean up all the configuration done by kube-router on the node (to
ipvs, iptables, ipset, ip routes etc) by running the command below.
```sh
docker run --privileged --net=host cloudnativelabs/kube-router --cleanup-config
```
## trying kube-router as alternative to kube-proxy
If you have kube-proxy in use and want to try kube-router just for the service proxy, you can run:
```sh
kube-proxy --cleanup-iptables
```
followed by
```sh
kube-router --master=http://192.168.1.99:8080/ --run-service-proxy=true --run-firewall=false --run-router=false
```
and if you want to move back to kube-proxy then clean up config done by kube-router by running
```sh
kube-router --cleanup-config
```
and run kube-proxy with the configuration you have.
- [General Setup](/README.md#getting-started)
## Advertising IPs
kube-router can advertise Cluster, External and LoadBalancer IPs to BGP peers.
It does this by:
- locally adding the advertised IPs to the nodes' `kube-dummy-if` network interface
- advertising the IPs to its BGP peers
To set the default for all services use the `--advertise-cluster-ip`, `--advertise-external-ip` and
`--advertise-loadbalancer-ip` flags.
To selectively enable or disable this feature per-service use the `kube-router.io/service.advertise.clusterip`,
`kube-router.io/service.advertise.externalip` and `kube-router.io/service.advertise.loadbalancerip` annotations.
e.g.:
`$ kubectl annotate service my-advertised-service "kube-router.io/service.advertise.clusterip=true"`
@ -181,33 +236,31 @@ e.g.:
`$ kubectl annotate service my-non-advertised-service "kube-router.io/service.advertise.externalip=false"`
`$ kubectl annotate service my-non-advertised-service "kube-router.io/service.advertise.loadbalancerip=false"`
By combining the flags with the per-service annotations you can choose either an opt-in or opt-out strategy for
advertising IPs.
Advertising LoadBalancer IPs works by inspecting the service's `status.loadBalancer.ingress` IPs that are set by
external LoadBalancers, for example MetalLB. This has been successfully tested together with
[MetalLB](https://github.com/google/metallb) in ARP mode.
## Hairpin Mode
Communication from a Pod that is behind a Service to its own ClusterIP:Port is not supported by default. However, it
can be enabled per-service by adding the `kube-router.io/service.hairpin=` annotation, or for all Services in a cluster by
passing the flag `--hairpin-mode=true` to kube-router.
Additionally, the `hairpin_mode` sysctl option must be set to `1` for all veth interfaces on each node. This can be
done by adding the `"hairpinMode": true` option to your CNI configuration and rebooting all cluster nodes if they are
already running kubernetes.
Hairpin traffic will be seen by the pod it originated from as coming from the Service ClusterIP if it is logging the
source IP.
### Hairpin Mode Example
10-kuberouter.conf
```json
{
"name":"mynet",
@ -222,48 +275,64 @@ Service ClusterIP if it is logging the source IP.
```
To enable hairpin traffic for Service `my-service`:
```sh
kubectl annotate service my-service "kube-router.io/service.hairpin="
```
If you want to also hairpin externalIPs declared for Service `my-service` (note, you must also either enable global
hairpin or service hairpin (see above ^^^) for this to have an effect):
```sh
kubectl annotate service my-service "kube-router.io/service.hairpin.externalips="
```
## SNATing Service Traffic
By default, as traffic ingresses into the cluster, kube-router will source NAT the traffic to ensure symmetric routing
if it needs to proxy that traffic to ensure it gets to a node that has a service pod that is capable of servicing the
traffic. This has the potential to cause issues when network policies are applied to that service since now the traffic
will appear to be coming from a node in your cluster instead of the traffic originator.
This is an issue that is common to all proxies and to Kubernetes service proxies in general. You can read more about
this issue here:
https://kubernetes.io/docs/tutorials/services/source-ip/#source-ip-for-services-with-type-nodeport
In addition to the fix mentioned in the linked upstream documentation (using `service.spec.externalTrafficPolicy`),
kube-router also provides [DSR](dsr.md), which by its nature preserves the source IP, to solve this problem. For more
information see the section above.
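As a concrete example of the upstream fix mentioned above, `externalTrafficPolicy` can be set to `Local` on a service;
this is a standard Kubernetes field rather than anything kube-router specific:

```sh
# Keep external traffic on the node that received it so the client source IP is preserved
kubectl patch service my-service -p '{"spec":{"externalTrafficPolicy":"Local"}}'
```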
## Load balancing Scheduling Algorithms
Kube-router uses LVS for its service proxy. LVS supports a rich set of
[scheduling algorithms](http://kb.linuxvirtualserver.org/wiki/IPVS#Job_Scheduling_Algorithms). You can annotate the
service to choose one of the scheduling algorithms. When a service is not annotated, the `round-robin` scheduler is
selected by default
```sh
# For least connection scheduling use:
$ kubectl annotate service my-service "kube-router.io/service.scheduler=lc"

# For round-robin scheduling use:
$ kubectl annotate service my-service "kube-router.io/service.scheduler=rr"

# For source hashing scheduling use:
$ kubectl annotate service my-service "kube-router.io/service.scheduler=sh"

# For destination hashing scheduling use:
$ kubectl annotate service my-service "kube-router.io/service.scheduler=dh"
```
## HostPort support
If you would like to use `HostPort` functionality below changes are required in the manifest.
- By default kube-router assumes the CNI conf file to be `/etc/cni/net.d/10-kuberouter.conf`. Add an environment variable
`KUBE_ROUTER_CNI_CONF_FILE` to the kube-router manifest and set it to `/etc/cni/net.d/10-kuberouter.conflist`
- Modify `kube-router-cfg` ConfigMap with CNI config that supports `portmap` as additional plug-in
```json
{
"cniVersion":"0.3.0",
"name":"mynet",
@ -287,23 +356,33 @@ If you would like to use `HostPort` functionality below changes are required in
]
}
```
- Update init container command to create `/etc/cni/net.d/10-kuberouter.conflist` file
- Restart the container runtime
For an example manifest please look at this [manifest](../daemonset/kubeadm-kuberouter-all-features-hostport.yaml) with
the necessary changes required for `HostPort` functionality.
## IPVS Graceful termination support
As of 0.2.6 we support experimental graceful termination of IPVS destinations. When possible the pod's
TerminationGracePeriodSeconds is used; if it cannot be retrieved for some reason, the fallback period is 30 seconds and
can be adjusted with the `--ipvs-graceful-period` cli-opt.
Graceful termination works in such a way that when kube-router receives a delete endpoint notification for a service,
its weight is adjusted to 0 before it is deleted after the termination grace period has passed or the Active & Inactive
connections go down to 0.
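A hedged sketch of working with graceful termination follows; the `--ipvs-graceful-period` flag comes from the text
above, while the duration format and the idea of watching the drain with `ipvsadm` are assumptions.

```sh
# Adjust the fallback graceful-termination period (other kube-router flags omitted)
kube-router --ipvs-graceful-period=30s

# Watch an endpoint drain: its Weight drops to 0 and it is removed once the
# grace period expires or ActiveConn/InActConn reach 0
watch ipvsadm -Ln
```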
## MTU
The maximum transmission unit (MTU) determines the largest packet size that can be transmitted through your network. MTU
for the pod interfaces should be set appropriately to prevent fragmentation and packet drops, thereby achieving maximum
performance. If `auto-mtu` is set to true (`auto-mtu` is set to true by default as of kube-router 1.1), kube-router will
determine the right MTU for both `kube-bridge` and pod interfaces. If you set `auto-mtu` to false, kube-router will not
attempt to configure MTU. However you can choose the right MTU yourself and set it in the `cni-conf.json` section of the
`10-kuberouter.conflist` in the kube-router [daemonsets](../daemonset/). For example:
```json
cni-conf.json: |
{
"cniVersion":"0.3.0",
@ -323,7 +402,10 @@ The maximum transmission unit (MTU) determines the largest packet size that can
}
```
If you set the MTU yourself via the CNI config, you'll also need to set the MTU of `kube-bridge` manually to the right
value to avoid packet fragmentation on existing nodes where `kube-bridge` has already been created. On node reboot or
when new nodes join the cluster, both the pod's interface and `kube-bridge` will be set up with the specified MTU value.
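If you do pin the MTU by hand, `kube-bridge` on nodes that already exist can be aligned with a plain iproute2 command;
`1400` is just an example value that should match whatever you set in the CNI config.

```sh
# Align the existing bridge MTU with the value configured in the CNI conf (example value)
ip link set dev kube-bridge mtu 1400
```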
## BGP configuration