Navigating Kubernetes Networking in Public and Private Clouds
1.0 Introduction
In this blog we will look at networking options for deploying Kubernetes (k8s) clusters in public and private clouds. I will describe what the Container Network Interface (CNI) is, discuss various CNI plugins, compare CNIs for Kubernetes clusters in public versus private clouds, and make recommendations.
2.0 Container Network Interface (CNI)
CNI, which stands for Container Network Interface, is a specification for configuring networking for a container during its lifecycle. The CNI specification is implemented as a network plugin (CNI plugin) that interfaces with the container runtime and is responsible for inserting a network interface into the container network namespace and making any necessary changes on the host.
In the context of Kubernetes, the cluster configuration provides information on the network a pod should join and the plugin that is needed. This allows kubelet to automatically configure networking for the pods it starts by calling the plugins it finds. The plugin adds an interface into the pod/container network namespace as one end of a veth pair. It then makes changes on the host machine, including wiring up the other end of the veth pair to a network bridge. Afterwards, it allocates an IP address and sets up routes by calling a separate IPAM (IP Address Management) plugin.
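To make this concrete, below is a minimal, illustrative CNI network configuration using the reference bridge and host-local plugins (the network name, bridge name, and subnet are placeholders, not a recommended setup). The runtime passes a configuration like this to the plugin, which creates the veth pair and asks the named IPAM plugin for an address:

```
{
  "cniVersion": "0.4.0",
  "name": "example-pod-network",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.22.0.0/16",
    "routes": [ { "dst": "0.0.0.0/0" } ]
  }
}
```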
CNI plugins that depend on the Linux networking stack use the Linux kernel's packet forwarding and routing functionality to move packets between containers and nodes. The physical interfaces on the nodes are connected directly to the kernel networking stack, and a combination of routing (for example, BGP) and/or tunneling is used for multi-host overlay networking.
Similarly, CNI plugins can also work with software virtual switches/forwarders such as Open vSwitch (OVS) or VPP (Vector Packet Processing) running on each node. The physical interfaces of the node and the containers are connected to the virtual switch, and connectivity is achieved by programming the virtual switch. The CNI plugin can program these virtual switches either directly or via software-defined networking abstractions. The software forwarder typically runs in user space (as opposed to kernel space) and can provide enhanced traffic forwarding performance.
The good news (or not) is that there are a number of primary CNI plugin options available for k8s deployments, as listed below (note that the list keeps growing):
- Calico
- Contiv
- Cilium
- Multus
- Weave
- ACI-CNI
- AWS VPC CNI
And there are a host of secondary CNI plugins, which the primary CNI plugin can invoke and which are meant for functions such as IPAM and service mesh:
- Infoblox
- Istio
Of course, this presents a major challenge for k8s deployment architects: choosing the optimal CNI plugin for their use cases and required capabilities.
3.0 k8s CNI for Public Clouds
There are a number of public cloud providers offering flexible Kubernetes deployment options. In this first version, the blog focuses on Amazon Web Services (AWS) and identifies two options for deploying Kubernetes clusters:
(i) AWS managed Elastic Kubernetes Services (EKS)
(ii) Self-managed Kubernetes cluster on EC2 instances
In terms of networking plugins, one can choose the AWS-provided VPC CNI plugin or a third-party CNI plugin such as Calico.
3.1 AWS managed EKS
Amazon Elastic Kubernetes Service (Amazon EKS) is a managed service that allows you to get a Kubernetes Cluster on AWS without needing to install, operate, and maintain the Kubernetes control plane or nodes.
Amazon EKS runs Kubernetes control plane instances across multiple Availability Zones to ensure high availability. Amazon EKS runs a single tenant Kubernetes control plane for each cluster, and control plane infrastructure is not shared across clusters or AWS accounts. On the other hand, you will have full control of the worker nodes that are part of this cluster.
3.1.1 AWS managed EKS — VPC CNI
AWS offers a CNI plugin referred to as the AWS VPC CNI plugin. This plugin, developed by AWS, uses VPC (Virtual Private Cloud) networking and Security Group functionality and, as far as the documentation reveals, relies on the native Linux kernel networking stack. The Amazon VPC CNI plugin for Kubernetes has two components:
· CNI plugin — Responsible for adding the correct network interface to the pod namespace and wiring the interface to the host.
· L-IPAM daemon — Responsible for creating and attaching secondary ENIs (Elastic Network Interfaces) to Amazon EC2 instances and using them to assign pod IPs that are routable within the VPC. When the number of pods running on the node exceeds the number of addresses that can be assigned to a single network interface, the plugin can allocate a new network interface, provided the maximum number of network interfaces for the instance has not already been reached.
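As a rough rule of thumb, the maximum number of pods per node with the VPC CNI in this standard mode is: max pods = (number of ENIs) x (IPv4 addresses per ENI - 1) + 2. For example, an m5.large instance supports 3 ENIs with 10 IPv4 addresses each, which works out to 3 x 9 + 2 = 29 pods; the "+2" accounts for the host-networked aws-node and kube-proxy pods.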
When you create an Amazon EKS cluster, you specify the VPC subnets for your cluster to use. The VPC subnets can be a combination of private and public subnets. The use of VPC subnets provides reachability to and from the pods with VPC routing. In the case of private subnets, the plugin translates the pod IPs to the primary private IP of the node to enable Internet access for the pods. Using a VPC with public and private subnets enables Kubernetes to create load balancers (AWS Elastic Load Balancer) in the public subnets, which can load balance traffic to pods running on nodes in the private subnets.
A few limitations of the AWS VPC CNI that have to be considered are as follows.
1. The number of pods that can be deployed in the cluster, and the corresponding pod density per node, depends on the number of network interfaces supported by the EC2 instance type and the number of secondary IPs per network interface.
2. Proper design of the Kubernetes cluster needs to be done before installation to cater for future IP addressing requirements. For example, you can alleviate IP address exhaustion by using the custom networking feature of the plugin and either allocating a dedicated CIDR per node for the pod IP address pool or using the overlay networking provided by the AWS VPC CNI.
It should be noted, however, that after enabling custom networking the primary network interface of the worker node is no longer used for pod placement, and hence the pod density per node is reduced.
3. Although the AWS VPC CNI plugin supports Security Groups, there is no native integration with Kubernetes Network Policy. This can be addressed by integrating another plugin such as Calico for policy enforcement. This is possible because Calico is built from modular components: its policy capabilities are deployed as a DaemonSet and, by matching the veth naming convention of the AWS VPC CNI plugin, rules can be enforced using iptables/ipsets in the Linux kernel.
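As an illustration, once Calico's policy components are installed on an EKS cluster, standard Kubernetes NetworkPolicy objects such as the sketch below are enforced against the VPC CNI-managed pod interfaces (the namespace, labels, and port here are hypothetical):

```
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend   # hypothetical policy name
  namespace: demo                   # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: backend                  # policy applies to backend pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend         # only frontend pods may connect
      ports:
        - protocol: TCP
          port: 8080                # and only on this port
```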
AWS documentation states that EKS officially supports only the AWS VPC CNI, while mentioning Calico, Cilium, and Weave Net, among others, as compatible alternate CNI plugins.
Source: https://docs.aws.amazon.com/eks/latest/userguide/alternate-cni-plugins.html
When it comes to EKS, using the default AWS VPC CNI and enhancing it with an additional CNI such as Calico would seem to be the best possible way forward. However, the implications of limited pod density and the increased overhead of managing additional Kubernetes clusters, VPCs, and EC2 instances should be carefully weighed.
Another important point to remember is that for every EKS cluster, AWS launches the control plane in an AWS-managed VPC, while the worker nodes are deployed in a tenant (customer-managed) VPC.
Amazon EKS allows resilient access to the Kubernetes API server via an Elastic Load Balancer, with different endpoint options: (i) public endpoint, (ii) public and private endpoints, and (iii) private endpoint only. The public endpoint is accessed via the Internet, while access to the private endpoint is made possible by creating "EKS-owned" ENIs in the tenant VPC, which sets up bi-directional reachability between the two VPCs.
However, when a pod wants to access the Kubernetes API server, it typically uses the "kubernetes.default.svc" hostname, which resolves to the cluster IP assigned to the Kubernetes API Service. Therefore, before the traffic leaves the worker node, destination NAT from this cluster IP to one of the API server endpoint IPs is performed, handled by iptables rules created by kube-proxy.
3.1.1.1 AWS managed EKS — VPC CNI and Custom Networking
To address the IP exhaustion/scarcity issue, the AWS VPC CNI plugin was enhanced to allow the use of a secondary VPC CIDR range for pods. With custom networking, pods no longer use the host's primary ENI for IP assignment (with the exception of pods that run with hostNetwork), and, more importantly, pod density still depends on the number of secondary ENIs and the number of secondary IPs supported by each ENI.
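As a sketch of what this looks like in practice, custom networking is typically enabled through environment variables on the aws-node (VPC CNI) DaemonSet plus per-Availability-Zone ENIConfig resources that point at the secondary-CIDR subnets. The subnet and security group IDs below are placeholders:

```
# Illustrative env settings on the aws-node DaemonSet container:
#   AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG = "true"
#   ENI_CONFIG_LABEL_DEF = "topology.kubernetes.io/zone"
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: us-east-1a                  # usually named after the Availability Zone
spec:
  subnet: subnet-0123456789abcdef0  # hypothetical subnet from the secondary VPC CIDR
  securityGroups:
    - sg-0123456789abcdef0          # hypothetical security group for pod ENIs
```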
For traffic initiated from the pods to the EKS control plane, the AWS VPC CNI has to be configured to perform NAT when custom networking is used. SNAT is applied to all traffic sent from the pods to destinations outside the tenant VPC CIDR block, using the primary IP of the EC2 instance, which allows the Kubernetes API server to accept the requests.
The other workaround would be to deploy pods that require control plane access with hostNetwork:true in their pod spec. This method will also allow the control plane nodes to initiate network connections to those pods.
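For completeness, a minimal sketch of that workaround (the pod name and image are placeholders):

```
apiVersion: v1
kind: Pod
metadata:
  name: needs-apiserver-access      # hypothetical pod name
spec:
  hostNetwork: true                 # pod shares the node's network namespace and primary IP
  containers:
    - name: app
      image: registry.example.com/app:latest   # hypothetical image
```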
3.1.2 AWS managed EKS — Calico CNI
Calico provides a solution with native Linux kernel performance and cloud-native scalability. Recently, Calico has also added an eBPF (extended Berkeley Packet Filter) data plane that enables new security features. If the preference is to avoid the limitations and dependencies of the AWS VPC CNI, Calico can be considered, as it allows the Kubernetes cluster to scale with no dependencies on the underlying AWS cloud network.
The optimal way of deploying Calico networking would be in cross-subnet overlay mode. The cross-subnet configuration is an enhancement to the VXLAN or IP-in-IP overlays: within each subnet the underlying network acts as an L2 network, so traffic between nodes in the same subnet does not need to be encapsulated, while traffic sent across subnets is encapsulated, resulting in a very flexible deployment model. Additionally, Calico creates a BGP mesh between all nodes of the cluster and advertises the routes for the container networks to all worker nodes.
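A minimal sketch of a Calico IP pool configured for cross-subnet IP-in-IP (the pod CIDR is a placeholder):

```
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.168.0.0/16       # hypothetical pod CIDR
  ipipMode: CrossSubnet      # encapsulate only when crossing subnet boundaries
  vxlanMode: Never
  natOutgoing: true          # SNAT pod traffic leaving the Calico IP pools
  nodeSelector: all()
```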
And finally, Calico provides network security policy for individual containers.
When deploying the Calico CNI plugin with EKS, Calico networking cannot be installed on the EKS control plane nodes and applies only to the worker nodes deployed in the tenant-managed VPC.
For creating Calico IP pools, VPC subnets or custom user-defined subnets can be used. Calico has the ability to advertise the pod subnets and the host IPs used for Kubernetes Services and external load balancers via BGP, but when deployed in an AWS environment there is no way to set up BGP peering with the AWS network infrastructure.
As before, access to the Kubernetes API server is via an Elastic Load Balancer, with different endpoint options: (i) public endpoint, (ii) public and private endpoints, and (iii) private endpoint only.
For outgoing connections from the pods to the EKS control plane, there are no issues if the Calico IP pool uses VPC subnets. However, if custom user-defined subnets are in play, Calico should be configured to NAT outgoing traffic (the natOutgoing setting shown above), so that traffic from the pods to any destination outside the Calico IP pools is translated. The other workaround is to deploy pods that require control plane access with hostNetwork: true in their pod spec.
Calico supports integration with the AWS cloud provider to provision Elastic Load Balancers that expose Kubernetes Services externally. Whenever a Kubernetes Service of type "LoadBalancer" is created with the annotation "service.beta.kubernetes.io/aws-load-balancer-type" set to "nlb", for example, a Network Load Balancer (NLB) is created to allow external access to the Service. The NLB is deployed with a listener corresponding to the Kubernetes Service port and forwards traffic to the EC2 nodes, where it is further handled by the NodePort construct of Kubernetes. The external IP specified by Kubernetes for the load balancer can be from either a private or a public VPC subnet.
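For example, a Service manifest along these lines (the name, labels, and ports are hypothetical) would cause the AWS cloud provider to provision an NLB in front of the corresponding NodePorts:

```
apiVersion: v1
kind: Service
metadata:
  name: web                         # hypothetical service name
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  type: LoadBalancer
  selector:
    app: web                        # pods backing the service
  ports:
    - protocol: TCP
      port: 80                      # NLB listener port
      targetPort: 8080              # container port reached via the NodePort
```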
4.0 k8s CNIs for Private Cloud
When it comes to private clouds, several CNI options are available. For example, Calico and Cilium are Linux kernel based CNIs, while the OpenShift platform ships with OpenShift SDN, a CNI that configures an overlay network using Open vSwitch.
There are other CNIs, such as Cisco ACI-CNI, that provide integration between the Kubernetes cluster and the network controller; ACI-CNI uses Open vSwitch to provide connectivity to containers.
Most CNI plugins attach only one network interface to a container. But there are use cases, such as NFV, where multiple network interfaces per container are desired. In such cases, the Multus CNI allows pods to have multiple network interfaces.
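A minimal sketch with Multus: a NetworkAttachmentDefinition describes the secondary network (here a hypothetical macvlan attachment on eth1), and a pod requests it via an annotation:

```
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-net                 # hypothetical secondary network
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "eth1",
      "ipam": { "type": "host-local", "subnet": "10.10.0.0/24" }
    }'
---
apiVersion: v1
kind: Pod
metadata:
  name: multi-homed-pod             # hypothetical pod name
  annotations:
    k8s.v1.cni.cncf.io/networks: macvlan-net   # request the extra interface
spec:
  containers:
    - name: app
      image: registry.example.com/app:latest   # hypothetical image
```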
In this first version, we will look at how the Calico CNI can be integrated with the data center network where the Kubernetes cluster is deployed.
4.1 Calico CNI for Private Cloud
As stated earlier, Calico supports either direct routing or network overlay using VXLAN or IPinIP (default) encapsulations to exchange traffic between workloads. The advantage of the overlay network approach is the underlying physical network only needs to provide IP connectivity between the Kubernetes nodes whereas in the direct routing approach, the underlying network should also be aware of the IP addresses used by workloads. However, the overlay approach means additional performance overhead on the Kubernetes nodes.
The preferred mode when deploying the Calico CNI plugin in an on-prem Kubernetes cluster would be direct routing, since the underlying network fabric can provide reachability not only to and from the workload IPs but also to the Kubernetes Services running in the cluster.
Digging a little deeper into how Calico handles routing: on each node there exists a virtual router backed by the Linux kernel. The virtual router advertises the endpoints that are reachable through it to other nodes, as well as to the network fabric, using BGP (Border Gateway Protocol). When it comes to setting up BGP with Calico, there are two options:
1. BGP AS per rack model
2. BGP AS per compute server model
If Kubernetes with the Calico CNI plugin is being deployed with a Cisco ACI fabric, for example, the recommendation is to use the AS per compute server model.
In this model, each of the Calico BGP virtual routers peers with the ToR switches, which help exchange routes amongst all the Calico virtual routers in the cluster. On each node, the Calico virtual router advertises the node subnet, pod subnet, and Service subnets, along with any host routes for Kubernetes Services or load balancers, to the ACI fabric.
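A minimal sketch of what that peering might look like as Calico resources (the AS numbers, peer IP, and Service CIDR are placeholders; in an AS-per-compute-server design the individual node AS numbers are set on each Calico Node resource, and the full node-to-node mesh is typically disabled in favour of the ToR peerings):

```
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  nodeToNodeMeshEnabled: false      # the ToR/leaf switches redistribute routes instead
  asNumber: 64512                   # hypothetical default AS number for the nodes
  serviceClusterIPs:
    - cidr: 10.96.0.0/12            # hypothetical Service CIDR advertised to the fabric
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: tor-leaf-pair
spec:
  peerIP: 10.0.0.1                  # hypothetical ToR/leaf switch address
  asNumber: 65001                   # hypothetical fabric AS number
```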
For egress traffic from the Kubernetes cluster, the nodes only require a default route that uses the ACI L3Out as the next-hop both for node-to-node traffic and for node-to-outside communications.
5.0 Recommendations
When deploying AWS EKS clusters, if the reachability from the worker nodes and pods to the Kubernetes API is designed properly, then using a single CNI like Calico that offers both connectivity and network policy enforcement would be the right way to go. On the other hand, if the clusters are small, then the AWS VPC CNI augmented by Calico would work fine.
For self-managed Kubernetes clusters in AWS, since you are in complete control of the cluster, going with Calico or other feature-rich CNIs would be the best possible option. Note that even though AWS supports both Linux kernel based CNIs and OVS based CNIs, Linux kernel based CNIs should be the starting choice due to their simplicity, and you could potentially use eBPF with the right AMIs.
For private clouds, Calico provides a good starting point due to its support for direct routing, network policy enforcement, and routing protocols to interact with the data center network. Additionally, Calico can use its eBPF mode to improve performance. However, there are other CNIs that run entirely in user space and use DPDK to provide even higher performance.