In today’s complex cloud-native landscapes, Kubernetes (aka k8s) stands as a pivotal technology that simplifies the management and orchestration of containerized applications. However, as developers and operators dive deeper into Kubernetes, it’s essential to understand how resource management within K8s operates, particularly regarding CPU requests and limits. This article aims to shed light on Kubernetes, its CPU resource management capabilities, and how they intricately tie in with the Linux Completely Fair Scheduler (CFS). In other words, we’ll try to answer this question: “How to manage CPU in Kubernetes?” We’ll also look into how to avoid shooting yourself in the foot with all these lovely capabilities.
What is Kubernetes?
In case you’ve been happily living under a rock for a while: Kubernetes is awesome. It’s an open-source platform for automating the deployment, scaling, and management of containerized applications. Kubernetes was originally developed by Google and is now maintained by the Cloud Native Computing Foundation (CNCF). K8s provides a robust framework to run distributed systems resiliently: it handles load balancing, provides self-healing capabilities, automated rollouts and rollbacks, and secret and configuration management seamlessly.
Kubernetes operates through a collection of services and controllers that run across clusters composed of control plane (historically “master”) and worker nodes. The control plane issues the overall orchestration commands: scheduling decisions, maintaining application state, and scaling operations. Each worker node runs runtime components, such as a container runtime (usually Docker or containerd), and interacts with the control plane to manage the lifecycle of containers running on it.
One critical aspect of Kubernetes is its ability to abstract physical and cloud resources and present them as unified entities. That allows developers to focus more on application logic rather than the underlying infrastructure.
Typically, k8s is a key enabler for software based on microservices. You can learn more about microservices from my blog here & here.
Understanding CPU Requests and Limits in Kubernetes
Resource management in Kubernetes is crucial for ensuring that applications run efficiently and reliably within a cluster. “How to manage CPU in Kubernetes?” one would ask. Two primary mechanisms in Kubernetes for managing CPU resources are requests and limits:
Requests: a request is the amount of CPU that a container is guaranteed to receive. A typical worker node in a cluster has multiple containers running. Kubernetes ensures that each container gets at least its requested amount of CPU time. This reservation ensures that your essential services have the resources they need to function properly, even under load.
Limits: a limit is the maximum amount of CPU a container is allowed to use. A container trying to exceed this limit will be throttled. “Throttling” means the container will only be allowed to use CPU up to the specified limit. This prevents a container from hogging resources and impacting other applications running on the same node.
These mechanisms allow Kubernetes to provide Quality of Service (QoS) classes to pods. These, in turn, dictate their scheduling and execution priorities:
Guaranteed: a pod is considered Guaranteed if every container in it defines requests and limits for every resource and, importantly, the values for requests and limits are the same. These pods have the highest priority.
Burstable: a pod is Burstable if it has a request specified but the limit is larger than the request. If the limit is not specified at all, it’s effectively set to match the capacity of the node the pod is scheduled to. Such pods can use more resources when available but will be throttled once their limit (if any) is reached. Spare CPU capacity (if any) is split between burstable pods when they need it.
Best Effort: pods without any requests or limits fall under this category. Requests are considered equal to zero in that case. Such pods have the lowest priority in terms of resource allocation (see the example below).
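To make this concrete, here’s a minimal sketch using the official `kubernetes` Python client (the pod name, image, and resource values are just placeholders). As written it produces a Guaranteed pod; raising the limits above the requests would make it Burstable, and dropping the resources block entirely would make it Best Effort.

```python
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() inside a cluster

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="cpu-demo"),
    spec=client.V1PodSpec(
        containers=[
            client.V1Container(
                name="app",
                image="python:3.12-slim",
                command=["sleep", "1d"],
                resources=client.V1ResourceRequirements(
                    # requests == limits for every resource => Guaranteed QoS
                    requests={"cpu": "500m", "memory": "256Mi"},
                    limits={"cpu": "500m", "memory": "256Mi"},
                ),
            )
        ]
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```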
Completely Fair Scheduler in Linux
The Linux Completely Fair Scheduler (CFS) is the default process scheduler in the Linux kernel; it was introduced in version 2.6.23. The CFS is designed for efficiency and fairness, aiming to allocate CPU time to processes in a balanced manner while minimizing the queue management found in traditional round-robin or priority-based scheduling models.
CFS operates by simulating an “ideal, precise multitasking CPU” that divides CPU time equally among all running processes. Instead of relying on fixed time slices, CFS uses a model called the “fair clock”, in which each process accumulates virtual runtime based on its execution. The scheduler tries to ensure that all tasks accumulate the same amount of virtual runtime, adjusting their access to CPU cycles accordingly. Naturally, the “fair” approach takes process priorities into account, so a higher priority means more CPU cycles.
Processes that haven’t run as much as others get preferential CPU time until their runtimes catch up. That helps ensure fairness across processes with varying CPU usage demands. CFS achieves this balance using a red-black tree to manage processes according to their virtual runtimes. The process with the smallest virtual runtime is the next one to run.
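As a toy illustration (in no way the kernel’s actual code), the selection logic boils down to something like this; a heap stands in for the red-black tree, and task weights model priorities:

```python
import heapq

# (vruntime_ms, name, weight): higher weight => slower vruntime growth
# => more CPU time, mimicking a higher-priority task
tasks = [(0.0, "high-prio", 2.0), (0.0, "normal-a", 1.0), (0.0, "normal-b", 1.0)]
heapq.heapify(tasks)

for _ in range(8):
    vruntime, name, weight = heapq.heappop(tasks)  # smallest vruntime runs next
    print(f"running {name} (vruntime={vruntime:.1f} ms)")
    timeslice_ms = 3.0
    heapq.heappush(tasks, (vruntime + timeslice_ms / weight, name, weight))
```

Running it shows the “high-prio” task getting scheduled roughly twice as often as the others, which is exactly the effect priorities have under CFS.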
When you ask “How to manage CPU in Kubernetes?” the CFS is just around the corner.
Implementing CPU Limits in Kubernetes Using CFS
Kubernetes leverages the capabilities of CFS to enforce the CPU limits specified in a pod’s configuration. When a container is started within a Kubernetes-managed system, its resource requests and limits are translated into Linux control group (cgroup) parameters that the CFS uses to enforce these constraints.
CPU Requests: Kubernetes, via the container runtime, configures cgroups so that the minimum CPU requested by the container is proportionally dedicated to it (via CPU shares: cpu.shares in cgroup v1, cpu.weight in v2). If CPU on the node is available beyond this request, the container may use more, but it has at least the requested CPU time guaranteed.
CPU Limits: to enforce CPU usage limits, Kubernetes configures the CFS bandwidth control settings in the Linux kernel. This uses the period (in microseconds) and quota (also in microseconds) parameters in cgroups. The quota is the CPU time available to the container within the specified period, effectively capping its CPU usage. The quota can also exceed the period, which means the container is allowed to use more than 1 CPU core.
For example, if a container specifies a limit of 0.5 CPU and the period is set to 100,000 microseconds (0.1 seconds), the quota is set to 50,000 microseconds. This means that on average the container won’t use more than 0.5 of a CPU core – but the quota can be burned through at full speed: a CPU-bound container will typically run for 50 milliseconds and then sit throttled for the remaining 50 milliseconds of every 100-millisecond period. That way CFS prevents it from consuming more than the allotted share of CPU time. No more guesses when asked “How to manage CPU in Kubernetes?”
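The arithmetic is simple enough to sketch; note this helper is purely illustrative and not part of any Kubernetes API:

```python
CFS_PERIOD_US = 100_000  # the default CFS/Kubernetes period: 100 ms

def cfs_quota_us(cpu_limit: float, period_us: int = CFS_PERIOD_US) -> int:
    # quota is simply limit * period; a quota above the period means
    # the container may use more than one full core
    return int(cpu_limit * period_us)

print(cfs_quota_us(0.5))  # 50000  -> 50 ms of CPU time per 100 ms period
print(cfs_quota_us(2.0))  # 200000 -> up to two full cores
```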
The default k8s period is 100,000 microseconds, which matches the CFS default. That’s an untuneable and somewhat “low granularity” setting. For comparison, when scheduling, CFS aims at timeslices of a few milliseconds (per the `sched_min_granularity_ns` and `sched_latency_ns` settings). The machinery behind calculating an actual timeslice is pretty complex, but the difference is illustrative enough. The feature of making the period customizable has been behind a feature gate (`CustomCPUCFSQuotaPeriod`) since version 1.12 of k8s (the latest at the time of writing is 1.31) and is still in the “Alpha” stage. Effectively that means we can assume the period is 100 ms for all cgroups managed by k8s.
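If you want to verify what a container actually got, the CFS parameters are visible from inside it. The snippet below assumes a cgroup v2 node; on cgroup v1 the same values live in `cpu.cfs_quota_us` and `cpu.cfs_period_us` instead:

```python
# read the CFS bandwidth settings for the container's own cgroup
with open("/sys/fs/cgroup/cpu.max") as f:
    quota, period = f.read().split()  # e.g. "50000 100000"; quota is "max" if unlimited
print(f"quota={quota}us per period={period}us")
```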
CFS Bandwidth Control in Practice
Personally, I think CPU limits make the most sense for heavily multi-tenant clusters, since whenever a pod is throttled its performance sucks. If you want to ensure tenants won’t suffer from a noisy neighbor, limits are the way to go. Otherwise you can just use requests and allow the spare CPU to be spread across containers which need it. Let’s dive into some experiments with CPU-intensive Python code on Minikube with the Docker driver – very far from an optimal test setting, but still pretty illustrative.
Let’s say I have this weird function (please don’t use this for anything serious):
```python
def extra_slow_fib(n):
    # deliberately naive: exponential time and deep recursion
    if n < 2:
        return 1
    return extra_slow_fib(n - 1) + extra_slow_fib(n - 2)
```
This code has multiple flaws leading to excessive CPU & stack usage, but it’s perfect for generating some CPU load =)
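The driver for the experiment is roughly the following (a sketch reusing `extra_slow_fib` from above, not the exact script I ran):

```python
import time

# run the calculation forever and print how long each iteration takes
while True:
    start = time.monotonic()
    extra_slow_fib(36)
    print(f"fib(36) took {time.monotonic() - start:.1f}s", flush=True)
```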
If I run this code to calculate the 36th Fibonacci number in an infinite loop via Minikube, I get one CPU core at 100%, yielding a result every ~6 seconds. If I set a CPU limit of 0.5, the usage drops to 50% (CFS rocks), but results start to appear every ~15 seconds on average (and the spread of values increases as well). With a CPU limit of 0.1, usage drops to 10%, but we only get a result every 3 minutes (180 seconds), with an even higher spread. Let’s summarize these observations in the following table:
| CPU Limit | Expected Performance Drop | Actual Performance Drop |
|-----------|---------------------------|-------------------------|
| None      | Baseline                  | N/A                     |
| 0.5 CPU   | x2                        | x2.5                    |
| 0.1 CPU   | x10                       | x30                     |
The less CPU we allow a container to consume, the higher the throttling overhead becomes, killing performance.
Another interesting effect of CFS throttling is that the quota is shared by all CPU-bound threads inside a container. To illustrate that, I use the `multiprocessing` Python module to sidestep the GIL and run 2 Fibonacci calculations in parallel (see the sketch below). Without limits I get two CPU cores at 100%, again yielding results every ~6 seconds (actually a bit slower, as there’s some overhead from multiprocessing), so it’s 2 calculations every 6 seconds.
If I set a CPU limit of 1, then 2 CPU cores are used at 50% each, yielding results every ~15 seconds: one core’s worth of quota is shared between 2 CPU-bound processes that both want it eagerly, hence the 50/50 split.
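For completeness, the parallel version looks roughly like this (again a sketch, not the exact script):

```python
import time
from multiprocessing import Process

def extra_slow_fib(n):  # same function as above
    if n < 2:
        return 1
    return extra_slow_fib(n - 1) + extra_slow_fib(n - 2)

def fib_loop():
    while True:
        start = time.monotonic()
        extra_slow_fib(36)
        print(f"fib(36) took {time.monotonic() - start:.1f}s", flush=True)

if __name__ == "__main__":
    # two CPU-bound processes sidestep the GIL; with a CPU limit of 1
    # they share a single core's worth of quota, roughly 50/50
    for _ in range(2):
        Process(target=fib_loop).start()
```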
Conclusion
Kubernetes offers powerful mechanisms for orchestrating containers across distributed cloud environments, with effective resource management being a cornerstone of its functionality. By understanding how CPU requests and limits are implemented, and the role of the Linux Completely Fair Scheduler, operators can ensure efficient resource utilization, fair resource allocation, and stable application performance. The synergy between Kubernetes and the CFS not only optimizes the use of available resources but also ensures that applications run smoothly within the multi-tenant environments that modern enterprises demand. I bet the question of “How to manage CPU in Kubernetes?” shouldn’t pop up anymore. Just don’t shoot yourself in the foot!