Production Kubernetes environments are complex beasts. DevOps teams need to ensure their deployments are highly available, scalable and cost-effective. Efficiently managing Kubernetes resources feeds into all of these requirements. In this blog post we take a deep dive into Kubernetes resources and how you can ensure optimal resource usage.
Kubernetes resources are the bread and butter of a production environment. Containers and pods deploy on the underlying cloud or on-premises infrastructure, consuming resources in the process. As environments scale, resource usage increases and costs go up. Inefficiencies in the shape of over-capacity also creep in, driving costs up even more. Both IT managers and ops teams need to keep an eye on Kubernetes resource usage to ensure their production environments are cost-effective, optimized and run seamlessly.
There are two main types of resources in Kubernetes: CPU and memory. One Kubernetes CPU is equivalent to 1 AWS vCPU, 1 Azure vCore or the corresponding abstraction on other cloud providers or physical infrastructure.
For reference, an m5.xlarge AWS instance has four vCPUs. Kubernetes, however, doesn't stop there: vCPUs can be broken down further into discrete bits of compute called millicpu (also known as millicores), with 1000 millicpu to a vCPU. Memory is measured in bytes. An m5.xlarge AWS instance has 16 GiB of memory.
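To make the units concrete, here is a minimal sketch that converts quantity strings such as "250m" or "16Gi" into millicpu and bytes. The helper names `cpu_to_millicores` and `memory_to_bytes` are our own, and the sketch covers only the most common suffixes rather than the full Kubernetes quantity syntax.

```python
from decimal import Decimal

# Common binary and decimal suffixes used in Kubernetes memory quantities.
_MEMORY_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "4", "0.5" or "250m" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(Decimal(quantity) * 1000)


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "16Gi" or "512M" to bytes."""
    # Check two-letter suffixes first so "Mi" is not mistaken for "M".
    for suffix in sorted(_MEMORY_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            number = Decimal(quantity[: -len(suffix)])
            return int(number * _MEMORY_SUFFIXES[suffix])
    return int(Decimal(quantity))  # a plain number of bytes


# The m5.xlarge instance from the text: four vCPUs and 16 GiB of memory.
print(cpu_to_millicores("4"))    # 4000
print(memory_to_bytes("16Gi"))   # 17179869184
```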
Kubernetes resource usage refers to the amount of resources that are consumed by a container or pod in a production Kubernetes environment. Keeping tabs on the resource usage of containers and pods is an important function of ops teams.
One obvious reason for this is cost. Since Kubernetes resources are abstractions on the underlying compute infrastructure (AWS, Azure, bare metal), usage translates into cost. Monitoring Kubernetes resource usage for containers and pods is also important for cost allocation efforts.
Resource usage is also an important indicator of how optimized your Kubernetes environment is. Ideally, ops teams want pods to consume as large a share as possible of the total resources available on the cluster.
An easy way of thinking about optimizing Kubernetes resource usage is in terms of cloud provider instances. Let's assume you have a fleet of containers running on an m5.xlarge instance on AWS. This particular AWS instance has four vCPUs, which translate into 4000 millicores or millicpu on Kubernetes. An optimized Kubernetes environment is one in which the combined average CPU usage of the containers running on this instance is close to 4000 millicores.
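The arithmetic behind this can be sketched in a few lines. The container names and usage figures below are made up for illustration, and the sketch treats the full 4000 millicores as available to containers, which ignores whatever the node reserves for system components.

```python
NODE_CPU_MILLICORES = 4 * 1000  # m5.xlarge: four vCPUs

# Hypothetical average CPU usage per container, in millicores.
container_usage = {"api": 1200, "worker": 900, "cache": 300}


def cpu_utilization(usage_millicores, capacity_millicores):
    """Return the fraction of node CPU capacity consumed by the containers."""
    return sum(usage_millicores.values()) / capacity_millicores


utilization = cpu_utilization(container_usage, NODE_CPU_MILLICORES)
print(f"{utilization:.0%} of node CPU in use")  # 60% of node CPU in use
```

In this example 1600 millicores sit idle, which is the kind of gap an optimization effort would try to close.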
Kubernetes, like all shared systems, also has an inherent risk of resource contention between containers or pods running on a single machine. By default (and when developers are in a rush), containers are provisioned without any limits on the amount of resources they can consume.
Since there are no limits on resource consumption, containers can hog cluster resources when they exceed their regular resource consumption patterns. This starves other pods and containers of the resources they need to run.
Pods can also fail to run because no node on your cluster has enough free resources to host them. Keeping track of resource usage on specific nodes and clusters, and ensuring that individual nodes have enough resources to support the pods scheduled on them, is therefore also important.
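The check behind "not enough resources on a node" can be sketched roughly as follows. This is a simplified illustration and not the scheduler's actual code: it looks at CPU only and compares the pod's request with what is left after subtracting the requests of pods already on the node. The node names and numbers are hypothetical.

```python
def free_millicores(node):
    """CPU still unreserved on a node: allocatable minus existing pod requests."""
    return node["allocatable_cpu"] - sum(node["pod_cpu_requests"])


def nodes_that_fit(pod_cpu_request, nodes):
    """Return the names of nodes with enough unreserved CPU for the pod."""
    return [n["name"] for n in nodes if free_millicores(n) >= pod_cpu_request]


# Hypothetical two-node cluster, all values in millicores.
nodes = [
    {"name": "node-a", "allocatable_cpu": 4000, "pod_cpu_requests": [1500, 2000]},
    {"name": "node-b", "allocatable_cpu": 4000, "pod_cpu_requests": [1000, 1000]},
]

print(nodes_that_fit(1000, nodes))  # ['node-b']
print(nodes_that_fit(2500, nodes))  # [] -> the pod cannot be placed anywhere
```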
There are a number of Kubernetes features and open source tools that can help you manage and optimize resource usage.
One handy way of managing Kubernetes resources is to use resource requests and limits. Resource requests and limits allow you to allocate discrete bits of compute and memory to each container.
Resource requests are the amount of resources being requested for a container. Kubernetes will ensure that the requested amount of resources is reserved for that specific container. Resource limits, on the other hand, are the maximum amount of resources that can be consumed by the container. Container-level resource requests and limits add up to pod resource requests and limits.
Decisions about resource requests and limits are hard to make in the absence of any historical data about the resource usage patterns of containers. Manual allocation of resource requests and limits can lead to bloated pods that use only a fraction of the resources allocated to them.
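As a rough illustration of how container-level values roll up to the pod, the sketch below sums requests and limits across a pod's containers. The field names mirror the `resources` stanza of a pod manifest, but the values are plain numbers (CPU in millicores, memory in MiB), the container names are hypothetical, and init containers are left out to keep things short.

```python
# Field names mirror a pod manifest's `resources` stanza; values are numeric
# (CPU in millicores, memory in MiB) to keep the sketch short.
pod_containers = [
    {"name": "app", "resources": {
        "requests": {"cpu": 250, "memory": 128},
        "limits": {"cpu": 500, "memory": 256}}},
    {"name": "sidecar", "resources": {
        "requests": {"cpu": 100, "memory": 64},
        "limits": {"cpu": 200, "memory": 128}}},
]


def pod_total(containers, kind):
    """Sum the "requests" or "limits" of every container in the pod."""
    totals = {"cpu": 0, "memory": 0}
    for container in containers:
        for resource, amount in container["resources"][kind].items():
            totals[resource] += amount
    return totals


print(pod_total(pod_containers, "requests"))  # {'cpu': 350, 'memory': 192}
print(pod_total(pod_containers, "limits"))    # {'cpu': 700, 'memory': 384}
```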
One way to avoid bloated pods with a larger resource footprint than required is to set resource requests and limits on a namespace level. Namespaces are virtual partitions of your Kubernetes clusters that are allocated to specific services and teams.
Resource quotas allow you to cap the combined resource requests and limits of all the containers inside a namespace.
You can also control the maximum amount of resources that can be allocated to individual containers inside a namespace, using a limit range (LimitRange).
Resource quotas and limit ranges ultimately run into the same problem: resources have to be allocated using a combination of guesswork and approximations.
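The two checks can be pictured with a small sketch. It is only an approximation of what the cluster enforces when pods are created, and the function names, the namespace figures and the units (CPU in millicores, memory in MiB) are assumptions made for the example.

```python
def fits_quota(new_requests, used, hard):
    """True if adding new_requests keeps namespace usage within the quota."""
    return all(
        used.get(resource, 0) + amount <= hard.get(resource, float("inf"))
        for resource, amount in new_requests.items()
    )


def exceeds_limit_range(container_limits, max_per_container):
    """Return the resources whose container limit is above the namespace maximum."""
    return [
        resource
        for resource, amount in container_limits.items()
        if amount > max_per_container.get(resource, float("inf"))
    ]


# Hypothetical namespace state.
quota_hard = {"cpu": 8000, "memory": 16384}
quota_used = {"cpu": 7000, "memory": 8192}
limit_range_max = {"cpu": 2000, "memory": 4096}

print(fits_quota({"cpu": 1000, "memory": 1024}, quota_used, quota_hard))  # True
print(fits_quota({"cpu": 1500, "memory": 1024}, quota_used, quota_hard))  # False
print(exceeds_limit_range({"cpu": 2500, "memory": 1024}, limit_range_max))  # ['cpu']
```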
The cluster autoscaler is an open source tool that can be used to resize individual clusters, based on resource requirements and usage. The cluster autoscaler adds nodes to a cluster whenever a pod fails to run on that cluster due to resource deficiencies. It can also remove nodes whenever they are underutilized.
Horizontal pod autoscaling (HPA) is a Kubernetes feature that automatically adds or removes pod replicas based on resource utilization metrics. It works with workload controllers such as replica sets, deployments and replication controllers. HPA makes scaling decisions based on Kubernetes metrics like CPU as well as custom-defined metrics.
Kubernetes also introduced vertical pod autoscaling (VPA). The vertical pod autoscaler (which was still in alpha at the time of writing) has two functions: automating the process of reserving pod resources and improving cluster resource utilization.
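The Kubernetes documentation describes the core of the HPA calculation as desired replicas = ceil(current replicas × current metric value / target metric value). The sketch below applies that formula; the function name is ours, and the 0.1 tolerance reflects the documented default for skipping small adjustments, which a real cluster may configure differently.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Apply the documented HPA ratio formula to a single metric."""
    ratio = current_metric / target_metric
    # Skip scaling when the metric is already close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Four replicas with a target of 60% average CPU utilization.
print(desired_replicas(4, 90, 60))  # 6 -> scale out
print(desired_replicas(4, 30, 60))  # 2 -> scale in
print(desired_replicas(4, 63, 60))  # 4 -> within tolerance, no change
```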
VPA monitors pod utilization and OOM (out of memory) events and comes up with recommended resource request amounts for individual pods. Recommended resource requests can then be applied to newly created pods.
Updates can also be applied to already existing pods. To do this, however, the pods have to be terminated first and then restarted with the updated resource requests. It is important to note that VPA only controls the resource requests and sets the limits to infinity.
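To give a feel for how a request recommendation could be derived from observed usage, here is a deliberately simplified sketch. It is not VPA's actual algorithm: it takes a high percentile of historical usage samples and adds a safety margin, and the function name, the 90th percentile, the 15% margin and the sample values are all assumptions chosen for illustration.

```python
def recommend_request(usage_samples, percentile=90, margin_percent=15):
    """Suggest a request: a high percentile of past usage plus a safety margin."""
    ordered = sorted(usage_samples)
    # Nearest-rank percentile using integer arithmetic (rounds the rank up).
    rank = max(1, (percentile * len(ordered) + 99) // 100)
    base = ordered[rank - 1]
    # Add the margin and round up to a whole unit.
    return (base * (100 + margin_percent) + 99) // 100


# Hypothetical CPU usage samples for one container, in millicores.
cpu_samples = [120, 150, 135, 400, 160, 145, 155, 130, 140, 170]

print(recommend_request(cpu_samples))  # 196
```

Using a percentile instead of the maximum keeps the one-off spike to 400 millicores from inflating the request, which is the kind of bloat manual allocation tends to produce.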
Kubernetes resource usage is a prominent feature of production Kubernetes implementations, not least because of its large cost footprint. This means that managing Kubernetes resources to ensure optimal resource usage ranks pretty high on the “to-do” list of ops teams. Kubernetes built-in features and open source tools do help you get started with resource utilization. However, enterprise setups with hybrid environments made up of multiple public and private clouds and on-premises infrastructure need a solution that covers a bigger surface area with more granular resource usage monitoring and real-time optimization. Replex provides all this plus complete control over infrastructure governance and cost.