As of December 2017, AWS accounted for 57% of the total Kubernetes workloads in operation. This is not surprising, given that AWS is the leading public cloud service provider and has built an ecosystem around its services. What may come as a surprise to most CFOs and IT managers is the AWS bill they are handed at the end of each month.
Actual Kubernetes costs on AWS can vary significantly from estimated amounts, leaving a lot of room for cost reductions to be made. Why does this happen and what can you do about it? Read on to learn about the 7 things that can help optimize your AWS Kubernetes costs.
Kubernetes deployments on AWS abstract resources from the underlying EC2 compute infrastructure. In a production environment, a Kubernetes node will most likely refer to an AWS instance. Choosing the right AWS instance to run Kubernetes clusters has a direct impact on AWS Kubernetes costs.
Instances come in all shapes and sizes, with different combinations of memory and compute resources. The same is true of Kubernetes pods, with differing resource requirements. The challenge with controlling AWS Kubernetes costs is to ensure pods stack efficiently onto your AWS instances.
A helpful way of thinking about this is in terms of the video game Tetris. In this analogy, the screen represents an AWS instance and the blocks represent Kubernetes pods. One crucial difference from a game of Tetris, however, is that you can change the screen size, i.e. choose another AWS instance that more closely matches your pod size. You can also change the block (pod) size, but we will get into that later.
The size, number and historical resource usage patterns of pods all factor into the choice of an appropriate AWS instance. Applications can also be either memory-intensive or CPU-intensive, which affects the type of instance that should be chosen.
Ultimately, matching the resource usage of your Kubernetes pods to the total CPU and memory available on the AWS instances they run on is crucial to optimizing resource usage and reducing AWS costs.
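To make the Tetris analogy concrete, here is a minimal Python sketch of how well one pod shape stacks onto two instance shapes. The pod and instance sizes are hypothetical numbers chosen for illustration, and the calculation ignores the capacity a real node reserves for the operating system and the kubelet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    """CPU (in vCPUs) and memory (in GiB) of a pod request or an instance."""
    cpu: float
    memory_gib: float


def pods_per_instance(pod: Shape, instance: Shape) -> int:
    """Number of identical pods that fit, limited by the scarcer resource."""
    return int(min(instance.cpu // pod.cpu, instance.memory_gib // pod.memory_gib))


def stranded(pod: Shape, instance: Shape) -> Shape:
    """Capacity left unused once as many pods as possible are packed."""
    n = pods_per_instance(pod, instance)
    return Shape(instance.cpu - n * pod.cpu, instance.memory_gib - n * pod.memory_gib)


# Hypothetical shapes, for illustration only.
pod = Shape(cpu=0.5, memory_gib=4.0)            # a memory-heavy pod
general_purpose = Shape(cpu=4, memory_gib=16)   # 1:4 CPU-to-memory ratio
memory_optimized = Shape(cpu=4, memory_gib=32)  # 1:8 CPU-to-memory ratio

for name, inst in [("general_purpose", general_purpose),
                   ("memory_optimized", memory_optimized)]:
    print(name, pods_per_instance(pod, inst), stranded(pod, inst))
```

With these assumed numbers, the general-purpose shape runs out of memory after four pods and strands half of its CPU, while the memory-optimized shape fits eight pods with nothing left over. Whether the larger instance is actually cheaper per pod depends on its price, which this sketch does not model.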
AWS instances also come in several billing profiles: on-demand, reserved and spot instances. Spot instances are the cheapest but can be terminated with two minutes' notice, whereas on-demand instances are the costliest but provide the highest level of reliability. You can also reserve instances for a fixed period to take advantage of lower costs. The choice of billing profile therefore directly impacts the cost of running Kubernetes on AWS.
Reserved instances can reduce AWS instance costs by up to 75%, right off the bat.
Spot instances are usually recommended for non-mission-critical workloads with a high tolerance for interruptions. Recently, however, intelligent workload management tools have enabled companies to leverage spot instances for mission-critical applications too.
One such open source tool is the K8s spot rescheduler, which allows Kubernetes deployments to run on AWS spot instances. It relies on an AWS Auto Scaling group backed by on-demand instances and managed by the Kubernetes cluster autoscaler. A related tool, the K8s spot instance termination handler, monitors AWS spot instances for termination notices. Whenever spot instances are about to be terminated, the pods running on them are seamlessly transferred to on-demand instances. Once the spot instances come back up, the rescheduler moves the pods back to the spot instances, allowing the on-demand instances to be scaled down by the cluster autoscaler.
Spot instances can save you up to 90% of your regular EC2 on-demand instance costs.
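As a rough illustration of what these discounts mean for a monthly bill, the Python sketch below applies the upper-bound figures quoted above (75% for reserved, 90% for spot) to a small fleet. The hourly price and node count are hypothetical, and real discounts vary by instance type, region, commitment term and spot market conditions.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def monthly_cost(hourly_on_demand: float, nodes: int, discount: float = 0.0) -> float:
    """Monthly cost of a fleet, given an on-demand hourly price and a discount in [0, 1]."""
    return round(hourly_on_demand * (1 - discount) * HOURS_PER_MONTH * nodes, 2)


# Hypothetical fleet: 10 nodes at an assumed on-demand price of $0.10 per hour.
PRICE = 0.10
NODES = 10

on_demand = monthly_cost(PRICE, NODES)
reserved = monthly_cost(PRICE, NODES, discount=0.75)  # best case quoted above
spot = monthly_cost(PRICE, NODES, discount=0.90)      # best case quoted above

print(f"on-demand: ${on_demand:.2f}")
print(f"reserved:  ${reserved:.2f}")
print(f"spot:      ${spot:.2f}")
```

These are best-case numbers; a realistic estimate would mix billing profiles and use the discounts actually available for the chosen instance types.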
Regardless of whether you choose on-demand, reserved or spot instances to run your Kubernetes clusters, terminating underutilized nodes is essential to cost control. AWS EC2 instances are billed based on the amount of time they are provisioned. Underutilized instances have a much larger resource footprint than required, but they still incur the full cost of running an instance.
The K8s AWS autoscaler from Onfido is one tool that can help terminate underutilized instances. It allows you to define an AWS Auto Scaling group and set a benchmark for minimum allowed resource requests.
The autoscaler loops through the Auto Scaling group every 300 seconds and detaches, drains and terminates all nodes where the current average resource usage is less than the benchmark. It also has a nice Slack hook that will notify the Slack channel of your choice whenever a scaling event happens.
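The selection step of such a loop can be sketched in a few lines of Python. This is not the Onfido tool's code: the `NodeUsage` type, the way the average is computed and the sample numbers are all assumptions made for illustration, and the detach, drain and terminate steps are deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeUsage:
    """Hypothetical per-node snapshot: fraction of CPU and memory in use (0.0 to 1.0)."""
    name: str
    cpu_fraction: float
    memory_fraction: float

    @property
    def average(self) -> float:
        # Assumption: "average resource usage" is the mean of CPU and memory fractions.
        return (self.cpu_fraction + self.memory_fraction) / 2


def select_for_termination(nodes: List[NodeUsage], benchmark: float) -> List[str]:
    """Return the names of nodes whose average usage is below the benchmark."""
    return [node.name for node in nodes if node.average < benchmark]


# Hypothetical snapshot of a three-node Auto Scaling group.
snapshot = [
    NodeUsage("node-a", cpu_fraction=0.10, memory_fraction=0.20),
    NodeUsage("node-b", cpu_fraction=0.70, memory_fraction=0.60),
    NodeUsage("node-c", cpu_fraction=0.05, memory_fraction=0.45),
]

print(select_for_termination(snapshot, benchmark=0.30))
```

A real loop would take such a snapshot on every pass (every 300 seconds in the tool described above) and would also need to check that the evicted pods fit on the remaining nodes before terminating anything.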
Pod size is also an important element in controlling AWS Kubernetes costs. Kubernetes allows DevOps teams to set pod sizes by reserving a minimum amount of resources for each container.
Reserving resources introduces an element of uncertainty, since the resource consumption of pods tends to fluctuate over time. Additionally, these decisions are hard to make in the absence of historical data on resource usage. Developers usually counter this by reserving more resources than are required, which ultimately leads to bloated pods and stranded resources.
Monitoring the historical resource usage of pods and ensuring they are allocated the appropriate amount of resources can lead to significant cost reductions for the underlying AWS infrastructure.
Kube resource explorer is an open source tool that allows you to monitor historical as well as average resource usage. Ops teams can use the information collected to make informed decisions about resizing pods and updating resource requests.
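One simple way to turn historical usage into a resource request is to take a high percentile of the observed samples and add some headroom. The Python sketch below shows that heuristic; the 95th percentile, the 15% headroom and the sample values are assumptions chosen for illustration, not a recommendation from any particular tool.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sequence of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def suggested_request(samples: Sequence[float], pct: float = 95, headroom_pct: int = 15) -> int:
    """Suggested request: the given percentile of usage plus a headroom percentage."""
    return math.ceil(percentile(samples, pct) * (100 + headroom_pct) / 100)


# Hypothetical memory usage samples for one container, in MiB.
memory_mib = [210, 220, 250, 240, 230, 260, 300, 255, 245, 235]
current_request_mib = 512  # hypothetical value set by the developer

suggestion = suggested_request(memory_mib)
print(f"suggested request: {suggestion} MiB")
print(f"stranded by current request: {current_request_mib - suggestion} MiB")
```

The more history the samples cover, the more trustworthy the suggestion; a short window can miss weekly or monthly peaks.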
Unpredictable workloads and applications with a fluctuating traffic profile can result in pods sitting idle at certain times. Kubernetes test environments can also fill up quickly with idle pods if you are not careful.
The horizontal pod autoscaler is a Kubernetes feature that allows you to scale pod replicas that are part of a deployment. It will also reduce the number of pods in response to utilization metrics. Since pods consume resources, having a leaner deployment more attuned to real-time usage can help make an impact on AWS Kubernetes costs.
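The core scaling rule documented for the horizontal pod autoscaler is `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. The Python sketch below applies that rule and clamps the result to minimum and maximum replica counts. It is a simplification: the real controller also applies a tolerance band and stabilization behaviour, which are omitted here, and the function and parameter names are our own.

```python
import math


def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Replica count from the documented HPA ratio rule, clamped to [min, max]."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


# Ten replicas at 30% CPU utilization against a 60% target: scale in.
print(desired_replicas(10, 30, 60))
# Four replicas at 90% CPU utilization against a 60% target: scale out.
print(desired_replicas(4, 90, 60))
```

Scaling in from ten replicas to five in the first case is where the cost saving comes from, provided the freed capacity lets underutilized nodes be removed as well.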
A related concept to Kubernetes requests and limits is quality of service (QoS). Kubernetes classifies pods into different QoS classes depending on the resource requests and limits set on their containers. There are three QoS classes in Kubernetes: Guaranteed, Burstable and BestEffort.
BestEffort pods are ones that do not have any resource requests or limits defined. These pods are first in line to be terminated whenever the node runs out of memory. However, one interesting aspect of BestEffort pods is that they can use any amount of free memory on the node.
Kubernetes also performs automatic bin packing and recommends mixing critical and best-effort workloads to improve utilization. BestEffort pods could be a good fit for non-critical stateless applications that have a high tolerance for disruptions. Even though this does not result in an outright AWS cost reduction, it eats into stranded memory and can help bring actual costs in line with expected costs.
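The classification rules can be sketched as a small Python function. This is a simplified illustration rather than the Kubernetes implementation: containers are represented as plain dictionaries, and quantities are compared as strings, so equivalent values written differently (for example `500m` and `0.5`) would be treated as different.

```python
from typing import Dict, List

Container = Dict[str, Dict[str, str]]  # e.g. {"requests": {...}, "limits": {...}}


def qos_class(containers: List[Container]) -> str:
    """Classify a pod as Guaranteed, Burstable or BestEffort from its containers."""
    any_set = False
    all_guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # When a request is omitted, Kubernetes defaults it to the limit.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"


print(qos_class([{}]))
print(qos_class([{"limits": {"cpu": "500m", "memory": "256Mi"}}]))
print(qos_class([{"requests": {"memory": "128Mi"}}]))
```

Note that a single container without matching requests and limits is enough to move the whole pod out of the Guaranteed class.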
Tagging resources is good practice in any environment, be it cloud, on-premises or containers. In enterprise Kubernetes environments, with multiple test, staging and production environments, some resources are bound to slip under the radar. These resources are a constant drain on AWS costs while serving no purpose. Tagging allows companies to ensure resources do not end up in resource purgatory.
AWS has a pretty extensive tagging mechanism that can be used to tag resources allocated to Kubernetes. These tags allow you to keep track of resources, resource owners and utilization. Efficient tagging helps you quickly identify the owners of idle or underutilized resources and terminate those resources. Once these tags are activated in the AWS Billing and Cost Management console, you will also be able to allocate costs and access cost breakdowns for different resources.
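The kind of breakdown that tags make possible can be sketched in plain Python. The resource records, the `owner` tag key and the cost figures below are hypothetical; in practice this data would come from an AWS cost report rather than a hard-coded list.

```python
from collections import defaultdict
from typing import Dict, List


def cost_by_tag(resources: List[dict], tag_key: str) -> Dict[str, float]:
    """Sum monthly cost per value of a tag; resources missing the tag go to 'untagged'."""
    totals: Dict[str, float] = defaultdict(float)
    for resource in resources:
        owner = resource["tags"].get(tag_key, "untagged")
        totals[owner] += resource["monthly_cost"]
    return dict(totals)


# Hypothetical resources with made-up identifiers and costs.
resources = [
    {"id": "i-aaa", "tags": {"owner": "team-a", "env": "prod"}, "monthly_cost": 120.0},
    {"id": "i-bbb", "tags": {"owner": "team-b", "env": "staging"}, "monthly_cost": 80.0},
    {"id": "i-ccc", "tags": {}, "monthly_cost": 40.0},
    {"id": "i-ddd", "tags": {"owner": "team-a", "env": "test"}, "monthly_cost": 60.0},
]

print(cost_by_tag(resources, "owner"))
```

The `untagged` bucket is the one to watch: it is where forgotten test and staging resources tend to accumulate.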
The screenshot below is an optimization report Replex generated for one of our customers. The customer was leveraging a mixed fleet of 2500 reserved and spot instances for their Kubernetes deployment. Replex.io’s proprietary algorithms compiled and analyzed metrics from AWS, third-party monitoring tools and Kubernetes itself to recommend a more optimized AWS instance fleet. We also identified and corrected resource inefficiencies, resulting in a cost reduction of 30%.
Request a quick 20-minute demo to learn how best to leverage AWS instances for your Kubernetes clusters and make infrastructure cost savings of up to 30% using Replex.io.