This is Part 4 of our Introduction to FinOps for Kubernetes: Challenges and Best Practices article series. In Part 1 we outlined some of the core challenges associated with implementing FinOps processes for cloud native Kubernetes environments. Parts 2 and 3 outlined real-world FinOps best practices that can be employed in cloud native Kubernetes environments.
This instalment extends the best practices list from the previous editions and identifies how best to overcome the challenges that Kubernetes throws up in a FinOps context.
Despite the best efforts of FinOps teams, some proportion of Kubernetes cluster costs will inevitably end up unallocated. These are usually leftover costs incurred when clusters or nodes are running but their capacity is not being used by any workload. Most organizations will see some idle or unallocated costs over the course of their cluster lifecycle. Properly allocating these costs is crucial to the integrity and accuracy of the overall K8s FinOps process.
A best practice when allocating idle costs is to assign them to the central team that operates or manages the cluster. Since that team already has a defined budget, charging idle costs to it gives it a direct incentive to reduce waste from idle resources.
Another way of dealing with idle costs for a cluster is to allocate them to the developer teams running containers on that cluster. A best practice here is to allocate idle costs in proportion to usage: for example, a team that uses 40% of the cluster's resources is allocated 40% of that cluster's idle costs.
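Proportional allocation is simple arithmetic. The Python sketch below assumes each team's usage has already been reduced to a single comparable number (for example, a cost-weighted blend of its CPU and memory consumption); the team names and dollar figures are hypothetical.

```python
from typing import Dict


def allocate_idle_cost(idle_cost: float, usage_by_team: Dict[str, float]) -> Dict[str, float]:
    """Split a cluster's idle cost across teams in proportion to their resource usage."""
    total_usage = sum(usage_by_team.values())
    if total_usage <= 0:
        raise ValueError("total usage must be positive to allocate proportionally")
    # Each team pays the same fraction of the idle cost as its fraction of total usage.
    return {team: idle_cost * usage / total_usage for team, usage in usage_by_team.items()}


if __name__ == "__main__":
    # Hypothetical month: $1,000 of idle cost; usage expressed as percent of the cluster.
    shares = allocate_idle_cost(1000.0, {"payments": 40.0, "search": 35.0, "web": 25.0})
    for team, cost in shares.items():
        print(f"{team}: ${cost:.2f}")
```

Because the shares always sum to the full idle cost, nothing is left unallocated, which is the point of the exercise.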
Containers and pods are base Kubernetes abstractions that consume underlying cloud resources like vCPU and memory. Kubernetes allows developers and engineers to set resource requests and limits for these containers. Most environments also provision pods from pre-configured YAML files whose requests and limits are not necessarily optimized for actual resource usage.
A best practice in this context is to rightsize containers and pods to ensure efficient resource usage and utilization of the underlying cloud VMs.
Here is a screenshot of pod right sizing recommendations from the Replex UI:
As can be seen from the screenshot above, CPU is over-provisioned while RAM is under-provisioned. The Replex system therefore recommends reducing CPU requests to cut waste while increasing memory requests to avoid OOM (out-of-memory) events.
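A common way to derive recommendations like these is to size each request from observed usage plus a safety margin. The Python sketch below is that generic heuristic, not Replex's algorithm: the percentiles, the 25% headroom and the usage samples are all hypothetical. CPU is sized at a high percentile because excess CPU demand is throttled, while memory is sized at the observed peak because exceeding the memory limit gets the container OOM-killed.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sample (0 < pct <= 100)."""
    if not samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommend_request(samples: Sequence[float], pct: float, headroom: float) -> int:
    """Observed usage at the given percentile plus a safety margin, rounded up."""
    return math.ceil(percentile(samples, pct) * (1 + headroom))


def verdict(current: int, recommended: int) -> str:
    if recommended < current:
        return "over-provisioned: reduce request"
    if recommended > current:
        return "under-provisioned: increase request"
    return "right-sized"


if __name__ == "__main__":
    # Hypothetical usage samples for one container.
    cpu_millicores = [120, 150, 180, 200, 210, 190, 170, 160, 140, 130]
    mem_mib = [410, 450, 480, 500, 520, 495, 470, 460, 440, 430]

    cpu_rec = recommend_request(cpu_millicores, pct=95, headroom=0.25)
    mem_rec = recommend_request(mem_mib, pct=100, headroom=0.25)

    print(f"cpu: request 1000m -> {cpu_rec}m ({verdict(1000, cpu_rec)})")
    print(f"memory: request 256Mi -> {mem_rec}Mi ({verdict(256, mem_rec)})")
```

With these samples the sketch reproduces the pattern in the screenshot: the CPU request shrinks while the memory request grows.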
The density with which pods are packed onto nodes affects node utilization and, in turn, the amount of idle resources on each node. The more tightly pods are packed onto nodes, the fewer nodes need to run. At the cluster level, intelligently relocating pods can free up nodes with low utilization so that they can be safely turned off.
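To see how packing density translates into node count, the Python sketch below applies first-fit decreasing, a classic bin-packing heuristic, to pod requests on identical nodes. The node size and pod requests are hypothetical, and the heuristic is an illustration chosen here, not how any particular tool or the Kubernetes scheduler places pods.

```python
from typing import List, Tuple

Pod = Tuple[str, int, int]  # (name, cpu request in millicores, memory request in MiB)


def first_fit_decreasing(pods: List[Pod], node_cpu: int, node_mem: int) -> List[List[str]]:
    """Pack pods onto identical nodes; returns the pod names placed on each node."""
    # Place the largest pods first (by their larger fractional demand) to limit fragmentation.
    ordered = sorted(pods, key=lambda p: max(p[1] / node_cpu, p[2] / node_mem), reverse=True)
    nodes: List[List[str]] = []
    free: List[List[int]] = []  # remaining [cpu, mem] on each node
    for name, cpu, mem in ordered:
        if cpu > node_cpu or mem > node_mem:
            raise ValueError(f"pod {name} does not fit on an empty node")
        for i, (free_cpu, free_mem) in enumerate(free):
            if cpu <= free_cpu and mem <= free_mem:
                nodes[i].append(name)
                free[i][0] -= cpu
                free[i][1] -= mem
                break
        else:
            # No existing node has room: open a new one.
            nodes.append([name])
            free.append([node_cpu - cpu, node_mem - mem])
    return nodes


if __name__ == "__main__":
    pods = [("api", 1500, 4096), ("web", 1000, 2048), ("search", 2000, 6144),
            ("cron", 500, 1024), ("cache", 1000, 4096), ("worker", 1500, 2048),
            ("metrics", 500, 2048)]
    placement = first_fit_decreasing(pods, node_cpu=4000, node_mem=16384)
    print(f"{len(placement)} nodes needed")
    for i, names in enumerate(placement, start=1):
        print(f"node {i}: {', '.join(names)}")
```

This is an offline estimate. By default, kube-scheduler's NodeResourcesFit plugin scores nodes with the LeastAllocated strategy, which tends to spread pods; the MostAllocated strategy is the built-in option that favors packing.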
A best practice in this context is to regularly monitor cluster utilization and flag any underutilized nodes. Admins can then tweak the scheduler or manually relocate pods to free up those nodes for decommissioning.
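The flagging step can be sketched as a threshold check. The Python example below assumes utilization is measured as summed pod requests divided by node allocatable capacity, and uses a hypothetical 50% threshold; a node is flagged only when both CPU and memory are below it, since a node that is busy on either resource cannot simply be emptied. The node figures are made up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Node:
    name: str
    cpu_allocatable: int  # millicores
    cpu_requested: int    # millicores, summed over scheduled pods
    mem_allocatable: int  # MiB
    mem_requested: int    # MiB, summed over scheduled pods


def utilization(node: Node) -> Tuple[float, float]:
    return node.cpu_requested / node.cpu_allocatable, node.mem_requested / node.mem_allocatable


def cluster_utilization(nodes: List[Node]) -> Tuple[float, float]:
    """Cluster-wide CPU and memory utilization across all nodes."""
    cpu = sum(n.cpu_requested for n in nodes) / sum(n.cpu_allocatable for n in nodes)
    mem = sum(n.mem_requested for n in nodes) / sum(n.mem_allocatable for n in nodes)
    return cpu, mem


def flag_underutilized(nodes: List[Node], threshold: float = 0.5) -> List[str]:
    """Names of nodes whose CPU and memory utilization are both below the threshold."""
    flagged = []
    for node in nodes:
        cpu, mem = utilization(node)
        if cpu < threshold and mem < threshold:
            flagged.append(node.name)
    return flagged


if __name__ == "__main__":
    nodes = [
        Node("node-a", 4000, 3400, 16384, 12000),
        Node("node-b", 4000, 900, 16384, 3000),
        Node("node-c", 4000, 1200, 16384, 11000),  # low CPU, but memory is busy
    ]
    cpu, mem = cluster_utilization(nodes)
    print(f"cluster CPU {cpu:.0%}, memory {mem:.0%}")
    print("flagged:", flag_underutilized(nodes))
```

Requests are used here because they determine where pods can be scheduled; comparing them with measured usage is what reveals over-provisioning.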
Here is a screenshot of cluster resource utilization from the Replex UI:
The dashboard tracks resource efficiency for CPU, RAM and disk, and flags idle resources for all three.
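Another best practice is to deploy the Horizontal Pod Autoscaler (HPA). HPA automatically scales the number of pods in a workload such as a Deployment based on how an observed metric, typically CPU utilization, compares with a defined target. When utilization rises above the target HPA adds pods, and when it falls below the target HPA reduces the number of pods, freeing node capacity that can then be consolidated or released.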
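The core of the HPA calculation, as described in the Kubernetes documentation, is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change when the ratio is within a tolerance of 1.0 (0.1 by default). The Python sketch below implements only that formula plus min/max clamping; it deliberately ignores the scale-down stabilization window, pod readiness and missing-metric handling that the real controller applies. The replica bounds are hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Simplified HPA replica calculation for a single utilization metric."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: leave replicas alone
    else:
        desired = math.ceil(current_replicas * ratio)
    # Respect the configured bounds on the workload's replica count.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 replicas averaging 30% CPU against a 60% target: half the pods suffice.
    print(desired_replicas(4, 30, 60))
    # 4 replicas averaging 90% CPU against a 60% target: scale up.
    print(desired_replicas(4, 90, 60))
```

The scale-down case is where the FinOps benefit comes from: fewer replicas means fewer requested resources, which the node-level practices that follow can then turn into fewer nodes.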
Once pods have been rightsized and efficiently packed onto nodes, any nodes that remain underutilized are potential candidates for decommissioning.
A best practice is to flag these nodes and decommission them only after notifying the team that owns them. This step should be codified as part of the broader K8s FinOps framework, since some of those nodes might be running mission-critical applications or hosting pods that need significant spare capacity to handle spikes in usage.
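The notify-before-decommission step can be sketched as grouping low-utilization nodes by owner while skipping nodes marked as protected. In the Python example below, the label keys, the 30% threshold and the node data are hypothetical; substitute whatever ownership and protection conventions your organization already uses.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical label keys; replace with your organization's conventions.
OWNER_LABEL = "team"
PROTECTED_LABEL = "finops.example.com/protected"


def decommission_notices(nodes: List[dict], threshold: float = 0.3) -> Dict[str, List[str]]:
    """Group low-utilization, unprotected nodes by owning team for notification."""
    notices: Dict[str, List[str]] = defaultdict(list)
    for node in nodes:
        labels = node.get("labels", {})
        if labels.get(PROTECTED_LABEL) == "true":
            continue  # e.g. mission-critical or reserved burst capacity: never propose
        if node["utilization"] < threshold:
            notices[labels.get(OWNER_LABEL, "unowned")].append(node["name"])
    return dict(notices)


if __name__ == "__main__":
    nodes = [
        {"name": "node-1", "utilization": 0.12, "labels": {"team": "payments"}},
        {"name": "node-2", "utilization": 0.08,
         "labels": {"team": "payments", PROTECTED_LABEL: "true"}},
        {"name": "node-3", "utilization": 0.25, "labels": {"team": "search"}},
        {"name": "node-4", "utilization": 0.70, "labels": {"team": "web"}},
        {"name": "node-5", "utilization": 0.10},
    ]
    for team, names in decommission_notices(nodes).items():
        print(f"notify {team}: {', '.join(names)}")
```

Once the owning team signs off, a node would typically be cordoned and drained with `kubectl drain <node>` before the underlying instance is terminated.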
Here is a screenshot from the Replex Nodes dashboard:
The Nodes dashboard lists the instance type of each node, the cluster it belongs to and its CPU and memory utilization. Using this dashboard, K8s admins can quickly identify underutilized nodes and notify the relevant team to initiate the decommissioning process.
Most cloud providers group instances according to the workloads or applications they are best suited to run. For example, AWS offers general purpose, compute optimized, memory optimized and storage optimized instance families. As the names suggest, the specialized families are optimized for compute-, memory- or storage-intensive applications.
A best practice in this context is to ensure that the CPU-to-memory ratio of K8s workloads matches that of the underlying nodes. When the ratios diverge, nodes fill up on one resource while the other sits idle. A closer match ensures efficient utilization of node resources, reduces waste and, in turn, lowers costs.
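The mismatch can be quantified. The Python sketch below computes the aggregate memory-per-vCPU ratio of a set of workload requests and, for each candidate instance family, the share of the non-binding resource left stranded once nodes fill up on the binding one. The family ratios come from representative AWS sizes (c5.large: 2 vCPU/4 GiB, m5.large: 2 vCPU/8 GiB, r5.large: 2 vCPU/16 GiB); the workload requests are hypothetical, and the model ignores system-reserved capacity, DaemonSet overhead and pricing.

```python
from typing import Dict, List, Tuple

# GiB of memory per vCPU for representative AWS instance families.
FAMILY_GIB_PER_VCPU: Dict[str, float] = {
    "compute optimized (c5)": 2.0,
    "general purpose (m5)": 4.0,
    "memory optimized (r5)": 8.0,
}


def workload_ratio(requests: List[Tuple[float, float]]) -> float:
    """Aggregate GiB of memory requested per vCPU requested, from (vCPU, GiB) pairs."""
    total_cpu = sum(cpu for cpu, _ in requests)
    total_mem = sum(mem for _, mem in requests)
    return total_mem / total_cpu


def stranded_fraction(workload: float, family: float) -> float:
    """Share of the non-binding resource left idle when nodes fill on the binding one."""
    return 1 - min(workload, family) / max(workload, family)


def best_family(ratio: float) -> str:
    """Instance family that strands the least capacity for this workload ratio."""
    return min(FAMILY_GIB_PER_VCPU, key=lambda f: stranded_fraction(ratio, FAMILY_GIB_PER_VCPU[f]))


if __name__ == "__main__":
    # Hypothetical (vCPU, GiB) requests for the workloads on one cluster.
    requests = [(0.5, 3.0), (1.0, 6.0), (0.25, 2.0), (2.0, 14.0)]
    ratio = workload_ratio(requests)
    print(f"workload needs {ratio:.2f} GiB per vCPU")
    for family, gib in FAMILY_GIB_PER_VCPU.items():
        print(f"{family}: {stranded_fraction(ratio, gib):.0%} stranded")
    print("closest match:", best_family(ratio))
```

For this memory-heavy mix, general purpose nodes would run out of memory with a large share of their CPU still idle, while memory optimized nodes strand far less.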
Interested in learning more about the FinOps framework? Download our detailed guide to Cloud FinOps for FinOps teams, executives, DevOps, engineering, finance and procurement.