Public cloud adoption is at a historic high. According to a 2019 survey by RightScale, 91% of respondents use a public cloud provider. This is not surprising given the flexibility, agility and scalability that public cloud brings to enterprise workloads. What is surprising, however, is that only 10% of on-premises workloads have moved to the cloud, and nearly 80% of those have since moved back.
Given these numbers, public cloud providers have started to position Kubernetes offerings specifically for on-premises deployments. Google's offering is GKE on-premises, which is part of the larger Anthos set of tools. VMware, Pivotal and Rancher also have solutions targeted specifically at on-premises Kubernetes.
The increased interest in on-premises Kubernetes deployments is also supported by our conversations with customers. A large share of our customers approach us exclusively for cost allocation, governance and efficiency monitoring for on-premises Kubernetes environments.
In this article we will outline the challenges of cost allocation, governance, future proofing, forecasting and performance for on-premises Kubernetes deployments. We will also review the Replex solution, dive into how it addresses those challenges and provide a walkthrough of how it works on-premises.
The increased uptake in on-premises Kubernetes is a direct outcome of its inherent portability. Containerized workloads are no longer locked into proprietary VM formats and are freed from OS dependencies. Abstracting the underlying infrastructure layer into nodes and clusters also means that workloads can be moved seamlessly between various on-premises and cloud hosts.
Deploying a highly available, secure and scalable on-premises Kubernetes environment is a hard nut to crack. Besides the obvious setup and configuration involved in spinning up Kubernetes clusters, preparing the underlying server, networking and storage infrastructure also has a steep learning curve.
Once up and running, on-premises Kubernetes environments also throw up multiple day 2 challenges. These mostly concern the mechanics of operating Kubernetes clusters in production. They also include cost allocation, governance, efficiency and utilization, which are central to the security posture of the Kubernetes environment as well as to cost control, future proofing and forecasting.
Most of these challenges arise as a result of the shared resources model of Kubernetes. Kubernetes clusters are abstractions that pool together resources from the underlying on-premises infrastructure layer. A cluster is a collection of nodes, each of which represents an individual physical or virtual machine.
As these resources are shared among the workloads running on top of the cluster (which are themselves abstracted into containers and pods), it is often difficult to figure out how many resources individual workloads are consuming.
This lack of visibility extends to the individual teams, projects and clients that share cluster resources. It in turn translates into an inability to allocate costs: IT decision makers (ITDMs) have no complete view of how many resources individual teams or projects consume, or of each team's or project's share of the total on-premises Kubernetes costs.
Future proofing and forecasting are also important challenges that need to be addressed. Without visibility into the efficiency of the underlying infrastructure, ITDMs are left without a mechanism to forecast future requirements and plan accordingly. This also holds true for the individual environments (dev, staging, production) and teams that are sharing the on-premises Kubernetes environment.
ITDMs need insights into the efficiency of these teams and environments in order to make informed resource allocation and re-distribution decisions and to ensure optimal usage of the available resources. For example, efficiency is a pretty good indicator of whether teams or environments are over- or under-requesting resources. Once such teams or environments are identified, ITDMs can trim or increase the resources allocated to them.
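To make the idea concrete, here is a minimal sketch that treats efficiency as the ratio of resources used to resources requested. The record shape, the sample figures and the 0.5 and 0.9 thresholds are all assumptions made for illustration; they are not taken from Replex.

```python
from dataclasses import dataclass


@dataclass
class TeamUsage:
    """Aggregated CPU figures for one team (hypothetical record shape)."""
    team: str
    cpu_requested_cores: float  # sum of container CPU requests
    cpu_used_cores: float       # average observed CPU usage


def efficiency(usage: TeamUsage) -> float:
    """Return used / requested, or 0.0 when nothing was requested."""
    if usage.cpu_requested_cores <= 0:
        return 0.0
    return usage.cpu_used_cores / usage.cpu_requested_cores


def classify(usage: TeamUsage, low: float = 0.5, high: float = 0.9) -> str:
    """Label a team using illustrative thresholds (assumed, not Replex defaults)."""
    ratio = efficiency(usage)
    if ratio < low:
        return "over-requesting"   # far more requested than used
    if ratio > high:
        return "under-requesting"  # usage is close to or above requests
    return "ok"


# Made-up sample data.
teams = [
    TeamUsage("payments", cpu_requested_cores=16.0, cpu_used_cores=4.0),
    TeamUsage("search", cpu_requested_cores=8.0, cpu_used_cores=7.6),
    TeamUsage("web", cpu_requested_cores=10.0, cpu_used_cores=7.0),
]
report = {t.team: classify(t) for t in teams}
```

With these sample figures, a team that requests 16 cores and uses 4 is a candidate for trimming, while a team using 7.6 of 8 requested cores probably needs more headroom.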
Replex helps enterprises gain cost and performance insights into their on-premises Kubernetes environments. Let’s take a detailed look at each:
When deployed on-premises, the Replex solution helps enterprises allocate the costs of the underlying infrastructure, or the total costs of running their Kubernetes environments on-premises, to primitives like pods, containers, namespaces, jobs, StatefulSets, Deployments, Services and DaemonSets. Costs can also be allocated to custom organizational groupings like teams, business units, projects, environments (dev, staging, production), clients and departments.
ITDMs also gain granular insights into the efficiency of the clusters/underlying infrastructure, the Kubernetes primitives running on top as well as custom groupings like teams and applications. Once they have access to these insights, ITDMs can then make informed forecasting and capacity planning decisions to increase or decrease the resource footprint of the underlying infrastructure. They can also make informed re-allocation decisions by trimming or increasing the resources allocated to individual teams/projects based on efficiency, in the process controlling the costs of those teams.
For on-premises Kubernetes environments, the Replex solution can be deployed in a number of ways: via a Helm chart, a YAML file or kubectl. Installing the Helm chart will spin up the Replex agent as a Kubernetes deployment. The agent maintains one instance/deployment per cluster and requires Prometheus or another metrics provider to be pre-installed in the cluster.
Once installed, the Replex agent starts collecting metrics and forwards them to the Replex server. Cluster metrics collected include container CPU usage, container memory usage and container disk usage. The agent also collects cluster topology information from the Kubernetes API. Cluster topology includes information about the structure of the cluster as well as information about the Kubernetes primitives and objects running inside the cluster.
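As a rough illustration of what container-level metrics look like when Prometheus is the metrics provider, the sketch below builds instant-query URLs for the Prometheus HTTP API using the standard cAdvisor metric names. Which metrics and queries the Replex agent actually uses is not stated here, so the queries, the grouping labels and the Prometheus address are assumptions; label names can also differ between Kubernetes versions.

```python
from urllib.parse import urlencode

# PromQL over standard cAdvisor metrics. Whether the agent reads exactly
# these series is an assumption made for illustration.
QUERIES = {
    "cpu_cores": (
        "sum by (namespace, pod) "
        '(rate(container_cpu_usage_seconds_total{container!=""}[5m]))'
    ),
    "memory_bytes": (
        "sum by (namespace, pod) "
        '(container_memory_working_set_bytes{container!=""})'
    ),
    "disk_bytes": (
        "sum by (namespace, pod) "
        '(container_fs_usage_bytes{container!=""})'
    ),
}


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a URL for Prometheus' instant query endpoint (/api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Hypothetical in-cluster address; replace with your own Prometheus service.
PROMETHEUS_URL = "http://prometheus.monitoring.svc:9090"
urls = {name: instant_query_url(PROMETHEUS_URL, q) for name, q in QUERIES.items()}
```

Each URL can be fetched with any HTTP client; the response groups usage by namespace and pod, which is the granularity needed before usage can be rolled up to teams or projects.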
Once metrics have been collected and aggregated, performance and cost insights for the on-premises Kubernetes environment can be accessed on the replex web interface. Let’s take a look at some of these performance and cost insights.
The replex solution aggregates performance metrics and cluster topology information to provide granular performance and efficiency insights about the customer’s on-premises Kubernetes environment.
On the Kubernetes layer, customers can access performance insights for Kubernetes primitives including clusters, nodes, pods, containers and deployments as well as custom organizational groupings including teams, projects, applications and clients.
These insights allow enterprises to forecast future usage, make informed scaling decisions and future proof infrastructure. They can also make proactive resource allocation and re-distribution decisions for teams and projects based on usage and efficiency numbers.
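A simple way to picture usage forecasting is a straight-line trend fitted to historical usage. The sketch below uses ordinary least squares on made-up monthly figures; it is only an illustration of the idea, not a description of how Replex forecasts, and the capacity value is hypothetical.

```python
from typing import Sequence


def linear_forecast(history: Sequence[float], periods_ahead: int) -> float:
    """Fit y = a + b*x by ordinary least squares and extrapolate."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two data points")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The last observed point sits at x = n - 1.
    return intercept + slope * (n - 1 + periods_ahead)


# Made-up sample: average CPU cores used per month across the cluster.
monthly_cpu_cores = [40.0, 44.0, 48.0, 52.0]
forecast = linear_forecast(monthly_cpu_cores, periods_ahead=3)

capacity_cores = 60.0  # hypothetical installed capacity
needs_expansion = forecast > capacity_cores
```

Here usage grows by 4 cores a month, so the projection three months out exceeds the assumed 60-core capacity and flags a hardware purchase that on-premises teams need to plan well in advance.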
In addition to efficiency and performance metrics, the replex solution also helps enterprises allocate the total costs of running on-premises Kubernetes environments to both Kubernetes primitives as well as custom organizational groupings.
For cloud setups, Replex aggregates cost information through the cloud provider's billing API. As this information is not available for on-prem setups, customers can upload cost information via the Replex interface. Cost information usually includes unit costs for CPU, memory, storage and networking as well as other miscellaneous costs. This is a prerequisite for cost allocation for on-premises Kubernetes environments.
Once unit costs have been added, the Replex solution aggregates these costs, analyses them against the cluster topology information already collected and generates granular cost allocation reports. Cost allocation reports are provided for native Kubernetes objects like namespaces, Jobs, StatefulSets, Deployments and Services as well as for custom organizational groupings including teams, projects, environments and clients.
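The arithmetic behind usage-based cost allocation can be sketched as unit cost multiplied by consumption, summed per grouping. Everything below is hypothetical: the unit costs, the usage records and the team labels are sample values, and the actual Replex allocation model may weigh resources differently.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable

# Hypothetical unit costs entered by the operator (currency per unit-hour).
UNIT_COSTS = {"cpu_core_hours": 0.03, "memory_gib_hours": 0.004}


@dataclass
class UsageRecord:
    """One resource consumption figure tied to a namespace and a team label."""
    namespace: str
    team: str
    resource: str
    amount: float


def allocate(
    records: Iterable[UsageRecord],
    unit_costs: Dict[str, float],
    key: Callable[[UsageRecord], str],
) -> Dict[str, float]:
    """Sum amount * unit cost per grouping chosen by `key`."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[key(record)] += record.amount * unit_costs[record.resource]
    return {group: round(cost, 2) for group, cost in totals.items()}


# Made-up usage for one billing period.
usage_records = [
    UsageRecord("shop", "payments", "cpu_core_hours", 1000.0),
    UsageRecord("shop", "payments", "memory_gib_hours", 4000.0),
    UsageRecord("search", "discovery", "cpu_core_hours", 500.0),
    UsageRecord("search", "discovery", "memory_gib_hours", 1000.0),
]

by_namespace = allocate(usage_records, UNIT_COSTS, key=lambda r: r.namespace)
by_team = allocate(usage_records, UNIT_COSTS, key=lambda r: r.team)
```

Passing a different `key` function is what lets the same usage data be reported per namespace, per team or per any other grouping derived from cluster topology.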
In this article we outlined day 1 and day 2 challenges that enterprises face when operating and managing Kubernetes environments on-premises. We took a deep dive into some of these challenges including cost allocation, performance, forecasting and governance before outlining how replex helps mitigate these challenges. We also reviewed the mechanics of the replex solution, how it works in practice and how it helps enterprises monitor performance and efficiency, future proof infrastructure and allocate costs for on-premises Kubernetes deployments.