Kubernetes and containers have made software applications more portable and scalable, and have helped improve resource utilisation. For DevOps teams, however, Kubernetes has a much broader appeal: the ability to configure, manage and operate containerised microservices at scale. It allows teams to build a degree of automation into the creation, deployment, scaling and configuration of these applications, which significantly reduces management overhead and the likelihood of mistakes.
There is one caveat, however: this does not work as well for stateful applications. Deploying, scaling, operating and configuring stateful applications, and automating those tasks, requires much more input from DevOps teams in the form of application-specific domain knowledge.
Enter Kubernetes operators. Operators are built for specific applications and make it easier to create, configure and manage those applications on Kubernetes. Most operators also cover the entire application lifecycle, making it easier to perform operational tasks such as scaling, upgrading, backup and recovery of complex stateful applications. Since they use and extend the Kubernetes API, they are tightly integrated with the Kubernetes framework.
Here is a list of some of the most common functions that Kubernetes operators perform:
Operators leverage the extensibility and modularity of Kubernetes to help automate administrative and operational tasks involved in creating, configuring and managing Kubernetes applications.
Operators build on the concepts of custom Kubernetes controllers and custom resource definitions (CRDs), and allow DevOps teams to incorporate operational knowledge into how applications are managed on Kubernetes. A CRD defines a new resource type; the operator's controller watches custom resources of that type and works to ensure the actual state of the cluster matches the desired state declared in them.
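The idea of matching actual state to desired state can be illustrated without any Kubernetes machinery. The following is a minimal sketch only: `DesiredState`, `reconcile` and the instance names are hypothetical and stand in for a custom resource's spec and a controller's reconcile step; a real operator would watch the Kubernetes API instead of taking plain lists.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DesiredState:
    """Stands in for the spec of a custom resource (hypothetical shape)."""
    replicas: int


def reconcile(desired: DesiredState, running: List[str]) -> List[str]:
    """Return the actions needed to move the actual state to the desired one."""
    actions = []
    # Scale up: create the instances that are missing.
    for i in range(len(running), desired.replicas):
        actions.append(f"create instance-{i}")
    # Scale down: remove surplus instances, newest first.
    for name in reversed(running[desired.replicas:]):
        actions.append(f"delete {name}")
    return actions


if __name__ == "__main__":
    print(reconcile(DesiredState(replicas=3), ["instance-0"]))
    print(reconcile(DesiredState(replicas=1), ["instance-0", "instance-1"]))
```

When the two states already agree, the function returns an empty list, which mirrors how a controller does nothing until the custom resource or the cluster changes.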
The Prometheus operator from CoreOS is a great example. It is deployed as a custom Kubernetes controller that watches the Kubernetes API for four custom resource definitions: Prometheus, ServiceMonitor, PrometheusRule and Alertmanager. Once deployed, typically through its Helm chart or the kube-prometheus bundle, it installs and configures a full Prometheus stack that includes Prometheus servers, Alertmanager, Grafana, the host node_exporter and kube-state-metrics. DevOps teams can then easily scale the number of replicas of each component, make configuration changes, update alerting rules or automatically monitor new services.
Now that we have covered the concept of Kubernetes operators, let's outline some useful operators that every DevOps engineer should know about.
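As an illustration of how a new service might be registered for monitoring, here is a minimal sketch that builds a ServiceMonitor manifest as a Python dictionary and prints it as JSON. The function name, the `app` label key, the resource name and the port name `web` are assumptions chosen for the example; whether the Prometheus instance picks the ServiceMonitor up also depends on the selectors configured on the Prometheus resource, which are not shown.

```python
import json


def service_monitor(name, app_label, port_name):
    """Build a ServiceMonitor that scrapes Services labelled app=<app_label>."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name},
        "spec": {
            # Select the Services to monitor by label.
            "selector": {"matchLabels": {"app": app_label}},
            # Scrape the Service port with this name.
            "endpoints": [{"port": port_name}],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML manifests.
    print(json.dumps(service_monitor("example-app", "example-app", "web"), indent=2))
```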
RBAC Manager is a Kubernetes operator from Fairwinds that aims to make RBAC on Kubernetes easier to set up, configure and manage. Kubernetes authorisation is often tedious and repetitive, requires a lot of manual configuration and is hard to scale. RBAC Manager significantly reduces the configuration involved in managing RBAC and in creating, deleting or updating role bindings, cluster role bindings and service accounts. It serves as a single source of truth for understanding RBAC state by summarising role bindings across multiple namespaces in a single RBAC definition file.
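To show how one definition can replace several hand-written role bindings, the sketch below generates an RBACDefinition that grants a single user the same cluster role in a list of namespaces. The API version and field names reflect the RBAC Manager documentation as understood at the time of writing and should be checked against the installed release; the user, namespaces and helper name are hypothetical.

```python
import json


def rbac_definition(name, user, namespaces, cluster_role="edit"):
    """Grant one user the same ClusterRole in several namespaces."""
    return {
        "apiVersion": "rbacmanager.reactiveops.io/v1beta1",
        "kind": "RBACDefinition",
        "metadata": {"name": name},
        "rbacBindings": [
            {
                "name": f"{user}-bindings",
                "subjects": [{"kind": "User", "name": user}],
                # One entry per namespace; the operator turns each
                # into a RoleBinding in that namespace.
                "roleBindings": [
                    {"namespace": ns, "clusterRole": cluster_role}
                    for ns in namespaces
                ],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(rbac_definition("dev-access", "jane", ["web", "api"]), indent=2))
```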
The MongoDB operator helps DevOps teams standardize the process of creating MongoDB clusters at scale and make it repeatable. The operator can be configured to take over the typical administrative tasks involved in spinning up and managing MongoDB clusters, including provisioning storage and compute, configuring network connections and setting up users. It also integrates with other MongoDB management tools, such as MongoDB Ops Manager and MongoDB Cloud Manager, to provide backup, monitoring and performance optimisation.
The HPA operator from Banzai Cloud is another useful operator that makes it easier to add pod autoscaling to Helm charts. It watches Kubernetes Deployments and StatefulSets and automatically creates, deletes or updates Horizontal Pod Autoscalers (HPAs) based on annotations defined on them. The operator's GitHub page uses Kafka as an example. The Helm chart for Kafka does not define any HPAs, which means that deploying it will not bring up any HPAs as part of the Kafka deployment. To get an HPA without changing the chart's templates, DevOps teams can add annotations for minReplicas and maxReplicas. Once these are added, the HPA operator creates an HPA that scales the workload between those bounds. The HPA operator can also use Prometheus-based custom metrics exposed through Kube Metrics Adapter.
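A minimal sketch of what such annotations might look like follows. The annotation keys are taken from the operator's README as recalled at the time of writing and should be treated as an assumption to verify against the project; the helper name and the example values are hypothetical. Annotation values in Kubernetes must be strings, which is why the numbers are converted.

```python
def hpa_annotations(min_replicas, max_replicas, cpu_target_percent):
    """Build annotations asking the HPA operator for a CPU-based autoscaler."""
    if not 1 <= min_replicas <= max_replicas:
        raise ValueError("need 1 <= min_replicas <= max_replicas")
    prefix = "hpa.autoscaling.banzaicloud.io"
    return {
        f"{prefix}/minReplicas": str(min_replicas),
        f"{prefix}/maxReplicas": str(max_replicas),
        # Target average CPU utilisation, in percent.
        f"cpu.{prefix}/targetAverageUtilization": str(cpu_target_percent),
    }


if __name__ == "__main__":
    for key, value in hpa_annotations(1, 3, 70).items():
        print(f'{key}: "{value}"')
```

The resulting key/value pairs would go under `metadata.annotations` of the Deployment or StatefulSet, usually through a values entry that the Helm chart exposes for annotations.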
Cert-manager from Jetstack is a Kubernetes operator that automates the issuance and management of TLS certificates. DevOps teams can use it to automate recurring tasks such as checking that certificates are valid and up to date, and renewing them before they expire. Once deployed, cert-manager runs as a Kubernetes deployment. Certificates and certificate issuers are configured as Kubernetes custom resources. Once issuers are configured, certificates can be requested on the fly by referring to one of them.
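Requesting a certificate by referring to a configured issuer could look roughly like the sketch below, which builds a Certificate manifest as a dictionary. The issuer name `letsencrypt-prod`, the domain and the `<name>-tls` secret naming are assumptions made for the example, and the issuer itself is assumed to exist already.

```python
import json


def certificate(name, dns_names, issuer_name, issuer_kind="ClusterIssuer"):
    """Build a cert-manager Certificate that refers to an existing issuer."""
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "Certificate",
        "metadata": {"name": name},
        "spec": {
            # Secret in which cert-manager stores the issued key pair.
            "secretName": f"{name}-tls",
            "dnsNames": list(dns_names),
            "issuerRef": {"name": issuer_name, "kind": issuer_kind},
        },
    }


if __name__ == "__main__":
    cert = certificate("example-com", ["example.com"], "letsencrypt-prod")
    print(json.dumps(cert, indent=2))
```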
The ArgoCD operator manages the complete lifecycle of ArgoCD and its components. ArgoCD is one of the highest-rated continuous delivery tools in the CNCF landscape and is specifically targeted at Kubernetes. The operator makes it easy to install and configure ArgoCD, and to upgrade, back up, restore and scale ArgoCD components. It does this by watching for three Kubernetes CRDs, including ArgoCD, which defines the desired state of an ArgoCD cluster, and ArgoCDExport, which defines the desired state for the export and recovery of ArgoCD components.
Istio has emerged as the go-to service mesh tool to manage, orchestrate, secure and monitor communications across microservices deployed on Kubernetes. The Istio operator makes it easier to install, upgrade and troubleshoot Istio. Installation requires only istioctl as a prerequisite. Small customizations are easier to make since they do not require API changes, and version-specific upgrade hooks can be easily implemented. Installing Istio using the operator also ensures that all API fields are validated. The operator API supports all six built-in installation configuration profiles, including default, demo, minimal and remote. DevOps engineers and SREs can start off with any one of these and make configuration changes later to tailor the service mesh to their specific needs.
Etcd serves as the primary data store for all cluster data on Kubernetes and as such is a critical component of each cluster. Managing and configuring etcd clusters on Kubernetes is time-consuming and requires hands-on expertise. Ensuring high availability, monitoring and disaster recovery adds further complexity. The etcd operator helps DevOps engineers and SREs simplify these tasks by making it easier to create, configure and manage etcd clusters on Kubernetes. Teams can spin up multiple highly available etcd instances without having to specify detailed configuration settings, resize clusters by modifying the cluster spec, configure automated backup policies for disaster recovery, and initiate graceful upgrades without downtime.
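Resizing by modifying the cluster spec can be pictured with a short sketch. It builds an EtcdCluster manifest and then produces a resized copy in which only `spec.size` differs. The cluster name and the etcd version string are example values, and the helper names are hypothetical.

```python
import copy
import json


def etcd_cluster(name, size, version):
    """Build an EtcdCluster custom resource for the etcd operator."""
    return {
        "apiVersion": "etcd.database.coreos.com/v1beta2",
        "kind": "EtcdCluster",
        "metadata": {"name": name},
        "spec": {"size": size, "version": version},
    }


def resized(cluster, new_size):
    """Return a copy with a new member count; the original is left untouched."""
    updated = copy.deepcopy(cluster)
    updated["spec"]["size"] = new_size
    return updated


if __name__ == "__main__":
    cluster = etcd_cluster("example-etcd-cluster", 3, "3.2.13")
    print(json.dumps(resized(cluster, 5), indent=2))
```

Applying the resized manifest is what prompts the operator to add or remove members; no other part of the definition needs to change.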
Elastic Cloud on Kubernetes (ECK) is the official Kubernetes operator from Elastic and aims to provide a seamless experience for deploying, managing and operating the entire Elastic Stack on Kubernetes. In addition to making it easier to deploy Elasticsearch and Kibana on Kubernetes, it simplifies critical operational tasks, including managing and monitoring multiple clusters, initiating graceful upgrades, scaling both cluster capacity and local storage, making configuration changes, and taking backups. The default ECK distribution is free and open source, with built-in features including frozen indices for dense storage, Kibana Spaces, Canvas and Elastic Maps, and it also supports monitoring of Kubernetes logs and infrastructure.
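A deployment request to ECK is itself just a custom resource. The sketch below builds a small Elasticsearch manifest with a single node set; the cluster name, the version string and the node set name `default` are example values, and real deployments would normally add storage and node configuration that is omitted here.

```python
import json


def elasticsearch(name, version, node_count):
    """Build a minimal ECK Elasticsearch resource with one node set."""
    return {
        "apiVersion": "elasticsearch.k8s.elastic.co/v1",
        "kind": "Elasticsearch",
        "metadata": {"name": name},
        "spec": {
            "version": version,
            # Changing "count" and re-applying scales the node set.
            "nodeSets": [{"name": "default", "count": node_count}],
        },
    }


if __name__ == "__main__":
    print(json.dumps(elasticsearch("quickstart", "7.12.0", 3), indent=2))
```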
The Grafana operator, offered by Red Hat, simplifies the process of creating, configuring and managing Grafana instances on Kubernetes. In addition to helping deploy Grafana, it supports exposing Grafana via ingress, automated dashboard and data source discovery, and installation of dashboard dependencies. The operator can be installed either with Ansible or manually, by running kubectl commands and creating a custom resource. Once installed, the operator watches for dashboard definitions in either its own namespace or all namespaces, depending on the flag passed during deployment. It discovers dashboards, adds error messages to the status field of a dashboard if its JSON is invalid, and automatically installs any plugins specified. DevOps engineers and SREs can then add data sources through the GrafanaDataSource resource as well as add extra configuration files.
The Jaeger Kubernetes operator helps deploy, manage and configure Jaeger instances on Kubernetes. When installing it, DevOps engineers and SREs can specify configuration options for Jaeger, including storage options, deriving dependencies, injecting Jaeger agent sidecars and UI configuration, among others. Once installed, the operator can be used to create a Jaeger instance and associate it with a deployment strategy. Three deployment strategies are supported: allInOne, production and streaming. The allInOne strategy is meant for testing and development purposes. With the production strategy, the operator spins up a more scalable and highly available environment and deploys each of the backend components separately. The streaming strategy augments the production strategy by adding streaming between the collector and the backend storage.
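Associating an instance with a deployment strategy comes down to a single field on the Jaeger custom resource. The sketch below is illustrative only: the helper name and instance name are hypothetical, and a real production or streaming instance would also need storage settings, which are left out.

```python
import json

# The three strategies named in the text above.
STRATEGIES = ("allInOne", "production", "streaming")


def jaeger(name, strategy="allInOne"):
    """Build a Jaeger custom resource with the chosen deployment strategy."""
    if strategy not in STRATEGIES:
        raise ValueError(f"strategy must be one of {STRATEGIES}")
    return {
        "apiVersion": "jaegertracing.io/v1",
        "kind": "Jaeger",
        "metadata": {"name": name},
        "spec": {"strategy": strategy},
    }


if __name__ == "__main__":
    print(json.dumps(jaeger("simple-prod", "production"), indent=2))
```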