
Kubernetes in Production: Best Practices for Governance, Cost Management, Security and Access Control

Instalment three of the Kubernetes best practices series, outlining a checklist of Kubernetes best practices for governance, cost management, and security and access control.

Hasham Haider


July 4, 2019


This is the third instalment in our best practices blog series for Kubernetes in Production. In the first instalment, we outlined a checklist of best practices to ensure highly available and reliable Kubernetes production environments. The second instalment dealt with best practices for resource management. We have compiled both into an ebook which also includes best practices for security, scalability and monitoring and can be downloaded here.

In this article, we will outline Kubernetes best practices in the context of governance. We will identify native Kubernetes tooling as well as open source solutions that can help set up a basic governance framework for Kubernetes in production.

However, before we do that, let’s take a look at the concept of governance and why we need a new way of thinking about it in the cloud-native era.

What is Governance?

Governance refers to a set of rules codified as policies aimed at minimizing risk, controlling costs and driving efficiency, transparency and accountability for an environment. Governance policies can reflect both external legislative and compliance requirements and internal security, resource access, cost management and deployment acceleration conventions.

Download the Complete Checklist with Checks, Recipes and Best Practices for Resource Management, Security, Scalability and Monitoring for Production-Ready Kubernetes


Mission-critical Kubernetes workloads in production require a broad and robust governance and operational framework that can help managers gain visibility and control over these dynamic environments.

We have divided the Kubernetes governance topic into three sections: security and access control, cost management and deployment acceleration. In this post, we will cover best practices for security and access control and cost management. We also intend to tackle best practices for deployment acceleration in an upcoming blog post.

Security and Access Control

Authentication and authorization are core concepts of access control. Together they allow organizations to create a security perimeter around their IT resources, identify the users and processes that are allowed access as well as govern the use of resources.

There are two ways requests can be authenticated, or identified, in Kubernetes: as user accounts allotted to teammates or as service accounts created for individual processes. Once a request has been authenticated, Kubernetes decides whether it is allowed to perform the requested action on the requested resource. This process is called authorization.

Kubernetes also allows requests to be passed through an additional filter after they are authenticated and authorized. These filters are called admission controllers. Admission controllers allow Kubernetes administrators to exercise more granular control over their Kubernetes environments.

Let’s now outline Kubernetes best practices in the context of Security and access control.

Kubernetes Governance Best Practices: Security and Access Control

Enabled at least two Authentication Methods?

Kubernetes recommends enabling at least two authentication methods: one for service accounts and one for user accounts. The recommended method for authenticating service accounts is via service account tokens, which are enabled automatically.

Enable either OpenID Connect or X509 client certificates for user account authentication. X509 client certificate authentication can be enabled by passing the --client-ca-file=/mydirectory/ca.crt flag to the API server. Here are the complete instructions to enable OpenID Connect.

Once these authentication methods have been enabled, make sure to disable static token-based authentication. This can be done by removing the --token-auth-file=FILENAME flag from the API server pod spec.

Disabled Anonymous Authentication?

Kubernetes versions 1.6 and later allow anonymous authentication by default. Requests that have not been rejected by other authentication methods are assigned a system:anonymous username and system:unauthenticated group. An authentication best practice, therefore, is to pass the --anonymous-auth=false flag to the API server.

Anonymous requests should also be disallowed for the kubelet. This can be done by passing --anonymous-auth=false to the kubelet or by setting authentication.anonymous.enabled to false in the kubelet config file.

Enabling RBAC will also mitigate the threat from anonymous access since RBAC requires anonymous requests to be explicitly authorized before allowing access to resources. RBAC can be enabled by starting the API server with --authorization-mode=RBAC.

Disabled Unauthenticated Access to the API server?

The Kubernetes API server can serve requests on two ports: the localhost (insecure) port and the secure port. The localhost port is intended for testing and for other master components to talk to the API. Requests to the localhost port bypass the authentication and authorization modules.

To avoid unauthenticated access to the API server, a best practice is to set the --insecure-port flag to 0 and remove the --insecure-bind-address flag from the API server manifest.

Another best practice is to make sure the --secure-port flag is not set to 0, since that would disable the secure port. All requests to the secure port are authenticated and authorized.
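The checks in this and the previous sections boil down to flags on the API server command line. The sketch below is a minimal Python helper, assuming you have the flag list from the API server's static pod manifest. It only covers the flags named in this checklist, only understands the --flag=value form, and treats a missing --insecure-port as unsafe because older releases served the insecure port on 8080 unless it was set to 0. The example flag values and the token file path are hypothetical.

```python
"""Flag risky kube-apiserver settings from this checklist (a minimal sketch)."""


def parse_flags(args):
    """Turn ["--a=b", "--c"] into {"--a": "b", "--c": ""}.

    Only the --flag=value form is handled; "--flag value" pairs are not.
    """
    flags = {}
    for arg in args:
        name, _, value = arg.partition("=")
        flags[name] = value
    return flags


def audit_apiserver_flags(args):
    """Return human-readable findings; an empty list means no issues found."""
    flags = parse_flags(args)
    findings = []
    if flags.get("--anonymous-auth") != "false":
        findings.append("anonymous authentication is not disabled")
    if "--token-auth-file" in flags:
        findings.append("static token authentication is enabled")
    if "--client-ca-file" not in flags and "--oidc-issuer-url" not in flags:
        findings.append("no X509 or OpenID Connect user authentication")
    if "RBAC" not in flags.get("--authorization-mode", "").split(","):
        findings.append("RBAC authorization is not enabled")
    # Older releases served an unauthenticated port on 8080 unless set to 0.
    if flags.get("--insecure-port") != "0":
        findings.append("insecure port is not set to 0")
    if "--insecure-bind-address" in flags:
        findings.append("insecure bind address is configured")
    if flags.get("--secure-port") == "0":
        findings.append("secure port is disabled")
    if "--kubelet-certificate-authority" not in flags:
        findings.append("kubelet serving certificates are not verified")
    return findings


if __name__ == "__main__":
    example = [
        "--authorization-mode=Node,RBAC",
        "--client-ca-file=/mydirectory/ca.crt",
        "--token-auth-file=/srv/tokens.csv",  # hypothetical path
    ]
    for finding in audit_apiserver_flags(example):
        print("WARN:", finding)
```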

Using User-Access Best Practices for Service Account Tokens?

Service account tokens, used to authenticate service accounts, are saved in secrets. These secrets can be used by malicious actors to authenticate as service accounts. Therefore it is a best practice to follow user-access best practices and the policy of least privilege when giving read access to these secrets.

Configured Master Node Communication?

The API server communicates both with the kubelet process running on each node and with any pods, nodes or services. By default, the API server does not verify the kubelet's serving certificate. To make the connection safe to run over untrusted networks and protect it against man-in-the-middle attacks, a best practice is to provide the API server with a root certificate bundle using the --kubelet-certificate-authority flag. This enables the API server to verify the kubelet's serving certificate.

Alternatively, use SSH tunnelling between the API server and Kubelet as well as for communication between the API server and any nodes, pods or services.

Enabled RBAC?

Kubernetes RBAC allows admins to configure and control access to Kubernetes resources as well as the operations that can be performed on those resources. 

Kubernetes admins need to ensure they follow user access best practices when configuring an RBAC policy. A good rule of thumb is to follow the principle of least privilege and keep the scope of permissions small.

Admins, however, do need to be careful when deciding between a broad or a fine-grained RBAC policy. Both have advantages and drawbacks; fine-grained Roles and Role bindings ensure a smaller set of permissions and access but have a larger management overhead. Broader Roles and Role bindings are easier to manage but have a larger permission footprint.

Most of the time, an optimal design will sit somewhere between the two configurations and will depend on the unique team structure and application design of each organisation.

ClusterRoles bound through a ClusterRoleBinding grant permissions across the entire cluster, in all namespaces. Kubernetes admins therefore need to be careful when granting these to users or groups of users.

RBAC can be enabled by starting the API server with --authorization-mode=RBAC flag.
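As an illustration of a least-privilege policy, the sketch below generates a namespaced Role that can only read pods and a RoleBinding that grants it to a single group. It emits JSON, which kubectl apply -f also accepts, so the output can be piped into kubectl apply -f -. The namespace team-a, the group team-a-developers and the role name pod-reader are hypothetical.

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"


def read_only_role(namespace, name, resources):
    """A Role limited to read verbs on core-group resources in one namespace."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [{
            "apiGroups": [""],  # "" is the core API group (pods, services, ...)
            "resources": list(resources),
            "verbs": ["get", "list", "watch"],
        }],
    }


def bind_role_to_group(role, group):
    """A RoleBinding granting `role` to `group`, scoped to the role's namespace."""
    meta = role["metadata"]
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": meta["name"] + "-binding", "namespace": meta["namespace"]},
        "subjects": [{"kind": "Group", "name": group, "apiGroup": RBAC_GROUP}],
        "roleRef": {"kind": "Role", "name": meta["name"], "apiGroup": RBAC_GROUP},
    }


role = read_only_role("team-a", "pod-reader", ["pods"])
binding = bind_role_to_group(role, "team-a-developers")
# A v1 List lets both objects go through a single `kubectl apply -f -`.
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [role, binding]}, indent=2))
```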

Disabled Default Service Account?

Pods created without an explicit service account are automatically assigned the default service account of their namespace, and that account's token is mounted into them. Any permissions granted to the default service account are therefore shared by every such pod, so a best practice is to stop its token from being mounted automatically.

You can do this by setting automountServiceAccountToken: false on the service account.
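The sketch below generates that change for a namespace's default service account and models how Kubernetes decides whether a pod gets a token: an automountServiceAccountToken value in the pod spec takes precedence over the one on the service account, and the token is mounted when neither sets it. The namespace name is hypothetical, and the function mirrors documented behaviour rather than calling any Kubernetes API.

```python
import json


def default_sa_without_token(namespace):
    """Manifest for the namespace's `default` ServiceAccount with automount off."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": "default", "namespace": namespace},
        "automountServiceAccountToken": False,
    }


def token_is_mounted(service_account, pod_spec):
    """Pod-level setting wins; otherwise the service account's; default True."""
    if "automountServiceAccountToken" in pod_spec:
        return pod_spec["automountServiceAccountToken"]
    return service_account.get("automountServiceAccountToken", True)


sa = default_sa_without_token("team-a")
print(json.dumps(sa, indent=2))
print(token_is_mounted(sa, {}))  # False: inherited from the service account
print(token_is_mounted(sa, {"automountServiceAccountToken": True}))  # True: pod opts back in
```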

Configured Access to etcd Clusters?

In Kubernetes, access to etcd is equivalent to root access to the cluster. A best practice, therefore, is to restrict access to etcd to the API server and only those nodes that need it. Here is a complete walkthrough of restricting access to the etcd cluster and giving the API server access to it.

Enabled Audit Policy?

Kubernetes audit enables the collection of records that document authentication, authorization and login activities undertaken by individual users, administrators or other system components. Versions 1.7 and 1.8 introduced advanced audit with support for event filtering, external system integration and audit policy. The audit policy lays out rules about which events to record.

Advanced audit is enabled by default. To ensure that events are logged, a best practice is to pass the --audit-policy-file=<audit-policy.yaml> flag to the API server, together with a backend such as --audit-log-path for writing events to a log file.

Version 1.8 also requires kind, apiVersion and at least one rule to be provided in the audit policy file.
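A minimal policy that satisfies those requirements is sketched below. It assumes the audit.k8s.io/v1 API of newer releases (clusters of the 1.8 era used audit.k8s.io/v1beta1), and the rule choices are only an example. Rules are matched in order and the first match sets the level, so Secrets and ConfigMaps are logged at Metadata level to keep their contents out of the log, changes to RBAC objects are logged in full, and everything else falls through to Metadata. Because JSON is valid YAML, the output can be redirected into audit-policy.yaml.

```python
import json

audit_policy = {
    "apiVersion": "audit.k8s.io/v1",
    "kind": "Policy",
    # Skip the separate "RequestReceived" event for every request.
    "omitStages": ["RequestReceived"],
    "rules": [
        # Never record the request/response bodies of Secrets or ConfigMaps.
        {"level": "Metadata",
         "resources": [{"group": "", "resources": ["secrets", "configmaps"]}]},
        # Record full request and response bodies for RBAC objects.
        {"level": "RequestResponse",
         "resources": [{"group": "rbac.authorization.k8s.io",
                        "resources": ["roles", "rolebindings",
                                      "clusterroles", "clusterrolebindings"]}]},
        # Catch-all: metadata (user, verb, resource, timestamp) for everything else.
        {"level": "Metadata"},
    ],
}


def check_policy(policy):
    """Minimum requirements from the text: kind, apiVersion and >= 1 rule."""
    return (policy.get("kind") == "Policy"
            and policy.get("apiVersion", "").startswith("audit.k8s.io/")
            and len(policy.get("rules", [])) >= 1)


print(json.dumps(audit_policy, indent=2))
```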

Configured Service Account Permissions?

Kubernetes role bindings allow a set of permissions defined in Roles to be allocated to individual Subjects. Subjects can include user accounts, service accounts or groups.

By default, RBAC grants no permissions to service accounts outside the kube-system namespace, beyond the discovery permissions given to all authenticated users.

A best practice for implementing a fine-grained authorization framework is to assign roles to service accounts created for individual applications. For this to work, a serviceAccountName has to be included in the application's pod spec. Permissions can then be granted to this service account using a RoleBinding, which restricts them to a specific namespace.
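A minimal sketch of that pattern follows. The names are hypothetical: namespace team-a, an application called web, an image at registry.example.com, and an already existing Role called web-config-reader. It wires together a dedicated ServiceAccount, a RoleBinding that grants the Role to it inside the namespace, and a pod spec that references the account through serviceAccountName.

```python
import json


def app_identity(namespace, app, role_name):
    """ServiceAccount + RoleBinding + Pod wired together for one application."""
    sa_name = app + "-sa"
    service_account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": sa_name, "namespace": namespace},
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": app + "-binding", "namespace": namespace},
        # ServiceAccount subjects carry a namespace and no apiGroup.
        "subjects": [{"kind": "ServiceAccount", "name": sa_name, "namespace": namespace}],
        "roleRef": {"kind": "Role", "name": role_name,
                    "apiGroup": "rbac.authorization.k8s.io"},
    }
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": app, "namespace": namespace},
        "spec": {
            "serviceAccountName": sa_name,  # without this, `default` is used
            "containers": [{"name": app, "image": "registry.example.com/web:1.0"}],
        },
    }
    return [service_account, binding, pod]


items = app_identity("team-a", "web", "web-config-reader")
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```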

Enabled Recommended Admission Controllers?

As outlined above, admission controllers are an additional set of filters that requests pass through. Admission controllers kick in once requests have been authenticated and authorized and provide granular control over which requests are allowed to persist.

Kubernetes provides a set of recommended, ready-to-deploy admission controllers. A best practice is to enable these by passing them to the API server as a comma-separated list with the --enable-admission-plugins=<admission_controller_name> flag.

Here is a list of the recommended admission controllers for Kubernetes v1.10:

NamespaceLifecycle, LimitRanger, ServiceAccount, DefaultStorageClass, DefaultTolerationSeconds, MutatingAdmissionWebhook, ValidatingAdmissionWebhook, Priority, ResourceQuota

Besides these recommended admission controllers, a best practice is to enable the following admission controllers:

PodSecurityPolicy

The PodSecurityPolicy admission controller allows Kubernetes admins to define a set of conditions that pods have to comply with to be allowed to run. It can be enabled using:

kube-apiserver --enable-admission-plugins=PodSecurityPolicy

Before enabling this admission controller, however, at least one PodSecurityPolicy should be created and authorized for use; otherwise, no pods can be created in the cluster.

AlwaysPullImages

AlwaysPullImages is an admission controller that sets the image pull policy of every new pod to Always. This ensures that private images cached on a node cannot be reused by other pods without first providing credentials to the registry. This is very useful in a multi-tenant environment and ensures that images can only be used by users with the correct credentials.

You can enable AlwaysPullImages using:

kube-apiserver --enable-admission-plugins=AlwaysPullImages

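Here is how to check which admission controllers are enabled by default:

kube-apiserver -h | grep enable-admission-plugins

To see the plugins a running API server was actually started with, check the --enable-admission-plugins flag on its command line, for example with ps -ef | grep kube-apiserver.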
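Because --enable-admission-plugins takes a single comma-separated list, the recommended plugins and the extra ones above have to be combined into one value rather than passed as separate flags. A small sketch that builds that value without duplicates, using the v1.10 list from this section:

```python
RECOMMENDED_V1_10 = [
    "NamespaceLifecycle", "LimitRanger", "ServiceAccount", "DefaultStorageClass",
    "DefaultTolerationSeconds", "MutatingAdmissionWebhook",
    "ValidatingAdmissionWebhook", "Priority", "ResourceQuota",
]
EXTRA = ["PodSecurityPolicy", "AlwaysPullImages"]


def admission_flag(*plugin_lists):
    """Join plugin lists into one flag, keeping first-seen order, dropping repeats."""
    merged = []
    for plugins in plugin_lists:
        for plugin in plugins:
            if plugin not in merged:
                merged.append(plugin)
    return "--enable-admission-plugins=" + ",".join(merged)


print("kube-apiserver " + admission_flag(RECOMMENDED_V1_10, EXTRA))
```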

Using the Latest Kubernetes Version?

Kubernetes regularly pushes out new versions with critical bug fixes and new security features. Make sure you have upgraded to the latest version to take advantage of these features.

Defined Pod Security Policy and Enabled it in the Admission Controller?

Pod security policies outline a set of security-sensitive conditions that pods must fulfil in order to be admitted to the cluster. PodSecurityPolicy is also a recommended admission controller. Here is an example of a restrictive pod security policy. This policy forces all users to run as unprivileged users, disables privilege escalation and enforces other restrictive security settings.

Pod security policy can be enabled using:

--enable-admission-plugins=PodSecurityPolicy
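For reference, a restrictive policy along the lines described above might look like the sketch below, together with the ClusterRole that authorizes its use (a pod is only admitted under a policy that its creator or its service account is allowed to use). The names restricted and psp-restricted-user are hypothetical. This assumes the policy/v1beta1 PodSecurityPolicy API that this article targets; that API was removed in Kubernetes v1.25.

```python
import json

restricted_psp = {
    "apiVersion": "policy/v1beta1",
    "kind": "PodSecurityPolicy",
    "metadata": {"name": "restricted"},
    "spec": {
        "privileged": False,
        "allowPrivilegeEscalation": False,
        "requiredDropCapabilities": ["ALL"],
        "hostNetwork": False,
        "hostIPC": False,
        "hostPID": False,
        # Excludes hostPath and other host-level volume types.
        "volumes": ["configMap", "emptyDir", "projected", "secret",
                    "downwardAPI", "persistentVolumeClaim"],
        "runAsUser": {"rule": "MustRunAsNonRoot"},
        "seLinux": {"rule": "RunAsAny"},
        "supplementalGroups": {"rule": "MustRunAs", "ranges": [{"min": 1, "max": 65535}]},
        "fsGroup": {"rule": "MustRunAs", "ranges": [{"min": 1, "max": 65535}]},
    },
}

# Subjects bound to this ClusterRole may create pods under the policy.
use_restricted = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "ClusterRole",
    "metadata": {"name": "psp-restricted-user"},
    "rules": [{"apiGroups": ["policy"], "resources": ["podsecuritypolicies"],
               "verbs": ["use"], "resourceNames": ["restricted"]}],
}

print(json.dumps({"apiVersion": "v1", "kind": "List",
                  "items": [restricted_psp, use_restricted]}, indent=2))
```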

Configured Kubernetes Secrets?

Sensitive information related to your Kubernetes environment, such as a password, token or key, should always be stored in a Kubernetes Secret object. You can list the secrets that have already been created using:

kubectl get secrets
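When a Secret is created from a manifest, the values under data must be base64-encoded. Base64 is an encoding, not encryption, which is why encryption at rest and access controls on Secrets matter. The sketch below builds such a manifest; the secret name, namespace and key/value pair are hypothetical placeholders.

```python
import base64
import json


def make_secret(name, namespace, values):
    """Opaque Secret whose `data` holds base64-encoded copies of `values`."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {key: base64.b64encode(value.encode("utf-8")).decode("ascii")
                 for key, value in values.items()},
    }


secret = make_secret("db-credentials", "team-a", {"password": "change-me"})
print(json.dumps(secret, indent=2))
```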

Enabled Data Encryption at Rest?

Encrypting data at rest is another security best practice. In Kubernetes, data can be encrypted using any of these four providers: aescbc, secretbox, aesgcm or kms.

Encryption can be enabled by passing the --encryption-provider-config flag to the kube-apiserver process.

Existing secrets are only encrypted when they are next written, so to ensure all secrets are encrypted, rewrite them using:

kubectl get secrets --all-namespaces -o json | kubectl replace -f -

This will also apply server-side encryption to all secrets.
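The file passed to --encryption-provider-config lists providers in order: the first one encrypts new writes, and later ones are only tried when reading. The sketch below generates such a file for Secrets with the aescbc provider and a random 32-byte key, keeping identity last so that secrets written before encryption was enabled remain readable until they are rewritten. It assumes the apiserver.config.k8s.io/v1 EncryptionConfiguration format of newer releases; the key name key1 is arbitrary, and the generated key must be stored safely, because losing it makes the encrypted secrets unreadable.

```python
import base64
import json
import secrets


def encryption_config(key_name="key1"):
    """EncryptionConfiguration encrypting Secrets with aescbc and a fresh key."""
    key = base64.b64encode(secrets.token_bytes(32)).decode("ascii")
    return {
        "apiVersion": "apiserver.config.k8s.io/v1",
        "kind": "EncryptionConfiguration",
        "resources": [{
            "resources": ["secrets"],
            "providers": [
                {"aescbc": {"keys": [{"name": key_name, "secret": key}]}},
                {"identity": {}},  # reads data that is still stored unencrypted
            ],
        }],
    }


if __name__ == "__main__":
    print(json.dumps(encryption_config(), indent=2))
```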

Scanned Containers for Security Vulnerabilities?

Another security best practice is to scan your container images for known security vulnerabilities.

You can do this using open source tools like Anchore and Clair which will help you identify common vulnerabilities and exposures (CVEs) and mitigate them.

Configured Security Context for Pods, Containers and Volumes?

Security context specifies privilege and access control settings for pods and containers. Pod security context can be defined by including the securityContext field in the pod specification. Once a security context has been specified for a pod, its settings apply to all the containers that belong to the pod, unless a container overrides them in its own securityContext.

A best practice is to set runAsNonRoot to true in the pod security context and, in each container's security context, to set readOnlyRootFilesystem to true and allowPrivilegeEscalation to false. This adds further layers of defence to your containers and Kubernetes environment and prevents privilege escalation.
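A pod spec following that advice is sketched below. Note the split: runAsNonRoot sits in the pod-level securityContext, while readOnlyRootFilesystem and allowPrivilegeEscalation are container-level fields. The image name and user ID are hypothetical; a numeric runAsUser is set because the kubelet refuses to start a container under runAsNonRoot if it cannot confirm the container will run as a non-root user. The emptyDir mount is an assumed convenience for applications that need a writable /tmp.

```python
import json


def hardened_pod(name, image, uid=10001):
    """Pod spec with a non-root user, read-only root FS and no privilege escalation."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Pod level: applies to every container in the pod.
            "securityContext": {"runAsNonRoot": True, "runAsUser": uid},
            "containers": [{
                "name": name,
                "image": image,
                # Container level: these two fields do not exist at pod level.
                "securityContext": {
                    "readOnlyRootFilesystem": True,
                    "allowPrivilegeEscalation": False,
                },
                # A writable scratch directory, since the root FS is read-only.
                "volumeMounts": [{"name": "tmp", "mountPath": "/tmp"}],
            }],
            "volumes": [{"name": "tmp", "emptyDir": {}}],
        },
    }


print(json.dumps(hardened_pod("web", "registry.example.com/web:1.0"), indent=2))
```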

Enabled Kubernetes Logging?

Kubernetes logs will help you understand what is happening inside your cluster as well as debug problems and monitor activity. Logs for containerized applications are usually written to the standard output and standard error streams.

A best practice when implementing Kubernetes logging is to configure a separate lifecycle and storage for logs from pods, containers and nodes. You can do this by implementing a cluster level logging architecture.

Cost Management

Cost management refers to the continuous process of implementing policies to control costs. In the context of Kubernetes, there are a number of ways organisations can control and optimize costs. These include native Kubernetes tools to manage and govern resource usage and consumption as well as proactive monitoring and optimization of the underlying infrastructure. Below we will outline some best practices for cost management of production Kubernetes environments.

Kubernetes Governance Best Practices: Cost Management

Created Separate Namespaces for Teams?

Kubernetes namespaces are virtual partitions of Kubernetes clusters. It is a recommended best practice to create separate namespaces for individual teams, projects or customers. Splitting Kubernetes environments into separate namespaces serves both security and access management and resource management purposes.

To display a list of namespaces use:

kubectl get namespaces

Or

kubectl get namespaces --show-labels

You can also display a list of the pods running inside a namespace with kubectl get pods --namespace <namespace_name>, or list the pods in all namespaces with kubectl get pods --all-namespaces.

Configured Default Resource Requests and Limits?

A best practice when working with Kubernetes containers is to always specify resource requests and limit values. Containers without any limits on the amount of resources they are allowed to consume can lead to resource contention with other containers and unoptimized consumption of compute resources.

A good way to ensure all containers are allotted resource requests and limit values is to create a LimitRange object for each Namespace. The LimitRange object allows you to specify default values for resource requests and limits for individual containers inside namespaces. Any container created inside that namespace, without request and limit values explicitly specified, will be allotted the default values.

To check whether default values have been set use:

kubectl describe namespace <namespace_name>
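The sketch below generates a LimitRange with default requests and limits, and models how those defaults are filled in for one resource of a container: an omitted limit takes the default, and an omitted request takes the container's own limit if it set one, otherwise the default request. That last rule follows the behaviour described in the Kubernetes documentation; the namespace and values are hypothetical.

```python
import json


def default_limit_range(namespace):
    return {
        "apiVersion": "v1",
        "kind": "LimitRange",
        "metadata": {"name": "container-defaults", "namespace": namespace},
        "spec": {"limits": [{
            "type": "Container",
            "defaultRequest": {"cpu": "250m", "memory": "256Mi"},  # default requests
            "default": {"cpu": "500m", "memory": "512Mi"},         # default limits
        }]},
    }


def apply_defaults(resources, limit_range, resource):
    """Return (request, limit) for one resource after LimitRange defaulting."""
    spec = limit_range["spec"]["limits"][0]
    explicit_limit = resources.get("limits", {}).get(resource)
    explicit_request = resources.get("requests", {}).get(resource)
    limit = explicit_limit or spec["default"][resource]
    if explicit_request:
        request = explicit_request
    elif explicit_limit:
        request = explicit_limit  # request follows the container's own limit
    else:
        request = spec["defaultRequest"][resource]
    return request, limit


lr = default_limit_range("team-a")
print(json.dumps(lr, indent=2))
print(apply_defaults({}, lr, "cpu"))                             # ('250m', '500m')
print(apply_defaults({"limits": {"cpu": "1"}}, lr, "cpu"))       # ('1', '1')
print(apply_defaults({"requests": {"cpu": "100m"}}, lr, "cpu"))  # ('100m', '500m')
```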

Configured Minimum and Maximum Resource Limits?

A LimitRange object also allows Kubernetes admins to apply limits to the resource request and limit values for individual containers.

Configuring a minimum LimitRange value for a namespace will ensure that the resource request value for each container inside that namespace is greater than or equal to that value. The maximum LimitRange value corresponds to the resource limit value and ensures that resource limits for individual containers inside that namespace do not exceed the maximum value.

To check whether minimum and maximum resource limits have been set use:

kubectl describe namespace <namespace_name>
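To make the min/max semantics concrete, the sketch below checks a container's resources against a LimitRange's min and max: a request below min or a limit above max would cause the pod to be rejected. The quantity parser is a simplified assumption that only understands the suffixes listed (m, k, M, G, Ki, Mi, Gi), values the container omits are assumed to have been defaulted already, and the bounds are hypothetical.

```python
# Multipliers for the Kubernetes quantity suffixes this sketch supports.
SUFFIXES = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3,
            "m": 0.001, "k": 1000, "M": 1000 ** 2, "G": 1000 ** 3}


def to_number(quantity):
    """'250m' -> 0.25, '1Gi' -> 1073741824.0, '2' -> 2.0 (two-letter suffixes first)."""
    for suffix, factor in SUFFIXES.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    return float(quantity)


def check_container(resources, minimum, maximum):
    """List LimitRange violations for one container (empty list = admitted)."""
    problems = []
    for name, floor in minimum.items():
        request = resources.get("requests", {}).get(name)
        if request is not None and to_number(request) < to_number(floor):
            problems.append(f"{name} request {request} is below min {floor}")
    for name, ceiling in maximum.items():
        limit = resources.get("limits", {}).get(name)
        if limit is not None and to_number(limit) > to_number(ceiling):
            problems.append(f"{name} limit {limit} exceeds max {ceiling}")
    return problems


minimum = {"cpu": "100m", "memory": "64Mi"}  # hypothetical namespace bounds
maximum = {"cpu": "2", "memory": "2Gi"}
print(check_container({"requests": {"cpu": "50m"}, "limits": {"memory": "4Gi"}},
                      minimum, maximum))
```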

Configured Resource Quotas for Namespaces?

Another best practice is to limit the total resource consumption of all containers inside a Namespace. This can be accomplished using a resource quota object. Resource quotas can be configured for individual namespaces and provide another layer of control over consumption of compute resources.

Defining a resource quota for a namespace will limit the total amount of CPU, memory or storage resources that can be consumed by all containers belonging to that namespace.

To check whether resource quotas have been configured use:

kubectl describe namespace <namespace_name>
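A compute ResourceQuota for a namespace might look like the sketch below; the namespace and the amounts are hypothetical. Note that once a quota covers requests or limits for CPU or memory, pods in that namespace must specify those values or be rejected, which is one more reason to pair quotas with the LimitRange defaults described above.

```python
import json


def compute_quota(namespace, cpu_requests, memory_requests, cpu_limits,
                  memory_limits, storage_requests):
    """ResourceQuota capping the namespace-wide sum of requests and limits."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "compute-quota", "namespace": namespace},
        "spec": {"hard": {
            "requests.cpu": cpu_requests,
            "requests.memory": memory_requests,
            "limits.cpu": cpu_limits,
            "limits.memory": memory_limits,
            "requests.storage": storage_requests,  # summed over all PVCs
        }},
    }


quota = compute_quota("team-a", "4", "8Gi", "8", "16Gi", "100Gi")
print(json.dumps(quota, indent=2))
```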

Configured Quotas for other Kubernetes Objects?

Quotas can also be defined for other Kubernetes objects. These include pod quotas, which limit the total number of pods that can run inside a namespace, and API object count quotas, which limit the total number of API objects such as PersistentVolumeClaims, Services and ReplicaSets. Pod and API quotas are another way to manage resource consumption and costs for Kubernetes clusters.

A best practice, therefore, is for Kubernetes admins to configure both pod and API quotas whenever new namespaces are created.

To check whether quotas have been configured use:

kubectl describe namespace <namespace_name>
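Object counts use the same ResourceQuota kind. In the sketch below, pods, services and persistentvolumeclaims are standard quota names and count/replicasets.apps uses the generic count/<resource>.<group> syntax; the limits and namespace are hypothetical. The helper models the rule a count quota enforces: creating more objects is refused if it would push usage past the hard limit.

```python
import json

object_quota = {
    "apiVersion": "v1",
    "kind": "ResourceQuota",
    "metadata": {"name": "object-counts", "namespace": "team-a"},
    "spec": {"hard": {
        "pods": "20",
        "services": "10",
        "persistentvolumeclaims": "5",
        "count/replicasets.apps": "30",
    }},
}


def can_create(quota, used, resource, count=1):
    """True if `count` more objects of `resource` stay within the hard limit."""
    hard = quota["spec"]["hard"].get(resource)
    if hard is None:
        return True  # not covered by this quota
    return used.get(resource, 0) + count <= int(hard)


print(json.dumps(object_quota, indent=2))
print(can_create(object_quota, {"pods": 19}, "pods"))  # True
print(can_create(object_quota, {"pods": 20}, "pods"))  # False
```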

Configured the Horizontal Pod Auto-scaler?

The horizontal pod auto-scaler (HPA) is another way to effectively manage Kubernetes resource consumption. The HPA automatically scales the number of Pods in a Kubernetes deployment (or other controllers) based on CPU utilization or other custom metrics. HPA periodically queries CPU utilization of pods and increases or decreases the number of pods based on a target CPU utilization value. For example, it will decrease the number of Pod replicas if CPU utilization is lower than the target. In case CPU utilization is higher than the target value it will scale up and increase the number of Pod replicas.

This ensures that surplus pod replicas are removed once demand drops, so they do not hold on to resources that could be used elsewhere.
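The scaling decision can be written down as a small function. The core formula, desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization), is the one given in the Kubernetes documentation. The sketch also skips scaling when the ratio is within a tolerance (the controller's default is 0.1) and clamps the result to the min/max replica bounds; it deliberately ignores the stabilization windows and missing-metric handling of the real controller.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count the HPA formula asks for, clamped to [min, max]."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no change
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(3, 90, 50))  # 6: 3 * 1.8 = 5.4, rounded up
print(desired_replicas(4, 20, 50))  # 2: 4 * 0.4 = 1.6, rounded up
print(desired_replicas(5, 52, 50))  # 5: ratio 1.04 is within tolerance
```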

Configured the Cluster Auto-scaler?

The cluster auto-scaler (CA) allows Kubernetes admins to dynamically increase or decrease the resource footprint of a cluster. It does this by trimming or adding to the number of cloud virtual machines (VMs) that are part of the cluster pool.

The cluster auto-scaler scales the number of nodes based on two signals: whether there are any pending pods and the resource utilization of nodes.

If the CA detects any pending pods during its periodic checks, it requests more nodes from the cloud provider. The CA will also downscale the cluster and remove idle nodes if they are under-utilised.

Since VMs cost money, decreasing the number of nodes provisioned in response to utilisation signals can result in significant cost reductions.
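A simplified model of those two signals is sketched below. It assumes, as the cluster autoscaler does by default, that a node's utilisation is the sum of its pods' resource requests divided by its allocatable capacity (taking the higher of CPU and memory), and that nodes below 50% are scale-down candidates. It leaves out checks the real autoscaler performs, such as whether a candidate's pods can actually be rescheduled elsewhere and how long a node has been under-utilised. Node names and figures are hypothetical.

```python
def node_utilization(node):
    """Higher of CPU and memory requested/allocatable for one node."""
    cpu = node["cpu_requested"] / node["cpu_allocatable"]
    memory = node["mem_requested"] / node["mem_allocatable"]
    return max(cpu, memory)


def autoscaler_step(nodes, pending_pods, threshold=0.5):
    """Return (action, candidate_node_names) for one simplified loop."""
    if pending_pods > 0:
        return "scale-up", []  # pods cannot be scheduled: ask for more nodes
    idle = [n["name"] for n in nodes if node_utilization(n) < threshold]
    return ("scale-down" if idle else "no-op"), idle


nodes = [
    # CPU in cores, memory in GiB
    {"name": "node-a", "cpu_requested": 0.5, "cpu_allocatable": 4,
     "mem_requested": 1, "mem_allocatable": 8},
    {"name": "node-b", "cpu_requested": 3, "cpu_allocatable": 4,
     "mem_requested": 2, "mem_allocatable": 8},
]
print(autoscaler_step(nodes, pending_pods=0))  # ('scale-down', ['node-a'])
print(autoscaler_step(nodes, pending_pods=2))  # ('scale-up', [])
```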

Have a Plan for Rightsizing Instances?

On public clouds, Kubernetes groups multiple VMs (nodes) together into clusters. Pods run on top of these nodes and consume resources from them.

A best practice is to ensure the resource footprint of Kubernetes nodes matches that of the pods running on top. This is most often expressed in terms of node utilization: the share of a node's total resource capacity that is actually being used, expressed as a percentage. Low usage compared to capacity points to low utilization and wasted resources.

To ensure resources are used optimally a best practice is to monitor the historical and real-time resource usage of individual nodes and resize them whenever utilization falls below a certain threshold. We cover the configuration of just such a monitoring pipeline using Prometheus and Grafana.
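As a sketch of the threshold check, the function below takes usage samples per node (for example, CPU cores used, exported from such a monitoring pipeline), computes the 95th percentile rather than the average so that a node is only flagged if it stays under-used nearly all of the time, divides it by capacity and reports nodes below a threshold. The 40% threshold, the percentile choice and the sample data are assumptions for illustration.

```python
import math


def p95(samples):
    """95th percentile using the nearest-rank method."""
    ordered = sorted(samples)
    rank = math.ceil(95 * len(ordered) / 100)
    return ordered[rank - 1]


def rightsizing_candidates(nodes, threshold=0.4):
    """Map node name -> p95 utilisation (%) for nodes below `threshold`."""
    flagged = {}
    for name, (capacity, samples) in nodes.items():
        utilisation = p95(samples) / capacity
        if utilisation < threshold:
            flagged[name] = round(utilisation * 100, 1)
    return flagged


history = {
    # name: (CPU capacity in cores, CPU cores used per sample)
    "node-a": (8, [1.0] * 19 + [2.0]),
    "node-b": (4, [3.0] * 20),
}
print(rightsizing_candidates(history))  # {'node-a': 12.5}
```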

Configured the Vertical Pod Auto-Scaler to Resize Pods?

The vertical pod auto-scaler (VPA) scales pod resources up or down based on recommended values for CPU and memory requests.

Setting pod resource requests is not an exact science. Situations do arise where the resources being requested and actual usage vary widely. When this happens the resource footprint of a pod far exceeds that which is required resulting in wasted resources.

The VPA helps avoid this by scaling pod resource requests up and down based on the resource consumption of individual pods. A VPA component called the recommender monitors current and past resource consumption and provides recommended values for resource requests. Pods whose requests do not match the recommendation are evicted by the updater component; when they are recreated, the VPA admission plugin sets their resource requests to the recommended values.
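With the VPA components installed in the cluster, a VerticalPodAutoscaler object points them at a workload. The sketch below targets a hypothetical Deployment called web in namespace team-a; updateMode "Auto" lets the updater evict and resize pods, while "Off" only computes recommendations. The min/max bounds are illustrative guard rails, not recommended values.

```python
import json

vpa = {
    "apiVersion": "autoscaling.k8s.io/v1",
    "kind": "VerticalPodAutoscaler",
    "metadata": {"name": "web-vpa", "namespace": "team-a"},
    "spec": {
        "targetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "web"},
        # "Auto": evict and recreate pods with new requests; "Off": recommend only.
        "updatePolicy": {"updateMode": "Auto"},
        "resourcePolicy": {"containerPolicies": [{
            "containerName": "*",  # applies to every container in the pod
            "minAllowed": {"cpu": "100m", "memory": "128Mi"},
            "maxAllowed": {"cpu": "2", "memory": "2Gi"},
        }]},
    },
}

print(json.dumps(vpa, indent=2))
```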

Have a Plan for Using Reserved/Spot Instances?

Another best practice when spinning up Kubernetes clusters is to ensure a healthy mix of on-demand, reserved and spot instances. The instance type chosen is often dictated by workload demands; for example, mission-critical workloads are often hosted on on-demand or reserved instances. These instance types, however, cost considerably more than spot instances, which are usually preferred for fault-tolerant applications where some amount of downtime is acceptable.

There are a number of open source tools that allow Kubernetes admins to supplement on-demand instances with reserved or spot instances, resulting in reduced costs. The K8s spot rescheduler and the K8s spot instance termination handler are two such tools.

Tagging all Resources?

Tagging resources is another best practice to govern and manage Kubernetes costs. In enterprise environments where resources are being provisioned by multiple teams that are part of different projects, some resources are bound to fly under the radar. These resources keep on adding costs in spite of not being used. Tagging is an effective strategy to ensure that these resources are tracked throughout their lifecycle and are part of a central inventory. It also allows unused assets to be easily identified and removed.
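In Kubernetes, the natural place for such tags is the labels on each object. The sketch below reports objects that lack a required set of labels; it expects the items array from kubectl get ... -o json output. The label keys team and cost-center, the file name and the sample objects are hypothetical.

```python
import json

REQUIRED_LABELS = ("team", "cost-center")  # hypothetical tagging policy


def missing_labels(items, required=REQUIRED_LABELS):
    """Map 'Kind/namespace/name' -> list of required label keys it lacks."""
    report = {}
    for obj in items:
        meta = obj.get("metadata", {})
        labels = meta.get("labels") or {}
        missing = [key for key in required if key not in labels]
        if missing:
            ident = "/".join([obj.get("kind", "?"), meta.get("namespace", "-"),
                              meta.get("name", "?")])
            report[ident] = missing
    return report


# e.g. items = json.load(open("deployments.json"))["items"], where the file was
# saved with: kubectl get deployments --all-namespaces -o json > deployments.json
sample = [
    {"kind": "Deployment", "metadata": {"name": "web", "namespace": "team-a",
                                        "labels": {"team": "a", "cost-center": "42"}}},
    {"kind": "Deployment", "metadata": {"name": "batch", "namespace": "team-b",
                                        "labels": {"team": "b"}}},
]
print(json.dumps(missing_labels(sample), indent=2))
```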


Author

Hasham Haider

Fan of all things cloud, containers and micro-services!

Want to Dig Deeper and Understand How Different Teams or Applications are Driving Your Costs?

Request a quick 20 minute demo to see how you can seamlessly allocate Kubernetes costs while saving up to 30% on infrastructure costs using Replex.
