
The Ultimate Guide to Deploying Kubernetes Cluster on AWS EC2 Spot Instances Using Kops and EKS

A step by step walkthrough of deploying a highly available, reliable and resilient Kubernetes cluster leveraging AWS EC2 spot instances as worker nodes using both Kops and EKS.

Hasham Haider


October 9, 2019

31 minute read

Ever since AWS first introduced spot instances, DevOps teams and CTOs have been asking themselves: is it possible to run my workloads on spot instances and still retain an acceptable level of performance and reliability/availability? The answer to that question is a definite YES!

With all the buzz around containerisation and the way organisations of all shapes and sizes have flocked to Kubernetes, that question has changed slightly: Is it possible to deploy my Kubernetes cluster on EC2 spot instances without compromising on performance, reliability and availability?

That too is a definite YES! In fact that is exactly what we plan on doing in this guide. We will take a deep dive into the mechanics of setting up a Kubernetes cluster that leverages AWS EC2 spot instances as worker nodes.

Deploying a Kubernetes cluster on EC2 spot instances is only half the story though. The trick is to ensure that using spot instances does not compromise on the reliability, robustness and availability of your clusters. In this guide we will pay special attention to that and will use both native AWS concepts and tools as well as open source ones to make our Kubernetes clusters highly available, robust and resilient.

Before we get started with the walkthrough, however, let’s first get our heads around a few concepts that will help us throughout this guide.

Alternatively, you can jump straight to the walkthrough.

Want to learn more about reducing Kubernetes costs? Download the complete guide to Kubernetes in Production for CIOs and CTOs.

Download Guide

AWS EC2 Instance Types and Billing Models

First off, let's distinguish between EC2 instance types and EC2 billing models. EC2 instance types are virtual machines that differ based on their CPU, memory, storage, and networking capacity. Here is a complete list.

Customers can pay for these instances via one of four billing models: On-Demand, Reserved Instances, Spot Instances and Dedicated Hosts.

AWS EC2 Spot Instances

EC2 spot instances are spare compute capacity at AWS data centres. The spot billing model allows AWS customers to pay for this spare capacity at discounted rates. Spot instance usage can result in cost savings of up to 90% compared to on-demand instances. There is a caveat though: AWS can reclaim spot instances with just two minutes' notice.

This is bad news for us, since availability is a centrepiece of the Kubernetes cluster we want to provision. For a highly available, robust and resilient Kubernetes cluster we need to minimise the fallout from spot instance interruptions and terminations.

The new spot instance billing model from AWS lays the groundwork for minimising interruptions. Let’s take a quick look at it.

AWS EC2 Spot Billing Model

The spot billing model sees regular updates from AWS. The newest round of updates makes some crucial changes to the billing model and the way spot instances are accessed.

These changes include a new access model for spot capacity that makes it as easy to use spot instances as on-demand ones. Users can request spot capacity without having to analyse spot market prices or develop a bidding strategy to place bids.

Users simply pay the effective hourly price for the specific spot instance requested. The hourly spot price is calculated by AWS based on the supply and demand for spare EC2 capacity, not bid prices. Users can also optionally indicate a maximum price they are willing to pay for the spot instance requested.
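If you want to check what that effective hourly price currently is for a given instance type, the spot price feed is available via the AWS CLI. A quick sketch; the instance type and region are just examples:

aws ec2 describe-spot-price-history \
  --instance-types t2.medium \
  --product-descriptions "Linux/UNIX" \
  --start-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
  --region eu-west-1 \
  --query 'SpotPriceHistory[*].[AvailabilityZone,SpotPrice]' \
  --output table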

With the new billing model, price fluctuations are less frequent and more predictable. Even though the 2 minute interruption notice is still in effect, spot instance terminations are less likely, resulting in fewer interruptions to production workloads.

Even though the new spot billing model lays the groundwork for fewer spot terminations, for a truly robust and resilient Kubernetes cluster, we need to move to a more resilient architecture with baked-in high availability.

For the purposes of this guide we will incorporate a number of native AWS and Kubernetes concepts as well as open source tools to further improve the robustness and resilience of our Kubernetes cluster.

Let’s take a quick look at some of these concepts and tools.

AWS Spot Instance Advisor

The native Spot Instance Advisor tool from AWS is a good start. It allows us to get an overview of the interruption rates for each instance type for individual availability zones as well as the projected savings. A good rule of thumb is to choose instances that have a lower interruption rate. Lower interruption rates translate into higher availability for our cluster.

Spot Pools/Capacity Pools

AWS groups together spare EC2 capacity within each availability zone into spot pools. Each individual instance type has a separate spot pool within that availability zone.

For example m4.xlarge instances have separate spot pools in eu-central-1a and eu-central-1b. Since these are separate spot pools, requesting m4.xlarge spot capacity for our cluster in both availability zones will increase the chances of being allocated that capacity. It will also minimize disruptions to our cluster, since the chances of both spot pools running out of capacity at the same time are lower.

Spot pools can be extended further by requesting spot capacity across AMIs, subnets and regions. For example, requesting m5.xlarge spot capacity in eu-west-1a extends the set of spot pools available to the cluster and adds an additional layer of reliability, further reducing disruptions.

AWS provides a number of native tools which allow us to group together multiple spot pools and manage them collectively. These include AWS autoscaling groups, EC2 fleets and Spot fleets. Let’s quickly review autoscaling groups.

AWS Auto Scaling Groups

Auto scaling groups are sets of EC2 instances that are managed as a group. A group can comprise on-demand instances, spot instances, or a mix of the two.

Auto scaling groups can both scale in (terminate) and scale out (launch) instances based on scaling policies. To take advantage of both on-demand and spot billing models, an auto scaling group needs to be configured to use a launch template.

The latest iteration of AWS auto scaling groups also supports the deployment of multiple instance types as part of the same group. These autoscaling groups can also be configured to span multiple availability zones.

All of this functionality helps improve the availability and resilience of our Kubernetes cluster. Having instance groups with multiple instance types and purchase options, deployed across availability zones, increases the number of capacity pools that our cluster has access to and results in fewer disruptions.
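As a rough illustration of what such a group looks like at the AWS level, here is a hedged AWS CLI sketch of an auto scaling group with a mixed instances policy; the group name, launch template and subnet IDs are placeholders rather than values used later in this guide.

aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name demo-mixed-asg \
  --min-size 1 --max-size 7 \
  --vpc-zone-identifier "subnet-aaaa,subnet-bbbb,subnet-cccc" \
  --mixed-instances-policy '{
    "LaunchTemplate": {
      "LaunchTemplateSpecification": {"LaunchTemplateName": "demo-node-template", "Version": "$Latest"},
      "Overrides": [{"InstanceType": "t2.medium"}, {"InstanceType": "c5.large"}]
    },
    "InstancesDistribution": {
      "OnDemandBaseCapacity": 0,
      "OnDemandPercentageAboveBaseCapacity": 0,
      "SpotInstancePools": 2
    }
  }'

Kops and eksctl create equivalent groups for us in the walkthroughs below, so the sketch is only meant to show what those tools manage under the hood.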

The go-to tool for scaling Kubernetes clusters is the cluster autoscaler. The autoscaler is not part of core Kubernetes, but has seen widespread adoption in the community. Let’s review the cluster autoscaler and the mechanics of its integration with auto scaling groups.

Cluster Autoscaler

Cluster autoscaler is a tool that scales (both in and out) the number of nodes in a Kubernetes cluster based on the scheduling status of pods and the utilisation of individual nodes. On AWS, the cluster autoscaler adds new instances to a cluster whenever it detects pending pods that failed to schedule. It will also decrease the capacity of the cluster if it detects under-utilised instances, by removing those instances from the cluster pool.

The cluster autoscaler makes scaling decisions based on a template node. The template node is the first node of the instance group that the cluster autoscaler detects; it is assumed to be representative of all the nodes in that group. Whenever the cluster autoscaler needs to make a scaling decision it does so based on the capacity of the template node.

Using the Cluster Autoscaler with Auto Scaling Groups

Since the cluster autoscaler makes scaling decisions based on a template instance, it works best with auto scaling groups that have the same instance type. Scaling might not work properly with mixed autoscaling groups that have multiple instance types.

The official workaround for this is to use instance types that have the same CPU and memory resources. For example, both the t2.medium and c5.large EC2 instances have 2 vCPUs and 4 GB of RAM. Both these instance types can be used as part of the same autoscaling group with the cluster autoscaler.

We will use both approaches in this guide: multiple instance groups each having its own instance type and a single mixed instance group with multiple instance types. Each instance group will leverage spot instances.

We will also support our Kubernetes cluster with on-demand instances, which can take up the slack in the event of any interruptions to spot instances. This will further improve availability and reliability.

We will first deploy the cluster using kops.

Kops: Deploying Kubernetes Cluster Leveraging EC2 Spot Instances

In this section we will walk through the process of deploying a Kubernetes cluster leveraging EC2 spot instances using Kops. We will deploy a Kubernetes cluster with multiple instance groups - each with its own instance type - as well as a cluster with a single mixed instance group with multiple instance types. We will also deploy the cluster autoscaler for both clusters.

Let’s start by provisioning a cluster with multiple instance groups using Kops.

Kops: Deploying Kubernetes Cluster with Multiple Spot and On-demand Instance Groups

In total, we will provision three instance groups: two will leverage spot instances and the remaining one will leverage on-demand instances exclusively.

We assume that you have already installed kops, kubectl and the AWS CLI tools.
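You will also need an S3 bucket to use as the kops state store. A minimal sketch, assuming the bucket name used in the --state flag below and the eu-west-1 region; exporting KOPS_STATE_STORE lets the subsequent kops commands omit the --state flag.

aws s3api create-bucket \
  --bucket hash-kops-kube-bucket \
  --region eu-west-1 \
  --create-bucket-configuration LocationConstraint=eu-west-1

export KOPS_STATE_STORE=s3://hash-kops-kube-bucket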

Create a Kubernetes cluster using Kops

kops create cluster \
--name demo.cloudchap.cf \
--state s3://hash-kops-kube-bucket \
--cloud aws \
--master-size t2.medium \
--master-count 1 \
--master-zones eu-west-1a \
--node-size t2.medium \
--node-count 1 \
--zones eu-west-1a,eu-west-1b,eu-west-1c \
--ssh-public-key ~/.ssh/id_rsa.pub

Review the cluster and the AWS resources that will be created

kops update cluster demo.cloudchap.cf

Create the cluster

kops update cluster demo.cloudchap.cf --yes

This will create the Kubernetes cluster and will also create two instance groups: one for the master node and one for the worker nodes.

Verify that the instance groups have been created

kops get ig --name demo.cloudchap.cf

[Screenshot: kops get ig output listing the master and nodes instance groups]

We can also see the corresponding autoscaling groups in the AWS console.

[Screenshot: the corresponding auto scaling groups in the AWS console]

Next we will create the two spot instance groups. Each instance group will leverage a separate EC2 spot instance type.

As mentioned before, the new spot instance billing model no longer requires us to submit bids. AWS users simply pay the spot price in effect for that instance for that hour.

However, we do have the option of setting a maximum price that we are willing to pay for the spot instance. The default maximum price is the on-demand price for that instance.

The spot instance pricing history (in the EC2 console under Spot requests) gives us access to the current and historical pricing for a spot instance. Here is the pricing history for the t2.medium spot instance.

[Screenshot: spot pricing history for the t2.medium instance in the EC2 console]

The spot price for the t2.medium instance has been relatively stable at $0.0150 over the last 3 months. To leave some headroom for price increases, let's set the maximum price we are willing to pay to $0.0170.

We will also set maxSize and minSize to 7 and 1 respectively. These are the upper and lower limits on the number of instances that are allowed to run in the instance group.

kops create ig spot-ig

Change machine type to t2.medium

Add the following under spec

maxPrice: "0.0170"
maxSize: 7
minSize: 1

As well as the following nodeLabel

nodeLabels:
  on-demand: "false"

Here is what the configuration looks like

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2019-09-13T13:11:19Z
  labels:
    kops.k8s.io/cluster: demo.cloudchap.cf
  name: spot-ig
spec:
  image: kope.io/k8s-1.13-debian-stretch-amd64-hvm-ebs-2019-08-16
  machineType: t2.medium
  maxPrice: "0.0170"
  maxSize: 7
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: spot-ig
    on-demand: "false"
  role: Node
  subnets:
  - eu-west-1a
  - eu-west-1b
  - eu-west-1c

As mentioned before, every instance type in each availability zone has its own capacity pool. To increase the chances of being allocated spot capacity as well as to ensure fewer interruptions, we will create another instance group with another instance type.

kops create ig spot-ig-2

Change machine type to c5.large

Add the following under spec. 

maxPrice: "0.0410"
maxSize: 7
minSize: 1

maxPrice is again based on the pricing history for the last 3 months as displayed in the EC2 console.

Add the following nodeLabel

nodeLabels:
  on-demand: "false"

Here is the complete configuration:

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2019-09-13T13:23:35Z
  labels:
    kops.k8s.io/cluster: demo.cloudchap.cf
  name: spot-ig-2
spec:
  image: kope.io/k8s-1.13-debian-stretch-amd64-hvm-ebs-2019-08-16
  machineType: c5.large
  maxPrice: "0.0410"
  maxSize: 7
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: spot-ig-2
    on-demand: "false"
  role: Node
  subnets:
  - eu-west-1a
  - eu-west-1b
  - eu-west-1c

As mentioned before, we want to support the spot instance groups already created with an on-demand one that can take up the slack from spot instance interruptions. We will use the “nodes” instance group already created by kops as the on-demand instance group.

Since we want to use the on-demand instance group as a backup, we will taint the EC2 instances in it with PreferNoSchedule. Taints allow us to mark nodes so that the Kubernetes scheduler avoids them when making scheduling decisions for pods. PreferNoSchedule is a softer version of the NoSchedule taint: the scheduler tries to avoid the tainted nodes, but will still use them if no other nodes are available.
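Because PreferNoSchedule is only a soft taint, pods don't need an explicit toleration; the scheduler will simply prefer the untainted spot nodes. If you also want to make that preference explicit on the workload side, here is a purely illustrative snippet that could be added to a pod template, using the on-demand node label we set on the instance groups:

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      preference:
        matchExpressions:
        - key: on-demand
          operator: In
          values: ["false"]

Now let's edit the on-demand instance group.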

kops edit ig nodes

Update the maxSize and minSize

maxSize: 7
minSize: 1

And add the following nodeLabel

nodeLabels:
  on-demand: "true"

And taint the instances by adding

taints:
- on-demand=true:PreferNoSchedule

Here is the complete instance group configuration:

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2019-09-13T14:16:11Z
  labels:
    kops.k8s.io/cluster: demo.cloudchap.cf
  name: nodes
spec:
  image: kope.io/k8s-1.12-debian-stretch-amd64-hvm-ebs-2019-06-21
  machineType: t2.medium
  maxSize: 7
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: on-demandig
    on-demand: "true"
  role: Node
  subnets:
  - eu-west-1a
  - eu-west-1b
  - eu-west-1c
  taints:
  - on-demand=true:PreferNoSchedule

Update the cluster to review the changes

kops update cluster demo.cloudchap.cf

Add --yes to apply the changes

kops update cluster demo.cloudchap.cf --yes

Verify that the instance groups have been created

kops get ig

[Screenshot: kops get ig output listing the nodes, spot-ig and spot-ig-2 instance groups]

We can also see the spot requests that are initiated in the AWS EC2 console

[Screenshot: spot requests in the AWS EC2 console]

Since we have two spot instance groups with a minSize of 1, we can see two spot requests. As the instance groups scale out and the number of instances increases, the number of spot requests will also increase.
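The same information is available from the CLI; a quick sketch listing the active spot requests (the region is an assumption):

aws ec2 describe-spot-instance-requests \
  --filters Name=state,Values=active \
  --region eu-west-1 \
  --query 'SpotInstanceRequests[*].[SpotInstanceRequestId,LaunchSpecification.InstanceType,State]' \
  --output table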

Verify the updated corresponding autoscaling groups on AWS

[Screenshot: the updated auto scaling groups in the AWS console]

Kops: Deploying the Cluster Autoscaler for Multiple Instance Groups

Now that we have deployed our cluster, let’s integrate the cluster autoscaler. The cluster autoscaler will automatically increase or decrease the size of our Kubernetes cluster based on the presence of pending pods and the utilisation of individual nodes (instances).

It will spin up instances if there are pending pods that could not be scheduled because of insufficient resources on the already existing nodes. The cluster autoscaler will also decommission instances if they are consistently under-utilised and will schedule the pods from those instances on other ones.

To deploy the cluster autoscaler, we first need to create an IAM policy and attach it to the IAM role used by the instance groups we want to autoscale.

Create an ig-policy.json file locally and copy the following code into it

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:DescribeLaunchConfigurations",
                "autoscaling:SetDesiredCapacity",
                "autoscaling:TerminateInstanceInAutoScalingGroup"
            ],
            "Resource": "*"
        }
    ]
}

Create the policy

aws iam create-policy --policy-name ig-policy --policy-document file://ig-policy.json 

You will see the following output

[Screenshot: output of the create-policy command showing the policy ARN]

Attach the policy to the nodes.demo.cloudchap.cf role by inserting the policy ARN from the output above into the following command

aws iam attach-role-policy --policy-arn arn:aws:iam::209925384246:policy/ig-policy --role-name nodes.demo.cloudchap.cf 

Next add the following cloudLabels to all three instance groups

spec:
  cloudLabels:
    k8s.io/cluster-autoscaler/enabled: ""
    k8s.io/cluster-autoscaler/node-template/label: ""

Now we are ready to deploy the cluster autoscaler. Here is the yaml file for the autoscaler. Edit the file and input the names of the instance groups as well as the correct values for the min and max sizes as shown below:

command:
  - ./cluster-autoscaler
  - --v=4
  - --stderrthreshold=info
  - --cloud-provider=aws
  - --skip-nodes-with-local-storage=false
  - --expander=least-waste
  - --nodes=1:7:nodes.demo.cloudchap.cf
  - --nodes=1:7:spot-ig.demo.cloudchap.cf
  - --nodes=1:7:spot-ig-2.demo.cloudchap.cf

Also set the correct certificate path under hostPath. Use /etc/ssl/certs/ca-certificates.crt for Debian-based node images (such as the kope.io images used here) or /etc/ssl/certs/ca-bundle.crt for Amazon Linux and RHEL-based images:

hostPath:
  path: "/etc/ssl/certs/ca-certificates.crt"

Deploy the cluster autoscaler using

kubectl apply -f cluster-autoscaler-multi-asg.yaml

This will spin up a cluster autoscaler deployment in the kube-system namespace.

Verify that the cluster autoscaler has been deployed. 

kubectl get pods -l app=cluster-autoscaler -n kube-system 

[Screenshot: cluster autoscaler pod running in the kube-system namespace]

View the logs for the autoscaler pod

kubectl logs -f pod/cluster-autoscaler-69b696b7df-rb8l6 -n kube-system

Now that the cluster autoscaler is deployed on our cluster let’s scale our app replicas to verify the auto scaling behaviour.
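The scale commands in this guide assume a deployment named frontend running in a production namespace. If you don't have one yet, a throwaway workload can be created first; the names, image and resource requests are placeholders, and the explicit requests ensure that scaling the replica count actually exhausts node capacity and produces pending pods.

kubectl create namespace production
kubectl create deployment frontend --image=nginx -n production
kubectl set resources deployment frontend -n production --requests=cpu=200m,memory=256Mi

With the workload in place, scale up the replicas: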

kubectl scale deployment frontend --replicas=40 -n production

As the cluster autoscaler detects pending pods, it will scale up the number of nodes. This can be verified in the autoscaler logs

kubectl logs -f pod/cluster-autoscaler-69b696b7df-rb8l6 -n kube-system

[Screenshot: cluster autoscaler logs showing the scale-up plan for the spot-ig-2 instance group]

As you can see in the image above the cluster autoscaler has identified unschedulable pods and has created a scale up plan for the spot-ig-2.demo.cloudchap.cf instance group. The cluster autoscaler will increase the size of the instance group from 1 to 4 to accommodate the new pods.

This completes the deployment of a Kubernetes cluster leveraging both on-demand and spot instances as part of separate instance groups. 

Next we will deploy a Kubernetes cluster that leverages EC2 spot instances as part of a single mixed instance group.

Kops: Kubernetes Cluster with Single Mixed Instance Group

Mixed instance groups leverage multiple instance types and purchase options. As of version 1.14.x, the cluster autoscaler also supports mixed instance groups. The instance types, however, need to have the same CPU and memory resources for the cluster autoscaler to function correctly.

Mixed instance groups allow us to diversify our Kubernetes cluster and take advantage of multiple spot pools as part of the same instance group. Leveraging multiple spot pools increases the chances of being allocated spot capacity and reduces interruptions.

Let us now move on to the deployment.

We will use the same cluster we deployed earlier in the guide.

First, create a new mixed instance group

kops create ig mixed-ig

Add maxPrice and change the maxSize and minSize

maxPrice: "0.04"
maxSize: 20
minSize: 1

Also add the mixed instance policy

mixedInstancesPolicy:
  instances:
  - t2.medium
  - c5.large
  - a1.large
  onDemandAboveBase: 0
  onDemandBase: 0
  spotInstancePools: 3

onDemandBase is the minimum instance group capacity that we want to be provisioned as on-demand instances. The base capacity is provisioned first. Since we have set it to 0, our instance group will have no base on-demand instances.

onDemandAboveBase is the percentage of instances above the base that will be provisioned as on-demand instances. Setting it to 0 means that any additional capacity or instances launched will be spot instances. For example, with onDemandBase: 2 and onDemandAboveBase: 50, the first two instances would be on-demand and any capacity beyond that would be split evenly between on-demand and spot instances.

The three instance types we have chosen for the instance group all have similar CPU and memory resources. This means that we can safely use the cluster autoscaler with it.

Here is the complete configuration of the mixed instance group

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: demo.cloudchap.cf
  name: mixed-ig
spec:
  image: kope.io/k8s-1.13-debian-stretch-amd64-hvm-ebs-2019-08-16
  machineType: t2.medium
  maxPrice: "0.04"
  maxSize: 20
  minSize: 1
  mixedInstancesPolicy:
    instances:
    - t2.medium
    - c5.large
    - a1.large
    onDemandAboveBase: 0
    onDemandBase: 0
    spotInstancePools: 3
  nodeLabels:
    kops.k8s.io/instancegroup: mixed-ig
  role: Node
  subnets:
  - eu-west-1a
  - eu-west-1b
  - eu-west-1c

Update the cluster to review the new resources that will be created

kops update cluster demo.cloudchap.cf

Apply the changes

kops update cluster demo.cloudchap.cf --yes

Let us now deploy the cluster autoscaler for the mixed instance group.

Kops: Deploying the Cluster Autoscaler for a single Mixed Instance Group

Edit this yaml file for the cluster autoscaler and add the following under hostPath

path: "/etc/ssl/certs/ca-certificates.crt"

Also enter the correct values for the minSize and maxSize and the name of the instance group. Optionally add - --skip-nodes-with-system-pods=false.

command:
  - ./cluster-autoscaler
  - --v=4
  - --stderrthreshold=info
  - --cloud-provider=aws
  - --skip-nodes-with-local-storage=false
  - --nodes=1:20:mixed-ig.demo.cloudchap.cf
  - --skip-nodes-with-system-pods=false

Deploy the cluster autoscaler using

kubectl apply -f one-asg.yaml

Verify that the cluster autoscaler is running in the kube-system namespace

kubectl get pods -l app=cluster-autoscaler -n kube-system

[Screenshot: cluster autoscaler pod running in the kube-system namespace]

View the logs for the autoscaler pod

kubectl logs -f pod/cluster-autoscaler-7bc84c657-s9sfj -n kube-system

Scale the app to verify the auto scaling behaviour.

kubectl scale deployment frontend --replicas=90 -n production

View the cluster autoscaler logs

kubectl logs -f cluster-autoscaler-7bc84c657-s9sfj -n kube-system

[Screenshot: cluster autoscaler logs showing the scale-up plan for the mixed-ig instance group]

As you can see in the cluster autoscaler logs above, the autoscaler has identified unschedulable pods and has created a scale up plan for the mixed-ig.demo.cloudchap.cf instance group. The scale up plan will increase the size of the instance group from 1 to 6.

We can also see these instances in the AWS EC2 console as they are launched

[Screenshot: newly launched instances in the AWS EC2 console]

This concludes the process of deploying a Kubernetes cluster leveraging spot instances using Kops. 

Next we will deploy our cluster using EKS.

EKS: Deploying Kubernetes Cluster Leveraging EC2 Spot Instances

In this section we will review the process of deploying a Kubernetes cluster leveraging EC2 spot instances using EKS. As we did with Kops, we will provision multiple node groups each with its own instance type, as well as a single mixed node group. We will also deploy the cluster autoscaler for both scenarios. 

EKS: Deploy Kubernetes Cluster with Multiple Spot and On-demand Node Groups

In this section we walk through the deployment of a Kubernetes cluster with multiple node groups. We will create three node groups. Two of these node groups will leverage EC2 spot instances while the remaining one will leverage on-demand instances.

We assume that you have already installed eksctl, kubectl and the AWS CLI tools.

Create a Kubernetes cluster using eksctl.

eksctl create cluster \
--name demo-eks-cluster \
--nodegroup-name nodes \
--node-type t2.medium \
--nodes-min 1 \
--nodes-max 1

This will create the Kubernetes control plane, managed by AWS, as well as an on-demand node group called nodes.

Verify that the node groups have been created

eksctl get nodegroup --cluster demo-eks-cluster

[Screenshot: eksctl get nodegroup output]

We can also see the corresponding autoscaling group in the AWS console. Since the control plane is managed by AWS, it does not show up in the console.

[Screenshot: the corresponding auto scaling group in the AWS console]

Let’s now create the three node groups. We will use this CloudFormation template to create the node groups. Clone the template locally and make the following changes:

Change the SpotNode1InstanceType to c5.large

SpotNode1InstanceType: 
  Description: EC2 instance type for the spot instances. 
  Type: String 
  Default: c5.large

Change the OnDemandNodeInstanceType to t2.medium

OnDemandNodeInstanceType: 
  Description: EC2 instance type for the node instances. 
  Type: String 
  Default: t2.medium

Increase the default NodeAutoScalingGroupMaxSize to 7

NodeAutoScalingGroupMaxSize: 
  Type: Number 
  Description: Maximum size of Node Group ASG. 
  Default: 7

Head over to the CloudFormation section of the AWS console and click on Create stack.

Upload the updated 'amazon-eks-nodegroup-with-spot.yaml' file and click Next.

[Screenshot: CloudFormation create stack screen]

Enter the correct values for ‘Stack Name’ and ‘Cluster Name’. Choose the correct 'ClusterControlPlaneSecurityGroup', 'VpcId' and Subnets from the drop-down lists. Lastly enter ‘ami-059c6874350e63ca9’ under ‘NodeImageId’. In the next screen optionally enter the tag key and value and click on Create stack.
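If you prefer the CLI over the console, the same stack can be created with aws cloudformation create-stack. A hedged sketch; the parameter names must match the ones defined in the template, and the security group, VPC and subnet IDs are placeholders.

aws cloudformation create-stack \
  --stack-name eks-node-groups \
  --template-body file://amazon-eks-nodegroup-with-spot.yaml \
  --capabilities CAPABILITY_IAM \
  --parameters \
    ParameterKey=ClusterName,ParameterValue=demo-eks-cluster \
    ParameterKey=NodeImageId,ParameterValue=ami-059c6874350e63ca9 \
    ParameterKey=ClusterControlPlaneSecurityGroup,ParameterValue=sg-xxxxxxxx \
    ParameterKey=VpcId,ParameterValue=vpc-xxxxxxxx \
    ParameterKey=Subnets,ParameterValue='subnet-aaaa\,subnet-bbbb\,subnet-cccc'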

Once the stack creation is complete, note down the ARN of the 'NodeInstanceRole' resource created by the CloudFormation stack. In our case this is the ‘eksctl-demo-eks-cluster-nodegroup-NodeInstanceRole-GJZ80VTSEPGT’ role.
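The role ARN can also be retrieved with the CLI rather than the console; a sketch assuming the stack exposes it as an output named NodeInstanceRole and the stack name used above.

aws cloudformation describe-stacks \
  --stack-name eks-node-groups \
  --query "Stacks[0].Outputs[?OutputKey=='NodeInstanceRole'].OutputValue" \
  --output text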

The stack will create three autoscaling groups on AWS. It will also label the spot and on-demand instances spun up as part of the autoscaling groups with 'lifecycle=Ec2Spot' and 'lifecycle=OnDemand' labels respectively.

Verify that the autoscaling groups have been created

[Screenshot: the SpotNodeGroup1, SpotNodeGroup2 and OnDemandNodeGroup auto scaling groups in the AWS console]

The 'SpotNodeGroup1' and 'SpotNodeGroup2' autoscaling groups exclusively leverage spot instances while the 'OnDemandNodeGroup' leverages on-demand instances. 

We can see the spot requests initiated by these autoscaling groups on the AWS console:

[Screenshot: spot requests in the AWS console]

As we add more instances to these autoscaling groups, the number of spot requests initiated will also increase. 

Since we are requesting spot capacity in separate spot pools for both node groups, the chances of being allocated spot capacity are higher. There is also a lower chance of both spot pools running out of capacity at the same time, which will result in fewer interruptions.

Next clone the following ConfigMap yaml file locally and enter the role ARN copied earlier.

Here is what the configuration looks like: 

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::209925384246:role/eks-node-groups-NodeInstanceRole-4H2F9TU6FGB0
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes

Apply using kubectl

kubectl apply -f aws-cm-auth.yaml

This completes the deployment of our EKS cluster with multiple node groups. Next we will deploy the cluster autoscaler for all 3 of the node groups we created.

EKS: Deploy Cluster Autoscaler for Multiple EKS Node Groups 

We created the IAM policy required for the cluster autoscaler deployment earlier. Here is a quick recap:

Copy the following code into an ig-policy-1.json file locally.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:DescribeLaunchConfigurations",
                "autoscaling:SetDesiredCapacity",
                "autoscaling:TerminateInstanceInAutoScalingGroup"
            ],
            "Resource": "*"
        }
    ]
}

Create the policy

aws iam create-policy --policy-name ig-policy-1 --policy-document file://ig-policy-1.json

Attach the policy to the 'eksctl-demo-eks-cluster-nodegroup-NodeInstanceRole-GJZ80VTSEPGT' role by inserting the policy ARN generated in the output from the earlier step:

aws iam attach-role-policy --policy-arn arn:aws:iam::209925384246:policy/ig-policy-1 --role-name eksctl-demo-eks-cluster-nodegroup-NodeInstanceRole-GJZ80VTSEPGT

Deploy the cluster autoscaler by cloning this yaml file and inserting the names of the node groups as well as the correct values for the min and max sizes as shown below:

command:
  - ./cluster-autoscaler
  - --v=4
  - --stderrthreshold=info
  - --cloud-provider=aws
  - --skip-nodes-with-local-storage=false
  - --expander=least-waste
  - --nodes=1:7:eks-node-groups-SpotNode2Group-XJ6P3SWK5U99
  - --nodes=1:7:eks-node-groups-SpotNode1Group-M1T70Z19O9JA
  - --nodes=1:7:eks-node-groups-OnDemandNodeGroup-1UH2DWM5E304C

Deploy the cluster autoscaler

kubectl apply -f cluster-autoscaler-multi-asg.yaml

Verify that the cluster autoscaler has been deployed. 

kubectl get pods -l app=cluster-autoscaler -n kube-system 

[Screenshot: cluster autoscaler pod running in the kube-system namespace]

View the logs for the autoscaler pod

kubectl logs -f pod/cluster-autoscaler-596c9b9cbd-twxp7 -n kube-system

Scale the app to verify the cluster autoscaler behaviour.

kubectl scale deployment frontend --replicas=60 -n production

As we scale our application, the cluster autoscaler detects these pending pods and creates a plan to scale up the cluster.

This can be verified by viewing the cluster autoscaler logs:

kubectl logs -f pod/cluster-autoscaler-596c9b9cbd-twxp7 -n kube-system

[Screenshot: cluster autoscaler logs showing the scale-up plan]

As can be seen in the screenshot above of the cluster autoscaler logs, the autoscaler will increase the size of the ‘eks-node-groups-SpotNode1Group-VQKEESMRSV25’ node group from 1 to 4, to accommodate the pending pods.

EKS: Deploy Kubernetes Cluster with Single Mixed Node Group

Next we will deploy a Kubernetes cluster with a single mixed node group. Mixed node groups allow us to spin up multiple instance types as part of the same node group. They can also be used to mix spot and on-demand instances. Once we have the mixed node group up and running, we will also deploy the cluster autoscaler. 

We will use the same EKS Kubernetes cluster we created earlier. 

To create the mixed node group, clone this yaml file locally and make the following changes as shown in the configuration below:

apiVersion: eksctl.io/v1alpha5 
kind: ClusterConfig 
metadata: 
    name: demo-eks-cluster 
    region: eu-west-1 
nodeGroups: 
    - name: mixed-ng 
      minSize: 1 
      maxSize: 20 
      instancesDistribution: 
        maxPrice: 0.04 
        instanceTypes: ["c5.large", "t2.medium", "a1.large"] 
        onDemandBaseCapacity: 0 
        onDemandPercentageAboveBaseCapacity: 0 
        spotInstancePools: 3

onDemandBaseCapacity is the minimum node group capacity that will be provisioned as on-demand instances, while onDemandPercentageAboveBaseCapacity is the percentage of instances above the base that will be provisioned as on-demand instances. Setting both to zero means that this node group will exclusively leverage spot instances.

We have also chosen three instance types with similar CPU and memory, making it safe to deploy the cluster autoscaler on top.

We have also set the maxPrice based on spot pricing history in the AWS console. 

Create the mixed node group

eksctl create nodegroup -f mixed-ng.yaml

Let us now deploy the cluster autoscaler for the mixed node group.

EKS: Deploy Cluster Autoscaler for a Single Mixed EKS Node Group

To deploy the cluster autoscaler, clone this yaml file locally and update the name of the mixed node group as well as the minSize and maxSize.

command:
  - ./cluster-autoscaler
  - --v=4
  - --stderrthreshold=info
  - --cloud-provider=aws
  - --skip-nodes-with-local-storage=false
  - --nodes=1:20:eksctl-demo-eks-cluster-nodegroup-mixed-ng-NodeGroup-LT7L1JFZK2Q6
  - --skip-nodes-with-system-pods=false

Deploy the cluster autoscaler:

kubectl apply -f cluster-autoscaler-one-asg.yaml

Verify that the cluster autoscaler is running in the kube-system namespace

kubectl get pods -l app=cluster-autoscaler -n kube-system

[Screenshot: cluster autoscaler pod running in the kube-system namespace]

View the logs for the autoscaler pod

kubectl logs -f pod/cluster-autoscaler-68b9d674f5-jkcmr -n kube-system

Scale the app to verify the auto scaling behaviour.

kubectl scale deployment frontend --replicas=90 -n production

View the cluster autoscaler logs

kubectl logs -f pod/cluster-autoscaler-68b9d674f5-jkcmr -n kube-system

[Screenshot: cluster autoscaler logs showing the scale-up plan for the mixed node group]

As can be seen in the screenshot above the cluster autoscaler will scale the eksctl-demo-eks-cluster-nodegroup-mixed-ng-NodeGroup-LT7L1JFZK2Q6 node group from 1 to 4 nodes. 

We can also see these instances in the AWS EC2 console as they are launched:

[Screenshot: newly launched instances in the AWS EC2 console]

This concludes the deployment of the cluster autoscaler in our Kubernetes cluster leveraging a single mixed node group.

K8s Spot Termination Handler

Throughout this guide we have used multiple instance types for our instance groups. Since each instance type has a separate spot pool in each availability zone, our cluster is more likely to be allocated spot capacity. Diversifying our instance types in this manner also reduces the chances of interruptions.

However, AWS can still reclaim the spot capacity allocated to our cluster at a notice of two minutes. When this happens we might lose the pods that are already running on those spot instances.

This is where the K8s spot termination handler comes in. AWS issues a termination notice two minutes before a spot instance is to be reclaimed. The k8s spot termination handler monitors the AWS metadata service for these termination notices. Whenever it detects a termination notice for a spot instance, it drains that instance and reschedules the pods running on it onto other nodes.
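Under the hood this boils down to polling the instance metadata service from each node; here is a minimal sketch of what such a check looks like when run on a spot instance (the handler automates this via a DaemonSet):

# A 200 status code means a termination notice has been issued;
# until then the endpoint returns 404.
curl -s -o /dev/null -w "%{http_code}\n" \
  http://169.254.169.254/latest/meta-data/spot/termination-time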

We can deploy the k8s termination handler in our cluster using Helm

helm install stable/k8s-spot-termination-handler --namespace kube-system --name spot-term-handler 

Verify that the K8s spot termination handler is running

kubectl --namespace=kube-system get pods -l "app.kubernetes.io/name=k8s-spot-termination-handler,app.kubernetes.io/instance=spot-term-handler"

[Screenshot: k8s-spot-termination-handler pods running in the kube-system namespace]

And the logs

kubectl logs --namespace kube-system spot-term-handler-k8s-spot-termination-handler-76g6s

Conclusion

And that’s it. We have successfully deployed a highly available and resilient Kubernetes cluster that leverages spot instances as worker nodes using both Kops and EKS. The cluster takes advantage of multiple spot pools across instance types and availability zones. We did this by deploying separate instance groups for each instance type as well as a single mixed instance group with multiple instance types.

We also deployed the cluster autoscaler for both types of instance groups. The cluster autoscaler initiates scale in and scale out operations each time it detects pending pods or under-utilised instances. To ensure graceful termination, we also deployed the spot instance termination handler, which reschedules pods whenever it detects that a spot instance is about to be terminated.

Want to learn more about reducing Kubernetes costs? Download the complete guide to Kubernetes in Production for CIOs and CTOs.

Download Guide

 


Author

Hasham Haider

Fan of all things cloud, containers and micro-services!

Want to Dig Deeper and Understand How Different Teams or Applications are Driving Your Costs?

Request a quick 20 minute demo to see how you can seamlessly allocate Kubernetes costs while saving up to 30% on infrastructure costs using Replex.

Schedule a Meeting