Automating the provisioning of a production-ready Kubernetes cluster with AWS EKS & CDK
Amazon’s Elastic Kubernetes Service (EKS) gives you automated provisioning of a highly available Kubernetes cluster, as well as automated in-place upgrades of both the control plane and the worker Nodes (if you also leverage the Managed Node Groups feature). The former is especially important given the rate of change of the Kubernetes project, and the latter given the frequency with which the underlying Linux or Windows operating system(s) need to be patched.
However, the cluster that EKS provisions today feels very basic and is missing several key add-ons which most customers will need to add as they onboard the service. These include:
- Fluentd or Fluent Bit to ship container logs to a log aggregation platform, such as the Amazon Elasticsearch Service, where you can visualise and search/filter them all
- The Prometheus node-exporter to ship cluster metrics to a Prometheus server, with Grafana to visualise them. AWS has announced managed services for both of these as well, currently in preview.
- The metrics-server (needed for the Horizontal Pod Autoscaler (HPA) to scale in/out your Pods)
- The Cluster Autoscaler (CA) to scale in/out your Nodes
- External DNS to manage Route 53 DNS aliases from within the cluster
- An Ingress controller such as the AWS Load Balancer Controller
- The AWS storage drivers (the Container Storage Interface, or CSI, drivers) for EBS and EFS
- A Network Policy provider, such as Calico, to enforce Kubernetes Network Policies (if you prefer that to the alternative, AWS Security Groups for Pods)
- The Open Policy Agent’s (OPA) Gatekeeper to enforce constraints on what can be deployed to the cluster. This is especially important to help secure the cluster if it will be multi-tenant.
AWS provides the EKS Workshop showing you how to manually set up all of these add-ons. But most people would prefer to automate that process — or, even better, to not have to worry about it at all.
The good news is that the EKS team’s roadmap includes delivering many, if not all, of these as Managed Add-ons, proper parts of the EKS service, over time. In the meantime, though, customers need to install the add-ons themselves.
So, how can they do that quickly, easily and reliably?
Enter the AWS Cloud Development Kit (CDK)
AWS’s infrastructure-as-code service is CloudFormation. Writing its JSON or YAML directly can be quite verbose, and CloudFormation on its own can’t manage what happens inside the cluster via the Kubernetes API.
The AWS Cloud Development Kit (CDK) not only lets you write your infrastructure-as-code as actual code (TypeScript, Python, Java, C# and soon Go), but also extends CloudFormation via Lambda-backed custom resources to fill in its gaps and smooth out its rough edges when achieving common tasks.
In the case of EKS, CDK extends CloudFormation to be able to deploy Kubernetes manifests or Helm charts as well as help you to integrate IAM with EKS to allow Pods to call AWS APIs securely via IAM Roles for Service Accounts (IRSA).
An example of using the CDK to set up an EKS cluster with best practices as well as all the common add-ons
I’ve written an example CDK template in Python that not only deploys an EKS with all of the add-ons I’ve described above but also implements some other best practices including:
- Putting the EKS control plane endpoints into a VPC with private subnets off the public Internet
- Setting up a bastion and/or a Client VPN to be able to manage that securely
- Enabling the logging on the service so things like the control plane audit logs go to CloudWatch Logs
- Including a set of Gatekeeper policies with a good security baseline (aligned with the legacy restricted Pod Security Policy plus a few other sensible things that are easy to do with Gatekeeper)
- Implementing GitOps around changes/upgrades to the cluster and its add-ons by setting up CodeBuild with a webhook to re-run `cdk deploy` when changes to this template are merged
Let’s dig into the template a little bit
A good example of the power of the CDK in bringing the worlds of AWS and Kubernetes together is this section of our template:
In this code example we are:
- Creating a new managed Amazon Elasticsearch Service domain
- Creating an AWS IAM Role bound to a Kubernetes service account via OIDC to give our Fluent Bit Pods access to ship logs to that new Elasticsearch with a dynamic reference to its ARN as the resource in the IAM Policy (rather than *)
- Deploying the AWS-provided Helm chart for aws-for-fluent-bit, with a dynamic reference to the new Elasticsearch domain we’re creating as the host to ship the logs to in the chart’s values
Doing these things together (provisioning an Amazon Elasticsearch domain and deploying Fluent Bit to ship logs to it) across the different worlds of AWS and Kubernetes would usually have meant using two tools, and having the first output AWS environment details so you could template them into your Kubernetes spec files to bridge the gap. Here we can provision and orchestrate both together, with dynamic references, in one tool, and with a real programming language instead of YAML as well.
When you provision an Amazon Elastic Kubernetes Service (EKS) cluster today, you end up with a rather empty Kubernetes cluster into which you’ll need to install several add-ons for common functionality. Many of these add-ons help integrate the cluster with your AWS environment and its managed services. By leveraging the AWS CDK, you can provision and manage both the AWS and the EKS sides of this puzzle together, in one tool, with real infrastructure-as-code.