EKS vs. GKE Service/Ingress managing their NLBs/ALBs
This is the third post in my series about how AWS’ Elastic Kubernetes Service (EKS) and GCP’s Google Kubernetes Engine (GKE) differ. In my previous posts, I:
- looked at IAM from Pods, both within their own cloud and cross-cloud between AWS and GCP — https://medium.com/@jason-umiker/cross-cloud-identities-between-gcp-and-aws-from-gke-and-or-eks-182652bddadb.
- looked at how the networking between the two clouds and their managed K8s offerings differs — https://medium.com/@jason-umiker/eks-vs-gke-networking-e1dd397fe86d
In this post, I’ll be looking at how using their Service and Ingress controllers differs in getting you one of their managed Network (Layer 4) or Application (Layer 7) load balancers.
The TL;DR of my learnings using them both is:
- AWS has the IP mode while GCP has the NEG mode for routing directly to Pods as the load balancer targets (rather than via NodePorts)
- AWS uses annotations for every ‘missing’ setting in the standards — while GCP externalizes many of them such as the health check settings to custom resources like BackendConfig
- And BackendConfig is associated with a Service (not the Ingress) via an annotation on the Service specifying the BackendConfig Custom Resource to use
- When you use the kubernetes.io/ingress.class annotation they tell you to use for a GKE-managed ALB, K8s warns you that it is deprecated and to use ingressClassName instead — but if you do that it doesn't work
- AWS’s load balancer controller defaults to internal LBs while GKE’s defaults to external ones
- AWS makes you get Managed Certificates for their load balancers out-of-band while GKE offers you the ManagedCertificate CustomResource/controller to get one right alongside your Ingress
- But GKE’s Managed Certificates can’t be used on internal load balancers (while AWS’s can be)
- But, GKE mostly makes up for that (in my book) by automatically uploading certificates stored in the Kubernetes secrets that you reference in your Ingress spec to GCP as “self-managed” certificates
- Even those created/managed by another controller such as cert-manager (getting certs from Let's Encrypt)!
- GKE actually builds their load balancer controller right into the managed service while EKS makes you install their controller yourself on everything but their new EKS Auto Mode.
What is Ingress (and why is it eventually getting replaced by Gateway)?
Ingress is a standard from Kubernetes that tries to abstract away the provisioning and configuration of nearly any Layer 7 load balancer (nginx, AWS ALB, GCP ALB, etc.) that you could want to use in your cluster/environment. It lets you configure that load balancer in the "Cloud Native" way, with declarative YAML applied to your K8s cluster alongside all of your other application K8s YAML manifests. And it is another great example of the Kubernetes practice of setting a generic abstracted standard/schema for an API and then expecting a variety of pluggable controllers/drivers to be written to meet that standard (CNI for networking, CSI for storage, etc.).
Here is an example of an Ingress from the K8s documentation:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: minimal-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: nginx-example
  rules:
  - http:
      paths:
      - path: /testpath
        pathType: Prefix
        backend:
          service:
            name: test
            port:
              number: 80
In the case of this example, the nginx Ingress Controller would be running and configuring actual nginx Pods in our cluster to serve the traffic. More commonly in the public cloud, though, the cloud providers give us controllers that instead provision and manage one of their managed load balancers for us.
The issue with Ingress is that the standard didn’t fully cover all the settings/requirements that you need for one of these to run — things such as:
- Certificate Management
- Configuration of the Health Checks
- Or, as you see in the example above, whether/how to rewrite the targets
- Etc.
The problem is that every provider worked around this differently — either with different namespaced annotations (e.g. the nginx.ingress.kubernetes.io above) to feed these missing settings in or, as you’ll see in GKE, externalizing some of them to additional/separate Custom Resources. So, you can’t usually take an Ingress document from, say, EKS and run it on GKE without changes.
This is why Kubernetes is eventually going to replace Ingress with Gateway — their new API that should cover much more, if not all, of the missing settings that people need in the standard. Unfortunately, while GKE supports Gateway, AWS doesn't yet. So, at least for us, we've decided to use the slightly different Ingresses across both (for some consistency) rather than Ingress on EKS and Gateway on GKE.
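Still, to give a sense of what Gateway looks like, here is a minimal sketch of a Gateway and an HTTPRoute on GKE. The GatewayClass name, hostname, and backend Service are illustrative assumptions, and it assumes the Gateway API is enabled on the cluster:
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: hello-gateway
spec:
  # One of GKE's built-in GatewayClasses (external Application Load Balancer)
  gatewayClassName: gke-l7-global-external-managed
  listeners:
  - name: http
    protocol: HTTP
    port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: hello-route
spec:
  parentRefs:
  - name: hello-gateway   # attach this route to the Gateway above
  hostnames:
  - "hello.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: hello         # the backend Service
      port: 8080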
Services for NLBs and Ingresses for ALBs
And what if we want a (Layer 4) Network Load Balancer instead of a (Layer 7) Application Load Balancer? You use a Service instead of an Ingress. And, in order to use an Ingress, you actually need to have a Service there in the middle anyway (see above where we had to specify a Service as our Ingress backend) — so, basically, for L4 you just use that existing Service with some different settings and no Ingress.
AWS's documentation for how to do it is here. The service.beta.kubernetes.io/aws-load-balancer-type: external annotation (along with the loadBalancerClass) just tells the AWS Load Balancer Controller to manage this Service rather than the legacy in-tree controller. That controller then defaults to an internal scheme — so, if you wanted to provision an Internet-facing load balancer, you'd also add the scheme annotation, giving you the following additions to your Service manifest.
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: external
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
spec:
  type: LoadBalancer
  loadBalancerClass: service.k8s.aws/nlb
GCP's documentation, meanwhile, is here. If you wanted to provision the same Internet-facing load balancer, you'd add:
metadata:
  annotations:
    cloud.google.com/l4-rbs: "enabled"
spec:
  type: LoadBalancer
GCP, on the other hand, defaults to an external load balancer — which is why we didn't need to specify that. This is the opposite of AWS, and if we wanted an internal one we'd add the following annotation:
annotations:
  networking.gke.io/load-balancer-type: "Internal"
I personally think AWS made the right call on this internal vs. external default — as the default should be the 'safer' choice.
Send traffic directly to the Pod IPs or via a Kubernetes NodePort?
Kubernetes supports making your Pod network a virtual overlay network that is not actually routable from your 'real' network. And, if you do that, the way to get into that overlay network is via your Nodes — via a thing called a NodePort. This assigns a particular static port for a Service on each and every Node that proxies through to the Service — regardless of whether any Pods for that Service are running on that Node.
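To make that concrete, here is a minimal sketch of a Service of type NodePort (the name, selector, and ports are illustrative; note that type: LoadBalancer Services allocate these same NodePorts under the hood by default):
apiVersion: v1
kind: Service
metadata:
  name: hello
spec:
  type: NodePort
  selector:
    app: hello        # the Pods this Service fronts
  ports:
  - port: 80          # the Service's ClusterIP port
    targetPort: 8080  # the container port on the Pods
    nodePort: 30080   # the static port opened on every Node (default range 30000-32767)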
NodePort has a few downsides:
- The AWS/GCP Load Balancer outside the cluster doesn’t know which Nodes the Pod is actually running on. So, let’s say there are two Pods running across our ten Nodes. The load balancer would balance across ten targets — eight of which then need to forward the request on to a Node where a Pod is actually running.
- And all of those ten Nodes would be getting healthchecked as the ‘targets’ by the cloud provider’s load balancer — all of which get forwarded on to our two running Pods.
- Also, let's say that the cloud LB sends a request to a NodePort in Zone A (not knowing that a Pod isn't running there), which forwards the request on to a Node/Pod in Zone B where one is actually running. This adds additional latency from unnecessary hops across Zones.
Thankfully, in both EKS and GKE, Pods have 'real' VPC IPs that their managed load balancers could (and arguably should) target directly instead of going through NodePorts.
In AWS, you can tell their Load Balancer Controller to do that direct-to-Pod routing with the following annotation (the alternative, default, setting here is the instance target type — which instead adds all Nodes (on the right NodePort) to the NLB as the targets). In ip mode the controller will watch all Pod changes and update the target IPs on the NLB with the relevant Pod IPs as they come and go (making the running controller a critical piece of infrastructure — if it goes down, LBs won't be pointed at the current Pod IPs as things deploy/scale/heal!).
annotations:
  service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
In GKE, there are actually a few interesting things that they can do to help improve on the usual/default NodePort configuration:
- GKE Subsetting — a cluster-wide configuration option (so, not configured on the Service but on the GKE cluster) that improves the scalability of internal pass-through Network Load Balancers by more efficiently grouping node endpoints into GCE_VM_IP network endpoint groups (NEGs). So, in our scenario above, a particular Service wouldn't add all ten Nodes as targets, just a (more efficient) subset of them.
- Weighted Load Balancing — allows nodes with more serving Pods to receive a larger proportion of new connections compared to nodes with fewer serving Pods
annotations:
  networking.gke.io/weighted-load-balancing: pods-per-node
- And, to skip the Nodes/NodePorts entirely (conceptually the same as AWS's ip target type), you can have GKE put the Pod IPs themselves into standalone network endpoint groups (NEGs) to use as the load balancer's backends, via this annotation:
annotations:
  cloud.google.com/neg: '{"exposed_ports": {"80":{"name": "NEG_NAME"}}}'
Of all of these, the last NEG option feels like the ideal one, and it's what I'd suggest.
EKS vs. GKE Ingress
Let’s move on from Layer 4 and Services to Layer 7 and Ingresses.
To use the AWS Load Balancer Controller with your Ingress, you add its ingress class to your spec:
spec:
  ingressClassName: alb
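For example, a minimal complete Ingress for an Internet-facing AWS ALB with direct-to-Pod targets might look like the following sketch (the hostname, service name, and port are illustrative; the two annotations are explained in the differences list below):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
  - host: hello.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: hello
            port:
              number: 8080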
Whereas to use the Google ALB with your Ingress you add this annotation:
annotations:
  kubernetes.io/ingress.class: gce           # for an external ALB
  # or, for an internal one:
  kubernetes.io/ingress.class: gce-internal
NOTE: when you use that kubernetes.io/ingress.class annotation, K8s will warn you that it is deprecated and to use ingressClassName instead. If you try to use that it won't work yet, though — so stick with the deprecated annotation for now.
These are the main differences between EKS and GKE’s Ingress Controllers I’ve found:
- For AWS, Ingress defaults to the instance (i.e. NodePort) target type and you need to specify the alb.ingress.kubernetes.io/target-type: ip annotation to get direct-to-Pod targets. Whereas, with GKE clusters under certain conditions, direct-to-Pod NEG-based load balancing is the default and does not require an explicit cloud.google.com/neg: '{"ingress": true}' annotation on the Service.
- AWS Ingress, like the Service above, defaults to internal. To provision an Internet-facing one, use the alb.ingress.kubernetes.io/scheme: internet-facing annotation. Whereas, for GKE, it is decided by which ingress.class annotation you chose as above (gce or gce-internal).
- GKE actually builds their load balancer controller right into the managed service — while AWS makes you install their controller yourself on everything but their new EKS Auto Mode.
- It is quite surprising that it still isn't even one of the EKS Managed Add-ons!
- Every setting outside the main specs for the AWS NLB and ALB is via annotations. Here they all are for the NLB and here for the ALB. While, with GKE, you set many of theirs, such as the configuration of the LB health checks, via Custom Resources such as BackendConfig (which, rather unintuitively, you associate with their underlying Services via an annotation — rather than on the Ingress itself). There is a sketch of this after this list.
- GKE helpfully gives you a controller for getting managed TLS certificates to use with the load balancers (ManagedCertificate — also sketched after this list), while AWS makes you get a Managed Certificate out-of-band with another tool like Terraform or Crossplane.
- And this is admittedly enough of a pain in AWS that we often use wildcard managed certs for the whole subdomain — so we only need to create and manage a handful out-of-band with Terraform rather than one for each hostname
- Though, unlike AWS, GKE won't let you use one of their Managed Certificates on their internal load balancers.
- Though, interestingly, their Ingress controller will upload a TLS Kubernetes Secret that you reference in your Ingress document to GCP as a "self-managed" certificate for their load balancer to use — including those you get automatically via the cert-manager operator issuing from something like Let's Encrypt!
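As a sketch of the BackendConfig pattern mentioned above (the resource name, health check path, and port are illustrative assumptions), you create the Custom Resource and then point the Service at it with an annotation:
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: hello-backendconfig
spec:
  healthCheck:
    type: HTTP
    requestPath: /healthz   # illustrative health check path
    port: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: hello
  annotations:
    # Associates the BackendConfig with this Service (not with the Ingress)
    cloud.google.com/backend-config: '{"default": "hello-backendconfig"}'
spec:
  selector:
    app: hello
  ports:
  - port: 8080
And a ManagedCertificate sketch looks something like this (again, the names and hostname are illustrative); you reference it from the Ingress via an annotation:
apiVersion: networking.gke.io/v1
kind: ManagedCertificate
metadata:
  name: hello-cert
spec:
  domains:
  - hello.example.com
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello
  annotations:
    kubernetes.io/ingress.class: gce   # external only; ManagedCertificate doesn't work with gce-internal
    networking.gke.io/managed-certificates: hello-cert
spec:
  defaultBackend:
    service:
      name: hello
      port:
        number: 8080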
GKE internal ALB with an HTTPS certificate example
Here is an example Ingress for a GKE internal ALB with an HTTPS cert that brings all of that together, given we can't use ManagedCertificate for internal load balancers. It:
- Gets a certificate for the name hello.jasonumiker.com via cert-manager which, in turn, gets it from Let's Encrypt (validating it against our GCP Cloud DNS automatically by creating the challenge records there for us) — putting it in the K8s Secret 'tls'
- And then the GKE Ingress controller uploads that Kubernetes Secret 'tls' that cert-manager just populated, and which is referenced in our Ingress, to GCP as a "self-managed" certificate their LBs can use for HTTPS
- This is the bit that I was pleasantly surprised worked and that AWS's doesn't do!
- Then it provisions a GCP ALB with HTTPS enabled
- Then, once the new ALB's IPs are known, the external-dns controller we've installed sets the DNS A record in Cloud DNS to point that name at them
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello
  namespace: hello
  annotations:
    external-dns.alpha.kubernetes.io/hostname: "hello.jasonumiker.com"
    kubernetes.io/ingress.allow-http: "false"
    kubernetes.io/ingress.class: "gce-internal"
    cert-manager.io/cluster-issuer: letsencrypt
spec:
  rules:
  - host: hello.jasonumiker.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: hello
            port:
              number: 8080
  tls:
  - hosts:
    - hello.jasonumiker.com
    secretName: tls
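For completeness, the letsencrypt ClusterIssuer referenced above looks something like this minimal sketch (the email address, GCP project, and the use of Workload Identity for the Cloud DNS credentials are assumptions for illustration):
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com              # illustrative contact email
    privateKeySecretRef:
      name: letsencrypt-account-key       # where cert-manager stores the ACME account key
    solvers:
    - dns01:
        cloudDNS:
          project: my-gcp-project         # illustrative; credentials via Workload Identity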
What’s Next?
I hope that this post — and the series so far — has been helpful.
The next segment in this EKS vs. GKE series will be on observability (metrics and logs).
What other topics would you like to see covered?