Argo CD’s “app of apps” — an efficient and easy way for a platform team to manage clusters and their associated add-ons

Jason Umiker
Sep 22, 2024


In my last blog post I covered declaratively provisioning clusters via the Kubernetes Cluster API, ending by installing Argo CD to complete the cluster bootstrap and set up ongoing GitOps for that cluster.

In this blog post I'll take it a step further and talk about how to structure and use that Argo CD if you are going to declaratively run a fleet of Kubernetes clusters for your organisation.

TL;DR: I propose a way to structure your platform team git repo and to configure your Cluster API and Argo CD manifests that makes the job of provisioning, as well as updating/maintaining, all your tenant clusters much easier and more efficient. It involves Cluster API provisioning an Argo CD "app of apps" GitOps flow, with a bit of Kustomize in the middle to tie clusters to their desired Argo Applications.

Usually the platform team is responsible for installing and maintaining various required “add-ons” for each cluster

Beyond provisioning the EKS/AKS/GKE/etc. clusters for the tenants, the platform team usually has responsibility for ensuring that certain add-ons are in place, including:

  • Network (CNI) and/or Storage (CSI) drivers
  • Ingress/Gateway controller and/or Service Mesh to expose services internally and to the Internet
  • Monitoring/Observability agents to get the metrics and logs to the right places
  • Cluster Autoscaler/Karpenter to scale the cluster, optimising its size for both availability and cost
  • GitOps operators (e.g. Argo CD and Argo Rollouts)
  • Security agents/tooling (e.g. OPA Gatekeeper/Kyverno for admission control, Falco/Tetragon for runtime security, etc.)
  • Various other popular controllers to integrate with required backend services (e.g. cert-manager for TLS certificates, external-dns to automatically create/update DNS entries for ingresses, etc.)

Depending on your managed Kubernetes provider, some of these are offered as managed add-on options in the EKS/AKS/GKE APIs, in which case you can likely control them through your Cluster API manifests. Otherwise, you can manage them via Argo CD and the patterns I'll get into now.
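For example, with the AWS provider (CAPA) you can declare the EKS managed add-ons right on the control plane object in your Cluster API manifest. Here is a minimal sketch of that; the add-on names and versions are illustrative, so check the provider's documentation for the exact values your cluster needs:

apiVersion: controlplane.cluster.x-k8s.io/v1beta2
kind: AWSManagedControlPlane
metadata:
  name: dev
  namespace: dept1
spec:
  # ...other control plane settings (region, networking, etc.) omitted for brevity
  version: v1.30.0                  # illustrative Kubernetes version
  addons:
    - name: vpc-cni                 # EKS managed add-on for the AWS VPC CNI
      version: v1.18.1-eksbuild.1   # illustrative add-on version
      conflictResolution: overwrite # let EKS overwrite any self-managed config
    - name: coredns
      version: v1.11.1-eksbuild.4   # illustrative add-on version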

A proposed à la carte pattern for a balance of granular control for each cluster with easy operations

One of the first choices you need to make as a platform team is whether you allow each cluster to be a bit of a "special snowflake" at one extreme, or require them all to be exactly the same at the other. This often goes hand-in-hand with a decision on the "shared responsibility model" of the cluster(s): what you as the platform team will be responsible for vs. what your tenants will be.

The good news is that, with the Argo "app of apps" pattern I'll propose here, the trade-off for taking the more granular approach isn't too bad.

A proposed structure and pattern for your git repo and GitOps

Imagine a folder structure as follows:

clusters/ <--- Folder where Cluster API manifests go
clusters/dept1-dev.yaml <--- Bootstraps GitOps of cluster-argo-apps/dept1-dev (after provisioning cluster)

cluster-argo-apps/ <--- Folder with a subfolder for each cluster's GitOps
cluster-argo-apps/dept1-dev/
cluster-argo-apps/dept1-dev/kustomization.yaml <--- "Symbolic links" to the required apps
cluster-argo-apps/...

argo-apps/ <--- Folder where all of our Argo app options live
argo-apps/argo-rollouts/argo-rollouts-app.yaml <--- Actual Argo Apps manifest shared by all the clusters
argo-apps/...

The idea is that we:

  1. Start with a Cluster API manifest per cluster that not only provisions the cluster but also sets up Argo CD with an initial Application pointing at that cluster's folder for GitOps.
  2. In that cluster's sub-folder under cluster-argo-apps, you put a kustomization.yaml that serves as a sort of "symbolic link" to the actual Applications, which live centrally in the argo-apps folder.
  3. You put your 'menu' of options in argo-apps (each bundled up by an Argo CD Application file) and pick from it à la carte in each cluster's kustomization.yaml file.

Here is that in practice (you can see it all in GitHub here):

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: dev
  namespace: dept1
  labels:
    name: dept1-dev
spec:
  controlPlaneRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: GCPManagedControlPlane
    name: dev
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: GCPManagedCluster
    name: dev
---

...

---
apiVersion: addons.cluster.x-k8s.io/v1alpha1
kind: HelmChartProxy
metadata:
  name: argocd-apps-dept1-dev
  namespace: dept1
spec:
  clusterSelector:
    matchLabels:
      name: dept1-dev
  repoURL: https://argoproj.github.io/argo-helm
  chartName: argocd-apps
  version: 2.0.1
  valuesTemplate: |
    applications:
      platform-apps:
        namespace: argocd
        finalizers:
          - resources-finalizer.argocd.argoproj.io
        project: default
        sources:
          - repoURL: https://github.com/jasonumiker/gke-autopilot-capi-argocd-example.git
            path: cluster-argo-apps/dept1-dev
            targetRevision: HEAD
        destination:
          server: https://kubernetes.default.svc
          namespace: argocd
        syncPolicy:
          automated:
            prune: true
            selfHeal: false
  options:
    waitForJobs: true
    wait: true
    timeout: 5m
    install:
      createNamespace: true

As you can see, the cluster's Cluster API manifest ends by setting up a single Argo CD Application pointing at cluster-argo-apps/dept1-dev.

Then when Argo CD goes there it sees this kustomization.yaml file:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
metadata:
  name: dept1-dev-workloads

resources:
  - ../../argo-apps/argo-rollouts
  - ../../argo-apps/argo-rollouts-demo
  - ../../argo-apps/ingress-demo

It serves as the Argo CD / Kustomize equivalent of a symbolic link to the centrally managed Argo Applications that can be used/shared by all the clusters. The other benefit of the Kustomize layer is that we can patch the downstream manifests here to tailor them to our cluster if we ever need to.

For example, here is a patch for dept1-dev that adds cluster-specific settings to the argo-rollouts-demo app included in that kustomization.yaml file above. It patches the Argo Application file in argo-apps to, in turn, patch the downstream manifests that it points at in app-manifests.

patches:
  - target:
      group: argoproj.io
      version: v1alpha1
      kind: Application
      name: argo-rollouts-demo
    patch: |-
      - op: add
        path: /spec/source/kustomize
        value:
          patches:
            - target:
                group: argoproj.io
                version: v1alpha1
                kind: Rollout
                name: bluegreen-demo
              patch: |-
                - op: replace
                  path: /spec/strategy/blueGreen
                  value:
                    activeService: bluegreen-demo
                    previewService: bluegreen-demo-preview
                    scaleDownDelaySeconds: 300
                    prePromotionAnalysis:
                      templates:
                        - templateName: success-rate
                      args:
                        - name: url_map_name
                          value: k8s2-um-1cf9hwd1-default-bluegreen-demo-preview-cn1eress
                        - name: project_id
                          value: project-435400
                    postPromotionAnalysis:
                      templates:
                        - templateName: success-rate
                      args:
                        - name: url_map_name
                          value: k8s2-um-1cf9hwd1-default-bluegreen-demo-94bd5pw1
                        - name: project_id
                          value: project-435400
                    previewReplicaCount: 1
                    autoPromotionEnabled: true
                    autoPromotionSeconds: 1
                    abortScaleDownDelaySeconds: 300

Then here is the example Argo Application file for our argo-rollouts-demo that lives in argo-apps/argo-rollouts-demo (the one that we're patching above):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: argo-rollouts-demo
spec:
  project: default
  source:
    repoURL: https://github.com/jasonumiker/gke-autopilot-capi-argocd-example
    targetRevision: HEAD
    path: app-manifests/argo-rollouts-demo
  destination:
    server: https://kubernetes.default.svc
    namespace: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: false
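For completeness, the app-manifests/argo-rollouts-demo folder that this Application points at holds the plain manifests that Argo CD syncs. A minimal sketch of the bluegreen-demo Rollout it might contain (the image, port and replica count here are illustrative assumptions) would look like:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: bluegreen-demo
spec:
  replicas: 2                  # illustrative
  selector:
    matchLabels:
      app: bluegreen-demo
  template:
    metadata:
      labels:
        app: bluegreen-demo
    spec:
      containers:
        - name: bluegreen-demo
          image: argoproj/rollouts-demo:blue   # illustrative demo image
          ports:
            - containerPort: 8080
  strategy:
    blueGreen:                 # this is the block our Kustomize patch above replaces
      activeService: bluegreen-demo
      previewService: bluegreen-demo-preview
      autoPromotionEnabled: true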

The pattern in action and its benefits

When you deploy this pattern, you first see the app called platform-apps pointing at the cluster's folder, which, in turn, points at three other Argo Application manifests via Kustomize.

One Argo Application found three others there and pulled them in

Those in turn all get pulled in and deployed — which is why we see all four Argo Applications in the UI.

All four Argo Apps — the “app of apps” and then all of those too

The Benefits

The platform team's customers get:

  • The ability both to customise their cluster in its Cluster API manifest and to pick which combination of the add-ons maintained by the platform team gets loaded onto that cluster
  • The ability to self-service both the creation and the updating of their cluster by PRing a change to the platform team's git repo. This includes easily changing their mind later about which add-ons they pick from the 'menu' (the argo-apps folder), or adopting a new add-on the team has just made available there, as in the sketch below.
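For example, a tenant opting into one more item from the menu would PR a one-line change to their cluster's kustomization.yaml (the cert-manager entry here is a hypothetical menu option):

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
metadata:
  name: dept1-dev-workloads

resources:
  - ../../argo-apps/argo-rollouts
  - ../../argo-apps/argo-rollouts-demo
  - ../../argo-apps/ingress-demo
  - ../../argo-apps/cert-manager   # <--- the one new line in the PR (hypothetical add-on)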

While the platform team gets:

  • To decide what options to put in their "menu" for the tenants to pick from, plus a single declarative place to document (a README.md in the folder, perhaps?) and manage them all centrally.
  • Updating those add-ons becomes as simple as making a change to the version or image parameter in that central argo-apps/ location. Changing the version there once and merging it means that all of the clusters pull that change immediately, as sketched below.
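As a sketch of that single change, assuming the shared argo-rollouts Application pulls the upstream Helm chart (the chart version shown is illustrative):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: argo-rollouts
spec:
  project: default
  source:
    repoURL: https://argoproj.github.io/argo-helm
    chart: argo-rollouts
    targetRevision: 2.37.7   # <--- bump this one line and every cluster that picked
                             #      this menu item pulls the upgrade on its next sync
  destination:
    server: https://kubernetes.default.svc
    namespace: argo-rollouts
  syncPolicy:
    automated:
      prune: true
      selfHeal: false
    syncOptions:
      - CreateNamespace=true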

Possible improvements

Dev/Staging/Prod

It would be possible to have different sub-folders or branches of the argo-apps folder for different environments such as development, staging and/or production.

That way a platform team could commit to only deploying changes to their “managed” add-ons in production clusters after they have “baked” in non-production ones for a certain period of time.
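A sketch of how that could extend the folder structure above (dept1-prod and the chart versions are hypothetical):

argo-apps/staging/argo-rollouts/argo-rollouts-app.yaml <--- e.g. chart 2.38.0 "baking" in non-production
argo-apps/production/argo-rollouts/argo-rollouts-app.yaml <--- e.g. chart 2.37.7, promoted once it has baked

cluster-argo-apps/dept1-dev/kustomization.yaml <--- resources point at ../../argo-apps/staging/...
cluster-argo-apps/dept1-prod/kustomization.yaml <--- resources point at ../../argo-apps/production/...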

Set menu instead of à la carte

For some organisations, you might want just a sub-folder per environment rather than one per cluster, with all the clusters in that environment pointing their Argo "app of apps" at it directly.

That would ensure that all of the clusters in that environment are exactly the same, and make things simpler, but at the expense of per-cluster granularity.
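The only change from the pattern above would be in the bootstrap Application that each cluster's HelmChartProxy creates; every dev cluster would point at one shared folder (cluster-argo-apps/dev is a hypothetical shared path) rather than its own:

    sources:
      - repoURL: https://github.com/jasonumiker/gke-autopilot-capi-argocd-example.git
        path: cluster-argo-apps/dev   # <--- shared by every dev cluster, instead of
                                      #      a per-cluster folder like dept1-dev
        targetRevision: HEAD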

Automated testing

It would be good if the deployment of these add-ons rolled back automatically whenever a merged change turns out to have issues.

The tool that makes this possible is Argo Rollouts.

And my next blog post in the series explores how to use it, and how to add it to the pattern here, to make change safer with blue/green deployments that include automated testing and rollback.

You can read that here.
