Vol. XV / Issue 03

The McKinnie Dispatch

Filed from the research cluster

DevOps archaeology / Self-hosted infrastructure / 2024 to 2025

A hosting question and a lot of Bash

The first cluster was mostly Bash.

I was not trying to become a Kubernetes person. I was trying to figure out what self-hosting identity actually cost before committing to a SaaS identity bill.

The starting point was practical. Potential customers were asking whether we could host Archibus environments for them. That meant running identity too. Okta and Auth0 were the obvious options, but the numbers got uncomfortable fast once I modeled them against the kind of deployments I had in mind.

Self-hosting identity looked like a way to avoid that bill. The self-hosted subreddit pulled the thread further. Authentik was the entry point: open source, solid protocol support, reasonable documentation. The research cluster was where I found out whether self-hosting was actually workable or just appealing on paper.

The short answer: it worked. The more useful lesson came after it worked.

Mostly manual

Terminal coding agents were not really part of the workflow at that point. The work was manual: Bash, kubectl, Helm, openssl, ingress configuration, DNS, certificate management, proxy headers, and a lot of reading logs. Every deployment cycle meant running each step yourself.

That was slow. It was also a useful kind of slow. When you have to manually generate and apply SAML certificates, you learn what the protocol actually requires. The SP certificate and the IdP metadata have to agree. The assertion consumer URL has to match exactly. The cookies have to survive the redirect through the identity provider. Each mismatch failed differently, and most of the error messages were not clear about which assumption had broken.

Exhibit A: Certificate generation as a shell step

Identity work has physical artifacts. Before the cluster agreed on anything, the SP certificate had to exist, get loaded as a Kubernetes Secret, and match what Authentik expected in its metadata. Working through this by hand made the dependency between the pieces legible in a way a working setup never would have.
mkdir -p certs

# Self-signed SP certificate, ten years out so rotation is a scheduled
# decision rather than a surprise
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout certs/sp-key.pem \
  -out certs/sp-cert.pem \
  -days 3650 \
  -subj "/CN=archibus-saml-sp"

# Render the Secret client-side and apply it, so reruns update in place
# instead of failing on "already exists"
kubectl create secret tls archibus-saml-sp \
  --cert=certs/sp-cert.pem \
  --key=certs/sp-key.pem \
  --namespace archibus-dev \
  --dry-run=client -o yaml | kubectl apply -f -
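
Most of the mismatch debugging reduced to one sanity check. A sketch of it, with a placeholder metadata URL and the common ds: namespace prefix assumed, neither taken from the real setup: fingerprint the local SP certificate, fingerprint the certificate the other side is actually holding, and insist they agree.
# Local SP certificate fingerprint
openssl x509 -in certs/sp-cert.pem -noout -fingerprint -sha256

# Fingerprint of the certificate embedded in the published metadata
# (URL and the ds: prefix are illustrative; adjust for the real endpoint)
curl -s "https://auth.example.com/sp-metadata.xml" \
  | tr -d '\n\t ' \
  | sed -n 's/.*<ds:X509Certificate>\([^<]*\)<.*/\1/p' \
  | base64 -d \
  | openssl x509 -inform DER -noout -fingerprint -sha256

# And the ACS URL has to match character for character
curl -s "https://auth.example.com/sp-metadata.xml" \
  | tr -d '\n' \
  | grep -o 'AssertionConsumerService[^>]*' \
  | head -1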

Getting an image deployed was its own lesson. The loop: build, tag with a timestamp, push to the registry, bump the chart value, run the Helm upgrade. That worked for the first deployment and immediately produced the next problem.

Exhibit B: Build, push, bump, deploy

A timestamp tag and a one-liner to update the chart value in place. Easy to start with. Easy to outgrow once you cannot confidently answer which version is running or when it last changed.
tag="$(date +%Y%m%d%H%M%S)"
image="registry.example.com/platform/component:${tag}"

docker build -t "${image}" .
docker push "${image}"

yq -i ".image.tag = \"${tag}\"" ./charts/component/values.yaml

helm upgrade --install archibus-sso ./charts/component \
  --namespace archibus-dev \
  --values ./charts/component/values.yaml

The script solved the first deployment and created the second question: what is actually running, and does it match what the repo says? That question is what makes GitOps feel like a relief rather than overhead, but it took enough broken deployments to make the question feel real.
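
Before GitOps, answering it meant a manual diff between cluster and repo. A minimal sketch, reusing the names from Exhibit B (the deployment name matching the Helm release is an assumption):
# What the cluster is actually running
running="$(kubectl get deployment archibus-sso -n archibus-dev \
  -o jsonpath='{.spec.template.spec.containers[0].image}')"

# What the repo claims should be running
declared="$(yq '.image.tag' ./charts/component/values.yaml)"

echo "running:  ${running##*:}"
echo "declared: ${declared}"
[ "${running##*:}" = "${declared}" ] || echo "drift detected" >&2

Run that often enough by hand and a tool that runs it continuously stops looking like overhead.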

The first SAML win

The concrete goal was getting Archibus running in Kubernetes with SAML SSO through a self-hosted Authentik installation. Getting there was not clean. SAML behind a Kubernetes ingress has real complexity: HTTPS URL reconstruction, protocol handling, certificate trust, cookie forwarding through multiple layers. The SP configuration has to know about its own public URL and handle the redirect loop correctly, or the login flow breaks in ways that are hard to diagnose from logs alone.
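
The fastest way to see which assumption broke was to walk the login flow by hand. A debugging sketch with an illustrative hostname: trace each hop of the redirect and read the headers, because the symptoms live there.
# Status line, redirect target, and cookie attributes for the first hop;
# repeat with the Location value to follow the chain toward the IdP
curl -sk -o /dev/null -D - "https://archibus.example.com/archibus/" \
  | grep -iE '^(HTTP|location|set-cookie)'

# Red flags: a Location starting with http:// means the app rebuilt its
# URL without X-Forwarded-Proto; a Set-Cookie without Secure and
# SameSite=None will not survive the cross-site hop back from the IdP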

The direct OIDC path was theoretically available. The path that actually worked was simpler: one component handles the authentication exchange, and the application server receives identity through request headers. Less architecturally interesting, but more reliable when the upstream app was not built to speak OIDC natively.

Exhibit C: The boring proxy was the right answer

Configure the server to trust the identity headers, let the auth proxy handle the OIDC or SAML exchange, and keep the application server out of the protocol. This approach traded architectural elegance for operational reliability, and that trade held up well in later environments too.
server:
  auth:
    mode: proxy  # trust upstream identity; no native OIDC/SAML in the app
  proxyHeaders:
    - X-Auth-Request-Email
    - X-Auth-Request-User
    - X-Auth-Request-Groups
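
The trade is easiest to see from the pod's point of view. A sketch, assuming port-forward access and the release name from Exhibit B: bypass the proxy and present the identity headers yourself.
# Talk to the app directly, impersonating the auth proxy
kubectl port-forward -n archibus-dev deploy/archibus-sso 8080:8080 >/dev/null &
pf=$!
sleep 2

curl -s http://localhost:8080/ \
  -H 'X-Auth-Request-Email: probe@example.com' \
  -H 'X-Auth-Request-User: probe' \
  -H 'X-Auth-Request-Groups: staff'

kill "${pf}"

If that request comes back authenticated, the config is working, and it is also the sharp edge of this design: anyone who can reach the pod directly can forge these headers, so the network path has to guarantee the proxy is the only way in.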

When SAML SSO finally worked end to end for an Archibus environment, the satisfaction was proportional to how much debugging had preceded it. It was also the moment the scope of "hosting this for a customer" became concrete. Not just running the application server. Running the identity layer, the certificates, the ingress, the DNS, the database, the backups, and the upgrade order that keeps all of those working together.

What you are actually buying

Self-hosting does not remove the cost of identity. It converts the cost.

A SaaS identity bill is visible. It shows up as a number with a unit at the end of the month. The self-hosted equivalent is less legible: operator time, the certificate rotation you have to schedule before it expires, the ingress debugging session when something changes upstream, the runbook you write so you can recreate the setup six months later. That is still a real cost, and in some ways harder to track because it does not arrive as an invoice.

The invoice I wanted to avoid was Okta. The invoice I actually picked up was operational responsibility.

For this use case the trade-off was worth it. Hosted customer environments, control over the identity stack, and the ability to shape the product around the hosting model were real benefits. But understanding those as the actual trade-offs made every subsequent decision easier. Each architectural choice after this was a version of the same question: which costs are you willing to own, and are you building the systems to manage them?

The manual operational work also clarified what GitOps was actually for. Not as an abstract best practice, but as a practical answer to three questions that kept coming up after enough kubectl apply sessions: what is deployed right now, why did it drift from what the repo says, and how do I recreate this environment from a clean state?

Exhibit D: The first control loop

Apply the bootstrap manifest, poll until the cluster agrees. Manual operations starting to look like reconciliation. This is where ArgoCD started making sense as the next step, and where the Bash scripts started looking like something to replace rather than maintain.
kubectl apply -n argocd -f bootstrap-app.yaml

for attempt in $(seq 1 30); do
  # One API call per attempt, so health and sync come from the same snapshot
  app_json="$(argocd app get platform-bootstrap -o json)"
  status="$(jq -r '.status.health.status' <<<"${app_json}")"
  sync="$(jq -r '.status.sync.status' <<<"${app_json}")"

  if [ "$status" = "Healthy" ] && [ "$sync" = "Synced" ]; then
    break
  fi

  # Fail loudly instead of falling off the end of the loop
  if [ "$attempt" -eq 30 ]; then
    echo "platform-bootstrap did not converge" >&2
    exit 1
  fi

  sleep 10
done

ArgoCD first, then Flux in later environments. The tools were responses to questions the manual phase had made concrete. The Bash scripts came first, and that was right. You have to feel the cost of what you are automating before the automation makes sense. The research cluster was a classroom with a bill attached, and both parts were necessary.