Architecture Overview
HomeScale uses a full GitOps model: this repository is the source of truth for every cluster. Nothing is applied manually except the one-time bootstrap. All ongoing changes flow through a pull request.
GitOps loop
┌─────────────┐    PR merged     ┌──────────────┐   Terraform apply    ┌───────────────────┐
│  Git (main) │ ───────────────► │  CI: deploy  │ ───────────────────► │  Cloud resources  │
└─────────────┘                  └──────────────┘                      │ (Cloudflare, NB…) │
       │                                │                              └───────────────────┘
       │                                │  Omni cluster template sync
       │                                ▼
       │                         ┌──────────────┐
       │                         │ Omni / Talos │  (cluster config, node assignments)
       │                         └──────────────┘
       │
       │  ArgoCD polls every 30 s
       ▼
┌─────────────────────────────────────────────────────┐
│              ArgoCD (on each cluster)               │
│  ┌──────────────────────────┐                       │
│  │ app-of-apps (apps.yaml)  │  two sources:         │
│  │ • clusters/<cluster>/    │    raw manifests      │
│  │ • apps/                  │    Helm → Applications│
│  └──────────────────────────┘                       │
│           │  generates per-app ArgoCD Application   │
│           ▼                                         │
│  ┌──────────────────────────────────────────────┐   │
│  │ ArgoCD Application (one per enabled app)     │   │
│  │ → syncs apps/<name>/ Helm chart to cluster   │   │
│  └──────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────┘
App-of-apps pattern
Each cluster bootstraps with a single ArgoCD app-of-apps (clusters/<cluster>/apps.yaml) applied manually once. That app has two sources:
- clusters/<cluster>/: any raw Kubernetes manifests scoped to that cluster (e.g. cluster-specific CRs, Omni machine selectors)
- apps/: the Helm chart that reads every apps/*/app.yaml and renders one ArgoCD Application per enabled app
From that point on ArgoCD self-manages: changes to this repo are picked up automatically within the configured reconciliation interval (timeout.reconciliation: 30s).
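As a rough illustration, an app-of-apps with two sources and the matching reconciliation setting might look like the sketch below. It is not the repository's actual manifest. The repository URL, the `cluster` Helm parameter, the project, and the sync policy are assumptions; only the two source paths and the 30s interval come from the text above.

```yaml
# Hypothetical clusters/boa1-prod/apps.yaml (multi-source Argo CD Application)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: apps
  namespace: argocd
spec:
  project: default
  sources:
    # Source 1: raw manifests scoped to this cluster
    - repoURL: https://github.com/example-org/example-repo.git  # placeholder URL
      targetRevision: main
      path: clusters/boa1-prod
    # Source 2: the apps/ Helm chart that renders one Application per enabled app
    - repoURL: https://github.com/example-org/example-repo.git  # placeholder URL
      targetRevision: main
      path: apps
      helm:
        parameters:
          - name: cluster      # assumed parameter telling the chart which cluster it renders for
            value: boa1-prod
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
---
# Reconciliation interval, set in Argo CD's main ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
  labels:
    app.kubernetes.io/part-of: argocd
data:
  timeout.reconciliation: 30s
```

Because the first source points at the directory that holds apps.yaml itself, an Application shaped like this would also keep its own definition in sync after the one-time manual apply.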
App catalog (apps/)
apps/ is a Helm chart. apps/templates/applications.yaml loops over every apps/*/app.yaml using Helm's .Files.Glob and fromYaml, and generates an ArgoCD Application for each app that is enabled for the current cluster.
Enabling / disabling apps per cluster
Each app.yaml has a defaultDeploy boolean and optional per-cluster overrides under clusters.<name>:
defaultDeploy: false        # don't deploy everywhere by default
clusters:
  boa1-prod:
    deploy: true            # enable only on this cluster
    values:
      replicaCount: 3       # cluster-specific value override (deep-merged)
See the App reference for the full field list.
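For the app.yaml above, the chart would render an Application on boa1-prod and nothing on clusters that have no override. The sketch below shows roughly what that rendered output could look like. The app name `example-app`, the repository URL, the destination namespace, the sync policy, and the use of `helm.values` to pass the override are all assumptions made for illustration.

```yaml
# Hypothetical rendered output for apps/example-app/app.yaml on cluster boa1-prod
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-app            # assumed to match the directory name under apps/
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/example-repo.git  # placeholder URL
    targetRevision: main
    path: apps/example-app
    helm:
      # clusters.boa1-prod.values from app.yaml, layered over the chart's own defaults
      values: |
        replicaCount: 3
  destination:
    server: https://kubernetes.default.svc
    namespace: example-app     # assumed target namespace
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```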
Apps built in CI
Any app directory that contains both a Chart.yaml and a Dockerfile is treated as a first-party image. CI builds it on every merge to main and pushes to ghcr.io/homescalecloud/<name>.
Cluster topology
| Cluster | Role |
|---|---|
| mgmt | Runs ArgoCD, Infisical operator, and shared infrastructure components |
| boa1-prod | Production workloads for region boa1 |
| boa1-gw | Gateway cluster for region boa1; bare-metal provisioning, subnet routing |
Gateway clusters (*-gw) have three distinct roles:
- Bare-metal provisioning: the omni-infra-provider app runs the Omni infrastructure provider to PXE-boot Talos nodes in the region
- Subnet routing: a NetBird subnet router exposes the region's BMC and MGMT subnets across the WireGuard mesh so they're reachable from mgmt and CI
- Region ↔ mgmt connectivity: bridges region-local services to the central mgmt cluster
Sync wave order
ArgoCD sync waves control the ordering of app deployments on a cluster. Lower wave numbers sync first.
| Wave | Apps | Why first |
|---|---|---|
| -40 | cilium | CNI must be ready before any other pod can schedule |
| -35 | infisical, multus | Secrets operator must be ready so other apps can pull secrets; Multus for multi-homed pods |
| -30 | cert-manager, argocd, rbac | TLS infrastructure and access control before workloads |
| -25 | generic-device-plugin | Node resource registration before consumers |
| -20 | netbird, cert-manager-crs, spegel | Mesh access and certificate issuers before services need them |
| -10 | external-dns, netbird-crs, kubelet-serving-cert-approver | DNS registration and network routing before apps |
| -5 | volsync | Backup operator ready before app PVCs need it |
| 0 | everything else | Default wave |
| 1+ | apps that depend on wave-0 apps | |
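Argo CD reads the wave from the `argocd.argoproj.io/sync-wave` annotation on each resource. A minimal sketch of how the generated Application for cilium might carry its wave follows. How the wave is declared in app.yaml is not covered here, so only the annotation itself should be taken as given.

```yaml
# Metadata fragment only; spec omitted
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cilium
  namespace: argocd
  annotations:
    # Annotation values are strings, so the number is quoted; lower waves sync first
    argocd.argoproj.io/sync-wave: "-40"
```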
CI/CD pipeline
Three reusable workflows are called from .github/workflows/ci.yaml:
scan: security and lint
Runs on every PR and push:
- pre-commit: YAML lint, trailing whitespace, detect-secrets, Helm lint
- PR title validation against Conventional Commits (enforced by gitlint)
- CodeQL: static analysis
- Trivy: config scan for misconfigurations in Kubernetes manifests
build: Docker images
- Detects which apps/*/ directories changed (on PRs); builds all on push to main
- Builds the Dockerfile if present, pushes to ghcr.io/homescalecloud/<name>
- Runs a Trivy image scan on each built image
- Lints Helm charts (helm lint)
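The build steps listed above could be sketched as a reusable workflow along these lines. This is an illustration under assumptions, not the repository's workflow: the `app` input, the commit-SHA image tag, the severity threshold, and the action versions are hypothetical, and change detection is left out.

```yaml
# Hypothetical reusable build workflow for a single app directory
on:
  workflow_call:
    inputs:
      app:                     # assumed input: directory name under apps/
        type: string
        required: true
jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write          # needed to push to ghcr.io
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      # Build apps/<name>/Dockerfile and push the image
      - uses: docker/build-push-action@v6
        with:
          context: apps/${{ inputs.app }}
          push: true
          tags: ghcr.io/homescalecloud/${{ inputs.app }}:${{ github.sha }}  # tag scheme is an assumption
      # Scan the image that was just pushed; fail the job on serious findings
      - uses: aquasecurity/trivy-action@master   # pin to a release in practice
        with:
          image-ref: ghcr.io/homescalecloud/${{ inputs.app }}:${{ github.sha }}
          exit-code: "1"
          severity: CRITICAL,HIGH
      - run: helm lint apps/${{ inputs.app }}
```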
deploy: infrastructure and cluster sync
Runs only on push to main (after scan and build pass):
- Joins the NetBird mesh with an ephemeral one-time setup key so it can reach internal infrastructure
- Terraform plan/apply: infra/terraform/ manages Cloudflare DNS, DigitalOcean, Infisical project setup, NetBird configuration
- Omni cluster template sync: for clusters whose cluster.yaml changed, pushes the updated template to Omni
ArgoCD then picks up any Git changes and reconciles cluster state automatically — no deploy step needed for app changes.
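The wiring between the three workflows can be pictured with a sketch like the one below. The workflow file names (`scan.yaml`, `build.yaml`, `deploy.yaml`) and the use of `secrets: inherit` are assumptions. The triggers and the ordering (deploy only on push to main, after scan and build) follow the description in this section.

```yaml
# Hypothetical .github/workflows/ci.yaml calling three reusable workflows
name: ci
on:
  pull_request:
  push:
    branches: [main]
jobs:
  scan:
    uses: ./.github/workflows/scan.yaml      # assumed file name
  build:
    uses: ./.github/workflows/build.yaml     # assumed file name
    secrets: inherit
  deploy:
    # Runs only on push to main, and only after scan and build succeed
    needs: [scan, build]
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    uses: ./.github/workflows/deploy.yaml    # assumed file name
    secrets: inherit
```

Keeping deploy behind `needs` means a failed scan or build blocks Terraform and the Omni sync, while app changes still reach clusters through ArgoCD's own polling.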