
Fork This Setup — Deployment Checklist

This guide is for people adapting this project to their own hardware.

The original homelab is fully deployed and running — 3-node K3s HA cluster, 30+ apps, all managed via GitOps. If you want to run the same stack on your own machines, follow these steps to adapt the repo to your infrastructure.

This checklist covers every action required to go from a fresh fork to a fully running cluster, in the order they must be performed.
Each item is a single, concrete action you can complete independently.


Phase 0 — Prerequisites (workstation)

  • 0.1 Install required tools on your workstation:

    nix develop   # enters the devShell with kubectl, helm, sops, age, nixos-anywhere
    

  • 0.2 Fork / push this repo to your own GitHub account and update the repo URL:

    # Replace every occurrence of https://github.com/Yasuke2000/Homelab.git
    # with your actual repo URL in all Application manifests:
    grep -r "Yasuke2000/Homelab" apps/ --include="*.yaml" -l
    

  • 0.3 Choose your domain (e.g. home.example.com) and replace every daviddelporte.com placeholder:

    grep -r "daviddelporte.com" . --include="*.yaml" --include="*.nix" -l
    # Then bulk-replace:
    sed -i 's/daviddelporte\.com/home.example.com/g' \
      $(grep -rl "daviddelporte.com" . --include="*.yaml" --include="*.nix")
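
After steps 0.2 and 0.3 it helps to confirm that nothing was missed. The following is a minimal sketch, assuming the two placeholder strings named above are the only ones to replace; the function name leftover_placeholders is hypothetical and not part of this repo.

```shell
# List .yaml and .nix files that still contain the original repo URL or domain.
# Usage: leftover_placeholders [repo-root]   (defaults to the current directory)
leftover_placeholders() {
  # -r recurse, -l print file names only, -E extended regex.
  # The dot in the domain is escaped so it matches literally.
  grep -rlE --include='*.yaml' --include='*.nix' \
    'Yasuke2000/Homelab|daviddelporte\.com' "${1:-.}" 2>/dev/null | sort
}
```

Running leftover_placeholders . from the repo root should print nothing once both replacements are complete.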
    


Phase 1 — Secrets & Encryption

  • 1.1 Generate your workstation age key:

    bash scripts/setup-age-keys.sh
    # Outputs your age public key — save the private key securely!
    

  • 1.2 Copy your workstation age public key into .sops.yaml:

    # .sops.yaml — replace the placeholder on the workstation line:
    - age: age1xxxx...   # YOUR workstation pubkey here
    

  • 1.3 Boot each node from the NixOS minimal ISO and collect its age public key (derived from the node's SSH host key with ssh-to-age; sops-nix uses that host key to decrypt secrets on the node):

    # After first boot of each node:
    ssh root@<redacted> "cat /etc/ssh/ssh_host_ed25519_key.pub | ssh-to-age"
    ssh root@<redacted> "cat /etc/ssh/ssh_host_ed25519_key.pub | ssh-to-age"
    ssh root@<redacted> "cat /etc/ssh/ssh_host_ed25519_key.pub | ssh-to-age"
    # Paste each result into .sops.yaml under the matching node comment
    

  • 1.4 Fill in all secret values in secrets/secrets.yaml and encrypt it:

    # Edit the file — replace every REPLACE_WITH_... value:
    sops secrets/secrets.yaml
    # sops opens $EDITOR; fill in real values, save and quit — file is auto-encrypted
    

    Required secrets to fill:

      • k3s.token: generate with openssl rand -hex 32
      • vaultwarden.adminToken: generate with openssl rand -base64 48
      • grafana.adminPassword: strong password
      • cloudflare.apiToken: Cloudflare API token (DNS-01 cert-manager)
      • All app DB passwords: generate with openssl rand -hex 16

    Note: Renovate runs as a GitHub App — no PAT needed.

  • 1.5 Verify the encrypted file looks correct (all values show ENC[...]):

    grep -c "ENC\[AES256_GCM" secrets/secrets.yaml
    # Should be >= 10
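
The check in step 1.5 can be wrapped so that it also fails when a plaintext placeholder survives. This is a sketch only: the REPLACE_WITH_ prefix and the threshold of 10 come from the steps above, while the function name check_secrets_encrypted is hypothetical.

```shell
# Fail if the secrets file still holds placeholders or too few encrypted values.
# Usage: check_secrets_encrypted <file> [minimum-encrypted-values]
check_secrets_encrypted() {
  file="$1"
  min="${2:-10}"
  # A surviving placeholder means the file was never filled in or encrypted.
  if grep -q 'REPLACE_WITH_' "$file"; then
    echo "plaintext placeholder found in $file" >&2
    return 1
  fi
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
  count=$(grep -c 'ENC\[AES256_GCM' "$file" || true)
  if [ "$count" -lt "$min" ]; then
    echo "only $count encrypted values in $file (expected >= $min)" >&2
    return 1
  fi
  echo "$count"
}
```

For example, check_secrets_encrypted secrets/secrets.yaml prints the number of encrypted values and returns non-zero if the file is not safe to commit.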
    


Phase 2 — Hardware Inventory

Note: smart-deploy.sh (used during deployment) automates MAC/disk discovery, NixOS config patching, and age key generation. This manual phase is the alternative if you prefer to collect hardware info separately before deploying.

  • 2.1 Boot each node from the NixOS minimal ISO and collect hardware info:

    bash scripts/collect-hardware-info.sh <redacted>  # repeat for .12 and .13
    
    Record the output for each node.

  • 2.2 Update NIC interface names in host configs:

    # hosts/node1/default.nix (and node2, node3)
    networking.interfaces.eno1.ipv4.addresses = ...
    # Replace eno1 with the actual NIC name from collect-hardware-info.sh
    

  • 2.3 Update MAC addresses in each host config:

    # hosts/nodeX/default.nix
    networking.interfaces.eno1.macAddress = "xx:xx:xx:xx:xx:xx";
    

  • 2.4 Verify disk device name in modules/disk-config.nix (default: /dev/sda):

    # On each node (from live ISO):
    lsblk -d -o NAME,SIZE,MODEL
    # If using NVMe, change /dev/sda → /dev/nvme0n1 in modules/disk-config.nix
    

  • 2.5 Add your SSH public key(s) to each host config:

    # hosts/nodeX/default.nix
    users.users.root.openssh.authorizedKeys.keys = [
      "ssh-ed25519 AAAA... you@workstation"
    ];
    

  • 2.6 Update the Flannel interface name in modules/k3s-server-init.nix:

    "--flannel-iface=eno1"   # replace eno1 with actual NIC name
    
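
Steps 2.2, 2.3 and 2.6 all depend on the same NIC name, so a stale eno1 is easy to leave behind. The sketch below lists remaining references; it assumes the name appears only in .nix files, and the function name stale_nic_refs is hypothetical.

```shell
# Print file:line:content for every whole-word occurrence of a NIC name.
# Usage: stale_nic_refs <nic-name> [repo-root]
stale_nic_refs() {
  # -w matches whole words only, so "eno1" does not match "eno10".
  # "|| true" keeps the exit status at 0 when nothing is found.
  grep -rnw --include='*.nix' -e "$1" "${2:-.}" 2>/dev/null || true
}
```

If your nodes do not use eno1, stale_nic_refs eno1 . should print nothing before you move on to Phase 3.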


Phase 3 — Node Deployment

Complete Phases 0–2 before deploying any nodes.

  • 3.1 Validate the Nix flake evaluates cleanly:

    nix flake check --no-build
    

  • 3.2 Deploy node1 (cluster-init, etcd leader):

    # If using smart-deploy.sh (recommended — auto-discovers hardware):
    bash scripts/smart-deploy.sh <dhcp-ip> node1 server-init
    
    # Or if you completed Phase 2 manually:
    bash scripts/deploy-node.sh node1
    
    Wait for node1 to reboot and the K3s API to be reachable:
    ssh root@<redacted> "kubectl get nodes"
    

  • 3.3 Copy kubeconfig from node1 to your workstation:

    scp root@<redacted>:/etc/rancher/k3s/k3s.yaml ~/.kube/config
    sed -i 's/127.0.0.1/<redacted>/g' ~/.kube/config
    kubectl get nodes
    

  • 3.4 Deploy node2:

    bash scripts/smart-deploy.sh <dhcp-ip> node2 server-join
    

  • 3.5 Deploy node3:

    bash scripts/smart-deploy.sh <dhcp-ip> node3 server-join
    

  • 3.6 Verify all 3 nodes are Ready and etcd is healthy:

    kubectl get nodes
    # All 3 nodes should show: Ready   control-plane,master
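    # K3s runs etcd embedded in the k3s server process, so this pod lookup may
    # return nothing; in that case rely on the etcd role shown by kubectl get nodes.
    kubectl -n kube-system exec -it \
      $(kubectl -n kube-system get pod -l component=etcd -o name | head -1) \
      -- etcdctl endpoint health --cluster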
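
The "wait until Ready" steps in this phase can be scripted rather than checked by hand. The following is a rough sketch, assuming kubectl is already pointed at the cluster (step 3.3); the function names, the retry count and the 10-second interval are illustrative choices, not part of this repo.

```shell
# Count nodes whose STATUS column is exactly "Ready".
# Reads the output of "kubectl get nodes --no-headers" on stdin.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Poll until at least <want> nodes are Ready, or give up after <tries> attempts.
# Usage: wait_for_nodes [want] [tries]
wait_for_nodes() {
  want="${1:-3}"
  tries="${2:-60}"
  while [ "$tries" -gt 0 ]; do
    ready=$(kubectl get nodes --no-headers 2>/dev/null | count_ready_nodes)
    [ "$ready" -ge "$want" ] && return 0
    tries=$((tries - 1))
    sleep 10
  done
  return 1
}
```

Calling wait_for_nodes 1 after step 3.2 and wait_for_nodes 3 after step 3.5 mirrors the manual checks. Nodes reported as Ready,SchedulingDisabled are deliberately not counted.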
    


Phase 4 — Bootstrap ArgoCD

  • 4.1 Run the ArgoCD bootstrap script:

    bash scripts/bootstrap-argocd.sh
    

  • 4.2 Apply the app-of-apps root Application:

    kubectl apply -f apps/app-of-apps.yaml
    

  • 4.3 Monitor the ArgoCD sync waves in order:

    kubectl -n argocd get applications -w
    # Expected order: kyverno → metallb → cert-manager → traefik → longhorn → all apps
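
While the sync waves progress, it can be easier to list only the Applications that are not finished yet. This sketch assumes the default column order of the Application resource (NAME, SYNC STATUS, HEALTH STATUS); the function name pending_apps is hypothetical.

```shell
# Print the names of Applications that are not both Synced and Healthy.
# Reads the output of "kubectl -n argocd get applications --no-headers" on stdin.
pending_apps() {
  # NF > 0 skips blank lines; a row with a missing status still counts as pending.
  awk 'NF > 0 && ($2 != "Synced" || $3 != "Healthy") { print $1 }'
}
```

For example, kubectl -n argocd get applications --no-headers | pending_apps prints nothing once every wave has completed.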
    


Phase 5 — TrueNAS Storage

Required before Longhorn backups and RomM work.

  • 5.1 On TrueNAS SCALE, create an NFS dataset for Longhorn backups:

      • Dataset: datapool/longhorn-backup
      • NFS share path: /mnt/datapool/longhorn-backup
      • Allow hosts: <redacted>, <redacted>, <redacted>

  • 5.2 Enable Longhorn backup target in apps/longhorn/application.yaml:

    defaultSettings:
      backupTarget: nfs://<redacted>:/mnt/datapool/longhorn-backup
    

  • 5.3 For RomM — create an NFS dataset for ROM files:

      • Dataset: datapool/roms
      • NFS share path: /mnt/datapool/roms


Phase 6 — DNS & TLS

  • 6.1 Create wildcard DNS record pointing to Traefik's LoadBalancer IP (<redacted> by default):

    *.home.example.com  →  <redacted>   (A record, internal DNS / UniFi)
    
    Alternatively, add individual A records for each service.

  • 6.2 Verify Let's Encrypt staging certificates issue correctly (no rate-limit risk):

    kubectl get certificate -A
    # All certificates should show: READY = True
    

    If using staging certificates, your browser will show an untrusted cert warning — this is normal.

  • 6.3 Switch to Let's Encrypt production issuer once staging works:

    # In each Ingress annotation, change:
    cert-manager.io/cluster-issuer: letsencrypt-staging
    # to:
    cert-manager.io/cluster-issuer: letsencrypt-prod
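
The issuer switch in step 6.3 touches every Ingress, so a bulk edit may be less error-prone than editing by hand. This is a sketch under the assumption that the annotation appears literally as shown above in .yaml files; the function name switch_issuer_to_prod is hypothetical.

```shell
# Rewrite the staging issuer annotation to the production issuer in place.
# Prints each file it changed. Usage: switch_issuer_to_prod [directory]
switch_issuer_to_prod() {
  grep -rl --include='*.yaml' 'cluster-issuer: letsencrypt-staging' "${1:-.}" 2>/dev/null |
  while IFS= read -r f; do
    # -i.bak works with both GNU and BSD sed; the backup is removed afterwards.
    sed -i.bak 's/cluster-issuer: letsencrypt-staging/cluster-issuer: letsencrypt-prod/' "$f"
    rm -f "$f.bak"
    echo "$f"
  done
}
```

Run it as switch_issuer_to_prod apps, review the result with git diff, then commit so ArgoCD picks up the change.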
    


Phase 7 — Alertmanager

  • 7.1 Add an Alertmanager receiver to apps/monitoring/application.yaml. Example using a Discord webhook:

    alertmanager:
      config:
        global:
          resolve_timeout: 5m
        route:
          group_by: ['alertname', 'namespace']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 4h
          receiver: discord
        receivers:
          - name: discord
            discord_configs:
              - webhook_url: https://discord.com/api/webhooks/YOUR_WEBHOOK_ID/TOKEN
                title: '{{ .GroupLabels.alertname }}'
                message: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
    

  • 7.2 Commit the change and verify Alertmanager is correctly configured:

    kubectl -n monitoring get alertmanager
    kubectl -n monitoring exec -it alertmanager-kube-prometheus-stack-alertmanager-0 \
      -- amtool config show
    


Phase 8 — Post-Deployment Hardening

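  • 8.1 Disable the Vaultwarden admin page after initial setup:

    # In apps/vaultwarden/manifests/deployment.yaml, remove the ADMIN_TOKEN
    # entry from the container env. Vaultwarden disables /admin when no
    # admin token is set.
    # Do not set DISABLE_ADMIN_TOKEN=true: it leaves /admin reachable
    # without authentication.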
    

  • 8.2 Restrict Actual Budget to the internal network only. Verify that the Traefik middleware ipallow exists in the actual-budget namespace (Ingress annotations reference it as actual-budget-ipallow@kubernetescrd), or create it:

    # apps/actual-budget/manifests/middleware.yaml
    apiVersion: traefik.io/v1alpha1
    kind: Middleware
    metadata:
      name: ipallow
      namespace: actual-budget
    spec:
      ipAllowList:
        sourceRange:
          - <redacted>/24
    

  • 8.3 Enable Longhorn recurring snapshots via the Longhorn UI or a RecurringJob CRD:

    kubectl apply -f - <<'EOF'
    apiVersion: longhorn.io/v1beta2
    kind: RecurringJob
    metadata:
      name: daily-snapshot
      namespace: longhorn-system
    spec:
      cron: "0 2 * * *"
      task: snapshot
      retain: 7
      concurrency: 1
    EOF
    

  • 8.4 Configure Renovate only if you run it self-hosted in GitHub Actions instead of as the GitHub App noted in step 1.4: add your GitHub PAT as RENOVATE_TOKEN in GitHub Actions secrets so Renovate can open automatic dependency update PRs.

  • 8.5 Set up Uptime Kuma monitors for all services via its web UI at status.example.com after deployment.

  • 8.6 Review and tighten K8s RBAC — currently using default ArgoCD project (unrestricted). Consider creating per-team ArgoCD projects with resource restrictions.


Quick Reference — Status Check Commands

# Cluster overview
kubectl get nodes -o wide
kubectl get pods -A | grep -v Running | grep -v Completed

# ArgoCD sync status
kubectl -n argocd get applications

# Certificate status
kubectl get certificate -A

# Longhorn volumes
kubectl -n longhorn-system get volumes

# Alerts firing
kubectl -n monitoring exec -it \
  $(kubectl -n monitoring get pod -l app.kubernetes.io/name=alertmanager -o name | head -1) \
  -- amtool alert query
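
The double grep -v in the cluster overview also hides any line that merely contains the word Running, such as a pod name. As an alternative sketch, the filter below matches on the STATUS column only; it assumes the default column order of kubectl get pods -A, and the function name unhealthy_pods is hypothetical.

```shell
# Print "namespace/pod status" for pods that are neither Running nor Completed.
# Reads the output of "kubectl get pods -A --no-headers" on stdin.
unhealthy_pods() {
  # Column 4 is STATUS when -A is used (NAMESPACE NAME READY STATUS RESTARTS AGE).
  awk 'NF > 0 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2 " " $4 }'
}
```

Usage: kubectl get pods -A --no-headers | unhealthy_pods. An empty result means every pod is Running or Completed. It does not check the READY column.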