Fork This Setup — Deployment Checklist

This guide is for people adapting this project to their own hardware.

The original homelab is fully deployed and running — 3-node K3s HA cluster, 30+ apps, all managed via GitOps. If you want to run the same stack on your own machines, follow these steps to adapt the repo to your infrastructure.

This checklist covers every action required to go from a fresh fork to a fully running cluster, in the order they must be performed.
Each item is a single, concrete action you can complete independently.

Phase 0 — Prerequisites (workstation)

0.1 Install required tools on your workstation:

nix develop   # enters the devShell with kubectl, helm, sops, age, nixos-anywhere

0.2 Fork / push this repo to your own GitHub account and update the repo URL:

# Replace every occurrence of https://github.com/Yasuke2000/Homelab.git
# with your actual repo URL in all Application manifests:
grep -r "Yasuke2000/Homelab" apps/ --include="*.yaml" -l
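The search above only lists the affected files. A minimal sketch of the replacement itself follows, assuming GNU sed and grep; the helper name `replace_repo_url` and the target URL in the example are placeholders, not part of the repo.

```shell
# Hypothetical helper: rewrite the upstream repo URL in every YAML file under a directory.
# Assumes GNU sed (-i without a suffix) and that the new URL contains no '|', '&' or '\'.
replace_repo_url() {
  local dir="$1" old="$2" new="$3" old_re
  # Escape regex metacharacters so the dots in the old URL match literally.
  old_re=$(printf '%s' "$old" | sed 's/[][\.*^$]/\\&/g')
  grep -rlF --include='*.yaml' -- "$old" "$dir" | while IFS= read -r file; do
    # '|' is used as the sed delimiter because URLs contain '/'.
    sed -i "s|$old_re|$new|g" "$file"
  done
}

# Example (placeholder target URL):
# replace_repo_url apps/ "https://github.com/Yasuke2000/Homelab.git" "https://github.com/<you>/<repo>.git"
```

Review the result with `git diff` before committing.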

0.3 Choose your domain (e.g. home.example.com) and replace every daviddelporte.com placeholder:

grep -r "daviddelporte.com" . --include="*.yaml" --include="*.nix" -l
# Then bulk-replace:
sed -i 's/daviddelporte\.com/home.example.com/g' \
  $(grep -rl "daviddelporte.com" . --include="*.yaml" --include="*.nix")
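After the bulk replace, it can help to confirm that no reference to the original domain is left. The helper below is a sketch under that assumption; `check_no_placeholder` is a hypothetical name.

```shell
# Hypothetical helper: fail if a string still appears in any YAML or Nix file under a directory.
check_no_placeholder() {
  local dir="$1" placeholder="$2" hits
  # -F treats the placeholder as a fixed string, so dots are not wildcards.
  hits=$(grep -rnF --include='*.yaml' --include='*.nix' -- "$placeholder" "$dir")
  if [ -n "$hits" ]; then
    printf 'Leftover references to %s:\n%s\n' "$placeholder" "$hits" >&2
    return 1
  fi
  return 0
}

# Example:
# check_no_placeholder . "daviddelporte.com" && echo "domain fully replaced"
```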

Phase 1 — Secrets & Encryption

1.1 Generate your workstation age key:

bash scripts/setup-age-keys.sh
# Outputs your age public key — save the private key securely!

1.2 Copy your workstation age public key into .sops.yaml:

# .sops.yaml — replace the placeholder on the workstation line:
- age: age1xxxx...   # YOUR workstation pubkey here

1.3 Boot each node from the NixOS minimal ISO and collect its age public key, which is derived from the node's SSH host key with ssh-to-age:

# After first boot of each node:
ssh root@<redacted> "cat /etc/ssh/ssh_host_ed25519_key.pub | ssh-to-age"
ssh root@<redacted> "cat /etc/ssh/ssh_host_ed25519_key.pub | ssh-to-age"
ssh root@<redacted> "cat /etc/ssh/ssh_host_ed25519_key.pub | ssh-to-age"
# Paste each result into .sops.yaml under the matching node comment
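As an alternative sketch, the same keys can be collected from the workstation in one loop with `ssh-keyscan`, so `ssh-to-age` does not have to exist on the nodes. This assumes `ssh-to-age` is installed on the workstation (it is not listed among the devShell tools above); the helper name and the addresses in the example are placeholders.

```shell
# Hypothetical helper: print "<host> <age public key>" for each node address given.
# Assumes ssh-keyscan and ssh-to-age are on PATH on the workstation.
collect_node_age_keys() {
  local host key
  for host in "$@"; do
    # ssh-keyscan fetches the host's public ed25519 key; ssh-to-age converts it.
    key=$(ssh-keyscan -t ed25519 "$host" 2>/dev/null | ssh-to-age 2>/dev/null)
    printf '%s %s\n' "$host" "${key:-<no key found>}"
  done
}

# Example (placeholder addresses):
# collect_node_age_keys 192.0.2.11 192.0.2.12 192.0.2.13
```

`ssh-keyscan` does not authenticate the host, so run it only on a network you trust.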

1.4 Fill in all secret values in secrets/secrets.yaml and encrypt it:
```
# Edit the file — replace every REPLACE_WITH_... value:
sops secrets/secrets.yaml
# sops opens $EDITOR; fill in real values, save and quit — file is auto-encrypted
```
Required secrets to fill:

- k3s.token: generate with openssl rand -hex 32
- vaultwarden.adminToken: generate with openssl rand -base64 48
- grafana.adminPassword: strong password
- cloudflare.apiToken: Cloudflare API token (DNS-01 cert-manager)
- All app DB passwords: generate with openssl rand -hex 16
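The generation commands for the required secrets can be run in one go before opening the sops editor. This is a convenience sketch only; `generate_secret_values` is a hypothetical helper, and the labels it prints simply mirror the secret names given above.

```shell
# Hypothetical helper: print freshly generated values to paste into the sops editor.
# Nothing is written to disk; run the DB password line once per application.
generate_secret_values() {
  printf 'k3s.token: %s\n' "$(openssl rand -hex 32)"
  printf 'vaultwarden.adminToken: %s\n' "$(openssl rand -base64 48)"
  printf 'app DB password: %s\n' "$(openssl rand -hex 16)"
}

# Example:
# generate_secret_values
```

Clear the terminal scrollback afterwards, since the values are printed in plain text.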

Note: Renovate runs as a GitHub App — no PAT needed.

1.5 Verify the encrypted file looks correct (every value should appear as ENC[...]):

grep -c "ENC\[AES256_GCM" secrets/secrets.yaml
# Should be >= 10
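The same check can be wrapped in a small function that also catches a file that was never encrypted. This is a sketch; `check_encrypted` is a hypothetical helper, and the default threshold of 10 is taken from the comment above.

```shell
# Hypothetical helper: sanity-check a sops-encrypted file.
# A visible REPLACE_WITH_ placeholder means the file is still plaintext; once encrypted,
# placeholders are hidden, so this cannot detect values that were left unfilled.
check_encrypted() {
  local file="$1" min="${2:-10}" count
  if grep -q 'REPLACE_WITH_' "$file"; then
    echo "plaintext placeholder found in $file (not encrypted?)" >&2
    return 1
  fi
  count=$(grep -c 'ENC\[AES256_GCM' "$file")
  [ "$count" -ge "$min" ]
}

# Example:
# check_encrypted secrets/secrets.yaml && echo "looks encrypted"
```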

Phase 2 — Hardware Inventory

Note: smart-deploy.sh (used during deployment) automates MAC/disk discovery, NixOS config patching, and age key generation. This manual phase is the alternative if you prefer to collect hardware info separately before deploying.

2.1 Boot each node from the NixOS minimal ISO and collect hardware info:
```
bash scripts/collect-hardware-info.sh <redacted>  # repeat for .12 and .13
```
Record the output for each node.

2.2 Update NIC interface names in host configs:

# hosts/node1/default.nix (and node2, node3)
networking.interfaces.eno1.ipv4.addresses = ...
# Replace eno1 with the actual NIC name from collect-hardware-info.sh

2.3 Update MAC addresses in each host config:

# hosts/nodeX/default.nix
networking.interfaces.eno1.macAddress = "xx:xx:xx:xx:xx:xx";

2.4 Verify disk device name in modules/disk-config.nix (default: /dev/sda):

# On each node (from live ISO):
lsblk -d -o NAME,SIZE,MODEL
# If using NVMe, change /dev/sda → /dev/nvme0n1 in modules/disk-config.nix

2.5 Add your SSH public key(s) to each host config:

# hosts/nodeX/default.nix
users.users.root.openssh.authorizedKeys.keys = [
  "ssh-ed25519 AAAA... you@workstation"
];

2.6 Update the Flannel interface name in modules/k3s-server-init.nix:
```
"--flannel-iface=eno1"   # replace eno1 with actual NIC name
```

Phase 3 — Node Deployment

Complete Phases 0–2 before deploying any nodes.

3.1 Validate the Nix flake evaluates cleanly:
```
nix flake check --no-build
```

3.2 Deploy node1 (cluster-init, etcd leader):

# If using smart-deploy.sh (recommended — auto-discovers hardware):
bash scripts/smart-deploy.sh <dhcp-ip> node1 server-init

# Or if you completed Phase 2 manually:
bash scripts/deploy-node.sh node1

Wait for node1 to reboot and the K3s API to be reachable:

ssh root@<redacted> "kubectl get nodes"
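Because the node needs some time to reboot and start K3s, the check above usually has to be repeated. A minimal retry sketch follows; `wait_for` is a hypothetical helper, and the attempt count, delay and address in the example are placeholders.

```shell
# Hypothetical helper: retry a command until it succeeds or the attempts run out.
# Usage: wait_for <attempts> <delay-seconds> <command...>
wait_for() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Skip the sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
    i=$((i + 1))
  done
  return 1
}

# Example (placeholder address): try for up to 5 minutes
# wait_for 30 10 ssh root@192.0.2.11 "kubectl get nodes" && echo "K3s API is up"
```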

3.3 Copy kubeconfig from node1 to your workstation:

scp root@<redacted>:/etc/rancher/k3s/k3s.yaml ~/.kube/config
sed -i 's/127.0.0.1/<redacted>/g' ~/.kube/config
kubectl get nodes
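The commands above overwrite any existing ~/.kube/config. If you already manage other clusters, one alternative is to write the K3s kubeconfig to its own file and select it with KUBECONFIG. The sketch below assumes the stock K3s server entry (https://127.0.0.1:6443); the helper name, file names and address are placeholders.

```shell
# Hypothetical helper: copy a K3s kubeconfig and point it at a node's address.
# Writes to a separate file so an existing ~/.kube/config stays untouched.
rewrite_kubeconfig() {
  local src="$1" dest="$2" server_ip="$3"
  sed "s|https://127\.0\.0\.1:6443|https://${server_ip}:6443|" "$src" > "$dest"
  # The file contains cluster-admin credentials, so keep it private.
  chmod 600 "$dest"
}

# Example (placeholder address and paths):
# scp root@192.0.2.11:/etc/rancher/k3s/k3s.yaml /tmp/k3s.yaml
# rewrite_kubeconfig /tmp/k3s.yaml ~/.kube/homelab.yaml 192.0.2.11
# export KUBECONFIG=~/.kube/homelab.yaml
```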

3.4 Deploy node2:

bash scripts/smart-deploy.sh <dhcp-ip> node2 server-join

3.5 Deploy node3:

bash scripts/smart-deploy.sh <dhcp-ip> node3 server-join

3.6 Verify all 3 nodes are Ready and etcd is healthy:

kubectl get nodes
# All 3 nodes should show: Ready   control-plane,master
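# Note: K3s embeds etcd in the k3s server process, so this pod lookup can come back empty.
# In that case, confirm that each node lists the etcd role in `kubectl get nodes`.
kubectl -n kube-system exec -it \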
  $(kubectl -n kube-system get pod -l component=etcd -o name | head -1) \
  -- etcdctl endpoint health --cluster
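For scripted deployments, the readiness check can be turned into a pass/fail test. The sketch below parses the output of `kubectl get nodes --no-headers`, whose second column is the node STATUS; `nodes_ready` is a hypothetical helper.

```shell
# Hypothetical helper: read `kubectl get nodes --no-headers` output on stdin and
# succeed only if exactly <expected> nodes report the status "Ready".
nodes_ready() {
  local expected="$1" ready
  # A cordoned node shows "Ready,SchedulingDisabled" and is deliberately not counted.
  ready=$(awk '$2 == "Ready" { n++ } END { print n + 0 }')
  [ "$ready" -eq "$expected" ]
}

# Example:
# kubectl get nodes --no-headers | nodes_ready 3 && echo "all nodes Ready"
```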

Phase 4 — Bootstrap ArgoCD

4.1 Run the ArgoCD bootstrap script:
```
bash scripts/bootstrap-argocd.sh
```

4.2 Apply the app-of-apps root Application:
```
kubectl apply -f apps/app-of-apps.yaml
```

4.3 Monitor the ArgoCD sync waves in order:

kubectl -n argocd get applications -w
# Expected order: kyverno → metallb → cert-manager → traefik → longhorn → all apps
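Instead of watching the list by eye, you can print only the Applications that have not settled yet. The sketch below reads one "name sync-status health-status" line per Application; `pending_apps` is a hypothetical helper, and the jsonpath expression in the example relies on the standard Application status fields.

```shell
# Hypothetical helper: read "name sync health" lines on stdin and print the names
# of Applications that are not both Synced and Healthy.
pending_apps() {
  awk '$2 != "Synced" || $3 != "Healthy" { print $1 }'
}

# Example:
# kubectl -n argocd get applications \
#   -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.sync.status}{" "}{.status.health.status}{"\n"}{end}' \
#   | pending_apps
```

An empty result means every Application has reached Synced and Healthy.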

Phase 5 — TrueNAS Storage

Required before Longhorn backups and RomM work.

5.1 On TrueNAS SCALE, create an NFS dataset for Longhorn backups:
Dataset: datapool/longhorn-backup
NFS share path: /mnt/datapool/longhorn-backup
Allow hosts: <redacted>, <redacted>, <redacted>

5.2 Enable Longhorn backup target in apps/longhorn/application.yaml:

defaultSettings:
  backupTarget: nfs://<redacted>:/mnt/datapool/longhorn-backup

5.3 For RomM — create an NFS dataset for ROM files:
Dataset: datapool/roms
NFS share path: /mnt/datapool/roms

Phase 6 — DNS & TLS

6.1 Create a wildcard DNS record pointing to Traefik's LoadBalancer IP (<redacted> by default):
```
*.home.example.com  →  <redacted>   (A record, internal DNS / UniFi)
```
Alternatively, add individual A records for each service.

6.2 Verify Let's Encrypt staging certificates issue correctly (no rate-limit risk):
```
kubectl get certificate -A
# All certificates should show: READY = True
```
If using staging certificates, your browser will show an untrusted cert warning — this is normal.

6.3 Switch to Let's Encrypt production issuer once staging works:

# In each Ingress annotation, change:
cert-manager.io/cluster-issuer: letsencrypt-staging
# to:
cert-manager.io/cluster-issuer: letsencrypt-prod
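With 30+ apps, editing every Ingress by hand is tedious. A sketch of a bulk switch follows, assuming GNU sed and that every manifest uses the annotation exactly as shown above; `switch_to_prod_issuer` is a hypothetical helper.

```shell
# Hypothetical helper: switch every staging issuer annotation under a directory to production.
# Assumes GNU sed (-i without a suffix).
switch_to_prod_issuer() {
  local dir="$1"
  grep -rl --include='*.yaml' 'cluster-issuer: letsencrypt-staging' "$dir" |
    while IFS= read -r file; do
      sed -i 's/cluster-issuer: letsencrypt-staging/cluster-issuer: letsencrypt-prod/' "$file"
    done
}

# Example:
# switch_to_prod_issuer apps/ && git diff --stat
```

Commit and push the change so ArgoCD picks it up and cert-manager requests the production certificates.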

Phase 7 — Alertmanager

7.1 Add an Alertmanager receiver to apps/monitoring/application.yaml. Example using a Discord webhook:

alertmanager:
  config:
    global:
      resolve_timeout: 5m
    route:
      group_by: ['alertname', 'namespace']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
      receiver: discord
    receivers:
      - name: discord
        discord_configs:
          - webhook_url: https://discord.com/api/webhooks/YOUR_WEBHOOK_ID/TOKEN
            title: '{{ .GroupLabels.alertname }}'
            message: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'

7.2 Commit the change and verify Alertmanager is correctly configured:

kubectl -n monitoring get alertmanager
kubectl -n monitoring exec -it alertmanager-kube-prometheus-stack-alertmanager-0 \
  -- amtool config show

Phase 8 — Post-Deployment Hardening

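8.1 Disable the Vaultwarden admin page after initial setup:

# In apps/vaultwarden/manifests/deployment.yaml, remove the env entry that sets ADMIN_TOKEN.
# Do not add DISABLE_ADMIN_TOKEN=true: that setting removes authentication from /admin
# instead of disabling the page.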

8.2 Restrict Actual Budget to the internal network only. Verify that the Traefik middleware ipallow exists in the actual-budget namespace (Traefik refers to it as actual-budget-ipallow@kubernetescrd), or create it:

# apps/actual-budget/manifests/middleware.yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: ipallow
  namespace: actual-budget
spec:
  ipAllowList:
    sourceRange:
      - <redacted>/24

8.3 Enable Longhorn recurring snapshots via the Longhorn UI or a RecurringJob CRD:

kubectl apply -f - <<'EOF'
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: daily-snapshot
  namespace: longhorn-system
spec:
  cron: "0 2 * * *"
  task: snapshot
  retain: 7
  concurrency: 1
EOF
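A RecurringJob that defines no groups only runs for volumes that are explicitly labelled with it. The sketch below builds that label, following Longhorn's `recurring-job.longhorn.io/<job-name>=enabled` convention; `recurring_job_label` is a hypothetical helper and the volume name in the example is a placeholder.

```shell
# Hypothetical helper: build the volume label that attaches a RecurringJob to a Longhorn volume.
recurring_job_label() {
  printf 'recurring-job.longhorn.io/%s=enabled' "$1"
}

# Example (placeholder volume name):
# kubectl -n longhorn-system label volume/<volume-name> "$(recurring_job_label daily-snapshot)"
```

Alternatively, adding a `groups` entry with the value `default` to the job spec applies it to every volume that has no other recurring job assigned.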

8.4 Configure Renovate: if you run Renovate through GitHub Actions instead of the GitHub App mentioned in step 1.4, add your GitHub PAT as RENOVATE_TOKEN in GitHub Actions secrets so Renovate can open automatic dependency update PRs. With the GitHub App, no token is needed.

8.5 Set up Uptime Kuma monitors for all services via its web UI at status.example.com after deployment.

8.6 Review and tighten K8s RBAC: the setup currently uses the default ArgoCD project (unrestricted). Consider creating per-team ArgoCD projects with resource restrictions.

Quick Reference — Status Check Commands

# Cluster overview
kubectl get nodes -o wide
kubectl get pods -A | grep -v Running | grep -v Completed

# ArgoCD sync status
kubectl -n argocd get applications

# Certificate status
kubectl get certificate -A

# Longhorn volumes
kubectl -n longhorn-system get volumes

# Alerts firing
kubectl -n monitoring exec -it \
  $(kubectl -n monitoring get pod -l app.kubernetes.io/name=alertmanager -o name | head -1) \
  -- amtool alert query