Add homelab services stack PRD

Verifies 16 DockerHub images, assigns target nodes per locked policy,
defines 3-phase deployment order (Infra → Media → Polish),
and captures open questions for Bobby.

Services: Traefik, Technitium DNS, AdGuard Home, Prometheus, Grafana,
Beszel, Dozzle, Portainer, Homepage, Authelia, Vaultwarden, Jellyfin,
Sonarr, Radarr, Prowlarr, Nextcloud

Domain: *.ai.home
No public internet exposure.
Artemis (Iron Legion)
2026-05-25 17:17:23 -04:00
parent f3e7c5d108
commit d60bc96f1d
11 changed files with 826 additions and 0 deletions


@@ -0,0 +1,52 @@
# Iron Legion Homelab Services Stack — Deployment Phases

## Phase 1: Infrastructure (Critical Path)

**Goal:** Get DNS, the reverse proxy, and basic monitoring running. Everything else depends on this.

| Order | Service | Target Node | Why First | Dependencies |
|-------|---------|-------------|-----------|--------------|
| 1 | **Technitium DNS** | Bones | Name resolution for internal services | None |
| 2 | **AdGuard Home** | Bones | Filtering DNS + ad-block | Technitium (via conditional forwarding) |
| 3 | **Traefik** | Mark5 | Edge router for all HTTP ingress | DNS (needs `*.ai.home` to resolve) |
| 4 | **Authelia** | Mark5 | Auth layer before exposing any management UI | Traefik (Authelia plugs in as ForwardAuth middleware) |
| 5 | **Portainer** | Neo | Container management UI | Traefik + Authelia (for secured access) |
| 6 | **Prometheus** | Mark44 | Metrics collection baseline | None (scrape targets added in Phase 2) |
| 7 | **Beszel** | Mark44 | Fleet resource overview | None (agents installed per-node) |
| 8 | **Dozzle** | Mark44 | Real-time log viewing | None |
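
The Order and Dependencies columns encode a rule: every dependency must be deployed at a lower Order number than the service that needs it. The bash sketch below checks that rule. It assumes bash 4+ (for associative arrays) and a hypothetical `order|service|dep1,dep2` record format transcribed by hand from the table. `check_order` is an illustrative name, not an existing tool.

```bash
#!/usr/bin/env bash
# Sketch: verify each service's dependencies are scheduled before it.
# Input (hypothetical format): one "order|service|dep1,dep2" record per line.
# Requires bash >= 4 for associative arrays.

check_order() {
  local -A pos=()                      # service name -> order number
  local -a records=() deps_list=()
  local order svc deps dep dep_pos rec bad=0

  # Pass 1: remember where each service sits in the sequence.
  while IFS='|' read -r order svc deps || [[ -n "$svc" ]]; do
    [[ -z "$svc" ]] && continue
    pos["$svc"]=$order
    records+=("$order|$svc|$deps")
  done

  # Pass 2: every dependency must be scheduled, and scheduled earlier.
  for rec in "${records[@]}"; do
    IFS='|' read -r order svc deps <<< "$rec"
    IFS=',' read -r -a deps_list <<< "$deps"
    for dep in "${deps_list[@]}"; do
      [[ -z "$dep" ]] && continue
      dep_pos="${pos[$dep]:-}"
      if [[ -z "$dep_pos" ]]; then
        echo "MISSING: $svc depends on unscheduled $dep"
        bad=1
      elif (( dep_pos >= order )); then
        echo "ORDER: $svc (#$order) depends on $dep (#$dep_pos)"
        bad=1
      fi
    done
  done
  return "$bad"
}

# Usage (Phase 1 rows with dependencies, plus their prerequisites):
#   check_order <<'EOF'
#   1|Technitium|
#   3|Traefik|Technitium
#   4|Authelia|Traefik
#   5|Portainer|Traefik,Authelia
#   EOF
```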

**Phase 1 milestone:** All nodes report healthy in Beszel, Portainer is reachable through the Authelia login portal, and internal hostnames resolve.

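
The "hostnames resolve" criterion can be made checkable with a small loop over service hostnames. This bash sketch is not part of the PRD and rests on several assumptions:

- Names follow `<service>.<domain>`.
- The domain defaults to `ai.home`, taken from the commit message.
- It relies on glibc's `getent hosts`, which exits non-zero when a name does not resolve.

Run it from a machine whose resolver points at the new DNS server. `check_dns`, `DOMAIN`, and `RESOLVE_CMD` are hypothetical names. `RESOLVE_CMD` exists only so the lookup can be stubbed.

```bash
#!/usr/bin/env bash
# Sketch: Phase 1 "hostnames resolve" smoke test.
# Assumptions: names follow <service>.<domain>; this machine uses the new
# Technitium server as its resolver; getent (glibc) is installed.

DOMAIN="${DOMAIN:-ai.home}"                 # domain from the commit message
RESOLVE_CMD="${RESOLVE_CMD:-getent hosts}"  # swappable lookup command

check_dns() {
  local name fqdn failed=0
  for name in "$@"; do
    fqdn="${name}.${DOMAIN}"
    # getent exits non-zero when the name does not resolve.
    if $RESOLVE_CMD "$fqdn" > /dev/null 2>&1; then
      echo "OK   $fqdn"
    else
      echo "FAIL $fqdn"
      failed=1
    fi
  done
  return "$failed"
}

# Usage: check_dns traefik authelia portainer beszel dozzle
```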
---
## Phase 2: Media & File Collaboration

**Goal:** Self-hosted media acquisition and file sync.

| Order | Service | Target Node | Why Now | Dependencies |
|-------|---------|-------------|---------|--------------|
| 9 | **Jellyfin** | Mark44 | Media playback (GPU transcode if Mark44 has dGPU) | None (file ingest later) |
| 10 | **Sonarr** | Mark44 | TV management | Jellyfin (Sonarr imports organized files into its library) |
| 11 | **Radarr** | Mark44 | Movie management | Jellyfin (Radarr imports organized files into its library) |
| 12 | **Prowlarr** | Mark44 | Indexer aggregation | Sonarr + Radarr (Prowlarr syncs indexers to them) |
| 13 | **Nextcloud** | Neo | File sync/collaboration | PostgreSQL (on Bones) |
| 14 | **Vaultwarden** | Neo | Password management | None (standalone) |

**Phase 2 milestone:** The media acquisition pipeline works end to end, Nextcloud syncs, and Vaultwarden stores secrets.

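
For the Phase 2 milestone, probing each service's status endpoint through Traefik gives a quick first signal before testing the pipeline by hand. The bash sketch below makes several assumptions to verify against the deployed versions:

- Endpoint paths: `/health` for Jellyfin, `/status.php` for Nextcloud, `/alive` for Vaultwarden, `/api/v3/system/status` for Sonarr and Radarr, and `/api/v1/system/status` for Prowlarr.
- Routing follows `https://<service>.<domain>`.
- API keys come from hypothetical `*_API_KEY` variables.

`curl -f` only fails on HTTP 400 and above. If Authelia guards a route, the login redirect counts as success and the probe reports UP without reaching the service. Probe such routes directly or exempt them. `probe`, `phase2_probe`, and `HTTP_GET` are illustrative names.

```bash
#!/usr/bin/env bash
# Sketch: Phase 2 HTTP health probe through Traefik.
# Assumptions (verify against deployed versions): https://<service>.<domain>
# routing; probed paths not behind Authelia; *_API_KEY variables hypothetical.

DOMAIN="${DOMAIN:-ai.home}"
HTTP_GET="${HTTP_GET:-curl -fsS --max-time 5 -o /dev/null}"

# probe NAME PATH [extra curl args...]  -> prints UP/DOWN, returns 1 on DOWN
probe() {
  local name=$1 path=$2
  shift 2
  local url="https://${name}.${DOMAIN}${path}"
  if $HTTP_GET "$@" "$url"; then
    echo "UP   $name"
  else
    echo "DOWN $name"
    return 1
  fi
}

phase2_probe() {
  local failed=0
  probe jellyfin    /health || failed=1
  probe sonarr      /api/v3/system/status -H "X-Api-Key: ${SONARR_API_KEY:-}" || failed=1
  probe radarr      /api/v3/system/status -H "X-Api-Key: ${RADARR_API_KEY:-}" || failed=1
  probe prowlarr    /api/v1/system/status -H "X-Api-Key: ${PROWLARR_API_KEY:-}" || failed=1
  probe nextcloud   /status.php || failed=1
  probe vaultwarden /alive || failed=1
  return "$failed"
}

# Usage: SONARR_API_KEY=... RADARR_API_KEY=... PROWLARR_API_KEY=... phase2_probe
```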
---
## Phase 3: Polish & Expansion

**Goal:** Dashboards, advanced monitoring, and nice-to-haves.

| Order | Service | Target Node | Why Deferred | Dependencies |
|-------|---------|-------------|--------------|--------------|
| 15 | **Grafana** | Mark44 | Dashboards need metrics to be interesting | Prometheus (needs data history) |
| 16 | **Homepage** | Mark5 | Custom dashboard for everything | All Phase 1+2 services (needs endpoints) |
| | **Promtail + Loki** | TBD | Centralized logging | Only if Dozzle is insufficient |
| | **Uptime-Kuma** | TBD | External uptime monitoring | Only if Beszel alerting is insufficient |

**Phase 3 milestone:** A single-pane dashboard (Homepage) shows all services, and alerts route to Discord or email.

---

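## Deployment Cadence

- **One service per session.** No mass deployments; validate each service before proceeding to the next.
- **Rollback plan:** Take a snapshot before each `docker compose up`. To roll back, run `docker compose down` in the service's directory, then `mv /opt/iron-legion/service{,-failed-$(date +%s)}` (with `service` replaced by that directory's name) to set the failed deployment aside.
- **Bobby's approval is required before Phase 2 begins.** Phase 1 success must be demonstrated first.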
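
The rollback plan can be wrapped in a function so it is applied the same way every time. This bash sketch assumes each service has its own directory under `/opt/iron-legion` containing its compose file. The snapshot step is omitted because the PRD does not name the snapshot tool. `rollback_service`, `STACK_ROOT`, and `COMPOSE` are hypothetical names. The last two can be overridden so the function can be dry-run against a scratch directory.

```bash
#!/usr/bin/env bash
# Sketch of the rollback step: compose down, then park the directory.
# Assumptions: one directory per service under STACK_ROOT holding its compose
# file; snapshotting is out of scope (the PRD does not name a tool).

STACK_ROOT="${STACK_ROOT:-/opt/iron-legion}"
COMPOSE="${COMPOSE:-docker compose}"

rollback_service() {
  local svc=$1
  local dir="$STACK_ROOT/$svc"
  local parked
  if [[ -z "$svc" || ! -d "$dir" ]]; then
    echo "rollback: no service directory at $dir" >&2
    return 1
  fi
  # Stop and remove the service's containers from its project directory.
  ( cd "$dir" && $COMPOSE down ) || return 1
  # Same effect as: mv "$dir"{,-failed-$(date +%s)}
  parked="$dir-failed-$(date +%s)"
  mv "$dir" "$parked" || return 1
  echo "$parked"
}

# Usage: rollback_service jellyfin
```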