---

# Iron Legion Homelab Services Stack — Purpose & Scope

## Document ID
- **PRD:** homelab-services-stack-prd.md
- **Date:** 2026-05-25
- **Owner:** Artemis (AI Foreman, Iron Legion Labs)
- **Authority:** Commander Bobby

## Purpose
Central canonical reference for all Docker/Compose-based services Iron Legion Labs intends to deploy across the fleet. This document exists to:
1. Prevent duplicate research: every service's Docker image, metadata, and deployment pattern is captured once.
2. Guide node placement: which service runs where, and why.
3. Serve as the source of truth for Ansible-pull manifests, compose files, and future automation.

## Scope
### In Scope
- Service catalog with DockerHub-verified images (name, namespace, description, pull count, stars, last update)
- Category assignment (Network, Monitoring, Media, Security, Management, Infrastructure)
- Recommended target node per service
- Deployment phase priority
- High-level network, data, and security architecture

### Out of Scope
- Detailed compose-file YAML (deferred to per-service deployment PRDs)
- Specific Traefik middleware configurations (deferred to network PRD)
- GPU passthrough configs for media transcode (deferred to Mark44 workload PRD)
- Service-specific SSO/Authelia rule authoring (deferred to security PRD)

## Living Document
This PRD is append-only for new services. Modifications to existing entries require Bobby sign-off. Additions follow the raw-metadata-to-summary pattern established in Section 4.

---

# Iron Legion Homelab Services Stack — Success Criteria

## Done When
1. ✅ Every service in the catalog has a verified DockerHub image with a non-stale last-update date (≤ 90 days old at time of cataloging)
2. ✅ Every service has an assigned target node that respects the **Node Assignments Locked** policy
3. ✅ Every service has a deployment phase (1, 2, or 3) agreed by Bobby
4. ✅ Network ingress/egress flow is documented at the service level (who talks to whom, via what port/protocol)
5. ✅ A single `docker-compose.yml` skeleton exists per phase, ready for population
6. ✅ Bobby has read and approved this PRD; any objections are captured as blockers below

## Verification Methods
- DockerHub API freshness check: `last_updated` field within 90 days
- Node lock compliance: cross-reference against `fleet-ops.md` node assignments
- Compose skeleton existence: `ls ~/.ansible-repo/new-build/phase-*.yml`
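
The freshness check above can be sketched as a small date comparison. This is a minimal illustration, not the actual verification script: it assumes the `last_updated` value has already been fetched and is an ISO 8601 string, and the function name `is_fresh` is hypothetical.

```python
from datetime import date, timedelta

MAX_AGE = timedelta(days=90)  # freshness window from the Done When list


def is_fresh(last_updated, cataloged_on):
    """Return True if the image was updated no more than 90 days before cataloging.

    last_updated -- ISO 8601 string; only the leading YYYY-MM-DD part is used
    cataloged_on -- datetime.date on which the catalog entry was recorded
    """
    updated = date.fromisoformat(last_updated[:10])
    return cataloged_on - updated <= MAX_AGE


cataloged_on = date(2026, 5, 25)
print(is_fresh("2026-04-30", cataloged_on))            # 25 days old
print(is_fresh("2026-01-01T00:00:00Z", cataloged_on))  # 144 days old
```

Comparing against the cataloging date rather than today's date keeps the result stable when the check is re-run later.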

## Failure Modes
| Failure | Mitigation |
|---------|------------|
| DockerHub image stale or abandoned | Flag for alternative image research |
| Node assignment conflicts with locked policy | Escalate to Bobby immediately |
| Service dependency on another Phase 2+ service | Note in Open Questions, defer deployment |

## Known Blockers
- **Authelia** requires a domain + valid TLS cert. If Bobby does not want to expose services to the public internet, Traefik needs either an internal Tailscale cert or a self-signed CA.
- **Technitium DNS** upstream forwarding policy not yet specified (DoH, DoT, plain UDP?).

---

# Iron Legion Homelab Services Stack — Constraints

## Hard Constraints (Non-Negotiable)
1. **Bare metal over abstraction.** Direct deployments preferred. Compose files are acceptable as orchestration glue, but no Docker Swarm mode, no Kubernetes, no abstraction layers Bobby cannot `ssh` into and debug.
2. **No nginx.** Traefik is the sole edge router. No nginx reverse proxies, no nginx sidecars.
3. **No Tailscale serve/funnel.** Services bind to `0.0.0.0` on their assigned node and are reachable via Tailscale mesh IP + port. No `tailscale serve`, no `tailscale funnel`.
4. **Node assignments locked.** Services do not migrate between nodes without Bobby's explicit written direction.
5. **Patch upstream source** when loopback/bind restrictions block direct deployment. Do not re-architect around the constraint.

## Node Assignment Policy (as of 2026-05-25)
**The G9 Swarm Cluster is the ONLY deployment target.** Mark5, Bones, Neo, and Mark44 are NOT part of this homelab services stack.

| Node | Role | Services Assigned |
|------|------|-------------------|
| **MK7 (mark-vii.ai.home)** | Swarm Manager | ALL Phase 1 infrastructure: Traefik, Technitium DNS, AdGuard Home, Portainer, Prometheus, Beszel, Dozzle, Authelia, Homepage |
| **MK33, MK34, MK39, MK42** | Swarm Workers | Phase 2 media stack (Jellyfin, Sonarr, Radarr, Prowlarr), distributed workloads, Vaultwarden, Nextcloud |
| **Artemis** | AI Foreman / JARVIS | Hermes Agent, Ansible-pull control plane; NOT a service host |

## Soft Constraints (Bobby Approval Required to Override)
- **Data residency:** All persistent volumes live on-node. No NFS, no Ceph, no distributed storage unless explicitly approved.
- **Secret management:** No plain-text secrets in compose files. Use `.env` files with file mode `0600`, or Vaultwarden if a secret store is needed.
- **Backup cadence:** Every service with persistent state must have a documented backup target. Default: daily rsync to MK7 secondary storage.

## Environment Assumptions
- All nodes run Debian Trixie or compatible.
- Docker Engine (not Docker Desktop) is installed on all target nodes.
- Tailscale is up and meshed. All inter-node traffic is over Tailscale IPs.
- `docker compose` plugin (v2) available, not legacy `docker-compose` standalone.

---

# Iron Legion Homelab Services Stack — Service Catalog

## Verified DockerHub Metadata (as of 2026-05-25)

### Swarm Placement Legend
| Placement | Swarm Behavior |
|-----------|----------------|
| **Global** | One replica on EVERY node (including manager) |
| **Replicated (N)** | N replicas distributed across workers by scheduler |
| **Manager Constraint** | Only on manager node(s) |
| **Label Constraint** | Only on nodes matching a `node.labels.<key>` constraint |

### Placement Rules for 5-Node Swarm (1 manager + 4 workers)
- **MK7** = Manager (can run global services + manager-constrained services)
- **MK33, MK34, MK39, MK42** = Workers (run global services + replicated services)
- **No node labels yet.** Storage nodes (e.g., media storage) will be labeled in Phase 3.
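
To make the legend and placement rules concrete, here is a small model of which nodes each placement type can land on. It is only a sketch of the rules as written in this document (replicas on workers only), not of the Swarm scheduler itself; `eligible_nodes` and the label mapping are hypothetical names introduced for illustration.

```python
MANAGERS = ["MK7"]
WORKERS = ["MK33", "MK34", "MK39", "MK42"]


def eligible_nodes(placement, node_labels=None, required=None):
    """Return the nodes a service may land on under the legend's placement types.

    placement   -- "global", "replicated", "manager", or "label"
    node_labels -- hypothetical mapping of node name -> {label key: value}
    required    -- (key, value) pair a node must carry when placement == "label"
    """
    all_nodes = MANAGERS + WORKERS
    if placement == "global":
        return all_nodes
    if placement == "manager":
        return list(MANAGERS)
    if placement == "replicated":
        # Per the legend, replicas are distributed across workers only.
        return list(WORKERS)
    if placement == "label":
        key, value = required
        labels = node_labels or {}
        return [n for n in all_nodes if labels.get(n, {}).get(key) == value]
    raise ValueError(f"unknown placement: {placement}")


print(eligible_nodes("global"))
# With no labels applied yet, a label-constrained service has nowhere to run.
print(eligible_nodes("label", required=("storage", "media")))
# Hypothetical Phase 3 state: MK39 labeled as media storage.
print(eligible_nodes("label", {"MK39": {"storage": "media"}}, ("storage", "media")))
```

The empty result for the unlabeled case shows why label-constrained services cannot be scheduled until node labels exist.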

---

### Network Layer
| Service | Image | Pulls | Stars | Updated | Placement | Notes |
|---------|-------|-------|-------|---------|-----------|-------|
| **Traefik** | `traefik` | 3.49B | 3,634 | 2026-05-13 | **Global** | Every node receives ingress routing + Docker socket read-only |
| **Technitium DNS** | `technitium/dns-server` | 8.99M | 156 | 2026-05-09 | **Manager Constraint** | Single authoritative DNS; port 53 on MK7 only |
| **AdGuard Home** | `adguard/adguardhome` | 170.7M | 1,408 | 2026-05-25 | **Replicated (1)** | Single replica on MK7; admin UI on port 3000 |

### Monitoring / Observability
| Service | Image | Pulls | Stars | Updated | Placement | Notes |
|---------|-------|-------|-------|---------|-----------|-------|
| **Prometheus** | `prom/prometheus` | 1.97B | 2,064 | 2026-05-25 | **Manager Constraint** | Central scraping server on MK7 |
| **Prometheus Node Exporter** | `prom/node-exporter` | n/a | n/a | n/a | **Global** | Runs on every node; exposes CPU/mem/disk metrics for Prometheus to scrape |
| **Grafana** | `grafana/grafana` | 5.22B | 3,540 | 2026-05-16 | **Replicated (1)** | Any worker (Phase 3, needs data history first) |
| **Beszel Hub** | `henrygd/beszel` | 12.58M | 32 | 2026-04-30 | **Manager Constraint** | Central hub on MK7 collects metrics from agents |
| **Beszel Agent** | `henrygd/beszel-agent` | n/a | n/a | n/a | **Pending** | Planned global; reports to hub. Not yet deployed. |
| **Dozzle** | `amir20/dozzle` | 309.6M | 144 | 2026-05-25 | **Replicated (1)** | Any worker; read-only Docker socket |

### Management / Dashboard
| Service | Image | Pulls | Stars | Updated | Placement | Notes |
|---------|-------|-------|-------|---------|-----------|-------|
| **Portainer CE** | `portainer/portainer-ce` | 1.46B | 2,665 | 2026-05-20 | **Replicated (1)** | MK7; agentless mode, no portainer-agent needed |
| **Homepage** | `gethomepage/homepage` | 1.31M | 40 | 2026-05-25 | **Replicated (1)** | Any worker; all endpoints via env vars |

### Security / Identity
| Service | Image | Pulls | Stars | Updated | Placement | Notes |
|---------|-------|-------|-------|---------|-----------|-------|
| **Authelia** | `authelia/authelia` | 75.2M | 208 | 2026-05-25 | **Replicated (1)** | Any worker; Traefik ForwardAuth middleware |

### Existing External Services (NOT in Swarm)
| Service | Location | Status | Notes |
|---------|----------|--------|-------|
| **Vaultwarden** | Neo (Nebuchadnezzar) | ✅ Production | Already deployed via Docker. Managed separately. |
| **Nextcloud** | Neo (Nebuchadnezzar) | ✅ Production | Nextcloud AIO. NOT part of G9 Swarm stack. |

> These services live outside the G9 Swarm cluster. No migration planned unless Bobby explicitly requests it.

### Media Stack (*arr + Jellyfin)
| Service | Image | Pulls | Stars | Updated | Placement | Notes |
|---------|-------|-------|-------|---------|-----------|-------|
| **Jellyfin** | `jellyfin/jellyfin` | 370.4M | 1,535 | 2026-05-25 | **Label Constraint** | Nodes with `node.labels.storage == media` (Phase 3) |
| **Sonarr** | `linuxserver/sonarr` | 2.34B | 2,118 | 2026-05-23 | **Replicated (1)** | Any worker; needs shared /downloads mount |
| **Radarr** | `linuxserver/radarr` | 2.36B | 1,791 | 2026-05-25 | **Replicated (1)** | Any worker; needs shared /downloads mount |
| **Prowlarr** | `linuxserver/prowlarr` | 35.9M | 403 | 2026-05-25 | **Replicated (1)** | Any worker; feeds Sonarr/Radarr via network |

## Total Services: 16 (catalog) + 2 (existing external) = 18 total fleet services
## Swarm Services: 16 (includes global Beszel agent and node exporter)
## Total DockerHub Pulls (aggregate): ~17.8B
## All images with recorded metadata updated within 90 days
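
The aggregate pull figure can be recomputed from the catalog tables. The sketch below is illustrative only: it uses the rounded pull counts exactly as tabulated for the 14 catalog images that have a recorded count (Node Exporter and Beszel Agent have none), and `parse_pulls` is a hypothetical helper. With these inputs the sum comes to roughly 17.8B.

```python
SUFFIX = {"K": 1e3, "M": 1e6, "B": 1e9}


def parse_pulls(text):
    """Convert an abbreviated count such as '3.49B' or '8.99M' to a float."""
    return float(text[:-1]) * SUFFIX[text[-1]]


# Rounded pull counts copied from the catalog tables above.
PULLS = {
    "Traefik": "3.49B",
    "Technitium DNS": "8.99M",
    "AdGuard Home": "170.7M",
    "Prometheus": "1.97B",
    "Grafana": "5.22B",
    "Beszel Hub": "12.58M",
    "Dozzle": "309.6M",
    "Portainer CE": "1.46B",
    "Homepage": "1.31M",
    "Authelia": "75.2M",
    "Jellyfin": "370.4M",
    "Sonarr": "2.34B",
    "Radarr": "2.36B",
    "Prowlarr": "35.9M",
}

total = sum(parse_pulls(value) for value in PULLS.values())
print(f"{total / 1e9:.2f}B across {len(PULLS)} images")
```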

---

# Iron Legion Homelab Services Stack — Network Architecture

## Ingress Flow
```
[Internet] → [Tailscale mesh] → [MK7: Traefik] → [Target Node: Service Port]
```

## Traefik Role
- **Single entrypoint.** Every HTTP/HTTPS service routes through Traefik on MK7.
- **Tailscale-native.** Traefik binds to `0.0.0.0:80` and `0.0.0.0:443`. No `tailscale serve`.
- **Service discovery via Docker labels.** Each compose service exposes labels that Traefik reads from the Docker socket on MK7.
- **Docker socket access restricted.** Traefik mounts a read-only Docker socket. Dozzle is the only other cataloged service listed with socket access, and it is also read-only.

## Internal Traffic Patterns
| Source | Destination | Protocol | Port | Notes |
|--------|-------------|----------|------|-------|
| Traefik (MK7) | Any service | HTTP/HTTPS | Varies | Proxied via Tailscale IP |
| Beszel (MK7) | Any node | HTTP | Varies | Hub polls each node's agent for metrics (read-only) |
| Prometheus (MK7) | Any node | HTTP | 9100 (node-exporter) | Scrapes node and container metrics |
| Prowlarr (MK7) | Indexer sites | HTTPS | 443 | Outbound only |
| Sonarr/Radarr (MK7) | Prowlarr | HTTP | 9696 | Internal indexer lookup |
| Nextcloud (MK7) | PostgreSQL (MK7) | TCP | 5432 | DB traffic over Tailscale |

## DNS Resolution

| Component | Status | Detail |
|-----------|--------|--------|
| **Technitium (MK7)** | ✅ Deployed | Container running, port 53/5380 open |
| **`*.ai.home` zone** | ⏳ Pending | Not yet configured as authoritative; Tailscale MagicDNS currently handles name resolution |
| **AdGuard Home (MK7)** | ✅ Active | Recursive resolver + blocklists; admin UI on port 3000. Replaces Pi-hole. |

**Planned Chain (not yet active):**
```
Client → Technitium (local record?) → AdGuard Home (recursive + blocklist) → Upstream (Cloudflare/Quad9)
```

**Current Fallback:** Tailscale MagicDNS provides `*.ai.home` resolution via Tailscale IP addresses. Technitium will assume authority once zone records are populated.

- **AdGuard Home admin UI** runs on port 3000.

## Port Allocation (Reserved)
| Port | Service |
|------|---------|
| 53 | DNS (Technitium / AdGuard) |
| 80/443 | HTTP/S (Traefik) |
| 3000 | Grafana; AdGuard Home admin UI |
| 9090 | Prometheus |
| 9000 | Portainer |
| 8096 | Jellyfin |
| 8989 | Sonarr |
| 7878 | Radarr |
| 9696 | Prowlarr |
| 9091 | Authelia (default) |
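
Because services bind to `0.0.0.0` on their assigned node, two services only collide when they claim the same port on the same node. The following sketch shows one way to check a reservation list for that. The `bindings` list is an illustrative subset, and the Grafana entry on MK7 is an assumption made purely to demonstrate a conflict with the AdGuard Home admin UI.

```python
from collections import defaultdict


def find_port_conflicts(bindings):
    """Group (node, port, service) tuples and return slots claimed more than once."""
    by_slot = defaultdict(list)
    for node, port, service in bindings:
        by_slot[(node, port)].append(service)
    return {slot: services for slot, services in by_slot.items() if len(services) > 1}


bindings = [
    ("MK7", 53, "Technitium DNS"),
    ("MK7", 80, "Traefik"),
    ("MK7", 443, "Traefik"),
    ("MK7", 3000, "AdGuard Home admin UI"),
    ("MK7", 3000, "Grafana"),  # assumption: only true if Grafana lands on MK7
    ("MK7", 9090, "Prometheus"),
    ("MK33", 8989, "Sonarr"),
]

for (node, port), services in find_port_conflicts(bindings).items():
    print(f"{node}:{port} claimed by {', '.join(services)}")
```

Keying on the `(node, port)` pair rather than the port alone means the same port number on two different nodes is not reported as a conflict.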

## TLS Strategy
- **Internal:** Traefik generates self-signed certs for `*.labs.internal`. Authelia can enforce client-cert if needed.
- **External:** Not applicable per the no-Tailscale-funnel constraint. If Bobby later wants public access, use Let's Encrypt via DNS challenge (Technitium controls the zone).

---

# Iron Legion Homelab Services Stack — Data & Persistence

## Volume Strategy
Every service with persistent state uses **bind mounts to on-node directories**. No named volumes, no NFS, no distributed storage.

## Directory Convention
```
/opt/iron-legion/
├── service-name/
│   ├── data/       # Application data (databases, config, state)
│   ├── config/     # Static config files mounted read-only where possible
│   └── logs/       # Log output (optional, if not sent to stdout)
```
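
As an illustration of the convention above, the three standard paths for a service can be derived from its name. This is a sketch only; `service_dirs` is a hypothetical helper, and some services in this PRD use other subdirectory names (for example `certs/` or `work/`).

```python
from pathlib import PurePosixPath

ROOT = PurePosixPath("/opt/iron-legion")
SUBDIRS = ("data", "config", "logs")


def service_dirs(service):
    """Return the conventional data/config/logs paths for one service."""
    base = ROOT / service
    return {name: base / name for name in SUBDIRS}


for name, path in service_dirs("prometheus").items():
    print(f"{name:7} {path}")
```

`PurePosixPath` only builds the path strings; nothing is created on disk, so the helper is safe to run anywhere.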

## Per-Service Persistence
| Service | Data Path | Backup Target | Size Estimate |
|---------|-----------|---------------|---------------|
| **Traefik** | `/opt/iron-legion/traefik/config/`, `/opt/iron-legion/traefik/certs/` | MK7 (daily rsync) | < 50 MB |
| **Technitium DNS** | `/opt/iron-legion/technitium/config/` | MK7 | < 10 MB |
| **AdGuard Home** | `/opt/iron-legion/adguard/work/`, `/opt/iron-legion/adguard/conf/` | MK7 | < 500 MB |
| **Prometheus** | `/opt/iron-legion/prometheus/data/` | MK7 (retention: 15d local, 90d backup) | 5–20 GB |
| **Grafana** | `/opt/iron-legion/grafana/data/` | MK7 | < 500 MB |
| **Beszel** | `/opt/iron-legion/beszel/data/` | MK7 | < 1 GB |
| **Portainer** | `/opt/iron-legion/portainer/data/` | MK7 | < 100 MB |
| **Homepage** | `/opt/iron-legion/homepage/config/` | MK7 | < 10 MB |
| **Vaultwarden** | `/opt/iron-legion/vaultwarden/data/` | MK7 (encrypted) | < 500 MB |
| **Authelia** | `/opt/iron-legion/authelia/config/` | MK7 | < 10 MB |
| **Jellyfin** | `/opt/iron-legion/jellyfin/config/`, `/opt/iron-legion/jellyfin/media/` | **None** (media too large) | < 1 GB config; media drive separate |
| **Sonarr** | `/opt/iron-legion/sonarr/config/` | MK7 | < 1 GB |
| **Radarr** | `/opt/iron-legion/radarr/config/` | MK7 | < 1 GB |
| **Prowlarr** | `/opt/iron-legion/prowlarr/config/` | MK7 | < 100 MB |
| **Nextcloud** | `/opt/iron-legion/nextcloud/data/` | MK7 (snapshots) | 10–50 GB |

## Media Storage Exception
- **Jellyfin media** lives on a separate mount (likely external USB/NVMe on MK7). Not backed up via rsync.
- **Sonarr/Radarr** stage downloads in a shared `/downloads` bind mount, then hardlink or copy them into the Jellyfin media library.

## Backup Tooling
- **Primary:** `rsync -a --delete` to MK7 secondary storage daily at 03:00 local.
- **Vaultwarden:** `sqlite3` dump + `rsync` (encrypted at rest on MK7).
- **Prometheus:** take a snapshot via the snapshot API, then `rsync` the snapshot (not raw WAL files).

## Secret Management
- `.env` files live in `/opt/iron-legion/service-name/.env`, mode `0600`.
- Compose files use `${VAR_NAME}` syntax, never literal strings.
- Vaultwarden stores shared secrets (DB passwords, API keys). Artemis holds no secrets in memory.
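
The `0600` requirement on `.env` files is easy to verify mechanically. The sketch below is one possible check, assuming a POSIX filesystem; `env_file_is_private` is a hypothetical helper name, and the demo uses a throwaway file with a dummy value rather than a real secret.

```python
import os
import stat
import tempfile


def env_file_is_private(path):
    """Return True if the file's permission bits are exactly 0600."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == 0o600


with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w") as handle:
        handle.write("DB_PASSWORD=placeholder\n")  # dummy value for the demo

    os.chmod(env_path, 0o644)  # world-readable: should fail the check
    too_open = env_file_is_private(env_path)

    os.chmod(env_path, 0o600)  # owner read/write only: should pass
    locked_down = env_file_is_private(env_path)

print(too_open, locked_down)
```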

---

# Iron Legion Homelab Services Stack — Security Model

## Authentication Layers
| Layer | Service | Scope | Notes |
|-------|---------|-------|-------|
| **Edge Auth** | Authelia | Traefik-secured endpoints | MFA portal, session cookies |
| **App Auth** | Vaultwarden | Password vault | Master password + 2FA |
| **App Auth** | Portainer | Container mgmt | Built-in RBAC, can integrate LDAP |
| **App Auth** | Nextcloud | File collaboration | Built-in, can integrate Authelia OIDC |
| **OS Auth** | SSH keys | Node access | Tailscale SSH + local keypairs |

## Authelia Deployment Notes
- **Target node:** MK7 (lightweight, sits beside Traefik)
- **Redirection URL:** Set Authelia's `default_redirection_url` to the base domain of services needing auth.
- **Backend storage:** Uses SQLite initially. If Bobby wants HA, migrate to PostgreSQL on MK7.
- **Notification method:** File-based (writes to `/opt/iron-legion/authelia/notifications/`) until SMTP/Discord is configured.
- **Rule granularity:** Per-service `access_control` rules in `configuration.yml`. Default: `one_factor` for internal services, `two_factor` for management interfaces (Portainer, Grafana admin).

## Traefik ↔ Authelia Integration
```yaml
# Traefik middleware labels (example)
# Attach the Authelia ForwardAuth middleware to the Portainer router
traefik.http.routers.portainer.middlewares: authelia@docker
# Define the middleware: Traefik asks Authelia to verify each request
traefik.http.middlewares.authelia.forwardauth.address: "http://authelia:9091/api/verify?rd=https://auth.labs.internal"
```
- **No nginx.** ForwardAuth middleware talks directly to Authelia over internal Docker network.
- **Bypass list:** Prometheus scrape targets, Beszel agents, Technitium DNS queries. These are internal metrics/DNS traffic, so no auth is required.

## Secret Handling
| Secret Type | Storage Method | Rotation Trigger |
|-------------|----------------|------------------|
| Authelia session secret | `.env` file, 64-byte random hex | On any Authelia config reload |
| Vaultwarden admin token | `.env` file, 48-byte random | Only on compromise |
| DB passwords (Nextcloud ↔ PostgreSQL) | `.env` files on both nodes | On any DB migration or rebuild |
| Tailscale auth keys | Vaultwarden secure note | On key expiry or node rebuild |
| API keys (indexers, Cloudflare) | Vaultwarden secure note | On key rotation by provider |

## Network Segmentation
- **No VLANs.** Tailscale ACLs handle segment isolation.
- **ACL policy (draft):**
  - `tag:admin` nodes (Bobby, Artemis) → all ports on all nodes
  - `tag:services` (MK7 manager + MK33, MK34, MK39, MK42 workers) → only their assigned service ports, no cross-node SSH except via Tailscale SSH
  - `tag:user` (Bobby's phone, laptop) → HTTPS 443 on MK7 only, Jellyfin 8096 on MK7 directly
- **Default deny.** Any traffic not explicitly allowed in Tailscale ACL is dropped.
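
The default-deny behavior of the draft policy can be modeled as a simple rule lookup. This is a toy model for reasoning about the draft, not Tailscale's ACL syntax or evaluation engine. The `tag:services` rule is left out because its per-service port list is not specified in this document, and `RULES` and `is_allowed` are hypothetical names.

```python
ALL_NODES = {"MK7", "MK33", "MK34", "MK39", "MK42"}

# (source tag, destination nodes, allowed ports); None means every port.
RULES = [
    ("tag:admin", ALL_NODES, None),
    ("tag:user", {"MK7"}, {443, 8096}),
]


def is_allowed(src_tag, dst_node, port):
    """Return True only if some rule explicitly permits the connection."""
    for tag, nodes, ports in RULES:
        if tag == src_tag and dst_node in nodes and (ports is None or port in ports):
            return True
    return False  # default deny: nothing matched


print(is_allowed("tag:admin", "MK42", 22))   # admin reaches any port
print(is_allowed("tag:user", "MK7", 8096))   # Jellyfin directly on MK7
print(is_allowed("tag:user", "MK7", 9000))   # Portainer port is not listed
print(is_allowed("tag:user", "MK33", 443))   # workers are not reachable
```

Because the function falls through to `False`, any tag or port that is missing from `RULES` is denied without needing an explicit deny entry.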

## Monitoring for Security Events
- **Dozzle** provides real-time log viewing but is NOT a SIEM.
- **Promtail/Loki** not yet in catalog. If Bobby wants log aggregation + alerting, add to Phase 3.
- **Beszel** alerts on anomalous CPU/memory; use it as a coarse proxy for intrusion detection.

---

# Iron Legion Homelab Services Stack — Deployment Phases

## Phase 1: Infrastructure (Critical Path)
**Goal:** Get DNS, proxy, and basic monitoring alive. Everything else depends on this.

| Order | Service | Target Node | Why First | Dependencies |
|-------|---------|-------------|-----------|--------------|
| 1 | **Technitium DNS** | MK7 | Name resolution for internal services | None |
| 2 | **AdGuard Home** | MK7 | Recursive DNS + ad-block | Technitium (via conditional forwarding) |
| 3 | **Traefik** | MK7 | Edge router for all HTTP ingress | DNS (needs `*.labs.internal` to resolve) |
| 4 | **Authelia** | MK7 | Auth layer before exposing any mgmt UI | Traefik (depends on ForwardAuth middleware) |
| 5 | **Portainer** | MK7 | Container management UI | Traefik + Authelia (for secured access) |
| 6 | **Prometheus** | MK7 | Metrics collection baseline | None (scrape targets added in Phase 2) |
| 7 | **Beszel** | MK7 | Fleet resource overview | None (agents installed per-node) |
| 8 | **Dozzle** | MK7 | Real-time log viewing | None |

**Phase 1 milestone:** All nodes report healthy in Beszel. Portainer accessible via auth portal. DNS resolves.
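
The Phase 1 ordering can be checked against its Dependencies column with a small script. This is a sketch under one stated assumption: the "DNS" dependency listed for Traefik is mapped to Technitium DNS. `DEPENDS_ON`, `PLANNED_ORDER`, and `respects_dependencies` are names introduced here for illustration.

```python
from graphlib import TopologicalSorter

# Service -> services that must be deployed before it (from the Phase 1 table).
DEPENDS_ON = {
    "Technitium DNS": set(),
    "AdGuard Home": {"Technitium DNS"},
    "Traefik": {"Technitium DNS"},  # assumption: "DNS" means Technitium
    "Authelia": {"Traefik"},
    "Portainer": {"Traefik", "Authelia"},
    "Prometheus": set(),
    "Beszel": set(),
    "Dozzle": set(),
}

PLANNED_ORDER = [
    "Technitium DNS", "AdGuard Home", "Traefik", "Authelia",
    "Portainer", "Prometheus", "Beszel", "Dozzle",
]


def respects_dependencies(order, depends_on):
    """Return True if every service appears after all of its dependencies."""
    position = {name: index for index, name in enumerate(order)}
    return all(
        position[dep] < position[service]
        for service, deps in depends_on.items()
        for dep in deps
    )


# static_order() raises graphlib.CycleError if the dependencies are circular.
one_valid_order = list(TopologicalSorter(DEPENDS_ON).static_order())

print(respects_dependencies(PLANNED_ORDER, DEPENDS_ON))
print(respects_dependencies(one_valid_order, DEPENDS_ON))
```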

---

## Phase 2: Media & File Collaboration
**Goal:** Self-hosted media acquisition and file sync.

| Order | Service | Target Node | Why Now | Dependencies |
|-------|---------|-------------|---------|--------------|
| 9 | **Jellyfin** | MK7 | Media playback (GPU transcode if MK7 has dGPU) | None (file ingest later) |
| 10 | **Sonarr** | MK7 | TV management | Jellyfin (pushes organized files) |
| 11 | **Radarr** | MK7 | Movie management | Jellyfin (pushes organized files) |
| 12 | **Prowlarr** | MK7 | Indexer aggregation | Sonarr + Radarr (feeds them) |
| 13 | **Nextcloud** | MK7 | File sync/collaboration | PostgreSQL (on MK7) |
| 14 | **Vaultwarden** | MK7 | Password management | None (standalone) |

**Phase 2 milestone:** Media acquisition pipeline works end-to-end. Nextcloud syncs. Vaultwarden stores secrets.

---

## Phase 3: Polish & Expansion
**Goal:** Dashboards, advanced monitoring, nice-to-haves.

| Order | Service | Target Node | Why Deferred | Dependencies |
|-------|---------|-------------|--------------|--------------|
| 15 | **Grafana** | MK7 | Dashboards need metrics to be interesting | Prometheus (needs data history) |
| 16 | **Homepage** | MK7 | Custom dashboard for everything | All Phase 1+2 services (needs endpoints) |
| – | **Promtail + Loki** | TBD | Centralized logging | Only if Dozzle is insufficient |
| – | **Uptime-Kuma** | TBD | External uptime monitoring | Only if Beszel alerting is insufficient |

**Phase 3 milestone:** Single-pane dashboard (Homepage) shows all services. Alerts route to Discord or email.

## Deployment Cadence
- **One service per session.** No mass deployments. Validate each before proceeding.
- **Rollback plan:** `docker compose down` + `mv /opt/iron-legion/service{,-failed-$(date +%s)}`. Snapshot taken before each compose up.
- **Bobby approval required before Phase 2 begins.** Phase 1 success must be demonstrated.
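
The brace expansion in the rollback command renames the service directory by appending `-failed-` and a Unix timestamp. The sketch below reproduces only that naming step so the target path is easy to see; it does not move anything, and `failed_path` is a hypothetical helper.

```python
import time
from pathlib import PurePosixPath
from typing import Optional


def failed_path(service_dir: str, now: Optional[float] = None) -> PurePosixPath:
    """Return the path `mv dir{,-failed-$(date +%s)}` would rename `dir` to."""
    stamp = int(time.time() if now is None else now)
    path = PurePosixPath(service_dir)
    return path.with_name(f"{path.name}-failed-{stamp}")


# A fixed timestamp keeps the example output reproducible.
print(failed_path("/opt/iron-legion/dozzle", now=1780000000))
```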

---

# Iron Legion Homelab Services Stack — Open Questions & Blockers

## Blocker Status
| # | Question | Impact | Default if Unresolved |
|---|----------|--------|----------------------|
| 1 | **Domain name:** Does Bobby own a domain (e.g., `bobbysh.me`) or do we use a fake TLD (`labs.internal`)? | **Critical.** TLS certs, Authelia, and DNS all depend on this. | Use `labs.internal` + self-signed CA |
| 2 | **Technitium upstream:** DoH, DoT, or plain UDP to upstream resolver (e.g., Cloudflare 1.1.1.1)? | Low; can default to DoH | DoH → `https://cloudflare-dns.com/dns-query` |
| 3 | **AdGuard Home vs Technitium layout:** AdGuard's admin UI runs on port 3000, Technitium serves DNS on 53. No collision there, but conditional forwarding from Technitium to AdGuard needs config. | Low; both run independently | Technitium forwards recursive queries upstream to AdGuard |
| 4 | **Jellyfin media storage:** External USB on MK7? SMB share? NVMe? | Medium | External USB mounted at `/media` on MK7 |
| 5 | **Backup target on MK7:** Capacity? Dedicated drive? Rsync target path? | Medium | `/backups/<service-name>/` on MK7 secondary storage |
| 6 | **Nextcloud database:** Use existing PostgreSQL on MK7, or deploy Nextcloud AIO (bundled)? | Medium; affects resource allocation on MK7 | Deploy a standalone PostgreSQL container on MK7 for Nextcloud; AIO is too heavy |
| 7 | **GPU on MK7:** NVIDIA driver runtime for Jellyfin transcode? | Low; falls back to CPU transcode | Use `jellyfin/jellyfin` with `NVIDIA_VISIBLE_DEVICES` env if available |
| 8 | **Notification routing:** Discord webhook? SMTP? File only? | Low; default file works | File notifications in `/opt/iron-legion/authelia/notifications/` |
| 9 | **Tailscale ACL policy:** Draft exists in Section 7. Bobby must review and apply in Tailscale admin console. | Low | Stay permissive until Bobby approves |
| 10 | **Beszel alert thresholds:** CPU %, memory %, disk % triggers not defined. | Low | Defaults in Beszel container |

## Outstanding Decisions Required
1. ~~Pi-hole inclusion~~: **Resolved.** AdGuard Home replaces Pi-hole in Phase 1. Removed from catalog.
2. **Authelia two-factor method:** TOTP via app (Google Authenticator) vs WebAuthn/FIDO2 keys?
3. **Home vs remote access:** If Bobby wants to share Jellyfin with friends/family outside Tailscale, public domain + Authelia guard is required.

---

# Appendix A — Raw DockerHub Metadata Table

**Full API response data captured 2026-05-25T16:45:00Z.**

| Service | Full Image | Namespace | Pulls | Stars | Last Updated | API Status |
|---------|-----------|-----------|-------|-------|--------------|------------|
| Traefik | `traefik` | `library` | 3,490,588,071 | 3,634 | 2026-05-13 | ✅ 200 |
| Technitium DNS | `technitium/dns-server` | `technitium` | 8,989,831 | 156 | 2026-05-09 | ✅ 200 |
| Homepage | `gethomepage/homepage` | `gethomepage` | 1,305,710 | 40 | 2026-05-25 | ✅ 200 |
| Beszel | `henrygd/beszel` | `henrygd` | 12,578,135 | 32 | 2026-04-30 | ✅ 200 |
| Dozzle | `amir20/dozzle` | `amir20` | 309,561,399 | 144 | 2026-05-25 | ✅ 200 |
| Grafana | `grafana/grafana` | `grafana` | 5,220,434,031 | 3,540 | 2026-05-16 | ✅ 200 |
| Prometheus | `prom/prometheus` | `prom` | 1,966,043,381 | 2,064 | 2026-05-25 | ✅ 200 |
| Portainer CE | `portainer/portainer-ce` | `portainer` | 1,464,874,500 | 2,665 | 2026-05-20 | ✅ 200 |
| Jellyfin | `jellyfin/jellyfin` | `jellyfin` | 370,358,966 | 1,535 | 2026-05-25 | ✅ 200 |
| Sonarr | `linuxserver/sonarr` | `linuxserver` | 2,339,638,307 | 2,118 | 2026-05-23 | ✅ 200 |
| Radarr | `linuxserver/radarr` | `linuxserver` | 2,359,097,569 | 1,791 | 2026-05-25 | ✅ 200 |
| Prowlarr | `linuxserver/prowlarr` | `linuxserver` | 35,913,487 | 403 | 2026-05-25 | ✅ 200 |
| Vaultwarden | `vaultwarden/server` | `vaultwarden` | 287,182,978 | 1,454 | 2026-05-17 | ✅ 200 |
| Nextcloud | `nextcloud` | `library` | 1,011,978,204 | 4,485 | 2026-05-23 | ✅ 200 |
| Authelia | `authelia/authelia` | `authelia` | 75,183,682 | 208 | 2026-05-25 | ✅ 200 |

- **Total unique images:** 15
- **Community health indicator:** All images have > 10 stars and > 1M pulls. Beszel (32 stars) and Homepage (40 stars) have the fewest stars, which is acceptable for young projects.
- **Freshness:** All updated within 90 days. The oldest is Beszel (2026-04-30, 25 days before capture).

## Appendix B — Compose Skeleton Directory Map
```
~/.ansible-repo/new-build/
├── phase-1/           # Infrastructure
│   ├── technitium/
│   ├── adguard/
│   ├── traefik/
│   ├── authelia/
│   ├── portainer/
│   ├── prometheus/
│   ├── beszel/
│   └── dozzle/
├── phase-2/           # Media + Files
│   ├── jellyfin/
│   ├── sonarr/
│   ├── radarr/
│   ├── prowlarr/
│   ├── nextcloud/
│   └── vaultwarden/
└── phase-3/           # Dashboards + Polish
    ├── grafana/
    ├── homepage/
    └── loki/          # Optional
```

Skeleton not yet created. Deferred until Bobby approves PRD.