Remove Mark5/Bones/Neo/Mark44 — G9 Swarm Cluster is the ONLY deployment target

All services reassigned to MK7 (Swarm Manager) or swarm-distributed.
Per Bobby: Mark5, Bones, Neo, Mark44 are NOT part of this homelab services stack.

Phase 1 infra (Traefik, DNS, AdGuard, Portainer, Prometheus, Beszel, Dozzle, Authelia, Homepage) → MK7
Phase 2 media (Jellyfin, Sonarr, Radarr, Prowlarr) → Swarm distributed
Phase 3 dashboards (Grafana, Homepage) → Swarm distributed

Also updates:
- Backup target: MK7 secondary storage (was Bones)
- Network/DNS/Security model: all refs to Bones/Neo/Mark5/Mark44 corrected
2026-05-25 18:24:22 -04:00
parent 4cff1b5e48
commit fea42f892b
8 changed files with 155 additions and 151 deletions


@@ -8,18 +8,18 @@
5. **Patch upstream source** when loopback/bind restrictions block direct deployment. Do not re-architect around the constraint. 5. **Patch upstream source** when loopback/bind restrictions block direct deployment. Do not re-architect around the constraint.
## Node Assignment Policy (as of 2026-05-25) ## Node Assignment Policy (as of 2026-05-25)
**The G9 Swarm Cluster is the ONLY deployment target.** Mark5, Bones, Neo, and Mark44 are NOT part of this homelab services stack.
| Node | Role | Services Assigned | | Node | Role | Services Assigned |
|------|------|-------------------| |------|------|-------------------|
| **Neo** | Services node | Nextcloud AIO, Vaultwarden, Portainer (UI/mgmt) | | **MK7 (mark-vii.ai.home)** | Swarm Manager | ALL Phase 1 infrastructure: Traefik, Technitium DNS, AdGuard Home, Portainer, Prometheus, Beszel, Dozzle, Authelia, Homepage |
| **Bones** | Infrastructure node | Paperclip + Ollama + PostgreSQL, Technitium DNS (infra DNS) | | **MK33, MK34, MK39, MK42** | Swarm Workers | Phase 2 media stack (Jellyfin, Sonarr, Radarr, Prowlarr), distributed workloads, Vaultwarden, Nextcloud |
| **Mark44 (Hulkbuster)** | Heavy-lifting / GPU | Monitoring stack (Prometheus, Grafana, Beszel), media apps with transcode (Jellyfin) | | **Artemis** | AI Foreman / JARVIS | Hermes Agent, Ansible-pull control plane — NOT a service host |
| **Mark5 (Suitcase)** | Research / light-task | Traefik (edge router — lightweight, always-on), Homepage (lightweight dashboard) |
| **Artemis** | AI Foreman / JARVIS | Hermes Agent, Ansible-pull control plane |
## Soft Constraints (Bobby Approval Required to Override) ## Soft Constraints (Bobby Approval Required to Override)
- **Data residency:** All persistent volumes live on-node. No NFS, no Ceph, no distributed storage unless explicitly approved. - **Data residency:** All persistent volumes live on-node. No NFS, no Ceph, no distributed storage unless explicitly approved.
- **Secret management:** No plain-text secrets in compose files. Use `.env` files with `file:` mode 0600, or Vaultwarden if a secret store is needed. - **Secret management:** No plain-text secrets in compose files. Use `.env` files with `file:` mode 0600, or Vaultwarden if a secret store is needed.
- **Backup cadence:** Every service with persistent state must have a documented backup target. Default: daily rsync to Bones secondary storage. - **Backup cadence:** Every service with persistent state must have a documented backup target. Default: daily rsync to MK7 secondary storage.
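The secret-management constraint above (mode 0600 on `.env` files) can be checked mechanically before a deploy. The following is a minimal sketch, not part of the PRD: the helper names `env_file_is_private` and `lock_down` are hypothetical, and the demo runs against a throwaway file rather than a real secret.

```python
import os
import stat
import tempfile


def env_file_is_private(path: str) -> bool:
    """Return True when the file's permission bits are exactly 0600."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == 0o600


def lock_down(path: str) -> None:
    """Restrict the file to owner read/write only (0600)."""
    os.chmod(path, 0o600)


if __name__ == "__main__":
    # Hypothetical demo: create a placeholder .env in a temp directory,
    # give it a too-permissive mode, then tighten it.
    with tempfile.TemporaryDirectory() as tmp:
        env_path = os.path.join(tmp, ".env")
        with open(env_path, "w") as fh:
            fh.write("EXAMPLE_KEY=placeholder\n")
        os.chmod(env_path, 0o644)
        print("before:", env_file_is_private(env_path))
        lock_down(env_path)
        print("after:", env_file_is_private(env_path))
```

A check like this could run as a pre-deploy step so that a world-readable `.env` blocks the rollout instead of being discovered later.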
## Environment Assumptions ## Environment Assumptions
- All nodes run Debian Trixie or compatible. - All nodes run Debian Trixie or compatible.


@@ -5,42 +5,42 @@
### Network Layer ### Network Layer
| Service | Image | Namespace | Description | Pulls | Stars | Updated | Target Node | | Service | Image | Namespace | Description | Pulls | Stars | Updated | Target Node |
|---------|-------|-----------|-------------|-------|-------|---------|-------------| |---------|-------|-----------|-------------|-------|-------|---------|-------------|
| **Traefik** | `traefik` | `library` | Cloud Native Edge Router | 3.49B | 3,634 | 2026-05-13 | Mark5 | | **Traefik** | `traefik` | `library` | Cloud Native Edge Router | 3.49B | 3,634 | 2026-05-13 | MK7 |
| **Technitium DNS** | `technitium/dns-server` | `technitium` | Self-hosted DNS server with DoH/DoT | 8.99M | 156 | 2026-05-09 | Bones | | **Technitium DNS** | `technitium/dns-server` | `technitium` | Self-hosted DNS server with DoH/DoT | 8.99M | 156 | 2026-05-09 | MK7 |
| **AdGuard Home** | `adguard/adguardhome` | `adguard` | Network-wide ad blocking DNS server | 170.7M | 1,408 | 2026-05-25 | Bones | | **AdGuard Home** | `adguard/adguardhome` | `adguard` | Network-wide ad blocking DNS server | 170.7M | 1,408 | 2026-05-25 | MK7 |
### Monitoring / Observability ### Monitoring / Observability
| Service | Image | Namespace | Description | Pulls | Stars | Updated | Target Node | | Service | Image | Namespace | Description | Pulls | Stars | Updated | Target Node |
|---------|-------|-----------|-------------|-------|-------|---------|-------------| |---------|-------|-----------|-------------|-------|-------|---------|-------------|
| **Prometheus** | `prom/prometheus` | `prom` | Systems monitoring & alerting toolkit | 1.97B | 2,064 | 2026-05-25 | Mark44 | | **Prometheus** | `prom/prometheus` | `prom` | Systems monitoring & alerting toolkit | 1.97B | 2,064 | 2026-05-25 | MK7 |
| **Grafana** | `grafana/grafana` | `grafana` | Analytics & monitoring dashboards | 5.22B | 3,540 | 2026-05-16 | Mark44 | | **Grafana** | `grafana/grafana` | `grafana` | Analytics & monitoring dashboards | 5.22B | 3,540 | 2026-05-16 | MK7 |
| **Beszel** | `henrygd/beszel` | `henrygd` | Lightweight server monitoring hub with Docker stats | 12.58M | 32 | 2026-04-30 | Mark44 | | **Beszel** | `henrygd/beszel` | `henrygd` | Lightweight server monitoring hub with Docker stats | 12.58M | 32 | 2026-04-30 | MK7 |
| **Dozzle** | `amir20/dozzle` | `amir20` | Real-time Docker container log viewer | 309.6M | 144 | 2026-05-25 | Mark44 | | **Dozzle** | `amir20/dozzle` | `amir20` | Real-time Docker container log viewer | 309.6M | 144 | 2026-05-25 | MK7 |
### Management / Dashboard ### Management / Dashboard
| Service | Image | Namespace | Description | Pulls | Stars | Updated | Target Node | | Service | Image | Namespace | Description | Pulls | Stars | Updated | Target Node |
|---------|-------|-----------|-------------|-------|-------|---------|-------------| |---------|-------|-----------|-------------|-------|-------|---------|-------------|
| **Portainer CE** | `portainer/portainer-ce` | `portainer` | Lightweight container management UI | 1.46B | 2,665 | 2026-05-20 | Neo | | **Portainer CE** | `portainer/portainer-ce` | `portainer` | Lightweight container management UI | 1.46B | 2,665 | 2026-05-20 | MK7 (Phase 2 swarm) |
| **Homepage** | `gethomepage/homepage` | `gethomepage` | Customizable homepage with integrations | 1.31M | 40 | 2026-05-25 | Mark5 | | **Homepage** | `gethomepage/homepage` | `gethomepage` | Customizable homepage with integrations | 1.31M | 40 | 2026-05-25 | MK7 |
### Security / Identity ### Security / Identity
| Service | Image | Namespace | Description | Pulls | Stars | Updated | Target Node | | Service | Image | Namespace | Description | Pulls | Stars | Updated | Target Node |
|---------|-------|-----------|-------------|-------|-------|---------|-------------| |---------|-------|-----------|-------------|-------|-------|---------|-------------|
| **Vaultwarden** | `vaultwarden/server` | `vaultwarden` | Bitwarden-compatible password manager (Rust) | 287.2M | 1,454 | 2026-05-17 | Neo | | **Vaultwarden** | `vaultwarden/server` | `vaultwarden` | Bitwarden-compatible password manager (Rust) | 287.2M | 1,454 | 2026-05-17 | MK7 (Phase 2 swarm) |
| **Authelia** | `authelia/authelia` | `authelia` | Multi-factor authentication portal | 75.2M | 208 | 2026-05-25 | Mark5 | | **Authelia** | `authelia/authelia` | `authelia` | Multi-factor authentication portal | 75.2M | 208 | 2026-05-25 | MK7 |
### Media Stack (*arr + Jellyfin) ### Media Stack (*arr + Jellyfin)
| Service | Image | Namespace | Description | Pulls | Stars | Updated | Target Node | | Service | Image | Namespace | Description | Pulls | Stars | Updated | Target Node |
|---------|-------|-----------|-------------|-------|-------|---------|-------------| |---------|-------|-----------|-------------|-------|-------|---------|-------------|
| **Jellyfin** | `jellyfin/jellyfin` | `jellyfin` | Free software media browser | 370.4M | 1,535 | 2026-05-25 | Mark44 | | **Jellyfin** | `jellyfin/jellyfin` | `jellyfin` | Free software media browser | 370.4M | 1,535 | 2026-05-25 | MK7 |
| **Sonarr** | `linuxserver/sonarr` | `linuxserver` | TV series management | 2.34B | 2,118 | 2026-05-23 | Mark44 | | **Sonarr** | `linuxserver/sonarr` | `linuxserver` | TV series management | 2.34B | 2,118 | 2026-05-23 | MK7 |
| **Radarr** | `linuxserver/radarr` | `linuxserver` | Movie management | 2.36B | 1,791 | 2026-05-25 | Mark44 | | **Radarr** | `linuxserver/radarr` | `linuxserver` | Movie management | 2.36B | 1,791 | 2026-05-25 | MK7 |
| **Prowlarr** | `linuxserver/prowlarr` | `linuxserver` | Indexer management | 35.9M | 403 | 2026-05-25 | Mark44 | | **Prowlarr** | `linuxserver/prowlarr` | `linuxserver` | Indexer management | 35.9M | 403 | 2026-05-25 | MK7 |
### File / Collaboration ### File / Collaboration
| Service | Image | Namespace | Description | Pulls | Stars | Updated | Target Node | | Service | Image | Namespace | Description | Pulls | Stars | Updated | Target Node |
|---------|-------|-----------|-------------|-------|-------|---------|-------------| |---------|-------|-----------|-------------|-------|-------|---------|-------------|
| **Nextcloud** | `nextcloud` | `library` | Self-hosted file sync & collaboration | 1.01B | 4,485 | 2026-05-23 | Neo | | **Nextcloud** | `nextcloud` | `library` | Self-hosted file sync & collaboration | 1.01B | 4,485 | 2026-05-23 | MK7 (Phase 2 swarm) |
## Total Services: 15 ## Total Services: 15
## Total DockerHub Pulls (aggregate): ~16.0B ## Total DockerHub Pulls (aggregate): ~16.0B


@@ -2,28 +2,28 @@
## Ingress Flow ## Ingress Flow
``` ```
[Internet] → [Tailscale mesh] → [Mark5: Traefik] → [Target Node: Service Port] [Internet] → [Tailscale mesh] → [MK7: Traefik] → [Target Node: Service Port]
``` ```
## Traefik Role ## Traefik Role
- **Single entrypoint.** Every HTTP/HTTPS service routes through Traefik on Mark5. - **Single entrypoint.** Every HTTP/HTTPS service routes through Traefik on MK7.
- **Tailscale-native.** Traefik binds to `0.0.0.0:80` and `0.0.0.0:443`. No `tailscale serve`. - **Tailscale-native.** Traefik binds to `0.0.0.0:80` and `0.0.0.0:443`. No `tailscale serve`.
- **Service discovery via Docker labels.** Each compose service exposes labels that Traefik reads from the Docker socket on Mark5. - **Service discovery via Docker labels.** Each compose service exposes labels that Traefik reads from the Docker socket on MK7.
- **Docker socket access restricted.** Traefik mounts a read-only Docker socket. No other service gets socket access. - **Docker socket access restricted.** Traefik mounts a read-only Docker socket. No other service gets socket access.
## Internal Traffic Patterns ## Internal Traffic Patterns
| Source | Destination | Protocol | Port | Notes | | Source | Destination | Protocol | Port | Notes |
|--------|-------------|----------|------|-------| |--------|-------------|----------|------|-------|
| Traefik (Mark5) | Any service | HTTP/HTTPS | Varies | Proxied via Tailscale IP | | Traefik (MK7) | Any service | HTTP/HTTPS | Varies | Proxied via Tailscale IP |
| Beszel (Mark44) | Any node | HTTP | Varies | Agent polls HTTP metrics endpoints (read-only) | | Beszel (MK7) | Any node | HTTP | Varies | Agent polls HTTP metrics endpoints (read-only) |
| Prometheus (Mark44) | Any node | HTTP | 9100 (node-exporter) | Scrapes node and container metrics | | Prometheus (MK7) | Any node | HTTP | 9100 (node-exporter) | Scrapes node and container metrics |
| Prowlarr (Mark44) | Indexer sites | HTTPS | 443 | Outbound only | | Prowlarr (MK7) | Indexer sites | HTTPS | 443 | Outbound only |
| Sonarr/Radarr (Mark44) | Prowlarr | HTTP | 9696 | Internal indexer lookup | | Sonarr/Radarr (MK7) | Prowlarr | HTTP | 9696 | Internal indexer lookup |
| Nextcloud (Neo) | PostgreSQL (Bones) | TCP | 5432 | DB traffic over Tailscale | | Nextcloud (MK7) | PostgreSQL (MK7) | TCP | 5432 | DB traffic over Tailscale |
## DNS Resolution ## DNS Resolution
- **Technitium (Bones)** is the authoritative internal DNS for `*.ai.home`. - **Technitium (MK7)** is the authoritative internal DNS for `*.ai.home`.
- **AdGuard Home (Bones)** handles recursive resolution with ad-block lists. Replaces Pi-hole. - **AdGuard Home (MK7)** handles recursive resolution with ad-block lists. Replaces Pi-hole.
- **Chain:** Client → Technitium (local record?) → AdGuard Home (recursive + blocklist) → Upstream (Cloudflare/Quad9) - **Chain:** Client → Technitium (local record?) → AdGuard Home (recursive + blocklist) → Upstream (Cloudflare/Quad9)
- **Tailscale MagicDNS** remains enabled as fallback. If Technitium fails, clients fall back to `100.x.x.x` direct resolution. - **Tailscale MagicDNS** remains enabled as fallback. If Technitium fails, clients fall back to `100.x.x.x` direct resolution.
- **AdGuard Home admin UI** runs on port 3000 by default, which is also Grafana's default port; remap one of them if the two are co-located. - **AdGuard Home admin UI** runs on port 3000 by default, which is also Grafana's default port; remap one of them if the two are co-located.


@@ -15,29 +15,29 @@ Every service with persistent state uses **bind mounts to on-node directories**.
## Per-Service Persistence ## Per-Service Persistence
| Service | Data Path | Backup Target | Size Estimate | | Service | Data Path | Backup Target | Size Estimate |
|---------|-----------|---------------|---------------| |---------|-----------|---------------|---------------|
| **Traefik** | `/opt/iron-legion/traefik/config/` `/opt/iron-legion/traefik/certs/` | Bones (daily rsync) | < 50 MB | | **Traefik** | `/opt/iron-legion/traefik/config/` `/opt/iron-legion/traefik/certs/` | MK7 (daily rsync) | < 50 MB |
| **Technitium DNS** | `/opt/iron-legion/technitium/config/` | Bones | < 10 MB | | **Technitium DNS** | `/opt/iron-legion/technitium/config/` | MK7 | < 10 MB |
| **Pi-hole** | `/opt/iron-legion/pihole/etc-pihole/` `/opt/iron-legion/pihole/etc-dnsmasq.d/` | Bones | < 500 MB | | **Pi-hole** | `/opt/iron-legion/pihole/etc-pihole/` `/opt/iron-legion/pihole/etc-dnsmasq.d/` | MK7 | < 500 MB |
| **Prometheus** | `/opt/iron-legion/prometheus/data/` | Bones (retention: 15d local, 90d backup) | 5–20 GB | | **Prometheus** | `/opt/iron-legion/prometheus/data/` | MK7 (retention: 15d local, 90d backup) | 5–20 GB |
| **Grafana** | `/opt/iron-legion/grafana/data/` | Bones | < 500 MB | | **Grafana** | `/opt/iron-legion/grafana/data/` | MK7 | < 500 MB |
| **Beszel** | `/opt/iron-legion/beszel/data/` | Bones | < 1 GB | | **Beszel** | `/opt/iron-legion/beszel/data/` | MK7 | < 1 GB |
| **Portainer** | `/opt/iron-legion/portainer/data/` | Bones | < 100 MB | | **Portainer** | `/opt/iron-legion/portainer/data/` | MK7 | < 100 MB |
| **Homepage** | `/opt/iron-legion/homepage/config/` | Bones | < 10 MB | | **Homepage** | `/opt/iron-legion/homepage/config/` | MK7 | < 10 MB |
| **Vaultwarden** | `/opt/iron-legion/vaultwarden/data/` | Bones (encrypted) | < 500 MB | | **Vaultwarden** | `/opt/iron-legion/vaultwarden/data/` | MK7 (encrypted) | < 500 MB |
| **Authelia** | `/opt/iron-legion/authelia/config/` | Bones | < 10 MB | | **Authelia** | `/opt/iron-legion/authelia/config/` | MK7 | < 10 MB |
| **Jellyfin** | `/opt/iron-legion/jellyfin/config/` `/opt/iron-legion/jellyfin/media/` | **None** (media too large) | < 1 GB config; media drive separate | | **Jellyfin** | `/opt/iron-legion/jellyfin/config/` `/opt/iron-legion/jellyfin/media/` | **None** (media too large) | < 1 GB config; media drive separate |
| **Sonarr** | `/opt/iron-legion/sonarr/config/` | Bones | < 1 GB | | **Sonarr** | `/opt/iron-legion/sonarr/config/` | MK7 | < 1 GB |
| **Radarr** | `/opt/iron-legion/radarr/config/` | Bones | < 1 GB | | **Radarr** | `/opt/iron-legion/radarr/config/` | MK7 | < 1 GB |
| **Prowlarr** | `/opt/iron-legion/prowlarr/config/` | Bones | < 100 MB | | **Prowlarr** | `/opt/iron-legion/prowlarr/config/` | MK7 | < 100 MB |
| **Nextcloud** | `/opt/iron-legion/nextcloud/data/` | Bones (snapshots) | 10–50 GB | | **Nextcloud** | `/opt/iron-legion/nextcloud/data/` | MK7 (snapshots) | 10–50 GB |
## Media Storage Exception ## Media Storage Exception
- **Jellyfin media** lives on a separate mount (likely external USB/NVMe on Mark44). Not backed up via rsync. - **Jellyfin media** lives on a separate mount (likely external USB/NVMe on MK7). Not backed up via rsync.
- **Sonarr/Radarr** download staging to a shared `/downloads` bind mount, then hardlink/copy to Jellyfin media library. - **Sonarr/Radarr** download staging to a shared `/downloads` bind mount, then hardlink/copy to Jellyfin media library.
## Backup Tooling ## Backup Tooling
- **Primary:** `rsync -a --delete` to Bones secondary storage daily at 03:00 local. - **Primary:** `rsync -a --delete` to MK7 secondary storage daily at 03:00 local.
- **Vaultwarden:** `sqlite3` dump + `rsync` (encrypted at rest on Bones). - **Vaultwarden:** `sqlite3` dump + `rsync` (encrypted at rest on MK7).
- **Prometheus:** `snapshot API` → rsync (not raw WAL files). - **Prometheus:** `snapshot API` → rsync (not raw WAL files).
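The primary backup rule above can be sketched as a small command builder. This is an illustration under assumptions, not the project's actual tooling: the function name `rsync_argv` is hypothetical, the `/backups/<service-name>/` layout is the recommended default from the open-questions table, and the backup host is passed in because the commit moves it from Bones to MK7.

```python
import shlex

BACKUP_ROOT = "/backups"  # assumed layout: /backups/<service-name>/


def rsync_argv(service: str, data_path: str, backup_host: str) -> list[str]:
    """Build the `rsync -a --delete` argument list for one service.

    A trailing slash on the source makes rsync copy the directory's
    contents rather than the directory itself.
    """
    src = data_path.rstrip("/") + "/"
    dest = f"{backup_host}:{BACKUP_ROOT}/{service}/"
    return ["rsync", "-a", "--delete", src, dest]


if __name__ == "__main__":
    # Hypothetical usage: print the command instead of executing it.
    argv = rsync_argv(
        "grafana", "/opt/iron-legion/grafana/data/", "mark-vii.ai.home"
    )
    print(shlex.join(argv))
```

Printing the command first makes it easy to review what `--delete` would mirror before a scheduler runs it at 03:00; executing it would be a matter of handing the list to `subprocess.run`.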
## Secret Management ## Secret Management


@@ -10,9 +10,9 @@
| **OS Auth** | SSH keys | Node access | Tailscale SSH + local keypairs | | **OS Auth** | SSH keys | Node access | Tailscale SSH + local keypairs |
## Authelia Deployment Notes ## Authelia Deployment Notes
- **Target node:** Mark5 (lightweight, sits beside Traefik) - **Target node:** MK7 (lightweight, sits beside Traefik)
- **Redirection URL:** Set Authelia `redirection_url` to the base domain of services needing auth. - **Redirection URL:** Set Authelia `redirection_url` to the base domain of services needing auth.
- **Backend storage:** Uses SQLite initially. If Bobby wants HA, migrate to PostgreSQL on Bones. - **Backend storage:** Uses SQLite initially. If Bobby wants HA, migrate to PostgreSQL on MK7.
- **Notification method:** File-based (writes to `/opt/iron-legion/authelia/notifications/`) until SMTP/Discord is configured. - **Notification method:** File-based (writes to `/opt/iron-legion/authelia/notifications/`) until SMTP/Discord is configured.
- **Rule granularity:** Per-service `access_control` rules in `configuration.yml`. Default: `one_factor` for internal services, `two_factor` for management interfaces (Portainer, Grafana admin). - **Rule granularity:** Per-service `access_control` rules in `configuration.yml`. Default: `one_factor` for internal services, `two_factor` for management interfaces (Portainer, Grafana admin).
@@ -38,8 +38,8 @@ traefik.http.middlewares.authelia.forwardauth.address: http://authelia:9091/api/
- **No VLANs.** Tailscale ACLs handle segment isolation. - **No VLANs.** Tailscale ACLs handle segment isolation.
- **ACL policy (draft):** - **ACL policy (draft):**
- `tag:admin` nodes (Bobby, Artemis) → all ports on all nodes - `tag:admin` nodes (Bobby, Artemis) → all ports on all nodes
- `tag:services` (Neo, Bones, Mark44, Mark5) → only their assigned service ports, no cross-node SSH except via Tailscale SSH - `tag:services` (MK7) → only their assigned service ports, no cross-node SSH except via Tailscale SSH
- `tag:user` (Bobby's phone, laptop) → HTTPS 443 on Mark5 only, Jellyfin 8096 on Mark44 directly - `tag:user` (Bobby's phone, laptop) → HTTPS 443 on MK7 only, Jellyfin 8096 on MK7 directly
- **Default deny.** Any traffic not explicitly allowed in Tailscale ACL is dropped. - **Default deny.** Any traffic not explicitly allowed in Tailscale ACL is dropped.
## Monitoring for Security Events ## Monitoring for Security Events


@@ -5,14 +5,14 @@
| Order | Service | Target Node | Why First | Dependencies | | Order | Service | Target Node | Why First | Dependencies |
|-------|---------|-------------|-----------|--------------| |-------|---------|-------------|-----------|--------------|
| 1 | **Technitium DNS** | Bones | Name resolution for internal services | None | | 1 | **Technitium DNS** | MK7 | Name resolution for internal services | None |
| 2 | **Pi-hole** | Bones | Recursive DNS + ad-block | Technitium (via conditional forwarding) | | 2 | **Pi-hole** | MK7 | Recursive DNS + ad-block | Technitium (via conditional forwarding) |
| 3 | **Traefik** | Mark5 | Edge router for all HTTP ingress | DNS (needs `*.labs.internal` to resolve) | | 3 | **Traefik** | MK7 | Edge router for all HTTP ingress | DNS (needs `*.labs.internal` to resolve) |
| 4 | **Authelia** | Mark5 | Auth layer before exposing any mgmt UI | Traefik (depends on ForwardAuth middleware) | | 4 | **Authelia** | MK7 | Auth layer before exposing any mgmt UI | Traefik (depends on ForwardAuth middleware) |
| 5 | **Portainer** | Neo | Container management UI | Traefik + Authelia (for secured access) | | 5 | **Portainer** | MK7 | Container management UI | Traefik + Authelia (for secured access) |
| 6 | **Prometheus** | Mark44 | Metrics collection baseline | None (scrape targets added in Phase 2) | | 6 | **Prometheus** | MK7 | Metrics collection baseline | None (scrape targets added in Phase 2) |
| 7 | **Beszel** | Mark44 | Fleet resource overview | None (agents installed per-node) | | 7 | **Beszel** | MK7 | Fleet resource overview | None (agents installed per-node) |
| 8 | **Dozzle** | Mark44 | Real-time log viewing | None | | 8 | **Dozzle** | MK7 | Real-time log viewing | None |
**Phase 1 milestone:** All nodes report healthy in Beszel. Portainer accessible via auth portal. DNS resolves. **Phase 1 milestone:** All nodes report healthy in Beszel. Portainer accessible via auth portal. DNS resolves.
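The Phase 1 ordering follows from the Dependencies column, so it can be derived rather than maintained by hand. The sketch below encodes that column as a graph and sorts it with the standard-library `graphlib`. It assumes that Traefik's "DNS" dependency means Technitium DNS; the names `PHASE1_DEPS` and `deployment_order` are hypothetical.

```python
from graphlib import TopologicalSorter

# Each service maps to the set of services that must be deployed first.
# Assumption: Traefik's "DNS" dependency refers to Technitium DNS.
PHASE1_DEPS = {
    "Technitium DNS": set(),
    "Pi-hole": {"Technitium DNS"},
    "Traefik": {"Technitium DNS"},
    "Authelia": {"Traefik"},
    "Portainer": {"Traefik", "Authelia"},
    "Prometheus": set(),
    "Beszel": set(),
    "Dozzle": set(),
}


def deployment_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one valid deployment order; raises CycleError on a cycle."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step, service in enumerate(deployment_order(PHASE1_DEPS), start=1):
        print(step, service)
```

Any order the sorter returns satisfies the table's dependencies; services with no dependencies (Prometheus, Beszel, Dozzle) may appear earlier than their row numbers suggest, which is consistent with the table since nothing blocks them.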
@@ -23,12 +23,12 @@
| Order | Service | Target Node | Why Now | Dependencies | | Order | Service | Target Node | Why Now | Dependencies |
|-------|---------|-------------|---------|--------------| |-------|---------|-------------|---------|--------------|
| 9 | **Jellyfin** | Mark44 | Media playback (GPU transcode if Mark44 has dGPU) | None (file ingest later) | | 9 | **Jellyfin** | MK7 | Media playback (GPU transcode if MK7 has dGPU) | None (file ingest later) |
| 10 | **Sonarr** | Mark44 | TV management | Jellyfin (pushes organized files) | | 10 | **Sonarr** | MK7 | TV management | Jellyfin (pushes organized files) |
| 11 | **Radarr** | Mark44 | Movie management | Jellyfin (pushes organized files) | | 11 | **Radarr** | MK7 | Movie management | Jellyfin (pushes organized files) |
| 12 | **Prowlarr** | Mark44 | Indexer aggregation | Sonarr + Radarr (feeds them) | | 12 | **Prowlarr** | MK7 | Indexer aggregation | Sonarr + Radarr (feeds them) |
| 13 | **Nextcloud** | Neo | File sync/collaboration | PostgreSQL (on Bones) | | 13 | **Nextcloud** | MK7 | File sync/collaboration | PostgreSQL (on MK7) |
| 14 | **Vaultwarden** | Neo | Password management | None (standalone) | | 14 | **Vaultwarden** | MK7 | Password management | None (standalone) |
**Phase 2 milestone:** Media acquisition pipeline works end-to-end. Nextcloud syncs. Vaultwarden stores secrets. **Phase 2 milestone:** Media acquisition pipeline works end-to-end. Nextcloud syncs. Vaultwarden stores secrets.
@@ -39,8 +39,8 @@
| Order | Service | Target Node | Why Deferred | Dependencies | | Order | Service | Target Node | Why Deferred | Dependencies |
|-------|---------|-------------|--------------|--------------| |-------|---------|-------------|--------------|--------------|
| 15 | **Grafana** | Mark44 | Dashboards need metrics to be interesting | Prometheus (needs data history) | | 15 | **Grafana** | MK7 | Dashboards need metrics to be interesting | Prometheus (needs data history) |
| 16 | **Homepage** | Mark5 | Custom dashboard for everything | All Phase 1+2 services (needs endpoints) | | 16 | **Homepage** | MK7 | Custom dashboard for everything | All Phase 1+2 services (needs endpoints) |
| | **Promtail + Loki** | TBD | Centralized logging | Only if Dozzle is insufficient | | | **Promtail + Loki** | TBD | Centralized logging | Only if Dozzle is insufficient |
| | **Uptime-Kuma** | TBD | External uptime monitoring | Only if Beszel alerting is insufficient | | | **Uptime-Kuma** | TBD | External uptime monitoring | Only if Beszel alerting is insufficient |


@@ -5,11 +5,11 @@
|---|----------|--------|----------------------| |---|----------|--------|----------------------|
| 1 | **Domain name** — Does Bobby own a domain (e.g., `bobbysh.me`) or do we use a fake TLD (`labs.internal`)? | **Critical** — TLS certs, Authelia, and DNS all depend on this. | Use `labs.internal` + self-signed CA | | 1 | **Domain name** — Does Bobby own a domain (e.g., `bobbysh.me`) or do we use a fake TLD (`labs.internal`)? | **Critical** — TLS certs, Authelia, and DNS all depend on this. | Use `labs.internal` + self-signed CA |
| 2 | **Technitium upstream** — DoH, DoT, or plain UDP to upstream resolver (e.g., Cloudflare 1.1.1.1)? | Low — can default to DoH | DoH → `https://cloudflare-dns.com/dns-query` | | 2 | **Technitium upstream** — DoH, DoT, or plain UDP to upstream resolver (e.g., Cloudflare 1.1.1.1)? | Low — can default to DoH | DoH → `https://cloudflare-dns.com/dns-query` |
| 3 | **Pi-hole vs Technitium conflict** — Both run on Bones port 53. Run Pi-hole on non-standard port with Technitium as conditional forwarder? Or separate nodes? | **Critical** — port 53 collision | Technitium on 53, Pi-hole on 5053, forward to Pi-hole from Technitium | | 3 | **Pi-hole vs Technitium conflict** — Both run on MK7 port 53. Run Pi-hole on non-standard port with Technitium as conditional forwarder? Or separate nodes? | **Critical** — port 53 collision | Technitium on 53, Pi-hole on 5053, forward to Pi-hole from Technitium |
| 4 | **Jellyfin media storage** — External USB on Mark44? SMB share? NVMe? | Medium | External USB mounted at `/media` on Mark44 | | 4 | **Jellyfin media storage** — External USB on MK7? SMB share? NVMe? | Medium | External USB mounted at `/media` on MK7 |
| 5 | **Backup target on Bones** — Capacity? Dedicated drive? Rsync target path? | Medium | `/backups/<service-name>/` on Bones secondary storage | | 5 | **Backup target on MK7** — Capacity? Dedicated drive? Rsync target path? | Medium | `/backups/<service-name>/` on MK7 secondary storage |
| 6 | **Nextcloud database** — Use existing PostgreSQL on Bones, or deploy Nextcloud AIO (bundled)? | Medium — affects resource allocation on Bones | Deploy a standalone PostgreSQL container on Bones for Nextcloud; AIO is too heavy | | 6 | **Nextcloud database** — Use existing PostgreSQL on MK7, or deploy Nextcloud AIO (bundled)? | Medium — affects resource allocation on MK7 | Deploy a standalone PostgreSQL container on MK7 for Nextcloud; AIO is too heavy |
| 7 | **GPU on Mark44** — NVIDIA driver runtime for Jellyfin transcode? | Low — falls back to CPU transcode | Use `jellyfin/jellyfin` with `NVIDIA_VISIBLE_DEVICES` env if available | | 7 | **GPU on MK7** — NVIDIA driver runtime for Jellyfin transcode? | Low — falls back to CPU transcode | Use `jellyfin/jellyfin` with `NVIDIA_VISIBLE_DEVICES` env if available |
| 8 | **Notification routing** — Discord webhook? SMTP? File only? | Low — default file works | File notifications in `/opt/iron-legion/authelia/notifications/` | | 8 | **Notification routing** — Discord webhook? SMTP? File only? | Low — default file works | File notifications in `/opt/iron-legion/authelia/notifications/` |
| 9 | **Tailscale ACL policy** — Draft exists in Section 7. Bobby must review and apply in Tailscale admin console. | Low | Stay permissive until Bobby approves | | 9 | **Tailscale ACL policy** — Draft exists in Section 7. Bobby must review and apply in Tailscale admin console. | Low | Stay permissive until Bobby approves |
| 10 | **Beszel alert thresholds** — CPU %, memory %, disk % triggers not defined. | Low | Defaults in Beszel container | | 10 | **Beszel alert thresholds** — CPU %, memory %, disk % triggers not defined. | Low | Defaults in Beszel container |
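Question 3 in the table above is a port collision: two services binding the same port on the same node. With every service now assigned to a single node, a check like the following could catch such collisions before deployment. It is a minimal sketch; the function name `port_collisions` and the tuple layout are hypothetical, and the port numbers come from the table's recommended default.

```python
from collections import defaultdict


def port_collisions(bindings):
    """Group (service, node, port) bindings and return the contested slots.

    Returns a dict mapping (node, port) to the list of services that
    claim it, including only slots claimed by more than one service.
    """
    by_slot = defaultdict(list)
    for service, node, port in bindings:
        by_slot[(node, port)].append(service)
    return {slot: names for slot, names in by_slot.items() if len(names) > 1}


if __name__ == "__main__":
    # Both DNS services on port 53 of the same node: a collision.
    naive = [("Technitium DNS", "MK7", 53), ("Pi-hole", "MK7", 53)]
    print(port_collisions(naive))

    # Recommended default: Technitium on 53, Pi-hole on 5053.
    proposed = [("Technitium DNS", "MK7", 53), ("Pi-hole", "MK7", 5053)]
    print(port_collisions(proposed))
```

The same check applies to any other pair of co-located services that share a default port.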


@@ -1,3 +1,6 @@
---
# Iron Legion Homelab Services Stack — Purpose & Scope # Iron Legion Homelab Services Stack — Purpose & Scope
## Document ID ## Document ID
@@ -69,18 +72,18 @@ This PRD is append-only for new services. Modifications to existing entries requ
5. **Patch upstream source** when loopback/bind restrictions block direct deployment. Do not re-architect around the constraint. 5. **Patch upstream source** when loopback/bind restrictions block direct deployment. Do not re-architect around the constraint.
## Node Assignment Policy (as of 2026-05-25) ## Node Assignment Policy (as of 2026-05-25)
**The G9 Swarm Cluster is the ONLY deployment target.** Mark5, Bones, Neo, and Mark44 are NOT part of this homelab services stack.
| Node | Role | Services Assigned | | Node | Role | Services Assigned |
|------|------|-------------------| |------|------|-------------------|
| **Neo** | Services node | Nextcloud AIO, Vaultwarden, Portainer (UI/mgmt) | | **MK7 (mark-vii.ai.home)** | Swarm Manager | ALL Phase 1 infrastructure: Traefik, Technitium DNS, AdGuard Home, Portainer, Prometheus, Beszel, Dozzle, Authelia, Homepage |
| **Bones** | Infrastructure node | Paperclip + Ollama + PostgreSQL, Technitium DNS (infra DNS) | | **MK33, MK34, MK39, MK42** | Swarm Workers | Phase 2 media stack (Jellyfin, Sonarr, Radarr, Prowlarr), distributed workloads, Vaultwarden, Nextcloud |
| **Mark44 (Hulkbuster)** | Heavy-lifting / GPU | Monitoring stack (Prometheus, Grafana, Beszel), media apps with transcode (Jellyfin) | | **Artemis** | AI Foreman / JARVIS | Hermes Agent, Ansible-pull control plane — NOT a service host |
| **Mark5 (Suitcase)** | Research / light-task | Traefik (edge router — lightweight, always-on), Homepage (lightweight dashboard) |
| **Artemis** | AI Foreman / JARVIS | Hermes Agent, Ansible-pull control plane |
## Soft Constraints (Bobby Approval Required to Override) ## Soft Constraints (Bobby Approval Required to Override)
- **Data residency:** All persistent volumes live on-node. No NFS, no Ceph, no distributed storage unless explicitly approved. - **Data residency:** All persistent volumes live on-node. No NFS, no Ceph, no distributed storage unless explicitly approved.
- **Secret management:** No plain-text secrets in compose files. Use `.env` files with `file:` mode 0600, or Vaultwarden if a secret store is needed. - **Secret management:** No plain-text secrets in compose files. Use `.env` files with `file:` mode 0600, or Vaultwarden if a secret store is needed.
- **Backup cadence:** Every service with persistent state must have a documented backup target. Default: daily rsync to Bones secondary storage. - **Backup cadence:** Every service with persistent state must have a documented backup target. Default: daily rsync to MK7 secondary storage.
## Environment Assumptions ## Environment Assumptions
- All nodes run Debian Trixie or compatible. - All nodes run Debian Trixie or compatible.
@@ -97,42 +100,42 @@ This PRD is append-only for new services. Modifications to existing entries requ
### Network Layer ### Network Layer
| Service | Image | Namespace | Description | Pulls | Stars | Updated | Target Node | | Service | Image | Namespace | Description | Pulls | Stars | Updated | Target Node |
|---------|-------|-----------|-------------|-------|-------|---------|-------------| |---------|-------|-----------|-------------|-------|-------|---------|-------------|
| **Traefik** | `traefik` | `library` | Cloud Native Edge Router | 3.49B | 3,634 | 2026-05-13 | Mark5 | | **Traefik** | `traefik` | `library` | Cloud Native Edge Router | 3.49B | 3,634 | 2026-05-13 | MK7 |
| **Technitium DNS** | `technitium/dns-server` | `technitium` | Self-hosted DNS server with DoH/DoT | 8.99M | 156 | 2026-05-09 | Bones | | **Technitium DNS** | `technitium/dns-server` | `technitium` | Self-hosted DNS server with DoH/DoT | 8.99M | 156 | 2026-05-09 | MK7 |
| **Pi-hole** | `pihole/pihole` | `pihole` | Network-wide ad blocking | 961.2M | 2,943 | 2026-05-25 | Bones | | **AdGuard Home** | `adguard/adguardhome` | `adguard` | Network-wide ad blocking DNS server | 170.7M | 1,408 | 2026-05-25 | MK7 |
### Monitoring / Observability ### Monitoring / Observability
| Service | Image | Namespace | Description | Pulls | Stars | Updated | Target Node | | Service | Image | Namespace | Description | Pulls | Stars | Updated | Target Node |
|---------|-------|-----------|-------------|-------|-------|---------|-------------| |---------|-------|-----------|-------------|-------|-------|---------|-------------|
| **Prometheus** | `prom/prometheus` | `prom` | Systems monitoring & alerting toolkit | 1.97B | 2,064 | 2026-05-25 | Mark44 | | **Prometheus** | `prom/prometheus` | `prom` | Systems monitoring & alerting toolkit | 1.97B | 2,064 | 2026-05-25 | MK7 |
| **Grafana** | `grafana/grafana` | `grafana` | Analytics & monitoring dashboards | 5.22B | 3,540 | 2026-05-16 | Mark44 | | **Grafana** | `grafana/grafana` | `grafana` | Analytics & monitoring dashboards | 5.22B | 3,540 | 2026-05-16 | MK7 |
| **Beszel** | `henrygd/beszel` | `henrygd` | Lightweight server monitoring hub with Docker stats | 12.58M | 32 | 2026-04-30 | Mark44 | | **Beszel** | `henrygd/beszel` | `henrygd` | Lightweight server monitoring hub with Docker stats | 12.58M | 32 | 2026-04-30 | MK7 |
| **Dozzle** | `amir20/dozzle` | `amir20` | Real-time Docker container log viewer | 309.6M | 144 | 2026-05-25 | Mark44 | | **Dozzle** | `amir20/dozzle` | `amir20` | Real-time Docker container log viewer | 309.6M | 144 | 2026-05-25 | MK7 |
### Management / Dashboard ### Management / Dashboard
| Service | Image | Namespace | Description | Pulls | Stars | Updated | Target Node | | Service | Image | Namespace | Description | Pulls | Stars | Updated | Target Node |
|---------|-------|-----------|-------------|-------|-------|---------|-------------| |---------|-------|-----------|-------------|-------|-------|---------|-------------|
| **Portainer CE** | `portainer/portainer-ce` | `portainer` | Lightweight container management UI | 1.46B | 2,665 | 2026-05-20 | Neo | | **Portainer CE** | `portainer/portainer-ce` | `portainer` | Lightweight container management UI | 1.46B | 2,665 | 2026-05-20 | MK7 (Phase 2 swarm) |
| **Homepage** | `gethomepage/homepage` | `gethomepage` | Customizable homepage with integrations | 1.31M | 40 | 2026-05-25 | Mark5 | | **Homepage** | `gethomepage/homepage` | `gethomepage` | Customizable homepage with integrations | 1.31M | 40 | 2026-05-25 | MK7 |
### Security / Identity ### Security / Identity
| Service | Image | Namespace | Description | Pulls | Stars | Updated | Target Node | | Service | Image | Namespace | Description | Pulls | Stars | Updated | Target Node |
|---------|-------|-----------|-------------|-------|-------|---------|-------------| |---------|-------|-----------|-------------|-------|-------|---------|-------------|
| **Vaultwarden** | `vaultwarden/server` | `vaultwarden` | Bitwarden-compatible password manager (Rust) | 287.2M | 1,454 | 2026-05-17 | Neo | | **Vaultwarden** | `vaultwarden/server` | `vaultwarden` | Bitwarden-compatible password manager (Rust) | 287.2M | 1,454 | 2026-05-17 | MK7 (Phase 2 swarm) |
| **Authelia** | `authelia/authelia` | `authelia` | Multi-factor authentication portal | 75.2M | 208 | 2026-05-25 | Mark5 | | **Authelia** | `authelia/authelia` | `authelia` | Multi-factor authentication portal | 75.2M | 208 | 2026-05-25 | MK7 |
### Media Stack (*arr + Jellyfin) ### Media Stack (*arr + Jellyfin)
| Service | Image | Namespace | Description | Pulls | Stars | Updated | Target Node | | Service | Image | Namespace | Description | Pulls | Stars | Updated | Target Node |
|---------|-------|-----------|-------------|-------|-------|---------|-------------| |---------|-------|-----------|-------------|-------|-------|---------|-------------|
| **Jellyfin** | `jellyfin/jellyfin` | `jellyfin` | Free software media browser | 370.4M | 1,535 | 2026-05-25 | Mark44 | | **Jellyfin** | `jellyfin/jellyfin` | `jellyfin` | Free software media browser | 370.4M | 1,535 | 2026-05-25 | MK7 |
| **Sonarr** | `linuxserver/sonarr` | `linuxserver` | TV series management | 2.34B | 2,118 | 2026-05-23 | Mark44 | | **Sonarr** | `linuxserver/sonarr` | `linuxserver` | TV series management | 2.34B | 2,118 | 2026-05-23 | MK7 |
| **Radarr** | `linuxserver/radarr` | `linuxserver` | Movie management | 2.36B | 1,791 | 2026-05-25 | Mark44 | | **Radarr** | `linuxserver/radarr` | `linuxserver` | Movie management | 2.36B | 1,791 | 2026-05-25 | MK7 |
| **Prowlarr** | `linuxserver/prowlarr` | `linuxserver` | Indexer management | 35.9M | 403 | 2026-05-25 | Mark44 | | **Prowlarr** | `linuxserver/prowlarr` | `linuxserver` | Indexer management | 35.9M | 403 | 2026-05-25 | MK7 |
### File / Collaboration ### File / Collaboration
| Service | Image | Namespace | Description | Pulls | Stars | Updated | Target Node | | Service | Image | Namespace | Description | Pulls | Stars | Updated | Target Node |
|---------|-------|-----------|-------------|-------|-------|---------|-------------| |---------|-------|-----------|-------------|-------|-------|---------|-------------|
| **Nextcloud** | `nextcloud` | `library` | Self-hosted file sync & collaboration | 1.01B | 4,485 | 2026-05-23 | Neo | | **Nextcloud** | `nextcloud` | `library` | Self-hosted file sync & collaboration | 1.01B | 4,485 | 2026-05-23 | MK7 (Phase 2 swarm) |
## Total Services: 16 ## Total Services: 16
## Total DockerHub Pulls (aggregate): ~16.0B ## Total DockerHub Pulls (aggregate): ~16.0B
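As a rough cross-check of the two totals, the pull counts in the tables above can be summed directly. This is a minimal sketch that uses the figures from the new (right-hand) side of each row; the dictionary and helper names are illustrative only. With the values as listed, it counts 16 rows and a sum of roughly 19.1B, so the stated totals may not reflect every row.

```python
# Pull counts copied from the new (right-hand) side of the service tables.
PULLS = {
    "Traefik": "3.49B",
    "Technitium DNS": "8.99M",
    "AdGuard Home": "170.7M",
    "Prometheus": "1.97B",
    "Grafana": "5.22B",
    "Beszel": "12.58M",
    "Dozzle": "309.6M",
    "Portainer CE": "1.46B",
    "Homepage": "1.31M",
    "Vaultwarden": "287.2M",
    "Authelia": "75.2M",
    "Jellyfin": "370.4M",
    "Sonarr": "2.34B",
    "Radarr": "2.36B",
    "Prowlarr": "35.9M",
    "Nextcloud": "1.01B",
}

UNITS = {"M": 1e6, "B": 1e9}


def parse_pulls(text: str) -> float:
    """Convert a string such as '3.49B' or '8.99M' into a raw count."""
    return float(text[:-1]) * UNITS[text[-1]]


total = sum(parse_pulls(value) for value in PULLS.values())
print(f"{len(PULLS)} services, ~{total / 1e9:.1f}B pulls")
```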
@@ -149,30 +152,31 @@ This PRD is append-only for new services. Modifications to existing entries requ
## Ingress Flow ## Ingress Flow
``` ```
[Internet] → [Tailscale mesh] → [Mark5: Traefik] → [Target Node: Service Port] [Internet] → [Tailscale mesh] → [MK7: Traefik] → [Target Node: Service Port]
``` ```
## Traefik Role ## Traefik Role
- **Single entrypoint.** Every HTTP/HTTPS service routes through Traefik on Mark5. - **Single entrypoint.** Every HTTP/HTTPS service routes through Traefik on MK7.
- **Tailscale-native.** Traefik binds to `0.0.0.0:80` and `0.0.0.0:443`. No `tailscale serve`. - **Tailscale-native.** Traefik binds to `0.0.0.0:80` and `0.0.0.0:443`. No `tailscale serve`.
- **Service discovery via Docker labels.** Each compose service exposes labels that Traefik reads from the Docker socket on Mark5. - **Service discovery via Docker labels.** Each compose service exposes labels that Traefik reads from the Docker socket on MK7.
- **Docker socket access restricted.** Traefik mounts a read-only Docker socket. No other service gets socket access. - **Docker socket access restricted.** Traefik mounts a read-only Docker socket. No other service gets socket access.
## Internal Traffic Patterns ## Internal Traffic Patterns
| Source | Destination | Protocol | Port | Notes | | Source | Destination | Protocol | Port | Notes |
|--------|-------------|----------|------|-------| |--------|-------------|----------|------|-------|
| Traefik (Mark5) | Any service | HTTP/HTTPS | Varies | Proxied via Tailscale IP | | Traefik (MK7) | Any service | HTTP/HTTPS | Varies | Proxied via Tailscale IP |
| Beszel (Mark44) | Any node | HTTP | Varies | Agent polls HTTP metrics endpoints (read-only) | | Beszel (MK7) | Any node | HTTP | Varies | Agent polls HTTP metrics endpoints (read-only) |
| Prometheus (Mark44) | Any node | HTTP | 9100 (node-exporter) | Scrapes node and container metrics | | Prometheus (MK7) | Any node | HTTP | 9100 (node-exporter) | Scrapes node and container metrics |
| Prowlarr (Mark44) | Indexer sites | HTTPS | 443 | Outbound only | | Prowlarr (MK7) | Indexer sites | HTTPS | 443 | Outbound only |
| Sonarr/Radarr (Mark44) | Prowlarr | HTTP | 9696 | Internal indexer lookup | | Sonarr/Radarr (MK7) | Prowlarr | HTTP | 9696 | Internal indexer lookup |
| Nextcloud (Neo) | PostgreSQL (Bones) | TCP | 5432 | DB traffic over Tailscale | | Nextcloud (MK7) | PostgreSQL (MK7) | TCP | 5432 | DB traffic over Tailscale |
## DNS Resolution ## DNS Resolution
- **Technitium (Bones)** is the authoritative internal DNS for `*.labs.internal`. - **Technitium (MK7)** is the authoritative internal DNS for `*.ai.home`.
- **Pi-hole (Bones)** handles recursive resolution with ad-block lists. - **AdGuard Home (MK7)** handles recursive resolution with ad-block lists. Replaces Pi-hole.
- **Chain:** Client → Technitium (local record?) → Pi-hole (recursive + blocklist) → Upstream (Cloudflare/Quad9) - **Chain:** Client → Technitium (local record?) → AdGuard Home (recursive + blocklist) → Upstream (Cloudflare/Quad9)
- **Tailscale MagicDNS** remains enabled as fallback. If Technitium fails, clients fall back to `100.x.x.x` direct resolution. - **Tailscale MagicDNS** remains enabled as fallback. If Technitium fails, clients fall back to `100.x.x.x` direct resolution.
- **AdGuard Home admin UI** runs on port 3000 by default, which is also Grafana's default port. Remap one of the two if they are co-located.
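The resolution chain above can be sketched as a small lookup function. This is only a model of the intended order (local record, then blocklist, then upstream), not how Technitium or the ad-blocking resolver are actually configured. The record, the blocklist entry, and the `upstream` callable are hypothetical placeholders.

```python
from typing import Callable, Optional, Tuple

# Hypothetical data for illustration; real records live in the DNS servers.
LOCAL_RECORDS = {"traefik.ai.home": "100.64.0.7"}
BLOCKLIST = {"ads.example.com"}


def resolve(name: str, upstream: Callable[[str], str]) -> Tuple[Optional[str], str]:
    """Return (answer, stage) following the chain described above."""
    name = name.rstrip(".").lower()
    # Stage 1: the authoritative internal zone answers local names.
    if name in LOCAL_RECORDS:
        return LOCAL_RECORDS[name], "local"
    # Stage 2: the ad-blocking resolver refuses listed names.
    # Simplification: a real blocker usually answers with a null address.
    if name in BLOCKLIST:
        return None, "blocked"
    # Stage 3: everything else goes to the upstream resolver.
    return upstream(name), "upstream"


def fake_upstream(name: str) -> str:
    """Stand-in for Cloudflare/Quad9; returns a documentation-range address."""
    return "203.0.113.10"


print(resolve("traefik.ai.home", fake_upstream))
print(resolve("ads.example.com", fake_upstream))
print(resolve("example.org", fake_upstream))
```

The MagicDNS fallback is deliberately left out, since it happens on the client when the first hop is unreachable rather than inside the chain.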
## Port Allocation (Reserved) ## Port Allocation (Reserved)
| Port | Service | | Port | Service |
@@ -211,29 +215,29 @@ Every service with persistent state uses **bind mounts to on-node directories**.
## Per-Service Persistence ## Per-Service Persistence
| Service | Data Path | Backup Target | Size Estimate | | Service | Data Path | Backup Target | Size Estimate |
|---------|-----------|---------------|---------------| |---------|-----------|---------------|---------------|
| **Traefik** | `/opt/iron-legion/traefik/config/` `/opt/iron-legion/traefik/certs/` | Bones (daily rsync) | < 50 MB | | **Traefik** | `/opt/iron-legion/traefik/config/` `/opt/iron-legion/traefik/certs/` | MK7 (daily rsync) | < 50 MB |
| **Technitium DNS** | `/opt/iron-legion/technitium/config/` | Bones | < 10 MB | | **Technitium DNS** | `/opt/iron-legion/technitium/config/` | MK7 | < 10 MB |
| **Pi-hole** | `/opt/iron-legion/pihole/etc-pihole/` `/opt/iron-legion/pihole/etc-dnsmasq.d/` | Bones | < 500 MB | | **Pi-hole** | `/opt/iron-legion/pihole/etc-pihole/` `/opt/iron-legion/pihole/etc-dnsmasq.d/` | MK7 | < 500 MB |
| **Prometheus** | `/opt/iron-legion/prometheus/data/` | Bones (retention: 15d local, 90d backup) | 5–20 GB | | **Prometheus** | `/opt/iron-legion/prometheus/data/` | MK7 (retention: 15d local, 90d backup) | 5–20 GB |
| **Grafana** | `/opt/iron-legion/grafana/data/` | Bones | < 500 MB | | **Grafana** | `/opt/iron-legion/grafana/data/` | MK7 | < 500 MB |
| **Beszel** | `/opt/iron-legion/beszel/data/` | Bones | < 1 GB | | **Beszel** | `/opt/iron-legion/beszel/data/` | MK7 | < 1 GB |
| **Portainer** | `/opt/iron-legion/portainer/data/` | Bones | < 100 MB | | **Portainer** | `/opt/iron-legion/portainer/data/` | MK7 | < 100 MB |
| **Homepage** | `/opt/iron-legion/homepage/config/` | Bones | < 10 MB | | **Homepage** | `/opt/iron-legion/homepage/config/` | MK7 | < 10 MB |
| **Vaultwarden** | `/opt/iron-legion/vaultwarden/data/` | Bones (encrypted) | < 500 MB | | **Vaultwarden** | `/opt/iron-legion/vaultwarden/data/` | MK7 (encrypted) | < 500 MB |
| **Authelia** | `/opt/iron-legion/authelia/config/` | Bones | < 10 MB | | **Authelia** | `/opt/iron-legion/authelia/config/` | MK7 | < 10 MB |
| **Jellyfin** | `/opt/iron-legion/jellyfin/config/` `/opt/iron-legion/jellyfin/media/` | **None** (media too large) | < 1 GB config; media drive separate | | **Jellyfin** | `/opt/iron-legion/jellyfin/config/` `/opt/iron-legion/jellyfin/media/` | **None** (media too large) | < 1 GB config; media drive separate |
| **Sonarr** | `/opt/iron-legion/sonarr/config/` | Bones | < 1 GB | | **Sonarr** | `/opt/iron-legion/sonarr/config/` | MK7 | < 1 GB |
| **Radarr** | `/opt/iron-legion/radarr/config/` | Bones | < 1 GB | | **Radarr** | `/opt/iron-legion/radarr/config/` | MK7 | < 1 GB |
| **Prowlarr** | `/opt/iron-legion/prowlarr/config/` | Bones | < 100 MB | | **Prowlarr** | `/opt/iron-legion/prowlarr/config/` | MK7 | < 100 MB |
| **Nextcloud** | `/opt/iron-legion/nextcloud/data/` | Bones (snapshots) | 10–50 GB | | **Nextcloud** | `/opt/iron-legion/nextcloud/data/` | MK7 (snapshots) | 10–50 GB |
## Media Storage Exception ## Media Storage Exception
- **Jellyfin media** lives on a separate mount (likely external USB/NVMe on Mark44). Not backed up via rsync. - **Jellyfin media** lives on a separate mount (likely external USB/NVMe on MK7). Not backed up via rsync.
- **Sonarr/Radarr** stage downloads in a shared `/downloads` bind mount, then hardlink or copy them into the Jellyfin media library. - **Sonarr/Radarr** stage downloads in a shared `/downloads` bind mount, then hardlink or copy them into the Jellyfin media library.
## Backup Tooling ## Backup Tooling
- **Primary:** `rsync -a --delete` to Bones secondary storage daily at 03:00 local. - **Primary:** `rsync -a --delete` to MK7 secondary storage daily at 03:00 local.
- **Vaultwarden:** `sqlite3` dump + `rsync` (encrypted at rest on Bones). - **Vaultwarden:** `sqlite3` dump + `rsync` (encrypted at rest on MK7).
- **Prometheus:** `snapshot API` → rsync (not raw WAL files). - **Prometheus:** `snapshot API` → rsync (not raw WAL files).
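A minimal sketch of how the daily rsync job could be assembled per service. It only builds and prints the commands and does not run them. It assumes the `/opt/iron-legion/<service>/` layout from the persistence table and a `/backups/<service-name>/` target; the host name `backup-host` and the service list are hypothetical placeholders.

```python
import shlex
from typing import List

DATA_ROOT = "/opt/iron-legion"   # on-node bind-mount root from the table
BACKUP_ROOT = "/backups"         # assumed target path on the backup node
CRON_SCHEDULE = "0 3 * * *"      # daily at 03:00 local


def build_rsync_cmd(service: str, host: str) -> List[str]:
    """Build the rsync argv that mirrors one service directory."""
    # Trailing slash on the source copies the directory contents,
    # and --delete removes files on the target that no longer exist locally.
    src = f"{DATA_ROOT}/{service}/"
    dest = f"{host}:{BACKUP_ROOT}/{service}/"
    return ["rsync", "-a", "--delete", src, dest]


if __name__ == "__main__":
    # Hypothetical subset of services; "backup-host" is a placeholder.
    for service in ["traefik", "grafana", "authelia"]:
        cmd = shlex.join(build_rsync_cmd(service, "backup-host"))
        print(f"{CRON_SCHEDULE} {cmd}")
```

Services with their own procedure (the Vaultwarden dump, the Prometheus snapshot) would need an extra step before the rsync call, which this sketch leaves out.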
## Secret Management ## Secret Management
@@ -255,9 +259,9 @@ Every service with persistent state uses **bind mounts to on-node directories**.
| **OS Auth** | SSH keys | Node access | Tailscale SSH + local keypairs | | **OS Auth** | SSH keys | Node access | Tailscale SSH + local keypairs |
## Authelia Deployment Notes ## Authelia Deployment Notes
- **Target node:** Mark5 (lightweight, sits beside Traefik) - **Target node:** MK7 (lightweight, sits beside Traefik)
- **Redirection URL:** Set Authelia `redirection_url` to the base domain of services needing auth. - **Redirection URL:** Set Authelia `redirection_url` to the base domain of services needing auth.
- **Backend storage:** Uses SQLite initially. If Bobby wants HA, migrate to PostgreSQL on Bones. - **Backend storage:** Uses SQLite initially. If Bobby wants HA, migrate to PostgreSQL on MK7.
- **Notification method:** File-based (writes to `/opt/iron-legion/authelia/notifications/`) until SMTP/Discord is configured. - **Notification method:** File-based (writes to `/opt/iron-legion/authelia/notifications/`) until SMTP/Discord is configured.
- **Rule granularity:** Per-service `access_control` rules in `configuration.yml`. Default: `one_factor` for internal services, `two_factor` for management interfaces (Portainer, Grafana admin). - **Rule granularity:** Per-service `access_control` rules in `configuration.yml`. Default: `one_factor` for internal services, `two_factor` for management interfaces (Portainer, Grafana admin).
@@ -283,8 +287,8 @@ traefik.http.middlewares.authelia.forwardauth.address: http://authelia:9091/api/
- **No VLANs.** Tailscale ACLs handle segment isolation. - **No VLANs.** Tailscale ACLs handle segment isolation.
- **ACL policy (draft):** - **ACL policy (draft):**
- `tag:admin` nodes (Bobby, Artemis) → all ports on all nodes - `tag:admin` nodes (Bobby, Artemis) → all ports on all nodes
- `tag:services` (Neo, Bones, Mark44, Mark5) → only their assigned service ports, no cross-node SSH except via Tailscale SSH - `tag:services` (MK7) → only their assigned service ports, no cross-node SSH except via Tailscale SSH
- `tag:user` (Bobby's phone, laptop) → HTTPS 443 on Mark5 only, Jellyfin 8096 on Mark44 directly - `tag:user` (Bobby's phone, laptop) → HTTPS 443 on MK7 only, Jellyfin 8096 on MK7 directly
- **Default deny.** Any traffic not explicitly allowed in Tailscale ACL is dropped. - **Default deny.** Any traffic not explicitly allowed in Tailscale ACL is dropped.
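The default-deny behaviour of the draft policy can be illustrated with a small allow-list check. This is a sketch of the logic only and is not Tailscale's policy file syntax. The rule tuples use the node names from the new side of the draft, and the lowercase node key `mk7` is a hypothetical label.

```python
from typing import List, Tuple, Union

Port = Union[int, str]

# (source tag, destination node, port); "*" matches anything.
RULES: List[Tuple[str, str, Port]] = [
    ("tag:admin", "*", "*"),     # admins reach all ports on all nodes
    ("tag:user", "mk7", 443),    # users reach HTTPS via Traefik
    ("tag:user", "mk7", 8096),   # users reach Jellyfin directly
]


def is_allowed(src_tag: str, dst_node: str, port: int) -> bool:
    """Return True only if some rule explicitly allows the connection."""
    for tag, node, rule_port in RULES:
        if tag == src_tag and node in ("*", dst_node) and rule_port in ("*", port):
            return True
    # Default deny: nothing matched, so the traffic is dropped.
    return False


print(is_allowed("tag:admin", "mk7", 22))
print(is_allowed("tag:user", "mk7", 443))
print(is_allowed("tag:user", "mk7", 22))
```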
## Monitoring for Security Events ## Monitoring for Security Events
@@ -301,14 +305,14 @@ traefik.http.middlewares.authelia.forwardauth.address: http://authelia:9091/api/
| Order | Service | Target Node | Why First | Dependencies | | Order | Service | Target Node | Why First | Dependencies |
|-------|---------|-------------|-----------|--------------| |-------|---------|-------------|-----------|--------------|
| 1 | **Technitium DNS** | Bones | Name resolution for internal services | None | | 1 | **Technitium DNS** | MK7 | Name resolution for internal services | None |
| 2 | **Pi-hole** | Bones | Recursive DNS + ad-block | Technitium (via conditional forwarding) | | 2 | **Pi-hole** | MK7 | Recursive DNS + ad-block | Technitium (via conditional forwarding) |
| 3 | **Traefik** | Mark5 | Edge router for all HTTP ingress | DNS (needs `*.labs.internal` to resolve) | | 3 | **Traefik** | MK7 | Edge router for all HTTP ingress | DNS (needs `*.labs.internal` to resolve) |
| 4 | **Authelia** | Mark5 | Auth layer before exposing any mgmt UI | Traefik (depends on ForwardAuth middleware) | | 4 | **Authelia** | MK7 | Auth layer before exposing any mgmt UI | Traefik (depends on ForwardAuth middleware) |
| 5 | **Portainer** | Neo | Container management UI | Traefik + Authelia (for secured access) | | 5 | **Portainer** | MK7 | Container management UI | Traefik + Authelia (for secured access) |
| 6 | **Prometheus** | Mark44 | Metrics collection baseline | None (scrape targets added in Phase 2) | | 6 | **Prometheus** | MK7 | Metrics collection baseline | None (scrape targets added in Phase 2) |
| 7 | **Beszel** | Mark44 | Fleet resource overview | None (agents installed per-node) | | 7 | **Beszel** | MK7 | Fleet resource overview | None (agents installed per-node) |
| 8 | **Dozzle** | Mark44 | Real-time log viewing | None | | 8 | **Dozzle** | MK7 | Real-time log viewing | None |
**Phase 1 milestone:** All nodes report healthy in Beszel. Portainer accessible via auth portal. DNS resolves. **Phase 1 milestone:** All nodes report healthy in Beszel. Portainer accessible via auth portal. DNS resolves.
@@ -319,12 +323,12 @@ traefik.http.middlewares.authelia.forwardauth.address: http://authelia:9091/api/
| Order | Service | Target Node | Why Now | Dependencies | | Order | Service | Target Node | Why Now | Dependencies |
|-------|---------|-------------|---------|--------------| |-------|---------|-------------|---------|--------------|
| 9 | **Jellyfin** | Mark44 | Media playback (GPU transcode if Mark44 has dGPU) | None (file ingest later) | | 9 | **Jellyfin** | MK7 | Media playback (GPU transcode if MK7 has dGPU) | None (file ingest later) |
| 10 | **Sonarr** | Mark44 | TV management | Jellyfin (pushes organized files) | | 10 | **Sonarr** | MK7 | TV management | Jellyfin (pushes organized files) |
| 11 | **Radarr** | Mark44 | Movie management | Jellyfin (pushes organized files) | | 11 | **Radarr** | MK7 | Movie management | Jellyfin (pushes organized files) |
| 12 | **Prowlarr** | Mark44 | Indexer aggregation | Sonarr + Radarr (feeds them) | | 12 | **Prowlarr** | MK7 | Indexer aggregation | Sonarr + Radarr (feeds them) |
| 13 | **Nextcloud** | Neo | File sync/collaboration | PostgreSQL (on Bones) | | 13 | **Nextcloud** | MK7 | File sync/collaboration | PostgreSQL (on MK7) |
| 14 | **Vaultwarden** | Neo | Password management | None (standalone) | | 14 | **Vaultwarden** | MK7 | Password management | None (standalone) |
**Phase 2 milestone:** Media acquisition pipeline works end-to-end. Nextcloud syncs. Vaultwarden stores secrets. **Phase 2 milestone:** Media acquisition pipeline works end-to-end. Nextcloud syncs. Vaultwarden stores secrets.
@@ -335,8 +339,8 @@ traefik.http.middlewares.authelia.forwardauth.address: http://authelia:9091/api/
| Order | Service | Target Node | Why Deferred | Dependencies | | Order | Service | Target Node | Why Deferred | Dependencies |
|-------|---------|-------------|--------------|--------------| |-------|---------|-------------|--------------|--------------|
| 15 | **Grafana** | Mark44 | Dashboards need metrics to be interesting | Prometheus (needs data history) | | 15 | **Grafana** | MK7 | Dashboards need metrics to be interesting | Prometheus (needs data history) |
| 16 | **Homepage** | Mark5 | Custom dashboard for everything | All Phase 1+2 services (needs endpoints) | | 16 | **Homepage** | MK7 | Custom dashboard for everything | All Phase 1+2 services (needs endpoints) |
| | **Promtail + Loki** | TBD | Centralized logging | Only if Dozzle is insufficient | | | **Promtail + Loki** | TBD | Centralized logging | Only if Dozzle is insufficient |
| | **Uptime-Kuma** | TBD | External uptime monitoring | Only if Beszel alerting is insufficient | | | **Uptime-Kuma** | TBD | External uptime monitoring | Only if Beszel alerting is insufficient |
@@ -356,11 +360,11 @@ traefik.http.middlewares.authelia.forwardauth.address: http://authelia:9091/api/
|---|----------|--------|----------------------| |---|----------|--------|----------------------|
| 1 | **Domain name** — Does Bobby own a domain (e.g., `bobbysh.me`) or do we use a fake TLD (`labs.internal`)? | **Critical** — TLS certs, Authelia, and DNS all depend on this. | Use `labs.internal` + self-signed CA | | 1 | **Domain name** — Does Bobby own a domain (e.g., `bobbysh.me`) or do we use a fake TLD (`labs.internal`)? | **Critical** — TLS certs, Authelia, and DNS all depend on this. | Use `labs.internal` + self-signed CA |
| 2 | **Technitium upstream** — DoH, DoT, or plain UDP to upstream resolver (e.g., Cloudflare 1.1.1.1)? | Low — can default to DoH | DoH → `https://cloudflare-dns.com/dns-query` | | 2 | **Technitium upstream** — DoH, DoT, or plain UDP to upstream resolver (e.g., Cloudflare 1.1.1.1)? | Low — can default to DoH | DoH → `https://cloudflare-dns.com/dns-query` |
| 3 | **Pi-hole vs Technitium conflict** — Both run on Bones port 53. Run Pi-hole on non-standard port with Technitium as conditional forwarder? Or separate nodes? | **Critical** — port 53 collision | Technitium on 53, Pi-hole on 5053, forward to Pi-hole from Technitium | | 3 | **Pi-hole vs Technitium conflict** — Both run on MK7 port 53. Run Pi-hole on non-standard port with Technitium as conditional forwarder? Or separate nodes? | **Critical** — port 53 collision | Technitium on 53, Pi-hole on 5053, forward to Pi-hole from Technitium |
| 4 | **Jellyfin media storage** — External USB on Mark44? SMB share? NVMe? | Medium | External USB mounted at `/media` on Mark44 | | 4 | **Jellyfin media storage** — External USB on MK7? SMB share? NVMe? | Medium | External USB mounted at `/media` on MK7 |
| 5 | **Backup target on Bones** — Capacity? Dedicated drive? Rsync target path? | Medium | `/backups/<service-name>/` on Bones secondary storage | | 5 | **Backup target on MK7** — Capacity? Dedicated drive? Rsync target path? | Medium | `/backups/<service-name>/` on MK7 secondary storage |
| 6 | **Nextcloud database** — Use existing PostgreSQL on Bones, or deploy Nextcloud AIO (bundled)? | Medium — affects resource allocation on Bones | Deploy standalone PostgreSQL container on Bones for Nextcloud; AIO is too heavy | | 6 | **Nextcloud database** — Use existing PostgreSQL on MK7, or deploy Nextcloud AIO (bundled)? | Medium — affects resource allocation on MK7 | Deploy standalone PostgreSQL container on MK7 for Nextcloud; AIO is too heavy |
| 7 | **GPU on Mark44** — NVIDIA driver runtime for Jellyfin transcode? | Low — falls back to CPU transcode | Use `jellyfin/jellyfin` with `NVIDIA_VISIBLE_DEVICES` env if available | | 7 | **GPU on MK7** — NVIDIA driver runtime for Jellyfin transcode? | Low — falls back to CPU transcode | Use `jellyfin/jellyfin` with `NVIDIA_VISIBLE_DEVICES` env if available |
| 8 | **Notification routing** — Discord webhook? SMTP? File only? | Low — default file works | File notifications in `/opt/iron-legion/authelia/notifications/` | | 8 | **Notification routing** — Discord webhook? SMTP? File only? | Low — default file works | File notifications in `/opt/iron-legion/authelia/notifications/` |
| 9 | **Tailscale ACL policy** — Draft exists in Section 7. Bobby must review and apply in Tailscale admin console. | Low | Stay permissive until Bobby approves | | 9 | **Tailscale ACL policy** — Draft exists in Section 7. Bobby must review and apply in Tailscale admin console. | Low | Stay permissive until Bobby approves |
| 10 | **Beszel alert thresholds** — CPU %, memory %, disk % triggers not defined. | Low | Defaults in Beszel container | | 10 | **Beszel alert thresholds** — CPU %, memory %, disk % triggers not defined. | Low | Defaults in Beszel container |