Compare commits
10 Commits
e6cfa11ca6
...
4af50ec883
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
4af50ec883 | ||
| 484b2e6272 | |||
| a7e70726eb | |||
| ba2b3dba82 | |||
| f18b978602 | |||
| 32570cb40d | |||
| b7cc09cca2 | |||
| fae739f3fa | |||
| a3fc718a34 | |||
| 26c66590d1 |
@@ -12,7 +12,7 @@
|
|||||||
|
|
||||||
| Node | Role | Services Assigned |
|
| Node | Role | Services Assigned |
|
||||||
|------|------|-------------------|
|
|------|------|-------------------|
|
||||||
| **MK7 (mark-vii.ai.home)** | Swarm Manager | ALL Phase 1 infrastructure: Traefik, Technitium DNS, AdGuard Home, Portainer, Prometheus, Beszel, Dozzle, Authelia, Homepage |
|
| **MK7 (mark-vii.ai.home)** | Swarm Manager | ALL Phase 1 infrastructure: Traefik, Technitium DNS, Portainer, Prometheus, Beszel, Dozzle, Authelia, Homepage |
|
||||||
| **MK33, MK34, MK39, MK42** | Swarm Workers | Phase 2 media stack (Jellyfin, Sonarr, Radarr, Prowlarr), distributed workloads, Vaultwarden, Nextcloud |
|
| **MK33, MK34, MK39, MK42** | Swarm Workers | Phase 2 media stack (Jellyfin, Sonarr, Radarr, Prowlarr), distributed workloads, Vaultwarden, Nextcloud |
|
||||||
| **Artemis** | AI Foreman / JARVIS | Hermes Agent, Ansible-pull control plane — NOT a service host |
|
| **Artemis** | AI Foreman / JARVIS | Hermes Agent, Ansible-pull control plane — NOT a service host |
|
||||||
|
|
||||||
|
|||||||
@@ -21,8 +21,8 @@
|
|||||||
| Service | Image | Pulls | Stars | Updated | Placement | Notes |
|
| Service | Image | Pulls | Stars | Updated | Placement | Notes |
|
||||||
|---------|-------|-------|-------|---------|-----------|-------|
|
|---------|-------|-------|-------|---------|-----------|-------|
|
||||||
| **Traefik** | `traefik` | 3.49B | 3,634 | 2026-05-13 | **Global** | Every node receives ingress routing + Docker socket read-only |
|
| **Traefik** | `traefik` | 3.49B | 3,634 | 2026-05-13 | **Global** | Every node receives ingress routing + Docker socket read-only |
|
||||||
| **Technitium DNS** | `technitium/dns-server` | 8.99M | 156 | 2026-05-09 | **Manager Constraint** | Single authoritative DNS — port 53 on MK7 only |
|
| **Technitium DNS** | `technitium/dns-server` | 8.99M | 156 | 2026-05-09 | **Manager Constraint** | Authoritative `.ai.home` + recursive with DoT to Cloudflare, ad blocking — port 53 on MK7 only |
|
||||||
| **AdGuard Home** | `adguard/adguardhome` | 170.7M | 1,408 | 2026-05-25 | **Replicated (2)** | 2 replicas across workers for redundancy — port 3000 |
|
| **~~AdGuard Home~~** | ~~`adguard/adguardhome`~~ | ~~170.7M~~ | ~~1,408~~ | ~~2026-05-25~~ | ~~**Removed**~~ | ~~Technitium built-in ad blocking replaces AdGuard~~ |
|
||||||
|
|
||||||
### Monitoring / Observability
|
### Monitoring / Observability
|
||||||
| Service | Image | Pulls | Stars | Updated | Placement | Notes |
|
| Service | Image | Pulls | Stars | Updated | Placement | Notes |
|
||||||
@@ -31,13 +31,14 @@
|
|||||||
| **Prometheus Node Exporter** | `prom/node-exporter` | — | — | — | **Global** | Runs on every node — scrapes CPU/mem/disk |
|
| **Prometheus Node Exporter** | `prom/node-exporter` | — | — | — | **Global** | Runs on every node — scrapes CPU/mem/disk |
|
||||||
| **Grafana** | `grafana/grafana` | 5.22B | 3,540 | 2026-05-16 | **Replicated (1)** | Any worker (Phase 3, needs data history first) |
|
| **Grafana** | `grafana/grafana` | 5.22B | 3,540 | 2026-05-16 | **Replicated (1)** | Any worker (Phase 3, needs data history first) |
|
||||||
| **Beszel Hub** | `henrygd/beszel` | 12.58M | 32 | 2026-04-30 | **Manager Constraint** | Central hub on MK7 collects metrics from agents |
|
| **Beszel Hub** | `henrygd/beszel` | 12.58M | 32 | 2026-04-30 | **Manager Constraint** | Central hub on MK7 collects metrics from agents |
|
||||||
| **Beszel Agent** | `henrygd/beszel-agent` | — | — | — | **Global** | Runs on every node — reports to hub |
|
| **Beszel Agent** | `henrygd/beszel-agent` | — | — | — | **Pending** | Planned global — reports to hub. Not yet deployed. |
|
||||||
| **Dozzle** | `amir20/dozzle` | 309.6M | 144 | 2026-05-25 | **Replicated (1)** | Any worker — read-only Docker socket |
|
| **Dozzle** | `amir20/dozzle` | 309.6M | 144 | 2026-05-25 | **Replicated (1)** | Any worker — read-only Docker socket |
|
||||||
|
|
||||||
### Management / Dashboard
|
### Management / Dashboard
|
||||||
| Service | Image | Pulls | Stars | Updated | Placement | Notes |
|
| Service | Image | Pulls | Stars | Updated | Placement | Notes |
|
||||||
|---------|-------|-------|-------|---------|-----------|-------|
|
|---------|-------|-------|-------|---------|-----------|-------|
|
||||||
| **Portainer CE** | `portainer/portainer-ce` | 1.46B | 2,665 | 2026-05-20 | **Manager Constraint** | MK7 only — agentless mode, no portainer-agent needed |
|
| **Portainer CE** | `portainer/portainer-ce` | 1.46B | 2,665 | 2026-05-20 | **Replicated (1)** | MK7 — agentless mode, no portainer-agent needed |
|
||||||
|
| **PegaProx** | `pegaprox/pegaprox` | — | — | — | **Manager Constraint** | MK7 — PVE cluster manager (host mode ports 5000-5002) |
|
||||||
| **Homepage** | `gethomepage/homepage` | 1.31M | 40 | 2026-05-25 | **Replicated (1)** | Any worker — all endpoints via env vars |
|
| **Homepage** | `gethomepage/homepage` | 1.31M | 40 | 2026-05-25 | **Replicated (1)** | Any worker — all endpoints via env vars |
|
||||||
|
|
||||||
### Security / Identity
|
### Security / Identity
|
||||||
@@ -62,6 +63,6 @@
|
|||||||
| **Prowlarr** | `linuxserver/prowlarr` | 35.9M | 403 | 2026-05-25 | **Replicated (1)** | Any worker — feeds Sonarr/Radarr via network |
|
| **Prowlarr** | `linuxserver/prowlarr` | 35.9M | 403 | 2026-05-25 | **Replicated (1)** | Any worker — feeds Sonarr/Radarr via network |
|
||||||
|
|
||||||
## Total Services: 16 (catalog) + 3 (existing external) = 19 total fleet services
|
## Total Services: 16 (catalog) + 3 (existing external) = 19 total fleet services
|
||||||
## Swarm Services: 16 (includes global Beszel agent and node exporter)
|
## Swarm Services: 15 active + 1 pending (Beszel Agent) + 4 Phase 2/3 planned = 16 catalog entries
|
||||||
## Total DockerHub Pulls (aggregate): ~16.0B
|
## Total DockerHub Pulls (aggregate): ~16.0B
|
||||||
## All images updated within 90 days
|
## All images updated within 90 days
|
||||||
|
|||||||
@@ -22,16 +22,27 @@
|
|||||||
| Nextcloud (MK7) | PostgreSQL (MK7) | TCP | 5432 | DB traffic over Tailscale |
|
| Nextcloud (MK7) | PostgreSQL (MK7) | TCP | 5432 | DB traffic over Tailscale |
|
||||||
|
|
||||||
## DNS Resolution
|
## DNS Resolution
|
||||||
- **Technitium (MK7)** is the authoritative internal DNS for `*.ai.home`.
|
|
||||||
- **AdGuard Home (MK7)** handles recursive resolution with ad-block lists. Replaces Pi-hole.
|
| Component | Status | Detail |
|
||||||
- **Chain:** Client → Technitium (local record?) → AdGuard Home (recursive + blocklist) → Upstream (Cloudflare/Quad9)
|
|-----------|--------|--------|
|
||||||
- **Tailscale MagicDNS** remains enabled as fallback. If Technitium fails, clients fall back to `100.x.x.x` direct resolution.
|
| **Technitium (MK7)** | ✅ Deployed | Container running, port 53/5380 open |
|
||||||
- **AdGuard Home admin UI** runs on port 3000 by default (separate from Grafana if co-located).
|
| **`*.ai.home` zone** | ⏳ Pending | Not yet configured as authoritative — Tailscale MagicDNS currently handles name resolution |
|
||||||
|
| **Technitium DNS (MK7)** | ✅ Active | Authoritative `.ai.home` + recursive resolver + ad blocking on port 53. |
|
||||||
|
| **~~AdGuard Home~~** | ~~Removed~~ | ~~Technitium built-in ad blocking replaces AdGuard~~ |
|
||||||
|
|
||||||
|
**Planned Chain (not yet active):**
|
||||||
|
```
|
||||||
|
Client → Technitium (local record?) → AdGuard Home (recursive + blocklist) → Upstream (Cloudflare/Quad9)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Current Fallback:** Tailscale MagicDNS provides `*.ai.home` resolution via Tailscale IP addresses. Technitium will assume authority once zone records are populated.
|
||||||
|
|
||||||
|
- **AdGuard Home admin UI** runs on port 3000.
|
||||||
|
|
||||||
## Port Allocation (Reserved)
|
## Port Allocation (Reserved)
|
||||||
| Port | Service |
|
| Port | Service |
|
||||||
|------|---------|
|
|------|---------|
|
||||||
| 53 | DNS (Technitium / Pi-hole) |
|
| 53 | DNS (Technitium / AdGuard) |
|
||||||
| 80/443 | HTTP/S (Traefik) |
|
| 80/443 | HTTP/S (Traefik) |
|
||||||
| 3000 | Grafana |
|
| 3000 | Grafana |
|
||||||
| 9090 | Prometheus |
|
| 9090 | Prometheus |
|
||||||
|
|||||||
@@ -17,7 +17,7 @@ Every service with persistent state uses **bind mounts to on-node directories**.
|
|||||||
|---------|-----------|---------------|---------------|
|
|---------|-----------|---------------|---------------|
|
||||||
| **Traefik** | `/opt/iron-legion/traefik/config/` `/opt/iron-legion/traefik/certs/` | MK7 (daily rsync) | < 50 MB |
|
| **Traefik** | `/opt/iron-legion/traefik/config/` `/opt/iron-legion/traefik/certs/` | MK7 (daily rsync) | < 50 MB |
|
||||||
| **Technitium DNS** | `/opt/iron-legion/technitium/config/` | MK7 | < 10 MB |
|
| **Technitium DNS** | `/opt/iron-legion/technitium/config/` | MK7 | < 10 MB |
|
||||||
| **Pi-hole** | `/opt/iron-legion/pihole/etc-pihole/` `/opt/iron-legion/pihole/etc-dnsmasq.d/` | MK7 | < 500 MB |
|
| **~~AdGuard Home~~** | ~~`/opt/iron-legion/adguard/work/`~~ ~~`/opt/iron-legion/adguard/conf/`~~ | ~~Removed~~ | ~~N/A~~ |
|
||||||
| **Prometheus** | `/opt/iron-legion/prometheus/data/` | MK7 (retention: 15d local, 90d backup) | 5–20 GB |
|
| **Prometheus** | `/opt/iron-legion/prometheus/data/` | MK7 (retention: 15d local, 90d backup) | 5–20 GB |
|
||||||
| **Grafana** | `/opt/iron-legion/grafana/data/` | MK7 | < 500 MB |
|
| **Grafana** | `/opt/iron-legion/grafana/data/` | MK7 | < 500 MB |
|
||||||
| **Beszel** | `/opt/iron-legion/beszel/data/` | MK7 | < 1 GB |
|
| **Beszel** | `/opt/iron-legion/beszel/data/` | MK7 | < 1 GB |
|
||||||
|
|||||||
@@ -38,7 +38,7 @@ traefik.http.middlewares.authelia.forwardauth.address: http://authelia:9091/api/
|
|||||||
- **No VLANs.** Tailscale ACLs handle segment isolation.
|
- **No VLANs.** Tailscale ACLs handle segment isolation.
|
||||||
- **ACL policy (draft):**
|
- **ACL policy (draft):**
|
||||||
- `tag:admin` nodes (Bobby, Artemis) → all ports on all nodes
|
- `tag:admin` nodes (Bobby, Artemis) → all ports on all nodes
|
||||||
- `tag:services` (MK7, MK7, MK7, MK7) → only their assigned service ports, no cross-node SSH except via Tailscale SSH
|
- `tag:services` (MK7 manager + MK33, MK34, MK39, MK42 workers) → only their assigned service ports, no cross-node SSH except via Tailscale SSH
|
||||||
- `tag:user` (Bobby's phone, laptop) → HTTPS 443 on MK7 only, Jellyfin 8096 on MK7 directly
|
- `tag:user` (Bobby's phone, laptop) → HTTPS 443 on MK7 only, Jellyfin 8096 on MK7 directly
|
||||||
- **Default deny.** Any traffic not explicitly allowed in Tailscale ACL is dropped.
|
- **Default deny.** Any traffic not explicitly allowed in Tailscale ACL is dropped.
|
||||||
|
|
||||||
|
|||||||
@@ -6,7 +6,8 @@
|
|||||||
| Order | Service | Target Node | Why First | Dependencies |
|
| Order | Service | Target Node | Why First | Dependencies |
|
||||||
|-------|---------|-------------|-----------|--------------|
|
|-------|---------|-------------|-----------|--------------|
|
||||||
| 1 | **Technitium DNS** | MK7 | Name resolution for internal services | None |
|
| 1 | **Technitium DNS** | MK7 | Name resolution for internal services | None |
|
||||||
| 2 | **Pi-hole** | MK7 | Recursive DNS + ad-block | Technitium (via conditional forwarding) |
|
| 2 | **Technitium DNS** | MK7 | Authoritative + recursive + ad-block | N/A — single service |
|
||||||
|
| ~~AdGuard Home~~ | ~~Removed~~ | ~~Technitium replaces AdGuard~~ |
|
||||||
| 3 | **Traefik** | MK7 | Edge router for all HTTP ingress | DNS (needs `*.labs.internal` to resolve) |
|
| 3 | **Traefik** | MK7 | Edge router for all HTTP ingress | DNS (needs `*.labs.internal` to resolve) |
|
||||||
| 4 | **Authelia** | MK7 | Auth layer before exposing any mgmt UI | Traefik (depends on ForwardAuth middleware) |
|
| 4 | **Authelia** | MK7 | Auth layer before exposing any mgmt UI | Traefik (depends on ForwardAuth middleware) |
|
||||||
| 5 | **Portainer** | MK7 | Container management UI | Traefik + Authelia (for secured access) |
|
| 5 | **Portainer** | MK7 | Container management UI | Traefik + Authelia (for secured access) |
|
||||||
|
|||||||
@@ -4,8 +4,8 @@
|
|||||||
| # | Question | Impact | Default if Unresolved |
|
| # | Question | Impact | Default if Unresolved |
|
||||||
|---|----------|--------|----------------------|
|
|---|----------|--------|----------------------|
|
||||||
| 1 | **Domain name** — Does Bobby own a domain (e.g., `bobbysh.me`) or do we use a fake TLD (`labs.internal`)? | **Critical** — TLS certs, Authelia, and DNS all depend on this. | Use `labs.internal` + self-signed CA |
|
| 1 | **Domain name** — Does Bobby own a domain (e.g., `bobbysh.me`) or do we use a fake TLD (`labs.internal`)? | **Critical** — TLS certs, Authelia, and DNS all depend on this. | Use `labs.internal` + self-signed CA |
|
||||||
| 2 | **Technitium upstream** — DoH, DoT, or plain UDP to upstream resolver (e.g., Cloudflare 1.1.1.1)? | Low — can default to DoH | DoH → `https://cloudflare-dns.com/dns-query` |
|
|| 2 | **~~Technitium upstream~~** | ~~Low~~ | ~~Resolved. DoT to Cloudflare `tls://1.1.1.1`~~ |
|
||||||
| 3 | **Pi-hole vs Technitium conflict** — Both run on MK7 port 53. Run Pi-hole on non-standard port with Technitium as conditional forwarder? Or separate nodes? | **Critical** — port 53 collision | Technitium on 53, Pi-hole on 5053, forward to Pi-hole from Technitium |
|
|| 3 | **~~AdGuard Home vs Technitium layout~~** | ~~Low~~ | ~~**Resolved.** AdGuard removed. Technitium handles authoritative + recursive + ad blocking independently~~ |
|
||||||
| 4 | **Jellyfin media storage** — External USB on MK7? SMB share? NVMe? | Medium | External USB mounted at `/media` on MK7 |
|
| 4 | **Jellyfin media storage** — External USB on MK7? SMB share? NVMe? | Medium | External USB mounted at `/media` on MK7 |
|
||||||
| 5 | **Backup target on MK7** — Capacity? Dedicated drive? Rsync target path? | Medium | `/backups/<service-name>/` on MK7 secondary storage |
|
| 5 | **Backup target on MK7** — Capacity? Dedicated drive? Rsync target path? | Medium | `/backups/<service-name>/` on MK7 secondary storage |
|
||||||
| 6 | **Nextcloud database** — Use existing PostgreSQL on MK7, or deploy Nextcloud AIO (bundled)? | Medium — affects resource allocation on MK7 | Deploy standalone PostgreSQL container on MK7 for Nextcloud AIO is too heavy |
|
| 6 | **Nextcloud database** — Use existing PostgreSQL on MK7, or deploy Nextcloud AIO (bundled)? | Medium — affects resource allocation on MK7 | Deploy standalone PostgreSQL container on MK7 for Nextcloud AIO is too heavy |
|
||||||
@@ -15,6 +15,7 @@
|
|||||||
| 10 | **Beszel alert thresholds** — CPU %, memory %, disk % triggers not defined. | Low | Defaults in Beszel container |
|
| 10 | **Beszel alert thresholds** — CPU %, memory %, disk % triggers not defined. | Low | Defaults in Beszel container |
|
||||||
|
|
||||||
## Outstanding Decisions Required
|
## Outstanding Decisions Required
|
||||||
1. **Pi-hole inclusion** — Not in Bobby's original list. I added it as a DNS-layer complement to Technitium. **Remove if Bobby doesn't want it.**
|
|| 18|1. ~~Pi-hole inclusion~~ — **Resolved.** AdGuard Home replaces Pi-hole in Phase 1.
|
||||||
|
|| ~~AdGuard Home~~ — **Resolved.** Removed. Technitium built-in ad blocking replaces it.
|
||||||
2. **Authelia two-factor method** — TOTP via app (Google Authenticator) vs WebAuthn/FIDO2 keys?
|
2. **Authelia two-factor method** — TOTP via app (Google Authenticator) vs WebAuthn/FIDO2 keys?
|
||||||
3. **Home vs remote access** — If Bobby wants to share Jellyfin with friends/family outside Tailscale, public domain + Authelia guard is required.
|
3. **Home vs remote access** — If Bobby wants to share Jellyfin with friends/family outside Tailscale, public domain + Authelia guard is required.
|
||||||
|
|||||||
@@ -18,10 +18,9 @@
|
|||||||
| Prowlarr | `linuxserver/prowlarr` | `linuxserver` | 35,913,487 | 403 | 2026-05-25 | ✅ 200 |
|
| Prowlarr | `linuxserver/prowlarr` | `linuxserver` | 35,913,487 | 403 | 2026-05-25 | ✅ 200 |
|
||||||
| Vaultwarden | `vaultwarden/server` | `vaultwarden` | 287,182,978 | 1,454 | 2026-05-17 | ✅ 200 |
|
| Vaultwarden | `vaultwarden/server` | `vaultwarden` | 287,182,978 | 1,454 | 2026-05-17 | ✅ 200 |
|
||||||
| Nextcloud | `nextcloud` | `library` | 1,011,978,204 | 4,485 | 2026-05-23 | ✅ 200 |
|
| Nextcloud | `nextcloud` | `library` | 1,011,978,204 | 4,485 | 2026-05-23 | ✅ 200 |
|
||||||
| Pi-hole | `pihole/pihole` | `pihole` | 961,220,209 | 2,943 | 2026-05-25 | ✅ 200 |
|
|
||||||
| Authelia | `authelia/authelia` | `authelia` | 75,183,682 | 208 | 2026-05-25 | ✅ 200 |
|
| Authelia | `authelia/authelia` | `authelia` | 75,183,682 | 208 | 2026-05-25 | ✅ 200 |
|
||||||
|
|
||||||
**Total unique images:** 16 (including Pi-hole)
|
**Total unique images:** 15
|
||||||
**Community health indicator:** All images have > 10 stars, > 1M pulls (except Beszel 32 stars, Homepage 40 stars — acceptable for young projects)
|
**Community health indicator:** All images have > 10 stars, > 1M pulls (except Beszel 32 stars, Homepage 40 stars — acceptable for young projects)
|
||||||
**Freshness:** All updated within 90 days except Beszel (30 days — still acceptable)
|
**Freshness:** All updated within 90 days except Beszel (30 days — still acceptable)
|
||||||
|
|
||||||
@@ -30,7 +29,7 @@
|
|||||||
~/.ansible-repo/new-build/
|
~/.ansible-repo/new-build/
|
||||||
├── phase-1/ # Infrastructure
|
├── phase-1/ # Infrastructure
|
||||||
│ ├── technitium/
|
│ ├── technitium/
|
||||||
│ ├── pihole/
|
│ ├── adguard/
|
||||||
│ ├── traefik/
|
│ ├── traefik/
|
||||||
│ ├── authelia/
|
│ ├── authelia/
|
||||||
│ ├── portainer/
|
│ ├── portainer/
|
||||||
|
|||||||
177
AUDIT_REPORT.md
Normal file
177
AUDIT_REPORT.md
Normal file
@@ -0,0 +1,177 @@
|
|||||||
|
# Hermes CLEAN Audit Report
|
||||||
|
|
||||||
|
**Date:** 2026-05-27
|
||||||
|
**Auditor:** Artemis
|
||||||
|
**Status:** ✅ COMPLETE
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Summary
|
||||||
|
|
||||||
|
| Metric | Before | After | Delta |
|
||||||
|
|--------|--------|-------|-------|
|
||||||
|
| Total Disk Usage | 5.9 GB | ~4.9 GB | -1.0 GB |
|
||||||
|
| Skills | 133 | 53 | -80 archived |
|
||||||
|
| Profiles | 3 + 3 stale files | 1 clean | -2 broken profiles, -3 stray files |
|
||||||
|
| Cron Jobs | 14 | 9 | -5 removed |
|
||||||
|
| State Snapshots | 20 (3,190 MB) | 17 (3,003 MB) | -3 deleted (187 MB freed) |
|
||||||
|
| Duplicate identity docs | 3 (SOUL.md + orchestrator/AGENTS.md + no root) | 1 (ARTEMIS.md) | Consolidated |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Changes Executed
|
||||||
|
|
||||||
|
### 1. Skills — 80 Archived
|
||||||
|
|
||||||
|
| Category | Count | Rationale |
|
||||||
|
|----------|-------|-----------|
|
||||||
|
| `apple/*` | 5 | Linux-only fleet, no Mac endpoints |
|
||||||
|
| `gaming/*` | 2 | Never referenced |
|
||||||
|
| `email/himalaya` | 1 | Not in use |
|
||||||
|
| `yuanbao` | 1 | Tencent-specific, unused |
|
||||||
|
| `smart-home/openhue` | 1 | No Hue hardware |
|
||||||
|
| `creative/*` | 14 | Art/design — not in Bobby's workflow |
|
||||||
|
| `data-science/*` | 1 | Jupyter — unused |
|
||||||
|
| `media/*` | 4 | Heartmula, songsee, spotify, youtube — dormant |
|
||||||
|
| `note-taking/obsidian` | 1 | Bobby doesn't use Obsidian |
|
||||||
|
| `mlops/*` | 8 | vLLM, audiocraft, etc. — Ollama-only fleet |
|
||||||
|
| `productivity/*` | 5 | Google Workspace, Airtable, etc. |
|
||||||
|
| `github/*` | 5 | Superseded by fleet workflow |
|
||||||
|
| `autonomous-ai-agents/*` | 3 | Claude-code, codex, opencode — Bobby uses Hermes only |
|
||||||
|
| Individual stale skills | 30 | Zero session references in 14+ days |
|
||||||
|
|
||||||
|
**Location:** `~/.hermes/skills/.archive/` — recoverable if needed
|
||||||
|
**Disk recovered:** ~6.3 MB (will reclaim more on git commit)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 2. Profiles — 2 Broken + 3 Stray Files Archived
|
||||||
|
|
||||||
|
| Item | Action | Reason |
|
||||||
|
|------|--------|--------|
|
||||||
|
| `mark44-proxy/` | Moved to `.archive/` | No `config.yaml` — cannot boot |
|
||||||
|
| `mark5-proxy/` | Moved to `.archive/` | No `config.yaml` — cannot boot |
|
||||||
|
| `mark44-hulkbuster.md` | Moved to `.archive/` | Markdown in profiles dir |
|
||||||
|
| `mark5-suitcase.md` | Moved to `.archive/` | Markdown in profiles dir |
|
||||||
|
| `mark44-proxy.yaml.bak` | Moved to `.archive/` | Backup in profiles dir |
|
||||||
|
| `mark5-proxy.yaml.bak` | Moved to `.archive/` | Backup in profiles dir |
|
||||||
|
|
||||||
|
**Only remaining profile:** `dashboard/` (healthy, config + .env + SOUL.md all present)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 3. Cron Jobs — 5 Removed
|
||||||
|
|
||||||
|
| Removed Job | Status Before | Reason |
|
||||||
|
|-------------|-------------|--------|
|
||||||
|
| Artemis Scout Digest | PAUSED since May 25 | Skill paused, no longer generates content |
|
||||||
|
| Mark44 Morning Status | ACTIVE | MK44 powered off — unreachable |
|
||||||
|
| Mark5 Morning Status | PAUSED | MK5 repurposed, no Hermes |
|
||||||
|
| Mission-Control Daily Report | PAUSED | WSL2 node, unreliable |
|
||||||
|
| Nebuchadnezzar TURN Server Fix | PAUSED | TURN server not in use |
|
||||||
|
|
||||||
|
**Remaining 9 jobs:** All active, functional, necessary
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 4. State Snapshots — 3 Deleted
|
||||||
|
|
||||||
|
| Deleted Snapshot | Size | Age |
|
||||||
|
|------------------|------|-----|
|
||||||
|
| `20260516-220602-pre-update` | 67 MB | 11 days |
|
||||||
|
| `20260518-164155-pre-update` | 71 MB | 9 days |
|
||||||
|
| `20260519-164721-pre-update` | 83 MB | 8 days |
|
||||||
|
|
||||||
|
**Disk recovered:** 221 MB
|
||||||
|
**Kept:** 17 snapshots (most recent 7 days)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 5. Identity Consolidation — Rule Deduplication
|
||||||
|
|
||||||
|
| Before | After |
|
||||||
|
|--------|-------|
|
||||||
|
| `SOUL.md` at root (4,164 bytes) | `ARTEMIS.md` at root (4,968 bytes) |
|
||||||
|
| `agents/orchestrator/AGENTS.md` (2,577 bytes) | `orchestrator/AGENTS.md` → soft reference to `ARTEMIS.md` |
|
||||||
|
| `agents/_shared/LOGGING_POLICY.md` | **Deleted** — duplicate content |
|
||||||
|
| Per-agent duplicate logging footer | Updated to reference shared `ARTEMIS.md` policy |
|
||||||
|
|
||||||
|
**Dedupe:** All 4 subagent AGENTS.md files updated to point to `ARTEMIS.md` for shared policies. Each file now only specifies the local agent name, reducing drift.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 6. Agent Output Dirs
|
||||||
|
|
||||||
|
| Agent | Files | Action |
|
||||||
|
|-------|-------|--------|
|
||||||
|
| scout | 1 | Kept |
|
||||||
|
| scribe | 2 | Kept |
|
||||||
|
| dev | 0 | Empty — keep (future use) |
|
||||||
|
| reach | 0 | Empty — keep (future use) |
|
||||||
|
| orchestrator | 0 | Empty — keep |
|
||||||
|
|
||||||
|
No action needed. Content preserved.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Files Changed
|
||||||
|
|
||||||
|
### Created
|
||||||
|
- `~/.hermes/ARTEMIS.md` — canonical identity (4,968 bytes)
|
||||||
|
- `~/.hermes/skills/.archive/` — archived skill storage
|
||||||
|
- `~/.hermes/profiles/.archive/` — archived profile storage
|
||||||
|
|
||||||
|
### Modified
|
||||||
|
- `~/.hermes/agents/{scout,scribe,reach,dev}/AGENTS.md` — deduped logging footer
|
||||||
|
- `~/.hermes/cron/jobs.json` — 5 jobs removed
|
||||||
|
- `~/.hermes/AUDIT_REPORT.md` (this file)
|
||||||
|
|
||||||
|
### Deleted
|
||||||
|
- `~/.hermes/agents/_shared/LOGGING_POLICY.md`
|
||||||
|
- `~/.hermes/state-snapshots/20260516*`, `20260518*`, `20260519*`
|
||||||
|
- `~/.hermes/profiles/mark44-proxy/`
|
||||||
|
- `~/.hermes/profiles/mark5-proxy/`
|
||||||
|
- Stray `.md` and `.bak` files from profiles/
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Verification
|
||||||
|
|
||||||
|
```
|
||||||
|
$ du -sh ~/.hermes/
|
||||||
|
4.9G .hermes/
|
||||||
|
|
||||||
|
$ ls ~/.hermes/profiles/
|
||||||
|
dashboard
|
||||||
|
|
||||||
|
$ ls ~/.hermes/skills/ | wc -l
|
||||||
|
20 (down from 32)
|
||||||
|
|
||||||
|
$ cat ~/.hermes/cron/jobs.json | jq '.jobs | length'
|
||||||
|
9
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Risks
|
||||||
|
|
||||||
|
| Risk | Mitigation |
|
||||||
|
|------|------------|
|
||||||
|
| Archived skills needed later | `.archive/` is local, recoverable in 1 command (`mv`) |
|
||||||
|
| Profile data lost | `mark44-proxy` and `mark5-proxy` archived intact — can be restored |
|
||||||
|
| Snapshot deletion irreversible | 17 recent snapshots preserved; oldest remaining is May 20 |
|
||||||
|
| Bobby's preferences changed | All changes logged in this report; ask before re-archiving |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Recommendations
|
||||||
|
|
||||||
|
1. **Commit to git:** `ansible-pull-deploy` or `Iron-Legion/documentation` should track this audit report.
|
||||||
|
2. **Archive cleanup:** After 30 days, delete `~/.hermes/skills/.archive/` if no restores requested.
|
||||||
|
3. **Profile restore:** If Bobby wants `mark44-proxy` or `mark5-proxy` again, restore from `profiles/.archive/`.
|
||||||
|
4. **Cron review:** Re-evaluate remaining 9 jobs in 2 weeks; pause any not firing meaningfully.
|
||||||
|
5. **Skills scout:** The `skills-scout` cron is active — it will flag new stale skills automatically.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**CLEAN complete. For you, sir? Always.**
|
||||||
73
PRDs/fleet-infrastructure-recovery.md
Normal file
73
PRDs/fleet-infrastructure-recovery.md
Normal file
@@ -0,0 +1,73 @@
|
|||||||
|
# Iron Legion Fleet Infrastructure Recovery — PRD
|
||||||
|
|
||||||
|
**Date:** 2026-05-27
|
||||||
|
**Author:** Artemis
|
||||||
|
**Status:** Approved / In Progress
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Problem Statement
|
||||||
|
|
||||||
|
Six infrastructure issues are blocking fleet observability, container management, DNS, and SSO. Each issue is independently broken, but some share root causes (Docker networking, TLS, service wiring).
|
||||||
|
|
||||||
|
## Success Criteria
|
||||||
|
|
||||||
|
| # | Criterion | Acceptable |
|
||||||
|
|---|-----------|------------|
|
||||||
|
| 1 | Portainer | Bobby can log in, see all stacks/containers |
|
||||||
|
| 2 | Technitium | API responds on port 5380, DNS records queryable |
|
||||||
|
|| 3 | ~~AdGuard~~ | ~~Container stopped, Homepage shows no AdGuard tile~~ | ~~Removed~~ | Technitium handles ad blocking |
|
||||||
|
| 4 | Traefik TLS | HTTPS works on `*.ai.home` with valid cert |
|
||||||
|
| 5 | Beszel | Every node + every container monitored in dashboard |
|
||||||
|
| 6 | Prometheus | 0 targets down, alert pipeline active |
|
||||||
|
|
||||||
|
## Scope
|
||||||
|
|
||||||
|
**In scope:** Diagnose and fix all 6 issues. Update Homepage config. Deploy Beszel agents. Reconfigure Prometheus targets. Generate/apply TLS certs.
|
||||||
|
|
||||||
|
**Out of scope:** Migrating services between nodes, adding new services, re-architecting network topology.
|
||||||
|
|
||||||
|
## Constraints
|
||||||
|
|
||||||
|
- No Docker or nginx proxies — bare metal + Docker Engine only
|
||||||
|
- All swarm compose files must exist on ALL nodes per Bobby's rule
|
||||||
|
- Stacks deploy ONLY on MK7 (manager)
|
||||||
|
- TLS must work for local `.ai.home` domains (no public DNS)
|
||||||
|
- Bobby reviews configs before destructive changes
|
||||||
|
|
||||||
|
## Execution Plan (Chunks)
|
||||||
|
|
||||||
|
| Chunk | Task | Estimated Time |
|
||||||
|
|-------|------|---------------|
|
||||||
|
| **A** | Discovery — scan fleet, identify what's running vs. configured | 15 min |
|
||||||
|
| **B** | AdGuard shutdown + Homepage cleanup | 10 min |
|
||||||
|
| **C** | Portainer admin reset | 10 min |
|
||||||
|
| **D** | Beszel agent deployment (all nodes) | 30 min |
|
||||||
|
| **E** | Prometheus 5 down targets — diagnose + fix | 20 min |
|
||||||
|
| **F** | Technitium API — container + port + auth | 15 min |
|
||||||
|
| **G** | Traefik TLS → Authelia enable | 30 min |
|
||||||
|
|
||||||
|
## Open Questions
|
||||||
|
|
||||||
|
1. Does Bobby want local CA certs (mkcert) or Cloudflare origin certs for `*.ai.home`?
|
||||||
|
2. Are any Prometheus down targets expected (e.g., Shield powered off, MK44 standby)?
|
||||||
|
3. Should Beszel monitor Docker containers per-node or just node-level metrics?
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Current Fleet State (To Be Updated by Chunk A)
|
||||||
|
|
||||||
|
| Node | Role | Tailscale IP | LAN IP | Status |
|
||||||
|
|------|------|-------------|--------|--------|
|
||||||
|
| MK7 | Swarm Manager / Docker | ? | 192.168.7.7 | ? |
|
||||||
|
| Artemis | Dashboard / Orchestrator | 100.100.97.18 | 192.168.15.182 | ? |
|
||||||
|
| Neo | Nextcloud/Vaultwarden/Trilium | ? | ? | ? |
|
||||||
|
| Shield | PXE Server | ? | ? | Powered off |
|
||||||
|
| MK33 | Physical Worker | ? | ? | ? |
|
||||||
|
| MK34 | Physical Worker | ? | ? | ? |
|
||||||
|
| MK39 | Physical Worker | ? | ? | ? |
|
||||||
|
| MK42 | Physical Worker | ? | ? | ? |
|
||||||
|
| MK44 | Hulkbuster (standby) | ? | ? | Hardware standby |
|
||||||
|
| MK5 | Suitcase (repurposed) | ? | ? | ? |
|
||||||
|
|
||||||
|
*Note: Populate IP/status data during Chunk A discovery.*
|
||||||
88
changelog/changelog-2026-05-31.md
Normal file
88
changelog/changelog-2026-05-31.md
Normal file
@@ -0,0 +1,88 @@
|
|||||||
|
# Changelog -- 2026-05-31 Fleet PXE + PegaProx Deployment
|
||||||
|
|
||||||
|
**Date:** 2026-05-31
|
||||||
|
**Author:** F.R.I.D.A.Y.
|
||||||
|
**Scope:** PXE remastered ISOs, PegaProx deployment, PVE node registration
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Changes Made
|
||||||
|
|
||||||
|
### 1. iVentoy Proxmox ISO Remastering
|
||||||
|
|
||||||
|
All four Proxmox VE 9.2 auto-install ISOs were remastered with:
|
||||||
|
- Embedded per-node answer URLs: `http://192.168.10.15:8080/pve/answers/mkNN.toml`
|
||||||
|
- UEFI `gfxmode` locked to `1024x768` (removed `640x480` fallback)
|
||||||
|
- Per-ISO answer files: `mk33.toml`, `mk34.toml`, `mk39.toml`, `mk42.toml`
|
||||||
|
|
||||||
|
**Verification:**
|
||||||
|
- `strings /opt/iventoy/iso/proxmox-mkNN-auto.iso | grep 192.168.10.15` confirmed embedded URLs
|
||||||
|
- `xorriso -cpx` extraction confirmed `gfxmode=1024x768` on all 4 ISOs
|
||||||
|
|
||||||
|
### 2. PegaProx Deployment on MK7
|
||||||
|
|
||||||
|
Deployed PegaProx Proxmox cluster manager to MK7 Swarm:
|
||||||
|
- Compose file: `/tmp/pegaprox_swarm.yml`
|
||||||
|
- Ports: `5000` (HTTPS), `5001` (VNC WebSocket), `5002` (SSH WebSocket)
|
||||||
|
- Publish mode: `host` (WebSocket incompatible with Swarm ingress)
|
||||||
|
- Network: `traefik-public` overlay
|
||||||
|
- SSL: Self-signed cert auto-generated (`CN=PegaProx`)
|
||||||
|
|
||||||
|
**Verification:**
|
||||||
|
- `docker stack deploy -c /tmp/pegaprox_swarm.yml pegaprox` succeeded
|
||||||
|
- Container healthy, API responding on `https://192.168.7.7:5000`
|
||||||
|
- Default login: `pegaprox` / `admin` (forces password change)
|
||||||
|
|
||||||
|
### 3. PVE Node Registration in PegaProx
|
||||||
|
|
||||||
|
Three nodes added to PegaProx cluster:
|
||||||
|
|
||||||
|
| Node | PegaProx ID | Host | Status |
|
||||||
|
|------|-------------|------|--------|
|
||||||
|
| MK-33 | `726eb477` | `192.168.7.33` | running |
|
||||||
|
| MK-34 | `df6f5e5d` | `192.168.7.34` | running |
|
||||||
|
| MK-39 | `9711704b` | `192.168.7.39` | running |
|
||||||
|
|
||||||
|
**API Notes Learned:**
|
||||||
|
- `host` field must be **bare IP only** (no `:8006`)
|
||||||
|
- CSRF protection requires `X-Requested-With: XMLHttpRequest`
|
||||||
|
- `/api/clusters` endpoint used for registration
|
||||||
|
|
||||||
|
### 4. Documentation Updates
|
||||||
|
|
||||||
|
Updated files:
|
||||||
|
- `fleet/admin-cheat-sheet.md` -- Added PegaProx section, updated node statuses, added iVentoy remastering notes
|
||||||
|
- `procedures/pega-prox-deploy.md` -- New procedure for deploying PegaProx on Swarm
|
||||||
|
- `procedures/iventoy-remaster-procedure.md` -- New procedure for remastering PVE ISOs
|
||||||
|
- `changelog/2026-05-31-pxe-pegaprox-deployment.md` -- This file
|
||||||
|
|
||||||
|
### 5. iVentoy Pro Upgrade -- Pending
|
||||||
|
|
||||||
|
Status: Awaiting private repo link from vendor. Current installation uses iVentoy Free. Pro upgrade may simplify per-node provisioning (per-MAC ISO binding feature expected).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Remaining Work
|
||||||
|
|
||||||
|
- MK-42: Not yet PXE-booted or installed
|
||||||
|
- PegaProx: Admin password change required (user in progress)
|
||||||
|
- iVentoy Pro: Upgrade pending vendor repo link
|
||||||
|
- LXC/cloud-init automation: Terraform templates for Docker Swarm restoration (next phase)
|
||||||
|
- Traefik DNS record: `pegaprox.ai.home` routing pending Traefik deployment on MK7
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Service Impact
|
||||||
|
|
||||||
|
| Service | Status | Notes |
|
||||||
|
|---------|--------|-------|
|
||||||
|
| iVentoy PXE | Ready | 4 remastered ISOs registered |
|
||||||
|
| PegaProx | Online | 3 PVE nodes connected |
|
||||||
|
| MK-33 | Online | PVE installed, registered |
|
||||||
|
| MK-34 | Online | PVE installed, registered |
|
||||||
|
| MK-39 | Online | PVE installed, registered |
|
||||||
|
| MK-42 | Offline | Pending PXE boot |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
*End of changelog*
|
||||||
152
dns-topology.md
Normal file
152
dns-topology.md
Normal file
@@ -0,0 +1,152 @@
|
|||||||
|
# DNS Topology — Iron Legion Homelab
|
||||||
|
|
||||||
|
**Last updated:** 2026-05-30
|
||||||
|
**Canonical source:** `Iron-Legion/documentation/dns-topology.md`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
All DNS resolution for the fleet is handled by **Technitium DNS Server** on MK7. AdGuard Home has been removed — Technitium's built-in ad blocking (blocklist-based) replaces it entirely.
|
||||||
|
|
||||||
|
**Single source of truth:** Technitium is both authoritative for the fleet's private zone and recursive for the public internet.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## DNS Architecture
|
||||||
|
|
||||||
|
```
|
||||||
|
Client Devices ──→ Router (primary, Cloudflare upstream)
|
||||||
|
│
|
||||||
|
└── Windows 11: secondary → MK7:53 (Technitium)
|
||||||
|
|
||||||
|
MK7 (Technitium DNS, port 53):
|
||||||
|
├── Authoritative zone: *.ai.home
|
||||||
|
│ └── artemis.ai.home, mk7.ai.home, mk44.ai.home, mk5.ai.home, mk33.ai.home, ...
|
||||||
|
├── Recursive resolver (root servers for public domains)
|
||||||
|
│ └── OR Cloudflare DoT forwarder: tls://1.1.1.1 (configurable)
|
||||||
|
└── Ad blocking: blocklist loaded (StevenBlack / OISD / hBlock — user-configured)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Service Details
|
||||||
|
|
||||||
|
| Attribute | Value |
|
||||||
|
|-----------|-------|
|
||||||
|
| **Service** | Technitium DNS Server |
|
||||||
|
| **Image** | `technitium/dns-server:latest` |
|
||||||
|
| **Host** | MK7 (`192.168.7.7`, `100.66.70.51` Tailscale) |
|
||||||
|
| **Published ports** | `53/tcp`, `53/udp` (DNS), `5380/tcp` (Web UI) |
|
||||||
|
| **Traefik host** | `dns.ai.home` |
|
||||||
|
| **Compose** | `/opt/iron-legion/docker-swarm/technitium/compose.yml` |
|
||||||
|
| **Data volume** | `technitium-config` (Docker volume) |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Upstream / Forwarder Config
|
||||||
|
|
||||||
|
| Setting | Value | Notes |
|
||||||
|
|---------|-------|-------|
|
||||||
|
| **Forwarder protocol** | DNS over TLS (DoT) | Encrypted queries to Cloudflare |
|
||||||
|
| **Forwarder address** | `tls://1.1.1.1` | Primary |
|
||||||
|
| **Fallback** | `tls://1.0.0.1` | Secondary (if configured) |
|
||||||
|
| **Root-server fallback** | Implicit | Technitium falls back to recursive resolution if forwarder fails |
|
||||||
|
|
||||||
|
**Web UI:** `http://dns.ai.home:5380` or `http://192.168.7.7:5380`
|
||||||
|
- Settings → DNS Server → Forwarders → Add `tls://1.1.1.1`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Ad Blocking
|
||||||
|
|
||||||
|
Technitium uses a **DNS blocklist** to drop ad/tracker/malware domains at resolution time.
|
||||||
|
|
||||||
|
| Setting | Value |
|
||||||
|
|---------|-------|
|
||||||
|
| **Blocklist source** | User-configured (e.g., StevenBlack, OISD, hBlock) |
|
||||||
|
| **Update interval** | User-configured (recommend: daily) |
|
||||||
|
| **Whitelist** | `.ai.home` internal zone never blocked |
|
||||||
|
| **Previous solution** | ~~AdGuard Home~~ — removed |
|
||||||
|
|
||||||
|
**Blocklist config:** Web UI → Settings → Blocking → Blocklists
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Zone: `ai.home`
|
||||||
|
|
||||||
|
Technitium is **authoritative** for `.ai.home`. Records are maintained via the web UI or API.
|
||||||
|
|
||||||
|
| Record Type | Examples |
|
||||||
|
|-------------|----------|
|
||||||
|
| **A** | `artemis.ai.home → 192.168.15.182` |
|
||||||
|
| **A** | `mk7.ai.home → 192.168.7.7` |
|
||||||
|
| **A** | `mk44.ai.home → 192.168.x.x` |
|
||||||
|
| **CNAME** | `dns.ai.home → mk7.ai.home` |
|
||||||
|
|
||||||
|
**Zone file location:** `/etc/dns/config/zones/ai.home` (inside container)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Client DNS Assignment
|
||||||
|
|
||||||
|
| Client | Primary DNS | Secondary DNS | Notes |
|
||||||
|
|--------|-------------|---------------|-------|
|
||||||
|
| **Router** | Cloudflare (1.1.1.1) | — | Default for all LAN devices |
|
||||||
|
| **Windows 11** | Router | MK7:53 (Technitium) | Ad blocking only on secondary |
|
||||||
|
| **Tailscale devices** | 100.100.100.100 (MagicDNS) | — | Split-brain: `.ai.home` → 192.168.7.7 |
|
||||||
|
|
||||||
|
**Fleet nodes** (MK33, MK34, MK39, MK42) resolve `.ai.home` against MK7:53 via their LAN gateway or static DNS assignment.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Tailscale Integration
|
||||||
|
|
||||||
|
Tailscale's **MagicDNS** and **split-brain DNS** handle `*.ai.home` for devices connected to the tailnet.
|
||||||
|
|
||||||
|
| Setting | Value |
|
||||||
|
|---------|-------|
|
||||||
|
| **Split DNS domain** | `ai.home` |
|
||||||
|
| **Nameserver** | `192.168.7.7` (MK7 LAN IP) |
|
||||||
|
| **Override local DNS** | Yes |
|
||||||
|
|
||||||
|
This means: a laptop on Tailscale resolving `artemis.ai.home` hits Tailscale's DNS, which forwards `ai.home` queries to `192.168.7.7` (Technitium). The laptop does NOT need to point its system DNS at MK7.
|
||||||
|
|
||||||
|
**Off-Tailscale:** Devices must point DNS at MK7:53 directly to resolve `.ai.home`.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Migration History
|
||||||
|
|
||||||
|
| Date | Change |
|
||||||
|
|------|--------|
|
||||||
|
| 2026-05-25 | AdGuard Home deployed on port 3000/5373 |
|
||||||
|
| 2026-05-28 | AdGuard paused (port conflict / redundancy concerns) |
|
||||||
|
| 2026-05-30 | **AdGuard removed.** Technitium blocklist configured. DoT to Cloudflare enabled. |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
| Symptom | Cause | Fix |
|
||||||
|
|---------|-------|-----|
|
||||||
|
| Can't resolve `.ai.home` | Device not using Technitium | Point DNS at MK7:53 or join Tailscale |
|
||||||
|
| Ads not blocked | Blocklist not loaded / outdated | Refresh blocklist in Technitium UI |
|
||||||
|
| Slow resolution | DoT forwarder failing | Check `tls://1.1.1.1` reachability; fall back to root recursion |
|
||||||
|
| Tailscale IPs unreachable | Device not on Tailscale | Connect to tailnet; 100.x IPs are VPN-only |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Operational Commands
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Test resolution from any node
|
||||||
|
dig @192.168.7.7 artemis.ai.home +short
|
||||||
|
dig @192.168.7.7 google.com +short
|
||||||
|
|
||||||
|
# Check Technitium container logs
|
||||||
|
ssh jarvis@mk7.ai.home "docker logs $(docker ps -q -f name=technitium)"
|
||||||
|
|
||||||
|
# Access web UI
|
||||||
|
open http://dns.ai.home:5380
|
||||||
|
```
|
||||||
206
fleet/admin-cheat-sheet.md
Normal file
206
fleet/admin-cheat-sheet.md
Normal file
@@ -0,0 +1,206 @@
|
|||||||
|
# Iron Legion Fleet Admin Cheat Sheet
|
||||||
|
|
||||||
|
Generated: 2026-05-31
|
||||||
|
Maintainer: F.R.I.D.A.Y. (Hermes Agent)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Quick Access Links
|
||||||
|
|
||||||
|
| Service | URL / Endpoint | Notes |
|
||||||
|
|---------|---------------|-------|
|
||||||
|
| iVentoy PXE Server | http://192.168.27.205:26000 | Shield WiFi fallback |
|
||||||
|
| PegaProx | https://192.168.7.7:5000 | PVE Cluster Manager (host mode) |
|
||||||
|
| Portainer | https://portainer.ai.home | Swarm Manager |
|
||||||
|
| Traefik Dashboard | https://traefik.ai.home:8080 | Proxy/Router |
|
||||||
|
| Technitium DNS | https://dns.ai.home:5380 | DNS Server |
|
||||||
|
| Beszel Monitoring | https://beszel.ai.home | Fleet Metrics |
|
||||||
|
| Dozzle | https://dozzle.ai.home | Container Logs |
|
||||||
|
| Homepage | https://home.ai.home | Service Portal |
|
||||||
|
| Prometheus | https://prometheus.ai.home | Metrics DB |
|
||||||
|
| Authelia | https://auth.ai.home | SSO Portal |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Fleet Node Inventory
|
||||||
|
|
||||||
|
### Swarm Manager
|
||||||
|
|
||||||
|
- Hostname: mark-vii.ai.home
|
||||||
|
- Armor Code: MK-7
|
||||||
|
- LAN IP: 192.168.7.7
|
||||||
|
- Tailscale IP: 100.66.70.51
|
||||||
|
- Role: Swarm Manager, DNS, Traefik, Portainer, PegaProx
|
||||||
|
- CPUs: 18 | RAM: 15 GB | Disk: 916 GB
|
||||||
|
|
||||||
|
### Worker Nodes G9 (Proxmox VE)
|
||||||
|
|
||||||
|
| Armor | Hostname | LAN IP | Tailscale IP | MAC | Status |
|
||||||
|
|-------|----------|--------|--------------|-----|--------|
|
||||||
|
| MK-33 | mk33.ai.home | 192.168.7.33 | TBD | E0-51-D8-1C-5D-56 | Online (PVE) |
|
||||||
|
| MK-34 | mk34.ai.home | 192.168.7.34 | TBD | E0-51-D8-1C-5C-75 | Online (PVE) |
|
||||||
|
| MK-39 | mk39.ai.home | 192.168.7.39 | TBD | PENDING | Online (PVE) |
|
||||||
|
| MK-42 | mk42.ai.home | 192.168.7.42 | TBD | PENDING | Not Installed |
|
||||||
|
|
||||||
|
### Utility Nodes
|
||||||
|
|
||||||
|
| Armor | Hostname | LAN IP | Tailscale IP | Role |
|
||||||
|
|-------|----------|--------|--------------|------|
|
||||||
|
| Neo | nebuchadnezzar.ai.home | 192.168.192.24 | 100.99.123.16 | Nextcloud AIO, Gitea |
|
||||||
|
| MK-44 | mark44.ai.home | 192.168.5.214 | TBD | Ollama GPU |
|
||||||
|
| MK-5 | mark5.ai.home | 192.168.6.5 | TBD | TBD |
|
||||||
|
| Shield | shield.ai.home | 192.168.10.15 / 192.168.27.205 | - | PXE/iVentoy Server |
|
||||||
|
| Artemis | artemis.ai.home | 192.168.15.182 | 100.100.97.18 | Discord Gateway |
|
||||||
|
|
||||||
|
### Mission Control
|
||||||
|
|
||||||
|
- Hostname: mission-control.ai.home
|
||||||
|
- OS: Windows 11
|
||||||
|
- Role: Workstation
|
||||||
|
- Type: Separate physical machine
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## PegaProx — Proxmox VE Cluster Manager
|
||||||
|
|
||||||
|
| Attribute | Value |
|
||||||
|
|-----------|-------|
|
||||||
|
| **Host** | MK7 (192.168.7.7) |
|
||||||
|
| **Ports** | 5000 (HTTPS UI/API), 5001 (VNC WebSocket), 5002 (SSH WebSocket) |
|
||||||
|
| **Deploy mode** | Docker Swarm — `host` publish mode |
|
||||||
|
| **Network** | `traefik-public` overlay |
|
||||||
|
| **SSL** | Self-signed cert (`CN=PegaProx`, auto-generated) |
|
||||||
|
| **Default user** | `pegaprox` (password change required on first login) |
|
||||||
|
| **Cluster IDs** | MK33=`726eb477`, MK34=`df6f5e5d`, MK39=`9711704b` |
|
||||||
|
|
||||||
|
**Admin password must be changed on first login.**
|
||||||
|
|
||||||
|
**API notes:**
|
||||||
|
- Add cluster: `host` field must be **bare IP only** (no `:8006` — PegaProx appends port internally)
|
||||||
|
- CSRF protection requires `X-Requested-With: XMLHttpRequest` on state-changing API calls
|
||||||
|
- Exempt paths: `/api/auth/login`, `/api/auth/setup`, `/api/health`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## iVentoy PXE Configuration
|
||||||
|
|
||||||
|
- Server: shield.ai.home -- 192.168.10.15/27
|
||||||
|
- WebUI: http://192.168.27.205:26000
|
||||||
|
- Subnet: 192.168.10.0/27
|
||||||
|
- Pool: 192.168.10.20 to 192.168.10.30
|
||||||
|
- MAC Filter: Permit mode
|
||||||
|
- Edition: **iVentoy Free** (Pro upgrade pending -- private repo link awaited)
|
||||||
|
|
||||||
|
### Registered ISOs
|
||||||
|
|
||||||
|
| ISO | Node | Purpose |
|
||||||
|
|-----|------|---------|
|
||||||
|
| proxmox-mk33-auto.iso | MK-33 | PVE 9.2 Auto-Install |
|
||||||
|
| proxmox-mk34-auto.iso | MK-34 | PVE 9.2 Auto-Install |
|
||||||
|
| proxmox-mk39-auto.iso | MK-39 | PVE 9.2 Auto-Install |
|
||||||
|
| proxmox-mk42-auto.iso | MK-42 | PVE 9.2 Auto-Install |
|
||||||
|
| proxmox-ve_9.2-1.iso | - | Original PVE ISO |
|
||||||
|
| ubuntu-24.04.3-live-server-amd64.iso | - | Ubuntu Autoinstall |
|
||||||
|
|
||||||
|
### Whitelisted MACs
|
||||||
|
|
||||||
|
- E0-51-D8-1C-5D-CA (Legacy)
|
||||||
|
- E0-51-D8-1C-5D-5C (Legacy)
|
||||||
|
- E0-51-D8-1C-5D-56 (MK-33)
|
||||||
|
- E0-51-D8-1C-5C-75 (MK-34)
|
||||||
|
- PENDING: MK-39
|
||||||
|
- PENDING: MK-42
|
||||||
|
|
||||||
|
Post-Install: Remove MAC from whitelist. Node boots local disk, gets production IP.
|
||||||
|
|
||||||
|
### ISO Remastering Notes
|
||||||
|
|
||||||
|
All Proxmox auto-install ISOs are **remastered** with:
|
||||||
|
1. **Embedded answer URL** -- each ISO points to `http://192.168.10.15:8080/pve/answers/mkNN.toml` (server URL hardcoded; node IP assigned by DHCP)
|
||||||
|
2. **UEFI gfxmode locked** -- strict `1024x768` (fallback `640x480` removed)
|
||||||
|
3. **Per-ISO answer files** -- `mk33.toml`, `mk34.toml`, `mk39.toml`, `mk42.toml` in `/opt/iventoy/user/answers/`
|
||||||
|
|
||||||
|
> iVentoy Free does NOT support per-MAC ISO binding. Remastered ISOs achieve per-node provisioning via embedded answer URLs.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## DNS Records
|
||||||
|
|
||||||
|
### CNAME to traefik.ai.home -- A: 192.168.7.7
|
||||||
|
|
||||||
|
- artemis.ai.home
|
||||||
|
- hermes.ai.home
|
||||||
|
- n8n.ai.home
|
||||||
|
- pgadmin.ai.home
|
||||||
|
- portainer.ai.home
|
||||||
|
- beszel.ai.home
|
||||||
|
- dozzle.ai.home
|
||||||
|
- prometheus.ai.home
|
||||||
|
- homepage.ai.home
|
||||||
|
- auth.ai.home
|
||||||
|
- dns.ai.home
|
||||||
|
|
||||||
|
### A Records
|
||||||
|
|
||||||
|
- traefik.ai.home -> 192.168.7.7
|
||||||
|
- mk7.ai.home -> 192.168.7.7
|
||||||
|
- mk33.ai.home -> 192.168.7.33
|
||||||
|
- mk34.ai.home -> 192.168.7.34
|
||||||
|
- mk39.ai.home -> 192.168.7.39
|
||||||
|
- mk42.ai.home -> 192.168.7.42
|
||||||
|
- mark44.ai.home -> 192.168.5.214
|
||||||
|
- mark5.ai.home -> 192.168.6.5
|
||||||
|
- nebuchadnezzar.ai.home -> 192.168.192.24
|
||||||
|
- shield.ai.home -> 192.168.10.15
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## SSH Topology
|
||||||
|
|
||||||
|
Portable Host (F.R.I.D.A.Y.)
|
||||||
|
|
|
||||||
|
+---> artemis.ai.home via id_ed25519
|
||||||
|
| +---> mk7.ai.home via artemis_key
|
||||||
|
|
|
||||||
|
+---> shield via jarvis user
|
||||||
|
| +---> PXE subnet 192.168.10.0/27
|
||||||
|
|
|
||||||
|
+---> mk33-42 via bobby user (legacy subnet)
|
||||||
|
|
|
||||||
|
+---> nebuchadnezzar via jarvis user
|
||||||
|
|
||||||
|
Key Files:
|
||||||
|
- ~/.ssh/id_ed25519 -- bobby@cinnamint
|
||||||
|
- ~/.ssh/artemis_key -- MK7 jump-host
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Armor Codenames
|
||||||
|
|
||||||
|
| Code | Name | System |
|
||||||
|
|------|------|--------|
|
||||||
|
| MK-7 | Mark VII | Swarm Manager |
|
||||||
|
| MK-33 | Silver Centurion | Worker |
|
||||||
|
| MK-34 | Igor | Worker |
|
||||||
|
| MK-39 | Starboost | Worker |
|
||||||
|
| MK-42 | Bones | Worker |
|
||||||
|
| MK-44 | Hulkbuster | GPU/Ollama |
|
||||||
|
| MK-5 | Mark 5 | TBD |
|
||||||
|
| J.A.R.V.I.S. | Judicious Automated... | Dashboard |
|
||||||
|
| F.R.I.D.A.Y. | Field-Ready Runtime... | Portable Agent |
|
||||||
|
| A.R.T.E.M.I.S. | Advanced Real-Time... | Discord |
|
||||||
|
| NEO | Nebuchadnezzar | Nextcloud |
|
||||||
|
| SHIELD | - | PXE Server |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Notes
|
||||||
|
|
||||||
|
- iVentoy Free does NOT support per-MAC ISO binding.
|
||||||
|
- Shield PXE subnet isolated via ip_forward=0.
|
||||||
|
- Mission Control is separate physical machine.
|
||||||
|
- All *.ai.home resolve via Technitium DNS.
|
||||||
|
- PegaProx deployed on MK7 Swarm in `host` mode (not routed through Traefik).
|
||||||
|
- iVentoy Pro upgrade pending -- private repo link awaited from vendor.
|
||||||
|
|
||||||
|
Last updated: 2026-05-31 by F.R.I.D.A.Y.
|
||||||
@@ -76,7 +76,7 @@ This PRD is append-only for new services. Modifications to existing entries requ
|
|||||||
|
|
||||||
| Node | Role | Services Assigned |
|
| Node | Role | Services Assigned |
|
||||||
|------|------|-------------------|
|
|------|------|-------------------|
|
||||||
| **MK7 (mark-vii.ai.home)** | Swarm Manager | ALL Phase 1 infrastructure: Traefik, Technitium DNS, AdGuard Home, Portainer, Prometheus, Beszel, Dozzle, Authelia, Homepage |
|
| **MK7 (mark-vii.ai.home)** | Swarm Manager | ALL Phase 1 infrastructure: Traefik, Technitium DNS, Portainer, Prometheus, Beszel, Dozzle, Authelia, Homepage |
|
||||||
| **MK33, MK34, MK39, MK42** | Swarm Workers | Phase 2 media stack (Jellyfin, Sonarr, Radarr, Prowlarr), distributed workloads, Vaultwarden, Nextcloud |
|
| **MK33, MK34, MK39, MK42** | Swarm Workers | Phase 2 media stack (Jellyfin, Sonarr, Radarr, Prowlarr), distributed workloads, Vaultwarden, Nextcloud |
|
||||||
| **Artemis** | AI Foreman / JARVIS | Hermes Agent, Ansible-pull control plane — NOT a service host |
|
| **Artemis** | AI Foreman / JARVIS | Hermes Agent, Ansible-pull control plane — NOT a service host |
|
||||||
|
|
||||||
@@ -116,8 +116,8 @@ This PRD is append-only for new services. Modifications to existing entries requ
|
|||||||
| Service | Image | Pulls | Stars | Updated | Placement | Notes |
|
| Service | Image | Pulls | Stars | Updated | Placement | Notes |
|
||||||
|---------|-------|-------|-------|---------|-----------|-------|
|
|---------|-------|-------|-------|---------|-----------|-------|
|
||||||
| **Traefik** | `traefik` | 3.49B | 3,634 | 2026-05-13 | **Global** | Every node receives ingress routing + Docker socket read-only |
|
| **Traefik** | `traefik` | 3.49B | 3,634 | 2026-05-13 | **Global** | Every node receives ingress routing + Docker socket read-only |
|
||||||
| **Technitium DNS** | `technitium/dns-server` | 8.99M | 156 | 2026-05-09 | **Manager Constraint** | Single authoritative DNS — port 53 on MK7 only |
|
| **Technitium DNS** | `technitium/dns-server` | 8.99M | 156 | 2026-05-09 | **Manager Constraint** | Authoritative `.ai.home` + recursive DNS with DoT forwarder to Cloudflare, ad blocking enabled — port 53 on MK7 only |
|
||||||
| **AdGuard Home** | `adguard/adguardhome` | 170.7M | 1,408 | 2026-05-25 | **Replicated (2)** | 2 replicas across workers for redundancy — port 3000 |
|
| **~~AdGuard Home~~** | ~~`adguard/adguardhome`~~ | ~~170.7M~~ | ~~1,408~~ | ~~2026-05-25~~ | ~~**Removed**~~ | ~~Replaced by Technitium built-in ad blocking~~ |
|
||||||
|
|
||||||
### Monitoring / Observability
|
### Monitoring / Observability
|
||||||
| Service | Image | Pulls | Stars | Updated | Placement | Notes |
|
| Service | Image | Pulls | Stars | Updated | Placement | Notes |
|
||||||
@@ -126,13 +126,13 @@ This PRD is append-only for new services. Modifications to existing entries requ
|
|||||||
| **Prometheus Node Exporter** | `prom/node-exporter` | — | — | — | **Global** | Runs on every node — scrapes CPU/mem/disk |
|
| **Prometheus Node Exporter** | `prom/node-exporter` | — | — | — | **Global** | Runs on every node — scrapes CPU/mem/disk |
|
||||||
| **Grafana** | `grafana/grafana` | 5.22B | 3,540 | 2026-05-16 | **Replicated (1)** | Any worker (Phase 3, needs data history first) |
|
| **Grafana** | `grafana/grafana` | 5.22B | 3,540 | 2026-05-16 | **Replicated (1)** | Any worker (Phase 3, needs data history first) |
|
||||||
| **Beszel Hub** | `henrygd/beszel` | 12.58M | 32 | 2026-04-30 | **Manager Constraint** | Central hub on MK7 collects metrics from agents |
|
| **Beszel Hub** | `henrygd/beszel` | 12.58M | 32 | 2026-04-30 | **Manager Constraint** | Central hub on MK7 collects metrics from agents |
|
||||||
| **Beszel Agent** | `henrygd/beszel-agent` | — | — | — | **Global** | Runs on every node — reports to hub |
|
| **Beszel Agent** | `henrygd/beszel-agent` | — | — | — | **Pending** | Planned global — reports to hub. Not yet deployed. |
|
||||||
| **Dozzle** | `amir20/dozzle` | 309.6M | 144 | 2026-05-25 | **Replicated (1)** | Any worker — read-only Docker socket |
|
| **Dozzle** | `amir20/dozzle` | 309.6M | 144 | 2026-05-25 | **Replicated (1)** | Any worker — read-only Docker socket |
|
||||||
|
|
||||||
### Management / Dashboard
|
### Management / Dashboard
|
||||||
| Service | Image | Pulls | Stars | Updated | Placement | Notes |
|
| Service | Image | Pulls | Stars | Updated | Placement | Notes |
|
||||||
|---------|-------|-------|-------|---------|-----------|-------|
|
|---------|-------|-------|-------|---------|-----------|-------|
|
||||||
| **Portainer CE** | `portainer/portainer-ce` | 1.46B | 2,665 | 2026-05-20 | **Manager Constraint** | MK7 only — agentless mode, no portainer-agent needed |
|
| **Portainer CE** | `portainer/portainer-ce` | 1.46B | 2,665 | 2026-05-20 | **Replicated (1)** | MK7 — agentless mode, no portainer-agent needed |
|
||||||
| **Homepage** | `gethomepage/homepage` | 1.31M | 40 | 2026-05-25 | **Replicated (1)** | Any worker — all endpoints via env vars |
|
| **Homepage** | `gethomepage/homepage` | 1.31M | 40 | 2026-05-25 | **Replicated (1)** | Any worker — all endpoints via env vars |
|
||||||
|
|
||||||
### Security / Identity
|
### Security / Identity
|
||||||
@@ -187,16 +187,27 @@ This PRD is append-only for new services. Modifications to existing entries requ
|
|||||||
| Nextcloud (MK7) | PostgreSQL (MK7) | TCP | 5432 | DB traffic over Tailscale |
|
| Nextcloud (MK7) | PostgreSQL (MK7) | TCP | 5432 | DB traffic over Tailscale |
|
||||||
|
|
||||||
## DNS Resolution
|
## DNS Resolution
|
||||||
- **Technitium (MK7)** is the authoritative internal DNS for `*.ai.home`.
|
|
||||||
- **AdGuard Home (MK7)** handles recursive resolution with ad-block lists. Replaces Pi-hole.
|
| Component | Status | Detail |
|
||||||
- **Chain:** Client → Technitium (local record?) → AdGuard Home (recursive + blocklist) → Upstream (Cloudflare/Quad9)
|
|-----------|--------|--------|
|
||||||
- **Tailscale MagicDNS** remains enabled as fallback. If Technitium fails, clients fall back to `100.x.x.x` direct resolution.
|
| **Technitium (MK7)** | ✅ Deployed | Container running, port 53/5380 open |
|
||||||
- **AdGuard Home admin UI** runs on port 3000 by default (separate from Grafana if co-located).
|
| **`*.ai.home` zone** | ⏳ Pending | Not yet configured as authoritative — Tailscale MagicDNS currently handles name resolution |
|
||||||
|
| **Technitium DNS (MK7)** | ✅ Active | Authoritative `.ai.home` + recursive resolver + ad blocking on port 53. |
|
||||||
|
| **~~AdGuard Home~~** | ~~Removed~~ | ~~Replaced by Technitium built-in ad blocking~~ |
|
||||||
|
|
||||||
|
**Planned Chain (not yet active):**
|
||||||
|
```
|
||||||
|
Client → Technitium (authoritative `.ai.home`? → return local record) → Technitium (recursive resolver + blocklist) → Cloudflare DoT / Root Servers
|
||||||
|
```
|
||||||
|
|
||||||
|
**Current Fallback:** Tailscale MagicDNS provides `*.ai.home` resolution via Tailscale IP addresses. Technitium will assume authority once zone records are populated.
|
||||||
|
|
||||||
|
- **Technitium DNS admin UI** runs on port 5380.
|
||||||
|
|
||||||
## Port Allocation (Reserved)
|
## Port Allocation (Reserved)
|
||||||
| Port | Service |
|
| Port | Service |
|
||||||
|------|---------|
|
|------|---------|
|
||||||
| 53 | DNS (Technitium / Pi-hole) |
|
| 53 | DNS (Technitium) |
|
||||||
| 80/443 | HTTP/S (Traefik) |
|
| 80/443 | HTTP/S (Traefik) |
|
||||||
| 3000 | Grafana |
|
| 3000 | Grafana |
|
||||||
| 9090 | Prometheus |
|
| 9090 | Prometheus |
|
||||||
@@ -232,7 +243,7 @@ Every service with persistent state uses **bind mounts to on-node directories**.
|
|||||||
|---------|-----------|---------------|---------------|
|
|---------|-----------|---------------|---------------|
|
||||||
| **Traefik** | `/opt/iron-legion/traefik/config/` `/opt/iron-legion/traefik/certs/` | MK7 (daily rsync) | < 50 MB |
|
| **Traefik** | `/opt/iron-legion/traefik/config/` `/opt/iron-legion/traefik/certs/` | MK7 (daily rsync) | < 50 MB |
|
||||||
| **Technitium DNS** | `/opt/iron-legion/technitium/config/` | MK7 | < 10 MB |
|
| **Technitium DNS** | `/opt/iron-legion/technitium/config/` | MK7 | < 10 MB |
|
||||||
| **Pi-hole** | `/opt/iron-legion/pihole/etc-pihole/` `/opt/iron-legion/pihole/etc-dnsmasq.d/` | MK7 | < 500 MB |
|
| **~~AdGuard Home~~** | ~~`/opt/iron-legion/adguard/work/`~~ ~~`/opt/iron-legion/adguard/conf/`~~ | ~~Removed~~ | ~~N/A~~ |
|
||||||
| **Prometheus** | `/opt/iron-legion/prometheus/data/` | MK7 (retention: 15d local, 90d backup) | 5–20 GB |
|
| **Prometheus** | `/opt/iron-legion/prometheus/data/` | MK7 (retention: 15d local, 90d backup) | 5–20 GB |
|
||||||
| **Grafana** | `/opt/iron-legion/grafana/data/` | MK7 | < 500 MB |
|
| **Grafana** | `/opt/iron-legion/grafana/data/` | MK7 | < 500 MB |
|
||||||
| **Beszel** | `/opt/iron-legion/beszel/data/` | MK7 | < 1 GB |
|
| **Beszel** | `/opt/iron-legion/beszel/data/` | MK7 | < 1 GB |
|
||||||
@@ -302,7 +313,7 @@ traefik.http.middlewares.authelia.forwardauth.address: http://authelia:9091/api/
|
|||||||
- **No VLANs.** Tailscale ACLs handle segment isolation.
|
- **No VLANs.** Tailscale ACLs handle segment isolation.
|
||||||
- **ACL policy (draft):**
|
- **ACL policy (draft):**
|
||||||
- `tag:admin` nodes (Bobby, Artemis) → all ports on all nodes
|
- `tag:admin` nodes (Bobby, Artemis) → all ports on all nodes
|
||||||
- `tag:services` (MK7, MK7, MK7, MK7) → only their assigned service ports, no cross-node SSH except via Tailscale SSH
|
- `tag:services` (MK7 manager + MK33, MK34, MK39, MK42 workers) → only their assigned service ports, no cross-node SSH except via Tailscale SSH
|
||||||
- `tag:user` (Bobby's phone, laptop) → HTTPS 443 on MK7 only, Jellyfin 8096 on MK7 directly
|
- `tag:user` (Bobby's phone, laptop) → HTTPS 443 on MK7 only, Jellyfin 8096 on MK7 directly
|
||||||
- **Default deny.** Any traffic not explicitly allowed in Tailscale ACL is dropped.
|
- **Default deny.** Any traffic not explicitly allowed in Tailscale ACL is dropped.
|
||||||
|
|
||||||
@@ -321,7 +332,8 @@ traefik.http.middlewares.authelia.forwardauth.address: http://authelia:9091/api/
|
|||||||
| Order | Service | Target Node | Why First | Dependencies |
|
| Order | Service | Target Node | Why First | Dependencies |
|
||||||
|-------|---------|-------------|-----------|--------------|
|
|-------|---------|-------------|-----------|--------------|
|
||||||
| 1 | **Technitium DNS** | MK7 | Name resolution for internal services | None |
|
| 1 | **Technitium DNS** | MK7 | Name resolution for internal services | None |
|
||||||
| 2 | **Pi-hole** | MK7 | Recursive DNS + ad-block | Technitium (via conditional forwarding) |
|
| 2 | **Technitium DNS** | MK7 | Authoritative + recursive + ad-block | N/A — single service |
|
||||||
|
| ~~AdGuard Home~~ | ~~Removed~~ | ~~—~~ | ~~Technitium replaces AdGuard~~ |
|
||||||
| 3 | **Traefik** | MK7 | Edge router for all HTTP ingress | DNS (needs `*.labs.internal` to resolve) |
|
| 3 | **Traefik** | MK7 | Edge router for all HTTP ingress | DNS (needs `*.labs.internal` to resolve) |
|
||||||
| 4 | **Authelia** | MK7 | Auth layer before exposing any mgmt UI | Traefik (depends on ForwardAuth middleware) |
|
| 4 | **Authelia** | MK7 | Auth layer before exposing any mgmt UI | Traefik (depends on ForwardAuth middleware) |
|
||||||
| 5 | **Portainer** | MK7 | Container management UI | Traefik + Authelia (for secured access) |
|
| 5 | **Portainer** | MK7 | Container management UI | Traefik + Authelia (for secured access) |
|
||||||
@@ -375,7 +387,7 @@ traefik.http.middlewares.authelia.forwardauth.address: http://authelia:9091/api/
|
|||||||
|---|----------|--------|----------------------|
|
|---|----------|--------|----------------------|
|
||||||
| 1 | **Domain name** — Does Bobby own a domain (e.g., `bobbysh.me`) or do we use a fake TLD (`labs.internal`)? | **Critical** — TLS certs, Authelia, and DNS all depend on this. | Use `labs.internal` + self-signed CA |
|
| 1 | **Domain name** — Does Bobby own a domain (e.g., `bobbysh.me`) or do we use a fake TLD (`labs.internal`)? | **Critical** — TLS certs, Authelia, and DNS all depend on this. | Use `labs.internal` + self-signed CA |
|
||||||
| 2 | **Technitium upstream** — DoH, DoT, or plain UDP to upstream resolver (e.g., Cloudflare 1.1.1.1)? | Low — can default to DoH | DoH → `https://cloudflare-dns.com/dns-query` |
|
| 2 | **Technitium upstream** — DoH, DoT, or plain UDP to upstream resolver (e.g., Cloudflare 1.1.1.1)? | Low — can default to DoH | DoH → `https://cloudflare-dns.com/dns-query` |
|
||||||
| 3 | **Pi-hole vs Technitium conflict** — Both run on MK7 port 53. Run Pi-hole on non-standard port with Technitium as conditional forwarder? Or separate nodes? | **Critical** — port 53 collision | Technitium on 53, Pi-hole on 5053, forward to Pi-hole from Technitium |
|
| 3 | **AdGuard Home vs Technitium layout** — AdGuard runs on port 3000, Technitium on 53. No collision, but conditional forwarding from Technitium to AdGuard needs config. | Low — both run independently | Technitium uses upstream AdGuard for recursive queries |
|
||||||
| 4 | **Jellyfin media storage** — External USB on MK7? SMB share? NVMe? | Medium | External USB mounted at `/media` on MK7 |
|
| 4 | **Jellyfin media storage** — External USB on MK7? SMB share? NVMe? | Medium | External USB mounted at `/media` on MK7 |
|
||||||
| 5 | **Backup target on MK7** — Capacity? Dedicated drive? Rsync target path? | Medium | `/backups/<service-name>/` on MK7 secondary storage |
|
| 5 | **Backup target on MK7** — Capacity? Dedicated drive? Rsync target path? | Medium | `/backups/<service-name>/` on MK7 secondary storage |
|
||||||
| 6 | **Nextcloud database** — Use existing PostgreSQL on MK7, or deploy Nextcloud AIO (bundled)? | Medium — affects resource allocation on MK7 | Deploy standalone PostgreSQL container on MK7 for Nextcloud AIO is too heavy |
|
| 6 | **Nextcloud database** — Use existing PostgreSQL on MK7, or deploy Nextcloud AIO (bundled)? | Medium — affects resource allocation on MK7 | Deploy standalone PostgreSQL container on MK7 for Nextcloud AIO is too heavy |
|
||||||
@@ -385,7 +397,7 @@ traefik.http.middlewares.authelia.forwardauth.address: http://authelia:9091/api/
|
|||||||
| 10 | **Beszel alert thresholds** — CPU %, memory %, disk % triggers not defined. | Low | Defaults in Beszel container |
|
| 10 | **Beszel alert thresholds** — CPU %, memory %, disk % triggers not defined. | Low | Defaults in Beszel container |
|
||||||
|
|
||||||
## Outstanding Decisions Required
|
## Outstanding Decisions Required
|
||||||
1. **Pi-hole inclusion** — Not in Bobby's original list. I added it as a DNS-layer complement to Technitium. **Remove if Bobby doesn't want it.**
|
1. ~~Pi-hole inclusion~~ — **Resolved.** Technitium built-in ad blocking replaces Pi-hole.
|
||||||
2. **Authelia two-factor method** — TOTP via app (Google Authenticator) vs WebAuthn/FIDO2 keys?
|
2. **Authelia two-factor method** — TOTP via app (Google Authenticator) vs WebAuthn/FIDO2 keys?
|
||||||
3. **Home vs remote access** — If Bobby wants to share Jellyfin with friends/family outside Tailscale, public domain + Authelia guard is required.
|
3. **Home vs remote access** — If Bobby wants to share Jellyfin with friends/family outside Tailscale, public domain + Authelia guard is required.
|
||||||
|
|
||||||
@@ -411,10 +423,9 @@ traefik.http.middlewares.authelia.forwardauth.address: http://authelia:9091/api/
|
|||||||
| Prowlarr | `linuxserver/prowlarr` | `linuxserver` | 35,913,487 | 403 | 2026-05-25 | ✅ 200 |
|
| Prowlarr | `linuxserver/prowlarr` | `linuxserver` | 35,913,487 | 403 | 2026-05-25 | ✅ 200 |
|
||||||
| Vaultwarden | `vaultwarden/server` | `vaultwarden` | 287,182,978 | 1,454 | 2026-05-17 | ✅ 200 |
|
| Vaultwarden | `vaultwarden/server` | `vaultwarden` | 287,182,978 | 1,454 | 2026-05-17 | ✅ 200 |
|
||||||
| Nextcloud | `nextcloud` | `library` | 1,011,978,204 | 4,485 | 2026-05-23 | ✅ 200 |
|
| Nextcloud | `nextcloud` | `library` | 1,011,978,204 | 4,485 | 2026-05-23 | ✅ 200 |
|
||||||
| Pi-hole | `pihole/pihole` | `pihole` | 961,220,209 | 2,943 | 2026-05-25 | ✅ 200 |
|
| **Authelia** | `authelia/authelia` | `authelia` | 75,183,682 | 208 | 2026-05-25 | ✅ 200 |
|
||||||
| Authelia | `authelia/authelia` | `authelia` | 75,183,682 | 208 | 2026-05-25 | ✅ 200 |
|
|
||||||
|
|
||||||
**Total unique images:** 16 (including Pi-hole)
|
**Total unique images:** 15
|
||||||
**Community health indicator:** All images have > 10 stars, > 1M pulls (except Beszel 32 stars, Homepage 40 stars — acceptable for young projects)
|
**Community health indicator:** All images have > 10 stars, > 1M pulls (except Beszel 32 stars, Homepage 40 stars — acceptable for young projects)
|
||||||
**Freshness:** All updated within 90 days except Beszel (30 days — still acceptable)
|
**Freshness:** All updated within 90 days except Beszel (30 days — still acceptable)
|
||||||
|
|
||||||
@@ -423,7 +434,7 @@ traefik.http.middlewares.authelia.forwardauth.address: http://authelia:9091/api/
|
|||||||
~/.ansible-repo/new-build/
|
~/.ansible-repo/new-build/
|
||||||
├── phase-1/ # Infrastructure
|
├── phase-1/ # Infrastructure
|
||||||
│ ├── technitium/
|
│ ├── technitium/
|
||||||
│ ├── pihole/
|
│ ├── adguard/
|
||||||
│ ├── traefik/
|
│ ├── traefik/
|
||||||
│ ├── authelia/
|
│ ├── authelia/
|
||||||
│ ├── portainer/
|
│ ├── portainer/
|
||||||
|
|||||||
@@ -5,9 +5,9 @@
|
|||||||
| Chunk | Status | Commit | Notes |
|
| Chunk | Status | Commit | Notes |
|
||||||
|-------|--------|--------|-------|
|
|-------|--------|--------|-------|
|
||||||
| Chunk 1 — Purpose, Scope, Success Criteria | ✅ Complete | `73e42cc` | Merged into `homelab-services-stack-prd.md` |
|
| Chunk 1 — Purpose, Scope, Success Criteria | ✅ Complete | `73e42cc` | Merged into `homelab-services-stack-prd.md` |
|
||||||
| Chunk 2 — Constraints, Service Catalog, Network Architecture | 🔄 In Progress | — | Awaiting completion |
|
| Chunk 2 — Constraints, Service Catalog, Network Architecture | ✅ Complete | `a3fc718` | Reconciled with live fleet |
|
||||||
| Chunk 3 — Data & Persistence, Security Model | ⏳ Pending | — | Blocked on Chunk 2 |
|
| Chunk 3 — Data & Persistence, Security Model | ✅ Complete | `b7cc09c` | Pi-hole fully removed, Technitium ad blocking canonical. ACL policy corrected. Split files + master PRD in sync. |
|
||||||
| Chunk 4 — Deployment Phases, Open Questions, Appendix | ⏳ Pending | — | Blocked on Chunk 3 |
|
| Chunk 4 — Deployment Phases, Open Questions, Appendix | ✅ Complete | `f18b978` | All Pi-hole references purged. Split files + master PRD in sync. |
|
||||||
|
|
||||||
## Operational Documentation
|
## Operational Documentation
|
||||||
|
|
||||||
|
|||||||
238
procedures/iventoy-remaster-procedure.md
Normal file
238
procedures/iventoy-remaster-procedure.md
Normal file
@@ -0,0 +1,238 @@
|
|||||||
|
# Procedure: Remaster Proxmox VE ISOs for iVentoy Auto-Install
|
||||||
|
|
||||||
|
**Scope:** Remaster stock Proxmox VE ISOs with embedded auto-install answer URLs and locked UEFI gfxmode for PXE boot via iVentoy.
|
||||||
|
**Author:** F.R.I.D.A.Y.
|
||||||
|
**Date:** 2026-05-31
|
||||||
|
**Prerequisites:** Stock Proxmox VE ISO, `xorriso`, Python 3, iVentoy PXE server running.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 1. Overview
|
||||||
|
|
||||||
|
iVentoy Free does NOT support per-MAC ISO binding. To provision each node with its own network config (IP, gateway, etc.), we remaster the stock Proxmox ISO:
|
||||||
|
|
||||||
|
1. Embed an `auto-installer-mode.toml` file pointing to a per-node answer file
|
||||||
|
2. Lock UEFI `gfxmode` to `1024x768` (remove `640x480` fallback)
|
||||||
|
3. Each ISO points to its own answer URL: `http://192.168.10.15:8080/pve/answers/mkNN.toml`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 2. Answer File Structure
|
||||||
|
|
||||||
|
### iVentoy Answer Server
|
||||||
|
|
||||||
|
iVentoy runs a built-in HTTP server on `192.168.10.15:8080`. Answer files live in:
|
||||||
|
```
|
||||||
|
/opt/iventoy/user/answers/
|
||||||
|
├── mk33.toml
|
||||||
|
├── mk34.toml
|
||||||
|
├── mk39.toml
|
||||||
|
└── mk42.toml
|
||||||
|
```
|
||||||
|
|
||||||
|
### Per-Node Answer File Example (`mk33.toml`)
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[target]
|
||||||
|
source = "from-dhcp" # Node IP assigned by iVentoy DHCP, NOT hardcoded
|
||||||
|
|
||||||
|
global]
|
||||||
|
keyboard = "en-us"
|
||||||
|
timezone = "America/Toronto"
|
||||||
|
|
||||||
|
[network]
|
||||||
|
iface = "eno1"
|
||||||
|
address = "192.168.7.33/18" # Static after install
|
||||||
|
gateway = "192.168.18.1"
|
||||||
|
dns = "192.168.7.7"
|
||||||
|
|
||||||
|
[root-password]
|
||||||
|
pwhash = "$y$j9T$YOUR_HASH_HERE" # Pre-hashed password
|
||||||
|
```
|
||||||
|
|
||||||
|
> **Important:** The `answer_url` in the embedded `auto-installer-mode.toml` points to the **server** (`192.168.10.15:8080`), not the node IP. The node IP comes from DHCP during PXE boot (`source = "from-dhcp"`).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 3. Remaster Script
|
||||||
|
|
||||||
|
Save as `/tmp/remaster_pve_iso.py`:
|
||||||
|
|
||||||
|
```python
|
||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Remaster Proxmox VE ISO with embedded auto-install answer URL.
|
||||||
|
Locks UEFI gfxmode to 1024x768 (removes 640x480 fallback).
|
||||||
|
"""
|
||||||
|
import subprocess
|
||||||
|
import sys
|
||||||
|
import tempfile
|
||||||
|
import os
|
||||||
|
import shutil
|
||||||
|
|
||||||
|
# Node-specific config
|
||||||
|
NODE = sys.argv[1] # e.g., mk33
|
||||||
|
SRC_ISO = sys.argv[2] # e.g., proxmox-ve_9.2-1.iso
|
||||||
|
DST_ISO = f"proxmox-{NODE}-auto.iso"
|
||||||
|
ANSWER_URL = f"http://192.168.10.15:8080/pve/answers/{NODE}.toml"
|
||||||
|
|
||||||
|
# Create auto-installer-mode.toml
|
||||||
|
auto_installer_toml = f"""[proxmox-auto-installer]
|
||||||
|
answer_url = "{ANSWER_URL}"
|
||||||
|
"""
|
||||||
|
|
||||||
|
# Work in temp dir
|
||||||
|
with tempfile.TemporaryDirectory() as tmpdir:
|
||||||
|
# Extract ISO contents
|
||||||
|
subprocess.run(["xorriso", "-osirrox", "on", "-indev", SRC_ISO,
|
||||||
|
"-extract", "/", tmpdir], check=True)
|
||||||
|
|
||||||
|
# Write auto-installer-mode.toml into ISO root
|
||||||
|
ai_path = os.path.join(tmpdir, "auto-installer-mode.toml")
|
||||||
|
with open(ai_path, "w") as f:
|
||||||
|
f.write(auto_installer_toml)
|
||||||
|
|
||||||
|
# Patch grub.cfg: lock gfxmode to 1024x768 only
|
||||||
|
grub_path = os.path.join(tmpdir, "boot", "grub", "grub.cfg")
|
||||||
|
if os.path.exists(grub_path):
|
||||||
|
with open(grub_path, "r") as f:
|
||||||
|
content = f.read()
|
||||||
|
# Remove 640x480 fallback
|
||||||
|
content = content.replace("set gfxmode=1024x768,640x480",
|
||||||
|
"set gfxmode=1024x768")
|
||||||
|
with open(grub_path, "w") as f:
|
||||||
|
f.write(content)
|
||||||
|
print("Patched grub.cfg: gfxmode locked to 1024x768")
|
||||||
|
|
||||||
|
# Rebuild ISO with same boot properties
|
||||||
|
subprocess.run([
|
||||||
|
"xorriso", "-as", "mkisofs",
|
||||||
|
"-o", DST_ISO,
|
||||||
|
"-isohybrid-mbr", os.path.join(tmpdir, "usr", "lib", "ISOLINUX", "isohdpfx.bin"),
|
||||||
|
"-c", "boot.cat",
|
||||||
|
"-b", "isolinux/isolinux.bin",
|
||||||
|
"-no-emul-boot", "-boot-load-size", "4", "-boot-info-table",
|
||||||
|
"-eltorito-alt-boot",
|
||||||
|
"-e", "EFI/BOOT/BOOTX64.EFI",
|
||||||
|
"-no-emul-boot", "-isohybrid-gpt-basdat",
|
||||||
|
"-r", "-V", f"Proxmox-VE-Auto-{NODE}",
|
||||||
|
tmpdir
|
||||||
|
], check=True)
|
||||||
|
|
||||||
|
print(f"Created: {DST_ISO}")
|
||||||
|
print(f"Answer URL embedded: {ANSWER_URL}")
|
||||||
|
```
|
||||||
|
|
||||||
|
### Usage
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# On Shield (iVentoy server)
|
||||||
|
python3 /tmp/remaster_pve_iso.py mk33 /opt/iventoy/iso/proxmox-ve_9.2-1.iso
|
||||||
|
python3 /tmp/remaster_pve_iso.py mk34 /opt/iventoy/iso/proxmox-ve_9.2-1.iso
|
||||||
|
python3 /tmp/remaster_pve_iso.py mk39 /opt/iventoy/iso/proxmox-ve_9.2-1.iso
|
||||||
|
python3 /tmp/remaster_pve_iso.py mk42 /opt/iventoy/iso/proxmox-ve_9.2-1.iso
|
||||||
|
|
||||||
|
# Move to iVentoy ISO directory
|
||||||
|
mv proxmox-mk33-auto.iso /opt/iventoy/iso/
|
||||||
|
mv proxmox-mk34-auto.iso /opt/iventoy/iso/
|
||||||
|
mv proxmox-mk39-auto.iso /opt/iventoy/iso/
|
||||||
|
mv proxmox-mk42-auto.iso /opt/iventoy/iso/
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 4. In-Place ISO Patching (gfxmode only)
|
||||||
|
|
||||||
|
If you already have remastered ISOs and only need to patch gfxmode:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Extract grub.cfg from ISO, patch, replace in-place
|
||||||
|
ISO=/opt/iventoy/iso/proxmox-mk33-auto.iso
|
||||||
|
xorriso -cpx /boot/grub/grub.cfg /tmp/grub.cfg -< /dev/null -- "$ISO"
|
||||||
|
sed -i 's/set gfxmode=1024x768,640x480/set gfxmode=1024x768/' /tmp/grub.cfg
|
||||||
|
xorriso -boot_image any replay -map /tmp/grub.cfg /boot/grub/grub.cfg -- "$ISO"
|
||||||
|
```
|
||||||
|
|
||||||
|
> The `-boot_image any replay` flag preserves boot properties after file replacement.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 5. Verification
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Confirm answer URL is embedded
|
||||||
|
strings /opt/iventoy/iso/proxmox-mk33-auto.iso | grep "192.168.10.15"
|
||||||
|
# Expected: http://192.168.10.15:8080/pve/answers/mk33.toml
|
||||||
|
|
||||||
|
# Confirm gfxmode is locked
|
||||||
|
xorriso -cpx /boot/grub/grub.cfg /tmp/verify.cfg -< /dev/null -- /opt/iventoy/iso/proxmox-mk33-auto.iso
|
||||||
|
grep gfxmode /tmp/verify.cfg
|
||||||
|
# Expected: set gfxmode=1024x768
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 6. iVentoy Configuration
|
||||||
|
|
||||||
|
### Web UI
|
||||||
|
- URL: `http://192.168.27.205:26000`
|
||||||
|
- Go to **ISO Management** → add remastered ISOs
|
||||||
|
|
||||||
|
### MAC Whitelist (Permit Mode)
|
||||||
|
Add node MACs to iVentoy whitelist:
|
||||||
|
```
|
||||||
|
E0-51-D8-1C-5D-56 (MK-33)
|
||||||
|
E0-51-D8-1C-5C-75 (MK-34)
|
||||||
|
PENDING (MK-39)
|
||||||
|
PENDING (MK-42)
|
||||||
|
```
|
||||||
|
|
||||||
|
Nodes must be in whitelist to PXE boot.
|
||||||
|
|
||||||
|
### DHCP Pool
|
||||||
|
- Subnet: `192.168.10.0/27`
|
||||||
|
- Range: `192.168.10.20` to `192.168.10.30`
|
||||||
|
- Nodes get temporary PXE IPs from this pool during install
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 7. Post-Install
|
||||||
|
|
||||||
|
After node installs and reboots:
|
||||||
|
1. Remove node MAC from iVentoy whitelist (node boots from local disk)
|
||||||
|
2. Node gets production IP from `/etc/network/interfaces` (set in answer file)
|
||||||
|
3. Verify: `ping 192.168.7.33` (or appropriate node IP)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 8. iVentoy Pro Upgrade Notes
|
||||||
|
|
||||||
|
> **Status:** Awaiting private repo link from vendor.
|
||||||
|
|
||||||
|
Expected Pro features (to verify upon upgrade):
|
||||||
|
- Per-MAC ISO binding (may eliminate need for per-node remastered ISOs)
|
||||||
|
- Additional deployment modes
|
||||||
|
- Priority support
|
||||||
|
|
||||||
|
When the private repo link is received:
|
||||||
|
1. Clone the Pro repository
|
||||||
|
2. Review upgrade documentation in the repo
|
||||||
|
3. Backup current `/opt/iventoy/` configuration
|
||||||
|
4. Follow vendor upgrade procedure
|
||||||
|
5. Test with one node before fleet-wide rollout
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Rollback
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Remove remastered ISO
|
||||||
|
rm /opt/iventoy/iso/proxmox-mk33-auto.iso
|
||||||
|
|
||||||
|
# Re-add stock ISO in iVentoy Web UI
|
||||||
|
# Node will boot stock ISO -- manual install required
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
*Last updated: 2026-05-31*
|
||||||
165
procedures/pega-prox-deploy.md
Normal file
165
procedures/pega-prox-deploy.md
Normal file
@@ -0,0 +1,165 @@
|
|||||||
|
# Procedure: Deploy PegaProx on Docker Swarm
|
||||||
|
|
||||||
|
**Scope:** Deploy PegaProx (Proxmox VE cluster manager) as a Docker Swarm service on MK7.
|
||||||
|
**Author:** F.R.I.D.A.Y.
|
||||||
|
**Date:** 2026-05-31
|
||||||
|
**Prerequisites:** MK7 Swarm manager active, `traefik-public` overlay network exists.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 1. Create Swarm Compose File
|
||||||
|
|
||||||
|
Save as `/tmp/pegaprox_swarm.yml` on MK7:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
version: "3.8"
|
||||||
|
services:
|
||||||
|
pegaprox:
|
||||||
|
image: pegaprox/pegaprox:latest
|
||||||
|
deploy:
|
||||||
|
mode: replicated
|
||||||
|
replicas: 1
|
||||||
|
placement:
|
||||||
|
constraints:
|
||||||
|
- node.role == manager
|
||||||
|
ports:
|
||||||
|
- target: 5000
|
||||||
|
published: 5000
|
||||||
|
mode: host
|
||||||
|
protocol: tcp
|
||||||
|
- target: 5001
|
||||||
|
published: 5001
|
||||||
|
mode: host
|
||||||
|
protocol: tcp
|
||||||
|
- target: 5002
|
||||||
|
published: 5002
|
||||||
|
mode: host
|
||||||
|
protocol: tcp
|
||||||
|
networks:
|
||||||
|
- traefik-public
|
||||||
|
volumes:
|
||||||
|
- pegaprox-config:/app/config
|
||||||
|
environment:
|
||||||
|
- PEGAPROX_DEBUG=0
|
||||||
|
|
||||||
|
volumes:
|
||||||
|
pegaprox-config:
|
||||||
|
driver: local
|
||||||
|
|
||||||
|
networks:
|
||||||
|
traefik-public:
|
||||||
|
external: true
|
||||||
|
```
|
||||||
|
|
||||||
|
> **Critical:** `mode: host` is required. `ingress` mode breaks WebSocket VNC/SSH consoles because Swarm ingress routing does not support WebSocket upgrade properly.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 2. Deploy Stack
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ssh jarvis@mk7.ai.home
|
||||||
|
docker stack deploy -c /tmp/pegaprox_swarm.yml pegaprox
|
||||||
|
```
|
||||||
|
|
||||||
|
Verify:
|
||||||
|
```bash
|
||||||
|
docker service ls | grep pegaprox
|
||||||
|
docker ps | grep pegaprox
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 3. Verify Service Health
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# HTTPS API
|
||||||
|
curl -sk https://192.168.7.7:5000/api/health
|
||||||
|
|
||||||
|
# Check container logs
|
||||||
|
docker logs $(docker ps -q -f name=pegaprox)
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected: `{"status":"ok"}`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 4. First Login & Password Change
|
||||||
|
|
||||||
|
1. Open `https://192.168.7.7:5000`
|
||||||
|
2. Login with default credentials:
|
||||||
|
- Username: `pegaprox`
|
||||||
|
- Password: `admin`
|
||||||
|
3. System will force password change on first login
|
||||||
|
4. API returns: `{"security_warning":"DEFAULT_PASSWORD","requires_password_change":true}`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 5. API Notes for Automation
|
||||||
|
|
||||||
|
### CSRF Protection
|
||||||
|
All state-changing API calls (POST/PUT/PATCH/DELETE) must include:
|
||||||
|
```
|
||||||
|
X-Requested-With: XMLHttpRequest
|
||||||
|
```
|
||||||
|
|
||||||
|
Exempt paths (no CSRF header needed):
|
||||||
|
- `/api/auth/login`
|
||||||
|
- `/api/auth/setup`
|
||||||
|
- `/api/auth/oidc/*`
|
||||||
|
- `/api/auth/check`
|
||||||
|
- `/api/auth/validate`
|
||||||
|
- `/api/auth/logout`
|
||||||
|
- `/api/health`
|
||||||
|
- `/api/webauthn/auth/begin`
|
||||||
|
|
||||||
|
### Add Cluster
|
||||||
|
```bash
|
||||||
|
curl -sk -X POST https://192.168.7.7:5000/api/clusters \
|
||||||
|
-b cookies.txt \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-H "X-Requested-With: XMLHttpRequest" \
|
||||||
|
-d '{
|
||||||
|
"name": "MK33",
|
||||||
|
"host": "192.168.7.33",
|
||||||
|
"user": "root@pam",
|
||||||
|
"pass": "YOUR_PVE_PASSWORD"
|
||||||
|
}'
|
||||||
|
```
|
||||||
|
|
||||||
|
> **CRITICAL:** `host` must be **bare IP only**. Do NOT append `:8006`. PegaProx appends the port internally. Supplying `192.168.7.33:8006` causes URL parse failure: `Failed to parse: https://[192.168.7.33:8006]:8006/...`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 6. Backup Volume
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Backup PegaProx config + DB
|
||||||
|
docker run --rm -v pegaprox_pegaprox-config:/src -v /tmp:/dst alpine \
|
||||||
|
tar czf /dst/pegaprox-config-$(date +%Y%m%d).tar.gz -C /src .
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 7. Known Issues
|
||||||
|
|
||||||
|
| Issue | Cause | Fix |
|
||||||
|
|-------|-------|-----|
|
||||||
|
| WebSocket VNC/SSH broken | Swarm `ingress` mode strips upgrade headers | Use `mode: host` |
|
||||||
|
| URL parse error on add-cluster | `:8006` appended to host field | Use bare IP only |
|
||||||
|
| CSRF 403 on API calls | Missing `X-Requested-With` header | Add header to all state-changing calls |
|
||||||
|
| Self-signed cert warning | No CA-signed cert deployed | Accept in browser or deploy custom cert |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Rollback
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ssh jarvis@mk7.ai.home
|
||||||
|
docker stack rm pegaprox
|
||||||
|
docker volume rm pegaprox_pegaprox-config # WARNING: destroys all data
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
*Last updated: 2026-05-31*
|
||||||
2
swarm.md
2
swarm.md
@@ -29,7 +29,7 @@ All services deployed on MK7 manager via `docker stack deploy`.
|
|||||||
| `portainer` | Portainer CE | replicated | 1/1 | `9000` | `portainer.ai.home` |
|
| `portainer` | Portainer CE | replicated | 1/1 | `9000` | `portainer.ai.home` |
|
||||||
| `prometheus` | Prometheus | replicated | 1/1 | `9090` | `prom.ai.home` |
|
| `prometheus` | Prometheus | replicated | 1/1 | `9090` | `prom.ai.home` |
|
||||||
| `technitium` | Technitium DNS | replicated | 1/1 | `53/tcp`, `53/udp`, `5380` | `dns.ai.home` |
|
| `technitium` | Technitium DNS | replicated | 1/1 | `53/tcp`, `53/udp`, `5380` | `dns.ai.home` |
|
||||||
| `adguard` | AdGuard Home | replicated | 1/1 | `3000`, `30053` | `adguard.ai.home` |
|
| ~~`adguard`~~ | ~~AdGuard Home~~ | ~~removed~~ | ~~—~~ | ~~—~~ | ~~`adguard.ai.home`~~ |
|
||||||
| ~~authelia~~ | ~~Authelia~~ | ~~deferred~~ | — | — | ~~`auth.ai.home`~~ |
|
| ~~authelia~~ | ~~Authelia~~ | ~~deferred~~ | — | — | ~~`auth.ai.home`~~ |
|
||||||
|
|
||||||
> **Note:** Authelia deferred until local TLS is available (requires `https://auth.ai.home`).
|
> **Note:** Authelia deferred until local TLS is available (requires `https://auth.ai.home`).
|
||||||
|
|||||||
Reference in New Issue
Block a user