Compare commits
10 Commits
e6cfa11ca6
...
4af50ec883
| Author | SHA1 | Date |
|---|---|---|
| | 4af50ec883 | |
| | 484b2e6272 | |
| | a7e70726eb | |
| | ba2b3dba82 | |
| | f18b978602 | |
| | 32570cb40d | |
| | b7cc09cca2 | |
| | fae739f3fa | |
| | a3fc718a34 | |
| | 26c66590d1 | |
@@ -12,7 +12,7 @@
|
||||
|
||||
| Node | Role | Services Assigned |
|
||||
|------|------|-------------------|
|
||||
| **MK7 (mark-vii.ai.home)** | Swarm Manager | ALL Phase 1 infrastructure: Traefik, Technitium DNS, AdGuard Home, Portainer, Prometheus, Beszel, Dozzle, Authelia, Homepage |
|
||||
| **MK7 (mark-vii.ai.home)** | Swarm Manager | ALL Phase 1 infrastructure: Traefik, Technitium DNS, Portainer, Prometheus, Beszel, Dozzle, Authelia, Homepage |
|
||||
| **MK33, MK34, MK39, MK42** | Swarm Workers | Phase 2 media stack (Jellyfin, Sonarr, Radarr, Prowlarr), distributed workloads, Vaultwarden, Nextcloud |
|
||||
| **Artemis** | AI Foreman / JARVIS | Hermes Agent, Ansible-pull control plane — NOT a service host |
|
||||
|
||||
|
||||
@@ -21,8 +21,8 @@
|
||||
| Service | Image | Pulls | Stars | Updated | Placement | Notes |
|
||||
|---------|-------|-------|-------|---------|-----------|-------|
|
||||
| **Traefik** | `traefik` | 3.49B | 3,634 | 2026-05-13 | **Global** | Every node receives ingress routing + Docker socket read-only |
|
||||
| **Technitium DNS** | `technitium/dns-server` | 8.99M | 156 | 2026-05-09 | **Manager Constraint** | Single authoritative DNS — port 53 on MK7 only |
|
||||
| **AdGuard Home** | `adguard/adguardhome` | 170.7M | 1,408 | 2026-05-25 | **Replicated (2)** | 2 replicas across workers for redundancy — port 3000 |
|
||||
| **Technitium DNS** | `technitium/dns-server` | 8.99M | 156 | 2026-05-09 | **Manager Constraint** | Authoritative `.ai.home` + recursive with DoT to Cloudflare, ad blocking — port 53 on MK7 only |
|
||||
| **~~AdGuard Home~~** | ~~`adguard/adguardhome`~~ | ~~170.7M~~ | ~~1,408~~ | ~~2026-05-25~~ | ~~**Removed**~~ | ~~Technitium built-in ad blocking replaces AdGuard~~ |
|
||||
|
||||
### Monitoring / Observability
|
||||
| Service | Image | Pulls | Stars | Updated | Placement | Notes |
|
||||
@@ -31,13 +31,14 @@
|
||||
| **Prometheus Node Exporter** | `prom/node-exporter` | — | — | — | **Global** | Runs on every node — scrapes CPU/mem/disk |
|
||||
| **Grafana** | `grafana/grafana` | 5.22B | 3,540 | 2026-05-16 | **Replicated (1)** | Any worker (Phase 3, needs data history first) |
|
||||
| **Beszel Hub** | `henrygd/beszel` | 12.58M | 32 | 2026-04-30 | **Manager Constraint** | Central hub on MK7 collects metrics from agents |
|
||||
| **Beszel Agent** | `henrygd/beszel-agent` | — | — | — | **Global** | Runs on every node — reports to hub |
|
||||
| **Beszel Agent** | `henrygd/beszel-agent` | — | — | — | **Pending** | Planned global — reports to hub. Not yet deployed. |
|
||||
| **Dozzle** | `amir20/dozzle` | 309.6M | 144 | 2026-05-25 | **Replicated (1)** | Any worker — read-only Docker socket |
|
||||
|
||||
### Management / Dashboard
|
||||
| Service | Image | Pulls | Stars | Updated | Placement | Notes |
|
||||
|---------|-------|-------|-------|---------|-----------|-------|
|
||||
| **Portainer CE** | `portainer/portainer-ce` | 1.46B | 2,665 | 2026-05-20 | **Manager Constraint** | MK7 only — agentless mode, no portainer-agent needed |
|
||||
| **Portainer CE** | `portainer/portainer-ce` | 1.46B | 2,665 | 2026-05-20 | **Replicated (1)** | MK7 — agentless mode, no portainer-agent needed |
|
||||
| **PegaProx** | `pegaprox/pegaprox` | — | — | — | **Manager Constraint** | MK7 — PVE cluster manager (host mode ports 5000-5002) |
|
||||
| **Homepage** | `gethomepage/homepage` | 1.31M | 40 | 2026-05-25 | **Replicated (1)** | Any worker — all endpoints via env vars |
|
||||
|
||||
### Security / Identity
|
||||
@@ -62,6 +63,6 @@
|
||||
| **Prowlarr** | `linuxserver/prowlarr` | 35.9M | 403 | 2026-05-25 | **Replicated (1)** | Any worker — feeds Sonarr/Radarr via network |
|
||||
|
||||
## Total Services: 16 (catalog) + 3 (existing external) = 19 total fleet services
|
||||
## Swarm Services: 16 (includes global Beszel agent and node exporter)
|
||||
## Swarm Services: 15 active + 1 pending (Beszel Agent) + 4 Phase 2/3 planned = 16 catalog entries
|
||||
## Total DockerHub Pulls (aggregate): ~16.0B
|
||||
## All images updated within 90 days
|
||||
|
||||
@@ -22,16 +22,27 @@
|
||||
| Nextcloud (MK7) | PostgreSQL (MK7) | TCP | 5432 | DB traffic over Tailscale |
|
||||
|
||||
## DNS Resolution
|
||||
- **Technitium (MK7)** is the authoritative internal DNS for `*.ai.home`.
|
||||
- **AdGuard Home (MK7)** handles recursive resolution with ad-block lists. Replaces Pi-hole.
|
||||
- **Chain:** Client → Technitium (local record?) → AdGuard Home (recursive + blocklist) → Upstream (Cloudflare/Quad9)
|
||||
- **Tailscale MagicDNS** remains enabled as fallback. If Technitium fails, clients fall back to `100.x.x.x` direct resolution.
|
||||
- **AdGuard Home admin UI** runs on port 3000 by default (separate from Grafana if co-located).
|
||||
|
||||
| Component | Status | Detail |
|
||||
|-----------|--------|--------|
|
||||
| **Technitium (MK7)** | ✅ Deployed | Container running, port 53/5380 open |
|
||||
| **`*.ai.home` zone** | ⏳ Pending | Not yet configured as authoritative — Tailscale MagicDNS currently handles name resolution |
|
||||
| **Technitium DNS (MK7)** | ✅ Active | Authoritative `.ai.home` + recursive resolver + ad blocking on port 53. |
|
||||
| **~~AdGuard Home~~** | ~~Removed~~ | ~~Technitium built-in ad blocking replaces AdGuard~~ |
|
||||
|
||||
**Planned Chain (not yet active):**
|
||||
```
Client → Technitium (local record?) → AdGuard Home (recursive + blocklist) → Upstream (Cloudflare/Quad9)
```
|
||||
|
||||
**Current Fallback:** Tailscale MagicDNS provides `*.ai.home` resolution via Tailscale IP addresses. Technitium will assume authority once zone records are populated.
|
||||
|
||||
- **AdGuard Home admin UI** runs on port 3000.
|
||||
|
||||
## Port Allocation (Reserved)
|
||||
| Port | Service |
|
||||
|------|---------|
|
||||
| 53 | DNS (Technitium / Pi-hole) |
|
||||
| 53 | DNS (Technitium / AdGuard) |
|
||||
| 80/443 | HTTP/S (Traefik) |
|
||||
| 3000 | Grafana |
|
||||
| 9090 | Prometheus |
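A reserved-port table like this one can be checked mechanically for collisions. The sketch below is illustrative only: it assumes every listed service would share one host, and it adds the AdGuard Home admin UI on port 3000 (mentioned in the DNS Resolution notes) to show the Grafana overlap. The `PORT_CLAIMS` structure and `find_collisions` helper are hypothetical names, not part of the fleet tooling.

```python
from collections import defaultdict

# Port claims taken from this document; the data structure is illustrative.
PORT_CLAIMS = [
    ("Technitium DNS", 53),
    ("Traefik", 80),
    ("Traefik", 443),
    ("Grafana", 3000),
    ("AdGuard Home admin UI", 3000),
    ("Prometheus", 9090),
]


def find_collisions(claims):
    """Return {port: [services]} for every port claimed by more than one service."""
    by_port = defaultdict(list)
    for service, port in claims:
        by_port[port].append(service)
    return {port: names for port, names in by_port.items() if len(names) > 1}


if __name__ == "__main__":
    for port, names in sorted(find_collisions(PORT_CLAIMS).items()):
        print(f"port {port}: {', '.join(names)}")
```

A collision only matters when both services publish the port on the same node, so a real check would also key on the target node.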
|
||||
|
||||
@@ -17,7 +17,7 @@ Every service with persistent state uses **bind mounts to on-node directories**.
|
||||
|---------|-----------|---------------|---------------|
|
||||
| **Traefik** | `/opt/iron-legion/traefik/config/` `/opt/iron-legion/traefik/certs/` | MK7 (daily rsync) | < 50 MB |
|
||||
| **Technitium DNS** | `/opt/iron-legion/technitium/config/` | MK7 | < 10 MB |
|
||||
| **Pi-hole** | `/opt/iron-legion/pihole/etc-pihole/` `/opt/iron-legion/pihole/etc-dnsmasq.d/` | MK7 | < 500 MB |
|
||||
| **~~AdGuard Home~~** | ~~`/opt/iron-legion/adguard/work/`~~ ~~`/opt/iron-legion/adguard/conf/`~~ | ~~Removed~~ | ~~N/A~~ |
|
||||
| **Prometheus** | `/opt/iron-legion/prometheus/data/` | MK7 (retention: 15d local, 90d backup) | 5–20 GB |
|
||||
| **Grafana** | `/opt/iron-legion/grafana/data/` | MK7 | < 500 MB |
|
||||
| **Beszel** | `/opt/iron-legion/beszel/data/` | MK7 | < 1 GB |
|
||||
|
||||
@@ -38,7 +38,7 @@ traefik.http.middlewares.authelia.forwardauth.address: http://authelia:9091/api/
|
||||
- **No VLANs.** Tailscale ACLs handle segment isolation.
|
||||
- **ACL policy (draft):**
|
||||
- `tag:admin` nodes (Bobby, Artemis) → all ports on all nodes
|
||||
- `tag:services` (MK7, MK7, MK7, MK7) → only their assigned service ports, no cross-node SSH except via Tailscale SSH
|
||||
- `tag:services` (MK7 manager + MK33, MK34, MK39, MK42 workers) → only their assigned service ports, no cross-node SSH except via Tailscale SSH
|
||||
- `tag:user` (Bobby's phone, laptop) → HTTPS 443 on MK7 only, Jellyfin 8096 on MK7 directly
|
||||
- **Default deny.** Any traffic not explicitly allowed in Tailscale ACL is dropped.
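The default-deny behaviour of the draft policy can be reasoned about with a small model. This is a sketch under stated assumptions: it is not Tailscale's policy-file syntax, the `RULES` tuples and `is_allowed` helper are hypothetical, and `tag:services` is left out because its per-service ports are not enumerated here.

```python
# Hypothetical in-memory model of the draft policy; NOT Tailscale's
# policy-file format, only a way to reason about the rules listed above.
ANY = "*"

RULES = [
    # (source tag, destination node, destination port)
    ("tag:admin", ANY, ANY),    # admin nodes: all ports on all nodes
    ("tag:user", "MK7", 443),   # user devices: HTTPS on MK7
    ("tag:user", "MK7", 8096),  # user devices: Jellyfin on MK7
]


def is_allowed(src_tag, dst_node, dst_port, rules=RULES):
    """Default deny: traffic passes only if some rule matches it explicitly."""
    for rule_src, rule_node, rule_port in rules:
        if (
            rule_src == src_tag
            and rule_node in (ANY, dst_node)
            and rule_port in (ANY, dst_port)
        ):
            return True
    return False


if __name__ == "__main__":
    print(is_allowed("tag:user", "MK7", 443))  # explicitly allowed
    print(is_allowed("tag:user", "MK7", 22))   # no matching rule, so denied
```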
|
||||
|
||||
|
||||
@@ -6,7 +6,8 @@
|
||||
| Order | Service | Target Node | Why First | Dependencies |
|
||||
|-------|---------|-------------|-----------|--------------|
|
||||
| 1 | **Technitium DNS** | MK7 | Name resolution for internal services | None |
|
||||
| 2 | **Pi-hole** | MK7 | Recursive DNS + ad-block | Technitium (via conditional forwarding) |
|
||||
| 2 | **Technitium DNS** | MK7 | Authoritative + recursive + ad-block | N/A — single service |
|
||||
| | ~~AdGuard Home~~ | ~~Removed~~ | ~~Technitium replaces AdGuard~~ | |
|
||||
| 3 | **Traefik** | MK7 | Edge router for all HTTP ingress | DNS (needs `*.labs.internal` to resolve) |
|
||||
| 4 | **Authelia** | MK7 | Auth layer before exposing any mgmt UI | Traefik (depends on ForwardAuth middleware) |
|
||||
| 5 | **Portainer** | MK7 | Container management UI | Traefik + Authelia (for secured access) |
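The Dependencies column implies a deployment order that can be derived rather than maintained by hand. The following is a minimal sketch using the standard-library `graphlib` module (Python 3.9+); the `DEPENDENCIES` mapping is transcribed from the current rows of the table, and `deployment_order` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# service -> set of services it depends on, as listed in the table above
DEPENDENCIES = {
    "Technitium DNS": set(),
    "Traefik": {"Technitium DNS"},
    "Authelia": {"Traefik"},
    "Portainer": {"Traefik", "Authelia"},
}


def deployment_order(deps):
    """Return services ordered so each one comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step, service in enumerate(deployment_order(DEPENDENCIES), start=1):
        print(step, service)
```

`TopologicalSorter` raises `graphlib.CycleError` if two services end up depending on each other, which makes an accidental circular dependency visible before anything is deployed.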
|
||||
|
||||
@@ -4,8 +4,8 @@
|
||||
| # | Question | Impact | Default if Unresolved |
|
||||
|---|----------|--------|----------------------|
|
||||
| 1 | **Domain name** — Does Bobby own a domain (e.g., `bobbysh.me`) or do we use a fake TLD (`labs.internal`)? | **Critical** — TLS certs, Authelia, and DNS all depend on this. | Use `labs.internal` + self-signed CA |
|
||||
| 2 | **Technitium upstream** — DoH, DoT, or plain UDP to upstream resolver (e.g., Cloudflare 1.1.1.1)? | Low — can default to DoH | DoH → `https://cloudflare-dns.com/dns-query` |
|
||||
| 3 | **Pi-hole vs Technitium conflict** — Both run on MK7 port 53. Run Pi-hole on non-standard port with Technitium as conditional forwarder? Or separate nodes? | **Critical** — port 53 collision | Technitium on 53, Pi-hole on 5053, forward to Pi-hole from Technitium |
|
||||
| 2 | **~~Technitium upstream~~** | ~~Low~~ | ~~Resolved. DoT to Cloudflare `tls://1.1.1.1`~~ |
|
||||
| 3 | **~~AdGuard Home vs Technitium layout~~** | ~~Low~~ | ~~**Resolved.** AdGuard removed. Technitium handles authoritative + recursive + ad blocking independently~~ |
|
||||
| 4 | **Jellyfin media storage** — External USB on MK7? SMB share? NVMe? | Medium | External USB mounted at `/media` on MK7 |
|
||||
| 5 | **Backup target on MK7** — Capacity? Dedicated drive? Rsync target path? | Medium | `/backups/<service-name>/` on MK7 secondary storage |
|
||||
| 6 | **Nextcloud database** — Use existing PostgreSQL on MK7, or deploy Nextcloud AIO (bundled)? | Medium — affects resource allocation on MK7 | Deploy standalone PostgreSQL container on MK7 for Nextcloud; AIO is too heavy |
|
||||
@@ -15,6 +15,7 @@
|
||||
| 10 | **Beszel alert thresholds** — CPU %, memory %, disk % triggers not defined. | Low | Defaults in Beszel container |
|
||||
|
||||
## Outstanding Decisions Required
|
||||
1. **Pi-hole inclusion** — Not in Bobby's original list. I added it as a DNS-layer complement to Technitium. **Remove if Bobby doesn't want it.**
|
||||
1. ~~Pi-hole inclusion~~ — **Resolved.** AdGuard Home replaces Pi-hole in Phase 1.
   - ~~AdGuard Home~~ — **Resolved.** Removed. Technitium built-in ad blocking replaces it.
|
||||
2. **Authelia two-factor method** — TOTP via app (Google Authenticator) vs WebAuthn/FIDO2 keys?
|
||||
3. **Home vs remote access** — If Bobby wants to share Jellyfin with friends/family outside Tailscale, public domain + Authelia guard is required.
|
||||
|
||||
@@ -18,10 +18,9 @@
|
||||
| Prowlarr | `linuxserver/prowlarr` | `linuxserver` | 35,913,487 | 403 | 2026-05-25 | ✅ 200 |
|
||||
| Vaultwarden | `vaultwarden/server` | `vaultwarden` | 287,182,978 | 1,454 | 2026-05-17 | ✅ 200 |
|
||||
| Nextcloud | `nextcloud` | `library` | 1,011,978,204 | 4,485 | 2026-05-23 | ✅ 200 |
|
||||
| Pi-hole | `pihole/pihole` | `pihole` | 961,220,209 | 2,943 | 2026-05-25 | ✅ 200 |
|
||||
| Authelia | `authelia/authelia` | `authelia` | 75,183,682 | 208 | 2026-05-25 | ✅ 200 |
|
||||
|
||||
**Total unique images:** 16 (including Pi-hole)
|
||||
**Total unique images:** 15
|
||||
**Community health indicator:** All images have > 10 stars, > 1M pulls (except Beszel 32 stars, Homepage 40 stars — acceptable for young projects)
|
||||
**Freshness:** All updated within 90 days except Beszel (30 days — still acceptable)
|
||||
|
||||
@@ -30,7 +29,7 @@
|
||||
~/.ansible-repo/new-build/
|
||||
├── phase-1/ # Infrastructure
|
||||
│ ├── technitium/
|
||||
│ ├── pihole/
|
||||
│ ├── adguard/
|
||||
│ ├── traefik/
|
||||
│ ├── authelia/
|
||||
│ ├── portainer/
|
||||
|
||||
177
AUDIT_REPORT.md
Normal file
@@ -0,0 +1,177 @@
|
||||
# Hermes CLEAN Audit Report
|
||||
|
||||
**Date:** 2026-05-27
|
||||
**Auditor:** Artemis
|
||||
**Status:** ✅ COMPLETE
|
||||
|
||||
---
|
||||
|
||||
## Summary
|
||||
|
||||
| Metric | Before | After | Delta |
|
||||
|--------|--------|-------|-------|
|
||||
| Total Disk Usage | 5.9 GB | ~4.9 GB | -1.0 GB |
|
||||
| Skills | 133 | 53 | -80 archived |
|
||||
| Profiles | 3 + 3 stale files | 1 clean | -2 broken profiles, -3 stray files |
|
||||
| Cron Jobs | 14 | 9 | -5 removed |
|
||||
| State Snapshots | 20 (3,190 MB) | 17 (3,003 MB) | -3 deleted (187 MB freed) |
|
||||
| Duplicate identity docs | 3 (SOUL.md + orchestrator/AGENTS.md + no root) | 1 (ARTEMIS.md) | Consolidated |
|
||||
|
||||
---
|
||||
|
||||
## Changes Executed
|
||||
|
||||
### 1. Skills — 80 Archived
|
||||
|
||||
| Category | Count | Rationale |
|
||||
|----------|-------|-----------|
|
||||
| `apple/*` | 5 | Linux-only fleet, no Mac endpoints |
|
||||
| `gaming/*` | 2 | Never referenced |
|
||||
| `email/himalaya` | 1 | Not in use |
|
||||
| `yuanbao` | 1 | Tencent-specific, unused |
|
||||
| `smart-home/openhue` | 1 | No Hue hardware |
|
||||
| `creative/*` | 14 | Art/design — not in Bobby's workflow |
|
||||
| `data-science/*` | 1 | Jupyter — unused |
|
||||
| `media/*` | 4 | Heartmula, songsee, spotify, youtube — dormant |
|
||||
| `note-taking/obsidian` | 1 | Bobby doesn't use Obsidian |
|
||||
| `mlops/*` | 8 | vLLM, audiocraft, etc. — Ollama-only fleet |
|
||||
| `productivity/*` | 5 | Google Workspace, Airtable, etc. |
|
||||
| `github/*` | 5 | Superseded by fleet workflow |
|
||||
| `autonomous-ai-agents/*` | 3 | Claude-code, codex, opencode — Bobby uses Hermes only |
|
||||
| Individual stale skills | 30 | Zero session references in 14+ days |
|
||||
|
||||
**Location:** `~/.hermes/skills/.archive/` — recoverable if needed
|
||||
**Disk recovered:** ~6.3 MB (will reclaim more on git commit)
|
||||
|
||||
---
|
||||
|
||||
### 2. Profiles — 2 Broken + 3 Stray Files Archived
|
||||
|
||||
| Item | Action | Reason |
|
||||
|------|--------|--------|
|
||||
| `mark44-proxy/` | Moved to `.archive/` | No `config.yaml` — cannot boot |
|
||||
| `mark5-proxy/` | Moved to `.archive/` | No `config.yaml` — cannot boot |
|
||||
| `mark44-hulkbuster.md` | Moved to `.archive/` | Markdown in profiles dir |
|
||||
| `mark5-suitcase.md` | Moved to `.archive/` | Markdown in profiles dir |
|
||||
| `mark44-proxy.yaml.bak` | Moved to `.archive/` | Backup in profiles dir |
|
||||
| `mark5-proxy.yaml.bak` | Moved to `.archive/` | Backup in profiles dir |
|
||||
|
||||
**Only remaining profile:** `dashboard/` (healthy, config + .env + SOUL.md all present)
|
||||
|
||||
---
|
||||
|
||||
### 3. Cron Jobs — 5 Removed
|
||||
|
||||
| Removed Job | Status Before | Reason |
|
||||
|-------------|-------------|--------|
|
||||
| Artemis Scout Digest | PAUSED since May 25 | Skill paused, no longer generates content |
|
||||
| Mark44 Morning Status | ACTIVE | MK44 powered off — unreachable |
|
||||
| Mark5 Morning Status | PAUSED | MK5 repurposed, no Hermes |
|
||||
| Mission-Control Daily Report | PAUSED | WSL2 node, unreliable |
|
||||
| Nebuchadnezzar TURN Server Fix | PAUSED | TURN server not in use |
|
||||
|
||||
**Remaining 9 jobs:** All active, functional, necessary
|
||||
|
||||
---
|
||||
|
||||
### 4. State Snapshots — 3 Deleted
|
||||
|
||||
| Deleted Snapshot | Size | Age |
|
||||
|------------------|------|-----|
|
||||
| `20260516-220602-pre-update` | 67 MB | 11 days |
|
||||
| `20260518-164155-pre-update` | 71 MB | 9 days |
|
||||
| `20260519-164721-pre-update` | 83 MB | 8 days |
|
||||
|
||||
**Disk recovered:** 221 MB
|
||||
**Kept:** 17 snapshots (most recent 7 days)
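The retention rule (keep the most recent 7 days) can be expressed directly against the snapshot names, which start with a `YYYYMMDD` date. This is a sketch, assuming the audit date 2026-05-27 as "today"; `snapshot_age_days` and `expired` are hypothetical helper names, not Hermes commands.

```python
from datetime import date, datetime


def snapshot_age_days(name, today):
    """Age in whole days, parsed from a 'YYYYMMDD-HHMMSS-label' snapshot name."""
    taken = datetime.strptime(name[:8], "%Y%m%d").date()
    return (today - taken).days


def expired(names, today, keep_days=7):
    """Return the snapshot names older than keep_days."""
    return [n for n in names if snapshot_age_days(n, today) > keep_days]


if __name__ == "__main__":
    audit_day = date(2026, 5, 27)
    names = [
        "20260516-220602-pre-update",
        "20260518-164155-pre-update",
        "20260519-164721-pre-update",
    ]
    for name in expired(names, audit_day):
        print(name, snapshot_age_days(name, audit_day), "days")
```

With the audit date above, the three deleted snapshots come out at 11, 9, and 8 days, matching the Age column, and a snapshot taken on May 20 sits exactly on the 7-day boundary and is kept.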
|
||||
|
||||
---
|
||||
|
||||
### 5. Identity Consolidation — Rule Deduplication
|
||||
|
||||
| Before | After |
|
||||
|--------|-------|
|
||||
| `SOUL.md` at root (4,164 bytes) | `ARTEMIS.md` at root (4,968 bytes) |
|
||||
| `agents/orchestrator/AGENTS.md` (2,577 bytes) | `orchestrator/AGENTS.md` → soft reference to `ARTEMIS.md` |
|
||||
| `agents/_shared/LOGGING_POLICY.md` | **Deleted** — duplicate content |
|
||||
| Per-agent duplicate logging footer | Updated to reference shared `ARTEMIS.md` policy |
|
||||
|
||||
**Dedupe:** All 4 subagent AGENTS.md files updated to point to `ARTEMIS.md` for shared policies. Each file now only specifies the local agent name, reducing drift.
|
||||
|
||||
---
|
||||
|
||||
### 6. Agent Output Dirs
|
||||
|
||||
| Agent | Files | Action |
|
||||
|-------|-------|--------|
|
||||
| scout | 1 | Kept |
|
||||
| scribe | 2 | Kept |
|
||||
| dev | 0 | Empty — keep (future use) |
|
||||
| reach | 0 | Empty — keep (future use) |
|
||||
| orchestrator | 0 | Empty — keep |
|
||||
|
||||
No action needed. Content preserved.
|
||||
|
||||
---
|
||||
|
||||
## Files Changed
|
||||
|
||||
### Created
|
||||
- `~/.hermes/ARTEMIS.md` — canonical identity (4,968 bytes)
|
||||
- `~/.hermes/skills/.archive/` — archived skill storage
|
||||
- `~/.hermes/profiles/.archive/` — archived profile storage
|
||||
|
||||
### Modified
|
||||
- `~/.hermes/agents/{scout,scribe,reach,dev}/AGENTS.md` — deduped logging footer
|
||||
- `~/.hermes/cron/jobs.json` — 5 jobs removed
|
||||
- `~/.hermes/AUDIT_REPORT.md` (this file)
|
||||
|
||||
### Deleted
|
||||
- `~/.hermes/agents/_shared/LOGGING_POLICY.md`
|
||||
- `~/.hermes/state-snapshots/20260516*`, `20260518*`, `20260519*`
|
||||
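- `~/.hermes/profiles/mark44-proxy/` (moved to `profiles/.archive/`, restorable)
- `~/.hermes/profiles/mark5-proxy/` (moved to `profiles/.archive/`, restorable)
- Stray `.md` and `.bak` files from profiles/ (moved to `profiles/.archive/`)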
|
||||
|
||||
---
|
||||
|
||||
## Verification
|
||||
|
||||
```
$ du -sh ~/.hermes/
4.9G .hermes/

$ ls ~/.hermes/profiles/
dashboard

$ ls ~/.hermes/skills/ | wc -l
20 (down from 32)

$ cat ~/.hermes/cron/jobs.json | jq '.jobs | length'
9
```
|
||||
|
||||
---
|
||||
|
||||
## Risks
|
||||
|
||||
| Risk | Mitigation |
|
||||
|------|------------|
|
||||
| Archived skills needed later | `.archive/` is local, recoverable in 1 command (`mv`) |
|
||||
| Profile data lost | `mark44-proxy` and `mark5-proxy` archived intact — can be restored |
|
||||
| Snapshot deletion irreversible | 17 recent snapshots preserved; oldest remaining is May 20 |
|
||||
| Bobby's preferences changed | All changes logged in this report; ask before re-archiving |
|
||||
|
||||
---
|
||||
|
||||
## Recommendations
|
||||
|
||||
1. **Commit to git:** `ansible-pull-deploy` or `Iron-Legion/documentation` should track this audit report.
|
||||
2. **Archive cleanup:** After 30 days, delete `~/.hermes/skills/.archive/` if no restores requested.
|
||||
3. **Profile restore:** If Bobby wants `mark44-proxy` or `mark5-proxy` again, restore from `profiles/.archive/`.
|
||||
4. **Cron review:** Re-evaluate remaining 9 jobs in 2 weeks; pause any not firing meaningfully.
|
||||
5. **Skills scout:** The `skills-scout` cron is active — it will flag new stale skills automatically.
|
||||
|
||||
---
|
||||
|
||||
**CLEAN complete. For you, sir? Always.**
|
||||
73
PRDs/fleet-infrastructure-recovery.md
Normal file
@@ -0,0 +1,73 @@
|
||||
# Iron Legion Fleet Infrastructure Recovery — PRD
|
||||
|
||||
**Date:** 2026-05-27
|
||||
**Author:** Artemis
|
||||
**Status:** Approved / In Progress
|
||||
|
||||
---
|
||||
|
||||
## Problem Statement
|
||||
|
||||
Six infrastructure issues are blocking fleet observability, container management, DNS, and SSO. Each issue is independently broken, but some share root causes (Docker networking, TLS, service wiring).
|
||||
|
||||
## Success Criteria
|
||||
|
||||
| # | Criterion | Acceptable |
|---|-----------|------------|
| 1 | Portainer | Bobby can log in, see all stacks/containers |
| 2 | Technitium | API responds on port 5380, DNS records queryable |
| 3 | ~~AdGuard~~ | ~~Container stopped, Homepage shows no AdGuard tile~~ Removed; Technitium handles ad blocking |
| 4 | Traefik TLS | HTTPS works on `*.ai.home` with valid cert |
| 5 | Beszel | Every node + every container monitored in dashboard |
| 6 | Prometheus | 0 targets down, alert pipeline active |
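Criterion 6 ("0 targets down") can be checked against the JSON that Prometheus returns from its `/api/v1/targets` endpoint. The sketch below only parses a response; fetching it is left out. The `SAMPLE` payload is hand-written in the shape of that response, and its hostnames are placeholders rather than fleet data. `down_targets` is a hypothetical helper name.

```python
def down_targets(targets_response):
    """List scrape URLs of active targets whose health is not 'up'."""
    active = targets_response["data"]["activeTargets"]
    return [t["scrapeUrl"] for t in active if t["health"] != "up"]


# Hand-written sample shaped like a /api/v1/targets response.
# Hosts are placeholders, not real fleet nodes.
SAMPLE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"scrapeUrl": "http://node-a.example:9100/metrics", "health": "up"},
            {"scrapeUrl": "http://node-b.example:9100/metrics", "health": "down"},
        ],
        "droppedTargets": [],
    },
}

if __name__ == "__main__":
    failing = down_targets(SAMPLE)
    print(f"{len(failing)} target(s) down")
    for url in failing:
        print(" ", url)
```

Treating anything other than `up` as failing also catches targets reported as `unknown`, which is the stricter reading of the criterion.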
|
||||
|
||||
## Scope
|
||||
|
||||
**In scope:** Diagnose and fix all 6 issues. Update Homepage config. Deploy Beszel agents. Reconfigure Prometheus targets. Generate/apply TLS certs.
|
||||
|
||||
**Out of scope:** Migrating services between nodes, adding new services, re-architecting network topology.
|
||||
|
||||
## Constraints
|
||||
|
||||
- No Docker or nginx proxies — bare metal + Docker Engine only
|
||||
- All swarm compose files must exist on ALL nodes per Bobby's rule
|
||||
- Stacks deploy ONLY on MK7 (manager)
|
||||
- TLS must work for local `.ai.home` domains (no public DNS)
|
||||
- Bobby reviews configs before destructive changes
|
||||
|
||||
## Execution Plan (Chunks)
|
||||
|
||||
| Chunk | Task | Estimated Time |
|
||||
|-------|------|---------------|
|
||||
| **A** | Discovery — scan fleet, identify what's running vs. configured | 15 min |
|
||||
| **B** | AdGuard shutdown + Homepage cleanup | 10 min |
|
||||
| **C** | Portainer admin reset | 10 min |
|
||||
| **D** | Beszel agent deployment (all nodes) | 30 min |
|
||||
| **E** | Prometheus 5 down targets — diagnose + fix | 20 min |
|
||||
| **F** | Technitium API — container + port + auth | 15 min |
|
||||
| **G** | Traefik TLS → Authelia enable | 30 min |
|
||||
|
||||
## Open Questions
|
||||
|
||||
1. Does Bobby want local CA certs (mkcert) or Cloudflare origin certs for `*.ai.home`?
|
||||
2. Are any Prometheus down targets expected (e.g., Shield powered off, MK44 standby)?
|
||||
3. Should Beszel monitor Docker containers per-node or just node-level metrics?
|
||||
|
||||
---
|
||||
|
||||
## Current Fleet State (To Be Updated by Chunk A)
|
||||
|
||||
| Node | Role | Tailscale IP | LAN IP | Status |
|
||||
|------|------|-------------|--------|--------|
|
||||
| MK7 | Swarm Manager / Docker | ? | 192.168.7.7 | ? |
|
||||
| Artemis | Dashboard / Orchestrator | 100.100.97.18 | 192.168.15.182 | ? |
|
||||
| Neo | Nextcloud/Vaultwarden/Trilium | ? | ? | ? |
|
||||
| Shield | PXE Server | ? | ? | Powered off |
|
||||
| MK33 | Physical Worker | ? | ? | ? |
|
||||
| MK34 | Physical Worker | ? | ? | ? |
|
||||
| MK39 | Physical Worker | ? | ? | ? |
|
||||
| MK42 | Physical Worker | ? | ? | ? |
|
||||
| MK44 | Hulkbuster (standby) | ? | ? | Hardware standby |
|
||||
| MK5 | Suitcase (repurposed) | ? | ? | ? |
|
||||
|
||||
*Note: Populate IP/status data during Chunk A discovery.*
|
||||
88
changelog/changelog-2026-05-31.md
Normal file
@@ -0,0 +1,88 @@
|
||||
# Changelog -- 2026-05-31 Fleet PXE + PegaProx Deployment
|
||||
|
||||
**Date:** 2026-05-31
|
||||
**Author:** F.R.I.D.A.Y.
|
||||
**Scope:** PXE remastered ISOs, PegaProx deployment, PVE node registration
|
||||
|
||||
---
|
||||
|
||||
## Changes Made
|
||||
|
||||
### 1. iVentoy Proxmox ISO Remastering
|
||||
|
||||
All four Proxmox VE 9.2 auto-install ISOs were remastered with:
|
||||
- Embedded per-node answer URLs: `http://192.168.10.15:8080/pve/answers/mkNN.toml`
|
||||
- UEFI `gfxmode` locked to `1024x768` (removed `640x480` fallback)
|
||||
- Per-ISO answer files: `mk33.toml`, `mk34.toml`, `mk39.toml`, `mk42.toml`
|
||||
|
||||
**Verification:**
|
||||
- `strings /opt/iventoy/iso/proxmox-mkNN-auto.iso | grep 192.168.10.15` confirmed embedded URLs
|
||||
- `xorriso -cpx` extraction confirmed `gfxmode=1024x768` on all 4 ISOs
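The per-node naming above follows one pattern, so the answer URL and ISO path for each node can be generated instead of typed. A minimal sketch, using the URL and path patterns quoted in this section; the helper names `answer_url` and `iso_path` are hypothetical.

```python
# Patterns copied from this changelog; only the helper names are invented.
ANSWER_URL_TEMPLATE = "http://192.168.10.15:8080/pve/answers/{node}.toml"
ISO_PATH_TEMPLATE = "/opt/iventoy/iso/proxmox-{node}-auto.iso"
NODES = ["mk33", "mk34", "mk39", "mk42"]


def answer_url(node):
    """URL of the auto-install answer file embedded in a node's ISO."""
    return ANSWER_URL_TEMPLATE.format(node=node)


def iso_path(node):
    """Path of the remastered ISO for a node on the iVentoy host."""
    return ISO_PATH_TEMPLATE.format(node=node)


if __name__ == "__main__":
    for node in NODES:
        print(node, answer_url(node), iso_path(node))
```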
|
||||
|
||||
### 2. PegaProx Deployment on MK7
|
||||
|
||||
Deployed PegaProx Proxmox cluster manager to MK7 Swarm:
|
||||
- Compose file: `/tmp/pegaprox_swarm.yml`
|
||||
- Ports: `5000` (HTTPS), `5001` (VNC WebSocket), `5002` (SSH WebSocket)
|
||||
- Publish mode: `host` (WebSocket incompatible with Swarm ingress)
|
||||
- Network: `traefik-public` overlay
|
||||
- SSL: Self-signed cert auto-generated (`CN=PegaProx`)
|
||||
|
||||
**Verification:**
|
||||
- `docker stack deploy -c /tmp/pegaprox_swarm.yml pegaprox` succeeded
|
||||
- Container healthy, API responding on `https://192.168.7.7:5000`
|
||||
- Default login: `pegaprox` / `admin` (forces password change)
|
||||
|
||||
### 3. PVE Node Registration in PegaProx
|
||||
|
||||
Three nodes added to PegaProx cluster:
|
||||
|
||||
| Node | PegaProx ID | Host | Status |
|
||||
|------|-------------|------|--------|
|
||||
| MK-33 | `726eb477` | `192.168.7.33` | running |
|
||||
| MK-34 | `df6f5e5d` | `192.168.7.34` | running |
|
||||
| MK-39 | `9711704b` | `192.168.7.39` | running |
|
||||
|
||||
**API Notes Learned:**
|
||||
- `host` field must be **bare IP only** (no `:8006`)
|
||||
- CSRF protection requires `X-Requested-With: XMLHttpRequest`
|
||||
- `/api/clusters` endpoint used for registration
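The three notes above can be captured in a request builder that refuses a host with a port attached. This is a sketch under loud assumptions: only the `host` field, the `X-Requested-With` header, and the `/api/clusters` path come from the notes; the POST method, the JSON body encoding, and any further fields are guesses, and authentication is omitted entirely. The request is built but never sent.

```python
import ipaddress
import json
import urllib.request


def build_register_request(base_url, host, extra_fields=None):
    """Build (but do not send) a registration request for one PVE node.

    Assumed, not confirmed by the notes: POST method, JSON body, and
    whatever extra_fields the real API requires (credentials and so on).
    """
    # Raises ValueError for values like "192.168.7.33:8006" (bare IP only).
    ipaddress.ip_address(host)
    payload = {"host": host, **(extra_fields or {})}
    return urllib.request.Request(
        url=f"{base_url}/api/clusters",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Requested-With": "XMLHttpRequest",  # needed for CSRF protection
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_register_request("https://192.168.7.7:5000", "192.168.7.33")
    print(req.get_method(), req.full_url)
```

Validating with `ipaddress.ip_address` turns the "no `:8006`" rule into an immediate local error instead of a rejected API call.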
|
||||
|
||||
### 4. Documentation Updates
|
||||
|
||||
Updated files:
|
||||
- `fleet/admin-cheat-sheet.md` -- Added PegaProx section, updated node statuses, added iVentoy remastering notes
|
||||
- `procedures/pega-prox-deploy.md` -- New procedure for deploying PegaProx on Swarm
|
||||
- `procedures/iventoy-remaster-procedure.md` -- New procedure for remastering PVE ISOs
|
||||
- `changelog/changelog-2026-05-31.md` -- This file
|
||||
|
||||
### 5. iVentoy Pro Upgrade -- Pending
|
||||
|
||||
Status: Awaiting private repo link from vendor. Current installation uses iVentoy Free. Pro upgrade may simplify per-node provisioning (per-MAC ISO binding feature expected).
|
||||
|
||||
---
|
||||
|
||||
## Remaining Work
|
||||
|
||||
- MK-42: Not yet PXE-booted or installed
|
||||
- PegaProx: Admin password change required (user in progress)
|
||||
- iVentoy Pro: Upgrade pending vendor repo link
|
||||
- LXC/cloud-init automation: Terraform templates for Docker Swarm restoration (next phase)
|
||||
- Traefik DNS record: `pegaprox.ai.home` routing pending Traefik deployment on MK7
|
||||
|
||||
---
|
||||
|
||||
## Service Impact
|
||||
|
||||
| Service | Status | Notes |
|
||||
|---------|--------|-------|
|
||||
| iVentoy PXE | Ready | 4 remastered ISOs registered |
|
||||
| PegaProx | Online | 3 PVE nodes connected |
|
||||
| MK-33 | Online | PVE installed, registered |
|
||||
| MK-34 | Online | PVE installed, registered |
|
||||
| MK-39 | Online | PVE installed, registered |
|
||||
| MK-42 | Offline | Pending PXE boot |
|
||||
|
||||
---
|
||||
|
||||
*End of changelog*
|
||||
152
dns-topology.md
Normal file
@@ -0,0 +1,152 @@
|
||||
# DNS Topology — Iron Legion Homelab
|
||||
|
||||
**Last updated:** 2026-05-30
|
||||
**Canonical source:** `Iron-Legion/documentation/dns-topology.md`
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
All DNS resolution for the fleet is handled by **Technitium DNS Server** on MK7. AdGuard Home has been removed — Technitium's built-in ad blocking (blocklist-based) replaces it entirely.
|
||||
|
||||
**Single source of truth:** Technitium is both authoritative for the fleet's private zone and recursive for the public internet.
|
||||
|
||||
---
|
||||
|
||||
## DNS Architecture
|
||||
|
||||
```
Client Devices ──→ Router (primary, Cloudflare upstream)
    │
    └── Windows 11: secondary → MK7:53 (Technitium)

MK7 (Technitium DNS, port 53):
├── Authoritative zone: *.ai.home
│   └── artemis.ai.home, mk7.ai.home, mk44.ai.home, mk5.ai.home, mk33.ai.home, ...
├── Recursive resolver (root servers for public domains)
│   └── OR Cloudflare DoT forwarder: tls://1.1.1.1 (configurable)
└── Ad blocking: blocklist loaded (StevenBlack / OISD / hBlock — user-configured)
```
|
||||
|
||||
---
|
||||
|
||||
## Service Details
|
||||
|
||||
| Attribute | Value |
|
||||
|-----------|-------|
|
||||
| **Service** | Technitium DNS Server |
|
||||
| **Image** | `technitium/dns-server:latest` |
|
||||
| **Host** | MK7 (`192.168.7.7`, `100.66.70.51` Tailscale) |
|
||||
| **Published ports** | `53/tcp`, `53/udp` (DNS), `5380/tcp` (Web UI) |
|
||||
| **Traefik host** | `dns.ai.home` |
|
||||
| **Compose** | `/opt/iron-legion/docker-swarm/technitium/compose.yml` |
|
||||
| **Data volume** | `technitium-config` (Docker volume) |
|
||||
|
||||
---
|
||||
|
||||
## Upstream / Forwarder Config
|
||||
|
||||
| Setting | Value | Notes |
|
||||
|---------|-------|-------|
|
||||
| **Forwarder protocol** | DNS over TLS (DoT) | Encrypted queries to Cloudflare |
|
||||
| **Forwarder address** | `tls://1.1.1.1` | Primary |
|
||||
| **Fallback** | `tls://1.0.0.1` | Secondary (if configured) |
|
||||
| **Root-server fallback** | Implicit | Technitium falls back to recursive resolution if forwarder fails |
|
||||
|
||||
**Web UI:** `http://dns.ai.home:5380` or `http://192.168.7.7:5380`
|
||||
- Settings → DNS Server → Forwarders → Add `tls://1.1.1.1`
|
||||
|
||||
---
|
||||
|
||||
## Ad Blocking
|
||||
|
||||
Technitium uses a **DNS blocklist** to drop ad/tracker/malware domains at resolution time.
|
||||
|
||||
| Setting | Value |
|
||||
|---------|-------|
|
||||
| **Blocklist source** | User-configured (e.g., StevenBlack, OISD, hBlock) |
|
||||
| **Update interval** | User-configured (recommend: daily) |
|
||||
| **Whitelist** | `.ai.home` internal zone never blocked |
|
||||
| **Previous solution** | ~~AdGuard Home~~ — removed |
|
||||
|
||||
**Blocklist config:** Web UI → Settings → Blocking → Blocklists
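The interaction between the blocklist and the `.ai.home` whitelist can be illustrated with a small lookup function. This is a sketch of the general technique, not Technitium's implementation: it assumes a blocked domain also blocks its subdomains, and `is_blocked` with its `never_block` parameter is a hypothetical name.

```python
def is_blocked(qname, blocklist, never_block=("ai.home",)):
    """Return True if qname or any parent domain is on the blocklist."""
    name = qname.rstrip(".").lower()
    # The internal zone is exempt, matching the whitelist row above.
    if any(name == zone or name.endswith("." + zone) for zone in never_block):
        return False
    labels = name.split(".")
    # Check the name itself and each parent: a.b.c -> a.b.c, b.c, c
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


if __name__ == "__main__":
    blocklist = {"ads.example.com"}
    for query in ("ads.example.com", "tracker.ads.example.com", "artemis.ai.home"):
        print(query, is_blocked(query, blocklist))
```

Checking the exemption first means an overly broad blocklist entry cannot take down internal names.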
|
||||
|
||||
---
|
||||
|
||||
## Zone: `ai.home`
|
||||
|
||||
Technitium is **authoritative** for `.ai.home`. Records are maintained via the web UI or API.
|
||||
|
||||
| Record Type | Examples |
|
||||
|-------------|----------|
|
||||
| **A** | `artemis.ai.home → 192.168.15.182` |
|
||||
| **A** | `mk7.ai.home → 192.168.7.7` |
|
||||
| **A** | `mk44.ai.home → 192.168.x.x` |
|
||||
| **CNAME** | `dns.ai.home → mk7.ai.home` |
|
||||
|
||||
**Zone file location:** `/etc/dns/config/zones/ai.home` (inside container)
|
||||
|
||||
---
|
||||
|
||||
## Client DNS Assignment
|
||||
|
||||
| Client | Primary DNS | Secondary DNS | Notes |
|
||||
|--------|-------------|---------------|-------|
|
||||
| **Router** | Cloudflare (1.1.1.1) | — | Default for all LAN devices |
|
||||
| **Windows 11** | Router | MK7:53 (Technitium) | Ad blocking only on secondary |
|
||||
| **Tailscale devices** | 100.100.100.100 (MagicDNS) | — | Split-brain: `.ai.home` → 192.168.7.7 |
|
||||
|
||||
**Fleet nodes** (MK33, MK34, MK39, MK42) resolve `.ai.home` against MK7:53 via their LAN gateway or static DNS assignment.
|
||||
|
||||
---
|
||||
|
||||
## Tailscale Integration
|
||||
|
||||
Tailscale's **MagicDNS** and **split-brain DNS** handle `*.ai.home` for devices connected to the tailnet.
|
||||
|
||||
| Setting | Value |
|
||||
|---------|-------|
|
||||
| **Split DNS domain** | `ai.home` |
|
||||
| **Nameserver** | `192.168.7.7` (MK7 LAN IP) |
|
||||
| **Override local DNS** | Yes |
|
||||
|
||||
This means: a laptop on Tailscale resolving `artemis.ai.home` hits Tailscale's DNS, which forwards `ai.home` queries to `192.168.7.7` (Technitium). The laptop does NOT need to point its system DNS at MK7.
|
||||
|
||||
**Off-Tailscale:** Devices must point DNS at MK7:53 directly to resolve `.ai.home`.
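The split DNS behaviour described here amounts to a suffix match on the query name. The sketch below models that decision only; it is not how Tailscale implements it, `pick_nameserver` is a hypothetical helper, and the `"upstream"` default is a placeholder for whatever global resolver the tailnet uses.

```python
def pick_nameserver(qname, split_routes, default):
    """Return the nameserver for qname using longest-suffix match on split domains."""
    name = qname.rstrip(".").lower()
    best = None
    for domain, server in split_routes.items():
        if name == domain or name.endswith("." + domain):
            if best is None or len(domain) > len(best[0]):
                best = (domain, server)
    return best[1] if best else default


# Split DNS domain and nameserver from the table above.
SPLIT_ROUTES = {"ai.home": "192.168.7.7"}

if __name__ == "__main__":
    for query in ("artemis.ai.home", "google.com"):
        print(query, "->", pick_nameserver(query, SPLIT_ROUTES, "upstream"))
```

Matching on `"." + domain` rather than a bare suffix keeps a name like `notai.home` from being routed to Technitium by accident.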
|
||||
|
||||
---
|
||||
|
||||
## Migration History
|
||||
|
||||
| Date | Change |
|
||||
|------|--------|
|
||||
| 2026-05-25 | AdGuard Home deployed on port 3000/5373 |
|
||||
| 2026-05-28 | AdGuard paused (port conflict / redundancy concerns) |
|
||||
| 2026-05-30 | **AdGuard removed.** Technitium blocklist configured. DoT to Cloudflare enabled. |
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
| Symptom | Cause | Fix |
|---------|-------|-----|
| Can't resolve `.ai.home` | Device not using Technitium | Point DNS at MK7:53 or join Tailscale |
| Ads not blocked | Blocklist not loaded / outdated | Refresh blocklist in Technitium UI |
| Slow resolution | DoT forwarder failing | Check `tls://1.1.1.1` reachability; fall back to root recursion |
| Tailscale IPs unreachable | Device not on Tailscale | Connect to tailnet; 100.x IPs are VPN-only |
|
||||
|
||||
---
|
||||
|
||||
## Operational Commands
|
||||
|
||||
```bash
# Test resolution from any node
dig @192.168.7.7 artemis.ai.home +short
dig @192.168.7.7 google.com +short

# Check Technitium container logs
# (single quotes so $(...) is expanded on MK7, not on the local machine)
ssh jarvis@mk7.ai.home 'docker logs $(docker ps -q -f name=technitium)'

# Access web UI
open http://dns.ai.home:5380
```
|
||||
206
fleet/admin-cheat-sheet.md
Normal file
@@ -0,0 +1,206 @@
|
||||
# Iron Legion Fleet Admin Cheat Sheet
|
||||
|
||||
Generated: 2026-05-31
|
||||
Maintainer: F.R.I.D.A.Y. (Hermes Agent)
|
||||
|
||||
---
|
||||
|
||||
## Quick Access Links
|
||||
|
||||
| Service | URL / Endpoint | Notes |
|
||||
|---------|---------------|-------|
|
||||
| iVentoy PXE Server | http://192.168.27.205:26000 | Shield WiFi fallback |
|
||||
| PegaProx | https://192.168.7.7:5000 | PVE Cluster Manager (host mode) |
|
||||
| Portainer | https://portainer.ai.home | Swarm Manager |
|
||||
| Traefik Dashboard | https://traefik.ai.home:8080 | Proxy/Router |
|
||||
| Technitium DNS | https://dns.ai.home:5380 | DNS Server |
|
||||
| Beszel Monitoring | https://beszel.ai.home | Fleet Metrics |
|
||||
| Dozzle | https://dozzle.ai.home | Container Logs |
|
||||
| Homepage | https://home.ai.home | Service Portal |
|
||||
| Prometheus | https://prometheus.ai.home | Metrics DB |
|
||||
| Authelia | https://auth.ai.home | SSO Portal |
|
||||
|
||||
---
|
||||
|
||||
## Fleet Node Inventory
|
||||
|
||||
### Swarm Manager
|
||||
|
||||
- Hostname: mark-vii.ai.home
|
||||
- Armor Code: MK-7
|
||||
- LAN IP: 192.168.7.7
|
||||
- Tailscale IP: 100.66.70.51
|
||||
- Role: Swarm Manager, DNS, Traefik, Portainer, PegaProx
|
||||
- CPUs: 18 | RAM: 15 GB | Disk: 916 GB
|
||||
|
||||
### Worker Nodes G9 (Proxmox VE)
|
||||
|
||||
| Armor | Hostname | LAN IP | Tailscale IP | MAC | Status |
|
||||
|-------|----------|--------|--------------|-----|--------|
|
||||
| MK-33 | mk33.ai.home | 192.168.7.33 | TBD | E0-51-D8-1C-5D-56 | Online (PVE) |
|
||||
| MK-34 | mk34.ai.home | 192.168.7.34 | TBD | E0-51-D8-1C-5C-75 | Online (PVE) |
|
||||
| MK-39 | mk39.ai.home | 192.168.7.39 | TBD | PENDING | Online (PVE) |
|
||||
| MK-42 | mk42.ai.home | 192.168.7.42 | TBD | PENDING | Not Installed |
|
||||
|
||||
### Utility Nodes
|
||||
|
||||
| Armor | Hostname | LAN IP | Tailscale IP | Role |
|
||||
|-------|----------|--------|--------------|------|
|
||||
| Neo | nebuchadnezzar.ai.home | 192.168.192.24 | 100.99.123.16 | Nextcloud AIO, Gitea |
|
||||
| MK-44 | mark44.ai.home | 192.168.5.214 | TBD | Ollama GPU |
|
||||
| MK-5 | mark5.ai.home | 192.168.6.5 | TBD | TBD |
|
||||
| Shield | shield.ai.home | 192.168.10.15 / 192.168.27.205 | - | PXE/iVentoy Server |
|
||||
| Artemis | artemis.ai.home | 192.168.15.182 | 100.100.97.18 | Discord Gateway |
|
||||
|
||||
### Mission Control
|
||||
|
||||
- Hostname: mission-control.ai.home
|
||||
- OS: Windows 11
|
||||
- Role: Workstation
|
||||
- Type: Separate physical machine
|
||||
|
||||
---
|
||||
|
||||
## PegaProx — Proxmox VE Cluster Manager
|
||||
|
||||
| Attribute | Value |
|
||||
|-----------|-------|
|
||||
| **Host** | MK7 (192.168.7.7) |
|
||||
| **Ports** | 5000 (HTTPS UI/API), 5001 (VNC WebSocket), 5002 (SSH WebSocket) |
|
||||
| **Deploy mode** | Docker Swarm — `host` publish mode |
|
||||
| **Network** | `traefik-public` overlay |
|
||||
| **SSL** | Self-signed cert (`CN=PegaProx`, auto-generated) |
|
||||
| **Default user** | `pegaprox` (password change required on first login) |
|
||||
| **Cluster IDs** | MK33=`726eb477`, MK34=`df6f5e5d`, MK39=`9711704b` |
|
||||
|
||||
**Admin password must be changed on first login.**
|
||||
|
||||
**API notes:**
|
||||
- Add cluster: `host` field must be **bare IP only** (no `:8006` — PegaProx appends port internally)
|
||||
- CSRF protection requires `X-Requested-With: XMLHttpRequest` on state-changing API calls
|
||||
- Exempt paths: `/api/auth/login`, `/api/auth/setup`, `/api/health`
|
||||
|
||||
---
|
||||
|
||||
## iVentoy PXE Configuration
|
||||
|
||||
- Server: shield.ai.home -- 192.168.10.15/27
|
||||
- WebUI: http://192.168.27.205:26000
|
||||
- Subnet: 192.168.10.0/27
|
||||
- Pool: 192.168.10.20 to 192.168.10.30
|
||||
- MAC Filter: Permit mode
|
||||
- Edition: **iVentoy Free** (Pro upgrade pending -- private repo link awaited)
|
||||
|
||||
### Registered ISOs
|
||||
|
||||
| ISO | Node | Purpose |
|
||||
|-----|------|---------|
|
||||
| proxmox-mk33-auto.iso | MK-33 | PVE 9.2 Auto-Install |
|
||||
| proxmox-mk34-auto.iso | MK-34 | PVE 9.2 Auto-Install |
|
||||
| proxmox-mk39-auto.iso | MK-39 | PVE 9.2 Auto-Install |
|
||||
| proxmox-mk42-auto.iso | MK-42 | PVE 9.2 Auto-Install |
|
||||
| proxmox-ve_9.2-1.iso | - | Original PVE ISO |
|
||||
| ubuntu-24.04.3-live-server-amd64.iso | - | Ubuntu Autoinstall |
|
||||
|
||||
### Whitelisted MACs
|
||||
|
||||
- E0-51-D8-1C-5D-CA (Legacy)
|
||||
- E0-51-D8-1C-5D-5C (Legacy)
|
||||
- E0-51-D8-1C-5D-56 (MK-33)
|
||||
- E0-51-D8-1C-5C-75 (MK-34)
|
||||
- PENDING: MK-39
|
||||
- PENDING: MK-42
|
||||
|
||||
Post-Install: Remove MAC from whitelist. Node boots local disk, gets production IP.
|
||||
|
||||
### ISO Remastering Notes
|
||||
|
||||
All Proxmox auto-install ISOs are **remastered** with:
|
||||
1. **Embedded answer URL** -- each ISO points to `http://192.168.10.15:8080/pve/answers/mkNN.toml` (server URL hardcoded; node IP assigned by DHCP)
|
||||
2. **UEFI gfxmode locked** -- strict `1024x768` (fallback `640x480` removed)
|
||||
3. **Per-ISO answer files** -- `mk33.toml`, `mk34.toml`, `mk39.toml`, `mk42.toml` in `/opt/iventoy/user/answers/`
|
||||
|
||||
> iVentoy Free does NOT support per-MAC ISO binding. Remastered ISOs achieve per-node provisioning via embedded answer URLs.
|
||||
|
||||
---
|
||||
|
||||
## DNS Records
|
||||
|
||||
### CNAME to traefik.ai.home -- A: 192.168.7.7
|
||||
|
||||
- artemis.ai.home
|
||||
- hermes.ai.home
|
||||
- n8n.ai.home
|
||||
- pgadmin.ai.home
|
||||
- portainer.ai.home
|
||||
- beszel.ai.home
|
||||
- dozzle.ai.home
|
||||
- prometheus.ai.home
|
||||
- homepage.ai.home
|
||||
- auth.ai.home
|
||||
- dns.ai.home
|
||||
|
||||
### A Records
|
||||
|
||||
- traefik.ai.home -> 192.168.7.7
|
||||
- mk7.ai.home -> 192.168.7.7
|
||||
- mk33.ai.home -> 192.168.7.33
|
||||
- mk34.ai.home -> 192.168.7.34
|
||||
- mk39.ai.home -> 192.168.7.39
|
||||
- mk42.ai.home -> 192.168.7.42
|
||||
- mark44.ai.home -> 192.168.5.214
|
||||
- mark5.ai.home -> 192.168.6.5
|
||||
- nebuchadnezzar.ai.home -> 192.168.192.24
|
||||
- shield.ai.home -> 192.168.10.15
|
||||
|
||||
---
|
||||
|
||||
## SSH Topology
|
||||
|
||||
Portable Host (F.R.I.D.A.Y.)
|
+---> artemis.ai.home via id_ed25519
|     +---> mk7.ai.home via artemis_key
|
+---> shield via jarvis user
|     +---> PXE subnet 192.168.10.0/27
|
+---> mk33-42 via bobby user (legacy subnet)
|
+---> nebuchadnezzar via jarvis user
|
||||
|
||||
Key Files:
|
||||
- ~/.ssh/id_ed25519 -- bobby@cinnamint
|
||||
- ~/.ssh/artemis_key -- MK7 jump-host
|
||||
|
||||
---
|
||||
|
||||
## Armor Codenames
|
||||
|
||||
| Code | Name | System |
|
||||
|------|------|--------|
|
||||
| MK-7 | Mark VII | Swarm Manager |
|
||||
| MK-33 | Silver Centurion | Worker |
|
||||
| MK-34 | Igor | Worker |
|
||||
| MK-39 | Starboost | Worker |
|
||||
| MK-42 | Bones | Worker |
|
||||
| MK-44 | Hulkbuster | GPU/Ollama |
|
||||
| MK-5 | Mark 5 | TBD |
|
||||
| J.A.R.V.I.S. | Judicious Automated... | Dashboard |
|
||||
| F.R.I.D.A.Y. | Field-Ready Runtime... | Portable Agent |
|
||||
| A.R.T.E.M.I.S. | Advanced Real-Time... | Discord |
|
||||
| NEO | Nebuchadnezzar | Nextcloud |
|
||||
| SHIELD | - | PXE Server |
|
||||
|
||||
---
|
||||
|
||||
## Notes
|
||||
|
||||
- iVentoy Free does NOT support per-MAC ISO binding.
|
||||
- Shield PXE subnet isolated via ip_forward=0.
|
||||
- Mission Control is separate physical machine.
|
||||
- All *.ai.home resolve via Technitium DNS.
|
||||
- PegaProx deployed on MK7 Swarm in `host` mode (not routed through Traefik).
|
||||
- iVentoy Pro upgrade pending -- private repo link awaited from vendor.
|
||||
|
||||
Last updated: 2026-05-31 by F.R.I.D.A.Y.
|
||||
@@ -76,7 +76,7 @@ This PRD is append-only for new services. Modifications to existing entries requ
|
||||
|
||||
| Node | Role | Services Assigned |
|
||||
|------|------|-------------------|
|
||||
| **MK7 (mark-vii.ai.home)** | Swarm Manager | ALL Phase 1 infrastructure: Traefik, Technitium DNS, AdGuard Home, Portainer, Prometheus, Beszel, Dozzle, Authelia, Homepage |
|
||||
| **MK7 (mark-vii.ai.home)** | Swarm Manager | ALL Phase 1 infrastructure: Traefik, Technitium DNS, Portainer, Prometheus, Beszel, Dozzle, Authelia, Homepage |
|
||||
| **MK33, MK34, MK39, MK42** | Swarm Workers | Phase 2 media stack (Jellyfin, Sonarr, Radarr, Prowlarr), distributed workloads, Vaultwarden, Nextcloud |
|
||||
| **Artemis** | AI Foreman / JARVIS | Hermes Agent, Ansible-pull control plane — NOT a service host |
|
||||
|
||||
@@ -116,8 +116,8 @@ This PRD is append-only for new services. Modifications to existing entries requ
|
||||
| Service | Image | Pulls | Stars | Updated | Placement | Notes |
|
||||
|---------|-------|-------|-------|---------|-----------|-------|
|
||||
| **Traefik** | `traefik` | 3.49B | 3,634 | 2026-05-13 | **Global** | Every node receives ingress routing + Docker socket read-only |
|
||||
| **Technitium DNS** | `technitium/dns-server` | 8.99M | 156 | 2026-05-09 | **Manager Constraint** | Single authoritative DNS — port 53 on MK7 only |
|
||||
| **AdGuard Home** | `adguard/adguardhome` | 170.7M | 1,408 | 2026-05-25 | **Replicated (2)** | 2 replicas across workers for redundancy — port 3000 |
|
||||
| **Technitium DNS** | `technitium/dns-server` | 8.99M | 156 | 2026-05-09 | **Manager Constraint** | Authoritative `.ai.home` + recursive DNS with DoT forwarder to Cloudflare, ad blocking enabled — port 53 on MK7 only |
|
||||
| **~~AdGuard Home~~** | ~~`adguard/adguardhome`~~ | ~~170.7M~~ | ~~1,408~~ | ~~2026-05-25~~ | ~~**Removed**~~ | ~~Replaced by Technitium built-in ad blocking~~ |
|
||||
|
||||
### Monitoring / Observability
|
||||
| Service | Image | Pulls | Stars | Updated | Placement | Notes |
|
||||
@@ -126,13 +126,13 @@ This PRD is append-only for new services. Modifications to existing entries requ
|
||||
| **Prometheus Node Exporter** | `prom/node-exporter` | — | — | — | **Global** | Runs on every node — scrapes CPU/mem/disk |
|
||||
| **Grafana** | `grafana/grafana` | 5.22B | 3,540 | 2026-05-16 | **Replicated (1)** | Any worker (Phase 3, needs data history first) |
|
||||
| **Beszel Hub** | `henrygd/beszel` | 12.58M | 32 | 2026-04-30 | **Manager Constraint** | Central hub on MK7 collects metrics from agents |
|
||||
| **Beszel Agent** | `henrygd/beszel-agent` | — | — | — | **Global** | Runs on every node — reports to hub |
|
||||
| **Beszel Agent** | `henrygd/beszel-agent` | — | — | — | **Pending** | Planned global — reports to hub. Not yet deployed. |
|
||||
| **Dozzle** | `amir20/dozzle` | 309.6M | 144 | 2026-05-25 | **Replicated (1)** | Any worker — read-only Docker socket |
|
||||
|
||||
### Management / Dashboard
|
||||
| Service | Image | Pulls | Stars | Updated | Placement | Notes |
|
||||
|---------|-------|-------|-------|---------|-----------|-------|
|
||||
| **Portainer CE** | `portainer/portainer-ce` | 1.46B | 2,665 | 2026-05-20 | **Manager Constraint** | MK7 only — agentless mode, no portainer-agent needed |
|
||||
| **Portainer CE** | `portainer/portainer-ce` | 1.46B | 2,665 | 2026-05-20 | **Replicated (1)** | MK7 — agentless mode, no portainer-agent needed |
|
||||
| **Homepage** | `gethomepage/homepage` | 1.31M | 40 | 2026-05-25 | **Replicated (1)** | Any worker — all endpoints via env vars |
|
||||
|
||||
### Security / Identity
|
||||
@@ -187,16 +187,27 @@ This PRD is append-only for new services. Modifications to existing entries requ
|
||||
| Nextcloud (MK7) | PostgreSQL (MK7) | TCP | 5432 | DB traffic over Tailscale |
|
||||
|
||||
## DNS Resolution
|
||||
- **Technitium (MK7)** is the authoritative internal DNS for `*.ai.home`.
|
||||
- **AdGuard Home (MK7)** handles recursive resolution with ad-block lists. Replaces Pi-hole.
|
||||
- **Chain:** Client → Technitium (local record?) → AdGuard Home (recursive + blocklist) → Upstream (Cloudflare/Quad9)
|
||||
- **Tailscale MagicDNS** remains enabled as fallback. If Technitium fails, clients fall back to `100.x.x.x` direct resolution.
|
||||
- **AdGuard Home admin UI** runs on port 3000 by default (separate from Grafana if co-located).
|
||||
|
||||
| Component | Status | Detail |
|
||||
|-----------|--------|--------|
|
||||
| **Technitium (MK7)** | ✅ Deployed | Container running, port 53/5380 open |
|
||||
| **`*.ai.home` zone** | ⏳ Pending | Not yet configured as authoritative — Tailscale MagicDNS currently handles name resolution |
|
||||
| **Technitium DNS (MK7)** | ✅ Active | Authoritative `.ai.home` + recursive resolver + ad blocking on port 53. |
|
||||
| **~~AdGuard Home~~** | ~~Removed~~ | ~~Replaced by Technitium built-in ad blocking~~ |
|
||||
|
||||
**Planned Chain (not yet active):**
|
||||
```
|
||||
Client → Technitium (authoritative `.ai.home`? → return local record) → Technitium (recursive resolver + blocklist) → Cloudflare DoT / Root Servers
|
||||
```
|
||||
|
||||
**Current Fallback:** Tailscale MagicDNS provides `*.ai.home` resolution via Tailscale IP addresses. Technitium will assume authority once zone records are populated.
|
||||
|
||||
- **Technitium DNS admin UI** runs on port 5380.
|
||||
|
||||
## Port Allocation (Reserved)
|
||||
| Port | Service |
|
||||
|------|---------|
|
||||
| 53 | DNS (Technitium / Pi-hole) |
|
||||
| 53 | DNS (Technitium) |
|
||||
| 80/443 | HTTP/S (Traefik) |
|
||||
| 3000 | Grafana |
|
||||
| 9090 | Prometheus |
|
||||
@@ -232,7 +243,7 @@ Every service with persistent state uses **bind mounts to on-node directories**.
|
||||
|---------|-----------|---------------|---------------|
|
||||
| **Traefik** | `/opt/iron-legion/traefik/config/` `/opt/iron-legion/traefik/certs/` | MK7 (daily rsync) | < 50 MB |
|
||||
| **Technitium DNS** | `/opt/iron-legion/technitium/config/` | MK7 | < 10 MB |
|
||||
| **Pi-hole** | `/opt/iron-legion/pihole/etc-pihole/` `/opt/iron-legion/pihole/etc-dnsmasq.d/` | MK7 | < 500 MB |
|
||||
| **~~AdGuard Home~~** | ~~`/opt/iron-legion/adguard/work/`~~ ~~`/opt/iron-legion/adguard/conf/`~~ | ~~Removed~~ | ~~N/A~~ |
|
||||
| **Prometheus** | `/opt/iron-legion/prometheus/data/` | MK7 (retention: 15d local, 90d backup) | 5–20 GB |
|
||||
| **Grafana** | `/opt/iron-legion/grafana/data/` | MK7 | < 500 MB |
|
||||
| **Beszel** | `/opt/iron-legion/beszel/data/` | MK7 | < 1 GB |
|
||||
@@ -302,7 +313,7 @@ traefik.http.middlewares.authelia.forwardauth.address: http://authelia:9091/api/
|
||||
- **No VLANs.** Tailscale ACLs handle segment isolation.
|
||||
- **ACL policy (draft):**
|
||||
- `tag:admin` nodes (Bobby, Artemis) → all ports on all nodes
|
||||
- `tag:services` (MK7, MK7, MK7, MK7) → only their assigned service ports, no cross-node SSH except via Tailscale SSH
|
||||
- `tag:services` (MK7 manager + MK33, MK34, MK39, MK42 workers) → only their assigned service ports, no cross-node SSH except via Tailscale SSH
|
||||
- `tag:user` (Bobby's phone, laptop) → HTTPS 443 on MK7 only, Jellyfin 8096 on MK7 directly
|
||||
- **Default deny.** Any traffic not explicitly allowed in Tailscale ACL is dropped.
|
||||
|
||||
@@ -321,7 +332,8 @@ traefik.http.middlewares.authelia.forwardauth.address: http://authelia:9091/api/
|
||||
| Order | Service | Target Node | Why First | Dependencies |
|
||||
|-------|---------|-------------|-----------|--------------|
|
||||
| 1 | **Technitium DNS** | MK7 | Name resolution for internal services | None |
|
||||
| 2 | **Pi-hole** | MK7 | Recursive DNS + ad-block | Technitium (via conditional forwarding) |
|
||||
| 2 | **Technitium DNS** | MK7 | Authoritative + recursive + ad-block | N/A — single service |
|
||||
| ~~AdGuard Home~~ | ~~Removed~~ | ~~—~~ | ~~Technitium replaces AdGuard~~ |
|
||||
| 3 | **Traefik** | MK7 | Edge router for all HTTP ingress | DNS (needs `*.labs.internal` to resolve) |
|
||||
| 4 | **Authelia** | MK7 | Auth layer before exposing any mgmt UI | Traefik (depends on ForwardAuth middleware) |
|
||||
| 5 | **Portainer** | MK7 | Container management UI | Traefik + Authelia (for secured access) |
|
||||
@@ -375,7 +387,7 @@ traefik.http.middlewares.authelia.forwardauth.address: http://authelia:9091/api/
|
||||
|---|----------|--------|----------------------|
|
||||
| 1 | **Domain name** — Does Bobby own a domain (e.g., `bobbysh.me`) or do we use a fake TLD (`labs.internal`)? | **Critical** — TLS certs, Authelia, and DNS all depend on this. | Use `labs.internal` + self-signed CA |
|
||||
| 2 | **Technitium upstream** — DoH, DoT, or plain UDP to upstream resolver (e.g., Cloudflare 1.1.1.1)? | Low — can default to DoH | DoH → `https://cloudflare-dns.com/dns-query` |
|
||||
| 3 | **Pi-hole vs Technitium conflict** — Both run on MK7 port 53. Run Pi-hole on non-standard port with Technitium as conditional forwarder? Or separate nodes? | **Critical** — port 53 collision | Technitium on 53, Pi-hole on 5053, forward to Pi-hole from Technitium |
|
||||
| 3 | **AdGuard Home vs Technitium layout** — AdGuard runs on port 3000, Technitium on 53. No collision, but conditional forwarding from Technitium to AdGuard needs config. | Low — both run independently | Technitium uses upstream AdGuard for recursive queries |
|
||||
| 4 | **Jellyfin media storage** — External USB on MK7? SMB share? NVMe? | Medium | External USB mounted at `/media` on MK7 |
|
||||
| 5 | **Backup target on MK7** — Capacity? Dedicated drive? Rsync target path? | Medium | `/backups/<service-name>/` on MK7 secondary storage |
|
||||
| 6 | **Nextcloud database** — Use existing PostgreSQL on MK7, or deploy Nextcloud AIO (bundled)? | Medium — affects resource allocation on MK7 | Deploy a standalone PostgreSQL container on MK7; Nextcloud AIO is too heavy |
|
||||
@@ -385,7 +397,7 @@ traefik.http.middlewares.authelia.forwardauth.address: http://authelia:9091/api/
|
||||
| 10 | **Beszel alert thresholds** — CPU %, memory %, disk % triggers not defined. | Low | Defaults in Beszel container |
|
||||
|
||||
## Outstanding Decisions Required
|
||||
1. **Pi-hole inclusion** — Not in Bobby's original list. I added it as a DNS-layer complement to Technitium. **Remove if Bobby doesn't want it.**
|
||||
1. ~~Pi-hole inclusion~~ — **Resolved.** Technitium built-in ad blocking replaces Pi-hole.
|
||||
2. **Authelia two-factor method** — TOTP via app (Google Authenticator) vs WebAuthn/FIDO2 keys?
|
||||
3. **Home vs remote access** — If Bobby wants to share Jellyfin with friends/family outside Tailscale, public domain + Authelia guard is required.
|
||||
|
||||
@@ -411,10 +423,9 @@ traefik.http.middlewares.authelia.forwardauth.address: http://authelia:9091/api/
|
||||
| Prowlarr | `linuxserver/prowlarr` | `linuxserver` | 35,913,487 | 403 | 2026-05-25 | ✅ 200 |
|
||||
| Vaultwarden | `vaultwarden/server` | `vaultwarden` | 287,182,978 | 1,454 | 2026-05-17 | ✅ 200 |
|
||||
| Nextcloud | `nextcloud` | `library` | 1,011,978,204 | 4,485 | 2026-05-23 | ✅ 200 |
|
||||
| Pi-hole | `pihole/pihole` | `pihole` | 961,220,209 | 2,943 | 2026-05-25 | ✅ 200 |
|
||||
| Authelia | `authelia/authelia` | `authelia` | 75,183,682 | 208 | 2026-05-25 | ✅ 200 |
|
||||
| **Authelia** | `authelia/authelia` | `authelia` | 75,183,682 | 208 | 2026-05-25 | ✅ 200 |
|
||||
|
||||
**Total unique images:** 16 (including Pi-hole)
|
||||
**Total unique images:** 15
|
||||
**Community health indicator:** All images have > 10 stars, > 1M pulls (except Beszel 32 stars, Homepage 40 stars — acceptable for young projects)
|
||||
**Freshness:** All updated within 90 days except Beszel (30 days — still acceptable)
|
||||
|
||||
@@ -423,7 +434,7 @@ traefik.http.middlewares.authelia.forwardauth.address: http://authelia:9091/api/
|
||||
~/.ansible-repo/new-build/
|
||||
├── phase-1/ # Infrastructure
|
||||
│ ├── technitium/
|
||||
│ ├── pihole/
|
||||
│ ├── adguard/
|
||||
│ ├── traefik/
|
||||
│ ├── authelia/
|
||||
│ ├── portainer/
|
||||
|
||||
@@ -5,9 +5,9 @@
|
||||
| Chunk | Status | Commit | Notes |
|
||||
|-------|--------|--------|-------|
|
||||
| Chunk 1 — Purpose, Scope, Success Criteria | ✅ Complete | `73e42cc` | Merged into `homelab-services-stack-prd.md` |
|
||||
| Chunk 2 — Constraints, Service Catalog, Network Architecture | 🔄 In Progress | — | Awaiting completion |
|
||||
| Chunk 3 — Data & Persistence, Security Model | ⏳ Pending | — | Blocked on Chunk 2 |
|
||||
| Chunk 4 — Deployment Phases, Open Questions, Appendix | ⏳ Pending | — | Blocked on Chunk 3 |
|
||||
| Chunk 2 — Constraints, Service Catalog, Network Architecture | ✅ Complete | `a3fc718` | Reconciled with live fleet |
|
||||
| Chunk 3 — Data & Persistence, Security Model | ✅ Complete | `b7cc09c` | Pi-hole fully removed, Technitium ad blocking canonical. ACL policy corrected. Split files + master PRD in sync. |
|
||||
| Chunk 4 — Deployment Phases, Open Questions, Appendix | ✅ Complete | `f18b978` | All Pi-hole references purged. Split files + master PRD in sync. |
|
||||
|
||||
## Operational Documentation
|
||||
|
||||
|
||||
238
procedures/iventoy-remaster-procedure.md
Normal file
@@ -0,0 +1,238 @@
|
||||
# Procedure: Remaster Proxmox VE ISOs for iVentoy Auto-Install
|
||||
|
||||
**Scope:** Remaster stock Proxmox VE ISOs with embedded auto-install answer URLs and locked UEFI gfxmode for PXE boot via iVentoy.
|
||||
**Author:** F.R.I.D.A.Y.
|
||||
**Date:** 2026-05-31
|
||||
**Prerequisites:** Stock Proxmox VE ISO, `xorriso`, Python 3, iVentoy PXE server running.
|
||||
|
||||
---
|
||||
|
||||
## 1. Overview
|
||||
|
||||
iVentoy Free does NOT support per-MAC ISO binding. To provision each node with its own network config (IP, gateway, etc.), we remaster the stock Proxmox ISO:
|
||||
|
||||
1. Embed an `auto-installer-mode.toml` file pointing to a per-node answer file
|
||||
2. Lock UEFI `gfxmode` to `1024x768` (remove `640x480` fallback)
|
||||
3. Each ISO points to its own answer URL: `http://192.168.10.15:8080/pve/answers/mkNN.toml`
|
||||
|
||||
---
|
||||
|
||||
## 2. Answer File Structure
|
||||
|
||||
### iVentoy Answer Server
|
||||
|
||||
iVentoy runs a built-in HTTP server on `192.168.10.15:8080`. Answer files live in:
|
||||
```
|
||||
/opt/iventoy/user/answers/
|
||||
├── mk33.toml
|
||||
├── mk34.toml
|
||||
├── mk39.toml
|
||||
└── mk42.toml
|
||||
```
|
||||
|
||||
### Per-Node Answer File Example (`mk33.toml`)
|
||||
|
||||
```toml
[target]
source = "from-dhcp"  # Node IP assigned by iVentoy DHCP, NOT hardcoded

[global]
keyboard = "en-us"
timezone = "America/Toronto"

[network]
iface = "eno1"
address = "192.168.7.33/18"  # Static after install
gateway = "192.168.18.1"
dns = "192.168.7.7"

[root-password]
pwhash = "$y$j9T$YOUR_HASH_HERE"  # Pre-hashed password
```
|
||||
|
||||
> **Important:** The `answer_url` in the embedded `auto-installer-mode.toml` points to the **server** (`192.168.10.15:8080`), not the node IP. The node IP comes from DHCP during PXE boot (`source = "from-dhcp"`).
|
||||
|
||||
---
|
||||
|
||||
## 3. Remaster Script
|
||||
|
||||
Save as `/tmp/remaster_pve_iso.py`:
|
||||
|
||||
```python
#!/usr/bin/env python3
"""
Remaster Proxmox VE ISO with embedded auto-install answer URL.
Locks UEFI gfxmode to 1024x768 (removes 640x480 fallback).
"""
import os
import subprocess
import sys
import tempfile

# Node-specific config
if len(sys.argv) != 3:
    sys.exit(f"usage: {sys.argv[0]} <node> <source-iso>")
NODE = sys.argv[1]     # e.g., mk33
SRC_ISO = sys.argv[2]  # e.g., proxmox-ve_9.2-1.iso
DST_ISO = f"proxmox-{NODE}-auto.iso"
ANSWER_URL = f"http://192.168.10.15:8080/pve/answers/{NODE}.toml"

# Create auto-installer-mode.toml
auto_installer_toml = f"""[proxmox-auto-installer]
answer_url = "{ANSWER_URL}"
"""

# Work in temp dir
with tempfile.TemporaryDirectory() as tmpdir:
    # Extract ISO contents
    subprocess.run(["xorriso", "-osirrox", "on", "-indev", SRC_ISO,
                    "-extract", "/", tmpdir], check=True)

    # Write auto-installer-mode.toml into ISO root
    ai_path = os.path.join(tmpdir, "auto-installer-mode.toml")
    with open(ai_path, "w") as f:
        f.write(auto_installer_toml)

    # Patch grub.cfg: lock gfxmode to 1024x768 only
    grub_path = os.path.join(tmpdir, "boot", "grub", "grub.cfg")
    if os.path.exists(grub_path):
        with open(grub_path, "r") as f:
            content = f.read()
        # Remove 640x480 fallback
        patched = content.replace("set gfxmode=1024x768,640x480",
                                  "set gfxmode=1024x768")
        if patched != content:
            with open(grub_path, "w") as f:
                f.write(patched)
            print("Patched grub.cfg: gfxmode locked to 1024x768")

    # Rebuild ISO with same boot properties
    subprocess.run([
        "xorriso", "-as", "mkisofs",
        "-o", DST_ISO,
        "-isohybrid-mbr", os.path.join(tmpdir, "usr", "lib", "ISOLINUX", "isohdpfx.bin"),
        "-c", "boot.cat",
        "-b", "isolinux/isolinux.bin",
        "-no-emul-boot", "-boot-load-size", "4", "-boot-info-table",
        "-eltorito-alt-boot",
        "-e", "EFI/BOOT/BOOTX64.EFI",
        "-no-emul-boot", "-isohybrid-gpt-basdat",
        "-r", "-V", f"Proxmox-VE-Auto-{NODE}",
        tmpdir,
    ], check=True)

print(f"Created: {DST_ISO}")
print(f"Answer URL embedded: {ANSWER_URL}")
```
|
||||
|
||||
### Usage
|
||||
|
||||
```bash
# On Shield (iVentoy server)
python3 /tmp/remaster_pve_iso.py mk33 /opt/iventoy/iso/proxmox-ve_9.2-1.iso
python3 /tmp/remaster_pve_iso.py mk34 /opt/iventoy/iso/proxmox-ve_9.2-1.iso
python3 /tmp/remaster_pve_iso.py mk39 /opt/iventoy/iso/proxmox-ve_9.2-1.iso
python3 /tmp/remaster_pve_iso.py mk42 /opt/iventoy/iso/proxmox-ve_9.2-1.iso

# Move to iVentoy ISO directory
mv proxmox-mk33-auto.iso /opt/iventoy/iso/
mv proxmox-mk34-auto.iso /opt/iventoy/iso/
mv proxmox-mk39-auto.iso /opt/iventoy/iso/
mv proxmox-mk42-auto.iso /opt/iventoy/iso/
```
|
||||
|
||||
---
|
||||
|
||||
## 4. In-Place ISO Patching (gfxmode only)
|
||||
|
||||
If you already have remastered ISOs and only need to patch gfxmode:
|
||||
|
||||
```bash
# Extract grub.cfg from ISO, patch, write it back
ISO=/opt/iventoy/iso/proxmox-mk33-auto.iso
xorriso -osirrox on -indev "$ISO" -extract /boot/grub/grub.cfg /tmp/grub.cfg
sed -i 's/set gfxmode=1024x768,640x480/set gfxmode=1024x768/' /tmp/grub.cfg
# Write the patched image to a new file, then swap it into place
xorriso -indev "$ISO" -outdev "$ISO.new" -boot_image any replay \
    -map /tmp/grub.cfg /boot/grub/grub.cfg
mv "$ISO.new" "$ISO"
```
|
||||
|
||||
> The `-boot_image any replay` flag preserves boot properties after file replacement.
|
||||
|
||||
---
|
||||
|
||||
## 5. Verification
|
||||
|
||||
```bash
# Confirm answer URL is embedded
strings /opt/iventoy/iso/proxmox-mk33-auto.iso | grep "192.168.10.15"
# Expected: http://192.168.10.15:8080/pve/answers/mk33.toml

# Confirm gfxmode is locked
xorriso -osirrox on -indev /opt/iventoy/iso/proxmox-mk33-auto.iso \
    -extract /boot/grub/grub.cfg /tmp/verify.cfg
grep gfxmode /tmp/verify.cfg
# Expected: set gfxmode=1024x768
```
|
||||
|
||||
---
|
||||
|
||||
## 6. iVentoy Configuration
|
||||
|
||||
### Web UI
|
||||
- URL: `http://192.168.27.205:26000`
|
||||
- Go to **ISO Management** → add remastered ISOs
|
||||
|
||||
### MAC Whitelist (Permit Mode)
|
||||
Add node MACs to iVentoy whitelist:
|
||||
```
|
||||
E0-51-D8-1C-5D-56 (MK-33)
|
||||
E0-51-D8-1C-5C-75 (MK-34)
|
||||
PENDING (MK-39)
|
||||
PENDING (MK-42)
|
||||
```
|
||||
|
||||
Nodes must be in whitelist to PXE boot.
|
||||
|
||||
### DHCP Pool
|
||||
- Subnet: `192.168.10.0/27`
|
||||
- Range: `192.168.10.20` to `192.168.10.30`
|
||||
- Nodes get temporary PXE IPs from this pool during install
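
As a sanity check on the numbers above, the pool should sit entirely inside the usable host range of the subnet (a `/27` starting at `.0` covers `.0` to `.31`, with `.31` as broadcast). This is a small sketch using the standard `ipaddress` module; `pool_fits` is a hypothetical helper name, not part of iVentoy.

```python
import ipaddress


def pool_fits(subnet: str, first: str, last: str) -> bool:
    """True if first..last is an ordered range of usable hosts inside subnet."""
    net = ipaddress.ip_network(subnet)
    lo = ipaddress.ip_address(first)
    hi = ipaddress.ip_address(last)
    # Exclude the network and broadcast addresses from the usable range.
    usable_lo = net.network_address + 1
    usable_hi = net.broadcast_address - 1
    return usable_lo <= lo <= hi <= usable_hi


if __name__ == "__main__":
    print(pool_fits("192.168.10.0/27", "192.168.10.20", "192.168.10.30"))
```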
|
||||
|
||||
---
|
||||
|
||||
## 7. Post-Install
|
||||
|
||||
After node installs and reboots:
|
||||
1. Remove node MAC from iVentoy whitelist (node boots from local disk)
|
||||
2. Node gets production IP from `/etc/network/interfaces` (set in answer file)
|
||||
3. Verify: `ping 192.168.7.33` (or appropriate node IP)
|
||||
|
||||
---
|
||||
|
||||
## 8. iVentoy Pro Upgrade Notes
|
||||
|
||||
> **Status:** Awaiting private repo link from vendor.
|
||||
|
||||
Expected Pro features (to verify upon upgrade):
|
||||
- Per-MAC ISO binding (may eliminate need for per-node remastered ISOs)
|
||||
- Additional deployment modes
|
||||
- Priority support
|
||||
|
||||
When the private repo link is received:
|
||||
1. Clone the Pro repository
|
||||
2. Review upgrade documentation in the repo
|
||||
3. Backup current `/opt/iventoy/` configuration
|
||||
4. Follow vendor upgrade procedure
|
||||
5. Test with one node before fleet-wide rollout
|
||||
|
||||
---
|
||||
|
||||
## Rollback
|
||||
|
||||
```bash
|
||||
# Remove remastered ISO
|
||||
rm /opt/iventoy/iso/proxmox-mk33-auto.iso
|
||||
|
||||
# Re-add stock ISO in iVentoy Web UI
|
||||
# Node will boot stock ISO -- manual install required
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2026-05-31*
|
||||
165
procedures/pega-prox-deploy.md
Normal file
@@ -0,0 +1,165 @@
|
||||
# Procedure: Deploy PegaProx on Docker Swarm
|
||||
|
||||
**Scope:** Deploy PegaProx (Proxmox VE cluster manager) as a Docker Swarm service on MK7.
|
||||
**Author:** F.R.I.D.A.Y.
|
||||
**Date:** 2026-05-31
|
||||
**Prerequisites:** MK7 Swarm manager active, `traefik-public` overlay network exists.
|
||||
|
||||
---
|
||||
|
||||
## 1. Create Swarm Compose File
|
||||
|
||||
Save as `/tmp/pegaprox_swarm.yml` on MK7:
|
||||
|
||||
```yaml
version: "3.8"
services:
  pegaprox:
    image: pegaprox/pegaprox:latest
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.role == manager
    ports:
      - target: 5000
        published: 5000
        mode: host
        protocol: tcp
      - target: 5001
        published: 5001
        mode: host
        protocol: tcp
      - target: 5002
        published: 5002
        mode: host
        protocol: tcp
    networks:
      - traefik-public
    volumes:
      - pegaprox-config:/app/config
    environment:
      - PEGAPROX_DEBUG=0

volumes:
  pegaprox-config:
    driver: local

networks:
  traefik-public:
    external: true
```
|
||||
|
||||
> **Critical:** `mode: host` is required. `ingress` mode breaks WebSocket VNC/SSH consoles because Swarm ingress routing does not support WebSocket upgrade properly.
|
||||
|
||||
---
|
||||
|
||||
## 2. Deploy Stack
|
||||
|
||||
```bash
ssh jarvis@mk7.ai.home
docker stack deploy -c /tmp/pegaprox_swarm.yml pegaprox
```

Verify:
```bash
docker service ls | grep pegaprox
docker ps | grep pegaprox
```
---

## 3. Verify Service Health

The health endpoint should return `{"status":"ok"}`:

```bash
# HTTPS API (-k skips verification of the self-signed certificate)
curl -sk https://192.168.7.7:5000/api/health

# Check container logs
docker logs "$(docker ps -q -f name=pegaprox)"
```
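For automation, the manual health check above can be wrapped in a small retry loop. This is a minimal sketch, not part of PegaProx: the function names `is_healthy` and `wait_for_pegaprox`, the retry count, and the delay are illustrative assumptions; only the URL and the `{"status":"ok"}` body come from this procedure. After `docker stack deploy`, it could be called as `wait_for_pegaprox 12 5`.

```bash
# Sketch: poll the PegaProx health endpoint until it reports ok.
# Retry count and delay are assumptions, not PegaProx defaults.
HEALTH_URL="https://192.168.7.7:5000/api/health"

# Succeed if the response body contains "status":"ok" (whitespace tolerant).
is_healthy() {
  printf '%s' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# wait_for_pegaprox [ATTEMPTS] [DELAY_SECONDS]
wait_for_pegaprox() {
  attempts="${1:-12}"
  delay="${2:-5}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -k skips verification of the self-signed certificate, as in the manual check.
    body="$(curl -sk --max-time 5 "$HEALTH_URL" || true)"
    if is_healthy "$body"; then
      echo "PegaProx healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "PegaProx did not become healthy after $attempts attempts" >&2
  return 1
}
```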
---

## 4. First Login & Password Change

1. Open `https://192.168.7.7:5000`.
2. Log in with the default credentials:
   - Username: `pegaprox`
   - Password: `admin`
3. The system forces a password change on first login.
4. The API signals this with: `{"security_warning":"DEFAULT_PASSWORD","requires_password_change":true}`
---

## 5. API Notes for Automation

### CSRF Protection

All state-changing API calls (POST/PUT/PATCH/DELETE) must include:

```
X-Requested-With: XMLHttpRequest
```

Exempt paths (no CSRF header needed):

- `/api/auth/login`
- `/api/auth/setup`
- `/api/auth/oidc/*`
- `/api/auth/check`
- `/api/auth/validate`
- `/api/auth/logout`
- `/api/health`
- `/api/webauthn/auth/begin`
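The CSRF rule and exempt list above can be encoded in a small helper so scripts add the header only where it is needed. This is a hedged sketch: `needs_csrf_header` and `pegaprox_api` are hypothetical names, the method is assumed to be upper case, and treating `/api/auth/oidc/*` as a prefix match is an assumption about how PegaProx interprets the wildcard. A call such as `pegaprox_api POST /api/clusters -H "Content-Type: application/json" -d @cluster.json` would then pick up the header automatically (`cluster.json` is a placeholder file name).

```bash
# needs_csrf_header METHOD PATH: prints "yes" or "no".
needs_csrf_header() {
  method="$1"
  req_path="$2"
  case "$method" in
    POST|PUT|PATCH|DELETE) ;;        # state-changing: check the exempt list
    *) echo "no"; return 0 ;;        # GET, HEAD, etc. need no header
  esac
  case "$req_path" in
    /api/auth/login|/api/auth/setup|/api/auth/check) echo "no" ;;
    /api/auth/validate|/api/auth/logout|/api/health) echo "no" ;;
    /api/webauthn/auth/begin|/api/auth/oidc/*) echo "no" ;;
    *) echo "yes" ;;
  esac
}

# pegaprox_api METHOD PATH [extra curl args...]
# Uses the session cookie file and base URL from this procedure.
pegaprox_api() {
  method="$1"
  req_path="$2"
  shift 2
  if [ "$(needs_csrf_header "$method" "$req_path")" = "yes" ]; then
    set -- -H "X-Requested-With: XMLHttpRequest" "$@"
  fi
  curl -sk -X "$method" -b cookies.txt "$@" "https://192.168.7.7:5000$req_path"
}
```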
### Add Cluster

```bash
# cookies.txt must already hold an authenticated session cookie
curl -sk -X POST https://192.168.7.7:5000/api/clusters \
  -b cookies.txt \
  -H "Content-Type: application/json" \
  -H "X-Requested-With: XMLHttpRequest" \
  -d '{
    "name": "MK33",
    "host": "192.168.7.33",
    "user": "root@pam",
    "pass": "YOUR_PVE_PASSWORD"
  }'
```

> **CRITICAL:** `host` must be a **bare IP only**. Do NOT append `:8006`; PegaProx appends the port internally. Supplying `192.168.7.33:8006` causes a URL parse failure: `Failed to parse: https://[192.168.7.33:8006]:8006/...`
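To catch the `:8006` mistake before the API call, a script can validate the `host` value first. The following is a minimal sketch under stated assumptions: `is_bare_ipv4` is a hypothetical helper, it accepts only dotted-quad IPv4 addresses (IPv6 and hostnames are out of scope), and `PVE_HOST` in the usage comment is an illustrative variable name.

```bash
# is_bare_ipv4 HOST: succeed only for a dotted-quad IPv4 address with no
# scheme, port, or path.
is_bare_ipv4() {
  case "$1" in
    ""|*[!0-9.]*) return 1 ;;   # empty, or contains ":", "/", letters, etc.
  esac
  rest="$1."                    # trailing dot so every octet ends with "."
  count=0
  while [ -n "$rest" ]; do
    octet="${rest%%.*}"         # text before the first dot
    rest="${rest#*.}"           # text after the first dot
    [ -n "$octet" ] && [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || return 1
    count=$((count + 1))
  done
  [ "$count" -eq 4 ]
}

# Hypothetical usage before calling POST /api/clusters:
#   is_bare_ipv4 "$PVE_HOST" || { echo "host must be a bare IP" >&2; exit 1; }
```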
---

## 6. Backup Volume

```bash
# Backup PegaProx config + DB
docker run --rm -v pegaprox_pegaprox-config:/src -v /tmp:/dst alpine \
  tar czf /dst/pegaprox-config-$(date +%Y%m%d).tar.gz -C /src .
```
---

## 7. Known Issues

| Issue | Cause | Fix |
|-------|-------|-----|
| WebSocket VNC/SSH broken | WebSocket upgrade fails through the Swarm `ingress` routing mesh | Use `mode: host` |
| URL parse error on add-cluster | `:8006` appended to host field | Use bare IP only |
| CSRF 403 on API calls | Missing `X-Requested-With` header | Add header to all state-changing calls |
| Self-signed cert warning | No CA-signed cert deployed | Accept in browser or deploy custom cert |

---

## Rollback

```bash
ssh jarvis@mk7.ai.home
docker stack rm pegaprox
docker volume rm pegaprox_pegaprox-config  # WARNING: destroys all data
```

---

*Last updated: 2026-05-31*
|
||||
2
swarm.md
@@ -29,7 +29,7 @@ All services deployed on MK7 manager via `docker stack deploy`.
 | `portainer` | Portainer CE | replicated | 1/1 | `9000` | `portainer.ai.home` |
 | `prometheus` | Prometheus | replicated | 1/1 | `9090` | `prom.ai.home` |
 | `technitium` | Technitium DNS | replicated | 1/1 | `53/tcp`, `53/udp`, `5380` | `dns.ai.home` |
-| `adguard` | AdGuard Home | replicated | 1/1 | `3000`, `30053` | `adguard.ai.home` |
+| ~~`adguard`~~ | ~~AdGuard Home~~ | ~~removed~~ | ~~—~~ | ~~—~~ | ~~`adguard.ai.home`~~ |
 | ~~authelia~~ | ~~Authelia~~ | ~~deferred~~ | — | — | ~~`auth.ai.home`~~ |
 
 > **Note:** Authelia deferred until local TLS is available (requires `https://auth.ai.home`).