PRD Updates: Fix MK7/Neo references; add Atlantis section; new Ansible Web UI comparison PRD

2026-06-02 06:31:15 -04:00
parent 4377ffaffa
commit fa7a6a2669
2 changed files with 396 additions and 8 deletions
--- a/Drafts/ansible-automation-webui-comparison.md
+++ b/Drafts/ansible-automation-webui-comparison.md
@@ -0,0 +1,316 @@
+# Ansible Automation Web UI Comparison PRD
+
+**Status:** Draft | **Author:** F.R.I.D.A.Y. (Hermes Agent) | **Date:** 2026-06-02
+
+---
+
+## 1. Purpose & Scope
+
+This PRD evaluates web-based UIs for running and managing Ansible playbooks in the Iron Legion fleet. The focus is on self-hosted, Docker-friendly solutions that integrate with our existing Gitea SCM and are deployable on Swarm or standalone nodes.
+
+**Tools Evaluated:**
+1. Semaphore UI (Ansible-native) — RECOMMENDED
+2. Kestra (Generic orchestration, Ansible-compatible)
+3. AWX (Official Red Hat Ansible platform)
+4. Rundeck (Ops automation with Ansible plugin)
+5. Jenkins + Ansible Plugin (CI/CD generalist)
+
+---
+
+## 2. Requirements
+
+**Must-Have:**
+- [x] Docker Compose or Swarm deployable
+- [x] Ansible playbook execution (not just shell scripts calling ansible)
+- [x] Web UI for triggering runs, viewing logs, managing inventories
+- [x] Self-hosted (no cloud dependency)
+- [x] Works on Iron Legion architecture (x86_64, moderate RAM)
+
+**Nice-to-Have:**
+- [ ] Gitea webhook integration (auto-trigger on push)
+- [ ] RBAC / multi-user access
+- [ ] API for automation
+- [ ] Scheduled runs (cron-like)
+- [ ] Low resource footprint (fit on G9 nodes)
+
+---
+
+## 3. Comparison Matrix
+
+| Criterion | Semaphore UI | Kestra | AWX | Rundeck | Jenkins + Ansible |
+|-----------|-------------|--------|-----|---------|-------------------|
+| **Primary Purpose** | Ansible-native runner | Generic workflow engine | Enterprise Ansible platform | Ops automation | CI/CD generalist |
+| **Docker Compose** | ✅ Simple | ✅ Simple | ⚠️ Complex (K8s preferred) | ✅ Simple | ✅ Simple |
+| **RAM Needed (incl. database)** | ~300 MB | ~1 GB | 4–6 GB | ~700–800 MB | ~1–2 GB |
+| **Ansible Integration** | Native | Via shell/HTTP tasks | Native | Plugin-based | Plugin-based |
+| **Inventory Management** | Built-in (static + dynamic) | Via external files | Advanced (sources, scripts) | Basic | Via files/plugins |
+| **Gitea Webhooks** | ✅ Supported | ✅ Supported | ⚠️ Requires AWX project sync | ✅ Via plugin | ✅ Via Gitea plugin (or SCM polling) |
+| **RBAC / Multi-user** | ✅ | ✅ | ✅ Enterprise-grade | ✅ | ✅ Plugin-based |
+| **Scheduled Runs** | ✅ Cron UI | ✅ Triggers | ✅ Schedules | ✅ Jobs scheduler | ✅ Cron trigger plugin |
+| **Log Viewer** | ✅ Real-time | ✅ Real-time | ✅ Real-time + facts | ✅ | ✅ Plugin-dependent |
+| **Vault Integration** | ✅ Key store built-in | Via secrets | ✅ Native | Via plugins | Via plugins |
+| **Complexity** | Low | Medium | High | Medium | High |
+
+---
+
+## 4. Tool Deep-Dives
+
+### 4.1 Semaphore UI (RECOMMENDED)
+
+**Why it wins:** Purpose-built for Ansible, minimal footprint, fast UI, and fits Iron Legion constraints.
+
+**Docker Compose:**
+
+```yaml
+services:
+  mysql:
+    image: mysql:8.0
+    environment:
+      MYSQL_ROOT_PASSWORD: semaphore-db-password
+      MYSQL_DATABASE: semaphore
+      MYSQL_USER: semaphore
+      MYSQL_PASSWORD: semaphore-db-password
+    volumes:
+      - semaphore-mysql:/var/lib/mysql
+    restart: unless-stopped
+
+  semaphore:
+    image: semaphoreui/semaphore:latest
+    ports:
+      - "3000:3000"
+    environment:
+      SEMAPHORE_DB_DIALECT: mysql
+      SEMAPHORE_DB_HOST: mysql
+      SEMAPHORE_DB_NAME: semaphore
+      SEMAPHORE_DB_USER: semaphore
+      SEMAPHORE_DB_PASS: semaphore-db-password
+      SEMAPHORE_ADMIN_PASSWORD: admin-password
+      SEMAPHORE_ADMIN_NAME: admin
+      SEMAPHORE_ADMIN_EMAIL: admin@localhost
+      SEMAPHORE_ADMIN: admin
+      # Optional: Telegram / Slack / Gitea integration
+      SEMAPHORE_WEBHOOK: "1"
+    volumes:
+      - semaphore-config:/etc/semaphore
+      - /path/to/ansible/playbooks:/playbooks:ro
+      - /path/to/inventories:/inventories:ro
+      - /path/to/ssh/keys:/ssh:ro
+    depends_on:
+      - mysql
+    restart: unless-stopped
+
+volumes:
+  semaphore-mysql:
+    driver: local
+  semaphore-config:
+    driver: local
+```
+
+**Key Features:**
+- **Project-centric:** Organize playbooks into projects with separate inventories, env vars, and access
+- **Task Templates:** Define reusable job definitions with variables and surveys
+- **Key Store:** Built-in encrypted vault for SSH keys, passwords, Ansible vault passwords
+- **Cron Schedules:** UI-driven scheduling without crontab
+- **Real-time Logs:** WebSocket-based live log streaming
+- **Gitea Integration:** Add a Gitea repository as a project, clone on each run, webhooks for auto-trigger
+
+**Resource Footprint:**
+- MySQL: ~200 MB RAM
+- Semaphore: ~50–100 MB RAM
+- Total: **~300 MB** — deployable on any G9 worker node
+
+**Cons:**
+- Smaller community than AWX/Jenkins
+- Less granular RBAC than AWX
+- No built-in credential plugins (e.g., HashiCorp Vault) — must use env vars or files
+
+---
+
+### 4.2 Kestra
+
+**What it is:** Language-agnostic workflow orchestration platform with a visual DAG editor. Not Ansible-specific, but can invoke Ansible via `io.kestra.plugin.scripts.shell.Commands` or `io.kestra.plugin.core.http.Request`.
+
+**Docker Compose:**
+
+```yaml
+volumes:
+  postgres-data:
+    driver: local
+  kestra-data:
+    driver: local
+
+services:
+  postgres:
+    image: postgres:18
+    volumes:
+      - postgres-data:/var/lib/postgresql
+    environment:
+      POSTGRES_DB: kestra
+      POSTGRES_USER: kestra
+      POSTGRES_PASSWORD: k3str4
+
+  kestra:
+    image: kestra/kestra:latest
+    user: "root"
+    command: server standalone
+    volumes:
+      - kestra-data:/app/storage
+      - /var/run/docker.sock:/var/run/docker.sock
+      - /tmp/kestra-wd:/tmp/kestra-wd
+      - /path/to/ansible:/ansible:ro
+    environment:
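+      KESTRA_CONFIGURATION: |
+        datasources:
+          postgres:
+            url: jdbc:postgresql://postgres:5432/kestra
+            driverClassName: org.postgresql.Driver
+            username: kestra
+            password: k3str4
+        # Kestra's own settings are nested under the top-level `kestra` key
+        kestra:
+          repository:
+            type: postgres
+          storage:
+            type: local
+            local:
+              base-path: "/app/storage"
+          queue:
+            type: postgres
+          url: http://localhost:8080/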
+    ports:
+      - "8080:8080"
+    depends_on:
+      - postgres
+```
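+
+The compose file above mounts the playbook tree at `/ansible`, so a flow can call `ansible-playbook` through the shell task named earlier. A minimal sketch follows. The flow id, namespace, and file paths are hypothetical, and it assumes `ansible-playbook` has been installed in the Kestra container (for example via a derived image), which is why the task uses the Process runner instead of a separate Docker container:
+
+```yaml
+id: run-site-playbook            # hypothetical flow id
+namespace: ironlegion.ansible    # hypothetical namespace
+
+tasks:
+  - id: ansible_playbook
+    type: io.kestra.plugin.scripts.shell.Commands
+    # Run inside the Kestra container so the /ansible bind mount is visible
+    taskRunner:
+      type: io.kestra.plugin.core.runner.Process
+    commands:
+      # Inventory and playbook paths are placeholders under the /ansible mount
+      - ansible-playbook -i /ansible/inventories/hosts.yml /ansible/playbooks/site.yml
+```
+
+Inventories, credentials, and run history would still have to be managed outside Kestra, which is the gap the verdict below refers to.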
+
+**Key Features:**
+- **Visual DAG Editor:** Drag-and-drop workflow construction
+- **Rich Triggers:** Schedule, webhook, event-driven (Kafka, S3, HTTP)
+- **Plugin Ecosystem:** 400+ plugins (not Ansible-native — invoke via shell)
+- **Scalability:** Built for large-scale data pipelines; may be overkill for fleet Ansible
+
+**Resource Footprint:**
+- PostgreSQL: ~300 MB RAM
+- Kestra: ~512 MB–1 GB RAM
+- Total: **~1 GB** — heavier than Semaphore
+
+**Verdict for Iron Legion:** Powerful but misaligned. We need Ansible-native execution, not generic workflow orchestration. Use Kestra for data/ETL pipelines, not playbook management.
+
+---
+
+### 4.3 AWX
+
+**What it is:** The upstream open-source project behind Ansible Automation Platform (formerly Ansible Tower). Full-featured enterprise Ansible management.
+
+**Key Features:**
+- **Projects:** Link to Git repos (Gitea supported), auto-sync on push
+- **Inventories:** Static, dynamic (custom scripts, cloud providers), smart inventories
+- **Job Templates:** Parameterized with surveys, credentials, and RBAC
+- **Workflows:** Chain multiple job templates into visual pipelines
+- **RBAC:** Teams, organizations, user roles — most granular of all options
+- **Notifications:** Email, Slack, webhook on job success/failure
+
+**Deployment:**
+- Docker Compose exists but is officially a **development** target; production requires Kubernetes
+- Requires PostgreSQL, Redis, and multiple AWX services (web, task, execution environment)
+- Total RAM: **4–6 GB minimum**
+
+**Verdict for Iron Legion:** Overkill. Our fleet nodes (G9: ~11 GB RAM) could run AWX, but it would consume half a node's capacity. G9 nodes are better used as PVE workers with LXCs. AWX belongs on a dedicated management VM or MK7 if hardware permits.
+
+---
+
+### 4.4 Rundeck
+
+**What it is:** Open-source operations automation platform with an Ansible plugin.
+
+**Docker Compose:** Simple single-container deployment with external database.
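+
+A minimal sketch of that layout, assuming the `rundeck/rundeck` image and its `RUNDECK_DATABASE_*` and `RUNDECK_GRAILS_URL` environment variables. The image tag variable, password, and volume names are placeholders, and Ansible itself would still need to be added to the container before the plugin can run playbooks:
+
+```yaml
+services:
+  postgres:
+    image: postgres:18
+    environment:
+      POSTGRES_DB: rundeck
+      POSTGRES_USER: rundeck
+      POSTGRES_PASSWORD: change-me          # placeholder
+    volumes:
+      - rundeck-db:/var/lib/postgresql
+    restart: unless-stopped
+
+  rundeck:
+    image: ${RUNDECK_IMAGE}                 # placeholder: pin a rundeck/rundeck release tag
+    ports:
+      - "4440:4440"
+    environment:
+      RUNDECK_GRAILS_URL: http://localhost:4440
+      RUNDECK_DATABASE_DRIVER: org.postgresql.Driver
+      RUNDECK_DATABASE_URL: jdbc:postgresql://postgres/rundeck
+      RUNDECK_DATABASE_USERNAME: rundeck
+      RUNDECK_DATABASE_PASSWORD: change-me  # must match POSTGRES_PASSWORD
+    volumes:
+      - rundeck-data:/home/rundeck/server/data
+    depends_on:
+      - postgres
+    restart: unless-stopped
+
+volumes:
+  rundeck-db:
+    driver: local
+  rundeck-data:
+    driver: local
+```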
+
+**Key Features:**
+- **Job Definitions:** YAML or XML, supports Ansible ad-hoc and playbook execution
+- **Node Inventory:** Static or dynamic via Ansible inventory scripts
+- **ACL Policies:** File-based RBAC
+- **Scheduled Executions:** Built-in scheduler
+- **Plugin Architecture:** Ansible, Slack, HTTP webhooks
+
+**Resource Footprint:**
+- Rundeck: ~512 MB RAM
+- MySQL/PostgreSQL: ~200–300 MB
+- Total: **~700–800 MB**
+
+**Verdict for Iron Legion:** Viable middle ground. Better than Jenkins for Ansible, but Semaphore is purpose-built and lighter. Rundeck's strength is multi-tool orchestration (Ansible + scripts + HTTP APIs), which we don't need yet.
+
+---
+
+### 4.5 Jenkins + Ansible Plugin
+
+**What it is:** General-purpose CI/CD platform with Ansible integration via plugins.
+
+**Docker Compose:**
+
+```yaml
+services:
+  jenkins:
+    image: jenkins/jenkins:lts  # stock image ships without Ansible; install it in a derived image or on an agent
+    ports:
+      - "8080:8080"
+      - "50000:50000"
+    volumes:
+      - jenkins-data:/var/jenkins_home
+      - /path/to/ansible/playbooks:/playbooks:ro
+      - /path/to/inventories:/inventories:ro
+    restart: unless-stopped
+
+volumes:
+  jenkins-data:
+    driver: local
+```
+
+**Key Features:**
+- **Pipelines:** Groovy-based Jenkinsfile pipelines for Ansible execution
+- **Blue Ocean:** Pipeline visualization UI (in maintenance mode; no longer receiving new features)
+- **Plugin Ecosystem:** Massive library (Ansible, Slack, Git, Gitea)
+- **Distributed Builds:** Agent nodes for parallel playbook runs
+
+**Resource Footprint:**
+- Jenkins: ~1 GB RAM (grows with plugin load)
+- Optional agents: variable
+- Total: **~1–2 GB**
+
+**Verdict for Iron Legion:** Wrong tool for the job. Jenkins excels at CI/CD pipelines (build → test → deploy), not at day-to-day Ansible playbook management. The UI is pipeline-centric, not inventory- or template-centric. Use Jenkins for software CI/CD, not fleet automation.
+
+---
+
+## 5. Recommendation
+
+| Use Case | Recommended Tool |
+|----------|---------------|
+| **Primary Ansible playbook runner** | **Semaphore UI** |
+| Complex enterprise RBAC + workflows | AWX (on dedicated VM) |
+| Generic workflow orchestration (not Ansible-specific) | Kestra |
+| Multi-tool ops automation (Ansible + scripts + APIs) | Rundeck |
+| Software CI/CD pipelines | Jenkins |
+
+**Iron Legion Path Forward:**
+1. **Deploy Semaphore UI** on MK7 Swarm or a lightweight LXC on MK33
+2. Create a Project pointing to `Iron-Legion/ansible-playbooks` on Gitea
+3. Configure inventories, task templates, and schedules
+4. Add Gitea webhook to auto-trigger Semaphore tasks on push to `main`
+5. **Optional:** Evaluate AWX later if RBAC or workflow requirements grow; deploy it on a dedicated management VM with at least 4–6 GB RAM reserved
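+
+For step 3, a static YAML inventory kept in the same Gitea repo might start like the sketch below. Group names and `ansible_user` are hypothetical. The two addresses are the ones these drafts associate with MK33 and Neo and should be checked against the live network; MK7 is left to DNS resolution because no address is confirmed for it here:
+
+```yaml
+all:
+  vars:
+    ansible_user: root              # assumption: adjust to the fleet's SSH user
+  children:
+    pve:                            # hypothetical group: Proxmox VE hosts
+      hosts:
+        mk33:
+          ansible_host: 192.168.7.33
+    swarm:                          # hypothetical group: Docker Swarm hosts
+      hosts:
+        mk7: {}                     # resolved by name; no address assumed
+    scm:                            # hypothetical group: Gitea host
+      hosts:
+        neo:
+          ansible_host: 192.168.192.24
+```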
+
+---
+
+## 6. Open Questions
+
+1. **Should Semaphore run as a standalone Docker Compose stack or as a Swarm service?**
+   - Standalone: simpler, survives Swarm reconfiguration
+   - Swarm: automatic placement, Traefik ingress, less manual maintenance
+
+2. **Where does the Ansible inventory live?**
+   - Option A: In the Gitea repo alongside playbooks (version-controlled)
+   - Option B: Static files on the Semaphore host (no per-run clone, but not version-controlled)
+   - Option C: Dynamic inventory script pulling from Technitium DNS/PVE API
+
+3. **Gitea webhook reachability:**
+   - Gitea on Neo (`192.168.192.24`) → Semaphore on MK7 or G9 node
+   - Must ensure Semaphore endpoint is reachable from Neo (LAN routing)
+   - Can use Tailscale as fallback
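+
+For question 1, the Swarm variant could look roughly like the stack fragment below. It is a sketch only: the `traefik-public` network, the hostname, and the memory limit are assumptions, and the database service and environment block from section 4.1 would carry over unchanged:
+
+```yaml
+services:
+  semaphore:
+    image: semaphoreui/semaphore:latest
+    networks:
+      - traefik-public
+    deploy:
+      replicas: 1
+      placement:
+        constraints:
+          - node.role == worker
+      resources:
+        limits:
+          memory: 256M              # assumption, based on the ~50–100 MB estimate above
+      labels:
+        # Traefik reads Swarm labels from deploy.labels, not container labels
+        - traefik.enable=true
+        - traefik.http.routers.semaphore.rule=Host(`semaphore.example.internal`)
+        - traefik.http.services.semaphore.loadbalancer.server.port=3000
+
+networks:
+  traefik-public:
+    external: true                  # assumed to exist already for Traefik ingress
+```
+
+Note that `depends_on` and `restart` are ignored by `docker stack deploy`, so start ordering and restarts would rely on Swarm's own restart policy.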
+
+---
+
+*End of PRD — Iron Legion Labs*
--- a/Drafts/terraform-proxmox-lxc-automation.md
+++ b/Drafts/terraform-proxmox-lxc-automation.md
@@ -63,7 +63,7 @@ terraform {
 }

 provider "proxmox" {
-  endpoint = "https://192.168.7.7:8006/"
+  endpoint = "https://192.168.7.33:8006/"
  username = "root@pam"
  password = var.proxmox_password  # or PROXMOX_VE_PASSWORD env var
  insecure = true                  # self-signed TLS
@@ -156,13 +156,13 @@ terraform {
 variable "proxmox_endpoint" {
  description = "PVE API URL"
  type        = string
-  default     = "https://192.168.7.7:8006/"
+  default     = "https://192.168.7.33:8006/"
 }

 variable "proxmox_node" {
  description = "Target PVE node name"
  type        = string
-  default     = "mk7"
+  default     = "mk33"
 }

 variable "ssh_public_key" {
@@ -332,7 +332,7 @@ output "lxc_passwords" {
 module "dev_lxcs" {
  source = "../../modules/lxc"

-  proxxmox_node  = "mk7"
+  proxmox_node   = "mk33"
  ssh_public_key = file("~/.ssh/id_ed25519.pub")

  lxc_configs = {
@@ -400,13 +400,13 @@ Use data sources to query existing infrastructure without managing it:

 ```hcl
 data "proxmox_virtual_environment_datastores" "available" {
-  node_name = "mk7"
+  node_name = var.proxmox_node
 }

 data "proxmox_virtual_environment_nodes" "cluster" {}

 data "proxmox_virtual_environment_container" "existing" {
-  node_name = "mk7"
+  node_name = var.proxmox_node  # or specify target node explicitly
  vm_id     = 2001
 }
 ```
@@ -422,7 +422,7 @@ data "proxmox_virtual_environment_container" "existing" {

 ### Recommended: S3-Compatible Backend

-Iron Legion already runs self-hosted services. A Garage or Minio instance on Neo/MK7 can serve as the Terraform state backend:
+Iron Legion already runs self-hosted services. A Garage or MinIO instance on a fleet storage node (e.g., Neo) can serve as the Terraform state backend:

 ```hcl
 terraform {
@@ -447,6 +447,78 @@ Add a DynamoDB-compatible table or use a native locking mechanism. If S3 backend

 ---

+## Optional: Atlantis Web UI for Terraform PR Automation
+
+### What Atlantis Is
+
+Atlantis is a self-hosted web application that listens for webhook events from Git repositories and runs `terraform plan` / `terraform apply` automatically inside PR/MR workflows. It posts plan output back to the PR as comments, enforces approval gates, and locks workspaces to prevent concurrent applies.
+
+### Can Atlantis Manage LXC Resources via `bpg/proxmox`?
+
+**Yes.** Atlantis is a Terraform orchestration layer, not a provider. It supports any Terraform provider including `bpg/proxmox`. The workflow is:
+1. Developer opens a PR adding/modifying `.tf` files defining LXC containers
+2. Atlantis receives the webhook and runs `terraform plan` in an isolated directory
+3. Plan output posted as a PR comment — team reviews before approval
+4. After approval (or `atlantis apply` comment), Atlantis runs `terraform apply`
+
+### Atlantis Docker Compose (Self-Hosted)
+
+```yaml
+services:
+  atlantis:
+    image: ghcr.io/runatlantis/atlantis:latest
+    ports:
+      - "4141:4141"
+    volumes:
+      - ${HOME}/.ssh:/home/atlantis/.ssh:ro           # Git SSH key
+      - /var/run/docker.sock:/var/run/docker.sock:ro # if using Docker TF provider
+      - atlantis-data:/home/atlantis/.atlantis
+    environment:
+      ATLANTIS_GH_USER: "iron-legion-bot"              # or ATLANTIS_GITLAB_USER / ATLANTIS_GITEA_USER
+      ATLANTIS_GH_TOKEN: "${ATLANTIS_GH_TOKEN}"        # personal access token
+      ATLANTIS_REPO_ALLOWLIST: "github.com/Iron-Legion/*"
+      ATLANTIS_GH_WEBHOOK_SECRET: "${WEBHOOK_SECRET}"
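+      # For Gitea (point ATLANTIS_REPO_ALLOWLIST at the Gitea host instead of github.com):
+      # ATLANTIS_GITEA_BASE_URL: "${GITEA_BASE_URL}"
+      # ATLANTIS_GITEA_USER: "iron-legion-bot"
+      # ATLANTIS_GITEA_TOKEN: "${GITEA_TOKEN}"
+      # ATLANTIS_GITEA_WEBHOOK_SECRET: "${WEBHOOK_SECRET}"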
+    command: server
+    restart: unless-stopped
+
+  # Optional: Redis for distributed locking in multi-replica setups
+  # redis:
+  #   image: redis:8-alpine
+  #   volumes:
+  #     - redis-data:/data
+  #   restart: always
+
+volumes:
+  atlantis-data:
+    driver: local
+```
+
+### Key Features
+
+- **Plan Comments:** Every PR gets an auto-generated `terraform plan` comment
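+- **Locking:** A plan locks its directory and workspace until the PR is merged or closed, or the lock is released manually; other PRs touching the same project are blocked in the meantime rather than queued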
+- **Policy Checks:** Integrate OPA (Open Policy Agent) or custom scripts to block non-compliant changes
+- **Custom Workflows:** Define per-repo or per-directory workflows (e.g., plan-only for dev, auto-apply for staging)
+- **Self-Hosted SCM:** Native webhook support for GitHub, GitLab, Bitbucket, **and Gitea**
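+
+The per-directory workflows mentioned above are declared in a repo-level `atlantis.yaml`. A minimal sketch, assuming a hypothetical `environments/dev` directory that consumes `../../modules/lxc`; `apply_requirements` only takes effect here if the server-side repo config lists it under `allowed_overrides`:
+
+```yaml
+version: 3
+projects:
+  - name: lxc-dev                   # hypothetical project name
+    dir: environments/dev           # hypothetical path to the root module
+    workspace: default
+    autoplan:
+      enabled: true
+      # Re-plan when the environment or the shared modules change
+      when_modified:
+        - "*.tf"
+        - "../../modules/**/*.tf"
+    # Block `atlantis apply` until the PR is approved and mergeable
+    apply_requirements: [approved, mergeable]
+```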
+
+### Resource Footprint
+
+- Atlantis container: ~100–200 MB RAM, minimal CPU
+- Optional Redis: ~20 MB RAM
+- Total: fits comfortably on any Iron Legion node (MK7, MK33–42, Neo)
+
+### Gitea Integration Notes
+
+- Atlantis supports Gitea via the `--gitea-user`, `--gitea-token`, `--gitea-webhook-secret`, and `--gitea-base-url` flags (the base URL points Atlantis at the self-hosted instance)
+- Must expose Atlantis endpoint to Gitea (Tailscale funnel, reverse proxy, or LAN if Gitea is in-network)
+- Webhook URL: `http://atlantis-host:4141/events`
+
+---
+
 ## 9. Operational Workflow

 ### Day 0 — Bootstrap
@@ -509,7 +581,7 @@ terraform apply tfplan
   - Terraform can call a `dns_a_record` module (if Technitium provider exists)
   - Or: use PVE's built-in DHCP + DNSMASQ if configured

-3. **CI/CD pipeline: GitHub Actions runner, or local Gitea Actions on Neo?**
+3. **CI/CD pipeline: GitHub Actions runner, or local Gitea Actions on the fleet SCM host?**
   - Gitea Actions keeps secrets in-network
   - GitHub Actions requires Tailscale funnel or external exposure