PRD Updates: Fix MK7/Neo references; add Atlantis section; new Ansible Web UI comparison PRD

F.R.I.D.A.Y.
2026-06-02 06:31:15 -04:00
committed by F.R.I.D.A.Y.
parent 4377ffaffa
commit fa7a6a2669
2 changed files with 396 additions and 8 deletions


@@ -0,0 +1,316 @@
# Ansible Automation Web UI Comparison PRD
**Status:** Draft | **Author:** F.R.I.D.A.Y. (Hermes Agent) | **Date:** 2026-06-02

---
## 1. Purpose & Scope
This PRD evaluates web-based UIs for running and managing Ansible playbooks in the Iron Legion fleet. The focus is on self-hosted, Docker-friendly solutions that integrate with our existing Gitea SCM and are deployable on Swarm or standalone nodes.

**Tools Evaluated:**
1. Semaphore UI (Ansible-native) — RECOMMENDED
2. Kestra (Generic orchestration, Ansible-compatible)
3. AWX (Official Red Hat Ansible platform)
4. Rundeck (Ops automation with Ansible plugin)
5. Jenkins + Ansible Plugin (CI/CD generalist)
---
## 2. Requirements
**Must-Have:**
- [x] Docker Compose or Swarm deployable
- [x] Ansible playbook execution (not just shell scripts calling ansible)
- [x] Web UI for triggering runs, viewing logs, managing inventories
- [x] Self-hosted (no cloud dependency)
- [x] Works on Iron Legion architecture (x86_64, moderate RAM)

**Nice-to-Have:**
- [ ] Gitea webhook integration (auto-trigger on push)
- [ ] RBAC / multi-user access
- [ ] API for automation
- [ ] Scheduled runs (cron-like)
- [ ] Low resource footprint (fit on G9 nodes)
---
## 3. Comparison Matrix
| Criterion | Semaphore UI | Kestra | AWX | Rundeck | Jenkins + Ansible |
|-----------|-------------|--------|-----|---------|-------------------|
| **Primary Purpose** | Ansible-native runner | Generic workflow engine | Enterprise Ansible platform | Ops automation | CI/CD generalist |
| **Docker Compose** | ✅ Simple | ✅ Simple | ⚠️ Complex (K8s preferred) | ✅ Simple | ✅ Simple |
| **RAM Needed** | ~256 MB | ~512 MB | ~4 GB (6+ GB recommended) | ~512 MB | ~1 GB |
| **Ansible Integration** | Native | Via shell tasks or the Ansible CLI task | Native | Plugin-based | Plugin-based |
| **Inventory Management** | Built-in (static + dynamic) | Via external files | Advanced (sources, scripts) | Basic | Via files/plugins |
| **Gitea Webhooks** | ✅ Supported | ✅ Supported | ⚠️ Requires AWX project sync | ✅ Via plugin | ✅ Via Gitea plugin or SCM polling |
| **RBAC / Multi-user** | ✅ | ⚠️ Enterprise Edition only | ✅ Enterprise-grade | ✅ | ✅ Plugin-based |
| **Scheduled Runs** | ✅ Cron UI | ✅ Triggers | ✅ Schedules | ✅ Jobs scheduler | ✅ Built-in cron trigger |
| **Log Viewer** | ✅ Real-time | ✅ Real-time | ✅ Real-time + facts | ✅ | ✅ Real-time console |
| **Secrets / Vault Integration** | ✅ Key store built-in | Via secrets | ✅ Native | ✅ Key Storage built-in | Via plugins |
| **Complexity** | Low | Medium | High | Medium | High |
---
## 4. Tool Deep-Dives
### 4.1 Semaphore UI (RECOMMENDED)
**Why it wins:** Purpose-built for Ansible, minimal footprint, fast UI, and fits Iron Legion constraints.

**Docker Compose:**
```yaml
services:
  mysql:
    image: mysql:8.0
    environment:
      # Placeholder credentials: replace before deploying
      MYSQL_ROOT_PASSWORD: semaphore-db-password
      MYSQL_DATABASE: semaphore
      MYSQL_USER: semaphore
      MYSQL_PASSWORD: semaphore-db-password
    volumes:
      - semaphore-mysql:/var/lib/mysql
    restart: unless-stopped
  semaphore:
    image: semaphoreui/semaphore:latest
    ports:
      - "3000:3000"
    environment:
      SEMAPHORE_DB_DIALECT: mysql
      SEMAPHORE_DB_HOST: mysql
      SEMAPHORE_DB_PORT: 3306
      SEMAPHORE_DB: semaphore                   # database name
      SEMAPHORE_DB_USER: semaphore
      SEMAPHORE_DB_PASS: semaphore-db-password  # must match MYSQL_PASSWORD
      SEMAPHORE_ADMIN_PASSWORD: admin-password
      SEMAPHORE_ADMIN_NAME: admin
      SEMAPHORE_ADMIN_EMAIL: admin@localhost
      SEMAPHORE_ADMIN: admin
      # Encrypts Key Store secrets at rest (base64-encoded 32-byte key),
      # read from a .env file next to this compose file
      SEMAPHORE_ACCESS_KEY_ENCRYPTION: ${SEMAPHORE_ACCESS_KEY_ENCRYPTION}
      # Optional: Telegram / Slack alerts and Gitea webhook integrations
      # are configured through Semaphore's settings and project UI
    volumes:
      - semaphore-config:/etc/semaphore
      - /path/to/ansible/playbooks:/playbooks:ro
      - /path/to/inventories:/inventories:ro
      - /path/to/ssh/keys:/ssh:ro
    depends_on:
      - mysql
    restart: unless-stopped
volumes:
  semaphore-mysql:
    driver: local
  semaphore-config:
    driver: local
```
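
Semaphore reads the Key Store encryption key from `SEMAPHORE_ACCESS_KEY_ENCRYPTION`, which it expects as a base64-encoded 32-byte value (the Semaphore docs generate one with `head -c32 /dev/urandom | base64`). A minimal Python equivalent, assuming the result is written to a `.env` file that the compose file interpolates as `${SEMAPHORE_ACCESS_KEY_ENCRYPTION}`:

```python
import base64
import secrets


def make_access_key_encryption() -> str:
    """Return a random 32-byte key, base64-encoded, for SEMAPHORE_ACCESS_KEY_ENCRYPTION."""
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")


if __name__ == "__main__":
    # Emit a line suitable for a .env file read by docker compose (assumed setup).
    print(f"SEMAPHORE_ACCESS_KEY_ENCRYPTION={make_access_key_encryption()}")
```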
**Key Features:**
- **Project-centric:** Organize playbooks into projects with separate inventories, env vars, and access
- **Task Templates:** Define reusable job definitions with variables and surveys
- **Key Store:** Built-in encrypted vault for SSH keys, passwords, Ansible vault passwords
- **Cron Schedules:** UI-driven scheduling without crontab
- **Real-time Logs:** WebSocket-based live log streaming
- **Gitea Integration:** Add a Gitea repository to a project; Semaphore clones or pulls it on each run, and webhooks can auto-trigger tasks
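
Gitea signs each webhook delivery with the secret configured on the hook: the `X-Gitea-Signature` header carries the hex-encoded HMAC-SHA256 of the raw request body. A receiver that validates the secret recomputes that value before trusting the payload. The sketch below reproduces the check with a made-up secret and payload, which can help when debugging a hook that is being rejected. Note that it must run on the raw bytes, before any JSON parsing.

```python
import hashlib
import hmac


def gitea_signature(secret: str, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body, as Gitea sends in X-Gitea-Signature."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify_delivery(secret: str, body: bytes, header_value: str) -> bool:
    """Compare the received header with the expected signature in constant time."""
    expected = gitea_signature(secret, body)
    return hmac.compare_digest(expected, header_value.strip().lower())


if __name__ == "__main__":
    # Hypothetical secret and payload, for illustration only.
    secret = "example-webhook-secret"
    body = b'{"ref": "refs/heads/main"}'
    sig = gitea_signature(secret, body)
    print(verify_delivery(secret, body, sig))         # True
    print(verify_delivery(secret, body + b" ", sig))  # False: body was altered
```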

**Resource Footprint:**
- MySQL: ~200 MB RAM
- Semaphore: ~50–100 MB RAM
- Total: **~300 MB** — deployable on any G9 worker node

**Cons:**
- Smaller community than AWX/Jenkins
- Less granular RBAC than AWX
- No built-in credential plugins (e.g., HashiCorp Vault) — must use env vars or files
---
### 4.2 Kestra
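**What it is:** Language-agnostic workflow orchestration platform with a visual DAG editor. It is not Ansible-specific: flows run Ansible through the CLI, either with `io.kestra.plugin.scripts.shell.Commands` or the `io.kestra.plugin.ansible.cli.AnsibleCLI` task, or trigger an external runner over HTTP with `io.kestra.plugin.core.http.Request`.

**Docker Compose:**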
```yaml
volumes:
  postgres-data:
    driver: local
  kestra-data:
    driver: local
services:
  postgres:
    image: postgres:18
    volumes:
      - postgres-data:/var/lib/postgresql
    environment:
      POSTGRES_DB: kestra
      POSTGRES_USER: kestra
      POSTGRES_PASSWORD: k3str4
  kestra:
    image: kestra/kestra:latest
    user: "root"
    command: server standalone
    volumes:
      - kestra-data:/app/storage
      - /var/run/docker.sock:/var/run/docker.sock  # lets tasks run in containers
      - /tmp/kestra-wd:/tmp/kestra-wd
      - /path/to/ansible:/ansible:ro
    environment:
      KESTRA_CONFIGURATION: |
        datasources:
          postgres:
            url: jdbc:postgresql://postgres:5432/kestra
            driverClassName: org.postgresql.Driver
            username: kestra
            password: k3str4
        kestra:
          repository:
            type: postgres
          storage:
            type: local
            local:
              basePath: "/app/storage"
          queue:
            type: postgres
          tasks:
            tmpDir:
              path: /tmp/kestra-wd/tmp
          url: http://localhost:8080/
    ports:
      - "8080:8080"
    depends_on:
      - postgres
```
**Key Features:**
- **Visual DAG Editor:** Drag-and-drop workflow construction
- **Rich Triggers:** Schedule, webhook, event-driven (Kafka, S3, HTTP)
- **Plugin Ecosystem:** 400+ plugins; Ansible support wraps the `ansible-playbook` CLI rather than being native
- **Scalability:** Built for large-scale data pipelines; may be overkill for fleet Ansible

**Resource Footprint:**
- PostgreSQL: ~300 MB RAM
- Kestra: ~512 MB–1 GB RAM
- Total: **~1 GB** — heavier than Semaphore

**Verdict for Iron Legion:** Powerful but misaligned. We need Ansible-native execution, not generic workflow orchestration. Use Kestra for data/ETL pipelines, not playbook management.
---
### 4.3 AWX
**What it is:** The upstream open-source project behind Ansible Automation Platform (formerly Ansible Tower). Full-featured enterprise Ansible management.
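
**Key Features:**
- **Projects:** Link to any Git repo (including Gitea) and sync on launch or on a schedule; AWX's built-in webhook receivers target GitHub and GitLab, so Gitea pushes need a project sync or an API call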
- **Inventories:** Static, dynamic (custom scripts, cloud providers), smart inventories
- **Job Templates:** Parameterized with surveys, credentials, and RBAC
- **Workflows:** Chain multiple job templates into visual pipelines
- **RBAC:** Teams, organizations, user roles — most granular of all options
- **Notifications:** Email, Slack, webhook on job success/failure

**Deployment:**
- Docker Compose exists but is officially a **development** target; production requires Kubernetes (via the AWX Operator)
- Requires PostgreSQL, Redis, and several AWX containers (web, task, execution environment)
- Total RAM: **4–6 GB minimum**

**Verdict for Iron Legion:** Overkill. Our fleet nodes (G9: ~11 GB RAM) could run AWX, but it would consume half a node's capacity. G9 nodes are better used as PVE workers with LXCs. AWX belongs on a dedicated management VM or MK7 if hardware permits.
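
The "half a node" estimate is simple arithmetic: 4–6 GB out of ~11 GB works out to roughly 36–55% of a G9 node. A quick Python check using the approximate RAM figures from the comparison matrix and the G9 capacity quoted above (all rough estimates, not measurements):

```python
# Approximate RAM figures (MB) from the comparison matrix; rough estimates only.
G9_RAM_MB = 11 * 1024  # ~11 GB per G9 node, as quoted in the AWX verdict

FOOTPRINT_MB = {
    "Semaphore UI": 256,
    "Kestra": 512,
    "Rundeck": 512,
    "Jenkins": 1024,
    "AWX (minimum)": 4 * 1024,
    "AWX (recommended)": 6 * 1024,
}


def node_share(ram_mb: float, node_mb: float = G9_RAM_MB) -> float:
    """Fraction of one node's RAM that a given footprint would consume."""
    return ram_mb / node_mb


if __name__ == "__main__":
    for tool, mb in FOOTPRINT_MB.items():
        print(f"{tool:<18} {mb:>5} MB  {node_share(mb):6.1%} of a G9 node")
```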
---
### 4.4 Rundeck
**What it is:** Open-source operations automation platform with an Ansible plugin.

**Docker Compose:** Simple single-container deployment; use an external MySQL/PostgreSQL database in production (the default embedded H2 database suits testing only).

**Key Features:**
- **Job Definitions:** YAML or XML, supports Ansible ad-hoc and playbook execution
- **Node Inventory:** Static or dynamic via Ansible inventory scripts
- **ACL Policies:** File-based RBAC
- **Scheduled Executions:** Built-in scheduler
- **Plugin Architecture:** Ansible, Slack, HTTP webhooks

**Resource Footprint:**
- Rundeck: ~512 MB RAM
- MySQL/PostgreSQL: ~200–300 MB
- Total: **~700–800 MB**

**Verdict for Iron Legion:** Viable middle ground. Better than Jenkins for Ansible, but Semaphore is purpose-built and lighter. Rundeck's strength is multi-tool orchestration (Ansible + scripts + HTTP APIs), which we don't need yet.
---
### 4.5 Jenkins + Ansible Plugin
**What it is:** General-purpose CI/CD platform with Ansible integration via plugins.

**Docker Compose:**
```yaml
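services:
  jenkins:
    # The stock image does not include Ansible: build a derived image
    # or run playbooks on an agent that has Ansible installed
    image: jenkins/jenkins:lts
    ports:
      - "8080:8080"    # web UI
      - "50000:50000"  # inbound agent port
    volumes:
      - jenkins-data:/var/jenkins_home
      - /path/to/ansible/playbooks:/playbooks:ro
      - /path/to/inventories:/inventories:ro
    restart: unless-stopped
volumes:
  jenkins-data:
    driver: local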
```
**Key Features:**
- **Pipelines:** Groovy-based Jenkinsfile pipelines for Ansible execution
- **Blue Ocean:** Pipeline visualization UI (no longer receiving feature updates)
- **Plugin Ecosystem:** Massive library (Ansible, Slack, Git, Gitea)
- **Distributed Builds:** Agent nodes for parallel playbook runs

**Resource Footprint:**
- Jenkins: ~1 GB RAM (grows with plugin load)
- Optional agents: variable
- Total: **~1–2 GB**

**Verdict for Iron Legion:** Wrong tool for the job. Jenkins excels at CI/CD pipelines (build → test → deploy), not at day-to-day Ansible playbook management. The UI is pipeline-centric, not inventory- or template-centric. Use Jenkins for software CI/CD, not fleet automation.
---
## 5. Recommendation
| Use Case | Recommended Tool |
|----------|---------------|
| **Primary Ansible playbook runner** | **Semaphore UI** |
| Complex enterprise RBAC + workflows | AWX (on dedicated VM) |
| Generic workflow orchestration (not Ansible-specific) | Kestra |
| Multi-tool ops automation (Ansible + scripts + APIs) | Rundeck |
| Software CI/CD pipelines | Jenkins |

**Iron Legion Path Forward:**
1. **Deploy Semaphore UI** on MK7 Swarm or a lightweight LXC on MK33
2. Create a Project pointing to `Iron-Legion/ansible-playbooks` on Gitea
3. Configure inventories, task templates, and schedules
4. Add Gitea webhook to auto-trigger Semaphore tasks on push to `main`
5. **Optional:** Evaluate AWX later if RBAC/complexity demands grow; deploy it on a dedicated management VM with at least 4–6 GB RAM reserved
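
Step 4's auto-trigger can also be driven from a script (for example a Gitea Actions job) through Semaphore's REST API. A minimal sketch, assuming an API token created in Semaphore and a `POST /api/project/{project_id}/tasks` endpoint that accepts a `template_id` field (verify both against the API reference of the deployed version); the host, token, and IDs are hypothetical. It only builds the request, so it can be inspected before anything is sent:

```python
import json
import urllib.request


def build_task_request(base_url: str, token: str, project_id: int,
                       template_id: int) -> urllib.request.Request:
    """Build, but do not send, a request asking Semaphore to run a task template.

    Assumption: Semaphore exposes POST /api/project/{project_id}/tasks with a
    JSON body containing "template_id"; check your version's API docs.
    """
    url = f"{base_url.rstrip('/')}/api/project/{project_id}/tasks"
    body = json.dumps({"template_id": template_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


if __name__ == "__main__":
    # Hypothetical host, token, and IDs; replace with real values.
    req = build_task_request("http://semaphore.lan:3000", "API_TOKEN", 1, 2)
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req, timeout=10)
```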
---
## 6. Open Questions
1. **Should Semaphore run as a standalone Docker Compose stack or as a Swarm service?**
- Standalone: simpler, survives Swarm reconfiguration
- Swarm: automatic placement, Traefik ingress, less manual maintenance
2. **Where does the Ansible inventory live?**
- Option A: In the Gitea repo alongside playbooks (version-controlled)
- Option B: Static files on the Semaphore host (simpler to start, but edits bypass Git history and review)
- Option C: Dynamic inventory script pulling from Technitium DNS/PVE API
3. **Gitea webhook reachability:**
- Gitea on Neo (`192.168.192.24`) → Semaphore on MK7 or G9 node
- Must ensure Semaphore endpoint is reachable from Neo (LAN routing)
- Can use Tailscale as fallback
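
Option C amounts to an Ansible dynamic inventory script: Ansible runs it with `--list` and expects JSON mapping group names to hosts, plus a `_meta.hostvars` section so it does not have to call `--host` once per machine. A minimal sketch, where `fetch_nodes()` is a hypothetical stand-in for the Technitium or PVE API query and the host names, groups, and `192.0.2.x` addresses are placeholders:

```python
#!/usr/bin/env python3
"""Sketch of an Ansible dynamic inventory script (hypothetical data source)."""
import argparse
import json
from typing import Optional


def fetch_nodes() -> list:
    # Hypothetical stand-in for a PVE API or Technitium DNS query.
    # Host names, groups, and the 192.0.2.x addresses are placeholders.
    return [
        {"name": "mk33", "group": "pve", "ip": "192.168.7.33"},
        {"name": "g9-01", "group": "g9", "ip": "192.0.2.11"},
        {"name": "g9-02", "group": "g9", "ip": "192.0.2.12"},
    ]


def build_inventory(nodes: list) -> dict:
    """Group hosts and put per-host vars under _meta.hostvars."""
    inventory = {"_meta": {"hostvars": {}}}
    for node in nodes:
        group = inventory.setdefault(node["group"], {"hosts": []})
        group["hosts"].append(node["name"])
        inventory["_meta"]["hostvars"][node["name"]] = {"ansible_host": node["ip"]}
    return inventory


def main(argv: Optional[list] = None) -> None:
    parser = argparse.ArgumentParser(description="Ansible dynamic inventory sketch")
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--list", action="store_true", help="print the full inventory")
    mode.add_argument("--host", help="print variables for a single host")
    args = parser.parse_args(argv)

    inventory = build_inventory(fetch_nodes())
    if args.host:
        # Ansible skips --host calls when _meta is present, but support it anyway.
        print(json.dumps(inventory["_meta"]["hostvars"].get(args.host, {})))
    else:
        print(json.dumps(inventory, indent=2))


if __name__ == "__main__":
    main()
```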
---
*End of PRD — Iron Legion Labs*


@@ -63,7 +63,7 @@ terraform {
 }
 provider "proxmox" {
-  endpoint = "https://192.168.7.7:8006/"
+  endpoint = "https://192.168.7.33:8006/"
   username = "root@pam"
   password = var.proxmox_password # or PROXMOX_VE_PASSWORD env var
   insecure = true                 # self-signed TLS
@@ -156,13 +156,13 @@ terraform {
 variable "proxmox_endpoint" {
   description = "PVE API URL"
   type        = string
-  default     = "https://192.168.7.7:8006/"
+  default     = "https://192.168.7.33:8006/"
 }
 variable "proxmox_node" {
   description = "Target PVE node name"
   type        = string
-  default     = "mk7"
+  default     = "mk33"
 }
 variable "ssh_public_key" {
@@ -332,7 +332,7 @@ output "lxc_passwords" {
 module "dev_lxcs" {
   source         = "../../modules/lxc"
-  proxmox_node   = "mk7"
+  proxmox_node   = "mk33"
   ssh_public_key = file("~/.ssh/id_ed25519.pub")
   lxc_configs    = {
@@ -400,13 +400,13 @@ Use data sources to query existing infrastructure without managing it:
 ```hcl
 data "proxmox_virtual_environment_datastores" "available" {
-  node_name = "mk7"
+  node_name = var.proxmox_node
 }
 data "proxmox_virtual_environment_nodes" "cluster" {}
 data "proxmox_virtual_environment_container" "existing" {
-  node_name = "mk7"
+  node_name = var.proxmox_node # or specify target node explicitly
   vm_id     = 2001
 }
 ```
@@ -422,7 +422,7 @@ data "proxmox_virtual_environment_container" "existing" {
 ### Recommended: S3-Compatible Backend
-Iron Legion already runs self-hosted services. A Garage or MinIO instance on Neo/MK7 can serve as the Terraform state backend:
+Iron Legion already runs self-hosted services. A Garage or MinIO instance on a fleet storage node (e.g., Neo) can serve as the Terraform state backend:
 ```hcl
 terraform {
@@ -447,6 +447,78 @@ Add a DynamoDB-compatible table or use a native locking mechanism. If S3 backend
---
## Optional: Atlantis Web UI for Terraform PR Automation
### What Atlantis Is
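Atlantis is a self-hosted web application that listens for webhook events from Git repositories and runs Terraform inside pull/merge request workflows: `terraform plan` runs automatically when a PR changes Terraform files, and `terraform apply` runs when a reviewer comments `atlantis apply`. It posts the output back to the PR as comments, enforces apply requirements such as approval, and locks each directory/workspace so two PRs cannot change the same state at once.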
### Can Atlantis Manage LXC Resources via `bpg/proxmox`?
**Yes.** Atlantis is a Terraform orchestration layer, not a provider. It supports any Terraform provider including `bpg/proxmox`. The workflow is:
1. Developer opens a PR adding/modifying `.tf` files defining LXC containers
2. Atlantis receives the webhook and runs `terraform plan` in an isolated working directory
3. Plan output posted as a PR comment — team reviews before approval
4. Once the PR meets the configured apply requirements (e.g., approval), a reviewer comments `atlantis apply` and Atlantis runs `terraform apply`
### Atlantis Docker Compose (Self-Hosted)
```yaml
services:
  atlantis:
    image: ghcr.io/runatlantis/atlantis:latest
    ports:
      - "4141:4141"
    volumes:
      - ${HOME}/.ssh:/home/atlantis/.ssh:ro           # Git SSH key
      - /var/run/docker.sock:/var/run/docker.sock:ro  # if using Docker TF provider
      - atlantis-data:/home/atlantis/.atlantis
    environment:
      ATLANTIS_GH_USER: "iron-legion-bot"        # or ATLANTIS_GITLAB_USER / ATLANTIS_GITEA_USER
      ATLANTIS_GH_TOKEN: "${ATLANTIS_GH_TOKEN}"  # personal access token
      ATLANTIS_REPO_ALLOWLIST: "github.com/Iron-Legion/*"
      ATLANTIS_GH_WEBHOOK_SECRET: "${WEBHOOK_SECRET}"
      # For Gitea (replace the GitHub variables above):
      # ATLANTIS_GITEA_BASE_URL: "https://<gitea-host>"
      # ATLANTIS_GITEA_USER: "iron-legion-bot"
      # ATLANTIS_GITEA_TOKEN: "${GITEA_TOKEN}"
      # ATLANTIS_GITEA_WEBHOOK_SECRET: "${WEBHOOK_SECRET}"
      # ATLANTIS_REPO_ALLOWLIST: "<gitea-host>/Iron-Legion/*"
    command: server
    restart: unless-stopped
  # Optional: Redis as the locking DB for multi-replica setups
  # redis:
  #   image: redis:8-alpine
  #   volumes:
  #     - redis-data:/data
  #   restart: always
volumes:
  atlantis-data:
    driver: local
  # redis-data:
  #   driver: local
```
### Key Features
- **Plan Comments:** Every PR gets an auto-generated `terraform plan` comment
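- **Project Locking:** Running `plan` locks that directory/workspace to the PR; other PRs cannot plan or apply it until the PR is merged or closed, or the lock is deleted in the Atlantis UI
- **Policy Checks:** Run OPA (Open Policy Agent) policies via Conftest, or custom scripts, to block non-compliant changes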
- **Custom Workflows:** Define per-repo or per-directory workflows (e.g., plan-only for dev, auto-apply for staging)
- **Self-Hosted SCM:** Native webhook support for GitHub, GitLab, Bitbucket, **and Gitea**
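
The locking rule is easiest to see as a lookup table. The sketch below is a simplified model of the behavior, not Atlantis source code: a plan locks a (repository, directory, workspace) project for the PR that ran it, other PRs touching the same project are refused, and merging or closing the PR (or deleting the lock) frees it. The class, method, and repository names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Key = Tuple[str, str, str]  # (repo, directory, workspace)


@dataclass
class LockTable:
    """Simplified model of Atlantis project locks; not Atlantis source code."""

    locks: Dict[Key, int] = field(default_factory=dict)  # project -> PR holding it

    def try_plan(self, pr: int, repo: str, path: str, workspace: str = "default") -> bool:
        """Return True if `pr` may plan this project, taking the lock if it is free."""
        holder = self.locks.setdefault((repo, path, workspace), pr)
        return holder == pr

    def release(self, pr: int) -> None:
        """Drop every lock held by `pr`, e.g. when the PR is merged or closed."""
        self.locks = {key: owner for key, owner in self.locks.items() if owner != pr}


if __name__ == "__main__":
    table = LockTable()
    repo = "Iron-Legion/terraform-proxmox"  # hypothetical repository name
    print(table.try_plan(12, repo, "envs/dev"))  # True: PR 12 takes the lock
    print(table.try_plan(15, repo, "envs/dev"))  # False: still held by PR 12
    table.release(12)
    print(table.try_plan(15, repo, "envs/dev"))  # True: lock was released
```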
### Resource Footprint
- Atlantis container: ~100–200 MB RAM, minimal CPU
- Optional Redis: ~20 MB RAM
- Total: fits comfortably on any Iron Legion node (MK7, MK33–42, Neo)
### Gitea Integration Notes
- Atlantis supports Gitea via the `--gitea-user`, `--gitea-token`, and `--gitea-webhook-secret` flags, plus `--gitea-base-url` to point at a self-hosted instance
- Must expose the Atlantis endpoint to Gitea (Tailscale Funnel, reverse proxy, or LAN if Gitea is in-network)
- Webhook URL: `http://atlantis-host:4141/events`
---
## 9. Operational Workflow
### Day 0 — Bootstrap
@@ -509,7 +581,7 @@ terraform apply tfplan
    - Terraform can call a `dns_a_record` module (if Technitium provider exists)
    - Or: use PVE's built-in DHCP + DNSMASQ if configured
-3. **CI/CD pipeline: GitHub Actions runner, or local Gitea Actions on Neo?**
+3. **CI/CD pipeline: GitHub Actions runner, or local Gitea Actions on the fleet SCM host?**
    - Gitea Actions keeps secrets in-network
    - GitHub Actions requires Tailscale funnel or external exposure