# Ansible Automation Web UI Comparison PRD
**Status:** Draft | **Author:** F.R.I.D.A.Y. (Hermes Agent) | **Date:** 2026-06-02

---
## 1. Purpose & Scope
This PRD evaluates web-based UIs for running and managing Ansible playbooks in the Iron Legion fleet. The focus is on self-hosted, Docker-friendly solutions that integrate with our existing Gitea SCM and are deployable on Swarm or standalone nodes.
**Tools Evaluated:**
1. Semaphore UI (Ansible-native) — RECOMMENDED
2. Kestra (Generic orchestration, Ansible-compatible)
3. AWX (Official Red Hat Ansible platform)
4. Rundeck (Ops automation with Ansible plugin)
5. Jenkins + Ansible Plugin (CI/CD generalist)
---
## 2. Requirements
**Must-Have:**
- [x] Docker Compose or Swarm deployable
- [x] Ansible playbook execution (not just shell scripts calling ansible)
- [x] Web UI for triggering runs, viewing logs, managing inventories
- [x] Self-hosted (no cloud dependency)
- [x] Works on Iron Legion architecture (x86_64, moderate RAM)
**Nice-to-Have:**
- [ ] Gitea webhook integration (auto-trigger on push)
- [ ] RBAC / multi-user access
- [ ] API for automation
- [ ] Scheduled runs (cron-like)
- [ ] Low resource footprint (fit on G9 nodes)
---
## 3. Comparison Matrix
| Criterion | Semaphore UI | Kestra | AWX | Rundeck | Jenkins + Ansible |
|-----------|-------------|--------|-----|---------|-------------------|
| **Primary Purpose** | Ansible-native runner | Generic workflow engine | Enterprise Ansible platform | Ops automation | CI/CD generalist |
| **Docker Compose** | ✅ Simple | ✅ Simple | ⚠️ Complex (K8s preferred) | ✅ Simple | ✅ Simple |
| **RAM Needed** | ~256 MB | ~512 MB | ~4 GB (6+ GB recommended) | ~512 MB | ~1 GB |
| **Ansible Integration** | Native | Via shell/HTTP tasks | Native | Plugin-based | Plugin-based |
| **Inventory Management** | Built-in (static + dynamic) | Via external files | Advanced (sources, scripts) | Basic | Via files/plugins |
| **Gitea Webhooks** | ✅ Supported | ✅ Supported | ⚠️ Requires AWX project sync | ✅ Via plugin | ✅ Via Gitea plugin or SCM polling |
| **RBAC / Multi-user** | ✅ | ✅ | ✅ Enterprise-grade | ✅ | ✅ Plugin-based |
| **Scheduled Runs** | ✅ Cron UI | ✅ Triggers | ✅ Schedules | ✅ Jobs scheduler | ✅ Cron trigger plugin |
| **Log Viewer** | ✅ Real-time | ✅ Real-time | ✅ Real-time + facts | ✅ | ✅ Plugin-dependent |
| **Vault Integration** | ✅ Key store built-in | Via secrets | ✅ Native | Via plugins | Via plugins |
| **Complexity** | Low | Medium | High | Medium | High |
---
## 4. Tool Deep-Dives
### 4.1 Semaphore UI (RECOMMENDED)
**Why it wins:** Purpose-built for Ansible, minimal footprint, fast UI, and fits Iron Legion constraints.
**Docker Compose:**
```yaml
services:
  mysql:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: semaphore-db-password
      MYSQL_DATABASE: semaphore
      MYSQL_USER: semaphore
      MYSQL_PASSWORD: semaphore-db-password
    volumes:
      - semaphore-mysql:/var/lib/mysql
    restart: unless-stopped

  semaphore:
    image: semaphoreui/semaphore:latest
    ports:
      - "3000:3000"
    environment:
      SEMAPHORE_DB_DIALECT: mysql
      SEMAPHORE_DB_HOST: mysql
      SEMAPHORE_DB_NAME: semaphore
      SEMAPHORE_DB_USER: semaphore
      SEMAPHORE_DB_PASS: semaphore-db-password
      SEMAPHORE_ADMIN_PASSWORD: admin-password
      SEMAPHORE_ADMIN_NAME: admin
      SEMAPHORE_ADMIN_EMAIL: admin@localhost
      SEMAPHORE_ADMIN: admin
      # Optional: Telegram / Slack / Gitea integration
      SEMAPHORE_WEBHOOK: "1"
    volumes:
      - semaphore-config:/etc/semaphore
      - /path/to/ansible/playbooks:/playbooks:ro
      - /path/to/inventories:/inventories:ro
      - /path/to/ssh/keys:/ssh:ro
    depends_on:
      - mysql
    restart: unless-stopped

volumes:
  semaphore-mysql:
    driver: local
  semaphore-config:
    driver: local
```
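The short-form `depends_on` in the stack above only orders container start; it does not wait for MySQL to accept connections. A possible refinement, shown here as a sketch of only the keys that would change (not a complete file), adds a health check and reads the database password from the environment instead of hard-coding it. The variable name `SEMAPHORE_DB_PASSWORD` is an assumption, and `condition: service_healthy` applies to standalone Docker Compose only, not to `docker stack deploy`.

```yaml
# Sketch: keys to change in the Semaphore stack above (standalone Compose only).
services:
  mysql:
    environment:
      # Compose aborts with this message if the variable is unset.
      # SEMAPHORE_DB_PASSWORD is a hypothetical variable name.
      MYSQL_PASSWORD: ${SEMAPHORE_DB_PASSWORD:?set SEMAPHORE_DB_PASSWORD}
    healthcheck:
      # mysqladmin ping succeeds once the server answers on the socket
      test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
      interval: 10s
      timeout: 5s
      retries: 5

  semaphore:
    environment:
      SEMAPHORE_DB_PASS: ${SEMAPHORE_DB_PASSWORD:?set SEMAPHORE_DB_PASSWORD}
    depends_on:
      mysql:
        # Long-form depends_on: start Semaphore only after the health check passes
        condition: service_healthy
```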
**Key Features:**
- **Project-centric:** Organize playbooks into projects with separate inventories, env vars, and access
- **Task Templates:** Define reusable job definitions with variables and surveys
- **Key Store:** Built-in encrypted vault for SSH keys, passwords, Ansible vault passwords
- **Cron Schedules:** UI-driven scheduling without crontab
- **Real-time Logs:** WebSocket-based live log streaming
- **Gitea Integration:** Add a Gitea repository as a project, clone on each run, webhooks for auto-trigger
**Resource Footprint:**
- MySQL: ~200 MB RAM
- Semaphore: ~50–100 MB RAM
- Total: **~300 MB** — deployable on any G9 worker node
**Cons:**
- Smaller community than AWX/Jenkins
- Less granular RBAC than AWX
- No built-in credential plugins (e.g., HashiCorp Vault) — must use env vars or files
---
### 4.2 Kestra
**What it is:** Language-agnostic workflow orchestration platform with a visual DAG editor. Not Ansible-specific, but can invoke Ansible via `io.kestra.plugin.scripts.shell.Commands` or `io.kestra.plugin.core.http.Request`.
**Docker Compose:**
```yaml
volumes:
  postgres-data:
    driver: local
  kestra-data:
    driver: local

services:
  postgres:
    image: postgres:18
    volumes:
      - postgres-data:/var/lib/postgresql
    environment:
      POSTGRES_DB: kestra
      POSTGRES_USER: kestra
      POSTGRES_PASSWORD: k3str4

  kestra:
    image: kestra/kestra:latest
    user: "root"
    command: server standalone
    volumes:
      - kestra-data:/app/storage
      - /var/run/docker.sock:/var/run/docker.sock
      - /tmp/kestra-wd:/tmp/kestra-wd
      - /path/to/ansible:/ansible:ro
    environment:
      KESTRA_CONFIGURATION: |
        datasources:
          postgres:
            url: jdbc:postgresql://postgres:5432/kestra
            driverClassName: org.postgresql.Driver
            # Must match POSTGRES_USER / POSTGRES_PASSWORD above
            username: kestra
            password: k3str4
        kestra:
          repository:
            type: postgres
          storage:
            type: local
            local:
              base-path: "/app/storage"
          queue:
            type: postgres
          url: http://localhost:8080/
    ports:
      - "8080:8080"
    depends_on:
      - postgres
```
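To make the "invoke via shell" approach concrete, a flow that runs a playbook through the shell `Commands` task might look like the sketch below. The flow id, namespace, container image, Gitea URL, and `site.yml` are all hypothetical, SSH credential handling is omitted, and the property names should be checked against the installed Kestra version.

```yaml
# Hypothetical Kestra flow: every name below is illustrative.
id: ansible_site_playbook
namespace: ironlegion.ansible

tasks:
  - id: run_playbook
    type: io.kestra.plugin.scripts.shell.Commands
    # Assumed: an image that ships both git and ansible-playbook
    containerImage: example/ansible-runner:latest
    commands:
      # Placeholder URL; substitute the real Gitea address
      - git clone --depth 1 https://gitea.example/Iron-Legion/ansible-playbooks.git repo
      - ansible-playbook -i repo/inventories/iron-legion.yml repo/site.yml

triggers:
  - id: nightly
    type: io.kestra.plugin.core.trigger.Schedule
    # Run once a day at 03:00
    cron: "0 3 * * *"
```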
**Key Features:**
- **Visual DAG Editor:** Drag-and-drop workflow construction
- **Rich Triggers:** Schedule, webhook, event-driven (Kafka, S3, HTTP)
- **Plugin Ecosystem:** 400+ plugins (not Ansible-native — invoke via shell)
- **Scalability:** Built for large-scale data pipelines; may be overkill for fleet Ansible
**Resource Footprint:**
- PostgreSQL: ~300 MB RAM
- Kestra: ~512 MB–1 GB RAM
- Total: **~1 GB** — heavier than Semaphore
**Verdict for Iron Legion:** Powerful but misaligned. We need Ansible-native execution, not generic workflow orchestration. Use Kestra for data/ETL pipelines, not playbook management.

---
### 4.3 AWX
**What it is:** The upstream open-source project behind Ansible Automation Platform (formerly Ansible Tower). Full-featured enterprise Ansible management.
**Key Features:**
- **Projects:** Link to Git repos (Gitea supported), auto-sync on push
- **Inventories:** Static, dynamic (custom scripts, cloud providers), smart inventories
- **Job Templates:** Parameterized with surveys, credentials, and RBAC
- **Workflows:** Chain multiple job templates into visual pipelines
- **RBAC:** Teams, organizations, user roles — most granular of all options
- **Notifications:** Email, Slack, webhook on job success/failure
**Deployment:**
- Docker Compose exists but is officially a **development** target; production requires Kubernetes
- Requires Redis, PostgreSQL, memcached, and multiple AWX services
- Total RAM: **4–6 GB minimum**
**Verdict for Iron Legion:** Overkill. Our fleet nodes (G9: ~11 GB RAM) could run AWX, but it would consume half a node's capacity. G9 nodes are better used as PVE workers with LXCs. AWX belongs on a dedicated management VM or MK7 if hardware permits.

---
### 4.4 Rundeck
**What it is:** Open-source operations automation platform with an Ansible plugin.
**Docker Compose:** Simple single-container deployment with external database.
**Key Features:**
- **Job Definitions:** YAML or XML, supports Ansible ad-hoc and playbook execution
- **Node Inventory:** Static or dynamic via Ansible inventory scripts
- **ACL Policies:** File-based RBAC
- **Scheduled Executions:** Built-in scheduler
- **Plugin Architecture:** Ansible, Slack, HTTP webhooks
**Resource Footprint:**
- Rundeck: ~512 MB RAM
- MySQL/PostgreSQL: ~200–300 MB
- Total: **~700–800 MB**
**Verdict for Iron Legion:** Viable middle-ground. Better than Jenkins for Ansible, but Semaphore is purpose-built and lighter. Rundeck's strength is multi-tool orchestration (Ansible + scripts + HTTP APIs), which we don't need yet.

---
### 4.5 Jenkins + Ansible Plugin
**What it is:** General-purpose CI/CD platform with Ansible integration via plugins.
**Docker Compose:**
```yaml
services:
  jenkins:
    image: jenkins/jenkins:lts
    ports:
      - "8080:8080"
      - "50000:50000"
    volumes:
      - jenkins-data:/var/jenkins_home
      - /path/to/ansible/playbooks:/playbooks:ro
      - /path/to/inventories:/inventories:ro
    restart: unless-stopped

volumes:
  jenkins-data:
    driver: local
```
**Key Features:**
- **Pipelines:** Groovy-based Jenkinsfile pipelines for Ansible execution
- **Blue Ocean:** Modern UI for pipeline visualization
- **Plugin Ecosystem:** Massive library (Ansible, Slack, Git, Gitea)
- **Distributed Builds:** Agent nodes for parallel playbook runs
**Resource Footprint:**
- Jenkins: ~1 GB RAM (grows with plugin load)
- Optional agents: variable
- Total: **~1–2 GB**
**Verdict for Iron Legion:** Wrong tool for the job. Jenkins excels at CI/CD pipelines (build → test → deploy), not at day-to-day Ansible playbook management. The UI is pipeline-centric, not inventory- or template-centric. Use Jenkins for software CI/CD, not fleet automation.

---
## 5. Recommendation
| Use Case | Recommended Tool |
|----------|---------------|
| **Primary Ansible playbook runner** | **Semaphore UI** |
| Complex enterprise RBAC + workflows | AWX (on dedicated VM) |
| Generic workflow orchestration (not Ansible-specific) | Kestra |
| Multi-tool ops automation (Ansible + scripts + APIs) | Rundeck |
| Software CI/CD pipelines | Jenkins |
**Iron Legion Path Forward:**
1. **Deploy Semaphore UI** on MK7 Swarm or a lightweight LXC on MK33
2. Create a Project pointing to `Iron-Legion/ansible-playbooks` on Gitea
3. Configure inventories, task templates, and schedules
4. Add Gitea webhook to auto-trigger Semaphore tasks on push to `main`
5. **Optional:** Evaluate AWX later if RBAC/complexity demands grow — deploy on a dedicated management LXC with 4 GB RAM reservation
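
If the Swarm route in step 1 is chosen, the Semaphore service needs a `deploy:` section, because `docker stack deploy` ignores `restart` and `depends_on`. A minimal sketch, assuming Traefik v2 label syntax, an existing overlay network named `traefik-public`, the hostname `semaphore.ai.home`, and a 256 MB limit (all four are assumptions, not taken from the current fleet configuration):

```yaml
# Sketch of the Swarm-specific keys for the Semaphore service.
services:
  semaphore:
    image: semaphoreui/semaphore:latest
    networks:
      - traefik-public
    deploy:
      replicas: 1
      placement:
        constraints:
          # Pin to a manager node, e.g. MK7
          - node.role == manager
      restart_policy:
        condition: on-failure
      resources:
        limits:
          memory: 256M
      labels:
        # Traefik reads service labels from deploy.labels in Swarm mode
        - traefik.enable=true
        - traefik.http.routers.semaphore.rule=Host(`semaphore.ai.home`)
        - traefik.http.services.semaphore.loadbalancer.server.port=3000

networks:
  traefik-public:
    external: true
```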
---
## 6. Open Questions
1. **Should Semaphore run as a standalone Docker Compose stack or as a Swarm service?**
- Standalone: simpler, survives Swarm reconfiguration
- Swarm: automatic placement, Traefik ingress, less manual maintenance
2. **Where does the Ansible inventory live?**
- Option A: In the Gitea repo alongside playbooks (version-controlled)
- Option B: Static files on the Semaphore host (faster Semaphore startup)
- Option C: Dynamic inventory script pulling from Technitium DNS/PVE API
3. **Gitea webhook reachability:**
- Gitea on Neo (`192.168.192.24`) → Semaphore on MK7 or G9 node
- Must ensure Semaphore endpoint is reachable from Neo (LAN routing)
- Can use Tailscale as fallback
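
For Option C in question 2, one possible starting point is the Proxmox inventory plugin from the `community.general` collection rather than a hand-written script. The sketch below assumes that collection is installed, that an API token exists for a hypothetical `ansible@pve` user, and that the token secret is supplied through the `PROXMOX_TOKEN_SECRET` environment variable; the file name is also hypothetical. Option names should be verified against the installed collection version, since the Proxmox content has been moving to a separate collection.

```yaml
# inventories/pve.proxmox.yml (hypothetical; the plugin expects the file
# name to end in proxmox.yml or proxmox.yaml)
plugin: community.general.proxmox
# Any PVE node can serve the API; mk33 is used here as an example
url: https://192.168.7.33:8006
user: ansible@pve        # assumed API user
token_id: inventory      # assumed token name
# token_secret is read from the PROXMOX_TOKEN_SECRET environment variable
validate_certs: false    # assumes the default self-signed PVE certificate
want_facts: true
```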
---
*End of PRD — Iron Legion Labs*

---
## Appendix: Iron Legion Fleet Inventory (`inventories/iron-legion.yml`)
This inventory file is the authoritative source for Ansible targeting across the fleet. It is structured for **Semaphore UI**, **AWX**, or **command-line Ansible** consumption.
**File:** `inventories/iron-legion.yml`
```yaml
# Iron Legion Fleet Inventory
# Generated: 2026-06-03
# Source: fleet documentation + live SSH config
---
all:
  children:
    fleet_nodes:
      children:
        core_services:
          hosts:
            mk7:
              ansible_host: 192.168.7.7
              ansible_user: jarvis
              node_role: swarm_manager
              docker_host: true
            nebuchadnezzar:
              ansible_host: 192.168.192.24
              ansible_user: jarvis
              node_role: docker_host
              docker_host: true
        pve_workers:
          hosts:
            mk33:
              ansible_host: 192.168.7.33
              ansible_user: root
              node_role: pve_worker
              pve_api_url: "https://192.168.7.33:8006/"
            mk34:
              ansible_host: 192.168.7.34
              ansible_user: root
              node_role: pve_worker
              pve_api_url: "https://192.168.7.34:8006/"
            mk39:
              ansible_host: 192.168.7.39
              ansible_user: root
              node_role: pve_worker
              pve_api_url: "https://192.168.7.39:8006/"
        physical_agents:
          hosts:
            artemis:
              ansible_host: 192.168.15.182
              ansible_user: jarvis
              node_role: discord_gateway
              hermes_agent: true
            mark44:
              ansible_host: 192.168.5.214
              ansible_user: jarvis
              node_role: gpu_host
              gpu: true
            mark5:
              ansible_host: 192.168.6.5
              ansible_user: jarvis
              node_role: tbd
            mk42:
              ansible_host: 192.168.0.196
              ansible_user: jarvis
              node_role: pve_worker
        infrastructure:
          hosts:
            shield:
              ansible_host: 192.168.27.205
              ansible_user: jarvis
              node_role: pxe_server
            igor:
              ansible_host: 192.168.10.211
              ansible_user: jarvis
              node_role: nas
    tailscale_fallback:
      hosts:
        ts-mk7:
          ansible_host: 100.66.70.51
          ansible_user: jarvis
        ts-mk33:
          ansible_host: 100.125.155.41
          ansible_user: jarvis
        ts-mk34:
          ansible_host: 100.94.190.43
          ansible_user: jarvis
    docker_hosts:
      children:
        swarm_manager:
          hosts:
            mk7:
        standalone_docker:
          hosts:
            nebuchadnezzar:
  vars:
    ansible_ssh_private_key_file: "~/.ssh/artemis_key"
    ansible_python_interpreter: /usr/bin/python3
    ansible_connection: ssh
    # Folded block scalar: the two options are joined into one line
    ansible_ssh_common_args: >-
      -o StrictHostKeyChecking=accept-new
      -o ConnectTimeout=5
    fleet_domain: ai.home

# Group-level overrides (same groups as above, variables only)
pve_workers:
  vars:
    ansible_ssh_private_key_file: "~/.ssh/vscode_ed25519"
    ansible_become: true
    ansible_become_user: root
core_services:
  vars:
    ansible_become: true
    ansible_become_user: root
    ansible_ssh_private_key_file: "~/.ssh/artemis_key"
physical_agents:
  vars:
    ansible_become: false
    ansible_ssh_private_key_file: "~/.ssh/artemis_key"
```
**Usage:**
```bash
# Test reachability
ansible all -m ping -i inventories/iron-legion.yml
# Target PVE workers only
ansible pve_workers -m setup -i inventories/iron-legion.yml
# Check Docker services on swarm manager
ansible swarm_manager -a "docker service ls" -i inventories/iron-legion.yml
```
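A playbook that exercises the same groups could serve as a first Semaphore task template. A minimal sketch, assuming a hypothetical path `playbooks/verify-fleet.yml`; it uses only `ansible.builtin` modules and the `pve_workers` group from the inventory above:

```yaml
# playbooks/verify-fleet.yml (hypothetical path)
- name: Verify PVE workers are reachable and report memory
  hosts: pve_workers
  gather_facts: true
  tasks:
    - name: Confirm SSH access and a working Python interpreter
      ansible.builtin.ping:

    - name: Show total memory per node
      ansible.builtin.debug:
        msg: "{{ inventory_hostname }} has {{ ansible_memtotal_mb }} MB RAM"
```

It can be run from the command line with `ansible-playbook -i inventories/iron-legion.yml playbooks/verify-fleet.yml`.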