# N8N Webhook Orchestrator — Terraform LXC + Ansible Provisioning
**Status:** Draft | **Author:** Artemis | **Date:** 2026-06-05
> **Purpose:** N8N on MK7 receives Telegram-triggered webhooks, SSHs to Artemis, and executes existing terraform/ansible containers. No new infrastructure — orchestrates what already exists.
---
## 1. Architecture
```
[Telegram: Bobby] → Artemis (parse intent) → POST to N8N (MK7)
↓ SSH (jarvis@192.168.15.182)
Artemis (this machine)
[A] ~/docker/terraform-pve/run.sh apply
LXC created + inventory generated
[B] ~/docker/ansible-push/lxc-common.sh
LXC provisioned (jarvis + git + ansible)
```
**N8N role:** Trigger + SSH executor only. No Docker socket, no state awareness, no config generation.
**Artemis role:** Hosts existing run.sh + lxc-common.sh. Owns terraform state, ansible inventory, SSH keys.
---
## 2. Workflow A: `/build` — Create and Provision LXCs
### 2.1 Telegram Input
```
You: "/build 5 lxcs"
Artemis parses → count=5, vmid_base=auto (next available)
You: "/build 5 lxcs at vmid 62128"
Artemis parses → vmid_base=62128 (explicit override), count=5
```
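
The two message shapes above could be parsed along these lines. This is a minimal sketch, not Artemis's actual parser: the function name `parse_build_command`, the `BuildIntent` type, and the exact grammar accepted by the regular expression are assumptions made for illustration. A `vmid_base` of `None` stands for "auto (next available)".

```python
import re
from typing import Optional, TypedDict


class BuildIntent(TypedDict):
    count: int
    vmid_base: Optional[int]  # None means "auto (next available)"


# Accepts "/build 5 lxcs" and "/build 5 lxcs at vmid 62128" (assumed grammar).
_BUILD_RE = re.compile(
    r"^/build\s+(\d+)\s+lxcs?(?:\s+at\s+vmid\s+(\d+))?$", re.IGNORECASE
)


def parse_build_command(text: str) -> Optional[BuildIntent]:
    """Return the parsed intent, or None if the text is not a /build command."""
    match = _BUILD_RE.match(text.strip())
    if match is None:
        return None
    vmid = match.group(2)
    return {"count": int(match.group(1)), "vmid_base": int(vmid) if vmid else None}


print(parse_build_command("/build 5 lxcs at vmid 62128"))
```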
### 2.2 Webhook Payload (POST to N8N)
```json
{
"action": "lxc_build",
"vmid_base": 62128,
"lxc_count": 5,
"specs": "default"
}
```
### 2.3 N8N Execution Steps
| Step | Node | Command |
|------|------|---------|
| 1 | Webhook trigger | Receive JSON payload |
| 2 | Set SSH env vars | Export `TF_VAR_lxc_count=5 TF_VAR_vmid_base=62128` |
| 3 | Execute SSH | `ssh jarvis@192.168.15.182 "cd ~/docker/terraform-pve && ./run.sh apply -auto-approve"` |
| 4 | Wait | Poll until `run.sh` exits (blocks until completion) |
| 5 | Verify inventory | Check `~/docker/ansible-push/terraform-prefill/inventory-lxc.yml` exists |
| 6 | Execute SSH | `ssh jarvis@192.168.15.182 "cd ~/docker/ansible-push && ./lxc-common.sh"` |
| 7 | Notify | POST result back to Telegram/Discord |
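
One detail worth checking in steps 2 and 3: variables exported in N8N's local environment do not reach the remote shell unless OpenSSH is configured to forward them (`SendEnv` on the client, `AcceptEnv` on the server). A simple alternative is to set them inline in the remote command. The sketch below assumes that approach and assumes `run.sh` reads `TF_VAR_*` from its environment; the function name `build_terraform_ssh_command` is hypothetical.

```python
import shlex
from typing import List


def build_terraform_ssh_command(
    lxc_count: int, vmid_base: int, host: str = "jarvis@192.168.15.182"
) -> List[str]:
    """Build an ssh argv that sets TF_VAR_* on the remote side of the connection."""
    # int() coercion keeps anything but plain integers out of the remote shell line.
    remote = (
        "cd ~/docker/terraform-pve && "
        f"TF_VAR_lxc_count={int(lxc_count)} TF_VAR_vmid_base={int(vmid_base)} "
        "./run.sh apply -auto-approve"
    )
    return ["ssh", host, remote]


# Print a copy-pasteable form of the command for inspection.
print(shlex.join(build_terraform_ssh_command(5, 62128)))
```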
### 2.4 Constraints
- **Specs locked to "default" for POC** (2 cores, 2GB RAM, 8GB disk)
- **Custom specs deferred to Phase 4** — requires terraform variable expansion
- **vmid_base range:** Must not overlap existing PVE VMs/LXCs (check before apply)
- **lxc_count max:** Phase 2 validated at N=7; N=4 had transient 500 race condition
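
The "check before apply" step for these constraints might look like the following. It is a sketch under two assumptions not stated above: that LXCs receive contiguous VMIDs starting at `vmid_base`, and that the caller already has the list of VMIDs in use on PVE. The name `check_build_constraints` is hypothetical.

```python
from typing import Iterable, List


def check_build_constraints(
    vmid_base: int, lxc_count: int, specs: str, used_vmids: Iterable[int]
) -> List[str]:
    """Return human-readable violations; an empty list means the request is OK."""
    errors = []
    if specs != "default":
        errors.append(f"specs {specs!r} not supported in POC")
    if lxc_count < 1:
        errors.append("lxc_count must be at least 1")
    # Assumed: VMIDs are assigned contiguously from vmid_base.
    requested = set(range(vmid_base, vmid_base + lxc_count))
    clashes = sorted(requested & set(used_vmids))
    if clashes:
        errors.append(f"vmid overlap: {clashes}")
    return errors


print(check_build_constraints(62128, 5, "default", [62130]))
```
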
---
## 3. Workflow B: `/fleet-update` — Apt Update + Upgrade
### 3.1 Telegram Input
```
You: "/fleet-update"
Artemis parses → action=fleet_update
```
### 3.2 Webhook Payload (POST to N8N)
```json
{
"action": "fleet_update"
}
```
### 3.3 N8N Execution Steps
| Step | Node | Command |
|------|------|---------|
| 1 | Webhook trigger | Receive JSON payload |
| 2 | Execute SSH | `ssh jarvis@192.168.15.182 "cd ~/docker/ansible-push && docker compose up -d && docker exec ansible ansible-playbook playbooks/main.yml -i inventory.yml --tags fleet_update"` |
| 3 | Wait | Poll until ansible exits |
| 4 | Notify | POST result back to Telegram/Discord |
### 3.4 Target Scope
| Included | Excluded |
|----------|----------|
| `managed_nodes` group (from inventory.yml) | `pve_hosts` (MK33/34/39) — PVE self-manages |
| `physical_agents` | Neo (ZimaOS, not Debian) |
| `core_services` (MK7) | `igor` (ZimaOS NAS) |
| | Ephemeral LXCs — rebuilt from scratch |
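
If the scope above is not already enforced by the playbook's `hosts:` lines, one option is an Ansible `--limit` pattern, where `:` joins groups and a leading `!` excludes. The sketch below only builds that string; the helper name `build_limit_pattern` is hypothetical, and using `igor` as an inventory name is an assumption.

```python
from typing import Sequence


def build_limit_pattern(included: Sequence[str], excluded: Sequence[str]) -> str:
    """Join included groups with ':' and append each exclusion as '!name'."""
    parts = list(included) + [f"!{name}" for name in excluded]
    return ":".join(parts)


pattern = build_limit_pattern(
    ["managed_nodes", "physical_agents", "core_services"],
    ["pve_hosts", "igor"],
)
print(pattern)
```

When the pattern is passed on a shell command line, it should be single-quoted so that `!` is not treated as history expansion.
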
---
## 4. N8N Requirements (MK7)
### 4.1 Container Mounts
- **SSH client:** `openssh-client` package installed in N8N image
- **Private key:** Mount `~/.ssh/artemis_key` → `/root/.ssh/id_ed25519` inside N8N container
- **Known hosts:** Pre-populated `~/.ssh/known_hosts` for `192.168.15.182`
### 4.2 N8N Endpoint
- **Webhook URL:** `https://n8n.ai.home` (Traefik-routed, TLS-terminated)
- **DNS:** CNAME `n8n.ai.home` → `traefik.ai.home` (Technitium DNS)
- **Network:** LAN-only (`192.168.x.x`), no external access
### 4.3 N8N Credentials
- **SSH Private Key:** Store `artemis_key` in N8N "Credentials" → SSH type
- **SSH Host:** `192.168.15.182` (LAN IP, no DNS resolution dependency)
- **SSH User:** `jarvis`
- **SSH Port:** `22`
### 4.4 Security Constraints
- N8N connects **to Artemis only** — never to PVE nodes, Neo, or LXCs directly
- N8N never sees PVE API tokens or sudo passwords
- All terraform/ansible state stays on Artemis filesystem (not in N8N container)
---
## 5. Artemis Prerequisites (Already Exist)
| Component | Path | Status |
|-----------|------|--------|
| Terraform container | `~/docker/terraform-pve/` | ✅ Validated Phase 2 |
| Ansible container | `~/docker/ansible-push/` | ✅ Validated |
| Run script | `./run.sh` | ✅ Forwards TF_VAR_*, supports init/plan/apply/destroy |
| LXC provision script | `./lxc-common.sh` | ✅ Runs lxc_common role |
| Inventory template | `terraform/inventory-lxc.tmpl` | ✅ Auto-generates ansible_host |
---
## 6. Error Handling
| Scenario | N8N Action |
|----------|------------|
| Terraform apply fails | Abort, notify with stderr |
| Inventory not generated after apply | Retry once, then fail |
| Ansible unreachable | Report per-host, continue others |
| SSH connection refused | Retry 3× with backoff, then alert |
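
The "retry 3× with backoff" row can be read as one initial attempt followed by up to three retries with doubling delays. The sketch below takes that reading; the base delay of 2 seconds and the name `retry_with_backoff` are assumptions, and the `sleep` parameter is injectable so the logic can be exercised without waiting.

```python
import time
from typing import Callable


def retry_with_backoff(
    attempt: Callable[[], bool],
    retries: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call attempt() once, then up to `retries` more times with doubling delays."""
    if attempt():
        return True
    for i in range(retries):
        sleep(base_delay * (2 ** i))  # 2s, 4s, 8s with the defaults
        if attempt():
            return True
    return False  # caller should raise the alert at this point
```
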
---
## 7. Resolved Questions
| # | Question | Decision |
|---|----------|----------|
| 1 | Should `/build` auto-increment `vmid_base`? | **Yes** — default to auto-increment with optional explicit override |
| 2 | Should N8N trigger Gitea commit of generated inventory? | **No** — LXCs are ephemeral, inventory is temporary |
| 3 | Should `/fleet-update` include PVE nodes? | **No** — PVE self-managed, separate workflow later |
| 4 | N8N webhook via Tailscale or LAN? | **LAN IP only** (`192.168.15.182`), no prod server access |
## 8. Decision Points
| Decision | Options | Recommended |
|----------|---------|-------------|
| N8N SSH key | `artemis_key` vs dedicated `n8n_key` | `artemis_key` for POC; rotate to dedicated key later |
| Notification target | Telegram vs Discord vs both | Both via existing gateway webhook |
| vmid_base tracking | Manual vs auto-increment | Auto-increment via PVE API query before apply |
| Fleet-update schedule | On-demand vs cron | On-demand only via `/fleet-update` |
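
For the auto-increment recommendation, the selection logic can be kept separate from the PVE API query. The sketch below assumes the caller has already fetched the VMIDs in use and that a build needs `count` contiguous free IDs; the name `next_vmid_base` is hypothetical, and the default `start` of 100 reflects the lowest VMID Proxmox allows.

```python
from typing import Iterable


def next_vmid_base(used_vmids: Iterable[int], count: int, start: int = 100) -> int:
    """Return the lowest base >= start where base..base+count-1 are all unused."""
    used = set(used_vmids)
    base = start
    while True:
        clash = [v for v in range(base, base + count) if v in used]
        if not clash:
            return base
        # Skip past the highest clashing ID and try the next window.
        base = max(clash) + 1


print(next_vmid_base([100, 101, 105], 3))
```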