Draft: N8N webhook orchestrator for terraform LXC + ansible provisioning

F.R.I.D.A.Y.
2026-06-05 21:33:47 -04:00
parent bfff090225
commit df965892d5


@@ -0,0 +1,164 @@
# N8N Webhook Orchestrator — Terraform LXC + Ansible Provisioning
**Status:** Draft | **Author:** Artemis | **Date:** 2026-06-05
> **Purpose:** N8N on MK7 receives Telegram-triggered webhooks, SSHs to Artemis, and executes existing terraform/ansible containers. No new infrastructure — orchestrates what already exists.
---
## 1. Architecture
```
[Telegram: Bobby] → Artemis (parse intent) → POST to N8N (MK7)
    ↓ SSH (jarvis@artemis.ai.home)
Artemis (this machine)
    [A] ~/docker/terraform-pve/run.sh apply
        → LXC created + inventory generated
    [B] ~/docker/ansible-push/lxc-common.sh
        → LXC provisioned (jarvis + git + ansible)
```
**N8N role:** Trigger + SSH executor only. No Docker socket, no state awareness, no config generation.
**Artemis role:** Hosts existing run.sh + lxc-common.sh. Owns terraform state, ansible inventory, SSH keys.
---
## 2. Workflow A: `/build` — Create and Provision LXCs
### 2.1 Telegram Input
```
You: "/build 5 lxcs starting at vmid 62128"
Artemis parses → vmid_base=62128, count=5, specs=default
```
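
The parse step above could be sketched as follows. This is a minimal illustration, not Artemis's actual parser: the function name `parse_build_command` and the regular expression are hypothetical, and it only accepts the exact phrasing shown in the example message.

```python
import re

# Hypothetical pattern for the "/build N lxcs starting at vmid M" phrasing.
BUILD_RE = re.compile(
    r"^/build\s+(?P<count>\d+)\s+lxcs?\s+starting\s+at\s+vmid\s+(?P<vmid_base>\d+)$",
    re.IGNORECASE,
)


def parse_build_command(text: str) -> dict:
    """Turn a /build message into the fields forwarded to N8N."""
    match = BUILD_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized /build command: {text!r}")
    return {
        "action": "lxc_build",
        "vmid_base": int(match["vmid_base"]),
        "lxc_count": int(match["count"]),
        "specs": "default",  # only value allowed during the POC
    }
```

Rejecting anything that does not match keeps free-form text from ever reaching the webhook payload.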
### 2.2 Webhook Payload (POST to N8N)
```json
{
  "action": "lxc_build",
  "vmid_base": 62128,
  "lxc_count": 5,
  "specs": "default"
}
```
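
For illustration, the POST could be prepared with the Python standard library as below. The webhook URL is a placeholder (this draft does not specify the real path on MK7), and `build_webhook_request` is a hypothetical helper name. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Placeholder URL: the real N8N webhook path is not defined in this draft.
N8N_WEBHOOK_URL = "http://n8n.example.invalid/webhook/lxc-build"


def build_webhook_request(payload: dict, url: str = N8N_WEBHOOK_URL) -> urllib.request.Request:
    """Prepare (but do not send) the JSON POST to N8N."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_webhook_request(
    {"action": "lxc_build", "vmid_base": 62128, "lxc_count": 5, "specs": "default"}
)
# Sending would be: urllib.request.urlopen(request, timeout=10)
```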
### 2.3 N8N Execution Steps
| Step | Node | Command |
|------|------|---------|
| 1 | Webhook trigger | Receive JSON payload |
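| 2 | Set SSH env vars | Format `TF_VAR_lxc_count=5 TF_VAR_vmid_base=62128` from the payload (variables exported inside N8N do not reach the remote shell unless SSH is configured to forward them) |
| 3 | Execute SSH | `ssh jarvis@artemis.ai.home "cd ~/docker/terraform-pve && TF_VAR_lxc_count=5 TF_VAR_vmid_base=62128 ./run.sh apply -auto-approve"` |
| 4 | Wait | SSH node blocks until `run.sh` exits; a non-zero exit code aborts the workflow |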
| 5 | Verify inventory | Check `~/docker/ansible-push/terraform-prefill/inventory-lxc.yml` exists |
| 6 | Execute SSH | `ssh jarvis@artemis.ai.home "cd ~/docker/ansible-push && ./lxc-common.sh"` |
| 7 | Notify | POST result back to Telegram/Discord |
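
A sketch of how the remote command strings for steps 3, 5, and 6 might be assembled. SSH does not forward local environment variables by default, so this version places the `TF_VAR_*` assignments inline in the remote command. The function names are hypothetical, and using `test -s` for the inventory check is an assumption about what "exists" should mean here.

```python
INVENTORY = "~/docker/ansible-push/terraform-prefill/inventory-lxc.yml"


def apply_command(vmid_base: int, lxc_count: int) -> str:
    """Remote command for step 3, with TF_VAR_* set inline for run.sh."""
    # int() rejects anything non-numeric before it reaches the remote shell.
    env = f"TF_VAR_lxc_count={int(lxc_count)} TF_VAR_vmid_base={int(vmid_base)}"
    return f"cd ~/docker/terraform-pve && {env} ./run.sh apply -auto-approve"


def verify_inventory_command() -> str:
    """Remote command for step 5: succeeds only if the file exists and is non-empty."""
    # The leading ~ stays unquoted so the remote shell expands it.
    return f"test -s {INVENTORY}"


def provision_command() -> str:
    """Remote command for step 6."""
    return "cd ~/docker/ansible-push && ./lxc-common.sh"
```

Each string is passed as the single remote-command argument to `ssh jarvis@artemis.ai.home`.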
### 2.4 Constraints
- **Specs locked to "default" for POC** (2 cores, 2GB RAM, 8GB disk)
- **Custom specs deferred to Phase 4** — requires terraform variable expansion
- **vmid_base range:** Must not overlap existing PVE VMs/LXCs (check before apply)
- **lxc_count max:** Phase 2 validated at N=7; N=4 had transient 500 race condition
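
These constraints could be enforced before `apply` with a check along these lines. It is a sketch under stated assumptions: VMIDs are taken to be allocated contiguously from `vmid_base`, N=7 is treated as a hard cap (the draft only says it was validated), and `used_vmids` is a set gathered beforehand by some means not covered here.

```python
MAX_VALIDATED_COUNT = 7  # assumption: treat the largest validated N as a cap


def validate_build(vmid_base: int, lxc_count: int, specs: str, used_vmids: set[int]) -> list[int]:
    """Return the VMIDs the build would claim, or raise ValueError."""
    if specs != "default":
        raise ValueError("custom specs are deferred to Phase 4")
    if not 1 <= lxc_count <= MAX_VALIDATED_COUNT:
        raise ValueError(f"lxc_count must be between 1 and {MAX_VALIDATED_COUNT}")
    # Assumption: terraform assigns vmid_base, vmid_base + 1, ... in order.
    wanted = list(range(vmid_base, vmid_base + lxc_count))
    clashes = sorted(used_vmids.intersection(wanted))
    if clashes:
        raise ValueError(f"VMIDs already in use: {clashes}")
    return wanted
```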
---
## 3. Workflow B: `/fleet-update` — Apt Update + Upgrade
### 3.1 Telegram Input
```
You: "/fleet-update"
Artemis parses → action=fleet_update
```
### 3.2 Webhook Payload (POST to N8N)
```json
{
  "action": "fleet_update"
}
```
### 3.3 N8N Execution Steps
| Step | Node | Command |
|------|------|---------|
| 1 | Webhook trigger | Receive JSON payload |
| 2 | Execute SSH | `ssh jarvis@artemis.ai.home "cd ~/docker/ansible-push && docker compose up -d && docker exec ansible ansible-playbook playbooks/main.yml -i inventory.yml --tags fleet_update"` |
| 3 | Wait | SSH node blocks until `ansible-playbook` exits |
| 4 | Notify | POST result back to Telegram/Discord |
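
Because this payload carries only an `action`, the remote command can be a fixed string selected from an allowlist rather than anything derived from request data. A minimal sketch (the name `remote_command_for` is hypothetical):

```python
FLEET_UPDATE_COMMAND = (
    "cd ~/docker/ansible-push && docker compose up -d && "
    "docker exec ansible ansible-playbook playbooks/main.yml "
    "-i inventory.yml --tags fleet_update"
)

# Only actions listed here can be executed; everything else is rejected.
FIXED_COMMANDS = {"fleet_update": FLEET_UPDATE_COMMAND}


def remote_command_for(payload: dict) -> str:
    """Map a webhook payload to its fixed remote command."""
    action = payload.get("action")
    if action not in FIXED_COMMANDS:
        raise ValueError(f"unsupported action: {action!r}")
    return FIXED_COMMANDS[action]
```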
### 3.4 Target Scope
| Included | Excluded |
|----------|----------|
| `managed_nodes` group (from inventory.yml) | `pve_hosts` (MK33/34/39) — PVE self-manages |
| `physical_agents` | Neo (ZimaOS, not Debian) |
| `core_services` (MK7) | `igor` (ZimaOS NAS) |
| | Ephemeral LXCs — rebuilt from scratch |
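
If the scope were enforced on the command line rather than inside the playbook, one option is an Ansible host pattern passed via `--limit`, where `:` unions names and `:!` excludes them. The sketch below assumes the included entries are inventory groups and that `neo` and `igor` are the inventory hostnames. Ephemeral LXCs are left out because this draft does not name a group for them.

```python
def limit_pattern(included: list[str], excluded: list[str]) -> str:
    """Join names into an Ansible host pattern."""
    if not included:
        raise ValueError("at least one included group is required")
    return ":".join(included + [f"!{name}" for name in excluded])


PATTERN = limit_pattern(
    ["managed_nodes", "physical_agents", "core_services"],
    ["pve_hosts", "neo", "igor"],  # hostnames are assumptions
)
# PATTERN == "managed_nodes:physical_agents:core_services:!pve_hosts:!neo:!igor"
```

The pattern contains `!`, so it needs single quotes when placed in a shell command.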
---
## 4. N8N Requirements (MK7)
### 4.1 Container Mounts
- **SSH client:** `openssh-client` package installed in N8N image
- **Private key:** Mount `~/.ssh/artemis_key` → `/root/.ssh/id_ed25519` inside N8N container
- **Known hosts:** Pre-populated `~/.ssh/known_hosts` for `artemis.ai.home`
### 4.2 N8N Credentials
- **SSH Private Key:** Store `artemis_key` in N8N "Credentials" → SSH type
- **SSH Host:** `artemis.ai.home` (or LAN IP `192.168.15.182`)
- **SSH User:** `jarvis`
- **SSH Port:** `22`
### 4.3 Security Constraints
- N8N connects **to Artemis only** — never to PVE nodes, Neo, or LXCs directly
- N8N never sees PVE API tokens or sudo passwords
- All terraform/ansible state stays on Artemis filesystem (not in N8N container)
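
Taken together, the settings in this section amount to an SSH invocation like the one sketched below. `ssh_base_argv` is a hypothetical helper. The `-o` options are standard OpenSSH client options, chosen here as an assumption: `BatchMode=yes` makes ssh fail instead of prompting, and `StrictHostKeyChecking=yes` refuses hosts missing from the pre-populated `known_hosts`.

```python
# N8N connects to Artemis only; both forms of its address come from section 4.2.
ALLOWED_HOSTS = {"artemis.ai.home", "192.168.15.182"}


def ssh_base_argv(
    host: str,
    user: str = "jarvis",
    port: int = 22,
    key_path: str = "/root/.ssh/id_ed25519",
) -> list[str]:
    """argv prefix for subprocess.run; append the remote command as one argument."""
    if host not in ALLOWED_HOSTS:
        raise ValueError(f"host not allowed: {host!r}")
    return [
        "ssh",
        "-i", key_path,
        "-p", str(port),
        "-o", "BatchMode=yes",
        "-o", "StrictHostKeyChecking=yes",
        f"{user}@{host}",
    ]
```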
---
## 5. Artemis Prerequisites (Already Exist)
| Component | Path | Status |
|-----------|------|--------|
| Terraform container | `~/docker/terraform-pve/` | ✅ Validated Phase 2 |
| Ansible container | `~/docker/ansible-push/` | ✅ Validated |
| Run script | `./run.sh` | ✅ Forwards TF_VAR_*, supports init/plan/apply/destroy |
| LXC provision script | `./lxc-common.sh` | ✅ Runs lxc_common role |
| Inventory template | `terraform/inventory-lxc.tmpl` | ✅ Auto-generates ansible_host |
---
## 6. Error Handling
| Scenario | N8N Action |
|----------|------------|
| Terraform apply fails | Abort, notify with stderr |
| Inventory not generated after apply | Retry once, then fail |
| Ansible unreachable | Report per-host, continue others |
| SSH connection refused | Retry 3× with backoff, then alert |
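
The retry row could be implemented roughly as follows. This sketch reads "Retry 3×" as one initial attempt plus three retries, and the 2-second base delay is an arbitrary assumption. It relies on `ssh` exiting with status 255 when the connection itself fails. A remote command that happens to exit 255 would also be retried, which is a known limitation of this approach.

```python
import subprocess
import time

SSH_CONNECTION_ERROR = 255  # ssh's own exit status for connection-level errors


def run_with_retry(argv, retries=3, base_delay=2.0, run=subprocess.run, sleep=time.sleep):
    """Retry only connection failures; remote command failures return at once."""
    for attempt in range(retries + 1):
        result = run(argv, capture_output=True, text=True)
        if result.returncode != SSH_CONNECTION_ERROR:
            return result
        if attempt < retries:
            sleep(base_delay * 2 ** attempt)  # 2s, 4s, 8s with the defaults
    return result  # still failing: caller should alert
```

The `run` and `sleep` parameters exist so the loop can be exercised without a real SSH connection.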
---
## 7. Open Questions
1. **Should `/build` auto-increment `vmid_base` from last used, or always require explicit input?**
2. **Should N8N trigger a Gitea commit of generated inventory after apply?**
3. **Should `/fleet-update` also cover PVE nodes, handled separately via `apt` (not `dist-upgrade`)?**
4. **N8N webhook URL exposed via Tailscale or local LAN only?**
---
## 8. Decision Points
| Decision | Options | Recommended |
|----------|---------|-------------|
| N8N SSH key | `artemis_key` vs dedicated `n8n_key` | `artemis_key` for POC; rotate to dedicated key later |
| Notification target | Telegram vs Discord vs both | Both via existing gateway webhook |
| vmid_base tracking | Manual vs auto-increment vs check-before-apply | Manual for POC; auto-track in Phase 4 |
| Fleet-update schedule | On-demand only vs weekly cron | On-demand only via `/fleet-update` |