
N8N Webhook Orchestrator — Terraform LXC + Ansible Provisioning

Status: Draft | Author: Artemis | Date: 2026-06-05

Purpose: N8N on MK7 receives Telegram-triggered webhooks, SSHs into Artemis, and runs the existing terraform/ansible containers. No new infrastructure: it only orchestrates what already exists.


1. Architecture

[Telegram: Bobby] → Artemis (parse intent) → POST to N8N (MK7)
                                                      ↓ SSH (jarvis@192.168.15.182)
                                              Artemis (this machine)
                                                      ↓
                                    [A] ~/docker/terraform-pve/run.sh apply
                                                      ↓
                                          LXC created + inventory generated
                                                      ↓
                                    [B] ~/docker/ansible-push/lxc-common.sh
                                                      ↓
                                          LXC provisioned (jarvis + git + ansible)

N8N role: Trigger + SSH executor only. No Docker socket, no state awareness, no config generation.

Artemis role: Hosts existing run.sh + lxc-common.sh. Owns terraform state, ansible inventory, SSH keys.


2. Workflow A: /build — Create and Provision LXCs

2.1 Telegram Input

You: "/build 5 lxcs"
Artemis parses → count=5, vmid_base=auto (next available)

You: "/build 5 lxcs at vmid 62128"
Artemis parses → vmid_base=62128 (explicit override), count=5
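The document does not specify how Artemis parses intent beyond these two examples. As a sketch, assuming a fixed command grammar rather than free-form parsing, a regex can map both messages onto the webhook payload in 2.2. Using vmid_base=None to stand for "auto" is an assumption, and the pattern and function name are hypothetical.

```python
import re

# Hypothetical grammar: "/build <N> lxcs" with an optional "at vmid <ID>" suffix.
BUILD_RE = re.compile(r"^/build\s+(\d+)\s+lxcs?(?:\s+at\s+vmid\s+(\d+))?\s*$", re.IGNORECASE)


def parse_build_command(text: str) -> dict:
    """Turn a Telegram /build message into the lxc_build webhook payload.

    vmid_base is None when no explicit override is given; that stands in for
    "auto" and would be resolved to the next free block before apply.
    """
    match = BUILD_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised /build command: {text!r}")
    count = int(match.group(1))
    vmid_base = int(match.group(2)) if match.group(2) else None
    return {
        "action": "lxc_build",
        "vmid_base": vmid_base,
        "lxc_count": count,
        "specs": "default",  # specs are locked to "default" for the POC
    }


if __name__ == "__main__":
    print(parse_build_command("/build 5 lxcs"))
    print(parse_build_command("/build 5 lxcs at vmid 62128"))
```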

2.2 Webhook Payload (POST to N8N)

{
  "action": "lxc_build",
  "vmid_base": 62128,
  "lxc_count": 5,
  "specs": "default"
}
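Here is a minimal sketch of the POST Artemis sends, using only the Python standard library. It assumes a production webhook path of /webhook/lxc-build: n8n serves production webhooks under /webhook/<path>, but the lxc-build segment is hypothetical. Actually sending the request is left to the caller.

```python
import json
import urllib.request

# Hypothetical production webhook URL on the Traefik-routed N8N host.
N8N_WEBHOOK_URL = "https://n8n.ai.home/webhook/lxc-build"


def build_webhook_request(payload: dict, url: str = N8N_WEBHOOK_URL) -> urllib.request.Request:
    """Package a payload as a JSON POST request for the N8N Webhook trigger."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_webhook_request(
        {"action": "lxc_build", "vmid_base": 62128, "lxc_count": 5, "specs": "default"}
    )
    # Sending is left to the caller, e.g. urllib.request.urlopen(req, timeout=30).
    print(req.get_method(), req.full_url)
```

Because n8n.ai.home is TLS-terminated at Traefik, the client sending this request must trust the CA that issued that certificate.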

2.3 N8N Execution Steps

| Step | Node | Command |
|---|---|---|
| 1 | Webhook trigger | Receive JSON payload |
| 2 | Set SSH env vars | Export TF_VAR_lxc_count=5 TF_VAR_vmid_base=62128 |
| 3 | Execute SSH | ~/docker/terraform-pve/run.sh apply |
| 4 | Wait | Poll until run.sh exits (blocks until completion) |
| 5 | Verify inventory | Check ~/docker/ansible-push/terraform-prefill/inventory-lxc.yml exists |
| 6 | Execute SSH | ~/docker/ansible-push/lxc-common.sh |
| 7 | Notify | POST result back to Telegram/Discord |
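Steps 2, 3, and 6 come down to two remote shell lines. Here is a minimal sketch, assuming run.sh picks the `TF_VAR_*` values up from its environment as section 5 describes. The function name is hypothetical.

```python
import shlex

# Paths from the architecture diagram; run.sh is documented as forwarding TF_VAR_* into the container.
TERRAFORM_RUN = "~/docker/terraform-pve/run.sh"
LXC_PROVISION = "~/docker/ansible-push/lxc-common.sh"


def build_workflow_commands(lxc_count: int, vmid_base: int) -> list[str]:
    """Return the remote shell lines for steps 2-3 and step 6, in execution order."""
    env = {
        "TF_VAR_lxc_count": str(int(lxc_count)),
        "TF_VAR_vmid_base": str(int(vmid_base)),
    }
    # shlex.quote guards each value; "~" stays unquoted so the remote shell expands it.
    exports = " ".join(f"{name}={shlex.quote(value)}" for name, value in env.items())
    apply_line = f"export {exports} && {TERRAFORM_RUN} apply"
    return [apply_line, LXC_PROVISION]


if __name__ == "__main__":
    for line in build_workflow_commands(5, 62128):
        print(line)
```

Sending the two lines as separate SSH executions keeps the step 4 wait and the step 5 inventory check between them.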

2.4 Constraints

  • Specs locked to "default" for POC (2 cores, 2GB RAM, 8GB disk)
  • Custom specs deferred to Phase 4 — requires terraform variable expansion
  • vmid_base range: Must not overlap existing PVE VMs/LXCs (check before apply)
  • lxc_count max: Phase 2 validated at N=7; a run at N=4 hit a transient HTTP 500 (race condition)
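These constraints can be checked before anything reaches run.sh. The sketch below treats N=7 as a hard cap; the document only says N=7 was validated, so the cap itself is an assumption. It takes the set of existing VMIDs as input, however that set is obtained. Names are hypothetical.

```python
# Limits from the constraints above; treating N=7 as a cap is an assumption.
MAX_VALIDATED_COUNT = 7


def validate_build_payload(payload: dict, existing_vmids: set[int]) -> list[str]:
    """Return a list of constraint violations; an empty list means the payload may proceed."""
    errors = []
    if payload.get("specs") != "default":
        errors.append("only specs='default' is supported in the POC")
    count = payload.get("lxc_count")
    if not isinstance(count, int) or count < 1:
        errors.append("lxc_count must be a positive integer")
    elif count > MAX_VALIDATED_COUNT:
        errors.append(f"lxc_count {count} exceeds the validated maximum of {MAX_VALIDATED_COUNT}")
    base = payload.get("vmid_base")
    # A vmid_base of None ("auto") is resolved later, so only explicit bases are checked here.
    if isinstance(base, int) and isinstance(count, int) and count >= 1:
        clash = sorted(set(range(base, base + count)) & existing_vmids)
        if clash:
            errors.append(f"vmid range overlaps existing guests: {clash}")
    return errors
```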

3. Workflow B: /fleet-update — Apt Update + Upgrade

3.1 Telegram Input

You: "/fleet-update"
Artemis parses → action=fleet_update

3.2 Webhook Payload (POST to N8N)

{
  "action": "fleet_update"
}

3.3 N8N Execution Steps

| Step | Node | Command |
|---|---|---|
| 1 | Webhook trigger | Receive JSON payload |
| 2 | Execute SSH | |
| 3 | Wait | Poll until ansible exits |
| 4 | Notify | POST result back to Telegram/Discord |

3.4 Target Scope

| Included | Excluded |
|---|---|
| managed_nodes group (from inventory.yml) | pve_hosts (MK33/34/39): PVE self-manages |
| physical_agents | Neo (ZimaOS, not Debian) |
| core_services (MK7) | igor (ZimaOS NAS) |
| | Ephemeral LXCs: rebuilt from scratch |
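This scope table maps naturally onto an Ansible host pattern, where ':' forms a union and a '!' prefix excludes a group or host. The sketch below assumes the fleet run reads only inventory.yml, so the ephemeral LXCs in inventory-lxc.yml are out of scope by construction. It also assumes Neo and igor appear in that inventory under the hypothetical aliases neo and igor.

```python
# Group names from the scope table; the "neo" and "igor" host aliases are hypothetical.
INCLUDED_GROUPS = ["managed_nodes", "physical_agents", "core_services"]
EXCLUDED = ["pve_hosts", "neo", "igor"]


def fleet_limit_pattern(included=INCLUDED_GROUPS, excluded=EXCLUDED) -> str:
    """Ansible host pattern: union of included groups minus every excluded group or host.

    ':' joins patterns as a union and a leading '!' excludes a pattern.
    """
    return ":".join(list(included) + [f"!{name}" for name in excluded])


if __name__ == "__main__":
    print(fleet_limit_pattern())
```

The result would be passed to ansible-playbook as `--limit 'managed_nodes:physical_agents:core_services:!pve_hosts:!neo:!igor'`. The single quotes stop an interactive bash shell from treating '!' as history expansion.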

4. N8N Requirements (MK7)

4.1 Container Mounts

  • SSH client: openssh-client package installed in N8N image
  • Private key: Mount ~/.ssh/artemis_key → /root/.ssh/id_ed25519 inside the N8N container
  • Known hosts: Pre-populated ~/.ssh/known_hosts for 192.168.15.182

4.2 N8N Endpoint

  • Webhook URL: https://n8n.ai.home (Traefik-routed, TLS-terminated)
  • DNS: CNAME n8n.ai.home → traefik.ai.home (Technitium DNS)
  • Network: LAN-only (192.168.x.x), no external access

4.3 N8N Credentials

  • SSH Private Key: Store artemis_key in N8N "Credentials" → SSH type
  • SSH Host: 192.168.15.182 (LAN IP, no DNS resolution dependency)
  • SSH User: jarvis
  • SSH Port: 22

4.4 Security Constraints

  • N8N connects to Artemis only — never to PVE nodes, Neo, or LXCs directly
  • N8N never sees PVE API tokens or sudo passwords
  • All terraform/ansible state stays on Artemis filesystem (not in N8N container)
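N8N's SSH node uses the stored credential directly. If a command is instead shelled out, for example from an Execute Command node, the sketch below builds an OpenSSH argument list from the credential values above. It also enforces the Artemis-only rule in code. The in-container key and known_hosts paths follow the 4.1 mount, and the function name is hypothetical.

```python
# Values from the N8N credential settings; container paths follow the 4.1 mount.
ARTEMIS_HOST = "192.168.15.182"
SSH_USER = "jarvis"
SSH_PORT = 22
KEY_PATH = "/root/.ssh/id_ed25519"
KNOWN_HOSTS = "/root/.ssh/known_hosts"


def ssh_argv(remote_command: str, host: str = ARTEMIS_HOST) -> list[str]:
    """Build an OpenSSH argv that can only target Artemis."""
    if host != ARTEMIS_HOST:
        # Security constraint: N8N talks to Artemis only, never PVE nodes, Neo, or LXCs.
        raise PermissionError(f"refusing SSH to {host}; only {ARTEMIS_HOST} is allowed")
    return [
        "ssh",
        "-i", KEY_PATH,
        "-p", str(SSH_PORT),
        "-o", "BatchMode=yes",              # never prompt for a password
        "-o", "StrictHostKeyChecking=yes",  # rely on the pre-populated known_hosts
        "-o", f"UserKnownHostsFile={KNOWN_HOSTS}",
        f"{SSH_USER}@{host}",
        remote_command,
    ]
```

A caller would run it with something like `subprocess.run(ssh_argv(cmd), capture_output=True, text=True)` and forward stderr to the notification step on failure.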

5. Artemis Prerequisites (Already Exist)

| Component | Path | Status / Notes |
|---|---|---|
| Terraform container | ~/docker/terraform-pve/ | Validated (Phase 2) |
| Ansible container | ~/docker/ansible-push/ | Validated |
| Run script | ./run.sh | Forwards `TF_VAR_*`; supports init/plan/apply/destroy |
| LXC provision script | ./lxc-common.sh | Runs the lxc_common role |
| Inventory template | terraform/inventory-lxc.tmpl | Auto-generates ansible_host entries |
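The inventory generated from this template is what step 5 of Workflow A checks, and the "retry once, then fail" rule needs the same readiness test. Here is a sketch using only the standard library. It assumes the template emits one `ansible_host:` key per line of YAML; a real check might parse the file with a YAML library instead. Function names are hypothetical.

```python
from pathlib import Path

# Path from step 5 of Workflow A.
INVENTORY = Path("~/docker/ansible-push/terraform-prefill/inventory-lxc.yml").expanduser()


def inventory_host_count(path: Path = INVENTORY) -> int:
    """Count ansible_host entries in the generated inventory.

    Raises FileNotFoundError if terraform did not write the file.
    A plain line scan is used so the check needs no YAML library.
    """
    text = path.read_text(encoding="utf-8")
    return sum(1 for line in text.splitlines() if line.strip().startswith("ansible_host:"))


def inventory_ready(expected: int, path: Path = INVENTORY) -> bool:
    """True when the inventory exists and lists exactly the expected number of hosts."""
    try:
        return inventory_host_count(path) == expected
    except FileNotFoundError:
        return False
```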

6. Error Handling

| Scenario | N8N Action |
|---|---|
| Terraform apply fails | Abort, notify with stderr |
| Inventory not generated after apply | Retry once, then fail |
| Ansible host unreachable | Report per host, continue with the others |
| SSH connection refused | Retry 3× with backoff, then alert |
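The SSH rule is read here as three retries after the first attempt. That interpretation, the 2-second starting delay, and the doubling schedule are all assumptions in this sketch. It expects the connection step to raise Python's built-in ConnectionRefusedError. It also takes an injectable sleep function so the timing can be tested.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_refused(action: Callable[[], T], retries: int = 3, base_delay: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Call action, retrying on ConnectionRefusedError with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises after the final retry so the caller can send the alert.
    """
    for attempt in range(retries + 1):
        try:
            return action()
        except ConnectionRefusedError:
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)
    # Only reachable if retries is negative.
    raise ValueError("retries must be >= 0")
```

With the defaults, a host that stays down is tried four times, with waits of 2, 4, and 8 seconds between attempts, before the error propagates and the alert fires.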

7. Resolved Questions

| # | Question | Decision |
|---|---|---|
| 1 | Should /build auto-increment vmid_base? | Yes; default to auto-increment with optional explicit override |
| 2 | Should N8N trigger a Gitea commit of the generated inventory? | No; LXCs are ephemeral, so the inventory is temporary |
| 3 | Should /fleet-update include PVE nodes? | No; PVE is self-managed, separate workflow later |
| 4 | N8N webhook via Tailscale or LAN? | LAN IP only (192.168.15.182); no prod server access |

8. Decision Points

| Decision | Options | Recommended |
|---|---|---|
| N8N SSH key | artemis_key vs dedicated n8n_key | artemis_key for POC; rotate to a dedicated key later |
| Notification target | Telegram vs Discord vs both | Both, via existing gateway webhook |
| vmid_base tracking | Manual vs auto-increment | Auto-increment via PVE API query before apply |
| Fleet-update schedule | On-demand vs cron | On-demand only, via /fleet-update |
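Auto-increment needs a contiguous block of lxc_count free VMIDs. Proxmox's /cluster/nextid endpoint only returns a single free ID. This sketch assumes the set of existing VMIDs has already been collected from the PVE API, for example from the vmid fields returned by /cluster/resources?type=vm. floor defaults to 100 because Proxmox does not allow VMIDs below 100. The function name is hypothetical.

```python
def next_vmid_base(existing_vmids: set[int], count: int, floor: int = 100) -> int:
    """Return the lowest vmid >= floor such that vmid..vmid+count-1 are all unused."""
    if count < 1:
        raise ValueError("count must be at least 1")
    candidate = floor
    while True:
        clash = [v for v in range(candidate, candidate + count) if v in existing_vmids]
        if not clash:
            return candidate
        # Any start at or below the highest clash would still contain it, so skip past it.
        candidate = max(clash) + 1
```

Passing floor=62128 would return the explicit base used in the /build examples whenever that block is free, and the next free block above it otherwise.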