diff --git a/PRD Drafts/terraform-lxc-deployment-phase3.md b/PRD Drafts/terraform-lxc-deployment-phase3.md
new file mode 100644
index 0000000..68fad1d
--- /dev/null
+++ b/PRD Drafts/terraform-lxc-deployment-phase3.md
@@ -0,0 +1,178 @@
+# Terraform LXC Deployment — Phase 3: Ansible-Integrated Pipeline
+
+**Status:** Draft | **Author:** Artemis | **Date:** 2026-06-05
+
+> **Goal:** Extend the validated Phase 2 batch pipeline into a complete **create-and-provision** workflow. Terraform generates LXCs + Ansible inventory; Ansible provisions git, python3-pip, and ansible on each LXC. Future Stage 4 adds N8N orchestration.
+
+---
+
+## 1. Pipeline Overview
+
+```
+┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
+│   Trigger   │────▶│  Terraform  │────▶│  Inventory  │────▶│   Ansible   │
+│  (manual /  │     │  (Docker)   │     │   (YAML)    │     │  (Docker)   │
+│   N8N)      │     │  Creates    │     │  Generated  │     │  Provisions │
+└─────────────┘     │  LXCs on    │     │  per apply  │     │  LXC group  │
+                    │  PVE        │     └─────────────┘     └─────────────┘
+                    └─────────────┘
+```
+
+---
+
+## 2. Stage 1: Terraform LXC Batch Factory (Complete)
+
+**Status:** ✅ Validated at N=4 and N=7 on MK33
+
+### 2.1 Dynamic Derivation
+
+| Input | Example | Description |
+|-------|---------|-------------|
+| `vmid_base` | `5050` | Starting VMID |
+| `lxc_count` | `4` | Number of LXCs |
+| `subnet_prefix` | `192.168` | First two octets |
+
+**Auto-derived per LXC (index `i`):**
+- **VMID:** `vmid_base + i`
+- **Hostname:** `lxc-${vmid}`
+- **IPv4:** `${subnet_prefix}.${first2(vmid)}.${last2(vmid)}/18`
+- **IPv4 host (Ansible):** bare IP (CIDR stripped)
+
+### 2.2 Inventory Generation (NEW)
+
+Two files written on every `terraform apply`:
+- `inventory-lxc.yml` — latest, overwritten
+- `inventory-lxc-.yml` — timestamped archive
+
+Both written to `/ansible-push/terraform-prefill/` via Docker compose mount.
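
The derivation rules in section 2.1 can be sketched in shell as follows. This is an illustration only, not the Terraform code itself: the function names `derive_lxc` and `derive_all` are hypothetical, `first2`/`last2` are read as the first and last two digits of the VMID, and dropping leading zeros from each octet is an assumption. The source only shows a 4-digit example (`5050`), so behavior for other VMID lengths is not confirmed here.

```bash
#!/usr/bin/env bash
# Sketch of the per-LXC derivation from section 2.1.
# Assumptions (not from the source): function names, a 4-digit VMID,
# and octets printed without leading zeros.

# derive_lxc SUBNET_PREFIX VMID_BASE INDEX
# Prints: "<hostname> <ipv4-with-cidr> <bare-ip-for-ansible>"
derive_lxc() {
  local subnet_prefix="$1" vmid_base="$2" i="$3"
  local vmid=$(( vmid_base + i ))
  local first2="${vmid:0:2}"    # first two digits of the VMID
  local last2="${vmid: -2}"     # last two digits of the VMID
  # 10# forces base 10 so a value like "08" is not parsed as octal.
  local cidr="${subnet_prefix}.$((10#$first2)).$((10#$last2))/18"
  local host="${cidr%/*}"       # strip the CIDR suffix for ansible_host
  printf '%s %s %s\n' "lxc-${vmid}" "$cidr" "$host"
}

# derive_all SUBNET_PREFIX VMID_BASE LXC_COUNT
# Prints one derived line per LXC, for indexes 0..LXC_COUNT-1.
derive_all() {
  local subnet_prefix="$1" vmid_base="$2" lxc_count="$3" i
  for (( i = 0; i < lxc_count; i++ )); do
    derive_lxc "$subnet_prefix" "$vmid_base" "$i"
  done
}

# Example inputs taken from the table in section 2.1.
derive_all 192.168 5050 4
```

With the table's example inputs, the first derived host matches the `lxc-5050` / `192.168.50.50` entry shown in the generated inventory.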
+
+### 2.3 Generated Inventory Format
+
+```yaml
+all:
+  children:
+    lxcs:
+      hosts:
+        lxc-5050:
+          ansible_host: 192.168.50.50
+          ansible_user: root
+          ansible_password: ubuntu
+          ansible_port: 22
+          ansible_ssh_common_args: '-o StrictHostKeyChecking=no'
+          ansible_python_interpreter: auto_silent
+```
+
+---
+
+## 3. Stage 2: Ansible Provisioning (Complete)
+
+**Status:** ✅ Validated against 5 LXCs (vmid_base=338, lxc_count=5)
+
+### 3.1 Playbook Structure
+
+```
+~/docker/ansible-push/playbooks/
+├── main.yml              # Entry point
+├── roles/
+│   ├── prepare/          # apt update/upgrade
+│   ├── nfs_client/       # NFS mount (fleet nodes)
+│   └── lxc_common/       # LXC bootstrap
+│       └── tasks/main.yml
+```
+
+### 3.2 lxc_common Role (Updated 2026-06-05)
+
+Tasks execute in order:
+
+1. **Ensure apt cache updated** (`no_log: true`)
+2. **Install git** (`no_log: true`)
+3. **Install python3-pip** (`no_log: true`)
+4. **Create jarvis user** (UID 1000, sudo group)
+5. **Ensure jarvis .ssh directory**
+6. **Copy root authorized_keys to jarvis**
+7. **Passwordless sudo for jarvis**
+8. **Install ansible via pip** (`no_log: true`, `break_system_packages: true`)
+
+### 3.3 Output Noise Reduction
+
+`ansible.cfg` at `~/docker/ansible-push/ansible.cfg`:
+- `stdout_callback = dense` — grid layout instead of raw dpkg
+- `deprecation_warnings = False` — silence `ansible_os_family` nag
+
+### 3.4 Execution Pattern
+
+```bash
+# 1. Terraform creates LXCs + generates inventory
+cd ~/docker/terraform-pve
+TF_VAR_vmid_base=5050 TF_VAR_lxc_count=4 ./run.sh apply -auto-approve
+
+# 2. Fix inventory ownership (terraform container writes as root)
+sudo chown jarvis:jarvis ~/docker/ansible-push/terraform-prefill/inventory-lxc.yml
+
+# 3. Ansible provisions
+cd ~/docker/ansible-push
+docker compose up -d
+docker exec -it ansible ansible-playbook playbooks/main.yml \
+  -i terraform-prefill/inventory-lxc.yml \
+  --limit lxcs \
+  --tags lxc_common,prepare
+```
+
+---
+
+## 4. Open Questions / Phase 4
+
+| Item | Status | Notes |
+|------|--------|-------|
+| Adjustable CPU/RAM/HDD | ❌ Deferred | Currently fixed 1vCPU/2GB/8GB |
+| Vaulted secrets | ❌ Deferred | `ansible_password` in plaintext inventory |
+| N8N orchestration | ❌ Deferred | Webhook trigger from Gitea? |
+| User switch post-bootstrap | ❌ Blocked | First run must be `root`; jarvis created during run |
+
+---
+
+## 5. Known Issues
+
+### 5.1 PVE Parallel Start Race Condition
+- Creating multiple LXCs in parallel can hit HTTP 500 "already running"
+- Transient; re-run `apply` resolves it
+- No terraform-level workaround needed
+
+### 5.2 Root-Only First Run
+- Fresh LXCs only have `root` user with SSH key
+- `ansible_user: root` required for initial provisioning
+- `jarvis` user is created during the playbook, not before
+
+### 5.3 Inventory Ownership
+- Terraform container runs as `root`, writes inventory as `root`
+- `jarvis` cannot modify without `chown`
+- Future fix: run terraform container as `jarvis` UID
+
+### 5.4 Variable Precedence Trap
+- `terraform.auto.tfvars` outranks `TF_VAR_*` env vars
+- Dynamic vars (`lxc_count`, `vmid_base`) must NOT be in `.tfvars`
+
+---
+
+## 6. File Locations
+
+| Component | Path |
+|-----------|------|
+| Terraform code | `~/docker/terraform-pve/terraform/` |
+| Ansible code | `~/docker/ansible-push/playbooks/` |
+| Generated inventory | `~/docker/ansible-push/terraform-prefill/inventory-lxc.yml` |
+| PRD canonical | `~/documentation/PRDs/terraform-lxc-deployment-batch.md` |
+| This draft | `~/documentation/PRD Drafts/terraform-lxc-deployment-phase3.md` |
+
+---
+
+## 7. Decision Log
+
+| Decision | Chosen | Date |
+|----------|--------|------|
+| `ansible_user` | `root` for all runs | 2026-06-05 |
+| `ansible_password` | `ubuntu` (matches fleet) | 2026-06-05 |
+| SSH key discovery | Container mount `/root/.ssh/` auto-discovers `id_ed25519` | 2026-06-05 |
+| `no_log` on apt | Enabled to suppress dpkg noise | 2026-06-05 |
+| `dense` callback | Enabled in `ansible.cfg` | 2026-06-05 |
+| Inventory output | Dual: `inventory-lxc.yml` + timestamped archive | 2026-06-05 |
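
The variable precedence trap in section 5.4 could be guarded against with a small pre-flight check before `apply`. The sketch below is one possible approach, not part of the described pipeline: the function name `check_tfvars` is hypothetical, and the check is a simple line-based match for top-level assignments rather than a real HCL parse.

```bash
#!/usr/bin/env bash
# Sketch of a pre-flight guard for the trap in section 5.4.
# Assumption (not from the source): the function name, and that a
# line-based grep is good enough to spot the two dynamic variables.

# check_tfvars FILE
# Returns 0 if FILE is absent or does not assign lxc_count / vmid_base,
# returns 1 (with a message on stderr) if it does.
check_tfvars() {
  local file="$1"
  # No file means nothing can outrank the TF_VAR_* env vars.
  [ -f "$file" ] || return 0
  if grep -Eq '^[[:space:]]*(lxc_count|vmid_base)[[:space:]]*=' "$file"; then
    echo "refusing to run: $file sets lxc_count or vmid_base" >&2
    return 1
  fi
}
```

A possible use, assuming the file lives next to the Terraform code, is `check_tfvars ~/docker/terraform-pve/terraform/terraform.auto.tfvars && ./run.sh apply -auto-approve`, so that `apply` only runs when the `TF_VAR_*` values will actually take effect.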