Draft: Phase 3 PRD - Terraform LXC to Ansible provisioning pipeline

F.R.I.D.A.Y.
2026-06-05 19:54:47 -04:00
parent 3f0e36c8bb
commit 0e42f6189e

@@ -0,0 +1,178 @@
# Terraform LXC Deployment — Phase 3: Ansible-Integrated Pipeline
**Status:** Draft | **Author:** Artemis | **Date:** 2026-06-05
> **Goal:** Extend the validated Phase 2 batch pipeline into a complete **create-and-provision** workflow. Terraform creates the LXCs and generates an Ansible inventory; Ansible then installs git, python3-pip, and ansible on each LXC. A future Stage 4 adds N8N orchestration.
---
## 1. Pipeline Overview
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Trigger   │────▶│  Terraform  │────▶│  Inventory  │────▶│   Ansible   │
│  (manual /  │     │  (Docker)   │     │   (YAML)    │     │  (Docker)   │
│    N8N)     │     │  Creates    │     │  Generated  │     │  Provisions │
└─────────────┘     │  LXCs on    │     │  per apply  │     │  LXC group  │
                    │  PVE        │     └─────────────┘     └─────────────┘
                    └─────────────┘
```
---
## 2. Stage 1: Terraform LXC Batch Factory (Complete)
**Status:** ✅ Validated at N=4 and N=7 on MK33
### 2.1 Dynamic Derivation
| Input | Example | Description |
|-------|---------|-------------|
| `vmid_base` | `5050` | Starting VMID |
| `lxc_count` | `4` | Number of LXCs |
| `subnet_prefix` | `192.168` | First two octets |

**Auto-derived per LXC (index `i`, starting at 0):**
- **VMID:** `vmid_base + i`
- **Hostname:** `lxc-${vmid}`
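- **IPv4:** `${subnet_prefix}.${first2(vmid)}.${last2(vmid)}/18`, where `first2` and `last2` are the first and last two digits of the VMID (VMID `5050` gives `192.168.50.50/18`)
- **IPv4 host (Ansible):** the same address with the `/18` suffix stripped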
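
The derivation rules above can be sketched in shell. This is an illustration only, not the Terraform code: `derive_lxc` is a hypothetical helper, and it assumes a 4-digit VMID so that "first two digits" and "last two digits" map to integer division and remainder by 100. How shorter VMIDs are split is not stated here, so the sketch does not attempt it.

```bash
#!/usr/bin/env bash
# Sketch only: mirrors the derivation rules above for a 4-digit VMID.
# derive_lxc is a hypothetical helper, not part of the Terraform code.
derive_lxc() {
  local vmid_base=$1 index=$2 subnet_prefix=$3
  local vmid=$(( vmid_base + index ))
  local third=$(( vmid / 100 ))    # first two digits of a 4-digit VMID
  local fourth=$(( vmid % 100 ))   # last two digits (a leading zero is dropped)
  local cidr="${subnet_prefix}.${third}.${fourth}/18"
  # Fields: VMID, hostname, IPv4 with CIDR, bare IPv4 for Ansible
  printf '%s lxc-%s %s %s\n' "$vmid" "$vmid" "$cidr" "${cidr%/*}"
}

# Example: the N=4 batch starting at VMID 5050
for i in 0 1 2 3; do
  derive_lxc 5050 "$i" 192.168
done
```

The first line printed is `5050 lxc-5050 192.168.50.50/18 192.168.50.50`, which matches the sample inventory entry.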
### 2.2 Inventory Generation (NEW)
Every `terraform apply` writes two files:
- `inventory-lxc.yml`: the latest inventory, overwritten on each apply
- `inventory-lxc-<timestamp>.yml`: a timestamped archive copy

Both are written to `/ansible-push/terraform-prefill/` through a Docker Compose mount.
### 2.3 Generated Inventory Format
```yaml
all:
  children:
    lxcs:
      hosts:
        lxc-5050:
          ansible_host: 192.168.50.50
          ansible_user: root
          ansible_password: ubuntu
          ansible_port: 22
          ansible_ssh_common_args: '-o StrictHostKeyChecking=no'
          ansible_python_interpreter: auto_silent
```
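
To make the file layout concrete, here is a minimal shell sketch that renders the same inventory shape and writes both the latest file and an archive copy. It is not how Terraform produces the files: `write_inventory`, the timestamp format, and the 4-digit VMID split are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: renders the inventory shape shown above and writes the
# "latest" file plus a timestamped archive copy. write_inventory and the
# timestamp format are assumptions, not the Terraform implementation.
write_inventory() {
  local out_dir=$1 vmid_base=$2 lxc_count=$3 subnet_prefix=$4
  local latest="$out_dir/inventory-lxc.yml"
  local archive="$out_dir/inventory-lxc-$(date +%Y%m%d-%H%M%S).yml"
  local i vmid

  mkdir -p "$out_dir"
  {
    printf 'all:\n  children:\n    lxcs:\n      hosts:\n'
    for (( i = 0; i < lxc_count; i++ )); do
      vmid=$(( vmid_base + i ))
      printf '        lxc-%s:\n' "$vmid"
      # Assumes a 4-digit VMID: first two digits / last two digits
      printf '          ansible_host: %s.%s.%s\n' \
        "$subnet_prefix" "$(( vmid / 100 ))" "$(( vmid % 100 ))"
      printf '          ansible_user: root\n'
      printf '          ansible_password: ubuntu\n'
      printf '          ansible_port: 22\n'
      printf "          ansible_ssh_common_args: '-o StrictHostKeyChecking=no'\n"
      printf '          ansible_python_interpreter: auto_silent\n'
    done
  } > "$latest"

  cp "$latest" "$archive"   # archive copy is identical to the latest file
}
```

For example, `write_inventory /tmp/prefill 5050 4 192.168` would produce one `inventory-lxc.yml` and one timestamped copy under `/tmp/prefill`.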
---
## 3. Stage 2: Ansible Provisioning (Complete)
**Status:** ✅ Validated against 5 LXCs (vmid_base=338, lxc_count=5)
### 3.1 Playbook Structure
```
~/docker/ansible-push/playbooks/
├── main.yml              # Entry point
├── roles/
│   ├── prepare/          # apt update/upgrade
│   ├── nfs_client/       # NFS mount (fleet nodes)
│   └── lxc_common/       # LXC bootstrap
│       └── tasks/main.yml
```
### 3.2 lxc_common Role (Updated 2026-06-05)
Tasks execute in order:
1. **Ensure apt cache updated** (`no_log: true`)
2. **Install git** (`no_log: true`)
3. **Install python3-pip** (`no_log: true`)
4. **Create jarvis user** (UID 1000, sudo group)
5. **Ensure jarvis .ssh directory**
6. **Copy root authorized_keys to jarvis**
7. **Passwordless sudo for jarvis**
8. **Install ansible via pip** (`no_log: true`, `break_system_packages: true`)
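
One way to spot-check the result of the task list above is to confirm that the installed tools are on `PATH`. The helper below is a hypothetical sketch, not part of the `lxc_common` role; the command names `git`, `pip3`, and `ansible` are assumed to be what the three install tasks provide.

```bash
#!/usr/bin/env bash
# Sketch only: prints the commands from the argument list that are not on
# PATH (an empty line means nothing is missing). missing_commands is a
# hypothetical helper, not part of the lxc_common role.
missing_commands() {
  local cmd missing=()
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || missing+=("$cmd")
  done
  printf '%s\n' "${missing[*]}"
}
```

Run on a provisioned LXC, `missing_commands git pip3 ansible` should print an empty line, and `id -u jarvis` should print `1000`.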
### 3.3 Output Noise Reduction
`ansible.cfg` at `~/docker/ansible-push/ansible.cfg`:
- `stdout_callback = dense`: grid layout instead of raw dpkg output
- `deprecation_warnings = False`: silences the `ansible_os_family` deprecation warning
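
As a sketch, the two settings above would sit under the `[defaults]` section of `ansible.cfg`. The helper below writes only those two lines to a path of your choosing; `write_ansible_cfg` is hypothetical, and the real file may contain other settings that this sketch does not know about.

```bash
#!/usr/bin/env bash
# Sketch only: writes the two settings above under [defaults].
# write_ansible_cfg is a hypothetical helper; point it at a scratch path,
# since it overwrites whatever file it is given.
write_ansible_cfg() {
  local path=$1
  cat > "$path" <<'EOF'
[defaults]
stdout_callback = dense
deprecation_warnings = False
EOF
}
```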
### 3.4 Execution Pattern
```bash
# 1. Terraform creates LXCs + generates inventory
cd ~/docker/terraform-pve
TF_VAR_vmid_base=5050 TF_VAR_lxc_count=4 ./run.sh apply -auto-approve
# 2. Fix inventory ownership (terraform container writes as root)
sudo chown jarvis:jarvis ~/docker/ansible-push/terraform-prefill/inventory-lxc.yml
# 3. Ansible provisions
cd ~/docker/ansible-push
docker compose up -d
docker exec -it ansible ansible-playbook playbooks/main.yml \
-i terraform-prefill/inventory-lxc.yml \
--limit lxcs \
--tags lxc_common,prepare
```
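
The three manual steps above could be chained in a small wrapper. The following is a sketch under stated assumptions: `provision_batch`, `run`, and the `DRY_RUN` switch are hypothetical names, the paths and flags are copied from the steps above, and `-it` is dropped from `docker exec` on the assumption that a wrapper may run without a TTY.

```bash
#!/usr/bin/env bash
# Sketch only: chains the three manual steps above. Intended to run as its
# own script, because it changes directory. DRY_RUN=1 prints each command
# instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

provision_batch() {
  local vmid_base=$1 lxc_count=$2
  local tf_dir="$HOME/docker/terraform-pve"
  local ans_dir="$HOME/docker/ansible-push"

  # Step 1: Terraform creates LXCs and generates the inventory
  run cd "$tf_dir" || return 1
  run env TF_VAR_vmid_base="$vmid_base" TF_VAR_lxc_count="$lxc_count" \
    ./run.sh apply -auto-approve || return 1

  # Step 2: fix inventory ownership (terraform container writes as root)
  run sudo chown jarvis:jarvis \
    "$ans_dir/terraform-prefill/inventory-lxc.yml" || return 1

  # Step 3: Ansible provisions the lxcs group
  run cd "$ans_dir" || return 1
  run docker compose up -d || return 1
  run docker exec ansible ansible-playbook playbooks/main.yml \
    -i terraform-prefill/inventory-lxc.yml \
    --limit lxcs \
    --tags lxc_common,prepare
}
```

`DRY_RUN=1 provision_batch 5050 4` prints the six commands without touching Terraform or Docker, which is a cheap way to review the sequence before a real run.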
---
## 4. Open Questions / Phase 4
| Item | Status | Notes |
|------|--------|-------|
| Adjustable CPU/RAM/HDD | ❌ Deferred | Currently fixed at 1 vCPU / 2 GB RAM / 8 GB disk |
| Vaulted secrets | ❌ Deferred | `ansible_password` is stored in plaintext in the generated inventory |
| N8N orchestration | ❌ Deferred | Webhook trigger from Gitea? |
| User switch post-bootstrap | ❌ Blocked | First run must use `root`; `jarvis` is created during that run |
---
## 5. Known Issues
### 5.1 PVE Parallel Start Race Condition
- Creating multiple LXCs in parallel can fail with an HTTP 500 "already running" error
- The error is transient; re-running `apply` resolves it
- No Terraform-level workaround is needed
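
Since the transient error above clears on a re-run, the re-run can be scripted at the shell level. The helper below is a hypothetical sketch; the attempt count and delay are assumptions, not tested values for PVE.

```bash
#!/usr/bin/env bash
# Sketch only: re-runs a command until it succeeds or attempts run out.
# retry is a hypothetical helper; attempt count and delay are assumptions.
retry() {
  local attempts=$1 delay=$2 n=1
  shift 2
  until "$@"; do
    if (( n >= attempts )); then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "attempt $n failed; retrying in ${delay}s" >&2
    sleep "$delay"
    n=$(( n + 1 ))
  done
}
```

A possible invocation from `~/docker/terraform-pve` is `retry 3 10 env TF_VAR_vmid_base=5050 TF_VAR_lxc_count=4 ./run.sh apply -auto-approve`. Note that this retries on any failure, not only the "already running" case.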
### 5.2 Root-Only First Run
- Fresh LXCs have only the `root` user, with an SSH key installed
- `ansible_user: root` is therefore required for initial provisioning
- The `jarvis` user is created during the playbook, not before
### 5.3 Inventory Ownership
- The Terraform container runs as `root`, so the inventory is written with `root` ownership
- `jarvis` cannot modify the file until it has been `chown`ed
- Future fix: run the Terraform container with the `jarvis` UID
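
Until the container runs under a different UID, the ownership fix can be made idempotent so it only calls `sudo` when needed. This is a sketch: `fix_owner` is a hypothetical helper, and `stat -c` assumes GNU coreutils as found on typical Linux hosts.

```bash
#!/usr/bin/env bash
# Sketch only: chown the file only when its owner is not the wanted user.
# fix_owner is a hypothetical helper; stat -c is the GNU coreutils form.
fix_owner() {
  local file=$1 want_user=$2
  local owner
  owner=$(stat -c '%U' "$file") || return 1
  if [ "$owner" != "$want_user" ]; then
    sudo chown "$want_user:$want_user" "$file"
  fi
}
```

For the inventory described above, the call would be `fix_owner ~/docker/ansible-push/terraform-prefill/inventory-lxc.yml jarvis`.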
### 5.4 Variable Precedence Trap
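- Values in `terraform.auto.tfvars` take precedence over `TF_VAR_*` environment variables, so a value set there silently overrides the one passed at run time
- Dynamic variables (`lxc_count`, `vmid_base`) must NOT be set in any `.tfvars` file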
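
A pre-flight guard can catch this trap before `apply` runs. The sketch below scans the tfvars files that Terraform loads automatically (`terraform.tfvars` and `*.auto.tfvars`) for the two dynamic variables. `check_tfvars` is a hypothetical helper, the JSON variants of these files are not checked, and the directory that actually holds the tfvars files is an assumption.

```bash
#!/usr/bin/env bash
# Sketch only: fail if a dynamic variable is pinned in an auto-loaded
# tfvars file. check_tfvars is a hypothetical helper.
check_tfvars() {
  local dir=$1 f status=0
  for f in "$dir"/terraform.tfvars "$dir"/*.auto.tfvars; do
    [ -f "$f" ] || continue   # skip unmatched globs and missing files
    if grep -Eq '^[[:space:]]*(lxc_count|vmid_base)[[:space:]]*=' "$f"; then
      echo "dynamic variable pinned in $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

Calling `check_tfvars ~/docker/terraform-pve/terraform` before `./run.sh apply` returns non-zero if either variable is pinned in those files.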
---
## 6. File Locations
| Component | Path |
|-----------|------|
| Terraform code | `~/docker/terraform-pve/terraform/` |
| Ansible code | `~/docker/ansible-push/playbooks/` |
| Generated inventory | `~/docker/ansible-push/terraform-prefill/inventory-lxc.yml` |
| PRD canonical | `~/documentation/PRDs/terraform-lxc-deployment-batch.md` |
| This draft | `~/documentation/PRD Drafts/terraform-lxc-deployment-phase3.md` |
---
## 7. Decision Log
| Decision | Chosen | Date |
|----------|--------|------|
| `ansible_user` | `root` for all runs | 2026-06-05 |
| `ansible_password` | `ubuntu` (matches fleet) | 2026-06-05 |
| SSH key discovery | Container mount `/root/.ssh/` auto-discovers `id_ed25519` | 2026-06-05 |
| `no_log` on apt | Enabled to suppress dpkg noise | 2026-06-05 |
| `dense` callback | Enabled in `ansible.cfg` | 2026-06-05 |
| Inventory output | Dual: `inventory-lxc.yml` + timestamped archive | 2026-06-05 |