Terraform LXC: promote batch PRD to canonical, Phase 2 validated

- terraform-lxc-deployment.md -> terraform-lxc-deployment-batch.md
- Phase 2 validated at N=4 and N=7 on MK33 (pve-swarm)
- All dynamic derivation rules tested and confirmed
- Runtime behavior notes: auto.tfvars vs TF_VAR_*, -auto-approve, PVE race conditions
F.R.I.D.A.Y.
2026-06-05 08:38:02 -04:00
parent 520da27cd3
commit ff60037860


@@ -0,0 +1,210 @@
# Terraform LXC Deployment — Batch/Dynamic Template PRD
**Status:** Batch POC Validated | **Author:** Artemis | **Date:** 2026-06-05
> **Goal:** Dynamic LXC factory — one `terraform apply` creates N containers with auto-derived VMID, IPv4, hostname, and naming from a single base input.
## 1. Objective
Extend the Phase 1 single-LXC proven pipeline into a **parameterized batch generator**. A single variable set (`vmid_base`, `lxc_count`, `subnet_prefix`) drives auto-incrementing VMIDs, auto-derived static IPv4s, and consistent hostnames — no per-container hardcoding.
## 2. Dynamic Derivation Rules
### 2.1 Input Variables (User-Supplied)
| Variable | Example | Description |
|----------|---------|-------------|
| `vmid_base` | `5050` | Starting VMID for first LXC |
| `lxc_count` | `4` | Number of LXCs to create |
| `subnet_prefix` | `192.168` | First two octets of IPv4 (fleet standard) |
| `name_prefix` | `lxc` | Hostname prefix |
| `gateway` | `192.168.18.1` | Default gateway |
| `dns_servers` | `["192.168.7.7", "1.1.1.1"]` | DNS list |
### 2.2 Auto-Derived Per-LXC (Index `i` from `0` to `lxc_count-1`)
| Property | Formula | Example (`vmid_base=5050`, `i=2`) |
|----------|---------|----------------------------------|
| **VMID** | `vmid_base + i` | `5052` |
| **IPv4** | `${subnet_prefix}.${first2(vmid)}.${last2(vmid)}/18` | `192.168.50.52/18` |
| **Hostname** | `${name_prefix}-${vmid}` | `lxc-5052` |
| **Cores** | Fixed | `2` |
| **RAM** | Fixed | `2048` MB |
| **Disk** | Fixed | `8` GB |
**IP Derivation Detail:**
```
vmid = 5052
first2(vmid) = 50   (VMID digits 1-2, used as the third octet)
last2(vmid)  = 52   (VMID digits 3-4, used as the fourth octet)
IPv4 = 192.168.50.52/18
```
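This keeps VMID and IPv4 tightly coupled: **VMID is the single source of truth** for IP assignment. Derived IPs fall within the fleet `/18` subnet (`192.168.0.0/18`, which spans `192.168.0.0` through `192.168.63.255`) as long as the first two VMID digits are 63 or lower.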
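The derivation rule can be sketched in shell as integer arithmetic on the VMID. This is an illustrative sketch, not the project's Terraform code. `first2` and `last2` mirror the names in the table; `derive_ipv4`, `derive_hostname`, and `in_fleet_subnet` are hypothetical helpers. It assumes "first two digits" means the VMID with its last two digits dropped, which is only one possible reading for three-digit VMIDs such as 931.

```bash
#!/bin/sh
# Sketch of the VMID -> IPv4/hostname derivation (assumed arithmetic).

# Third octet: the VMID with its last two digits dropped (5052 -> 50).
first2() { echo $(( $1 / 100 )); }

# Fourth octet: the last two digits of the VMID (5052 -> 52).
last2() { echo $(( $1 % 100 )); }

# derive_ipv4 VMID [SUBNET_PREFIX]; the prefix defaults to the fleet standard.
derive_ipv4() {
  echo "${2:-192.168}.$(first2 "$1").$(last2 "$1")/18"
}

# derive_hostname VMID [NAME_PREFIX]
derive_hostname() {
  echo "${2:-lxc}-$1"
}

# Succeeds only if the derived address lies inside 192.168.0.0/18,
# i.e. the third octet is between 0 and 63.
in_fleet_subnet() {
  [ "$(first2 "$1")" -le 63 ]
}
```

For example, `derive_ipv4 5052` prints `192.168.50.52/18`, and `in_fleet_subnet 6400` fails because `192.168.64.0` is outside the `/18`.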
### 2.3 Example Runs
```bash
# Create 4 LXCs: lxc-5050 → lxc-5053
# IPs: 192.168.50.50 → 192.168.50.53
TF_VAR_vmid_base=5050 TF_VAR_lxc_count=4 ./run.sh apply -auto-approve
# Create 2 LXCs starting at 5100
# IPs: 192.168.51.0, 192.168.51.1
TF_VAR_vmid_base=5100 TF_VAR_lxc_count=2 ./run.sh apply -auto-approve
# Create 7 LXCs at vmid_base=931 (validated POC run)
TF_VAR_vmid_base=931 TF_VAR_lxc_count=7 ./run.sh apply -auto-approve
```
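Before running an apply, it can help to preview which hostnames and addresses a given `vmid_base` / `lxc_count` pair would produce. The helper below is a hypothetical sketch (`preview_batch` is not part of `run.sh`), and it assumes the derivation is plain integer division and modulo by 100.

```bash
#!/bin/sh
# preview_batch VMID_BASE LXC_COUNT [NAME_PREFIX] [SUBNET_PREFIX]
# Prints one "hostname ipv4" line per container that the batch would create.
preview_batch() {
  pb_base=$1
  pb_count=$2
  pb_name=${3:-lxc}
  pb_subnet=${4:-192.168}
  pb_i=0
  while [ "$pb_i" -lt "$pb_count" ]; do
    pb_vmid=$(( pb_base + pb_i ))
    printf '%s-%s %s.%s.%s/18\n' \
      "$pb_name" "$pb_vmid" "$pb_subnet" \
      "$(( pb_vmid / 100 ))" "$(( pb_vmid % 100 ))"
    pb_i=$(( pb_i + 1 ))
  done
}
```

Calling `preview_batch 5050 4` lists `lxc-5050 192.168.50.50/18` through `lxc-5053 192.168.50.53/18`, matching the first example run.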
## 2. Architecture
### 2.1 Docker Image
**Base:** `hashicorp/terraform:latest` with `bpg/proxmox` provider downloaded at container init
**Provider:** `bpg/proxmox` v0.70.0
**Pattern:** Lazy automator — local workspace mounted into container, credentials via `terraform.auto.tfvars`
```dockerfile
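# Official Terraform image; the bpg/proxmox provider is fetched at container init
FROM hashicorp/terraform:latest
WORKDIR /workspace
# Install the wrapper script as `run`
COPY run.sh /usr/local/bin/run
RUN chmod +x /usr/local/bin/run
# Replace the image's default `terraform` entrypoint with a shell
# (assumes bash is present in the image; otherwise use sh or install bash)
ENTRYPOINT ["bash"]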
```
### 2.2 Credential Model
Native Terraform variable loading via `terraform.auto.tfvars` (no Docker env-file mapping):
```hcl
# terraform/terraform.auto.tfvars
pm_api_url = "https://192.168.7.33:8006/api2/json"
pm_api_token_id = "root@pam!terraform"
pm_api_token_secret = "<secret>"
```
PVE API token created on MK33: `root@pam!terraform`. Token stored in fleet credential store.
### 2.3 Runtime Parameterization (Phase 2)
| Parameter | Example | Effect |
|-----------|---------|--------|
| `lxc_count` | `4` | Number of LXCs to create |
| `vmid_base` | `5050` | Starting VMID |
Auto-derived per LXC (index `i` from 0 to `lxc_count-1`):
- **VMID:** `vmid_base + i`
- **Name:** `lxc-${vmid}`
- **IPv4:** `192.168.${first2(vmid)}.${last2(vmid)}/18`
### 2.4 LXC Configuration (Validated)
- **OS:** Debian 12 (`debian-12-standard_12.2-1_amd64.tar.zst`)
- **CPU:** 1 vCPU
- **RAM:** 2048 MB
- **Storage:** 8GB rootfs on `local` directory (test phase)
- **Network:** Static IPv4, gateway `192.168.18.1`, subnet `/18`
- **DNS:** `192.168.7.7`, `192.168.18.1`, `1.1.1.1`
- **Privilege:** Unprivileged (`unprivileged = true`)
- **Features:** Nesting enabled (`features { nesting = true }`)
### 2.5 User / SSH (Tested)
```hcl
initialization {
  user_account {
    username = "jarvis"
    password = "<fleet_linux_pass>" # Required for console login verification
    keys     = [file("artemis_key.pub")]
  }
}
```
## 3. Phase Breakdown
### Phase 1 — Single LXC (Plan/Build/Destroy) ✅ COMPLETE
**Completed:** 2026-06-04 on MK33 (pve-swarm, cluster node 33)
**Results:**
- `Dockerfile` — simplified to official `hashicorp/terraform:latest` image
- `docker-compose.yml` — workspace mount, no env-file credential mapping
- `run.sh` — wrapper for `terraform plan/apply/destroy`
- `terraform/providers.tf` — `bpg/proxmox` v0.70.0
- `terraform/main.tf` — single LXC resource (VMID 5050)
- `terraform/terraform.auto.tfvars` — native Terraform credential loading
**Validated:**
```bash
./run.sh plan # ✅ Validated
./run.sh apply # ✅ Created lxc-5050 (debian-12, 192.168.50.50/18)
./run.sh destroy # ✅ Clean teardown
```
**Key fixes discovered during testing:**
- Storage pool: `local-lvm` missing → used `local` (Directory)
- Template path: `nas-ct-stor:vztmpl/` (NFS shared templates)
- Unprivileged required: `unprivileged = true` + `features { nesting = true }`
- Password injection: `user_account.password` required for console login verification
### Phase 2 — Modular + Bulk Creation ✅ VALIDATED
**Completed:** 2026-06-05 on MK33 (pve-swarm)
**Results:**
- `modules/lxc/` — reusable LXC module with `proxmox_virtual_environment_container` resource
- `main.tf` — `for_each` over module with `lxc_count` parameterization
- `run.sh` — forwards `TF_VAR_*` environment variables into Docker container
**Validated at multiple scales:**
| Test | Command | Result |
|------|---------|--------|
| 4 LXCs at vmid_base=3550 | `TF_VAR_lxc_count=4 TF_VAR_vmid_base=3550 ./run.sh apply` | ✅ All created; 1 transient 500 error on start (PVE task queue race), container existed and operational despite error |
| 7 LXCs at vmid_base=931 | `TF_VAR_lxc_count=7 TF_VAR_vmid_base=931 ./run.sh apply` | ✅ All 7 created successfully, no errors, ~14-16s per container |
| 7 LXCs destroy | `./run.sh destroy -auto-approve` | ✅ All 7 destroyed cleanly in ~8s each |
**Key runtime behavior discovered:**
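- `terraform.auto.tfvars` takes precedence over `TF_VAR_*` environment variables, so variables meant to be set at runtime (`vmid_base`, `lxc_count`) must **not** be assigned in any `.tfvars` file
- `-auto-approve` is required when Terraform runs inside the Docker container, because there is no interactive TTY for the confirmation prompt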
- Parallel creation (default) works at N=7; transient race condition observed at N=4 (PVE task queue, not terraform logic)
- All containers receive SSH key + password via `initialization.user_account` block
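The two variable-handling notes above can be guarded in the wrapper script. The following is a minimal sketch, not the actual `run.sh`: `tf_var_flags` and `tfvars_shadows` are hypothetical helper names, and the sketch assumes the container is started with a command that accepts `-e NAME` to pass a host variable through, as `docker run` does.

```bash
#!/bin/sh
# tf_var_flags: print " -e NAME" for every TF_VAR_* variable in the
# environment, so the values can be forwarded into the container.
tf_var_flags() {
  env | sed -n 's/^\(TF_VAR_[A-Za-z0-9_]*\)=.*/\1/p' | sort |
    while read -r tf_name; do
      printf ' -e %s' "$tf_name"
    done
}

# tfvars_shadows FILE NAME...: succeed (and warn on stderr) if FILE assigns
# any of the named variables, since that assignment would override TF_VAR_NAME.
tfvars_shadows() {
  ts_file=$1
  shift
  for ts_name in "$@"; do
    if grep -Eq "^[[:space:]]*${ts_name}[[:space:]]*=" "$ts_file"; then
      echo "warning: $ts_name is set in $ts_file and overrides TF_VAR_$ts_name" >&2
      return 0
    fi
  done
  return 1
}
```

A wrapper could call `tfvars_shadows terraform/terraform.auto.tfvars vmid_base lxc_count` and abort before `apply` when it succeeds, then append the output of `tf_var_flags` to its container command line.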
## 4. File Structure
```
~/docker/terraform-pve/
├── Dockerfile
├── docker-compose.yml
├── run.sh
├── terraform/
│ ├── .terraform/
│ ├── main.tf
│ ├── providers.tf
│ ├── terraform.auto.tfvars # Credentials (not committed)
│ ├── terraform.tfstate
│ ├── variables.tf
│ └── artemis_key.pub
```
## 5. Resolved Decisions
| Decision | Chosen | Notes |
|----------|--------|-------|
| Debian template | **12** | `debian-12-standard_12.2-1_amd64.tar.zst` on `nas-ct-stor` |
| Gateway | **192.168.18.1** | Router IP for 192.168.0.0/18 subnet |
| DNS | **192.168.7.7, 192.168.18.1, 1.1.1.1** | Technitium primary + fallback |
| SSH key | **artemis_key.pub** | Already registered fleet-wide |
| Storage (Phase 1) | **local** | `local-lvm` missing on nodes; migrate to `truenas-nfs` in Phase 2 |
| Privilege | **Unprivileged** | `unprivileged = true` with `nesting = true` for systemd 252 |
| Credential loading | **terraform.auto.tfvars** | Native Terraform pattern; no Docker env-file complexity |
## 6. Fleet Notes
- PVE API token: `root@pam!terraform` (Secret: fleet credential store)
- PVE root password: `<redacted>` (stored in fleet credential store)
- Cluster: `pve-swarm` (MK33, MK34, MK39)
- Template storage: `nas-ct-stor` (NFS from TrueNAS)
- Disk storage (test): `local`
- **Code location:** `~/docker/terraform-pve/` — local only, not in any Gitea repo