# Terraform LXC Deployment: Batch/Dynamic Template PRD

**Status:** Batch POC Validated | **Author:** Artemis | **Date:** 2026-06-05

> **Goal:** Dynamic LXC factory: one `terraform apply` creates N containers with auto-derived VMID, IPv4, hostname, and naming from a single base input.

## 1. Objective

Extend the proven Phase 1 single-LXC pipeline into a **parameterized batch generator**. A single variable set (`vmid_base`, `lxc_count`, `subnet_prefix`) drives auto-incrementing VMIDs, auto-derived static IPv4 addresses, and consistent hostnames, with no per-container hardcoding.

## 2. Dynamic Derivation Rules

### 2.1 Input Variables (User-Supplied)

| Variable | Example | Description |
|----------|---------|-------------|
| `vmid_base` | `5050` | Starting VMID for the first LXC |
| `lxc_count` | `4` | Number of LXCs to create |
| `subnet_prefix` | `192.168` | First two octets of the IPv4 address (fleet standard) |
| `name_prefix` | `lxc` | Hostname prefix |
| `gateway` | `192.168.18.1` | Default gateway |
| `dns_servers` | `["192.168.7.7", "1.1.1.1"]` | DNS server list |

### 2.2 Auto-Derived Per-LXC (Index `i` from `0` to `lxc_count-1`)

| Property | Formula | Example (`vmid_base=5050`, `i=2`) |
|----------|---------|----------------------------------|
| **VMID** | `vmid_base + i` | `5052` |
| **IPv4** | `${subnet_prefix}.${first2(vmid)}.${last2(vmid)}/18` | `192.168.50.52/18` |
| **Hostname** | `${name_prefix}-${vmid}` | `lxc-5052` |
| **Cores** | Fixed | `2` |
| **RAM** | Fixed | `2048` MB |
| **Disk** | Fixed | `8` GB |

**IP Derivation Detail:**
```
vmid = 5052
first2(vmid) = 50 (digits 1-2)
last2(vmid) = 52 (digits 3-4)
IPv4 = 192.168.50.52/18
```

This keeps VMID and IPv4 tightly coupled: **VMID is the single source of truth** for IP assignment. All derived IPs fall within the fleet `/18` subnet (`192.168.0.0/18`) as long as the third octet stays at or below 63, that is, for VMIDs below 6400.
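
The derivation rules above can be previewed outside Terraform before an apply. The following is a minimal shell sketch, not the project's code: the function name `derive_lxc` is hypothetical, and it assumes `first2`/`last2` mean integer division and remainder by 100. That assumption reproduces the 4-digit examples and would map a 3-digit VMID such as 931 to `192.168.9.31`.

```bash
# Print "VMID HOSTNAME IPV4" for index i of a batch.
# Assumption: first2(vmid) = vmid / 100 and last2(vmid) = vmid % 100.
# Args: vmid_base, i, optional subnet_prefix, optional name_prefix.
derive_lxc() (
  vmid_base=$1
  i=$2
  subnet_prefix=${3:-192.168}
  name_prefix=${4:-lxc}
  vmid=$((vmid_base + i))
  printf '%s %s-%s %s.%d.%d/18\n' "$vmid" "$name_prefix" "$vmid" \
    "$subnet_prefix" "$((vmid / 100))" "$((vmid % 100))"
)

# Preview a batch of 4 containers starting at VMID 5050.
i=0
while [ "$i" -lt 4 ]; do
  derive_lxc 5050 "$i"
  i=$((i + 1))
done
```

For example, `derive_lxc 5050 2` prints `5052 lxc-5052 192.168.50.52/18`, matching the table row for `i=2`.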

### 2.3 Example Runs

```bash
# Create 4 LXCs: lxc-5050 → lxc-5053
# IPs: 192.168.50.50 → 192.168.50.53
TF_VAR_vmid_base=5050 TF_VAR_lxc_count=4 ./run.sh apply -auto-approve

# Create 2 LXCs starting at 5100
# IPs: 192.168.51.0, 192.168.51.1
TF_VAR_vmid_base=5100 TF_VAR_lxc_count=2 ./run.sh apply -auto-approve

# Create 7 LXCs at vmid_base=931 (validated POC run)
TF_VAR_vmid_base=931 TF_VAR_lxc_count=7 ./run.sh apply -auto-approve
```
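
Because the IP is derived from the VMID, a bad `vmid_base` or `lxc_count` can push addresses outside the fleet subnet. A pre-flight check along these lines could run before any of the commands above. It is a sketch only: `check_batch` is a hypothetical helper, and it assumes the third octet is `vmid / 100` and that Proxmox rejects VMIDs below 100.

```bash
# Return 0 if every VMID in the batch maps into 192.168.0.0/18, else 1.
# Assumptions: third octet = vmid / 100; the lowest valid Proxmox VMID is 100.
check_batch() (
  vmid_base=$1
  lxc_count=$2
  if [ "$lxc_count" -lt 1 ]; then
    echo "lxc_count must be at least 1" >&2
    exit 1
  fi
  last=$((vmid_base + lxc_count - 1))
  if [ "$vmid_base" -lt 100 ]; then
    echo "VMID $vmid_base is below 100" >&2
    exit 1
  fi
  # A /18 on 192.168.0.0 spans third octets 0 through 63, and VMIDs only
  # increase across the batch, so checking the last VMID is sufficient.
  if [ "$((last / 100))" -gt 63 ]; then
    echo "VMID $last maps outside 192.168.0.0/18" >&2
    exit 1
  fi
)
```

A possible use is `check_batch 5050 4 && TF_VAR_vmid_base=5050 TF_VAR_lxc_count=4 ./run.sh apply -auto-approve`, so that the apply only starts when the whole range is addressable.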

## 2. Architecture

### 2.1 Docker Image

- **Base:** `hashicorp/terraform:latest`, with the `bpg/proxmox` provider downloaded at container init
- **Provider:** `bpg/proxmox` v0.70.0
- **Pattern:** Lazy automator: local workspace mounted into the container, credentials via `terraform.auto.tfvars`

```dockerfile
# Official Terraform image; the provider is downloaded at init, not baked in
FROM hashicorp/terraform:latest
WORKDIR /workspace
# Install the wrapper script as an executable named "run"
COPY run.sh /usr/local/bin/run
RUN chmod +x /usr/local/bin/run
ENTRYPOINT ["bash"]
```

### 2.2 Credential Model

Credentials load through Terraform's native `terraform.auto.tfvars` mechanism (no Docker env-file mapping):

```hcl
# terraform/terraform.auto.tfvars
pm_api_url          = "https://192.168.7.33:8006/api2/json"
pm_api_token_id     = "root@pam!terraform"
pm_api_token_secret = "<secret>"
```

The PVE API token `root@pam!terraform` was created on MK33. The token secret is stored in the fleet credential store.

### 2.3 Runtime Parameterization (Phase 2)

| Parameter | Example | Effect |
|-----------|---------|--------|
| `lxc_count` | `4` | Number of LXCs to create |
| `vmid_base` | `5050` | Starting VMID |

Auto-derived per LXC (index `i` from `0` to `lxc_count-1`):
- **VMID:** `vmid_base + i`
- **Name:** `lxc-${vmid}`
- **IPv4:** `192.168.${first2digits(vmid)}.${last2digits(vmid)}/18`

### 2.4 LXC Configuration (Validated)

- **OS:** Debian 12 (`debian-12-standard_12.2-1_amd64.tar.zst`)
- **CPU:** 1 vCPU
- **RAM:** 2048 MB
- **Storage:** 8 GB rootfs on the `local` directory storage (test phase)
- **Network:** Static IPv4, gateway `192.168.18.1`, subnet `/18`
- **DNS:** `192.168.7.7`, `192.168.18.1`, `1.1.1.1`
- **Privilege:** Unprivileged (`unprivileged = true`)
- **Features:** Nesting enabled (`features { nesting = true }`)

### 2.5 User / SSH (Tested)

```hcl
initialization {
  user_account {
    username = "jarvis"
    password = "<fleet_linux_pass>" # Required for console login verification
    keys     = [file("artemis_key.pub")]
  }
}
```

## 3. Phase Breakdown

### Phase 1: Single LXC (Plan/Build/Destroy) ✅ COMPLETE

**Completed:** 2026-06-04 on MK33 (pve-swarm, cluster node 33)

**Results:**
- `Dockerfile`: simplified to the official `hashicorp/terraform:latest` image
- `docker-compose.yml`: workspace mount, no env-file credential mapping
- `run.sh`: wrapper for `terraform plan/apply/destroy`
- `terraform/providers.tf`: `bpg/proxmox` v0.70.0
- `terraform/main.tf`: single LXC resource (VMID 5050)
- `terraform/terraform.auto.tfvars`: native Terraform credential loading

**Validated:**
```bash
./run.sh plan     # ✅ Validated
./run.sh apply    # ✅ Created lxc-5050 (debian-12, 192.168.50.50/18)
./run.sh destroy  # ✅ Clean teardown
```

**Key fixes discovered during testing:**
- Storage pool: `local-lvm` missing → used `local` (Directory)
- Template path: `nas-ct-stor:vztmpl/` (NFS shared templates)
- Unprivileged required: `unprivileged = true` + `features { nesting = true }`
- Password injection: `user_account.password` required for console login verification

### Phase 2: Modular + Bulk Creation ✅ VALIDATED

**Completed:** 2026-06-05 on MK33 (pve-swarm)

**Results:**
- `modules/lxc/`: reusable LXC module with the `proxmox_virtual_environment_container` resource
- `main.tf`: `for_each` over the module, parameterized by `lxc_count`
- `run.sh`: forwards `TF_VAR_*` environment variables into the Docker container
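
The `TF_VAR_*` forwarding in `run.sh` could look roughly like the sketch below. This is not the actual script: the helper names, the image name `terraform-pve`, and the mount path are all assumptions. It relies on the documented `docker run -e NAME` form (no value), which copies `NAME` from the calling environment into the container.

```bash
# Print one "-e NAME" flag per TF_VAR_* variable in the current environment.
tf_var_flags() {
  env | sed -n 's/^\(TF_VAR_[A-Za-z0-9_]*\)=.*/-e \1/p' | sort
}

# Hypothetical wrapper: run terraform in the container with those variables.
# The image name "terraform-pve" and the mount path are assumptions.
run_terraform() {
  # Word splitting of the flag list is intended here; variable names
  # contain only letters, digits, and underscores, so it is safe.
  # shellcheck disable=SC2046
  docker run --rm $(tf_var_flags) \
    -v "$PWD/terraform:/workspace" -w /workspace \
    --entrypoint terraform terraform-pve "$@"
}
```

With this shape, `TF_VAR_vmid_base=5050 TF_VAR_lxc_count=4 run_terraform apply -auto-approve` would pass both variables through without the wrapper needing to know their names.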

**Validated at multiple scales:**

| Test | Command | Result |
|------|---------|--------|
| 4 LXCs at vmid_base=3550 | `TF_VAR_lxc_count=4 TF_VAR_vmid_base=3550 ./run.sh apply` | ✅ All created; 1 transient 500 error on start (PVE task queue race); the container existed and was operational despite the error |
| 7 LXCs at vmid_base=931 | `TF_VAR_lxc_count=7 TF_VAR_vmid_base=931 ./run.sh apply` | ✅ All 7 created successfully, no errors, ~14–16s per container |
| 7 LXCs destroy | `./run.sh destroy -auto-approve` | ✅ All 7 destroyed cleanly in ~8s each |
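
The transient 500 error in the N=4 run suggests wrapping the apply in a bounded retry. The helper below is a generic sketch, not part of the validated pipeline; `retry` is a hypothetical name, and whether a plain re-run is the right recovery depends on how the provider recorded the failed resource in state, which the POC did not test.

```bash
# Run a command up to $1 times, sleeping $2 seconds between failed attempts.
# Returns 0 on the first success, 1 if every attempt fails.
retry() (
  attempts=$1
  delay=$2
  shift 2
  n=1
  while :; do
    "$@" && exit 0
    [ "$n" -ge "$attempts" ] && exit 1
    echo "attempt $n failed; retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
  done
)
```

A call such as `retry 3 10 ./run.sh apply -auto-approve` would attempt the apply at most three times, ten seconds apart.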

**Key runtime behavior discovered:**
- `terraform.auto.tfvars` outranks `TF_VAR_*` environment variables, so dynamic variables (such as `vmid_base` and `lxc_count`) must **not** be set in `.tfvars`
- `-auto-approve` is required on Dockerized Terraform (no interactive TTY for confirmation)
- Parallel creation (the default) works at N=7; a transient race condition was observed at N=4 (PVE task queue, not Terraform logic)
- All containers receive the SSH key and password via the `initialization.user_account` block
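
Since an assignment in `terraform.auto.tfvars` silently overrides the matching `TF_VAR_*` value, a small guard can catch the mistake before an apply. The sketch below is illustrative only: `check_tfvars` is a hypothetical helper, and it matches simple `name = value` lines rather than parsing HCL.

```bash
# Fail if any of the given variable names is assigned in a tfvars file.
# Usage: check_tfvars FILE NAME...
check_tfvars() (
  file=$1
  shift
  status=0
  for name in "$@"; do
    # Match "name =" at the start of a line, allowing leading whitespace.
    if grep -Eq "^[[:space:]]*${name}[[:space:]]*=" "$file"; then
      echo "$name is set in $file and would override TF_VAR_$name" >&2
      status=1
    fi
  done
  exit "$status"
)
```

For this project the call would be along the lines of `check_tfvars terraform/terraform.auto.tfvars vmid_base lxc_count`, leaving the credential variables in the file untouched.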

## 4. File Structure

```
~/docker/terraform-pve/
├── Dockerfile
├── docker-compose.yml
├── run.sh
└── terraform/
    ├── .terraform/
    ├── main.tf
    ├── providers.tf
    ├── terraform.auto.tfvars    # Credentials (not committed)
    ├── terraform.tfstate
    ├── variables.tf
    └── artemis_key.pub
```

## 5. Resolved Decisions

| Decision | Chosen | Notes |
|----------|--------|-------|
| Debian template | **12** | `debian-12-standard_12.2-1_amd64.tar.zst` on `nas-ct-stor` |
| Gateway | **192.168.18.1** | Router IP for the 192.168.0.0/18 subnet |
| DNS | **192.168.7.7, 192.168.18.1, 1.1.1.1** | Technitium primary + fallback |
| SSH key | **artemis_key.pub** | Already registered fleet-wide |
| Storage (Phase 1) | **local** | `local-lvm` missing on nodes; migrate to `truenas-nfs` in Phase 2 |
| Privilege | **Unprivileged** | `unprivileged = true` with `nesting = true` for systemd 252 |
| Credential loading | **terraform.auto.tfvars** | Native Terraform pattern; no Docker env-file complexity |

## 6. Fleet Notes

- PVE API token: `root@pam!terraform` (Secret: fleet credential store)
- PVE root password: stored in the fleet credential store
- Cluster: `pve-swarm` (MK33, MK34, MK39)
- Template storage: `nas-ct-stor` (NFS from TrueNAS)
- Disk storage (test): `local`
- **Code location:** `~/docker/terraform-pve/` (local only, not in any Gitea repo)