documentation/PRDs/terraform-lxc-deployment-batch.md
F.R.I.D.A.Y. 3f0e36c8bb Promote all operational PRDs to Deployed status
- terraform-lxc-deployment.md: Deployed (Phase 1 single-LXC baseline)
- terraform-lxc-deployment-batch.md: Deployed (Phase 2 batch/dynamic template, validated N=4/N=7)
- ansible-base-testing.md: Deployed (base testing environment, validated fleet ping/playbook)
- ansible-playbook.md: Deployed (NFS client role, validated MK7 + Swarm workers)

All four PRDs now in PRDs/ with status Deployed.
2026-06-05 08:55:27 -04:00

# Terraform LXC Deployment — Batch/Dynamic Template PRD
**Status:** Deployed | **Author:** Artemis | **Date:** 2026-06-05
> **Phase 2 validated:** Batch/dynamic template tested at N=4 and N=7 on MK33. All derivation rules confirmed.
## 1. Objective
Extend the proven Phase 1 single-LXC pipeline into a **parameterized batch generator**. A single variable set (`vmid_base`, `lxc_count`, `subnet_prefix`) drives auto-incrementing VMIDs, auto-derived static IPv4 addresses, and consistent hostnames, with no per-container hardcoding.
## 2. Dynamic Derivation Rules
### 2.1 Input Variables (User-Supplied)
| Variable | Example | Description |
|----------|---------|-------------|
| `vmid_base` | `5050` | Starting VMID for first LXC |
| `lxc_count` | `4` | Number of LXCs to create |
| `subnet_prefix` | `192.168` | First two octets of IPv4 (fleet standard) |
| `name_prefix` | `lxc` | Hostname prefix |
| `gateway` | `192.168.18.1` | Default gateway |
| `dns_servers` | `["192.168.7.7", "1.1.1.1"]` | DNS list |
### 2.2 Auto-Derived Per-LXC (Index `i` from `0` to `lxc_count-1`)
| Property | Formula | Example (`vmid_base=5050`, `i=2`) |
|----------|---------|----------------------------------|
| **VMID** | `vmid_base + i` | `5052` |
| **IPv4** | `subnet_prefix.${first2(vmid)}.${last2(vmid)}/18` | `192.168.50.52/18` |
| **Hostname** | `${name_prefix}-${vmid}` | `lxc-5052` |
| **Cores** | Fixed | `2` |
| **RAM** | Fixed | `2048` MB |
| **Disk** | Fixed | `8` GB |
**IP Derivation Detail:**
```
vmid = 5052
first2(vmid) = 50 (digits 1-2 → third octet)
last2(vmid) = 52 (digits 3-4 → fourth octet)
IPv4 = 192.168.50.52/18
```
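This keeps VMID and IPv4 tightly coupled: **VMID is the single source of truth** for IP assignment. Derived IPs fall within the fleet `/18` subnet (`192.168.0.0/18`, which spans `192.168.0.0` through `192.168.63.255`) as long as the first two VMID digits are 63 or lower.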
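The derivation above can be sketched as a small shell helper. This is a minimal sketch, not the project's actual Terraform logic: the function name `derive_ipv4` is hypothetical, it assumes a 4-digit VMID (how shorter VMIDs such as 931 are split is not specified here), and it drops leading zeros so each octet is written in plain decimal. It also checks the third octet against the 0-63 range covered by `192.168.0.0/18`.

```bash
# Hypothetical helper: derive the static IPv4 (CIDR form) from a 4-digit VMID.
# Assumption: VMIDs of other lengths are rejected, since their split is not defined here.
derive_ipv4() {
  vmid=$1
  subnet_prefix=${2:-192.168}
  case $vmid in
    [0-9][0-9][0-9][0-9]) ;;
    *) echo "derive_ipv4: expected a 4-digit VMID, got '$vmid'" >&2; return 1 ;;
  esac
  first2=${vmid%??}; first2=${first2#0}   # digits 1-2 -> third octet, leading zero dropped
  last2=${vmid#??};  last2=${last2#0}     # digits 3-4 -> fourth octet, leading zero dropped
  # 192.168.0.0/18 spans 192.168.0.0 through 192.168.63.255.
  if [ "$first2" -gt 63 ]; then
    echo "derive_ipv4: third octet $first2 is outside the /18 range" >&2
    return 1
  fi
  printf '%s.%s.%s/18\n' "$subnet_prefix" "$first2" "$last2"
}

derive_ipv4 5052   # prints 192.168.50.52/18
```

Rejecting out-of-range VMIDs up front is one possible design choice; it surfaces an addressing problem before any container is created.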
### 2.3 Example Runs
```bash
# Create 4 LXCs: lxc-5050 → lxc-5053
# IPs: 192.168.50.50 → 192.168.50.53
TF_VAR_vmid_base=5050 TF_VAR_lxc_count=4 ./run.sh apply -auto-approve
# Create 2 LXCs starting at 5100
# IPs: 192.168.51.0, 192.168.51.1
TF_VAR_vmid_base=5100 TF_VAR_lxc_count=2 ./run.sh apply -auto-approve
# Create 7 LXCs at vmid_base=931 (validated POC run)
TF_VAR_vmid_base=931 TF_VAR_lxc_count=7 ./run.sh apply -auto-approve
```
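Before applying, a batch like the ones above can be previewed offline. The following is a minimal sketch under stated assumptions: `preview_batch` is a hypothetical helper (not part of `run.sh`), it handles 4-digit VMIDs only, and it does not check that every derived address stays inside the `/18`.

```bash
# Hypothetical helper: list VMID, hostname, and IPv4 for a planned batch.
# Assumptions: 4-digit VMIDs; name prefix "lxc" and subnet prefix 192.168 unless overridden.
preview_batch() {
  vmid_base=$1
  lxc_count=$2
  name_prefix=${3:-lxc}
  subnet_prefix=${4:-192.168}
  i=0
  while [ "$i" -lt "$lxc_count" ]; do
    vmid=$((vmid_base + i))
    first2=${vmid%??}; first2=${first2#0}   # first two digits, leading zero dropped
    last2=${vmid#??};  last2=${last2#0}     # last two digits, leading zero dropped
    printf '%s %s-%s %s.%s.%s/18\n' \
      "$vmid" "$name_prefix" "$vmid" "$subnet_prefix" "$first2" "$last2"
    i=$((i + 1))
  done
}

preview_batch 5050 4
```

Each output line holds the VMID, hostname, and address for one container, which makes it easy to spot a collision with an existing VMID or IP before running `apply`.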
## 2. Architecture
### 2.1 Docker Image
**Base:** `hashicorp/terraform:latest` with `bpg/proxmox` provider downloaded at container init
**Provider:** `bpg/proxmox` v0.70.0
**Pattern:** Lazy automator — local workspace mounted into container, credentials via `terraform.auto.tfvars`
```dockerfile
FROM hashicorp/terraform:latest
WORKDIR /workspace
COPY run.sh /usr/local/bin/run
RUN chmod +x /usr/local/bin/run
ENTRYPOINT ["bash"]
```
### 2.2 Credential Model
Native Terraform variable loading via `terraform.auto.tfvars` (no Docker env-file mapping):
```hcl
# terraform/terraform.auto.tfvars
pm_api_url = "https://192.168.7.33:8006/api2/json"
pm_api_token_id = "root@pam!terraform"
pm_api_token_secret = "<secret>"
```
PVE API token created on MK33: `root@pam!terraform`. Token stored in fleet credential store.
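A token like this can be sanity-checked against the PVE API before Terraform runs. The sketch below only builds the `Authorization` header in the `PVEAPIToken=<id>=<secret>` form that Proxmox VE uses for API tokens. `pve_auth_header` and the `PVE_TOKEN_SECRET` variable are hypothetical names, and the commented `curl` call assumes a self-signed certificate on the node (hence `-k`).

```bash
# Hypothetical helper: build the Proxmox VE API-token Authorization header.
pve_auth_header() {
  token_id=$1       # e.g. root@pam!terraform
  token_secret=$2
  printf 'Authorization: PVEAPIToken=%s=%s\n' "$token_id" "$token_secret"
}

# Illustrative use (not executed here): query the version endpoint with the token.
# PVE_TOKEN_SECRET is an assumed environment variable holding the secret.
# curl -k -H "$(pve_auth_header 'root@pam!terraform' "$PVE_TOKEN_SECRET")" \
#   https://192.168.7.33:8006/api2/json/version
```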
### 2.3 Runtime Parameterization (Phase 2)
| Parameter | Example | Effect |
|-----------|---------|--------|
| `lxc_count` | `4` | Number of LXCs to create |
| `vmid_base` | `5050` | Starting VMID |
Auto-derived per LXC (index `i` from 0 to `lxc_count-1`):
- **VMID:** `vmid_base + i`
- **Name:** `lxc-${vmid}`
- **IPv4:** `192.168.${first2(vmid)}.${last2(vmid)}/18`
### 2.4 LXC Configuration (Validated)
- **OS:** Debian 12 (`debian-12-standard_12.2-1_amd64.tar.zst`)
- **CPU:** 1 vCPU
- **RAM:** 2048 MB
- **Storage:** 8GB rootfs on `local` directory (test phase)
- **Network:** Static IPv4, gateway `192.168.18.1`, subnet `/18`
- **DNS:** `192.168.7.7`, `192.168.18.1`, `1.1.1.1`
- **Privilege:** Unprivileged (`unprivileged = true`)
- **Features:** Nesting enabled (`features { nesting = true }`)
### 2.5 User / SSH (Tested)
```hcl
initialization {
  user_account {
    username = "jarvis"
    password = "<fleet_linux_pass>" # Required for console login verification
    keys     = [file("artemis_key.pub")]
  }
}
```
## 3. Phase Breakdown
### Phase 1 — Single LXC (Plan/Build/Destroy) ✅ COMPLETE
**Completed:** 2026-06-04 on MK33 (pve-swarm, cluster node 33)
**Results:**
- `Dockerfile` — simplified to official `hashicorp/terraform:latest` image
- `docker-compose.yml` — workspace mount, no env-file credential mapping
- `run.sh` — wrapper for `terraform plan/apply/destroy`
- `terraform/providers.tf` — `bpg/proxmox` v0.70.0
- `terraform/main.tf` — single LXC resource (VMID 5050)
- `terraform/terraform.auto.tfvars` — native Terraform credential loading
**Validated:**
```bash
./run.sh plan # ✅ Validated
./run.sh apply # ✅ Created lxc-5050 (debian-12, 192.168.50.50/18)
./run.sh destroy # ✅ Clean teardown
```
**Key fixes discovered during testing:**
- Storage pool: `local-lvm` missing → used `local` (Directory)
- Template path: `nas-ct-stor:vztmpl/` (NFS shared templates)
- Unprivileged required: `unprivileged = true` + `features { nesting = true }`
- Password injection: `user_account.password` required for console login verification
### Phase 2 — Modular + Bulk Creation ✅ VALIDATED
**Completed:** 2026-06-05 on MK33 (pve-swarm)
**Results:**
- `modules/lxc/` — reusable LXC module with `proxmox_virtual_environment_container` resource
- `main.tf` — `for_each` over module with `lxc_count` parameterization
- `run.sh` — forwards `TF_VAR_*` environment variables into Docker container
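The forwarding step in `run.sh` is not shown here, so the following is only one way it might be done, using plain `docker run`. The helper name `tf_var_flags` is hypothetical. It relies on the Docker behavior that `-e NAME` without a value passes the host's value of `NAME` into the container.

```bash
# Hypothetical helper: emit one "-e NAME" flag per exported TF_VAR_* variable.
# "docker run -e NAME" (no value) forwards the host's value of NAME into the container.
tf_var_flags() {
  env | sed -n 's/^\(TF_VAR_[A-Za-z0-9_]*\)=.*/-e \1/p'
}

# Illustrative use (not executed here); <image> and the arguments are placeholders:
# docker run --rm $(tf_var_flags) <image> <terraform args>
```

Passing names rather than `NAME=value` pairs keeps values out of the constructed command line, which avoids quoting problems with values that contain spaces.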
**Validated at multiple scales:**
| Test | Command | Result |
|------|---------|--------|
| 4 LXCs at vmid_base=3550 | `TF_VAR_lxc_count=4 TF_VAR_vmid_base=3550 ./run.sh apply` | ✅ All created; 1 transient 500 error on start (PVE task queue race), but the affected container existed and was operational despite the error |
| 7 LXCs at vmid_base=931 | `TF_VAR_lxc_count=7 TF_VAR_vmid_base=931 ./run.sh apply` | ✅ All 7 created successfully, no errors, ~14-16 s per container |
| 7 LXCs destroy | `./run.sh destroy -auto-approve` | ✅ All 7 destroyed cleanly in ~8s each |
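Given the transient 500 seen in the N=4 run, one option is to re-run the command a bounded number of times. This is a sketch, not something validated in these tests: `retry` and `RETRY_DELAY` are hypothetical names, and whether a second `apply` converges cleanly after such an error depends on what state Terraform recorded for the affected container.

```bash
# Hypothetical helper: run a command up to N times, pausing between attempts.
# RETRY_DELAY (seconds, default 5) is an assumed knob, not part of run.sh.
retry() {
  max_attempts=$1
  shift
  attempt=1
  while :; do
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "retry: giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "retry: attempt $attempt failed, retrying in ${RETRY_DELAY:-5}s" >&2
    sleep "${RETRY_DELAY:-5}"
    attempt=$((attempt + 1))
  done
}

# Illustrative use (not executed here):
# TF_VAR_lxc_count=4 TF_VAR_vmid_base=3550 retry 3 ./run.sh apply -auto-approve
```

Terraform's `-parallelism=n` option on `apply` is another way to reduce concurrent requests against the PVE task queue, at the cost of speed; whether it avoids this particular race was not tested here.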
**Key runtime behavior discovered:**
- `terraform.auto.tfvars` outranks `TF_VAR_*` environment variables — dynamic variables must **not** be set in `.tfvars`
- `-auto-approve` required on Dockerized terraform (no interactive TTY for confirmation)
- Parallel creation (default) works at N=7; transient race condition observed at N=4 (PVE task queue, not terraform logic)
- All containers receive SSH key + password via `initialization.user_account` block
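Because `terraform.auto.tfvars` takes precedence over `TF_VAR_*`, a pre-flight check can catch a dynamic variable that slipped into the file. A minimal sketch: `check_tfvars` is a hypothetical helper, and the variable list (`vmid_base`, `lxc_count`) is an assumption about which inputs are meant to stay dynamic.

```bash
# Hypothetical helper: fail if a runtime-driven variable is assigned in a tfvars file.
check_tfvars() {
  tfvars_file=$1
  for name in vmid_base lxc_count; do
    # Match "name = ..." at the start of a line, allowing leading whitespace.
    if grep -Eq "^[[:space:]]*${name}[[:space:]]*=" "$tfvars_file"; then
      echo "check_tfvars: '$name' is set in $tfvars_file and would override TF_VAR_$name" >&2
      return 1
    fi
  done
  return 0
}

# Illustrative use (not executed here):
# check_tfvars terraform/terraform.auto.tfvars && ./run.sh apply -auto-approve
```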
## 4. File Structure
```
~/docker/terraform-pve/
├── Dockerfile
├── docker-compose.yml
├── run.sh
├── terraform/
│   ├── .terraform/
│   ├── main.tf
│   ├── providers.tf
│   ├── terraform.auto.tfvars   # Credentials (not committed)
│   ├── terraform.tfstate
│   ├── variables.tf
│   └── artemis_key.pub
```
## 5. Resolved Decisions
| Decision | Chosen | Notes |
|----------|--------|-------|
| Debian template | **12** | `debian-12-standard_12.2-1_amd64.tar.zst` on `nas-ct-stor` |
| Gateway | **192.168.18.1** | Router IP for 192.168.0.0/18 subnet |
| DNS | **192.168.7.7, 192.168.18.1, 1.1.1.1** | Technitium primary + fallback |
| SSH key | **artemis_key.pub** | Already registered fleet-wide |
| Storage (Phase 1) | **local** | `local-lvm` missing on nodes; migrate to `truenas-nfs` in Phase 2 |
| Privilege | **Unprivileged** | `unprivileged = true` with `nesting = true` for systemd 252 |
| Credential loading | **terraform.auto.tfvars** | Native Terraform pattern; no Docker env-file complexity |
## 6. Fleet Notes
- PVE API token: `root@pam!terraform` (Secret: fleet credential store)
- PVE root password: `<redacted>` (stored in fleet credential store)
- Cluster: `pve-swarm` (MK33, MK34, MK39)
- Template storage: `nas-ct-stor` (NFS from TrueNAS)
- Disk storage (test): `local`
- **Code location:** `~/docker/terraform-pve/` — local only, not in any Gitea repo