documentation/PRDs/terraform-lxc-deployment-batch.md
F.R.I.D.A.Y. 3f0e36c8bb Promote all operational PRDs to Deployed status
- terraform-lxc-deployment.md: Deployed (Phase 1 single-LXC baseline)
- terraform-lxc-deployment-batch.md: Deployed (Phase 2 batch/dynamic template, validated N=4/N=7)
- ansible-base-testing.md: Deployed (base testing environment, validated fleet ping/playbook)
- ansible-playbook.md: Deployed (NFS client role, validated MK7 + Swarm workers)

All four PRDs now in PRDs/ with status Deployed.
2026-06-05 08:55:27 -04:00


Terraform LXC Deployment — Batch/Dynamic Template PRD

Status: Deployed | Author: Artemis | Date: 2026-06-05

Phase 2 validated: Batch/dynamic template tested at N=4 and N=7 on MK33. All derivation rules confirmed.

1. Objective

Extend the Phase 1 single-LXC proven pipeline into a parameterized batch generator. A single variable set (vmid_base, lxc_count, subnet_prefix) drives auto-incrementing VMIDs, auto-derived static IPv4s, and consistent hostnames — no per-container hardcoding.

2. Dynamic Derivation Rules

2.1 Input Variables (User-Supplied)

| Variable | Example | Description |
| --- | --- | --- |
| vmid_base | 5050 | Starting VMID for first LXC |
| lxc_count | 4 | Number of LXCs to create |
| subnet_prefix | 192.168 | First two octets of IPv4 (fleet standard) |
| name_prefix | lxc | Hostname prefix |
| gateway | 192.168.18.1 | Default gateway |
| dns_servers | ["192.168.7.7", "1.1.1.1"] | DNS list |

2.2 Auto-Derived Per-LXC (Index i from 0 to lxc_count-1)

| Property | Formula | Example (vmid_base=5050, i=2) |
| --- | --- | --- |
| VMID | vmid_base + i | 5052 |
| IPv4 | subnet_prefix.${first2(vmid)}.${last2(vmid)}/18 | 192.168.50.52/18 |
| Hostname | ${name_prefix}-${vmid} | lxc-5052 |
| Cores | Fixed | 2 |
| RAM | Fixed | 2048 MB |
| Disk | Fixed | 8 GB |

IP Derivation Detail:

vmid = 5052
first2(vmid) = 50    (digits 1-2, becomes the third octet)
last2(vmid)  = 52    (digits 3-4, becomes the fourth octet)
IPv4 = 192.168.50.52/18

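This keeps VMID and IPv4 tightly coupled: the VMID is the single source of truth for IP assignment. Derived addresses stay inside the fleet /18 subnet (192.168.0.0/18, third octet 0-63) as long as the first two digits of the VMID do not exceed 63, that is, for VMIDs up to 6399.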
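The derivation can be previewed outside Terraform before a run. The following is a minimal sketch, not the project's implementation: the function names derive_lxc and preview_batch are hypothetical, and it assumes VMIDs of at most four digits that are treated as zero-padded (so 931 is read as 0931), which makes integer division and remainder by 100 equivalent to first2/last2.

```shell
# Hypothetical preview helper; not part of the documented run.sh.
# Assumption: VMIDs have at most four digits and are treated as zero-padded,
# so vmid / 100 gives the first two digits and vmid % 100 the last two.

# derive_lxc VMID [SUBNET_PREFIX] [NAME_PREFIX]
# Prints "<vmid> <hostname> <ipv4>/18" for one container.
derive_lxc() {
  vmid=$1
  subnet_prefix=${2:-192.168}
  name_prefix=${3:-lxc}
  third=$((vmid / 100))    # first two digits of the VMID -> third octet
  fourth=$((vmid % 100))   # last two digits of the VMID  -> fourth octet
  printf '%s %s-%s %s.%s.%s/18\n' \
    "$vmid" "$name_prefix" "$vmid" "$subnet_prefix" "$third" "$fourth"
}

# preview_batch VMID_BASE LXC_COUNT
# Prints one derive_lxc line per index i in 0..LXC_COUNT-1.
preview_batch() {
  base=$1
  count=$2
  i=0
  while [ "$i" -lt "$count" ]; do
    derive_lxc $((base + i))
    i=$((i + 1))
  done
}
```

Calling `preview_batch 5050 4` lists lxc-5050 through lxc-5053 with their addresses. Pass VMIDs without leading zeros, since shell arithmetic would read a value such as 0931 as octal.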

2.3 Example Runs

# Create 4 LXCs: lxc-5050 → lxc-5053
# IPs: 192.168.50.50 → 192.168.50.53
TF_VAR_vmid_base=5050 TF_VAR_lxc_count=4 ./run.sh apply -auto-approve

# Create 2 LXCs starting at 5100
# IPs: 192.168.51.0, 192.168.51.1
TF_VAR_vmid_base=5100 TF_VAR_lxc_count=2 ./run.sh apply -auto-approve

# Create 7 LXCs at vmid_base=931 (validated POC run)
TF_VAR_vmid_base=931 TF_VAR_lxc_count=7 ./run.sh apply -auto-approve
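Because the first two digits of the VMID become the third octet, a batch can run past the end of 192.168.0.0/18 (third octet 0-63). A pre-flight check along these lines could catch that before apply. This is a sketch under stated assumptions: check_batch is a hypothetical helper, the 6399 ceiling follows from the /18 range, and the lower bound of 100 reflects Proxmox reserving smaller VMIDs.

```shell
# check_batch VMID_BASE LXC_COUNT
# Hypothetical pre-flight check; returns 0 when every VMID in the batch
# maps to an address inside 192.168.0.0/18, and 1 otherwise.
check_batch() {
  base=$1
  count=$2
  if [ "$count" -lt 1 ]; then
    echo "lxc_count must be at least 1" >&2
    return 1
  fi
  if [ "$base" -lt 100 ]; then
    # Proxmox reserves VMIDs below 100.
    echo "vmid_base must be 100 or higher" >&2
    return 1
  fi
  last=$((base + count - 1))
  if [ "$last" -gt 6399 ]; then
    # Third octet would exceed 63 and leave the /18 range.
    echo "last VMID $last maps outside 192.168.0.0/18" >&2
    return 1
  fi
  return 0
}
```

For example, `check_batch 5050 4 && TF_VAR_vmid_base=5050 TF_VAR_lxc_count=4 ./run.sh apply -auto-approve` would only apply when the whole range is addressable.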

2. Architecture

2.1 Docker Image

  • Base: hashicorp/terraform:latest with bpg/proxmox provider downloaded at container init
  • Provider: bpg/proxmox v0.70.0
  • Pattern: Lazy automator (local workspace mounted into container, credentials via terraform.auto.tfvars)

FROM hashicorp/terraform:latest
WORKDIR /workspace
COPY run.sh /usr/local/bin/run
RUN chmod +x /usr/local/bin/run
ENTRYPOINT ["bash"]

2.2 Credential Model

Native Terraform variable loading via terraform.auto.tfvars (no Docker env-file mapping):

# terraform/terraform.auto.tfvars
pm_api_url      = "https://192.168.7.33:8006/api2/json"
pm_api_token_id = "root@pam!terraform"
pm_api_token_secret = "<secret>"

PVE API token created on MK33: root@pam!terraform. Token stored in fleet credential store.

2.3 Runtime Parameterization (Phase 2)

| Parameter | Example | Effect |
| --- | --- | --- |
| lxc_count | 4 | Number of LXCs to create |
| vmid_base | 5050 | Starting VMID |

Auto-derived per LXC (index i from 0 to lxc_count-1):

  • VMID: vmid_base + i
  • Name: lxc-${vmid}
  • IPv4: 192.168.${first2(vmid)}.${last2(vmid)}/18

2.4 LXC Configuration (Validated)

  • OS: Debian 12 (debian-12-standard_12.2-1_amd64.tar.zst)
  • CPU: 1 vCPU
  • RAM: 2048 MB
  • Storage: 8GB rootfs on local directory (test phase)
  • Network: Static IPv4, gateway 192.168.18.1, subnet /18
  • DNS: 192.168.7.7, 192.168.18.1, 1.1.1.1
  • Privilege: Unprivileged (unprivileged = true)
  • Features: Nesting enabled (features { nesting = true })

2.5 User / SSH (Tested)

initialization {
  user_account {
    username = "jarvis"
    password = "<fleet_linux_pass>"  # Required for console login verification
    keys     = [file("artemis_key.pub")]
  }
}

3. Phase Breakdown

Phase 1 — Single LXC (Plan/Build/Destroy) COMPLETE

Completed: 2026-06-04 on MK33 (pve-swarm, cluster node 33)

Results:

  • Dockerfile: simplified to official hashicorp/terraform:latest image
  • docker-compose.yml: workspace mount, no env-file credential mapping
  • run.sh: wrapper for terraform plan/apply/destroy
  • terraform/providers.tf: bpg/proxmox v0.70.0
  • terraform/main.tf: single LXC resource (VMID 5050)
  • terraform/terraform.auto.tfvars: native Terraform credential loading

Validated:

./run.sh plan    # ✅ Validated
./run.sh apply   # ✅ Created lxc-5050 (debian-12, 192.168.50.50/18)
./run.sh destroy # ✅ Clean teardown

Key fixes discovered during testing:

  • Storage pool: local-lvm missing → used local (Directory)
  • Template path: nas-ct-stor:vztmpl/ (NFS shared templates)
  • Unprivileged required: unprivileged = true + features { nesting = true }
  • Password injection: user_account.password required for console login verification

Phase 2 — Modular + Bulk Creation VALIDATED

Completed: 2026-06-05 on MK33 (pve-swarm)

Results:

  • modules/lxc/: reusable LXC module with proxmox_virtual_environment_container resource
  • main.tf: for_each over module with lxc_count parameterization
  • run.sh: forwards TF_VAR_* environment variables into Docker container
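The source of run.sh is not reproduced in this document. As an illustration only, one way to forward every exported TF_VAR_* variable is to turn each name into a `-e NAME` flag, which `docker run` resolves from the calling environment. The function name tf_var_flags and the image name in the usage comment are hypothetical.

```shell
# tf_var_flags
# Prints one "-e NAME" line for each TF_VAR_* variable in the environment.
# With "docker run", "-e NAME" (no value) copies NAME from the caller's
# environment, so values never appear on the command line.
tf_var_flags() {
  env | sed -n 's/^\(TF_VAR_[A-Za-z0-9_]*\)=.*/-e \1/p'
}

# Example use inside a wrapper (image name is hypothetical):
#   docker run --rm -v "$PWD/terraform:/workspace" $(tf_var_flags) terraform-pve ...
# The unquoted $(tf_var_flags) is intentional: each flag and name must
# become a separate argument.
```

Emitting names rather than NAME=value pairs keeps values with spaces intact, since only the names go through word splitting.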

Validated at multiple scales:

| Test | Command | Result |
| --- | --- | --- |
| 4 LXCs at vmid_base=3550 | TF_VAR_lxc_count=4 TF_VAR_vmid_base=3550 ./run.sh apply | All created; 1 transient 500 error on start (PVE task queue race); the container existed and was operational despite the error |
| 7 LXCs at vmid_base=931 | TF_VAR_lxc_count=7 TF_VAR_vmid_base=931 ./run.sh apply | All 7 created successfully, no errors, ~14-16s per container |
| 7 LXCs destroy | ./run.sh destroy -auto-approve | All 7 destroyed cleanly in ~8s each |

Key runtime behavior discovered:

  • terraform.auto.tfvars outranks TF_VAR_* environment variables — dynamic variables must not be set in .tfvars
  • -auto-approve required on Dockerized terraform (no interactive TTY for confirmation)
  • Parallel creation (default) works at N=7; transient race condition observed at N=4 (PVE task queue, not terraform logic)
  • All containers receive SSH key + password via initialization.user_account block
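Since Terraform gives *.auto.tfvars files precedence over TF_VAR_* environment variables, a stray assignment in terraform.auto.tfvars silently overrides the value passed at run time. A guard like the following could flag that before a run. It is a sketch: tfvars_conflicts is a hypothetical helper, and the simple line match assumes one top-level assignment per line.

```shell
# tfvars_conflicts FILE NAME...
# Prints each NAME that is assigned at the start of a line in FILE.
# Returns 1 if any were found, 0 otherwise.
tfvars_conflicts() {
  file=$1
  shift
  found=0
  for name in "$@"; do
    # Match "name =" with optional surrounding whitespace.
    if grep -Eq "^[[:space:]]*${name}[[:space:]]*=" "$file"; then
      echo "$name"
      found=1
    fi
  done
  return "$found"
}
```

A wrapper could run `tfvars_conflicts terraform/terraform.auto.tfvars vmid_base lxc_count` and abort when it returns non-zero, leaving only credentials in the file.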

4. File Structure

~/docker/terraform-pve/
├── Dockerfile
├── docker-compose.yml
├── run.sh
├── terraform/
│   ├── .terraform/
│   ├── main.tf
│   ├── providers.tf
│   ├── terraform.auto.tfvars   # Credentials (not committed)
│   ├── terraform.tfstate
│   ├── variables.tf
│   └── artemis_key.pub

5. Resolved Decisions

| Decision | Chosen | Notes |
| --- | --- | --- |
| Debian template | 12 | debian-12-standard_12.2-1_amd64.tar.zst on nas-ct-stor |
| Gateway | 192.168.18.1 | Router IP for 192.168.0.0/18 subnet |
| DNS | 192.168.7.7, 192.168.18.1, 1.1.1.1 | Technitium primary + fallback |
| SSH key | artemis_key.pub | Already registered fleet-wide |
| Storage (Phase 1) | local | local-lvm missing on nodes; migrate to truenas-nfs in Phase 2 |
| Privilege | Unprivileged | unprivileged = true with nesting = true for systemd 252 |
| Credential loading | terraform.auto.tfvars | Native Terraform pattern; no Docker env-file complexity |

6. Fleet Notes

  • PVE API token: root@pam!terraform (Secret: fleet credential store)
  • PVE root password: <redacted> (fleet credential store)
  • Cluster: pve-swarm (MK33, MK34, MK39)
  • Template storage: nas-ct-stor (NFS from TrueNAS)
  • Disk storage (test): local
  • Code location: ~/docker/terraform-pve/ — local only, not in any Gitea repo