documentation/PRD Drafts/ARCHIVED-terraform-proxmox-lxc-automation.md
F.R.I.D.A.Y. c1bb49d51a Terraform LXC PRD: promote validated draft to PRDs, archive stale F.R.I.D.A.Y. draft
- terraform-lxc-deployment.md → PRDs/ (validated, tested, canonical)
- terraform-proxmox-lxc-automation.md → ARCHIVED- (superseded by live POC)
- Matches Phase 1 POC results from terraform-pve repo
2026-06-04 22:58:19 -04:00

PRD: Terraform LXC Automation for Proxmox VE 9.2

Status: Draft — Pending Commander Bobby Review
Author: F.R.I.D.A.Y.
Date: 2026-06-01
Provider: bpg/proxmox (actively maintained, 11M+ downloads)
Target: Proxmox VE 9.2 / Debian Trixie


1. Purpose & Scope

This PRD defines the architecture, configuration patterns, and operational workflow for automating LXC container lifecycle management on Proxmox VE 9.2 clusters using Terraform and the actively maintained bpg/proxmox provider.

In scope:

  • Terraform provider configuration and authentication
  • LXC resource definitions (proxmox_virtual_environment_container)
  • Cloud-init / template-based provisioning
  • Network configuration (static IP, DHCP, bridge)
  • Storage allocation (rootfs on any PVE backend)
  • State management and CI/CD integration patterns

Out of scope:

  • VM (QEMU/KVM) provisioning
  • PVE cluster topology changes
  • Backup/restore automation (separate PRD)

2. Success Criteria

# | Criterion | How Verified
1 | A single terraform apply creates a working LXC with SSH access | ssh root@<lxc-ip> succeeds
2 | LXCs are provisioned from official cloud-image templates | Template downloaded via proxmox_virtual_environment_download_file
3 | Network is configurable per-LXC (DHCP or static CIDR) | ip addr inside container matches TF config
4 | Rootfs lives on user-selected storage (not hardcoded to local-lvm) | pvesm list <datastore> shows the volume on the target datastore
5 | State is stored remotely (S3-compatible or Terraform Cloud) | terraform state list works from any machine
6 | Destroy and recreate is idempotent | terraform destroy && terraform apply yields identical result

3. Provider Selection

Why bpg/proxmox (not telmate/proxmox)

Provider | Maintenance | Downloads | LXC Support | Notes
bpg/proxmox | Active (v0.108.0, June 2026) | 11.8M+ | Full | Community-tier, comprehensive docs, supports PVE 9.x
telmate/proxmox | Stale (last release ~2023) | Legacy | Partial | Deprecated; lacks PVE 9.x features

Decision: Use bpg/proxmox exclusively. The telmate provider is unmaintained and incompatible with PVE 9.2 API changes.

Provider block (minimum):

terraform {
  required_providers {
    proxmox = {
      source  = "bpg/proxmox"
      version = "~> 0.108"
    }
  }
}

provider "proxmox" {
  endpoint = "https://192.168.7.33:8006/"
  username = "root@pam"
  password = var.proxmox_password  # or PROXMOX_VE_PASSWORD env var
  insecure = true                  # self-signed TLS
}

4. Authentication Matrix

Method | Use Case | Config | Security
API Token | Production, CI/CD | api_token = "root@pam!mytoken=abc123…" | Highest: revocable, fine-grained
Username/Password | Development, one-offs | username = "root@pam", password = "…" | Medium: password in env
Auth Ticket | TOTP-enabled accounts | Pre-authenticate, pass ticket | High: short-lived

Recommendation for Iron Legion:

  • Development: Use PROXMOX_VE_PASSWORD environment variable
  • CI/CD (future): Create a PVE API token bound to a built-in role such as PVEVMAdmin or to a custom role, and store it in CI secrets
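
A minimal sketch of the token-based provider configuration recommended above. The variable name proxmox_api_token is an assumption made for illustration; it is not defined elsewhere in this PRD. The provider can also read the token from the PROXMOX_VE_API_TOKEN environment variable, in which case the api_token line can be omitted.

```hcl
terraform {
  required_providers {
    proxmox = {
      source  = "bpg/proxmox"
      version = "~> 0.108"
    }
  }
}

variable "proxmox_endpoint" {
  description = "PVE API URL"
  type        = string
}

# Hypothetical variable; the name is an assumption for this sketch.
variable "proxmox_api_token" {
  description = "PVE API token in the form user@realm!tokenid=secret"
  type        = string
  sensitive   = true # keeps the value out of plan and apply output
}

provider "proxmox" {
  endpoint  = var.proxmox_endpoint
  api_token = var.proxmox_api_token
  insecure  = true # self-signed TLS, as in the minimum provider block
}
```

Marking the variable as sensitive only redacts CLI output; the token is still best supplied through the environment or CI secrets rather than a committed tfvars file.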

5. Sample Project Structure

terraform-proxmox-lxc/
├── README.md
├── main.tf                 # Provider + backend config
├── variables.tf            # Input variables
├── terraform.tfvars.example  # Sample values (committed; copy to a gitignored terraform.tfvars)
├── outputs.tf              # Useful outputs (IPs, IDs)
├── versions.tf             # Required providers + TF version
├── modules/
│   └── lxc/
│       ├── main.tf         # proxmox_virtual_environment_container resource
│       ├── variables.tf    # Module inputs
│       └── outputs.tf      # Module outputs
├── environments/
│   ├── dev/
│   │   ├── main.tf         # Calls modules with dev vars
│   │   └── terraform.tfvars
│   └── prod/
│       ├── main.tf
│       └── terraform.tfvars
└── templates/
    └── ubuntu-25.04-cloudimg.yaml   # Cloud-init user-data (optional)

Key Files

versions.tf

terraform {
  required_version = ">= 1.6.0" # the S3 backend's use_path_style and skip_requesting_account_id need 1.6+

  required_providers {
    proxmox = {
      source  = "bpg/proxmox"
      version = "~> 0.108"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.6"
    }
    tls = {
      source  = "hashicorp/tls"
      version = "~> 4.0"
    }
  }

  # Remote state — S3-compatible (Minio, Garage, AWS S3)
  backend "s3" {
    bucket       = "iron-legion-terraform"
    key          = "proxmox-lxc/terraform.tfstate"
    region       = "us-east-1"
    endpoints      = { s3 = "https://s3.nb.bobbysh.me" } # replaces the "endpoint" argument, deprecated since Terraform 1.6
    use_path_style = true

    # Skip AWS-specific validations for self-hosted S3
    skip_credentials_validation = true
    skip_metadata_api_check     = true
    skip_region_validation      = true
    skip_requesting_account_id  = true
  }
}

variables.tf

variable "proxmox_endpoint" {
  description = "PVE API URL"
  type        = string
  default     = "https://192.168.7.33:8006/"
}

variable "proxmox_node" {
  description = "Target PVE node name"
  type        = string
  default     = "mk33"
}

variable "ssh_public_key" {
  description = "SSH public key for root access"
  type        = string
}

variable "lxc_configs" {
  description = "Map of LXC configurations"
  type = map(object({
    vm_id        = number
    hostname     = string
    cores        = optional(number, 2)
    memory       = optional(number, 2048)
    disk_size    = optional(number, 8)
    datastore_id = optional(string, "local-lvm")
    ip_address   = optional(string, "dhcp")
    gateway      = optional(string, null)
    template_url = optional(string, "https://mirrors.servercentral.com/ubuntu-cloud-images/releases/25.04/release/ubuntu-25.04-server-cloudimg-amd64-root.tar.xz")
    features = optional(object({
      nesting = optional(bool, true)
      fuse    = optional(bool, false)
      keyctl  = optional(bool, false)
    }), {})
  }))
}

modules/lxc/main.tf

resource "proxmox_virtual_environment_download_file" "lxc_template" {
  for_each = var.lxc_configs

  content_type = "vztmpl"
  datastore_id = "local"
  node_name    = var.proxmox_node
  url          = each.value.template_url
  file_name    = "${each.key}-template.tar.xz"
  overwrite    = false
}

resource "proxmox_virtual_environment_container" "lxc" {
  for_each = var.lxc_configs

  node_name    = var.proxmox_node
  vm_id        = each.value.vm_id
  description  = "Managed by Terraform — ${each.key}"

  unprivileged = true

  features {
    nesting = each.value.features.nesting
    fuse    = each.value.features.fuse
    keyctl  = each.value.features.keyctl
  }

  cpu {
    cores  = each.value.cores
    units  = 1024
  }

  memory {
    dedicated = each.value.memory
    swap      = 0
  }

  disk {
    datastore_id = each.value.datastore_id
    size         = each.value.disk_size
  }

  initialization {
    hostname = each.value.hostname

    ip_config {
      ipv4 {
        address = each.value.ip_address
        gateway = each.value.gateway
      }
    }

    user_account {
      keys     = [var.ssh_public_key]
      password = random_password.lxc_root[each.key].result
    }
  }

  network_interface {
    name   = "veth0"
    bridge = "vmbr0"
  }

  operating_system {
    template_file_id = proxmox_virtual_environment_download_file.lxc_template[each.key].id
    type             = "ubuntu"
  }

  startup {
    order      = 3
    up_delay   = 60
    down_delay = 60
  }

  depends_on = [proxmox_virtual_environment_download_file.lxc_template]
}

resource "random_password" "lxc_root" {
  for_each = var.lxc_configs

  length           = 16
  special          = true
  override_special = "_%@"
}

modules/lxc/variables.tf

variable "proxmox_node" {
  type = string
}

variable "ssh_public_key" {
  type = string
}

variable "lxc_configs" {
  type = map(object({
    vm_id        = number
    hostname     = string
    cores        = optional(number, 2)
    memory       = optional(number, 2048)
    disk_size    = optional(number, 8)
    datastore_id = optional(string, "local-lvm")
    ip_address   = optional(string, "dhcp")
    gateway      = optional(string, null)
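    # Default is needed here: callers such as environments/dev omit template_url
    template_url = optional(string, "https://mirrors.servercentral.com/ubuntu-cloud-images/releases/25.04/release/ubuntu-25.04-server-cloudimg-amd64-root.tar.xz")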
    features = optional(object({
      nesting = optional(bool, true)
      fuse    = optional(bool, false)
      keyctl  = optional(bool, false)
    }), {})
  }))
}

modules/lxc/outputs.tf

output "lxc_ids" {
  description = "Map of LXC names to VM IDs"
  value       = { for k, v in proxmox_virtual_environment_container.lxc : k => v.vm_id }
}

output "lxc_ips" {
  description = "Map of LXC names to IPv4 addresses"
  value       = { for k, v in proxmox_virtual_environment_container.lxc : k => v.ipv4 }
}

output "lxc_passwords" {
  description = "Map of LXC names to root passwords (sensitive)"
  value       = { for k, v in random_password.lxc_root : k => v.result }
  sensitive   = true
}

environments/dev/main.tf

module "dev_lxcs" {
  source = "../../modules/lxc"

  proxmox_node   = "mk33"
  ssh_public_key = file("~/.ssh/id_ed25519.pub")

  lxc_configs = {
    "dev-nextcloud" = {
      vm_id        = 2100
      hostname     = "dev-nextcloud"
      cores        = 4
      memory       = 4096
      disk_size    = 16
      datastore_id = "local-zfs"
      ip_address   = "192.168.7.100/24"
      gateway      = "192.168.7.1"
    }
    "dev-vaultwarden" = {
      vm_id        = 2101
      hostname     = "dev-vaultwarden"
      cores        = 2
      memory       = 2048
      disk_size    = 8
      datastore_id = "local-zfs"
      ip_address   = "192.168.7.101/24"
      gateway      = "192.168.7.1"
    }
  }
}
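
Module outputs are visible to terraform output only when the root configuration re-exports them. A minimal sketch, assuming a file environments/dev/outputs.tf, which is hypothetical and not part of the layout above:

```hcl
# environments/dev/outputs.tf (hypothetical file)

output "lxc_ids" {
  description = "VM IDs of the dev containers"
  value       = module.dev_lxcs.lxc_ids
}

output "lxc_ips" {
  description = "IPv4 addresses of the dev containers"
  value       = module.dev_lxcs.lxc_ips
}

output "lxc_passwords" {
  description = "Root passwords of the dev containers"
  value       = module.dev_lxcs.lxc_passwords
  sensitive   = true # must be repeated here because the module output is sensitive
}
```

With this in place, terraform output lxc_ips prints the address map, and terraform output -json lxc_passwords reveals the generated passwords on request.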

6. Resource Reference — proxmox_virtual_environment_container

Critical Arguments

Block | Key | Required | Default | Description
(top level) | node_name | | | PVE node to create on
(top level) | vm_id | | | Unique numeric ID (100-999999999)
(top level) | unprivileged | | false (provider default; the module in this PRD sets true) | Run as unprivileged container
features | nesting | | false | Enable nested containers (needed for Docker-in-LXC)
features | fuse | | false | Enable FUSE mounts
cpu | cores | | 1 | vCPU cores
memory | dedicated | | 512 | RAM in MB
disk | datastore_id | | local | Storage pool for rootfs
disk | size | | 4 | Rootfs size in GB
initialization | hostname | | | DNS-compatible hostname
initialization.ip_config.ipv4 | address | | | CIDR or dhcp
initialization.ip_config.ipv4 | gateway | | | Required for static IP
initialization.user_account | keys | | | SSH authorized_keys
network_interface | name | | | Interface name (veth0 in this PRD)
network_interface | bridge | | vmbr0 | Bridge to attach
operating_system | template_file_id | | | Downloaded template or local:vztmpl/…
operating_system | type | | unmanaged | ubuntu, debian, alpine, etc.
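
Some constraints in the table (the VM ID range, and a gateway for static addresses) can be checked before any API call by adding validation blocks to the input variable. The sketch below trims lxc_configs to the validated fields; in practice the validation blocks would be added to the full declaration. It only checks values within the configuration and cannot detect IDs or addresses already in use on the cluster.

```hcl
variable "lxc_configs" {
  description = "Trimmed copy of lxc_configs, showing validation only"
  type = map(object({
    vm_id      = number
    hostname   = string
    ip_address = optional(string, "dhcp")
    gateway    = optional(string, null)
  }))

  # VM IDs must fall inside the range PVE accepts.
  validation {
    condition = alltrue([
      for c in values(var.lxc_configs) : c.vm_id >= 100 && c.vm_id <= 999999999
    ])
    error_message = "Each vm_id must be between 100 and 999999999."
  }

  # No two entries may share a VM ID.
  validation {
    condition     = length(distinct([for c in values(var.lxc_configs) : c.vm_id])) == length(var.lxc_configs)
    error_message = "vm_id values must be unique within lxc_configs."
  }

  # Either DHCP, or a parseable CIDR together with a gateway.
  validation {
    condition = alltrue([
      for c in values(var.lxc_configs) :
      c.ip_address == "dhcp" || (can(cidrhost(c.ip_address, 0)) && c.gateway != null)
    ])
    error_message = "ip_address must be \"dhcp\" or a valid CIDR with a gateway set."
  }
}
```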

Important Notes

  • Template download uses proxmox_virtual_environment_download_file — caches template per-node, avoids re-download
  • The initialization block (hostname, IP configuration, root account) is applied by PVE when the container is created; LXC needs no separate cloud-init drive
  • Nesting = true is required for any LXC running Docker or systemd-nspawn
  • Datastore is backend-agnostic: local-lvm, local-zfs, tank-zfs, ceph-rbd, NFS, etc. all work
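
As written, the module declares one download resource per container, each with its own file name, so containers that share a template URL still fetch it separately. One possible alternative, sketched here under the assumption that every entry in lxc_configs has template_url set, is to key the download on the distinct URLs instead. The resource name shared_template is hypothetical.

```hcl
locals {
  # One element per distinct template URL across all containers.
  template_urls = toset([for c in values(var.lxc_configs) : c.template_url])
}

resource "proxmox_virtual_environment_download_file" "shared_template" {
  for_each = local.template_urls

  content_type = "vztmpl"
  datastore_id = "local"
  node_name    = var.proxmox_node
  url          = each.value
  file_name    = basename(each.value) # keep the upstream file name
  overwrite    = false
}

# Inside the container resource, the template would then be referenced as:
#   template_file_id = proxmox_virtual_environment_download_file.shared_template[each.value.template_url].id
```

The block relies on var.lxc_configs and var.proxmox_node as declared in modules/lxc/variables.tf.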

7. Data Sources

Use data sources to query existing infrastructure without managing it:

data "proxmox_virtual_environment_datastores" "available" {
  node_name = var.proxmox_node
}

data "proxmox_virtual_environment_nodes" "cluster" {}

data "proxmox_virtual_environment_container" "existing" {
  node_name = var.proxmox_node  # or specify target node explicitly
  vm_id     = 2001
}

Common use cases:

  • Validate a datastore exists before creating a disk
  • Read an existing LXC's IP to populate a DNS record (Technitium)
  • List nodes for multi-node placement logic
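
As one illustration of the node-listing use case, a check block (Terraform 1.5+) can warn when the configured node is not part of the cluster. This is a sketch: it assumes the nodes data source exposes a names attribute, which should be confirmed against the provider documentation for the pinned version. The data source and variable are redeclared only to keep the example self-contained.

```hcl
variable "proxmox_node" {
  description = "Target PVE node name"
  type        = string
}

data "proxmox_virtual_environment_nodes" "cluster" {}

check "target_node_exists" {
  assert {
    # Assumption: "names" is the list of node names returned by the data source.
    condition     = contains(data.proxmox_virtual_environment_nodes.cluster.names, var.proxmox_node)
    error_message = "Node ${var.proxmox_node} was not found in the cluster."
  }
}
```

A failed check is reported as a warning during plan and apply and does not block the run; a precondition on the container resource would be the stricter choice.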

8. State Management

Iron Legion already runs self-hosted services. A Garage or Minio instance on a fleet storage node (e.g., Neo) can serve as the Terraform state backend:

terraform {
  backend "s3" {
    bucket       = "iron-legion-terraform"
    key          = "proxmox-lxc/dev.tfstate"
    region       = "us-east-1"
    endpoints      = { s3 = "https://s3.nb.bobbysh.me" } # replaces the "endpoint" argument, deprecated since Terraform 1.6
    use_path_style = true

    skip_credentials_validation = true
    skip_metadata_api_check     = true
    skip_region_validation      = true
    skip_requesting_account_id  = true
  }
}

State Locking (Critical for Team Use)

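Concurrent applies against the same state can corrupt it, so enable locking. The S3 backend can lock through a DynamoDB-compatible table or, on Terraform 1.10 and later, through a native lock file (use_lockfile = true). If the self-hosted S3 service supports neither, wrap terraform apply in a CI pipeline that serializes runs.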
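
A sketch of the native locking variant, reusing the bucket and endpoint from the backend block above. It assumes the S3 service behind s3.nb.bobbysh.me supports the conditional writes that the lock file relies on; that has not been verified here for Garage or Minio.

```hcl
terraform {
  required_version = ">= 1.10.0" # use_lockfile is not available in earlier releases

  backend "s3" {
    bucket         = "iron-legion-terraform"
    key            = "proxmox-lxc/dev.tfstate"
    region         = "us-east-1"
    endpoints      = { s3 = "https://s3.nb.bobbysh.me" }
    use_path_style = true
    use_lockfile   = true # lock via a .tflock object stored next to the state file

    skip_credentials_validation = true
    skip_metadata_api_check     = true
    skip_region_validation      = true
    skip_requesting_account_id  = true
  }
}
```

A quick way to test the assumption is to start terraform apply in one shell and terraform plan in another; the second command should fail with a lock error while the first is running.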


Optional: Atlantis Web UI for Terraform PR Automation

What Atlantis Is

Atlantis is a self-hosted web application that listens for webhook events from Git repositories and runs terraform plan / terraform apply automatically inside PR/MR workflows. It posts plan output back to the PR as comments, enforces approval gates, and locks workspaces to prevent concurrent applies.

Can Atlantis Manage LXC Resources via bpg/proxmox?

Yes. Atlantis is a Terraform orchestration layer, not a provider. It supports any Terraform provider including bpg/proxmox. The workflow is:

  1. Developer opens a PR adding/modifying .tf files defining LXC containers
  2. Atlantis receives the webhook and runs terraform plan in an isolated directory
  3. Plan output posted as a PR comment — team reviews before approval
  4. After approval (or atlantis apply comment), Atlantis runs terraform apply

Atlantis Docker Compose (Self-Hosted)

services:
  atlantis:
    image: ghcr.io/runatlantis/atlantis:latest
    ports:
      - "4141:4141"
    volumes:
      - ${HOME}/.ssh:/home/atlantis/.ssh:ro           # Git SSH key
      - /var/run/docker.sock:/var/run/docker.sock:ro # if using Docker TF provider
      - atlantis-data:/home/atlantis/.atlantis
    environment:
      ATLANTIS_GH_USER: "iron-legion-bot"              # or ATLANTIS_GITLAB_USER / ATLANTIS_GITEA_USER
      ATLANTIS_GH_TOKEN: "${ATLANTIS_GH_TOKEN}"        # personal access token
      ATLANTIS_REPO_ALLOWLIST: "github.com/Iron-Legion/*"
      ATLANTIS_GH_WEBHOOK_SECRET: "${WEBHOOK_SECRET}"
      # For Gitea:
      # ATLANTIS_GITEA_USER: "iron-legion-bot"
      # ATLANTIS_GITEA_TOKEN: "${GITEA_TOKEN}"
      # ATLANTIS_GITEA_WEBHOOK_SECRET: "${WEBHOOK_SECRET}"
    command: server
    restart: unless-stopped

    # Optional: Redis for distributed locking in multi-replica setups
    # redis:
    #   image: redis:8-alpine
    #   volumes:
    #     - redis-data:/data
    #   restart: always

volumes:
  atlantis-data:
    driver: local

Key Features

  • Plan Comments: Every PR gets an auto-generated terraform plan comment
  • Apply Locking: One apply at a time per workspace; concurrent PRs queue
  • Policy Checks: Integrate OPA (Open Policy Agent) or custom scripts to block non-compliant changes
  • Custom Workflows: Define per-repo or per-directory workflows (e.g., plan-only for dev, auto-apply for staging)
  • Self-Hosted SCM: Native webhook support for GitHub, GitLab, Bitbucket, and Gitea

Resource Footprint

  • Atlantis container: ~100-200 MB RAM, minimal CPU
  • Optional Redis: ~20 MB RAM
  • Total: fits comfortably on any Iron Legion node (MK7, MK33-42, Neo)

Gitea Integration Notes

  • Atlantis supports Gitea via the --gitea-user, --gitea-token, --gitea-webhook-secret flags
  • Must expose Atlantis endpoint to Gitea (Tailscale funnel, reverse proxy, or LAN if Gitea is in-network)
  • Webhook URL: http://atlantis-host:4141/events

9. Operational Workflow

Day 0 — Bootstrap

# 1. Clone repo
git clone ssh://git@100.99.123.16:2222/Iron-Legion/terraform-proxmox-lxc.git
cd terraform-proxmox-lxc/environments/dev

# 2. Set credentials
export PROXMOX_VE_PASSWORD="your-pve-password"
# OR for API token:
export PROXMOX_VE_API_TOKEN="root@pam!mytoken=abc123"

# 3. Initialize
terraform init

# 4. Plan
terraform plan -out=tfplan

# 5. Apply
terraform apply tfplan

Day N — Add a Container

  1. Add entry to lxc_configs map in environments/dev/main.tf
  2. terraform plan; before applying, check the new entry for VM ID collisions, IP conflicts, and storage capacity (the plan itself does not detect these)
  3. terraform apply
  4. Verify: ssh root@<new-ip>

Day N — Destroy a Container

  1. Remove entry from lxc_configs map
  2. terraform apply — resource destroyed
  3. Or: terraform destroy -target='module.dev_lxcs.proxmox_virtual_environment_container.lxc["dev-nextcloud"]'

10. Risks & Mitigations

Risk | Likelihood | Impact | Mitigation
VM ID collision | Medium | High | Maintain a fleet-wide VM ID registry; use the proxmox_virtual_environment_vms and proxmox_virtual_environment_containers data sources to check (VMs and containers share one ID space)
IP overlap with DHCP pool | Medium | High | Reserve static IPs in Technitium DNS; use dns data source to verify
Template download fails (slow mirror) | Low | Medium | Pre-seed templates on PVE nodes; use pvesm to verify before apply
State file corruption | Low | Critical | S3 versioning + periodic terraform state pull backups
Privilege escalation via privileged LXC | Low | High | Default unprivileged = true; explicit override required
Provider breaking change | Medium | Medium | Pin provider version ~> 0.108; test upgrades in dev environment first

11. Open Questions

  1. Do we pre-create cloud-image templates on each PVE node, or let Terraform download per-node?

    • Per-node: slower first deploy, but self-contained
    • Pre-seeded: faster, requires manual pvesm or Ansible step
  2. Should LXCs register themselves in Technitium DNS via Terraform, or rely on DHCP + DNS integration?

    • Terraform can call a dns_a_record module (if Technitium provider exists)
    • Or: use PVE's built-in DHCP + DNSMASQ if configured
  3. CI/CD pipeline: GitHub Actions runner, or local Gitea Actions on the fleet SCM host?

    • Gitea Actions keeps secrets in-network
    • GitHub Actions requires Tailscale funnel or external exposure
  4. Do we want a dedicated LXC "Terraform runner" inside the cluster, or run from Artemis/operator workstation?

    • In-cluster runner: always has LAN access to PVE API
    • External: requires Tailscale or VPN for API reachability

12. Appendix

B. Useful PVE CLI Commands (for verification)

# List containers on a node
pct list

# List templates
pvesm list local --content vztmpl

# Check datastore usage
pvesm status

# Enter a container
pct enter <vm_id>

C. Terraform Commands Reference

terraform init          # Download providers, configure backend
terraform validate      # Syntax check
terraform plan          # Preview changes
terraform apply         # Execute changes
terraform destroy       # Tear down everything
terraform state list    # Show managed resources
terraform state show <addr>  # Show one resource's attributes
terraform output        # Display output values
terraform fmt -recursive  # Format all .tf files

End of PRD. Ready for Commander Bobby review and approval.