diff --git a/PRD Drafts/terraform-proxmox-lxc-automation.md b/PRD Drafts/terraform-proxmox-lxc-automation.md new file mode 100644 index 0000000..4f91b5f --- /dev/null +++ b/PRD Drafts/terraform-proxmox-lxc-automation.md @@ -0,0 +1,563 @@ +# PRD: Terraform LXC Automation for Proxmox VE 9.2 + +**Status:** Draft — Pending Commander Bobby Review +**Author:** F.R.I.D.A.Y. +**Date:** 2026-06-01 +**Provider:** `bpg/proxmox` (actively maintained, 11M+ downloads) +**Target:** Proxmox VE 9.2 / Debian Trixie + +--- + +## 1. Purpose & Scope + +This PRD defines the architecture, configuration patterns, and operational workflow for automating LXC container lifecycle management on Proxmox VE 9.2 clusters using Terraform and the actively maintained `bpg/proxmox` provider. + +**In scope:** +- Terraform provider configuration and authentication +- LXC resource definitions (`proxmox_virtual_environment_container`) +- Cloud-init / template-based provisioning +- Network configuration (static IP, DHCP, bridge) +- Storage allocation (rootfs on any PVE backend) +- State management and CI/CD integration patterns + +**Out of scope:** +- VM (QEMU/KVM) provisioning +- PVE cluster topology changes +- Backup/restore automation (separate PRD) + +--- + +## 2. 
Success Criteria + +| # | Criterion | How Verified | +|---|-----------|-------------| +| 1 | A single `terraform apply` creates a working LXC with SSH access | `ssh root@<lxc-ip>` succeeds | +| 2 | LXCs are provisioned from official cloud-image templates | Template downloaded via `proxmox_virtual_environment_download_file` | +| 3 | Network is configurable per-LXC (DHCP or static CIDR) | `ip addr` inside container matches TF config | +| 4 | Rootfs lives on user-selected storage (not hardcoded to `local-lvm`) | `pvesm list <datastore>` shows the rootfs volume on the target datastore | +| 5 | State is stored remotely (S3-compatible or Terraform Cloud) | `terraform state list` works from any machine | +| 6 | Destroy and recreate is repeatable | `terraform destroy && terraform apply` yields an equivalent container (the generated root password changes) | + +--- + +## 3. Provider Selection + +### Why `bpg/proxmox` (not `telmate/proxmox`) + +| Provider | Maintenance | Downloads | LXC Support | Notes | +|----------|-------------|-----------|-------------|-------| +| `bpg/proxmox` | ✅ Active (v0.108.0, June 2026) | 11.8M+ | Full | Community-tier, comprehensive docs, supports PVE 9.x | +| `telmate/proxmox` | ❌ Stale (last release ~2023) | Legacy | Partial | Deprecated; lacks PVE 9.x features | + +**Decision:** Use `bpg/proxmox` exclusively. The `telmate` provider is unmaintained and incompatible with PVE 9.2 API changes. + +**Provider block (minimum):** +```hcl +terraform { + required_providers { + proxmox = { + source = "bpg/proxmox" + version = "~> 0.108" + } + } +} + +provider "proxmox" { + endpoint = "https://192.168.7.7:8006/" + username = "root@pam" + password = var.proxmox_password # or PROXMOX_VE_PASSWORD env var + insecure = true # self-signed TLS +} +``` + +--- + +## 4. 
Authentication Matrix + +| Method | Use Case | Config | Security | +|--------|----------|--------|----------| +| **API Token** | Production, CI/CD | `api_token = "root@pam!mytoken=abc123…"` | Highest — revocable, fine-grained | +| **Username/Password** | Development, one-offs | `username = "root@pam"`, `password = "…"` | Medium — password in env | +| **Auth Ticket** | TOTP-enabled accounts | Pre-authenticate, pass ticket | High — short-lived | + +**Recommendation for Iron Legion:** +- **Development:** Use `PROXMOX_VE_PASSWORD` environment variable +- **CI/CD (future):** Create a PVE API token with a built-in role such as `PVEVMAdmin` or a custom least-privilege role, store in CI secrets + +--- + +## 5. Sample Project Structure + +``` +terraform-proxmox-lxc/ +├── README.md +├── main.tf # Provider config +├── variables.tf # Input variables +├── terraform.tfvars.example # Sample values (the real terraform.tfvars is gitignored) +├── outputs.tf # Useful outputs (IPs, IDs) +├── versions.tf # Required providers, TF version, backend +├── modules/ +│ └── lxc/ +│ ├── main.tf # proxmox_virtual_environment_container resource +│ ├── variables.tf # Module inputs +│ └── outputs.tf # Module outputs +├── environments/ +│ ├── dev/ +│ │ ├── main.tf # Calls modules with dev vars +│ │ └── terraform.tfvars +│ └── prod/ +│ ├── main.tf +│ └── terraform.tfvars +└── templates/ + └── ubuntu-25.04-cloudimg.yaml # Cloud-init user-data (optional) +``` + +### Key Files + +#### `versions.tf` +```hcl +terraform { + required_version = ">= 1.6.0" # use_path_style and skip_requesting_account_id need Terraform 1.6+ + + required_providers { + proxmox = { + source = "bpg/proxmox" + version = "~> 0.108" + } + random = { + source = "hashicorp/random" + version = "~> 3.6" + } + tls = { + source = "hashicorp/tls" + version = "~> 4.0" + } + } + + # Remote state — S3-compatible (MinIO, Garage, AWS S3) + backend "s3" { + bucket = "iron-legion-terraform" + key = "proxmox-lxc/terraform.tfstate" + region = "us-east-1" + endpoint = "https://s3.nb.bobbysh.me" + use_path_style = true + + # Skip AWS-specific validations for self-hosted S3 + 
skip_credentials_validation = true + skip_metadata_api_check = true + skip_region_validation = true + skip_requesting_account_id = true + } +} +``` + +#### `variables.tf` +```hcl +variable "proxmox_endpoint" { + description = "PVE API URL" + type = string + default = "https://192.168.7.7:8006/" +} + +variable "proxmox_node" { + description = "Target PVE node name" + type = string + default = "mk7" +} + +variable "ssh_public_key" { + description = "SSH public key for root access" + type = string +} + +variable "lxc_configs" { + description = "Map of LXC configurations" + type = map(object({ + vm_id = number + hostname = string + cores = optional(number, 2) + memory = optional(number, 2048) + disk_size = optional(number, 8) + datastore_id = optional(string, "local-lvm") + ip_address = optional(string, "dhcp") + gateway = optional(string, null) + template_url = optional(string, "https://mirrors.servercentral.com/ubuntu-cloud-images/releases/25.04/release/ubuntu-25.04-server-cloudimg-amd64-root.tar.xz") + features = optional(object({ + nesting = optional(bool, true) + fuse = optional(bool, false) + keyctl = optional(bool, false) + }), {}) + })) +} +``` + +#### `modules/lxc/main.tf` +```hcl +# Child modules must declare non-HashiCorp providers themselves; +# otherwise Terraform resolves proxmox_* resources to hashicorp/proxmox. +terraform { + required_providers { + proxmox = { + source = "bpg/proxmox" + } + random = { + source = "hashicorp/random" + } + } +} + +resource "proxmox_virtual_environment_download_file" "lxc_template" { + for_each = var.lxc_configs + + content_type = "vztmpl" + datastore_id = "local" + node_name = var.proxmox_node + url = each.value.template_url + file_name = "${each.key}-template.tar.xz" + overwrite = false +} + +resource "proxmox_virtual_environment_container" "lxc" { + for_each = var.lxc_configs + + node_name = var.proxmox_node + vm_id = each.value.vm_id + description = "Managed by Terraform — ${each.key}" + + unprivileged = true + + features { + nesting = each.value.features.nesting + fuse = each.value.features.fuse + keyctl = each.value.features.keyctl + } + + cpu { + cores = each.value.cores + units = 1024 + } + + memory { + dedicated = each.value.memory + swap = 0 + } + + disk { + datastore_id = 
each.value.datastore_id + size = each.value.disk_size + } + + initialization { + hostname = each.value.hostname + + ip_config { + ipv4 { + address = each.value.ip_address + gateway = each.value.gateway + } + } + + user_account { + keys = [var.ssh_public_key] + password = random_password.lxc_root[each.key].result + } + } + + network_interface { + name = "veth0" + bridge = "vmbr0" + } + + operating_system { + template_file_id = proxmox_virtual_environment_download_file.lxc_template[each.key].id + type = "ubuntu" + } + + startup { + order = "3" + up_delay = "60" + down_delay = "60" + } + + depends_on = [proxmox_virtual_environment_download_file.lxc_template] +} + +resource "random_password" "lxc_root" { + for_each = var.lxc_configs + + length = 16 + special = true + override_special = "_%@" +} +``` + +#### `modules/lxc/variables.tf` +```hcl +variable "proxmox_node" { + type = string +} + +variable "ssh_public_key" { + type = string +} + +variable "lxc_configs" { + type = map(object({ + vm_id = number + hostname = string + cores = optional(number, 2) + memory = optional(number, 2048) + disk_size = optional(number, 8) + datastore_id = optional(string, "local-lvm") + ip_address = optional(string, "dhcp") + gateway = optional(string, null) + # Same default as the root variables.tf, so callers that omit template_url do not pass a null URL + template_url = optional(string, "https://mirrors.servercentral.com/ubuntu-cloud-images/releases/25.04/release/ubuntu-25.04-server-cloudimg-amd64-root.tar.xz") + features = optional(object({ + nesting = optional(bool, true) + fuse = optional(bool, false) + keyctl = optional(bool, false) + }), {}) + })) +} +``` + +#### `modules/lxc/outputs.tf` +```hcl +output "lxc_ids" { + description = "Map of LXC names to VM IDs" + value = { for k, v in proxmox_virtual_environment_container.lxc : k => v.vm_id } +} + +output "lxc_ips" { + description = "Map of LXC names to IPv4 addresses" + value = { for k, v in proxmox_virtual_environment_container.lxc : k => v.ipv4 } +} + +output "lxc_passwords" { + description = "Map of LXC names to root passwords (sensitive)" + value = { for k, v in random_password.lxc_root : k => v.result } + sensitive = true +} +``` + +#### 
`environments/dev/main.tf` +```hcl +module "dev_lxcs" { + source = "../../modules/lxc" + + proxmox_node = "mk7" + ssh_public_key = file("~/.ssh/id_ed25519.pub") + + lxc_configs = { + "dev-nextcloud" = { + vm_id = 2100 + hostname = "dev-nextcloud" + cores = 4 + memory = 4096 + disk_size = 16 + datastore_id = "local-zfs" + ip_address = "192.168.7.100/24" + gateway = "192.168.7.1" + } + "dev-vaultwarden" = { + vm_id = 2101 + hostname = "dev-vaultwarden" + cores = 2 + memory = 2048 + disk_size = 8 + datastore_id = "local-zfs" + ip_address = "192.168.7.101/24" + gateway = "192.168.7.1" + } + } +} +``` + +--- + +## 6. Resource Reference — `proxmox_virtual_environment_container` + +### Critical Arguments + +| Block | Key | Required | Default | Description | +|-------|-----|----------|---------|-------------| +| — | `node_name` | ✅ | — | PVE node to create on | +| — | `vm_id` | ✅ | — | Unique numeric ID (100–999999999) | +| — | `unprivileged` | ❌ | `false` | Run as unprivileged container (the module sets `true` explicitly) | +| `features` | `nesting` | ❌ | `false` | Enable nested containers (needed for Docker-in-LXC) | +| `features` | `fuse` | ❌ | `false` | Enable FUSE mounts | +| `cpu` | `cores` | ❌ | `1` | vCPU cores | +| `memory` | `dedicated` | ❌ | `512` | RAM in MB | +| `disk` | `datastore_id` | ❌ | `local` | Storage pool for rootfs | +| `disk` | `size` | ❌ | `4` | Rootfs size in GB | +| `initialization` | `hostname` | ✅ | — | DNS-compatible hostname | +| `initialization.ip_config.ipv4` | `address` | ✅ | — | CIDR or `dhcp` | +| `initialization.ip_config.ipv4` | `gateway` | ❌ | — | Required for static IP | +| `initialization.user_account` | `keys` | ❌ | — | SSH authorized_keys | +| `network_interface` | `name` | ✅ | — | `veth0` | +| `network_interface` | `bridge` | ❌ | `vmbr0` | Bridge to attach | +| `operating_system` | `template_file_id` | ✅ | — | Downloaded template or `local:vztmpl/…` | +| `operating_system` | `type` | ❌ | `unmanaged` | `ubuntu`, `debian`, `alpine`, etc. 
| + +### Important Notes +- **Template download** uses `proxmox_virtual_environment_download_file` — caches template per-node, avoids re-download +- **Initialization** (hostname, IP, SSH keys, root password) is applied by PVE at container creation through the `initialization` block; LXC containers do not use cloud-init, so no cloud-init drive is needed +- **Nesting = true** is required for any LXC running Docker or systemd-nspawn +- **Datastore** is backend-agnostic: `local-lvm`, `local-zfs`, `tank-zfs`, `ceph-rbd`, NFS, etc. all work + +--- + +## 7. Data Sources + +Use data sources to query existing infrastructure without managing it: + +```hcl +data "proxmox_virtual_environment_datastores" "available" { + node_name = "mk7" +} + +data "proxmox_virtual_environment_nodes" "cluster" {} + +data "proxmox_virtual_environment_container" "existing" { + node_name = "mk7" + vm_id = 2001 +} +``` + +**Common use cases:** +- Validate a datastore exists before creating a disk +- Read an existing LXC’s IP to populate a DNS record (Technitium) +- List nodes for multi-node placement logic + +--- + +## 8. State Management + +### Recommended: S3-Compatible Backend + +Iron Legion already runs self-hosted services. A Garage or MinIO instance on Neo/MK7 can serve as the Terraform state backend: + +```hcl +terraform { + backend "s3" { + bucket = "iron-legion-terraform" + key = "proxmox-lxc/dev.tfstate" + region = "us-east-1" + endpoint = "https://s3.nb.bobbysh.me" + use_path_style = true + + skip_credentials_validation = true + skip_metadata_api_check = true + skip_region_validation = true + skip_requesting_account_id = true + } +} +``` + +### State Locking (Critical for Team Use) + +Terraform 1.10 and later can lock state natively in S3 with `use_lockfile = true`, which depends on the S3 implementation supporting conditional writes; verify this against Garage or MinIO before relying on it. Otherwise, add a DynamoDB-compatible lock table, or wrap `terraform apply` in a CI pipeline that serializes runs. + +--- + +## 9. Operational Workflow + +### Day 0 — Bootstrap + +```bash +# 1. Clone repo +git clone ssh://git@100.99.123.16:2222/Iron-Legion/terraform-proxmox-lxc.git +cd terraform-proxmox-lxc/environments/dev + +# 2. 
Set credentials +export PROXMOX_VE_PASSWORD="your-pve-password" +# OR for API token: +export PROXMOX_VE_API_TOKEN="root@pam!mytoken=abc123" + +# 3. Initialize +terraform init + +# 4. Plan +terraform plan -out=tfplan + +# 5. Apply +terraform apply tfplan +``` + +### Day N — Add a Container + +1. Add entry to `lxc_configs` map in `environments/dev/main.tf` +2. `terraform plan`, then review the plan and check separately for VM ID collisions, IP conflicts, and storage capacity (the plan does not detect these) +3. `terraform apply` +4. Verify: `ssh root@<lxc-ip>` + +### Day N — Destroy a Container + +1. Remove entry from `lxc_configs` map +2. `terraform apply` — resource destroyed +3. Or: `terraform destroy -target='module.dev_lxcs.proxmox_virtual_environment_container.lxc["dev-nextcloud"]'` + +--- + +## 10. Risks & Mitigations + +| Risk | Likelihood | Impact | Mitigation | +|------|------------|--------|------------| +| VM ID collision | Medium | High | Maintain a fleet-wide VM ID registry; use `proxmox_virtual_environment_vms` data source to check | +| IP overlap with DHCP pool | Medium | High | Reserve static IPs in Technitium DNS; use a data source from the `hashicorp/dns` provider to verify | +| Template download fails (slow mirror) | Low | Medium | Pre-seed templates on PVE nodes; run `pvesm list local --content vztmpl` to verify before `apply` | +| State file corruption | Low | Critical | S3 versioning + periodic `terraform state pull` backups | +| Privilege escalation via privileged LXC | Low | High | Module sets `unprivileged = true`; explicit override required | +| Provider breaking change | Medium | Medium | Pin provider version `~> 0.108`; test upgrades in dev environment first | + +--- + +## 11. Open Questions + +1. **Do we pre-create cloud-image templates on each PVE node, or let Terraform download per-node?** + - Per-node: slower first deploy, but self-contained + - Pre-seeded: faster, requires manual `pvesm` or Ansible step + +2. 
**Should LXCs register themselves in Technitium DNS via Terraform, or rely on DHCP + DNS integration?** + - Terraform can call a `dns_a_record` module (if a Technitium provider exists) + - Or: use PVE's built-in DHCP + dnsmasq if configured + +3. **CI/CD pipeline: GitHub Actions runner, or local Gitea Actions on Neo?** + - Gitea Actions keeps secrets in-network + - GitHub Actions requires Tailscale Funnel or external exposure + +4. **Do we want a dedicated LXC "Terraform runner" inside the cluster, or run from Artemis/operator workstation?** + - In-cluster runner: always has LAN access to PVE API + - External: requires Tailscale or VPN for API reachability + +--- + +## 12. Appendix + +### A. Provider Documentation Links + +- **Registry:** https://registry.terraform.io/providers/bpg/proxmox/latest +- **GitHub:** https://github.com/bpg/terraform-provider-proxmox +- **LXC Resource Docs:** https://registry.terraform.io/providers/bpg/proxmox/latest/docs/resources/virtual_environment_container +- **Download File Resource:** https://registry.terraform.io/providers/bpg/proxmox/latest/docs/resources/virtual_environment_download_file + +### B. Useful PVE CLI Commands (for verification) + +```bash +# List containers on a node +pct list + +# List templates +pvesm list local --content vztmpl + +# Check datastore usage +pvesm status + +# Enter a container +pct enter <vmid> +``` + +### C. Terraform Commands Reference + +```bash +terraform init # Download providers, configure backend +terraform validate # Syntax and internal consistency check +terraform plan # Preview changes +terraform apply # Execute changes +terraform destroy # Tear down everything +terraform state list # Show managed resources +terraform state show <address> # Show one resource's attributes +terraform output # Display output values +terraform fmt -recursive # Format all .tf files +``` + +--- + +*End of PRD. Ready for Commander Bobby review and approval.*