Terraform LXC PRD: promote validated draft to PRDs, archive stale F.R.I.D.A.Y. draft

- terraform-lxc-deployment.md → PRDs/ (validated, tested, canonical)
- terraform-proxmox-lxc-automation.md → ARCHIVED- (superseded by live POC)
- Matches Phase 1 POC results from terraform-pve repo
F.R.I.D.A.Y.
2026-06-04 22:58:19 -04:00
parent bc8d7c8449
commit c1bb49d51a
2 changed files with 156 additions and 0 deletions


@@ -0,0 +1,635 @@
# PRD: Terraform LXC Automation for Proxmox VE 9.2
**Status:** Draft — Pending Commander Bobby Review
**Author:** F.R.I.D.A.Y.
**Date:** 2026-06-01
**Provider:** `bpg/proxmox` (actively maintained, 11M+ downloads)
**Target:** Proxmox VE 9.2 / Debian Trixie
---
## 1. Purpose & Scope
This PRD defines the architecture, configuration patterns, and operational workflow for automating LXC container lifecycle management on Proxmox VE 9.2 clusters using Terraform and the actively maintained `bpg/proxmox` provider.
**In scope:**
- Terraform provider configuration and authentication
- LXC resource definitions (`proxmox_virtual_environment_container`)
- Cloud-init / template-based provisioning
- Network configuration (static IP, DHCP, bridge)
- Storage allocation (rootfs on any PVE backend)
- State management and CI/CD integration patterns
**Out of scope:**
- VM (QEMU/KVM) provisioning
- PVE cluster topology changes
- Backup/restore automation (separate PRD)
---
## 2. Success Criteria
| # | Criterion | How Verified |
|---|-----------|-------------|
| 1 | A single `terraform apply` creates a working LXC with SSH access | `ssh root@<lxc-ip>` succeeds |
| 2 | LXCs are provisioned from official cloud-image templates | Template downloaded via `proxmox_virtual_environment_download_file` |
| 3 | Network is configurable per-LXC (DHCP or static CIDR) | `ip addr` inside container matches TF config |
| 4 | Rootfs lives on user-selected storage (not hardcoded to `local-lvm`) | `pvesm status` shows volume on target datastore |
| 5 | State is stored remotely (S3-compatible or Terraform Cloud) | `terraform state list` works from any machine |
| 6 | Destroy and recreate is idempotent | `terraform destroy && terraform apply` yields identical result |
---
## 3. Provider Selection
### Why `bpg/proxmox` (not `telmate/proxmox`)
| Provider | Maintenance | Downloads | LXC Support | Notes |
|----------|-------------|-----------|-------------|-------|
| `bpg/proxmox` | ✅ Active (v0.108.0, June 2026) | 11.8M+ | Full | Community-tier, comprehensive docs, supports PVE 9.x |
| `telmate/proxmox` | ❌ Stale (last release ~2023) | Legacy | Partial | Deprecated; lacks PVE 9.x features |
**Decision:** Use `bpg/proxmox` exclusively. The `telmate` provider is unmaintained and incompatible with PVE 9.2 API changes.
**Provider block (minimum):**
```hcl
terraform {
  required_providers {
    proxmox = {
      source  = "bpg/proxmox"
      version = "~> 0.108"
    }
  }
}

provider "proxmox" {
  endpoint = "https://192.168.7.33:8006/"
  username = "root@pam"
  password = var.proxmox_password # or PROXMOX_VE_PASSWORD env var
  insecure = true                 # self-signed TLS
}
```
---
## 4. Authentication Matrix
| Method | Use Case | Config | Security |
|--------|----------|--------|----------|
| **API Token** | Production, CI/CD | `api_token = "root@pam!mytoken=abc123…"` | Highest — revocable, fine-grained |
| **Username/Password** | Development, one-offs | `username = "root@pam"`, `password = "…"` | Medium — password in env |
| **Auth Ticket** | TOTP-enabled accounts | Pre-authenticate, pass ticket | High — short-lived |
**Recommendation for Iron Legion:**
- **Development:** Use `PROXMOX_VE_PASSWORD` environment variable
- **CI/CD (future):** Create a PVE API token with the built-in `PVEVMAdmin` role or a custom role, and store it in CI secrets
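A minimal sketch of the API-token row from the matrix above. The variable name `proxmox_api_token` is hypothetical (it does not appear elsewhere in this PRD); the `api_token` argument and its `user@realm!tokenid=secret` format are taken from the matrix.

```hcl
variable "proxmox_endpoint" {
  description = "PVE API URL"
  type        = string
}

# Hypothetical variable name; supply the value via TF_VAR_proxmox_api_token
# or a CI secret so that it never lands in a committed .tfvars file.
variable "proxmox_api_token" {
  description = "PVE API token in the form user@realm!tokenid=secret"
  type        = string
  sensitive   = true
}

provider "proxmox" {
  endpoint  = var.proxmox_endpoint
  api_token = var.proxmox_api_token
  insecure  = true # self-signed TLS, as in the minimum provider block
}
```

Leaving `api_token` out of the provider block and exporting `PROXMOX_VE_API_TOKEN` instead (see the Day 0 bootstrap steps) should be equivalent and keeps the secret out of Terraform variables entirely.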
---
## 5. Sample Project Structure
```
terraform-proxmox-lxc/
├── README.md
├── main.tf                        # Provider + backend config
├── variables.tf                   # Input variables
├── terraform.tfvars.example       # Sample values (gitignored)
├── outputs.tf                     # Useful outputs (IPs, IDs)
├── versions.tf                    # Required providers + TF version
├── modules/
│   └── lxc/
│       ├── main.tf                # proxmox_virtual_environment_container resource
│       ├── variables.tf           # Module inputs
│       └── outputs.tf             # Module outputs
├── environments/
│   ├── dev/
│   │   ├── main.tf                # Calls modules with dev vars
│   │   └── terraform.tfvars
│   └── prod/
│       ├── main.tf
│       └── terraform.tfvars
└── templates/
    └── ubuntu-25.04-cloudimg.yaml # Cloud-init user-data (optional)
```
### Key Files
#### `versions.tf`
```hcl
terraform {
  # The S3 backend's use_path_style and endpoints arguments need Terraform 1.6+
  required_version = ">= 1.6.0"

  required_providers {
    proxmox = {
      source  = "bpg/proxmox"
      version = "~> 0.108"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.6"
    }
    tls = {
      source  = "hashicorp/tls"
      version = "~> 4.0"
    }
  }

  # Remote state: S3-compatible (Minio, Garage, AWS S3)
  backend "s3" {
    bucket = "iron-legion-terraform"
    key    = "proxmox-lxc/terraform.tfstate"
    region = "us-east-1"

    # endpoints.s3 replaces the deprecated top-level endpoint argument
    endpoints = {
      s3 = "https://s3.nb.bobbysh.me"
    }
    use_path_style = true

    # Skip AWS-specific validations for self-hosted S3
    skip_credentials_validation = true
    skip_metadata_api_check     = true
    skip_region_validation      = true
    skip_requesting_account_id  = true
  }
}
```
#### `variables.tf`
```hcl
variable "proxmox_endpoint" {
  description = "PVE API URL"
  type        = string
  default     = "https://192.168.7.33:8006/"
}

variable "proxmox_node" {
  description = "Target PVE node name"
  type        = string
  default     = "mk33"
}

variable "ssh_public_key" {
  description = "SSH public key for root access"
  type        = string
}

variable "lxc_configs" {
  description = "Map of LXC configurations"
  type = map(object({
    vm_id        = number
    hostname     = string
    cores        = optional(number, 2)
    memory       = optional(number, 2048)
    disk_size    = optional(number, 8)
    datastore_id = optional(string, "local-lvm")
    ip_address   = optional(string, "dhcp")
    gateway      = optional(string, null)
    template_url = optional(string, "https://mirrors.servercentral.com/ubuntu-cloud-images/releases/25.04/release/ubuntu-25.04-server-cloudimg-amd64-root.tar.xz")
    features = optional(object({
      nesting = optional(bool, true)
      fuse    = optional(bool, false)
      keyctl  = optional(bool, false)
    }), {})
  }))
}
```
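The `lxc_configs` map can reject some mistakes at plan time. The sketch below is illustrative only: it trims the object type to the fields being checked, and the two `validation` rules (unique `vm_id`, gateway required for static addresses) are assumptions about what this project would want to enforce, not rules stated in the PRD.

```hcl
variable "lxc_configs" {
  description = "Map of LXC configurations (trimmed to the validated fields)"
  type = map(object({
    vm_id      = number
    hostname   = string
    ip_address = optional(string, "dhcp")
    gateway    = optional(string, null)
  }))

  # Catch duplicate VM IDs inside this map; it cannot see IDs
  # already used elsewhere in the cluster.
  validation {
    condition = (
      length(distinct([for c in values(var.lxc_configs) : c.vm_id]))
      == length(var.lxc_configs)
    )
    error_message = "Each entry in lxc_configs must use a unique vm_id."
  }

  # A static entry must be valid CIDR notation and carry a gateway.
  validation {
    condition = alltrue([
      for c in values(var.lxc_configs) :
      c.ip_address == "dhcp" || (can(cidrhost(c.ip_address, 0)) && c.gateway != null)
    ])
    error_message = "Static entries need a CIDR ip_address (e.g. 192.168.7.100/24) and a gateway."
  }
}
```

These checks run during `terraform plan`, before any call to the PVE API, so a bad entry fails fast without touching the cluster.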
#### `modules/lxc/main.tf`
```hcl
resource "proxmox_virtual_environment_download_file" "lxc_template" {
  for_each     = var.lxc_configs
  content_type = "vztmpl"
  datastore_id = "local"
  node_name    = var.proxmox_node
  url          = each.value.template_url
  file_name    = "${each.key}-template.tar.xz"
  overwrite    = false
}

resource "proxmox_virtual_environment_container" "lxc" {
  for_each     = var.lxc_configs
  node_name    = var.proxmox_node
  vm_id        = each.value.vm_id
  description  = "Managed by Terraform - ${each.key}"
  unprivileged = true

  features {
    nesting = each.value.features.nesting
    fuse    = each.value.features.fuse
    keyctl  = each.value.features.keyctl
  }

  cpu {
    cores = each.value.cores
    units = 1024
  }

  memory {
    dedicated = each.value.memory
    swap      = 0
  }

  disk {
    datastore_id = each.value.datastore_id
    size         = each.value.disk_size
  }

  initialization {
    hostname = each.value.hostname

    ip_config {
      ipv4 {
        address = each.value.ip_address
        gateway = each.value.gateway
      }
    }

    user_account {
      keys     = [var.ssh_public_key]
      password = random_password.lxc_root[each.key].result
    }
  }

  network_interface {
    name   = "veth0"
    bridge = "vmbr0"
  }

  operating_system {
    template_file_id = proxmox_virtual_environment_download_file.lxc_template[each.key].id
    type             = "ubuntu"
  }

  startup {
    order      = "3"
    up_delay   = "60"
    down_delay = "60"
  }

  depends_on = [proxmox_virtual_environment_download_file.lxc_template]
}

resource "random_password" "lxc_root" {
  for_each         = var.lxc_configs
  length           = 16
  special          = true
  override_special = "_%@"
}
```
#### `modules/lxc/variables.tf`
```hcl
variable "proxmox_node" {
  type = string
}

variable "ssh_public_key" {
  type = string
}

variable "lxc_configs" {
  type = map(object({
    vm_id        = number
    hostname     = string
    cores        = optional(number, 2)
    memory       = optional(number, 2048)
    disk_size    = optional(number, 8)
    datastore_id = optional(string, "local-lvm")
    ip_address   = optional(string, "dhcp")
    gateway      = optional(string, null)
    # Same default as the root variables.tf; without it an omitted
    # template_url is null and the download_file resource fails.
    template_url = optional(string, "https://mirrors.servercentral.com/ubuntu-cloud-images/releases/25.04/release/ubuntu-25.04-server-cloudimg-amd64-root.tar.xz")
    features = optional(object({
      nesting = optional(bool, true)
      fuse    = optional(bool, false)
      keyctl  = optional(bool, false)
    }), {})
  }))
}
```
#### `modules/lxc/outputs.tf`
```hcl
output "lxc_ids" {
  description = "Map of LXC names to VM IDs"
  value       = { for k, v in proxmox_virtual_environment_container.lxc : k => v.vm_id }
}

output "lxc_ips" {
  description = "Map of LXC names to IPv4 addresses"
  value       = { for k, v in proxmox_virtual_environment_container.lxc : k => v.ipv4 }
}

output "lxc_passwords" {
  description = "Map of LXC names to root passwords (sensitive)"
  value       = { for k, v in random_password.lxc_root : k => v.result }
  sensitive   = true
}
```
#### `environments/dev/main.tf`
```hcl
module "dev_lxcs" {
  source         = "../../modules/lxc"
  proxmox_node   = "mk33"
  ssh_public_key = file("~/.ssh/id_ed25519.pub")

  lxc_configs = {
    "dev-nextcloud" = {
      vm_id        = 2100
      hostname     = "dev-nextcloud"
      cores        = 4
      memory       = 4096
      disk_size    = 16
      datastore_id = "local-zfs"
      ip_address   = "192.168.7.100/24"
      gateway      = "192.168.7.1"
    }
    "dev-vaultwarden" = {
      vm_id        = 2101
      hostname     = "dev-vaultwarden"
      cores        = 2
      memory       = 2048
      disk_size    = 8
      datastore_id = "local-zfs"
      ip_address   = "192.168.7.101/24"
      gateway      = "192.168.7.1"
    }
  }
}
```
---
## 6. Resource Reference — `proxmox_virtual_environment_container`
### Critical Arguments
| Block | Key | Required | Default | Description |
|-------|-----|----------|---------|-------------|
| — | `node_name` | ✅ | — | PVE node to create on |
| — | `vm_id` | ✅ | — | Unique numeric ID (100-999999999) |
| — | `unprivileged` | ❌ | `false` | Run as unprivileged container; the module sets `true` explicitly |
| `features` | `nesting` | ❌ | `false` | Enable nested containers (needed for Docker-in-LXC) |
| `features` | `fuse` | ❌ | `false` | Enable FUSE mounts |
| `cpu` | `cores` | ❌ | `1` | vCPU cores |
| `memory` | `dedicated` | ❌ | `512` | RAM in MB |
| `disk` | `datastore_id` | ❌ | `local` | Storage pool for rootfs |
| `disk` | `size` | ❌ | `4` | Rootfs size in GB |
| `initialization` | `hostname` | ✅ | — | DNS-compatible hostname |
| `initialization.ip_config.ipv4` | `address` | ✅ | — | CIDR or `dhcp` |
| `initialization.ip_config.ipv4` | `gateway` | ❌ | — | Required for static IP |
| `initialization.user_account` | `keys` | ❌ | — | SSH authorized_keys |
| `network_interface` | `name` | ✅ | — | `veth0` |
| `network_interface` | `bridge` | ❌ | `vmbr0` | Bridge to attach |
| `operating_system` | `template_file_id` | ✅ | — | Downloaded template or `local:vztmpl/…` |
| `operating_system` | `type` | ❌ | `unmanaged` | `ubuntu`, `debian`, `alpine`, etc. |
### Important Notes
- **Template download** uses `proxmox_virtual_environment_download_file` — caches template per-node, avoids re-download
- **Initialization** (hostname, IP, SSH keys, root password) is applied by Proxmox itself through the `initialization` block; LXC containers need no separate cloud-init drive
- **Nesting = true** is required for any LXC running Docker or systemd-nspawn
- **Datastore** is backend-agnostic: `local-lvm`, `local-zfs`, `tank-zfs`, `ceph-rbd`, NFS, etc. all work
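As an alternative to downloading per container, `template_file_id` can point at a template that already sits on the node. This is a sketch under assumptions: the VM ID, hostname, and template file name below are hypothetical placeholders, and the real volume ID should be taken from `pvesm list local --content vztmpl`.

```hcl
variable "proxmox_node" {
  type = string
}

resource "proxmox_virtual_environment_container" "from_local_template" {
  node_name    = var.proxmox_node
  vm_id        = 2199 # hypothetical ID, pick a free one
  unprivileged = true

  features {
    nesting = true # needed for Docker or systemd-nspawn inside the LXC
  }

  disk {
    datastore_id = "local-zfs" # any PVE storage backend
    size         = 8
  }

  initialization {
    hostname = "preseeded-example" # hypothetical hostname

    ip_config {
      ipv4 {
        address = "dhcp"
      }
    }
  }

  network_interface {
    name   = "veth0"
    bridge = "vmbr0"
  }

  operating_system {
    # Hypothetical file name for a template pre-seeded on the node
    template_file_id = "local:vztmpl/ubuntu-25.04-template.tar.xz"
    type             = "ubuntu"
  }
}
```

With a pre-seeded template there is no `proxmox_virtual_environment_download_file` resource to manage, so the first `apply` skips the download at the cost of a manual seeding step.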
---
## 7. Data Sources
Use data sources to query existing infrastructure without managing it:
```hcl
data "proxmox_virtual_environment_datastores" "available" {
  node_name = var.proxmox_node
}

data "proxmox_virtual_environment_nodes" "cluster" {}

data "proxmox_virtual_environment_container" "existing" {
  node_name = var.proxmox_node # or specify target node explicitly
  vm_id     = 2001
}
```
**Common use cases:**
- Validate a datastore exists before creating a disk
- Read an existing LXC's IP to populate a DNS record (Technitium)
- List nodes for multi-node placement logic
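The multi-node placement case could be sketched as a round-robin over the node list. This assumes the nodes data source exposes a `names` list attribute (verify against the pinned provider version); the `lxc_names` variable and the `placement` local are hypothetical names introduced only for this example.

```hcl
data "proxmox_virtual_environment_nodes" "cluster" {}

# Hypothetical input: container names to spread across the cluster
variable "lxc_names" {
  type    = list(string)
  default = ["dev-nextcloud", "dev-vaultwarden"]
}

locals {
  # Sort so the assignment stays stable between plans
  node_names = sort(data.proxmox_virtual_environment_nodes.cluster.names)

  # Round-robin: container i goes to node (i mod node count)
  placement = {
    for idx, name in var.lxc_names :
    name => local.node_names[idx % length(local.node_names)]
  }
}

output "lxc_placement" {
  description = "Map of container name to assigned PVE node"
  value       = local.placement
}
```

A container resource would then read `node_name = local.placement[each.key]` instead of a single `var.proxmox_node`. Note that adding or removing a node reshuffles the assignment, which would force containers to be recreated on a different node.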
---
## 8. State Management
### Recommended: S3-Compatible Backend
Iron Legion already runs self-hosted services. A Garage or Minio instance on a fleet storage node (e.g., Neo) can serve as the Terraform state backend:
```hcl
terraform {
  backend "s3" {
    bucket = "iron-legion-terraform"
    key    = "proxmox-lxc/dev.tfstate"
    region = "us-east-1"

    # endpoints.s3 replaces the deprecated top-level endpoint argument (Terraform 1.6+)
    endpoints = {
      s3 = "https://s3.nb.bobbysh.me"
    }
    use_path_style = true

    skip_credentials_validation = true
    skip_metadata_api_check     = true
    skip_region_validation      = true
    skip_requesting_account_id  = true
  }
}
```
### State Locking (Critical for Team Use)
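Concurrent applies against the same state can corrupt it, so enable locking. The S3 backend supports a DynamoDB-compatible lock table and, from Terraform 1.10, a native lock file (`use_lockfile = true`) stored beside the state object. If the chosen S3 implementation supports neither, wrap `terraform apply` in a CI pipeline that serializes runs.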
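A minimal sketch of native S3 locking, assuming Terraform 1.10 or later and an object store that honors conditional writes. Whether the Garage or Minio instance behind `s3.nb.bobbysh.me` does so is an assumption to verify before relying on it.

```hcl
terraform {
  required_version = ">= 1.10.0" # use_lockfile is not available in earlier releases

  backend "s3" {
    bucket = "iron-legion-terraform"
    key    = "proxmox-lxc/dev.tfstate"
    region = "us-east-1"

    endpoints = {
      s3 = "https://s3.nb.bobbysh.me"
    }
    use_path_style = true

    # Writes a .tflock object next to the state file while a run holds the lock
    use_lockfile = true

    skip_credentials_validation = true
    skip_metadata_api_check     = true
    skip_region_validation      = true
    skip_requesting_account_id  = true
  }
}
```

A quick check is to start `terraform plan` in two shells at once: the second run should fail to acquire the state lock rather than proceed.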
---
## Optional: Atlantis Web UI for Terraform PR Automation
### What Atlantis Is
Atlantis is a self-hosted web application that listens for webhook events from Git repositories and runs `terraform plan` / `terraform apply` automatically inside PR/MR workflows. It posts plan output back to the PR as comments, enforces approval gates, and locks workspaces to prevent concurrent applies.
### Can Atlantis Manage LXC Resources via `bpg/proxmox`?
**Yes.** Atlantis is a Terraform orchestration layer, not a provider. It supports any Terraform provider including `bpg/proxmox`. The workflow is:
1. Developer opens a PR adding/modifying `.tf` files defining LXC containers
2. Atlantis receives the webhook and runs `terraform plan` in an isolated directory
3. Plan output posted as a PR comment — team reviews before approval
4. After approval (or `atlantis apply` comment), Atlantis runs `terraform apply`
### Atlantis Docker Compose (Self-Hosted)
```yaml
services:
  atlantis:
    image: ghcr.io/runatlantis/atlantis:latest
    ports:
      - "4141:4141"
    volumes:
      - ${HOME}/.ssh:/home/atlantis/.ssh:ro # Git SSH key
      - /var/run/docker.sock:/var/run/docker.sock:ro # if using Docker TF provider
      - atlantis-data:/home/atlantis/.atlantis
    environment:
      ATLANTIS_GH_USER: "iron-legion-bot" # or ATLANTIS_GITLAB_USER / ATLANTIS_GITEA_USER
      ATLANTIS_GH_TOKEN: "${ATLANTIS_GH_TOKEN}" # personal access token
      ATLANTIS_REPO_ALLOWLIST: "github.com/Iron-Legion/*"
      ATLANTIS_GH_WEBHOOK_SECRET: "${WEBHOOK_SECRET}"
      # For Gitea:
      # ATLANTIS_GITEA_USER: "iron-legion-bot"
      # ATLANTIS_GITEA_TOKEN: "${GITEA_TOKEN}"
      # ATLANTIS_GITEA_WEBHOOK_SECRET: "${WEBHOOK_SECRET}"
    command: server
    restart: unless-stopped
  # Optional: Redis for distributed locking in multi-replica setups
  # redis:
  #   image: redis:8-alpine
  #   volumes:
  #     - redis-data:/data
  #   restart: always
volumes:
  atlantis-data:
    driver: local
```
### Key Features
- **Plan Comments:** Every PR gets an auto-generated `terraform plan` comment
- **Apply Locking:** One apply at a time per workspace; concurrent PRs queue
- **Policy Checks:** Integrate OPA (Open Policy Agent) or custom scripts to block non-compliant changes
- **Custom Workflows:** Define per-repo or per-directory workflows (e.g., plan-only for dev, auto-apply for staging)
- **Self-Hosted SCM:** Native webhook support for GitHub, GitLab, Bitbucket, **and Gitea**
### Resource Footprint
- Atlantis container: ~100-200 MB RAM, minimal CPU
- Optional Redis: ~20 MB RAM
- Total: fits comfortably on any Iron Legion node (MK7, MK33-42, Neo)
### Gitea Integration Notes
- Atlantis supports Gitea via the `--gitea-user`, `--gitea-token`, `--gitea-webhook-secret` flags
- Must expose Atlantis endpoint to Gitea (Tailscale funnel, reverse proxy, or LAN if Gitea is in-network)
- Webhook URL: `http://atlantis-host:4141/events`
---
## 9. Operational Workflow
### Day 0 — Bootstrap
```bash
# 1. Clone repo
git clone ssh://git@100.99.123.16:2222/Iron-Legion/terraform-proxmox-lxc.git
cd terraform-proxmox-lxc/environments/dev
# 2. Set credentials
export PROXMOX_VE_PASSWORD="your-pve-password"
# OR for API token:
export PROXMOX_VE_API_TOKEN="root@pam!mytoken=abc123"
# 3. Initialize
terraform init
# 4. Plan
terraform plan -out=tfplan
# 5. Apply
terraform apply tfplan
```
### Day N — Add a Container
1. Add entry to `lxc_configs` map in `environments/dev/main.tf`
2. `terraform plan` — review VM ID collision, IP conflict, storage capacity
3. `terraform apply`
4. Verify: `ssh root@<new-ip>`
### Day N — Destroy a Container
1. Remove entry from `lxc_configs` map
2. `terraform apply` — resource destroyed
3. Or: `terraform destroy -target='module.dev_lxcs.proxmox_virtual_environment_container.lxc["dev-nextcloud"]'`
---
## 10. Risks & Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| VM ID collision | Medium | High | Maintain a fleet-wide VM ID registry; use `proxmox_virtual_environment_vms` data source to check |
| IP overlap with DHCP pool | Medium | High | Reserve static IPs in Technitium DNS; use the `dns_a_record_set` data source (`hashicorp/dns` provider) to verify |
| Template download fails (slow mirror) | Low | Medium | Pre-seed templates on PVE nodes; use `pvesm` to verify before `apply` |
| State file corruption | Low | Critical | S3 versioning + periodic `terraform state pull` backups |
| Privilege escalation via privileged LXC | Low | High | Default `unprivileged = true`; explicit override required |
| Provider breaking change | Medium | Medium | Pin provider version `~> 0.108`; test upgrades in dev environment first |
---
## 11. Open Questions
1. **Do we pre-create cloud-image templates on each PVE node, or let Terraform download per-node?**
- Per-node: slower first deploy, but self-contained
- Pre-seeded: faster, requires manual `pvesm` or Ansible step
2. **Should LXCs register themselves in Technitium DNS via Terraform, or rely on DHCP + DNS integration?**
- Terraform can call a `dns_a_record` module (if Technitium provider exists)
- Or: use PVE's built-in DHCP + DNSMASQ if configured
3. **CI/CD pipeline: GitHub Actions runner, or local Gitea Actions on the fleet SCM host?**
- Gitea Actions keeps secrets in-network
- GitHub Actions requires Tailscale funnel or external exposure
4. **Do we want a dedicated LXC "Terraform runner" inside the cluster, or run from Artemis/operator workstation?**
- In-cluster runner: always has LAN access to PVE API
- External: requires Tailscale or VPN for API reachability
---
## 12. Appendix
### A. Provider Documentation Links
- **Registry:** https://registry.terraform.io/providers/bpg/proxmox/latest
- **GitHub:** https://github.com/bpg/terraform-provider-proxmox
- **LXC Resource Docs:** https://registry.terraform.io/providers/bpg/proxmox/latest/docs/resources/virtual_environment_container
- **Download File Resource:** https://registry.terraform.io/providers/bpg/proxmox/latest/docs/resources/virtual_environment_download_file
### B. Useful PVE CLI Commands (for verification)
```bash
# List containers on a node
pct list
# List templates
pvesm list local --content vztmpl
# Check datastore usage
pvesm status
# Enter a container
pct enter <vm_id>
```
### C. Terraform Commands Reference
```bash
terraform init # Download providers, configure backend
terraform validate # Syntax check
terraform plan # Preview changes
terraform apply # Execute changes
terraform destroy # Tear down everything
terraform state list # Show managed resources
terraform state show <addr> # Show one resource's attributes
terraform output # Display output values
terraform fmt -recursive # Format all .tf files
```
---
*End of PRD. Ready for Commander Bobby review and approval.*