PVE 3-Node HA Cluster for Iron Legion

Status: Draft | Author: Artemis | Date: 2026-06-04

1. Objective

Configure MK33, MK34, and MK39 as a Proxmox VE 3-node cluster with shared NFS storage from TrueNAS. Enable manual migration of guests between nodes (live for VMs, restart-mode for LXCs), and optionally automatic HA failover for critical workloads.

2. Current State

| Node | CPU | RAM | Storage | Role |
|---|---|---|---|---|
| MK33 (Silver Centurion) | Intel N150 4c/4t | 16GB | Local SSD | PVE HA |
| MK34 (Southpaw) | Intel N150 4c/4t | 16GB | Local SSD | PVE HA |
| MK39 (Gemini) | Intel N150 4c/4t | 16GB | Local SSD | PVE HA (spare) |
| TrueNAS SCALE | 4c | 11GB | HDD pool | NFS server |

All nodes on 192.168.0.0/18. TrueNAS at 192.168.16.254.

3. Architecture

3.1 Cluster Model: Proxmox 3-Node Cluster (No Ceph)

MK33 (192.168.7.33) ──┐
                       ├─ Corosync Ring ── Shared NFS (TrueNAS)
MK34 (192.168.7.34) ──┤
                       │
MK39 (192.168.7.39) ──┘
  • Quorum: each node has one vote and 2 of 3 votes are needed for quorum, so the cluster stays quorate if any single node dies.
  • Shared storage: TrueNAS NFSv4.2 export /mnt/Ice/Backup
  • HA manager: Proxmox HA services (pve-ha-crm, pve-ha-lrm) for automatic restart
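The quorum arithmetic behind the bullet above can be sketched in a few lines of shell. This assumes the default of one vote per node and no external witness (QDevice); the function names are illustrative only.

```shell
#!/bin/sh
# Votes needed for quorum with one vote per node: floor(n / 2) + 1.
quorum_votes() {
    echo $(( $1 / 2 + 1 ))
}

# Node failures the cluster can absorb while staying quorate.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

echo "3 nodes: quorum=$(quorum_votes 3), tolerated failures=$(tolerated_failures 3)"
echo "2 nodes: quorum=$(quorum_votes 2), tolerated failures=$(tolerated_failures 2)"
```

A 2-node cluster tolerates no failures under this rule, which is why that layout would need an extra witness vote.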

3.2 Storage Flow

Build on local disk → Test workload → Shutdown → Move disk to NFS → Restart on NFS
                                          ↓
If node fails: HA manager detects → Restarts VM/LXC on surviving node (same NFS disk)

3.3 Workload Planning

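| Type | Count per node | Resources each |
|---|---|---|
| VM (general) | 1 | 4 vCPU, 4096 MB RAM |
| LXC (lightweight) | 5–10 | 1 vCPU, 512–1024 MB RAM |

Total vCPUs per node (estimated): 9–14. The N150 is 4c/4t, so vCPUs are overcommitted; LXCs share cores opportunistically via cgroups.
Total RAM per node: 4 GB VM + 5 × 1 GB LXCs = ~9 GB allocated, leaving ~7 GB headroom. At 10 × 1 GB LXCs the allocation rises to ~14 GB, leaving ~2 GB.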
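As a quick check of the RAM figures, the sketch below recomputes per-node headroom for a few LXC mixes. It assumes 16 GB means 16384 MB and ignores whatever the PVE host itself consumes, so real headroom will be lower; the function name is illustrative.

```shell
#!/bin/sh
# Per-node RAM budget in MB, using the figures from the planning table.
NODE_RAM_MB=16384      # 16 GB per node
VM_RAM_MB=4096         # one general-purpose VM

# ram_headroom_mb LXC_COUNT LXC_RAM_MB -> MB left after the VM and the LXCs
ram_headroom_mb() {
    echo $(( NODE_RAM_MB - VM_RAM_MB - $1 * $2 ))
}

echo "5 LXCs x 1024 MB:  $(ram_headroom_mb 5 1024) MB headroom"
echo "10 LXCs x 512 MB:  $(ram_headroom_mb 10 512) MB headroom"
echo "10 LXCs x 1024 MB: $(ram_headroom_mb 10 1024) MB headroom"
```

The worst case matters for failover: a surviving node that has to absorb another node's guests needs that headroom to be free.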

4. Pros vs Cons

4.1 3-Node Cluster (Proposed)

Pros:

  • Unified web UI for all 3 nodes from any one node
  • Live migration of VMs between nodes with no planned downtime; LXCs migrate in restart mode with a brief outage
  • Automatic HA failover for critical VMs/LXCs
  • Quorum maintained with 2 of 3 nodes online
  • Shared NFS storage means VMs are portable across nodes

Cons:

  • Corosync ring traffic adds minor network overhead
  • If 2 nodes fail simultaneously, the survivor loses quorum: /etc/pve becomes read-only and HA will not start or recover guests until a second node returns
  • HA failover is restart (brief downtime), not live migration
  • The N150 CPU is modest: 3 VMs + 15 LXCs across the cluster is tight but workable

4.2 Standalone Nodes (Current)

Pros:

  • Simple, no cluster complexity
  • Node failure doesn't affect others
  • No Corosync network overhead

Cons:

  • No live migration — moving a VM requires export/import
  • No automatic failover — manual intervention if node dies
  • 3 separate web UIs to manage

5. Implementation Plan

Phase 1: Cluster Formation

  1. Add all 3 nodes to /etc/hosts on each node (or DNS via Technitium)
  2. On MK33: pvecm create iron-legion
  3. On MK34/MK39: pvecm add 192.168.7.33
  4. Verify: pvecm status lists 3 nodes and reports Expected votes 3, Quorum 2, Quorate: Yes
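For step 1, a small idempotent helper can keep the host entries identical on every node. This is a sketch only: the lowercase hostnames mk33, mk34, and mk39 are assumed from the HA group definition later in this document, and the function name is hypothetical.

```shell
#!/bin/sh
# add_hosts_entries FILE: append each cluster node to FILE unless the exact
# "IP hostname" line is already present, so the function is safe to re-run.
add_hosts_entries() {
    hosts_file=$1
    while read -r ip name; do
        grep -qxF "$ip $name" "$hosts_file" 2>/dev/null \
            || printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    done <<EOF
192.168.7.33 mk33
192.168.7.34 mk34
192.168.7.39 mk39
EOF
}

# Example: write to a scratch file first and review it before touching /etc/hosts.
# add_hosts_entries /tmp/hosts.preview
```

Running it against a preview file first avoids clobbering existing entries that use a different format.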

Phase 2: NFS Storage Setup

  1. Ensure TrueNAS exports /mnt/Ice/Backup with:
    • NFSv4.2
    • maproot or mapall to root (PVE nodes need root access)
    • ACL allows 192.168.0.0/18
  2. On PVE Datacenter → Storage → Add → NFS:
    • ID: truenas-backup
    • Server: 192.168.16.254
    • Export: /mnt/Ice/Backup
    • Content: images,rootdir
  3. Verify storage shows on all 3 nodes
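The GUI steps in item 2 have a command-line equivalent in pvesm. The sketch below prints the command by default rather than running it; the `run` helper and the `DRY_RUN` variable are illustrative conventions, not part of Proxmox, and `vers=4.2` is passed as a mount option on the assumption that the TrueNAS export accepts it.

```shell
#!/bin/sh
# Prints the command by default; set DRY_RUN=0 on a PVE node to execute it.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Storage definitions live in the cluster-wide config, so one node is enough.
add_nfs_storage() {
    run pvesm add nfs truenas-backup \
        --server 192.168.16.254 \
        --export /mnt/Ice/Backup \
        --content images,rootdir \
        --options vers=4.2
}

add_nfs_storage
# Afterwards, `pvesm status` on each node should list truenas-backup as active.
```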

Phase 3: HA Configuration

  1. Proxmox HA → Add groups:
    • critical: nodes mk33,mk34,mk39 (any node)
    • local-only: single-node constraint for local-disk VMs
  2. For each VM/LXC on NFS storage:
    • Datacenter → HA → Add → Select VM → Group critical → Start on any
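  3. Confirm fencing: PVE HA self-fences through a watchdog and falls back to the kernel softdog when no hardware watchdog is configured (IPMI is not available on these nodes)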
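The same HA setup can be sketched with ha-manager. This assumes a PVE release that still uses HA groups (newer releases may express node placement as HA rules instead), and the guest IDs 100 and 101 are hypothetical placeholders. Commands are printed rather than executed unless `DRY_RUN=0` is set; `run` is an illustrative helper.

```shell
#!/bin/sh
# Prints each command by default; set DRY_RUN=0 on a PVE node to execute.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

configure_ha() {
    # Group spanning all three nodes, as in step 1.
    run ha-manager groupadd critical --nodes mk33,mk34,mk39
    # Hypothetical guest IDs: VM 100 and container 101, both with disks on NFS.
    run ha-manager add vm:100 --group critical --state started
    run ha-manager add ct:101 --group critical --state started
    # Show the resulting resource and node state.
    run ha-manager status
}

configure_ha
```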

Phase 4: Workload Migration Testing

  1. Build a test LXC on local storage
  2. Migrate disk to NFS: Move disk → target truenas-backup
  3. Verify LXC starts from NFS
  4. Test migration: right-click → Migrate → select target node (LXCs migrate in restart mode; use a VM to exercise live migration)
  5. Test HA failover: power off source node, verify restart on surviving node
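Steps 2 to 4 can also be driven from the shell. In this sketch the guest IDs (container 101, VM 100) and the target node mk34 are hypothetical, and the container is assumed to keep its root filesystem on the default `rootfs` volume. Commands are only printed unless `DRY_RUN=0` is set; `run` is an illustrative helper.

```shell
#!/bin/sh
# Prints each command by default; set DRY_RUN=0 on a PVE node to execute.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

migration_test() {
    # Move the stopped container's root volume to the NFS storage.
    run pct shutdown 101
    run pct move-volume 101 rootfs truenas-backup
    run pct start 101
    # Containers migrate in restart mode: stop, move, start on the target.
    run pct migrate 101 mk34 --restart
    # VMs on shared storage can migrate while running.
    run qm migrate 100 mk34 --online
}

migration_test
```

The failover test in step 5 is deliberately left manual, since it involves powering off a node.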

6. Open Questions

  1. Do we need HA fencing? (IPMI is not available on the N150, so watchdog only)
  2. Should we reserve one node as "management" and only run LXCs on two?
  3. What's the Tailscale story — do we bind Corosync to LAN only or also Tailscale?

7. Decision Points

| Decision | Option A | Option B |
|---|---|---|
| Cluster type | 3-node with quorum (recommended) | 2-node + witness (not recommended) |
| HA level | Manual migration only | Full HA with auto-restart |
| Storage | NFS only (current) | Add local Ceph later |
| Resource reserve | 1 node mostly idle | Distribute evenly |

Awaiting Commander Bobby review and approval.