# PVE 3-Node HA Cluster for Iron Legion
**Status:** Draft | **Author:** Artemis | **Date:** 2026-06-04
## 1. Objective
Configure MK33, MK34, and MK39 as a three-node Proxmox VE cluster with shared NFS storage from TrueNAS. Enable manual migration of guests between nodes (live for VMs, restart-mode for LXCs), and optionally automatic HA failover for critical workloads.
## 2. Current State
| Node | CPU | RAM | Storage | Role |
|------|-----|-----|---------|------|
| MK33 (Silver Centurion) | Intel N150 4c/4t | 16GB | Local SSD | PVE HA |
| MK34 (Southpaw) | Intel N150 4c/4t | 16GB | Local SSD | PVE HA |
| MK39 (Gemini) | Intel N150 4c/4t | 16GB | Local SSD | PVE HA (spare) |
| TrueNAS SCALE | 4c | 11GB | HDD pool | NFS server |
All nodes on `192.168.0.0/18`. TrueNAS at `192.168.16.254`.
## 3. Architecture
### 3.1 Cluster Model: Proxmox 3-Node Cluster (No Ceph)
```
MK33 (192.168.7.33) ──┐
├─ Corosync Ring ── Shared NFS (TrueNAS)
MK34 (192.168.7.34) ──┤
MK39 (192.168.7.39) ──┘
```
- **Quorum:** A 3-node cluster needs 2 votes for quorum, so it tolerates one node failure: if one node dies, the remaining 2 stay quorate.
- **Shared storage:** TrueNAS NFSv4.2 export `/mnt/Ice/Backup`
- **HA manager:** Proxmox HA services (`pve-ha-crm`, `pve-ha-lrm`) for automatic restart
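
The quorum arithmetic above can be sketched as a strict majority of expected votes. This is a minimal illustration that assumes one vote per node (the Corosync default); the function names are hypothetical.

```shell
# Votes required for quorum: a strict majority of the expected votes.
quorum_votes() {
  echo $(( $1 / 2 + 1 ))
}

# Number of nodes that can fail while the remainder still holds quorum.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for nodes in 2 3; do
  echo "nodes=$nodes quorum=$(quorum_votes "$nodes") tolerated_failures=$(tolerated_failures "$nodes")"
done
```

With 3 nodes the cluster tolerates one failure; with 2 nodes it tolerates none, which is why a plain 2-node setup would need a witness.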
### 3.2 Storage Flow
```
Build on local disk → Test workload → Shutdown → Move disk to NFS → Restart on NFS
If node fails: HA manager detects → Restarts VM/LXC on surviving node (same NFS disk)
```
### 3.3 Workload Planning
| Type | Count per node | Resources each |
|------|---------------|----------------|
| VM (general) | 1 | 4 vCPU, 4096 MB RAM |
| LXC (lightweight) | 5–10 | 1 vCPU, 512–1024 MB RAM |
**Total per node estimated:** 9–14 vCPUs (the N150 is 4c/4t, so guests oversubscribe the cores; LXCs share them opportunistically via cgroups)
**Total RAM per node:** VM 4GB + 5×1GB LXCs = ~9GB allocated, 7GB headroom
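
The per-node totals can be checked with a little shell arithmetic. This sketch uses the figures from the tables above and assumes every LXC takes the upper bound of 1024 MB; the variable and function names are illustrative only.

```shell
NODE_RAM_MB=16384   # 16GB per node (Section 2)
VM_RAM_MB=4096      # one general VM per node
VM_VCPUS=4
LXC_RAM_MB=1024     # assumed upper bound per LXC
LXC_VCPUS=1

# Print allocated vCPUs, allocated RAM, and remaining RAM for a given LXC count.
node_budget() {
  lxc_count=$1
  vcpus=$(( VM_VCPUS + lxc_count * LXC_VCPUS ))
  ram=$(( VM_RAM_MB + lxc_count * LXC_RAM_MB ))
  echo "lxcs=$lxc_count vcpus=$vcpus ram_mb=$ram headroom_mb=$(( NODE_RAM_MB - ram ))"
}

node_budget 5
node_budget 10
```

At 5 LXCs this gives 9 vCPUs and 9216 MB allocated with 7168 MB of headroom. At 10 LXCs it gives 14 vCPUs and only 2048 MB of headroom, which leaves little room to absorb guests from a failed node.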
## 4. Pros vs Cons
### 4.1 3-Node Cluster (Recommended)
**Pros:**
- Unified web UI for all 3 nodes from any one node
- Live migration of VMs between nodes (near-zero downtime); LXCs migrate in restart mode with a brief stop/start
- Automatic HA failover for critical VMs/LXCs
- Quorum maintained with 2 of 3 nodes online
- Shared NFS storage means VMs are portable across nodes
**Cons:**
- Corosync ring traffic adds minor network overhead
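- If 2 nodes fail simultaneously, the surviving node loses quorum: `/etc/pve` becomes read-only and guests cannot be started or failed over until quorum returns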
- HA failover is restart (brief downtime), not live migration
- The N150 CPU is modest: 3 VMs + 15 LXCs across the cluster is tight but workable
### 4.2 Standalone Nodes (Current)
**Pros:**
- Simple, no cluster complexity
- Node failure doesn't affect others
- No Corosync network overhead
**Cons:**
- No live migration — moving a VM requires export/import
- No automatic failover — manual intervention if node dies
- 3 separate web UIs to manage
## 5. Implementation Plan
### Phase 1: Cluster Formation
1. Add all 3 nodes to `/etc/hosts` on each node (or DNS via Technitium)
2. On MK33: `pvecm create iron-legion`
3. On MK34/MK39: `pvecm add 192.168.7.33`
4. Verify: `pvecm status` shows 3 nodes, 3 expected votes, a quorum of 2, and the cluster as quorate
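
The Phase 1 steps might look like the following sketch. The lowercase hostnames (`mk33`, `mk34`, `mk39`) are assumed to match the node names used later in the HA groups, and the `run` wrapper is a hypothetical helper that only prints commands unless `DRY_RUN=0` is set on a real PVE node.

```shell
# Dry-run by default: print each command instead of executing it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hostname-to-IP entries for /etc/hosts (hostnames are assumptions).
hosts_entries() {
  printf '%s\n' \
    "192.168.7.33 mk33" \
    "192.168.7.34 mk34" \
    "192.168.7.39 mk39"
}

form_cluster() {
  echo "# on mk33"
  run pvecm create iron-legion
  echo "# on mk34 and mk39 (asks for the root password of mk33)"
  run pvecm add 192.168.7.33
  echo "# on any node"
  run pvecm status
}

hosts_entries
form_cluster
```

On each node, the output of `hosts_entries` would be appended to `/etc/hosts` before the cluster is created.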
### Phase 2: NFS Storage Setup
1. Ensure TrueNAS exports `/mnt/Ice/Backup` with:
- NFSv4.2
- `maproot` or `mapall` to `root` (PVE nodes need root access)
- ACL allows `192.168.0.0/18`
2. On PVE Datacenter → Storage → Add → NFS:
- ID: `truenas-backup`
- Server: `192.168.16.254`
- Export: `/mnt/Ice/Backup`
- Content: `images,rootdir`
3. Verify storage shows on all 3 nodes
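
As a CLI alternative to the web UI steps above, the storage could be added with `pvesm`. This is a sketch under the same values listed in step 2; passing `vers=4.2` through `--options` is an assumption about how the NFS version would be pinned, and `run` is a hypothetical dry-run helper.

```shell
# Dry-run by default: print each command instead of executing it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Storage definitions are cluster-wide, so this runs once on any node.
add_nfs_storage() {
  run pvesm add nfs truenas-backup \
    --server 192.168.16.254 \
    --export /mnt/Ice/Backup \
    --content images,rootdir \
    --options vers=4.2
}

# Run on each node to confirm the storage is active there.
check_nfs_storage() {
  run pvesm status --storage truenas-backup
}

add_nfs_storage
check_nfs_storage
```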
### Phase 3: HA Configuration
1. Proxmox HA → Add groups:
- `critical`: nodes mk33,mk34,mk39 (any node)
- `local-only`: single-node constraint for local-disk VMs
2. For each VM/LXC on NFS storage:
- Datacenter → HA → Add → Select VM → Group `critical` → Start on any
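3. Confirm fencing: PVE HA fences nodes through a watchdog and falls back to the kernel `softdog` when no hardware watchdog is configured, so IPMI is not required on the N150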
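
The group and resource setup could also be done with `ha-manager`. This sketch assumes a PVE release that still uses HA groups (newer releases may manage placement through HA rules instead); the container ID `101` is hypothetical, and `run` is a dry-run helper that only prints commands unless `DRY_RUN=0` is set.

```shell
# Dry-run by default: print each command instead of executing it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

CTID=101   # hypothetical container ID on NFS storage

configure_ha() {
  # Group that allows placement on any of the three nodes.
  run ha-manager groupadd critical --nodes mk33,mk34,mk39
  # Register the container as an HA resource that should be kept running.
  run ha-manager add "ct:$CTID" --group critical --state started
  run ha-manager status
}

configure_ha
```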
### Phase 4: Workload Migration Testing
1. Build a test LXC on local storage
2. Migrate disk to NFS: `Move disk` → target `truenas-backup`
3. Verify LXC starts from NFS
4. Test migration: right-click → Migrate → select target node (LXCs migrate in restart mode; only VMs migrate live)
5. Test HA failover: power off source node, verify restart on surviving node
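
A command-line version of the Phase 4 test might look like this. The container ID `101`, the `rootfs` volume name, and the target node `mk34` are assumptions for illustration; `run` is a hypothetical dry-run helper, so nothing executes unless `DRY_RUN=0` is set on a PVE node.

```shell
# Dry-run by default: print each command instead of executing it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

CTID=101      # hypothetical test container ID
TARGET=mk34   # hypothetical target node

migration_test() {
  # Move the container's root volume from local storage to NFS while it is stopped.
  run pct shutdown "$CTID"
  run pct move-volume "$CTID" rootfs truenas-backup
  run pct start "$CTID"
  # Containers migrate in restart mode: stop, relocate, start on the target.
  run pct migrate "$CTID" "$TARGET" --restart 1
  # After powering off the source node, watch where the HA manager restarts it.
  run ha-manager status
}

migration_test
```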
## 6. Open Questions
1. Do we need HA fencing? (IPMI is not available on the N150; watchdog only)
2. Should we reserve one node as "management" and only run LXCs on two?
3. What's the Tailscale story — do we bind Corosync to LAN only or also Tailscale?
## 7. Decision Points
| Decision | Option A | Option B |
|----------|----------|----------|
| Cluster type | 3-node with quorum (recommended) | 2-node + witness (not recommended) |
| HA level | Manual migration only | Full HA with auto-restart |
| Storage | NFS only (current) | Add local Ceph later |
| Resource reserve | 1 node mostly idle | Distribute evenly |
---
**Awaiting Commander Bobby review and approval.**