# Fleet User Standard PRD
**Status:** Draft (pending Commander Bobby review)

**Author:** Artemis

**Date:** 2026-06-03

---
## 1. Purpose & Scope
This PRD defines the **canonical user account standard** for all Iron Legion fleet nodes. It eliminates UID/GID mismatches that cause permission failures in bind-mounted containers (VS Code Server, Paperclip, etc.) and ensures every node behaves identically for automation.

**In scope:**
- Canonical user `jarvis` — UID/GID, groups, home directory
- Container `PUID`/`PGID` mapping rules
- Provisioning enforcement (MAAS autoinstall, Ansible, manual install)
- Migration path for non-compliant nodes (MK7, Nebuchadnezzar)

**Out of scope:**
- Service-specific runtime users inside containers
- TrueNAS / external appliance user models (already documented separately)
---
## 2. Success Criteria
| # | Criterion | How Verified |
|---|-----------|-------------|
| 1 | Every fleet node has `jarvis` at UID 1000 / GID 1000 | `id jarvis` returns `uid=1000(jarvis) gid=1000(jarvis)` |
| 2 | No node has a competing UID 1000 user (e.g. "ubuntu") | `awk -F: '$3==1000 {print $1}' /etc/passwd` returns only "jarvis" |
| 3 | Container compose files use `PUID=1000` / `PGID=1000` without node-specific overrides | `grep -rnE 'P[UG]ID' /opt/iron-legion/docker-swarm/` shows only values of 1000 |
| 4 | MAAS/cloud-init autoinstall scripts create jarvis FIRST at UID 1000 | Inspect autoinstall user-data |
| 5 | Nebuchadnezzar + MK7 migrated to compliant state | Re-run audit script |
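
Criteria 1 and 2, plus the GID half of criterion 1, can be checked in a single pass. The function below is one possible shape for such a check, not necessarily the audit script named in criterion 5. It is a minimal sketch assuming bash and the standard colon-separated `/etc/passwd` and `/etc/group` layouts; it reads only local files (accounts served through NSS from LDAP or SSSD are not seen), and the name `audit_fleet_user` is hypothetical. The file paths are arguments, so the same check can run against copies collected from each node, e.g. `audit_fleet_user /etc/passwd /etc/group`.

```bash
# Hypothetical audit for success criteria 1 and 2; returns 0 only if the node is compliant.
# Usage: audit_fleet_user [passwd_file] [group_file]   (defaults: /etc/passwd, /etc/group)
audit_fleet_user() {
  local passwd="${1:-/etc/passwd}" group="${2:-/etc/group}" rc=0
  local jarvis_uid jarvis_gid uid_owners gid_owners

  # Criterion 1: jarvis exists with UID 1000 and primary GID 1000.
  jarvis_uid=$(awk -F: '$1=="jarvis" {print $3}' "$passwd")
  jarvis_gid=$(awk -F: '$1=="jarvis" {print $4}' "$passwd")
  if [[ "$jarvis_uid" != 1000 || "$jarvis_gid" != 1000 ]]; then
    echo "FAIL: jarvis uid=${jarvis_uid:-missing} gid=${jarvis_gid:-missing} (want 1000/1000)"
    rc=1
  fi

  # Criterion 2 (extended to groups): only jarvis may hold UID 1000 or GID 1000.
  uid_owners=$(awk -F: '$3==1000 {print $1}' "$passwd" | paste -sd, -)
  gid_owners=$(awk -F: '$3==1000 {print $1}' "$group" | paste -sd, -)
  if [[ "$uid_owners" != jarvis ]]; then
    echo "FAIL: UID 1000 held by: ${uid_owners:-(none)}"
    rc=1
  fi
  if [[ "$gid_owners" != jarvis ]]; then
    echo "FAIL: GID 1000 held by: ${gid_owners:-(none)}"
    rc=1
  fi

  if [[ $rc -eq 0 ]]; then
    echo "OK: jarvis is 1000:1000 and is the only holder of UID/GID 1000"
  fi
  return "$rc"
}
```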
---
## 3. The Standard
### 3.1 Canonical User: `jarvis`
```yaml
username: jarvis
uid: 1000
gid: 1000
home: /home/jarvis
shell: /bin/bash
groups: [sudo, docker] # node-local groups added post-provision
ssh_key_source: ~/.ssh/artemis_key.pub # deployed at provision time
```
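
The `groups` line can be verified after provisioning by reading each group's member list. This is a minimal sketch assuming bash and the standard `/etc/group` format; `check_jarvis_groups` is a hypothetical helper, and it only inspects supplementary membership (field 4), not primary-group membership. Because the `docker` group is normally created when Docker itself is installed, a freshly provisioned node is expected to fail this check until that post-provision step has run.

```bash
# Hypothetical check for the groups line in 3.1: jarvis must be a member of sudo and docker.
# Usage: check_jarvis_groups [group_file]   (default: /etc/group)
check_jarvis_groups() {
  local group="${1:-/etc/group}" g rc=0
  for g in sudo docker; do
    # Field 4 of /etc/group is the comma-separated supplementary member list.
    if ! awk -F: -v g="$g" '
        $1 == g { n = split($4, m, ","); for (i = 1; i <= n; i++) if (m[i] == "jarvis") found = 1 }
        END { exit !found }' "$group"; then
      echo "MISSING: jarvis is not a member of $g"
      rc=1
    fi
  done
  return "$rc"
}
```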
### 3.2 Container Mapping Rule
All LinuxServer.io and similar images MUST use:
```yaml
environment:
- PUID=1000
- PGID=1000
```
**No exceptions.** If a node cannot satisfy this, the node is non-compliant and must be migrated; the compose file is never changed to accommodate it.
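
The rule can be enforced mechanically by scanning compose files for any numeric `PUID`/`PGID` other than 1000. This is a minimal sketch assuming GNU grep, compose files named `*.yml` or `*.yaml`, and one assignment per line; `check_compose_ids` is a hypothetical helper. It covers both the list form (`- PUID=1000`) and the map form (`PUID: 1000`), but values injected through variables such as `${PUID}` are not checked.

```bash
# Hypothetical compose lint for section 3.2: fail if any numeric PUID/PGID is not 1000.
# Usage: check_compose_ids /opt/iron-legion/docker-swarm
check_compose_ids() {
  local dir="$1" hits bad
  [[ -d "$dir" ]] || { echo "no such directory: $dir" >&2; return 2; }
  # Every numeric PUID/PGID assignment ("PUID=1000" or "PUID: 1000"), prefixed with file:line.
  hits=$(grep -rEn --include='*.yml' --include='*.yaml' \
    '\bP[UG]ID[=:][[:space:]]*"?[0-9]+' "$dir")
  # Drop assignments whose value is exactly 1000; whatever remains is a violation.
  bad=$(printf '%s\n' "$hits" | grep -Ev '\bP[UG]ID[=:][[:space:]]*"?1000([^0-9]|$)' | grep -v '^$')
  if [[ -n "$bad" ]]; then
    printf 'non-standard PUID/PGID:\n%s\n' "$bad"
    return 1
  fi
  return 0
}
```

Because it prints `file:line` for each violation and exits non-zero, a check like this could serve as a pre-deploy gate for criterion 3.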
### 3.3 Provisioning Enforcement
| Provision Method | Enforcement |
|----------------|-------------|
| **Manual install** | `groupadd -g 1000 jarvis`, then `useradd -m -u 1000 -g 1000 -s /bin/bash jarvis`, before any other human user is created |
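| **MAAS autoinstall** | Subiquity `identity` section (or the cloud-init `users` list) MUST create `jarvis` at UID 1000 **before** cloud-init creates "ubuntu"; leaving `default` out of the cloud-init `users` list prevents "ubuntu" from being created at all |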
| **Ansible playbook** | `ansible.builtin.group` with `name: jarvis`, `gid: 1000`, then `ansible.builtin.user` with `name: jarvis`, `uid: 1000`, `group: jarvis` |
| **Docker host (Nebuchadnezzar)** | Base image or `useradd` in Dockerfile prior to app user creation |
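
For the manual-install row, a guarded version of the `useradd` step can refuse to run when UID or GID 1000 is already taken, which is the failure mode behind the MK7 and Nebuchadnezzar drift. This is a minimal sketch assuming bash, root privileges, and the shadow-utils `groupadd`/`useradd` tools shipped with Ubuntu; `provision_jarvis`, the `APPLY` switch, and the `PASSWD_FILE`/`GROUP_FILE` overrides are hypothetical. Without `APPLY=1` it only prints the commands it would run. The `docker` group is left out because it normally appears only once Docker is installed, after which `usermod -aG docker jarvis` adds it.

```bash
# Hypothetical guarded provisioning step for the "Manual install" row.
# Prints commands unless APPLY=1; reads current ID owners from PASSWD_FILE / GROUP_FILE.
provision_jarvis() {
  local passwd="${PASSWD_FILE:-/etc/passwd}" group="${GROUP_FILE:-/etc/group}"
  local uid_owner gid_owner
  uid_owner=$(awk -F: '$3==1000 {print $1; exit}' "$passwd")
  gid_owner=$(awk -F: '$3==1000 {print $1; exit}' "$group")

  # Refuse to continue if any other account already holds UID or GID 1000.
  if [[ -n "$uid_owner" && "$uid_owner" != jarvis ]] || [[ -n "$gid_owner" && "$gid_owner" != jarvis ]]; then
    echo "ABORT: UID 1000 owner='${uid_owner}', GID 1000 owner='${gid_owner}'" >&2
    return 1
  fi
  if [[ "$uid_owner" == jarvis ]]; then
    echo "jarvis already holds UID 1000; nothing to do"
    return 0
  fi

  local run=()
  [[ "${APPLY:-0}" == 1 ]] || run=(echo)   # dry run: print instead of execute
  if [[ -z "$gid_owner" ]]; then
    "${run[@]}" groupadd -g 1000 jarvis
  fi
  "${run[@]}" useradd -m -u 1000 -g 1000 -s /bin/bash -G sudo jarvis
}
```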
---
## 4. Fleet Audit Results (Current State)
| Node | jarvis UID | Competing UID 1000 | Status |
|------|-----------|-------------------|--------|
| artemis | 1000 | None | ✅ Compliant |
| mark44 | 1000 | None | ✅ Compliant |
| mark5 | 1000 | None | ✅ Compliant |
| mk42 | 1000 | None | ✅ Compliant |
| shield | 1000 | None | ✅ Compliant |
| igor | 1000 | None | ✅ Compliant |
| truenas | 1000 | None | ✅ Compliant |
| **mk7** | **1001** | **ubuntu 1000** | ⚠️ **Non-compliant** |
| **nebuchadnezzar** | **1002** | **ubuntu 1000, caddy 1001** | ⚠️ **Non-compliant** |

**Root cause:** MK7 and Nebuchadnezzar were provisioned via cloud-init/MAAS, which created "ubuntu" at UID 1000 before jarvis was added. All manually built nodes are clean.
---
## 5. Remediation Plan
### 5.1 MK7
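1. Delete the `ubuntu` user and its group, or move both to an unused ID above 2000 (not 65534, which is reserved for `nobody`/`nogroup`)
2. Change `jarvis` UID from 1001 → 1000, and its primary group GID to 1000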
3. `chown -R jarvis:jarvis /home/jarvis`
4. Update VS Code Server container ownership: `chown -R jarvis:jarvis /home/jarvis/.vscode-ssh`
5. Verify compose still works with `PUID=1000`
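
Steps 1 to 3 above could be scripted roughly as follows. This is a minimal sketch assuming bash, root, the shadow-utils `userdel`/`groupmod`/`usermod` tools, that `jarvis`'s current primary GID equals its old UID, that nothing in the `ubuntu` home directory needs to be kept, and that it runs from a console or a separate root session, since `usermod -u` refuses to change a user who still has running processes. `remediate_jarvis_ids` and the `APPLY` switch are hypothetical; without `APPLY=1` it only prints the commands. `usermod -u` already re-owns files inside the home directory; the two `find` passes catch files elsewhere on the root filesystem (for example bind-mount sources) that still carry the old IDs. Any file outside `/home/ubuntu` still owned by UID 1000 will silently become `jarvis`'s once `ubuntu` is gone, so review `find / -xdev -uid 1000` before applying.

```bash
# Hypothetical remediation for a node where ubuntu holds 1000 and jarvis holds OLD_ID.
# Usage: remediate_jarvis_ids 1001   (prints commands; set APPLY=1 to execute)
remediate_jarvis_ids() {
  local old_id="$1"
  [[ -n "$old_id" ]] || { echo "usage: remediate_jarvis_ids OLD_ID" >&2; return 2; }
  local run=()
  [[ "${APPLY:-0}" == 1 ]] || run=(echo)   # dry run unless APPLY=1

  # Step 1: free UID 1000. With USERGROUPS_ENAB=yes (Ubuntu default), userdel also
  # removes the same-named "ubuntu" group if no other user has it as primary group.
  "${run[@]}" userdel -r ubuntu
  # Step 2: move jarvis's group and user to 1000:1000.
  "${run[@]}" groupmod -g 1000 jarvis
  "${run[@]}" usermod -u 1000 -g 1000 jarvis
  # Step 3: re-own the home directory, then anything on / still owned by the old IDs.
  "${run[@]}" chown -R jarvis:jarvis /home/jarvis
  "${run[@]}" find / -xdev -uid "$old_id" -exec chown -h 1000 {} +
  "${run[@]}" find / -xdev -gid "$old_id" -exec chgrp -h 1000 {} +
}
```

On Nebuchadnezzar the same sequence would apply with `1002`, once the `caddy` account has been dealt with.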
### 5.2 Nebuchadnezzar
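1. Delete the `ubuntu` user and its group, or move both to an unused ID above 2000 (not 65534, which is reserved for `nobody`/`nogroup`)
2. Move the `caddy` user to an unused UID above 2000, or remove it
3. Change `jarvis` UID from 1002 → 1000, and its primary group GID to 1000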
4. `chown -R jarvis:jarvis /home/jarvis`
5. Audit any container bind mounts for ownership drift
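
Step 5 can start from a listing of everything under a bind-mount source that is not owned by 1000:1000. This is a minimal sketch assuming GNU `find` (for `-printf`); `find_ownership_drift` is a hypothetical helper whose expected UID/GID default to the standard but can be overridden, and `/opt/iron-legion/docker-swarm` is only an example root; point it at wherever the bind-mount sources actually live. Paths a container intentionally owns under another ID (a database data directory, for instance) will also be listed and need a judgment call rather than a blanket `chown`.

```bash
# Hypothetical drift check: list paths under DIR whose owner or group differs from UID:GID.
# Usage: find_ownership_drift /opt/iron-legion/docker-swarm [uid] [gid]   (defaults: 1000 1000)
find_ownership_drift() {
  local dir="$1" uid="${2:-1000}" gid="${3:-1000}"
  # -xdev stays on one filesystem; %U and %G print the numeric owner and group.
  find "$dir" -xdev \( ! -uid "$uid" -o ! -gid "$gid" \) -printf '%U:%G %p\n'
}
```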
---
## 6. Open Questions
1. **Should we document this in the MAAS curtin preseed** so new PXE-built nodes are auto-compliant?
2. **Should we add a fleet-wide Ansible user-enforcement task** that fails the playbook if UID 1000 ≠ jarvis?
3. **Is the TrueNAS user model** (jarvis=1000, jumpbox=3000, bobby=3001) an exception we keep, or do we align TrueNAS too?
---
## 7. Gitea Branch Protection Setup (For Draft → Canon Workflow)
To enforce peer review for PRDs and all documentation:
1. **Gitea UI** → Iron-Legion/documentation → Settings → Branches → `main` → **Add Protection Rule**
2. Enable:
   - **Enable branch protection**
   - **Require pull request reviews** → Minimum approvers: **1**
   - **Dismiss stale approvals when new commits are pushed**
   - **Block merge if required reviewers not approved**
3. This forces every PR to have at least one human review before merge.

Once enabled:
- Draft PRDs go to `PRD Drafts/` via fork + PR
- Approved PRDs get moved to `PRDs/` (canonical) in the approval commit
- All operational docs follow the same fork → PR → review → merge flow
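
The rule from steps 1 and 2 can also be created through Gitea's REST API, which helps when several repositories need identical protection. This is a minimal sketch assuming bash, `curl`, a Gitea access token in `GITEA_TOKEN`, and the `POST /api/v1/repos/{owner}/{repo}/branch_protections` endpoint. The field names (`branch_name`, `required_approvals`, `dismiss_stale_approvals`, `block_on_rejected_reviews`) are assumed from Gitea's branch-protection options and should be confirmed against the Swagger page your Gitea version serves at `/api/swagger`. `GITEA_URL`, the example host `gitea.example`, and both helper names are hypothetical. `block_on_rejected_reviews` is only the closest assumed counterpart to "Block merge if required reviewers not approved"; the approval minimum itself is enforced through `required_approvals`.

```bash
# Hypothetical helpers: build and submit the section 7 branch-protection rule via the Gitea API.
# Field names are assumed; verify them at $GITEA_URL/api/swagger for your Gitea version.
branch_protection_payload() {
  local branch="${1:-main}"
  printf '{"branch_name":"%s","required_approvals":1,"dismiss_stale_approvals":true,"block_on_rejected_reviews":true}\n' "$branch"
}

# Usage: GITEA_URL=https://gitea.example GITEA_TOKEN=... protect_branch Iron-Legion documentation main
protect_branch() {
  local owner="$1" repo="$2" branch="${3:-main}"
  curl -fsS -X POST \
    -H "Authorization: token ${GITEA_TOKEN:?set GITEA_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$(branch_protection_payload "$branch")" \
    "${GITEA_URL:?set GITEA_URL}/api/v1/repos/${owner}/${repo}/branch_protections"
}
```

Keeping the payload in its own function makes it easy to inspect (`branch_protection_payload main`) before anything is sent to the server.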