Add fleet operational reports
- mk7-service-restoration-report.md: Restored Swarm stacks after relocation, fixed NTP drift, rejoined MK-42 as worker
- netbird-evaluation-report.md: Full evaluation of self-hosted Netbird control plane for Tailscale coexistence/replacement

Author: F.R.I.D.A.Y.
reports/mk7-service-restoration-report.md
# MK7 Service Restoration Report

**Date:** 2026-06-01
**Author:** F.R.I.D.A.Y.
**Status:** All services restored online

---

## Problem

MK7 (Swarm manager, 192.168.7.7) had all Docker Swarm stacks stopped after its physical relocation. Only the `pegaprox` stack, left over from an earlier manual deployment, was still running. The primary services (Traefik, Technitium, Portainer, n8n, Homepage, Beszel, Dozzle, Authelia, Prometheus, node-exporter) were all offline.

---

## Root Causes

1. **Primary cause:** MK7 was physically relocated. Its Docker Swarm services were intentionally stopped during the migration and never restarted.
2. **Secondary cause (Authelia failure):** When the services were redeployed, Authelia crashed because the host clock was not synchronized. `systemd-timesyncd` was pointing at the stale NTP server `192.168.128.33` (Shield PXE DHCP drift), so Authelia's startup NTP check failed (see the log in Phase 2).
3. **Configuration drift:** `/etc/systemd/timesyncd.conf.d/` contained a cloud-init NTP drop-in pointing at the wrong IP.

---

## Actions Taken

### Phase 1: Service Redeployment

Located the compose files under `/opt/iron-legion/docker-swarm/` and deployed each stack individually:

```bash
# Deployed stacks (compose paths are relative to the stack root)
cd /opt/iron-legion/docker-swarm
docker stack deploy -c traefik/compose.yml traefik
docker stack deploy -c portainer/compose.yml portainer
docker stack deploy -c technitium/compose.yml technitium
docker stack deploy -c homepage/compose.yml homepage
docker stack deploy -c n8n/n8n-stack.yml n8n
docker stack deploy -c beszel/compose.yml beszel
docker stack deploy -c dozzle/compose.yml dozzle
docker stack deploy -c authelia/compose.yml authelia
docker stack deploy -c prometheus/compose.yml prometheus
docker stack deploy -c node-exporter/compose.yml node-exporter
```

All stacks converged successfully.

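The per-stack commands above can be wrapped in a re-runnable helper so the next relocation does not depend on remembering the order. This is a minimal sketch, not the existing `deploy.sh`: `compose_file_for`, `deploy_all`, `STACK_ROOT` and `DRY_RUN` are hypothetical names, and it assumes every stack except n8n keeps its file at `<stack>/compose.yml`, as in Phase 1.

```bash
#!/usr/bin/env bash
# Hypothetical redeploy helper (not the existing deploy.sh).
# Assumes each stack lives in $STACK_ROOT/<stack>/ as used in Phase 1.

STACK_ROOT="${STACK_ROOT:-/opt/iron-legion/docker-swarm}"

# Deployment order used during the 2026-06-01 restoration.
STACKS=(traefik portainer technitium homepage n8n beszel dozzle authelia prometheus node-exporter)

# Map a stack name to its compose file; n8n uses a non-standard file name.
compose_file_for() {
  case "$1" in
    n8n) echo "n8n/n8n-stack.yml" ;;
    *)   echo "$1/compose.yml" ;;
  esac
}

# Deploy every stack in order. With DRY_RUN=1, only print the commands.
deploy_all() {
  local stack file
  for stack in "${STACKS[@]}"; do
    file="$STACK_ROOT/$(compose_file_for "$stack")"
    if [[ "${DRY_RUN:-0}" == "1" ]]; then
      echo "docker stack deploy -c $file $stack"
    else
      docker stack deploy -c "$file" "$stack" || return 1
    fi
  done
}

# Usage on MK7: DRY_RUN=1 deploy_all   (review the output, then run deploy_all)
```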
### Phase 2: NTP / Authelia Fix

**Problem identified:** Authelia container logs showed:
```
error="the system clock is not synchronized accurately enough with the configured NTP server" provider=ntp
```

**Investigation:**
```bash
systemctl status systemd-timesyncd
# Status: "Connecting to time server 192.168.128.33:123"
```

**Fix applied:**
```bash
# Remove the stale cloud-init NTP drop-in(s)
sudo rm -f /etc/systemd/timesyncd.conf.d/*.conf

# Reset timesyncd.conf to an empty [Time] section so timesyncd falls back
# to its compiled-in FallbackNTP servers
echo '[Time]' | sudo tee /etc/systemd/timesyncd.conf > /dev/null
sudo systemctl restart systemd-timesyncd

# Verify sync
timedatectl status | grep "System clock synchronized: yes"
```

**Result:** `System clock synchronized: yes`. Authelia restarted successfully.

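To catch the same drift on other relocated nodes before a service fails, the two Phase 2 checks can be scripted. A sketch under stated assumptions: `find_stale_ntp` and `clock_synced` are hypothetical helper names, and `timedatectl show --property=NTPSynchronized --value` (systemd 239 or later) is used as the scriptable equivalent of the `timedatectl status` check above.

```bash
#!/usr/bin/env bash
# Hypothetical post-relocation NTP audit helpers.

# Print every file under DIR that still references the stale server IP.
# Returns non-zero (and prints nothing) when DIR is missing or nothing matches.
find_stale_ntp() {
  local dir="$1" ip="$2"
  [[ -d "$dir" ]] || return 1
  grep -rlF -- "$ip" "$dir"
}

# Succeeds only when systemd reports the clock as NTP-synchronized.
clock_synced() {
  [[ "$(timedatectl show --property=NTPSynchronized --value 2>/dev/null)" == "yes" ]]
}

# Usage on a relocated node:
#   find_stale_ntp /etc/systemd/timesyncd.conf.d 192.168.128.33 && echo "stale NTP drop-in found"
#   clock_synced || echo "clock NOT synchronized; check systemd-timesyncd"
```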
### Phase 3: MK-42 Worker Node Reintegration

**Discovery:** MK-42 (192.168.0.196) was online with Docker installed, but Swarm was inactive.

**Action:**
```bash
# On MK7 (manager): print the current worker join token
docker swarm join-token -q worker

# Then, on MK-42:
ssh jarvis@192.168.0.196
docker swarm leave --force   # Clears any stale Swarm state; errors harmlessly if the node is not in a swarm
docker swarm join --token <WORKER_JOIN_TOKEN> 192.168.7.7:2377
```

**Result:** MK-42 joined the Swarm as a worker node and is now available for workload scheduling.

---

## Final Service Status

| Stack | Service | Status | Replicas | Notes |
|-------|---------|--------|----------|-------|
| traefik | traefik | ✅ Running | 1/1 | Global mode on manager, healthy |
| portainer | portainer | ✅ Running | 1/1 | Replicated on manager |
| technitium | technitium | ✅ Running | 1/1 | Ports 53/5380 exposed (host mode) |
| homepage | homepage | ✅ Running | 1/1 | Replicated on manager |
| n8n | postgres | ✅ Running | 1/1 | Healthy |
| n8n | pgadmin | ✅ Running | 1/1 | — |
| n8n | n8n | ✅ Running | 1/1 | Healthy |
| beszel | beszel-hub | ✅ Running | 1/1 | Port 8090 exposed |
| dozzle | dozzle | ✅ Running | 1/1 | Port 8081 exposed |
| authelia | authelia | ✅ Running | 1/1 | After NTP fix |
| prometheus | prometheus | ✅ Running | 1/1 | — |
| node-exporter | node-exporter | ✅ Running | 1/1 | Global mode |
| pegaprox | pegaprox | ✅ Running | 1/1 | Already running (unchanged) |

**Swarm nodes:**

| ID | Hostname | Status | Availability | Manager Status |
|----|----------|--------|--------------|----------------|
| x6xr2s6... | mark-vii.ai.home | Ready | Active | Leader |
| x46ce7y... | mk-42 | Ready | Active | — (worker) |

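Both tables can be re-checked from MK7 without reading the full `docker` output by hand. A minimal sketch: `is_ready_worker` and `under_replicated` are hypothetical helpers that parse, from stdin, the `docker node ls` and `docker service ls` output produced with the `--format` strings shown in the comments. Swarm names services `<stack>_<service>` (for example `authelia_authelia`), and a worker's Manager Status field is empty.

```bash
#!/usr/bin/env bash
# Hypothetical audit helpers for the two tables above. Both read stdin.

# Succeeds if HOST is Ready and has an empty manager status (i.e. a worker).
# Input (run on MK7):
#   docker node ls --format '{{.Hostname}} {{.Status}} {{.ManagerStatus}}'
is_ready_worker() {
  local host="$1" name status mgr
  while read -r name status mgr; do
    if [[ "$name" == "$host" ]]; then
      [[ "$status" == "Ready" && -z "$mgr" ]]
      return
    fi
  done
  return 1
}

# Print "<service> <running>/<desired>" for services below their desired count.
# Input (run on MK7):
#   docker service ls --format '{{.Name}} {{.Replicas}}'
under_replicated() {
  local name replicas running desired
  while read -r name replicas _; do
    running="${replicas%%/*}"
    desired="${replicas#*/}"
    if (( running < desired )); then
      echo "$name $running/$desired"
    fi
  done
}

# Usage on MK7:
#   docker node ls --format '{{.Hostname}} {{.Status}} {{.ManagerStatus}}' | is_ready_worker mk-42
#   docker service ls --format '{{.Name}} {{.Replicas}}' | under_replicated
```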
---

## Health Checks Verified

```bash
curl -s http://localhost:8080/ping          # → OK (Traefik)
curl -s http://localhost:9000/api/status    # → {"Version":"2.39.2",...} (Portainer)
curl -s http://localhost:5380               # → Technitium HTML (DNS UI)
curl -s http://localhost:8090               # → Beszel HTML
curl -s http://localhost:5678/healthz       # → OK (n8n)
curl -s http://localhost:8081/api/health    # → OK (Dozzle)
```

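The same checks can be repeated after any future restart with a small loop. This is a sketch: `ENDPOINTS` and `check_all` are hypothetical names, the URLs are the ones listed above, and a check counts as passing when `curl -f` exits 0 (any HTTP status below 400, since redirects are not followed).

```bash
#!/usr/bin/env bash
# Hypothetical smoke test for the endpoints listed above ("label|url" pairs).

ENDPOINTS=(
  "traefik|http://localhost:8080/ping"
  "portainer|http://localhost:9000/api/status"
  "technitium|http://localhost:5380"
  "beszel|http://localhost:8090"
  "n8n|http://localhost:5678/healthz"
  "dozzle|http://localhost:8081/api/health"
)

# Print PASS/FAIL per endpoint; return 1 if any check failed.
# curl -f treats HTTP >= 400 as failure; redirects are not followed.
check_all() {
  local entry label url failed=0
  for entry in "${ENDPOINTS[@]}"; do
    label="${entry%%|*}"
    url="${entry#*|}"
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      echo "PASS $label"
    else
      echo "FAIL $label ($url)"
      failed=1
    fi
  done
  return "$failed"
}

# Usage on MK7: check_all || echo "one or more services unhealthy"
```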
All services responded on their expected ports.

---

## File Changes on MK7

| File | Action | Reason |
|------|--------|--------|
| `/etc/systemd/timesyncd.conf.d/*.conf` | Deleted | Stale cloud-init NTP config pointing to the wrong IP |
| `/etc/systemd/timesyncd.conf` | Reset to `[Time]` only | Restore default NTP behavior |
| `/opt/iron-legion/docker-swarm/deploy.sh` | Modified | Removed reference to the missing `adguard` stack (no longer deployed) |

---

## Notes for Future Operations

1. **NTP drift on relocated nodes:** Always verify `timedatectl status` after moving hardware; cloud-init may inject stale NTP configs.
2. **AdGuard removed:** `deploy.sh` previously referenced an `adguard` stack that no longer exists (AdGuard was removed in favor of Technitium's built-in blocking). The script was updated to skip it.
3. **MK-42 as Swarm worker:** MK-42 is now available for container scheduling but has not yet been labeled for specific workloads. If PVE services should run on it, either deploy a VM first or keep using it as a bare Swarm worker.
4. **No Tailscale on MK-42:** As requested, MK-42 joins via its LAN IP only. No Tailscale client is installed.

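If workloads are later pinned to MK-42 (note 3), Swarm node labels plus a placement constraint are the usual mechanism. A hedged sketch: the label `workload=general` and the helpers `label_cmd` and `placement_fragment` are hypothetical; they only print the `docker node update` command and a compose `deploy:` fragment for review.

```bash
#!/usr/bin/env bash
# Hypothetical example: label MK-42 and pin a service to it.
# The key/value "workload=general" is illustrative only.

LABEL_KEY="${LABEL_KEY:-workload}"
LABEL_VALUE="${LABEL_VALUE:-general}"

# Print the command that adds the label (run on the manager, MK7).
label_cmd() {
  echo "docker node update --label-add ${LABEL_KEY}=${LABEL_VALUE} $1"
}

# Print a compose deploy fragment restricting a service to labeled nodes.
placement_fragment() {
  cat <<EOF
    deploy:
      placement:
        constraints:
          - node.labels.${LABEL_KEY} == ${LABEL_VALUE}
EOF
}

# Usage: label_cmd mk-42       # review, then run the printed command on MK7
#        placement_fragment    # paste under the service in its compose file
```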
---

*Last updated: 2026-06-01 by F.R.I.D.A.Y.*

reports/netbird-evaluation-report.md
# Netbird Self-Hosted Control Plane — Evaluation Report

**Author:** F.R.I.D.A.Y. (Hermes Agent)
**Date:** 2026-05-31
**Status:** Draft — for Commander review before deployment
**Scope:** Evaluate the Netbird self-hosted control plane as a potential replacement for, or complement to, Tailscale mesh networking across the Iron Legion fleet.

---

## Executive Summary

Netbird is an open-source, WireGuard-based mesh VPN that provides peer-to-peer connectivity with a centralized management plane. As of v0.71.4 (May 2026), it offers **two deployment models** for self-hosting:

1. **Quickstart (single container, recommended for new deployments)** — Management, signal and relay combined in one `netbird-server` container with an embedded Dex IdP. Roughly 5-minute setup via `getting-started.sh`, with built-in Traefik and automatic TLS.
2. **Advanced (multi-container, legacy but supported)** — Separate services (management, signal, coturn, relay, dashboard) configured via `management.json` and `docker-compose.yml`.

**Key finding:** Netbird now supports running **behind an existing reverse proxy** (Traefik, Nginx, Caddy) as a first-class deployment option. This matters for the Iron Legion because MK7 already runs Traefik for `*.ai.home` services, so Netbird can be integrated without adding a new public-facing edge.

---

## What Netbird Offers (vs. Tailscale)

| Feature | Tailscale | Netbird |
|---------|-----------|---------|
| Tunnel protocol | WireGuard | WireGuard |
| Control plane | Tailscale Inc. cloud | **Self-hostable** |
| NAT traversal | DERP relays (cloud-hosted) | Self-hosted Coturn + Relay |
| Identity provider | Tailscale accounts / SSO via Auth0, etc. | **Embedded Dex** / any OIDC IdP |
| Network routes | ✅ | ✅ |
| Mesh DNS | MagicDNS | Network-wide DNS |
| Reverse proxy / funnel | Tailscale Funnel (public) | **Built-in reverse proxy via Netbird Proxy** |
| Access controls | ACL policies | **Group + peer policies** |
| Linux clients | ✅ | ✅ |
| Windows | ✅ | ✅ |
| Mobile (iOS/Android) | ✅ | ✅ |
| Browser client | ❌ | ✅ |
| Open-source | Client only | **Fully open-source** |

**For the Iron Legion:** The primary advantage of Netbird is **full ownership of the control plane**. Tailscale depends on Tailscale Inc. infrastructure for coordination and DERP relays; Netbird brings both under our control.

---

## Architecture Overview

### Quickstart (v0.29+, Recommended)

```
[Public Internet]
       |
       +-- TCP 80/443 --> Traefik (built-in or external)
       |        |
       |        +-- Dashboard UI (web)
       |        +-- Management API (gRPC over HTTPS)
       |        +-- Signal (gRPC over HTTPS, HTTP/2 ALPN)
       |        +-- Relay (WebSocket over HTTPS)
       |
       +-- UDP 3478 --> Coturn (STUN/TURN)
       |
       +-- UDP 49152-65535 --> TURN relay ports (legacy)
```

The **combined server container** (`netbird-server`) consolidates:
- Management Service — peer orchestration, ACLs, routes, DNS
- Signal Service — WebRTC signaling for direct WireGuard connections
- Relay Service — WebSocket relay for fallback when direct P2P fails
- Embedded Dex — built-in identity provider (local users + external OIDC)
- Dashboard — web management UI

**New in v0.29:** Management and Signal share port 443 via HTTP/2 ALPN. Previously they required separate ports (33073 for management gRPC, 10000 for signal gRPC, 33080 for relay).

### Advanced (legacy multi-container)

- `management` — API server + dashboard
- `signal` — WebRTC signaling
- `relay` — WebSocket fallback relay
- `coturn` — TURN/STUN server
- `dashboard` — React UI
- External IdP required (or Dex deployed separately)

**Iron Legion recommendation:** Use the **Quickstart model** unless there is a hard requirement for a separate IdP (Authelia, Keycloak, etc.) that cannot run alongside the embedded Dex.

---

## Deployment Options for Iron Legion

### Option A: Docker Swarm on MK7 (Recommended for Low Friction)

Deploy Netbird as a Docker Swarm stack on MK7, using the **existing Traefik** as the reverse proxy.

**Pros:**
- Swarm and Traefik are already running on MK7
- No new VM or LXC to provision
- Can share the `traefik-public` network
- Traefik handles TLS certificates via the internal CA or Let's Encrypt

**Cons:**
- MK7 is already the Swarm manager, DNS server and reverse proxy; adding the mesh control plane puts more load on the same node
- If MK7 goes down, both the mesh control plane *and* the web UIs/proxy go down

**Port mapping on MK7:**

| Port | Protocol | Service |
|------|----------|---------|
| 80 | TCP | HTTP (redirect + ACME challenge) |
| 443 | TCP | HTTPS (Dashboard, Management, Signal, Relay) |
| 3478 | UDP | Coturn STUN/TURN |

> Note: The v0.29+ port consolidation reduces firewall complexity. If all clients run v0.29+, only TCP 80/443 and UDP 3478 are needed. Legacy clients also need ports 33073, 10000 and 33080, plus UDP 49152-65535.

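The port rules above reduce to one decision: the oldest Netbird client version in the fleet. A minimal sketch of that decision: `version_ge` and `required_ports` are hypothetical helpers, `sort -V` does the version comparison, the `49152:65535` range uses ufw-style notation, and treating 33073, 10000 and 33080 as TCP is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: inbound ports MK7 must allow, given the oldest Netbird
# client in the fleet. Treating 33073/10000/33080 as TCP is an assumption.

# Succeeds if version A >= version B (a leading "v" is ignored).
version_ge() {
  local a="${1#v}" b="${2#v}"
  [[ "$(printf '%s\n%s\n' "$a" "$b" | sort -V | head -n1)" == "$b" ]]
}

# Print the ports every deployment needs, plus a second line for legacy clients.
required_ports() {
  echo "80/tcp 443/tcp 3478/udp"
  if ! version_ge "$1" "0.29.0"; then
    echo "33073/tcp 10000/tcp 33080/tcp 49152:65535/udp"
  fi
}

# Usage: required_ports 0.71.4   # pass the oldest client version in the fleet
```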
### Option B: Dedicated LXC on Proxmox (Recommended for Resilience)

Deploy the Netbird control plane as an LXC container on one of the Proxmox nodes (MK33/34/39/42), with port forwards via `iptables` or host networking.

**Pros:**
- Isolated from Docker Swarm failures
- Can be colocated with MK7 for low latency while remaining a separate failure domain
- Easier backups via scheduled Proxmox snapshots

**Cons:**
- Requires provisioning an LXC first
- UDP 3478 and TCP 443 must be forwarded from the host to the container

**Recommended node:** MK39 (Gemini), which is currently underutilized and stable.

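If Option B is chosen, the container could be created from the Proxmox CLI on MK39. A hedged sketch: `netbird_lxc_cmd` only prints a `pct create` command for review; the VMID `139`, the template path, the `local-lvm` storage and the `vmbr0` bridge are placeholders, and the sizing follows the Quickstart recommendation in the resource table (2 cores, 4 GB RAM, 20 GB disk). `nesting=1,keyctl=1` is the usual requirement for running Docker inside an unprivileged LXC.

```bash
#!/usr/bin/env bash
# Hypothetical: print (not run) a `pct create` command for a Netbird LXC on MK39.
# VMID, template, storage and bridge are placeholders for the real Proxmox setup.
# Sizing: 2 cores, 4096 MB RAM, 20 GB rootfs (Quickstart recommendation).
# nesting=1,keyctl=1 is commonly needed to run Docker inside an unprivileged LXC.

netbird_lxc_cmd() {
  local vmid="${1:-139}"
  local template="${TEMPLATE:-local:vztmpl/debian-12-standard_amd64.tar.zst}"
  local args=(
    pct create "$vmid" "$template"
    --hostname netbird
    --cores 2 --memory 4096
    --rootfs local-lvm:20
    --net0 name=eth0,bridge=vmbr0,ip=dhcp
    --unprivileged 1
    --features nesting=1,keyctl=1
  )
  echo "${args[*]}"
}

# Usage on MK39: netbird_lxc_cmd 139   # review, then run the printed command
```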
### Option C: PVE VM (Heavy, Overkill)

A full VM on Proxmox is unnecessary overhead for a coordination server.

**Verdict:** Option B (LXC on MK39) for resilience, or Option A (Swarm on MK7) if simplicity is preferred.

---

## Reverse Proxy Integration

The `getting-started.sh` script supports **6 reverse proxy modes**:

| Option | Reverse Proxy | Iron Legion Fit |
|--------|---------------|-----------------|
| `[0]` | Built-in Traefik (new container) | Works but redundant, since Traefik is already running |
| `[1]` | External Traefik (labels only) | **Best fit for Option A**: generates Docker labels for the existing Traefik |
| `[2]` | Nginx (config template) | Not needed; Traefik is already running |
| `[3]` | Nginx Proxy Manager | Not needed |
| `[4]` | External Caddy | Not needed |
| `[5]` | Other/Manual | Fallback if Traefik ALPN doesn't work |

**Iron Legion choice:** Option `[1]` ("Existing Traefik" labels). This generates:
- `traefik.enable=true`
- `traefik.http.routers.netbird-<service>.rule=Host(...)`
- `traefik.http.services.netbird-<service>.loadbalancer.server.port=...`
- Labels for each endpoint: Dashboard (443), Management gRPC (443), Signal gRPC (443), Relay WebSocket (443)

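For orientation, the generated labels follow a pattern like the one below. This is a hypothetical generator, not the output of `getting-started.sh`: the router names, backend port and path prefixes are assumptions to check against what option `[1]` actually emits. A few Traefik details apply regardless: in Swarm mode the labels belong under `deploy.labels`, a container with several routers needs an explicit `.service=` link per router, and gRPC backends served in cleartext need `loadbalancer.server.scheme=h2c` so Traefik forwards HTTP/2 to the container.

```bash
#!/usr/bin/env bash
# Hypothetical Traefik label generator for one Netbird endpoint.
# Router/service names, backend port and path prefix are placeholders.

# Args: NAME HOST BACKEND_PORT PATH_PREFIX [grpc]
netbird_labels() {
  local name="$1" host="$2" port="$3" path="$4" kind="${5:-http}"
  local rule="Host(\`${host}\`) && PathPrefix(\`${path}\`)"
  echo "traefik.enable=true"
  echo "traefik.http.routers.netbird-${name}.rule=${rule}"
  echo "traefik.http.routers.netbird-${name}.entrypoints=websecure"
  echo "traefik.http.routers.netbird-${name}.tls=true"
  echo "traefik.http.routers.netbird-${name}.service=netbird-${name}"
  echo "traefik.http.services.netbird-${name}.loadbalancer.server.port=${port}"
  if [[ "$kind" == "grpc" ]]; then
    # gRPC backends: Traefik must speak cleartext HTTP/2 (h2c) to the container.
    echo "traefik.http.services.netbird-${name}.loadbalancer.server.scheme=h2c"
  fi
}

# Usage (backend port 80 and the gRPC path prefix are assumptions):
#   netbird_labels signal netbird.ai.home 80 /signalexchange.SignalExchange/ grpc
#   netbird_labels dashboard netbird.ai.home 80 /
```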
### Required Traefik EntryPoints

Already configured on MK7 Traefik:
- `web` (:80) — redirect to HTTPS
- `websecure` (:443) — HTTPS + gRPC via HTTP/2
- `traefik-dashboard` (:8080) — dashboard

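Whether `websecure` really negotiates HTTP/2 can be confirmed from any LAN host before deploying. A small sketch: `alpn_protocol` is a hypothetical helper that reads `openssl s_client` output, which reports the negotiated protocol on an `ALPN protocol:` line (or prints `No ALPN negotiated`). The usage line assumes `netbird.ai.home` already resolves to MK7.

```bash
#!/usr/bin/env bash
# Hypothetical ALPN probe helper: extracts the protocol negotiated by
# `openssl s_client`, which prints "ALPN protocol: <proto>" on success.

alpn_protocol() {
  sed -n 's/^ALPN protocol: //p' | head -n1
}

# Usage from any LAN host (expect "h2"):
#   openssl s_client -connect netbird.ai.home:443 -servername netbird.ai.home \
#     -alpn h2,http/1.1 </dev/null 2>/dev/null | alpn_protocol
```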
**No new entrypoints needed.** All Netbird services multiplex over 443 via HTTP/2 ALPN.

---

## DNS Requirements

Netbird needs two DNS records:

| Type | Record | Points To |
|------|--------|-----------|
| A | `netbird.ai.home` | MK7 (192.168.7.7) or the MK39 LXC IP |
| CNAME | `*.netbird.ai.home` | `netbird.ai.home` |

The wildcard is required for Netbird Proxy: each exposed internal service gets its own subdomain (e.g., `service.netbird.ai.home`).

**Technitium DNS update:** Add:
- `netbird.ai.home` → A → 192.168.7.7 (or the LXC IP if Option B)
- `*.netbird.ai.home` → CNAME → `netbird.ai.home`

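Once both records exist, they can be verified against Technitium directly. A minimal sketch: `resolve_a` and `check_netbird_dns` are hypothetical helpers, `probe` is an arbitrary label used only to exercise the wildcard, and `DNS_SERVER` defaults to MK7 (192.168.7.7) on the assumption that Technitium answers there. `dig +short` prints any CNAME targets before the final address, so the last line is taken.

```bash
#!/usr/bin/env bash
# Hypothetical DNS check for the Netbird records, queried against Technitium.

DNS_SERVER="${DNS_SERVER:-192.168.7.7}"

# Print the final address for NAME; dig +short lists any CNAME targets first.
resolve_a() {
  dig +short @"$DNS_SERVER" "$1" A | tail -n1
}

# Succeed if the apex and an arbitrary wildcard child resolve to the same IP.
check_netbird_dns() {
  local apex="${1:-netbird.ai.home}" a w
  a="$(resolve_a "$apex")"
  w="$(resolve_a "probe.$apex")"
  [[ -n "$a" && "$a" == "$w" ]]
}

# Usage: check_netbird_dns netbird.ai.home && echo "DNS OK"
```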
> Note: Netbird clients on the mesh resolve `*.netbird.selfhosted` internally. The `ai.home` DNS is only needed for the dashboard web UI and proxy subdomains.

---

## Authentication Strategy

Netbird Quickstart includes an **embedded Dex** identity provider with local user management. This is sufficient for Iron Legion's current needs.

**Two paths:**

### Path 1: Embedded Dex Only (Recommended for Review)
- Local user accounts created via the Netbird Dashboard
- No dependence on an external IdP
- Username/password or personal access tokens
- Can migrate to an external IdP later without re-enrolling devices

### Path 2: Integrate with Existing Authelia (Future)
- Authelia on MK7 supports OIDC (added in v4.38+)
- Netbird can authenticate against Authelia as the IdP
- Single sign-on across all fleet services
- More complex setup; save for Phase 2

**Recommendation:** Start with Path 1 (embedded Dex). It is fully functional, requires no extra infrastructure, and can be migrated to Authelia OIDC later.

---

## Tailscale Coexistence

Netbird and Tailscale **can run simultaneously** on the same nodes because they use different WireGuard interfaces and port ranges:
- Tailscale: UDP 41641 (WireGuard), TCP 443 (DERP)
- Netbird: UDP 51820 (WireGuard), UDP 3478 (TURN), TCP 443 (management/signal)

**Potential conflicts:**
- Both want high UDP ports for NAT traversal; the OS assigns ephemeral ports, which is typically fine
- Both manipulate iptables/routing tables, which could interfere with default routes
- DNS resolution: Tailscale MagicDNS vs. Netbird DNS; whichever writes `/etc/resolv.conf` last wins

**Recommended coexistence strategy:**
- Primary mesh: Tailscale (currently working, MagicDNS configured for `ai.home`)
- Secondary / evaluation: Netbird on a subset of nodes
- Use Netbird for specific access-control use cases (e.g., exposing certain services via Netbird Proxy)
- Do NOT set Netbird as the default route unless Tailscale is decommissioned

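The last rule can be checked mechanically on each test node. A sketch, assuming Netbird's usual interface name `wt0` (overridable through the hypothetical `NETBIRD_IF` variable); `default_route_dev` and `check_default_route` are hypothetical helpers that parse `ip route show default` output from stdin.

```bash
#!/usr/bin/env bash
# Hypothetical guard: warn if the default route goes through Netbird.
# Assumes Netbird's interface is named wt0 (override with NETBIRD_IF).

# Print the "dev" of the first default route in `ip route show default` output.
default_route_dev() {
  awk '$1 == "default" { for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Return 1 (with a warning) if the default route uses the Netbird interface.
check_default_route() {
  local dev
  dev="$(default_route_dev)"
  if [[ "$dev" == "${NETBIRD_IF:-wt0}" ]]; then
    echo "WARNING: default route via $dev (Netbird)"
    return 1
  fi
  echo "default route via ${dev:-none}"
}

# Usage on each test node: ip route show default | check_default_route
```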
---

## Netbird Proxy — Replacing Traefik?

**Commander question:** "Run alongside possibly replace Traefik as the reverse proxy"

**Answer:** Netbird Proxy is NOT a reverse proxy replacement for Traefik. It solves a **different problem**:

- **Traefik** (existing on MK7): Routes `*.ai.home` traffic *within* the LAN/WAN to Docker containers. It handles HTTP/HTTPS ingress for services such as Portainer, PegaProx and Technitium.
- **Netbird Proxy**: Exposes internal Netbird mesh services *to the public internet* via subdomain routing, secured by Netbird's access policies. Think of it as a Tailscale Funnel equivalent.

**Example:**
- `prometheus.internal.ai.home` is only reachable inside the LAN; Traefik routes it to Prometheus
- `prometheus.netbird.ai.home` could be exposed to a remote user's laptop via Netbird Proxy with per-user ACLs

**Verdict:** Keep Traefik. Netbird Proxy complements it for selective external exposure rather than replacing it.

---

## Resource Requirements

### Quickstart (single container)

| Resource | Min | Recommended |
|----------|-----|-------------|
| CPU | 1 core | 2 cores |
| RAM | 2 GB | 4 GB |
| Disk | 10 GB | 20 GB |
| Network | Public IP + DNS | Public IP + DNS |

### Advanced (multi-container)

| Resource | Min | Recommended |
|----------|-----|-------------|
| CPU | 2 cores | 4 cores |
| RAM | 4 GB | 8 GB |
| Disk | 20 GB | 40 GB |
| Network | Public IP + DNS | Public IP + DNS |

**Iron Legion:** MK7 (18 cores, 15 GB RAM) is well within these limits. A Proxmox LXC with 2 cores and 4 GB RAM (easily provisioned) meets the Quickstart recommendation, but only the Advanced minimum.

---

## Deployment Effort Estimate

| Phase | Task | Time | Notes |
|-------|------|------|-------|
| P0 | Review this report | — | Commander decision point |
| P1 | Add DNS records to Technitium | 15 min | `netbird.ai.home` + wildcard |
| P2 | Deploy Netbird (Quickstart, Option A or B) | 30 min | Run `getting-started.sh`, select option [1] or [0] |
| P3 | Create first admin user via `/setup` | 5 min | Web browser |
| P4 | Install Netbird client on test nodes | 20 min | 2-3 nodes for validation |
| P5 | Configure network routes + ACLs | 45 min | Mirror Tailscale access patterns |
| P6 | Evaluate coexistence vs. Tailscale replacement | Ongoing | 1-2 week trial period |

**Total hands-on time (if approved):** ~2 hours (15 + 30 + 5 + 20 + 45 = 115 min), plus the evaluation period.

---

## Known Issues / Gotchas

1. **ALPN / HTTP/2 requirement:** Netbird v0.29+ consolidated ports require HTTP/2 + ALPN on the reverse proxy. Traefik supports this natively. Nginx requires the `http2` parameter on `listen` (or `http2 on;` in Nginx 1.25.1+).

2. **Legacy clients:** If any Iron Legion device runs an older Netbird client (< v0.29), the legacy ports (33073, 10000, 33080, UDP 49152-65535) are also required. All fleet devices should run the latest client.

3. **Coturn on cloud VMs:** Oracle Cloud and Hetzner require provider-level firewall rules for UDP 3478, not just VM-level rules. Not applicable on the LAN, but noted for future cloud expansion.

4. **First user setup:** The `/setup` page is **only accessible when zero users exist**. After the first admin is created, it redirects to `/login`. To create additional admins, use Dashboard → Settings → Identity Providers, or the API with a PAT.

5. **NTP dependency:** Authelia failed on MK7 due to an unsynchronized clock (see the MK7 restoration report). TLS certificate validation for Netbird's management service also depends on an accurate clock, so ensure NTP sync on the host.

6. **Wildcard DNS for Proxy:** If Netbird Proxy is enabled, the wildcard CNAME is mandatory. Without it, exposed service subdomains won't resolve.

---

## Recommendations

### Immediate (Pre-Deployment)
1. ✅ Commander reviews this report
2. ✅ Decide Option A (Swarm on MK7) vs. Option B (LXC on MK39)
3. ✅ If Option A: verify Traefik HTTP/2 ALPN is active

### Short-Term (If Approved)
1. Deploy Netbird Quickstart with embedded Dex
2. Add `netbird.ai.home` + wildcard to Technitium DNS
3. Install clients on 2-3 test nodes (Cinnamint, Artemis, MK-42)
4. Mirror one Tailscale route in Netbird for comparison

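Step 3 could be scripted once a setup key has been created in the dashboard. A hedged sketch that only prints commands: `enroll_cmds`, the SSH host names and the `NB_SETUP_KEY` variable are hypothetical, and it assumes the Netbird client is already installed on each node; `netbird up --management-url <url> --setup-key <key>` is the client's enrollment form for a self-hosted management server.

```bash
#!/usr/bin/env bash
# Hypothetical enrollment helper: prints (does not run) one command per test
# node. Assumes the Netbird client is already installed on each node and that
# NB_SETUP_KEY holds a setup key created in the Netbird dashboard.

MGMT_URL="${MGMT_URL:-https://netbird.ai.home}"
TEST_NODES=(cinnamint artemis mk-42)   # SSH host names are placeholders

enroll_cmds() {
  if [[ -z "${NB_SETUP_KEY:-}" ]]; then
    echo "NB_SETUP_KEY is not set" >&2
    return 1
  fi
  local node
  for node in "${TEST_NODES[@]}"; do
    echo "ssh $node 'netbird up --management-url $MGMT_URL --setup-key $NB_SETUP_KEY'"
  done
}

# Usage: NB_SETUP_KEY=<key> enroll_cmds
```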
### Long-Term (Evaluation After 2 Weeks)
1. Compare latency/connection reliability vs. Tailscale
2. Evaluate Netbird Proxy for selective external access
3. Decide: coexist, replace Tailscale, or decommission Netbird
4. If replacing: migrate MagicDNS zones to Netbird DNS and update all `.ai.home` client configs

---

## References

- Netbird Docs (Self-Hosted Quickstart): https://docs.netbird.io/selfhosted/selfhosted-quickstart
- Netbird Docs (Advanced Guide): https://docs.netbird.io/selfhosted/selfhosted-guide
- GitHub (infrastructure files): https://github.com/netbirdio/netbird/tree/v0.71.4/infrastructure_files
- Quickstart install script: `curl -fsSL https://github.com/netbirdio/netbird/releases/latest/download/getting-started.sh | bash`
- Reverse Proxy Configuration: https://docs.netbird.io/selfhosted/reverse-proxy
- Upgrade / Migration Guide: https://docs.netbird.io/selfhosted/maintenance

---

## Appendix: Netbird vs Tailscale Detailed Comparison

| Aspect | Tailscale | Netbird Self-Hosted |
|--------|-----------|---------------------|
| Control plane ownership | ❌ Tailscale Inc. | ✅ Fully owned |
| Relay ownership | ❌ Tailscale DERP | ✅ Self-hosted Coturn |
| Cost | Free tier limited; enterprise paid | Free; unlimited |
| Identity | External IdP or Tailscale | Embedded Dex or any OIDC |
| Web dashboard | ✅ | ✅ (self-hosted) |
| API | ✅ | ✅ (REST + gRPC) |
| SCIM provisioning | ❌ (manual) | ✅ (Enterprise) |
| Network segmentation / ACLs | Yes (JSON ACL) | Yes (groups + policies) |
| Exit nodes | ✅ | ✅ |
| Subnet routers | ✅ | ✅ |
| Browser client | ❌ | ✅ (WebRTC-based) |
| Mobile NAT traversal | DERP | TURN + direct P2P |

---

*Report generated 2026-05-31 by F.R.I.D.A.Y. — awaiting Commander review.*