F.R.I.D.A.Y. 3da2689e4d Add fleet operational reports
- mk7-service-restoration-report.md: Restored Swarm stacks after relocation, fixed NTP drift, rejoined MK-42 as worker
- netbird-evaluation-report.md: Full evaluation of self-hosted Netbird control plane for tailscale coexistence/replacement

Author: F.R.I.D.A.Y.
2026-06-01 07:45:13 -04:00


Netbird Self-Hosted Control Plane — Evaluation Report

Author: F.R.I.D.A.Y. (Hermes Agent)
Date: 2026-05-31
Status: Draft, for Commander review before deployment
Scope: Evaluate the Netbird self-hosted control plane as a potential replacement for, or complement to, Tailscale mesh networking for the Iron Legion fleet.


Executive Summary

Netbird is an open-source, WireGuard-based mesh VPN that provides peer-to-peer connectivity with a centralized management plane. As of v0.71.4 (May 2026), it offers two deployment models for self-hosting:

  1. Quickstart (single-container, recommended for new deployments) — Combined management + signal + relay in one netbird-server container with embedded Dex IdP. ~5-minute setup via getting-started.sh with built-in Traefik and automatic TLS.
  2. Advanced (multi-container, legacy but supported) — Separate services (management, signal, coturn, relay, dashboard) configured via management.json and docker-compose.yml.

Key finding: Netbird now supports running behind an existing reverse proxy (Traefik, Nginx, Caddy) as a first-class deployment option. This is significant for the Iron Legion because MK7 already runs Traefik for *.ai.home services — we can integrate Netbird without adding a new public-facing edge.


What Netbird Offers (vs. Tailscale)

Feature | Tailscale | Netbird
Underlay protocol | WireGuard | WireGuard
Control plane | Tailscale Inc. cloud | Self-hostable
NAT traversal | DERP relays (cloud-hosted) | Self-hosted Coturn + Relay
Identity provider | Tailscale accounts / SSO via Auth0, etc. | Embedded Dex / any OIDC IdP
Network routes | |
DNS split-brain | MagicDNS | Network-wide DNS
Reverse proxy / funnel | Tailscale Funnel (public) | Built-in reverse proxy via Netbird Proxy
Access controls | ACL policies | Group + peer policies
Linux clients | |
Windows | |
Mobile (iOS/Android) | |
Browser client | |
Open-source | Client only | Fully open-source

For the Iron Legion: The primary advantage of Netbird is full ownership of the control plane. Tailscale depends on Tailscale Inc. infrastructure for coordination and DERP relays; Netbird brings both under our control.


Architecture Overview

[Public Internet]
   |
   +-- TCP 80/443  --> Traefik (built-in or external)
   |                      |
   |                      +-- Dashboard UI (web)
   |                      +-- Management API (gRPC over HTTPS)
   |                      +-- Signal (gRPC over HTTPS, HTTP/2 ALPN)
   |                      +-- Relay (WebSocket over HTTPS)
   |
   +-- UDP 3478  --> Coturn (STUN/TURN)
   |
   +-- UDP 49152-65535 --> TURN relay ports (legacy)

Quickstart (single container)

The combined server container (netbird-server) consolidates:

  • Management Service — peer orchestration, ACLs, routes, DNS
  • Signal Service — WebRTC signaling for direct WireGuard connections
  • Relay Service — WebSocket relay for fallback when direct p2p fails
  • Embedded Dex — built-in identity provider (local users + external OIDC)
  • Dashboard — web management UI

New in v0.29: Management, Signal, and Relay share port 443, with gRPC carried over HTTP/2 negotiated via ALPN. Previously they required separate ports (33073 for management gRPC, 10000 for signal gRPC, 33080 for relay).

Advanced (legacy multi-container)

  • management — management API server (peers, ACLs, routes, DNS)
  • signal — WebRTC signaling
  • relay — WebSocket fallback relay
  • coturn — TURN/STUN server
  • dashboard — React UI
  • External IdP required (or Dex deployed separately)

Iron Legion recommendation: Use the Quickstart model unless there's a hard requirement for a separate IdP (Authelia, Keycloak, etc.) that cannot run alongside the embedded Dex.


Deployment Options for Iron Legion

Option A: Docker Swarm on MK7 (Simplest)

Deploy Netbird as a Docker Swarm stack on MK7, using the existing Traefik as the reverse proxy.

Pros:

  • Already running Swarm + Traefik on MK7
  • No new VM or LXC to provision
  • Can share traefik-public network
  • Traefik handles TLS certs via internal CA or Let's Encrypt

Cons:

  • MK7 is already the Swarm manager + DNS + proxy — adding mesh control plane means more load on the same node
  • If MK7 goes down, both the mesh control plane and the web UI/proxy go down

Port mapping on MK7:

Port | Protocol | Service
80 | TCP | HTTP (redirect + ACME challenge)
443 | TCP | HTTPS (Dashboard, Management, Signal, Relay)
3478 | UDP | Coturn STUN/TURN

Note: The v0.29+ port consolidation reduces firewall complexity. If all clients run v0.29+, only TCP 80/443 and UDP 3478 are needed. Legacy clients also need 33073, 10000, 33080, and UDP 49152-65535.
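The port rules above reduce to one decision: open the consolidated set, and add the legacy ports only while any enrolled client is older than v0.29. A minimal Python sketch of that logic follows; the version-string format and the TCP protocol for the three legacy service ports (33073, 10000, 33080) are assumptions, since the report lists only their numbers.

```python
from __future__ import annotations

from typing import NamedTuple


class Port(NamedTuple):
    proto: str    # "tcp" or "udp"
    start: int
    end: int      # inclusive; equal to start for a single port
    purpose: str


# Needed on the control-plane host when every client is v0.29 or newer.
CONSOLIDATED = [
    Port("tcp", 80, 80, "HTTP redirect + ACME challenge"),
    Port("tcp", 443, 443, "Dashboard, Management, Signal, Relay"),
    Port("udp", 3478, 3478, "Coturn STUN/TURN"),
]

# Extra ports for pre-v0.29 clients. TCP for the first three is an assumption.
LEGACY_EXTRA = [
    Port("tcp", 33073, 33073, "Management gRPC (legacy)"),
    Port("tcp", 10000, 10000, "Signal gRPC (legacy)"),
    Port("tcp", 33080, 33080, "Relay (legacy)"),
    Port("udp", 49152, 65535, "TURN relay range (legacy)"),
]


def parse_version(version: str) -> tuple[int, ...]:
    """Turn 'v0.28.9' or '0.29.0' into a comparable tuple such as (0, 28, 9)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def required_ports(client_versions: list[str]) -> list[Port]:
    """Inbound ports to open, driven by the oldest enrolled client."""
    oldest = min(parse_version(v) for v in client_versions)
    if oldest >= (0, 29):
        return list(CONSOLIDATED)
    return CONSOLIDATED + LEGACY_EXTRA


if __name__ == "__main__":
    for port in required_ports(["0.29.1", "v0.28.4"]):
        span = str(port.start) if port.start == port.end else f"{port.start}-{port.end}"
        print(f"{port.proto.upper():<4}{span:<12}{port.purpose}")
```

A single straggler such as 0.28.4 is enough to pull all four legacy entries back into the firewall plan.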

Option B: Proxmox LXC (Isolated)

Deploy the Netbird control plane as an LXC container on one of the Proxmox nodes (MK33/34/39/42), with port forwards via iptables or host networking.

Pros:

  • Isolated from Docker Swarm failures
  • Sits on the same LAN as MK7 (low latency) while remaining a separate failure domain
  • Easier backups via scheduled Proxmox snapshots

Cons:

  • Requires provisioning an LXC first
  • Need to forward UDP 3478 + TCP 443 from host to container

Recommended node: MK39 (Gemini) — currently underutilized, stable node.

Option C: PVE VM (Heavy, Overkill)

Full VM on Proxmox — unnecessary overhead for a coordination server.

Verdict: Option B (LXC on MK39) for resilience, or Option A (Swarm on MK7) if simplicity is preferred.


Reverse Proxy Integration

The getting-started.sh script supports 6 reverse proxy modes:

Option | Reverse Proxy | Iron Legion Fit
[0] | Built-in Traefik (new container) | Works but redundant; we already have Traefik
[1] | External Traefik (labels only) | Best fit for Option A; generates Docker labels for the existing Traefik
[2] | Nginx (config template) | Not needed; already running Traefik
[3] | Nginx Proxy Manager | Not needed
[4] | External Caddy | Not needed
[5] | Other/Manual | Fallback if Traefik ALPN doesn't work

Iron Legion choice: Option [1], "External Traefik" (labels only). This generates:

  • traefik.enable=true
  • traefik.http.routers.netbird-<service>.rule=Host(...)
  • traefik.http.services.netbird-<service>.loadbalancer.server.port=...
  • Router and service labels for each endpoint (Dashboard, Management gRPC, Signal gRPC, Relay WebSocket), all served on the websecure (443) entrypoint
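A hedged Python sketch of how such a label set can be generated for a Swarm service follows. It is not the output of getting-started.sh: the router names, the path prefixes (modeled on Netbird's published reverse-proxy examples and worth checking against what the script actually emits), and a single backend port for the combined netbird-server are all assumptions, and port 80 in the usage line is a placeholder. Two details are standard Traefik behaviour: in Swarm mode Traefik reads these from the service's deploy.labels rather than from container labels, and gRPC backends reached over cleartext HTTP/2 need loadbalancer.server.scheme=h2c.

```python
from __future__ import annotations

# (router/service name, path matcher, backend scheme). Hypothetical names; the
# path prefixes are assumptions to verify against getting-started.sh output.
ROUTES = [
    ("netbird-mgmt-grpc", "PathPrefix(`/management.ManagementService/`)", "h2c"),
    ("netbird-signal", "PathPrefix(`/signalexchange.SignalExchange/`)", "h2c"),
    ("netbird-api", "PathPrefix(`/api`)", "http"),
    ("netbird-relay", "PathPrefix(`/relay`)", "http"),
    ("netbird-dashboard", "PathPrefix(`/`)", "http"),  # catch-all
]


def traefik_labels(host: str, backend_port: int, entrypoint: str = "websecure") -> dict[str, str]:
    """Build Traefik router/service labels for every Netbird endpoint on one host."""
    labels = {"traefik.enable": "true"}
    for name, path_rule, scheme in ROUTES:
        router = f"traefik.http.routers.{name}"
        service = f"traefik.http.services.{name}"
        labels[f"{router}.rule"] = f"Host(`{host}`) && {path_rule}"
        labels[f"{router}.entrypoints"] = entrypoint
        labels[f"{router}.tls"] = "true"
        labels[f"{router}.service"] = name
        labels[f"{service}.loadbalancer.server.port"] = str(backend_port)
        # gRPC needs HTTP/2 end to end; h2c means HTTP/2 without TLS to the backend.
        labels[f"{service}.loadbalancer.server.scheme"] = scheme
    return labels


if __name__ == "__main__":
    # Print in the "- key=value" form used under deploy.labels in a stack file.
    for key, value in traefik_labels("netbird.ai.home", 80).items():
        print(f"      - {key}={value}")
```

Traefik ranks routers by rule length by default, so the short catch-all dashboard rule only receives requests that none of the longer, more specific rules match.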

Required Traefik EntryPoints

Already configured on MK7 Traefik:

  • web (:80) — redirect to HTTPS
  • websecure (:443) — HTTPS + gRPC via HTTP/2
  • traefik-dashboard (:8080) — dashboard

No new entrypoints needed. All Netbird services multiplex over 443 via HTTP/2 ALPN.


DNS Requirements

Netbird needs two DNS records:

Type | Record | Points To
A | netbird.ai.home | MK7 (192.168.7.7) or MK39 LXC IP
CNAME | *.netbird.ai.home | netbird.ai.home

The wildcard is required for Netbird Proxy — each exposed internal service gets a subdomain (e.g., service.netbird.ai.home).

Technitium DNS update: add the following records:

  • netbird.ai.home → A → 192.168.7.7 (or LXC IP if Option B)
  • *.netbird.ai.home → CNAME → netbird.ai.home

Note: Netbird clients on the mesh resolve *.netbird.selfhosted internally. The ai.home DNS is only needed for the dashboard web UI and proxy subdomains.
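Both records can be smoke-tested from any LAN host once Technitium is updated. The Python sketch below resolves the apex name and a random label under it (which will only resolve through the wildcard) and compares each result with the expected address. The resolver is injectable so the logic can be exercised offline; the probe-label format is arbitrary.

```python
from __future__ import annotations

import socket
import uuid
from typing import Callable

Resolver = Callable[[str], str]


def check_netbird_dns(base: str, expected_ip: str,
                      resolve: Resolver = socket.gethostbyname) -> dict[str, bool]:
    """Report whether the apex record and the wildcard both resolve to expected_ip."""
    # A random label will not have a record of its own, so it tests the wildcard.
    probe = f"probe-{uuid.uuid4().hex[:8]}.{base}"
    results = {}
    for key, name in (("apex", base), ("wildcard", probe)):
        try:
            # gethostbyname follows the CNAME and returns the final A record.
            results[key] = resolve(name) == expected_ip
        except OSError:  # socket.gaierror (e.g. NXDOMAIN) subclasses OSError
            results[key] = False
    return results
```

On a LAN host, check_netbird_dns("netbird.ai.home", "192.168.7.7") should report True for both keys when Option A's records are in place; substitute the LXC address for Option B.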


Authentication Strategy

Netbird Quickstart includes an embedded Dex identity provider with local user management. This is sufficient for Iron Legion's current needs.

Two paths:

Path 1: Embedded Dex (Recommended)

  • Local user accounts created via Netbird Dashboard
  • No dependence on external IdP
  • Username/password or personal access tokens
  • Can migrate to external IdP later without re-enrolling devices

Path 2: Integrate with Existing Authelia (Future)

  • Authelia on MK7 can act as an OIDC provider (OIDC support was introduced as a beta in v4.29)
  • Netbird can authenticate against Authelia as the IdP
  • Single sign-on across all fleet services
  • More complex setup — save for Phase 2

Recommendation: Start with Path 1 (embedded Dex). It's fully functional, requires zero extra infrastructure, and can be migrated to Authelia OIDC later.
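Before Phase 2, it is worth confirming that Authelia's discovery document exposes what a standard OIDC client needs. The sketch below fetches {issuer}/.well-known/openid-configuration (the path defined by OpenID Connect Discovery 1.0) and checks the core endpoint fields plus the exact-issuer rule. The issuer URL in the usage note is hypothetical; the report does not give Authelia's hostname.

```python
from __future__ import annotations

import json
import urllib.request

# Core discovery fields an authorization-code client relies on.
REQUIRED_FIELDS = ("issuer", "authorization_endpoint", "token_endpoint", "jwks_uri")


def fetch_discovery(issuer: str, timeout: float = 5.0) -> dict:
    """Download the OIDC discovery document published under the issuer URL."""
    url = issuer.rstrip("/") + "/.well-known/openid-configuration"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def discovery_problems(doc: dict, expected_issuer: str) -> list[str]:
    """List problems that would stop an OIDC client from using this provider."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if not doc.get(field)]
    # Discovery requires the advertised issuer to match the configured one exactly.
    if doc.get("issuer") != expected_issuer:
        problems.append("issuer mismatch")
    return problems
```

Usage, with a placeholder hostname: discovery_problems(fetch_discovery("https://auth.ai.home"), "https://auth.ai.home") returns an empty list when the provider is ready to be added as an external IdP.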


Tailscale Coexistence

Netbird and Tailscale can run simultaneously on the same nodes because they use different WireGuard interfaces and port ranges:

  • Tailscale: UDP 41641 (WireGuard), TCP 443 (DERP)
  • Netbird: UDP 51820 (WireGuard), UDP 3478 (TURN), TCP 443 (management/signal)

Potential conflicts:

  • Both want UDP high-ports for NAT traversal — OS assigns ephemeral ports, typically fine
  • Both manipulate iptables/routing tables — could interfere with default routes
  • DNS resolution: Tailscale MagicDNS vs. Netbird DNS — whichever writes /etc/resolv.conf last wins
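The coexistence argument rests on the two agents never binding the same local socket. The Python sketch below encodes the ports listed above and flags only same-protocol listening ports as conflicts. Tagging each port as a local listener or an outbound destination is our client-side reading of the list (both agents dial TCP 443 out, which cannot collide), not something the report states.

```python
from __future__ import annotations

from typing import NamedTuple


class Binding(NamedTuple):
    proto: str      # "tcp" or "udp"
    port: int
    direction: str  # "listen" = bound on the node, "outbound" = remote port dialed


# Client-side view; the listen/outbound split is an assumption.
TAILSCALE = [
    Binding("udp", 41641, "listen"),   # WireGuard
    Binding("tcp", 443, "outbound"),   # DERP
]
NETBIRD = [
    Binding("udp", 51820, "listen"),   # WireGuard
    Binding("udp", 3478, "outbound"),  # TURN
    Binding("tcp", 443, "outbound"),   # management/signal
]


def listening(bindings: list[Binding]) -> set[tuple[str, int]]:
    """(proto, port) pairs an agent binds locally."""
    return {(b.proto, b.port) for b in bindings if b.direction == "listen"}


def local_conflicts(a: list[Binding], b: list[Binding]) -> set[tuple[str, int]]:
    """(proto, port) pairs both agents would try to bind on the same node."""
    return listening(a) & listening(b)


if __name__ == "__main__":
    print(local_conflicts(TAILSCALE, NETBIRD) or "no local port conflicts")
```

An empty result matches the report's conclusion. Routing-table and resolv.conf interactions are not port-level, so this check does not catch them.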

Recommended coexistence strategy:

  • Primary mesh: Tailscale (currently working, MagicDNS configured for ai.home)
  • Secondary / evaluation: Netbird on a subset of nodes
  • Use Netbird for specific access-control use cases (e.g., expose certain services via Netbird Proxy)
  • Do NOT set Netbird as default route unless Tailscale is decommissioned

Netbird Proxy — Replacing Traefik?

Commander question: "Run alongside possibly replace Traefik as the reverse proxy"

Answer: Netbird Proxy is NOT a reverse proxy replacement for Traefik. It solves a different problem:

  • Traefik (existing on MK7): Routes *.ai.home traffic within the LAN/WAN to Docker containers. It handles HTTP/HTTPS ingress for services like Portainer, PegaProx, Technitium, etc.
  • Netbird Proxy: Exposes internal Netbird mesh services to the public internet via subdomain routing, secured by Netbird's access policies. Think of it as a Tailscale Funnel equivalent.

Example:

  • prometheus.internal.ai.home is only reachable inside the LAN → Traefik routes to Prometheus
  • prometheus.netbird.ai.home could be exposed to a remote user's laptop via Netbird Proxy with per-user ACLs

Verdict: Keep Traefik. Netbird Proxy complements it for selective external exposure rather than replacing it.


Resource Requirements

Quickstart (single container)

Resource | Min | Recommended
CPU | 1 core | 2 cores
RAM | 2 GB | 4 GB
Disk | 10 GB | 20 GB
Network | Public IP + DNS | Same

Advanced (multi-container)

Resource | Min | Recommended
CPU | 2 cores | 4 cores
RAM | 4 GB | 8 GB
Disk | 20 GB | 40 GB
Network | Same | Same

Iron Legion: MK7 (18 cores, 15 GB RAM) exceeds both tiers, and a Proxmox LXC provisioned with 2 cores and 4 GB RAM meets the Quickstart recommendation.
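Because the fit check is a per-resource comparison, a short Python sketch makes the margins explicit. Only CPU and RAM are compared because the report gives no disk figures for either host, and MK7's numbers are whole-node totals, so its real headroom is smaller once Swarm, DNS, and Traefik are accounted for.

```python
from __future__ import annotations

from typing import NamedTuple


class Spec(NamedTuple):
    cores: int
    ram_gb: int


# Tiers from the two tables above (disk omitted: host disk sizes are not stated).
TIERS = {
    "quickstart-min": Spec(1, 2),
    "quickstart-rec": Spec(2, 4),
    "advanced-min": Spec(2, 4),
    "advanced-rec": Spec(4, 8),
}

# Whole-node totals for MK7 (not free capacity) and the proposed LXC size.
HOSTS = {
    "MK7": Spec(18, 15),
    "MK39 LXC": Spec(2, 4),
}


def fitting_tiers(host: Spec) -> list[str]:
    """Names of the tiers whose CPU and RAM figures the host meets or exceeds."""
    return [name for name, need in TIERS.items()
            if host.cores >= need.cores and host.ram_gb >= need.ram_gb]


if __name__ == "__main__":
    for host_name, spec in HOSTS.items():
        print(f"{host_name}: {', '.join(fitting_tiers(spec))}")
```

On these figures the LXC clears advanced-min but not advanced-rec, so running the Advanced model there would call for a larger container.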


Deployment Effort Estimate

Phase | Task | Time | Notes
P0 | Review this report | | Commander decision point
P1 | Add DNS records to Technitium | 15 min | netbird.ai.home + wildcard
P2 | Deploy Netbird (Quickstart Option A or B) | 30 min | Run getting-started.sh, select option [1] or [0]
P3 | Create first admin user via /setup | 5 min | Web browser
P4 | Install Netbird client on test nodes | 20 min | 2-3 nodes for validation
P5 | Configure network routes + ACLs | 45 min | Mirror Tailscale access patterns
P6 | Evaluate coexistence vs. Tailscale replacement | Ongoing | 1-2 week trial period

Total hands-on time (if approved): ~2 hours (+ evaluation period).
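The ~2 hour figure is the sum of the timed phases; P0 and P6 carry no fixed time and are excluded. As a quick arithmetic check in Python:

```python
# Hands-on minutes for the timed phases in the table above (P1 to P5).
PHASE_MINUTES = {
    "P1 DNS records": 15,
    "P2 Deploy Netbird": 30,
    "P3 First admin via /setup": 5,
    "P4 Clients on test nodes": 20,
    "P5 Routes + ACLs": 45,
}

total = sum(PHASE_MINUTES.values())
hours, minutes = divmod(total, 60)
print(f"{total} min = {hours} h {minutes} min")  # 115 min = 1 h 55 min
```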


Known Issues / Gotchas

  1. ALPN / HTTP/2 requirement: Netbird v0.29+ consolidated ports require HTTP/2 + ALPN on the reverse proxy. Traefik supports this natively. Nginx requires HTTP/2 to be enabled explicitly (the http2 parameter on listen, or the http2 on; directive in Nginx 1.25.1+).

  2. Legacy clients: If any Iron Legion device runs an older Netbird client (< v0.29), you'll need the legacy ports (33073, 10000, 33080, UDP 49152-65535). All fleet devices should run the latest client.

  3. Coturn on cloud VMs: Oracle Cloud and Hetzner require provider-level firewall rules for UDP 3478 in addition to the VM's own firewall. Not applicable on the LAN, but noted for future cloud expansion.

  4. First user setup: The /setup page is only accessible when zero users exist. After the first admin is created, it redirects to /login. To create additional admins, use Dashboard → Settings → Identity Providers or the API with a personal access token (PAT).

  5. NTP dependency: Authelia failed on MK7 due to unsynchronized clock (see MK7 restoration report). Netbird's management service also checks certificate validity — ensure NTP sync on the host.

  6. Wildcard DNS for Proxy: If enabling Netbird Proxy, the wildcard CNAME is mandatory. Without it, exposed service subdomains won't resolve.
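For item 4, a minimal Python sketch of the API route follows. It only builds the request; the POST /api/users path, the Authorization: Token <PAT> header, and the body fields reflect Netbird's public REST API as we understand it and should be checked against the API reference for the deployed version, including how a user created this way obtains credentials under the embedded Dex. Host, email, and token values are placeholders.

```python
from __future__ import annotations

import json
import urllib.request


def build_create_user_request(api_base: str, pat: str, email: str, name: str,
                              role: str = "admin") -> urllib.request.Request:
    """Prepare (but do not send) a user-creation call authenticated with a PAT."""
    body = {
        "email": email,
        "name": name,
        "role": role,
        "auto_groups": [],         # no groups assigned automatically
        "is_service_user": False,
    }
    return urllib.request.Request(
        url=api_base.rstrip("/") + "/api/users",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Token {pat}", "Content-Type": "application/json"},
    )


# Sending it is a separate, deliberate step:
#   with urllib.request.urlopen(req, timeout=10) as resp:
#       print(resp.status, json.load(resp))
```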


Recommendations

Immediate (Pre-Deployment)

  1. Commander reviews this report
  2. Decide Option A (Swarm on MK7) vs. Option B (LXC on MK39)
  3. If Option A: verify Traefik HTTP/2 ALPN is active
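The ALPN check in step 3 can be run from any LAN host with Python's standard ssl module: offer h2 and http/1.1, and see which protocol Traefik selects. This is a sketch; the optional cafile covers the case where Traefik serves a certificate from the internal CA, and the CA path in the usage note is hypothetical.

```python
from __future__ import annotations

import socket
import ssl
from typing import Optional


def negotiated_alpn(host: str, port: int = 443, cafile: Optional[str] = None,
                    timeout: float = 5.0) -> Optional[str]:
    """Return the ALPN protocol the server picks, or None if it picks none."""
    ctx = ssl.create_default_context(cafile=cafile)  # verifies cert and hostname
    ctx.set_alpn_protocols(["h2", "http/1.1"])       # offer HTTP/2 first
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # server_hostname sets SNI so Traefik selects the matching router/cert.
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()


def alpn_ready(protocol: Optional[str]) -> bool:
    """gRPC on the shared 443 port needs HTTP/2, i.e. ALPN 'h2'."""
    return protocol == "h2"
```

Usage: alpn_ready(negotiated_alpn("netbird.ai.home", cafile="/etc/ssl/ai-home-ca.pem")) should be True before proceeding with Option A.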

Short-Term (If Approved)

  1. Deploy Netbird Quickstart with embedded Dex
  2. Add netbird.ai.home + wildcard to Technitium DNS
  3. Install clients on 2-3 test nodes (Cinnamint, Artemis, MK42)
  4. Mirror one Tailscale route in Netbird for comparison

Long-Term (Evaluation After 2 Weeks)

  1. Compare latency/connection reliability vs. Tailscale
  2. Evaluate Netbird Proxy for selective external access
  3. Decide: coexist, replace Tailscale, or decommission Netbird
  4. If replacing: migrate MagicDNS zones to Netbird DNS, update all .ai.home client configs
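For the latency half of step 1, the same peer can be pinged over its Tailscale address and over its Netbird address, and the round-trip samples summarized side by side. A minimal Python sketch of the summary step (how samples are collected is left open):

```python
from __future__ import annotations

import statistics


def summarize(samples_ms: list[float]) -> dict[str, float]:
    """Median, 95th percentile, and worst case of round-trip samples (>= 2 needed)."""
    # method="inclusive" keeps the estimate inside the observed range for small samples.
    p95 = statistics.quantiles(samples_ms, n=20, method="inclusive")[-1]
    return {"median": statistics.median(samples_ms), "p95": p95, "max": max(samples_ms)}


def compare(tailscale_ms: list[float], netbird_ms: list[float]) -> dict[str, float]:
    """Netbird minus Tailscale for each statistic; positive means Netbird is slower."""
    ts, nb = summarize(tailscale_ms), summarize(netbird_ms)
    return {key: nb[key] - ts[key] for key in ts}
```

Comparing p95 and max alongside the median matters here, because relay fallback tends to show up as occasional slow samples rather than a shifted median.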


Appendix: Netbird vs Tailscale Detailed Comparison

Aspect | Tailscale | Netbird Self-Hosted
Control plane ownership | Tailscale Inc. | Fully owned
Relay ownership | Tailscale DERP | Self-hosted Coturn
Cost | Free tier limited; enterprise paid | Free; unlimited
Identity | External IdP or Tailscale | Embedded Dex or any OIDC
Web dashboard (self-hosted) | |
API (REST + gRPC) | |
SCIM provisioning | (manual) | (Enterprise)
Network segmentation / ACLs | Yes (JSON ACL) | Yes (groups + policies)
Exit nodes | |
Subnet routers | |
Browser client (WebRTC-based) | |
Mobile NAT busting | DERP | TURN + direct p2p

Report generated 2026-05-31 by F.R.I.D.A.Y. — awaiting Commander review.