Netbird Self-Hosted Control Plane — Evaluation Report
Author: F.R.I.D.A.Y. (Hermes Agent)
Date: 2026-05-31
Status: Draft, for Commander review before deployment
Scope: Evaluate Netbird self-hosted control plane as a potential replacement or complement to Tailscale mesh networking for the Iron Legion fleet.
Executive Summary
Netbird is an open-source, WireGuard-based mesh VPN that provides peer-to-peer connectivity with a centralized management plane. As of v0.71.4 (May 2026), it now offers two deployment models for self-hosting:
- Quickstart (single-container, recommended for new deployments): Combined management + signal + relay in one netbird-server container with embedded Dex IdP. ~5-minute setup via getting-started.sh with built-in Traefik and automatic TLS.
- Advanced (multi-container, legacy but supported): Separate services (management, signal, coturn, relay, dashboard) configured via management.json and docker-compose.yml.
Key finding: Netbird now supports running behind an existing reverse proxy (Traefik, Nginx, Caddy) as a first-class deployment option. This is significant for the Iron Legion because MK7 already runs Traefik for *.ai.home services — we can integrate Netbird without adding a new public-facing edge.
What Netbird Offers (vs. Tailscale)
| Feature | Tailscale | Netbird |
|---|---|---|
| Underlay protocol | WireGuard | WireGuard |
| Control plane | Tailscale Co. cloud | Self-hostable |
| NAT traversal | DERP relays (cloud-hosted) | Self-hosted Coturn + Relay |
| Identity provider | Tailscale accounts / SSO via Auth0, etc. | Embedded Dex / Any OIDC IdP |
| Network routes | ✅ | ✅ |
| DNS split-brain | MagicDNS | Network-wide DNS |
| Reverse proxy / funnel | Tailscale Funnel (public) | Built-in reverse proxy via Netbird Proxy |
| Access controls | ACL policies | Group + peer policies |
| Linux clients | ✅ | ✅ |
| Windows | ✅ | ✅ |
| Mobile (iOS/Android) | ✅ | ✅ |
| Browser client | ❌ | ✅ |
| Open-source | Client only | Fully open-source |
For the Iron Legion: The primary advantage of Netbird is full ownership of the control plane. Tailscale depends on Tailscale Inc. infrastructure for coordination and DERP relays; Netbird brings both under our control.
Architecture Overview
Quickstart (v0.29+, Recommended)
[Public Internet]
|
+-- TCP 80/443 --> Traefik (built-in or external)
| |
| +-- Dashboard UI (web)
| +-- Management API (gRPC over HTTPS)
| +-- Signal (gRPC over HTTPS, HTTP/2 ALPN)
| +-- Relay (WebSocket over HTTPS)
|
+-- UDP 3478 --> Coturn (STUN/TURN)
|
+-- UDP 49152-65535 --> TURN relay ports (legacy)
Combined server container (netbird-server) consolidates:
- Management Service — peer orchestration, ACLs, routes, DNS
- Signal Service — WebRTC signaling for direct WireGuard connections
- Relay Service — WebSocket relay for fallback when direct p2p fails
- Embedded Dex — built-in identity provider (local users + external OIDC)
- Dashboard — web management UI
New in v0.29: Management and Signal share port 443 via HTTP/2 ALPN. Earlier versions required separate ports (33073 for management gRPC, 10000 for signal gRPC, 33080 for relay).
Advanced (legacy multi-container)
- management: API server + dashboard
- signal: WebRTC signaling
- relay: WebSocket fallback relay
- coturn: TURN/STUN server
- dashboard: React UI
- External IdP required (or Dex deployed separately)
Iron Legion recommendation: Use the Quickstart model unless there's a hard requirement for a separate IdP (Authelia, Keycloak, etc.) that cannot run alongside the embedded Dex.
Deployment Options for Iron Legion
Option A: Docker Swarm on MK7 (Recommended for Low Friction)
Deploy Netbird as a Docker Swarm stack on MK7, using the existing Traefik as the reverse proxy.
Pros:
- Already running Swarm + Traefik on MK7
- No new VM or LXC to provision
- Can share traefik-public network
- Traefik handles TLS certs via internal CA or Let's Encrypt
Cons:
- MK7 is already the Swarm manager + DNS + proxy — adding mesh control plane means more load on the same node
- If MK7 goes down, the mesh control plane and the web UI/proxy go down together
Port mapping on MK7:
| Port | Protocol | Service |
|---|---|---|
| 80 | TCP | HTTP (redirect + ACME challenge) |
| 443 | TCP | HTTPS (Dashboard, Management, Signal, Relay) |
| 3478 | UDP | Coturn STUN/TURN |
Note: The consolidated ports in v0.29+ reduce firewall complexity. If all clients run v0.29+, only TCP 80/443 and UDP 3478 are needed. Legacy clients also need 33073, 10000, 33080, and UDP 49152-65535.
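The port note above can be sketched as a small helper that decides which firewall rules a given set of clients needs. This is only an illustration: the names PortRule, parse_version, and required_rules are hypothetical, and treating the three legacy service ports as TCP is an assumption, since the report lists them without a protocol.

```python
from typing import NamedTuple


class PortRule(NamedTuple):
    port: str      # single port or range, kept as a string
    protocol: str  # "tcp" or "udp"
    purpose: str


# Ports every deployment needs, taken from the port mapping table.
BASE_RULES = [
    PortRule("80", "tcp", "HTTP redirect + ACME challenge"),
    PortRule("443", "tcp", "Dashboard, Management, Signal, Relay"),
    PortRule("3478", "udp", "Coturn STUN/TURN"),
]

# Extra ports for clients older than v0.29.
# Assumption: the three service ports are TCP (the report does not say).
LEGACY_RULES = [
    PortRule("33073", "tcp", "Management gRPC (legacy)"),
    PortRule("10000", "tcp", "Signal gRPC (legacy)"),
    PortRule("33080", "tcp", "Relay (legacy)"),
    PortRule("49152-65535", "udp", "TURN relay range (legacy)"),
]


def parse_version(version: str) -> tuple[int, ...]:
    """Turn 'v0.71.4' or '0.29' into a comparable tuple of integers."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def required_rules(client_versions: list[str]) -> list[PortRule]:
    """Return the base rules, plus legacy rules if any client predates v0.29."""
    needs_legacy = any(parse_version(v) < (0, 29) for v in client_versions)
    return BASE_RULES + (LEGACY_RULES if needs_legacy else [])
```

Calling required_rules with the versions reported by the fleet's clients gives a quick answer to whether the legacy ports can stay closed.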
Option B: Dedicated LXC on Proxmox (Recommended for Resilience)
Deploy Netbird control plane as an LXC container on one of the Proxmox nodes (MK33/34/39/42), with port forwards via iptables or host networking.
Pros:
- Isolated from Docker Swarm failures
- Can colocate with MK7 for low latency but separate failure domain
- Easier backups via scheduled Proxmox snapshots
Cons:
- Requires provisioning an LXC first
- Need to forward UDP 3478 + TCP 443 from host to container
Recommended node: MK39 (Gemini) — currently underutilized, stable node.
Option C: PVE VM (Heavy, Overkill)
Full VM on Proxmox — unnecessary overhead for a coordination server.
Verdict: Option B (LXC on MK39) for resilience, or Option A (Swarm on MK7) if simplicity is preferred.
Reverse Proxy Integration
The getting-started.sh script supports 6 reverse proxy modes:
| Option | Reverse Proxy | Iron Legion Fit |
|---|---|---|
| [0] | Built-in Traefik (new container) | Works but redundant; we already have Traefik |
| [1] | External Traefik (labels only) | Best fit for Option A; generates Docker labels for existing Traefik |
| [2] | Nginx (config template) | Not needed; already running Traefik |
| [3] | Nginx Proxy Manager | Not needed |
| [4] | External Caddy | Not needed |
| [5] | Other/Manual | Fallback if Traefik ALPN doesn't work |
Iron Legion choice: Option [1] — "Existing Traefik" labels. This generates:
- traefik.enable=true
- traefik.http.routers.netbird-<service>.rule=Host(...)
- traefik.http.services.netbird-<service>.loadbalancer.server.port=...
- Labels for each endpoint: Dashboard (443), Management gRPC (443), Signal gRPC (443), Relay WebSocket (443)
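To make the label pattern above concrete, here is a minimal sketch that builds the same three label keys for one service. It is not the output of getting-started.sh: the function name traefik_labels, the backend port 8080, and the added entrypoints label (pointing at the existing websecure entrypoint) are assumptions for illustration.

```python
def traefik_labels(service: str, host: str, port: int,
                   entrypoint: str = "websecure") -> dict[str, str]:
    """Build Traefik router/service labels for one Netbird endpoint.

    The router and service are both named netbird-<service>, matching
    the pattern listed in the report. The entrypoints label is an
    addition of this sketch, not something the report confirms.
    """
    name = f"netbird-{service}"
    return {
        "traefik.enable": "true",
        f"traefik.http.routers.{name}.rule": f"Host(`{host}`)",
        f"traefik.http.routers.{name}.entrypoints": entrypoint,
        f"traefik.http.services.{name}.loadbalancer.server.port": str(port),
    }


# Hypothetical example: dashboard served by a container listening on 8080.
for key, value in traefik_labels("dashboard", "netbird.ai.home", 8080).items():
    print(f"{key}={value}")
```

Comparing labels built this way against what the script actually emits is a cheap sanity check before attaching them to the stack.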
Required Traefik EntryPoints
Already configured on MK7 Traefik:
- web (:80): redirect to HTTPS
- websecure (:443): HTTPS + gRPC via HTTP/2
- traefik-dashboard (:8080): dashboard
No new entrypoints needed. All Netbird services multiplex over 443 via HTTP/2 ALPN.
DNS Requirements
Netbird needs two DNS records:
| Type | Record | Points To |
|---|---|---|
| A | netbird.ai.home | MK7 (192.168.7.7) or MK39 LXC IP |
| CNAME | *.netbird.ai.home | netbird.ai.home |
The wildcard is required for Netbird Proxy — each exposed internal service gets a subdomain (e.g., service.netbird.ai.home).
Technitium DNS update: Add:
- netbird.ai.home → A → 192.168.7.7 (or LXC IP if Option B)
- *.netbird.ai.home → CNAME → netbird.ai.home
Note: Netbird clients on the mesh resolve *.netbird.selfhosted internally. The ai.home DNS is only needed for the dashboard web UI and proxy subdomains.
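Once the two records exist, a quick check like the following could confirm that both the base name and an arbitrary subdomain under the wildcard resolve to the expected address. This is a sketch under assumptions: check_records, resolve_ipv4, and the probe label wildcard-probe are hypothetical names, and it relies on whatever resolver the host is configured to use.

```python
import socket
from typing import Callable


def resolve_ipv4(name: str) -> set[str]:
    """Return the IPv4 addresses the system resolver gives for name."""
    try:
        infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    except socket.gaierror:
        return set()  # name does not resolve
    # Each entry is (family, type, proto, canonname, (address, port)).
    return {info[4][0] for info in infos}


def check_records(base: str, expected_ip: str,
                  resolver: Callable[[str], set[str]] = resolve_ipv4,
                  probe_label: str = "wildcard-probe") -> dict[str, bool]:
    """Check the base A record and one made-up name under the wildcard."""
    names = [base, f"{probe_label}.{base}"]
    return {name: expected_ip in resolver(name) for name in names}
```

For Option A this would be called as check_records("netbird.ai.home", "192.168.7.7"); a False for the probe name suggests the wildcard CNAME is missing.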
Authentication Strategy
Netbird Quickstart includes an embedded Dex identity provider with local user management. This is sufficient for Iron Legion's current needs.
Two paths:
Path 1: Embedded Dex Only (Recommended for Review)
- Local user accounts created via Netbird Dashboard
- No dependence on external IdP
- Username/password or personal access tokens
- Can migrate to external IdP later without re-enrolling devices
Path 2: Integrate with Existing Authelia (Future)
- Authelia on MK7 supports OIDC (added in v4.38+)
- Netbird can authenticate against Authelia as the IdP
- Single sign-on across all fleet services
- More complex setup — save for Phase 2
Recommendation: Start with Path 1 (embedded Dex). It's fully functional, requires zero extra infrastructure, and can be migrated to Authelia OIDC later.
Tailscale Coexistence
Netbird and Tailscale can run simultaneously on the same nodes because they use different WireGuard interfaces and port ranges:
- Tailscale: UDP 41641 (WireGuard), port 443/TCP (DERP)
- Netbird: UDP 51820 (WireGuard), UDP 3478 (TURN), TCP 443 (management/signal)
Potential conflicts:
- Both want UDP high-ports for NAT traversal — OS assigns ephemeral ports, typically fine
- Both manipulate iptables/routing tables — could interfere with default routes
- DNS resolution: Tailscale MagicDNS vs. Netbird DNS; whichever rewrites /etc/resolv.conf last wins
Recommended coexistence strategy:
- Primary mesh: Tailscale (currently working, MagicDNS configured for ai.home)
- Secondary / evaluation: Netbird on a subset of nodes
- Use Netbird for specific access-control use cases (e.g., expose certain services via Netbird Proxy)
- Do NOT set Netbird as default route unless Tailscale is decommissioned
Netbird Proxy — Replacing Traefik?
Commander question: "Run alongside possibly replace Traefik as the reverse proxy"
Answer: Netbird Proxy is NOT a reverse proxy replacement for Traefik. It solves a different problem:
- Traefik (existing on MK7): Routes *.ai.home traffic within the LAN/WAN to Docker containers. It handles HTTP/HTTPS ingress for services like Portainer, PegaProx, Technitium, etc.
- Netbird Proxy: Exposes internal Netbird mesh services to the public internet via subdomain routing, secured by Netbird's access policies. Think of it as a Tailscale Funnel equivalent.
Example:
- prometheus.internal.ai.home is only reachable inside the LAN → Traefik routes to Prometheus
- prometheus.netbird.ai.home could be exposed to a remote user's laptop via Netbird Proxy with per-user ACLs
Verdict: Keep Traefik. Netbird Proxy complements it for selective external exposure rather than replacing it.
Resource Requirements
Quickstart (single container)
| Resource | Min | Recommended |
|---|---|---|
| CPU | 1 core | 2 cores |
| RAM | 2 GB | 4 GB |
| Disk | 10 GB | 20 GB |
| Network | Public IP + DNS | Same |
Advanced (multi-container)
| Resource | Min | Recommended |
|---|---|---|
| CPU | 2 cores | 4 cores |
| RAM | 4 GB | 8 GB |
| Disk | 20 GB | 40 GB |
| Network | Same | Same |
Iron Legion: Both MK7 (18 cores, 15 GB RAM) and a Proxmox LXC (easily provisioned with 4 GB RAM, 2 cores) comfortably meet the Quickstart requirements.
Deployment Effort Estimate
| Phase | Task | Time | Notes |
|---|---|---|---|
| P0 | Review this report | — | Commander decision point |
| P1 | Add DNS records to Technitium | 15 min | netbird.ai.home + wildcard |
| P2 | Deploy Netbird (Quickstart Option A or B) | 30 min | Run getting-started.sh, select option [1] or [0] |
| P3 | Create first admin user via /setup | 5 min | Web browser |
| P4 | Install Netbird client on test nodes | 20 min | 2-3 nodes for validation |
| P5 | Configure network routes + ACLs | 45 min | Mirror Tailscale access patterns |
| P6 | Evaluate coexistence vs. Tailscale replacement | Ongoing | 1-2 week trial period |
Total hands-on time (if approved): ~2 hours (+ evaluation period).
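As a quick check on the total above, the timed phases (P1 to P5) from the table can simply be added up. The dictionary keys below are shorthand labels for the table rows, not names from any tool.

```python
# Hands-on minutes per phase, copied from the estimate table (P0 and P6 are untimed).
hands_on_minutes = {
    "P1 DNS records": 15,
    "P2 Deploy Netbird": 30,
    "P3 First admin user": 5,
    "P4 Install clients": 20,
    "P5 Routes + ACLs": 45,
}

total = sum(hands_on_minutes.values())
hours, minutes = divmod(total, 60)
print(f"{total} min = {hours} h {minutes} min")  # 115 min = 1 h 55 min
```

The sum comes to 115 minutes, which is consistent with the rounded figure of about 2 hours.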
Known Issues / Gotchas
- ALPN / HTTP/2 requirement: Netbird v0.29+ consolidated ports require HTTP/2 + ALPN on the reverse proxy. Traefik supports this natively. Nginx requires an explicit http2 directive on listen.
- Legacy clients: If any Iron Legion device runs an older Netbird client (< v0.29), you'll need the legacy ports (33073, 10000, 33080, UDP 49152-65535). All fleet devices should use the latest client.
- Coturn on cloud VMs: Oracle Cloud and Hetzner require firewall rules for UDP 3478 beyond the VM-level firewall. Not applicable for LAN but noted for future cloud expansion.
- First user setup: The /setup page is only accessible when zero users exist. After first admin creation, it redirects to /login. To create additional admins, use Dashboard → Settings → Identity Providers or the API with a PAT.
- NTP dependency: Authelia failed on MK7 due to an unsynchronized clock (see MK7 restoration report). Netbird's management service also checks certificate validity, so ensure NTP sync on the host.
- Wildcard DNS for Proxy: If enabling Netbird Proxy, the wildcard CNAME is mandatory. Without it, exposed service subdomains won't resolve.
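For the NTP dependency noted above, a pre-deployment check along these lines could confirm the host clock is synchronized. It is a sketch that assumes a systemd-based host where timedatectl is available; the function names are hypothetical.

```python
import subprocess


def parse_ntp_synchronized(output: str) -> bool:
    """Parse 'NTPSynchronized=yes' style output from timedatectl show."""
    for line in output.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "NTPSynchronized":
            return value.strip() == "yes"
    return False  # property missing: treat as not synchronized


def host_clock_is_synced() -> bool:
    """Ask systemd whether the clock is NTP-synchronized.

    Assumption: the host runs systemd; on other systems this returns False.
    """
    try:
        result = subprocess.run(
            ["timedatectl", "show", "--property=NTPSynchronized"],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return False  # timedatectl is not installed
    return result.returncode == 0 and parse_ntp_synchronized(result.stdout)
```

Running host_clock_is_synced() on MK7 or the LXC before deploying gives an early warning of the clock drift that previously broke Authelia.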
Recommendations
Immediate (Pre-Deployment)
- ✅ Commander reviews this report
- ✅ Decide Option A (Swarm on MK7) vs. Option B (LXC on MK39)
- ✅ If Option A: verify Traefik HTTP/2 ALPN is active
Short-Term (If Approved)
- Deploy Netbird Quickstart with embedded Dex
- Add netbird.ai.home + wildcard to Technitium DNS
- Install clients on 2-3 test nodes (Cinnamint, Artemis, MK42)
- Mirror one Tailscale route in Netbird for comparison
Long-Term (Evaluation After 2 Weeks)
- Compare latency/connection reliability vs. Tailscale
- Evaluate Netbird Proxy for selective external access
- Decide: coexist, replace Tailscale, or decommission Netbird
- If replacing: migrate MagicDNS zones to Netbird DNS, update all .ai.home client configs
References
- Netbird Docs (Self-Hosted Quickstart): https://docs.netbird.io/selfhosted/selfhosted-quickstart
- Netbird Docs (Advanced Guide): https://docs.netbird.io/selfhosted/selfhosted-guide
- GitHub (infrastructure files): https://github.com/netbirdio/netbird/tree/v0.71.4/infrastructure_files
- Quickstart install script: curl -fsSL https://github.com/netbirdio/netbird/releases/latest/download/getting-started.sh | bash
- Reverse Proxy Configuration: https://docs.netbird.io/selfhosted/reverse-proxy
- Upgrade / Migration Guide: https://docs.netbird.io/selfhosted/maintenance
Appendix: Netbird vs Tailscale Detailed Comparison
| Aspect | Tailscale | Netbird Self-Hosted |
|---|---|---|
| Control plane ownership | ❌ Tailscale Inc. | ✅ Fully owned |
| Relay ownership | ❌ Tailscale DERP | ✅ Self-hosted Coturn |
| Cost | Free tier limited; enterprise paid | Free; unlimited |
| Identity | External IdP or Tailscale | Embedded Dex or any OIDC |
| Web dashboard | ✅ | ✅ (self-hosted) |
| API | ✅ | ✅ (REST + gRPC) |
| SCIM provisioning | ❌ (manual) | ✅ (Enterprise) |
| Network segmentation / ACLs | Yes (JSON ACL) | Yes (groups + policies) |
| Exit nodes | ✅ | ✅ |
| Subnet routers | ✅ | ✅ |
| Browser client | ❌ | ✅ (WebRTC-based) |
| Mobile NAT busting | DERP | TURN + direct p2p |
Report generated 2026-05-31 by F.R.I.D.A.Y. — awaiting Commander review.