# wl-webrtc Architecture Design

**Date**: 2026-04-03
**Status**: Draft
**Author**: Sisyphus (AI-assisted design)

---

## 1. Overview

### 1.1 Problem Statement

Build a low-latency Wayland screen sharing server that captures the desktop via GPU, encodes with hardware acceleration (VAAPI/Vulkan), and streams to a browser for remote viewing and eventual remote control.

### 1.2 Goals

- **Glass-to-glass latency < 50ms** on LAN
- **GPU-accelerated pipeline**: capture + encode entirely on GPU; only the encoded bitstream crosses to the CPU
- **Browser-only client**: no native app installation required
- **Single binary deployment**: embedded web UI, no separate files to deploy (VAAPI drivers, and FFmpeg unless statically linked, are still needed at runtime)
- **Linux Wayland only**: no cross-platform abstraction needed
- **Annex B streaming**: the encoder outputs Annex B (start-code-prefixed) NAL units with SPS/PPS injected in-band on every IDR frame via the `h264_metadata` bitstream filter (`repeat_sps=1` `repeat_pps=1`), and the browser decodes in Annex B mode via WebCodecs. `repeat_headers=1` is NOT an alternative: it is a libx264-only option and does not exist for `h264_vaapi`.
### 1.3 Non-Goals (Phase 1)

- Multi-client support (Phase 2)
- Audio streaming (Phase 3)
- Remote input injection (Phase 2)
- Firefox support (Phase 3 — WebRTC fallback)
- Adaptive bitrate (Phase 3)

### 1.4 Technology Stack

| Component | Technology | Rationale |
|-----------|-----------|-----------|
| Screen capture | wayland-client + DMA-BUF | Zero-copy GPU capture via DMA-BUF |
| GPU encoding | FFmpeg (ffmpeg-next) VAAPI/Vulkan | H.264/HEVC hardware encoding |
| Transport | wtransport (WebTransport over HTTP/3) | Full HTTP/3 + WebTransport protocol, built on quinn + rustls |
| Browser decode | WebCodecs VideoDecoder | Direct decode control, no MSE buffering |
| Web UI | axum + rust-embed | Single binary, compile-time embedded static files |
| Event loop | mio | Proven with Wayland file descriptor callbacks |
| Async runtime | tokio | Required by wtransport, also powers axum |
| Sync/async bridge | async_channel | Sync `try_send()` on the mio thread, async `recv()` on tokio |

### 1.5 Transport Decision: Why Not WebRTC

WebRTC was evaluated and rejected as the primary transport for this use case:

| Factor | WebRTC (webrtc-rs) | WebTransport + WebCodecs |
|--------|-------------------|-------------------------|
| Glass-to-glass latency | 30-110ms (unavoidable 20-60ms jitter buffer) | 12-38ms (no jitter buffer) |
| Rust ecosystem | webrtc-rs v0.20.0-alpha, mid-rewrite | wtransport production-grade, built on quinn |
| Protocol overhead | ICE/DTLS/SRTP/SDP — designed for P2P NAT traversal | QUIC TLS 1.3 — server-to-client, simpler |
| Decode control | Browser controls jitter buffer, cannot opt out | Application controls every frame decode |
| GPU data path | `Sample { data: Bytes }`, must copy to CPU | Same copy, but shorter pipeline |
| Browser support | All browsers | Chrome/Edge (Phase 1 target); Firefox and Safari support for WebTransport + WebCodecs depends on browser version and must be verified |

**Transport library choice**: We use the `wtransport` crate (v0.7) instead of raw `quinn` + `h3`.
The browser's `WebTransport` API requires a full HTTP/3 server implementing the WebTransport-over-HTTP/3 extension (IETF draft `draft-ietf-webtrans-http3`, which builds on HTTP Datagrams from RFC 9297). Raw QUIC is NOT sufficient, because there is no browser API for raw QUIC connections. The `wtransport` crate provides the complete protocol stack (HTTP/3 + WebTransport) built on top of `quinn` 0.11 and `rustls` 0.23, with support for datagrams, unidirectional streams, and bidirectional streams.

WebRTC will be added as a Phase 3 fallback for Firefox compatibility.

---

## 2. Architecture

### 2.1 Thread Model

```
┌─────────────────────────────────────────────────────────────┐
│                      wl-webrtc process                      │
│                                                             │
│  Main Thread (mio event loop)                               │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ Wayland event queue dispatch                         │   │
│  │ Screen capture (DMA-BUF, zero-copy from compositor)  │   │
│  │ GPU encode (FFmpeg VAAPI/Vulkan, sync calls)         │   │
│  │ State machine transitions                            │   │
│  │ FPS limiting                                         │   │
│  └──────────────────────┬───────────────────────────────┘   │
│                         │                                   │
│      async_channel::bounded::<EncodedFrame>(16)             │
│                         │                                   │
│  Tokio Runtime Thread Pool (2+ threads)                     │
│  ┌──────────────────────▼───────────────────────────────┐   │
│  │ wtransport WebTransport server                       │   │
│  │ HTTP/3 + WebTransport session management             │   │
│  │ Frame distribution to connected clients              │   │
│  │ axum HTTP server (Web UI + control API)              │   │
│  │ rust-embed static file serving                       │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
```

**Design rationale**:

- **Capture + encode on main thread**: GPU encoding is a synchronous call of 3-8ms per frame. At 30-60fps the frame interval is 16.7-33ms, so each encode blocks the mio event loop only briefly and Wayland events are still dispatched in time. Keeping the GPU pipeline on one thread avoids cross-thread synchronization.
- **wtransport on tokio**: wtransport is built on quinn and tokio, and axum requires tokio, so the WebTransport server and the HTTP static file server share the same tokio runtime.
- **async_channel::bounded(16)**: A capacity of 16 frames is ~267ms at 60fps. That figure is a ceiling, not the expected latency: in steady state the transport consumer keeps the channel close to empty, and the slack only absorbs transport stalls. The mio thread sends with `try_send()`, which returns `Err(TrySendError::Full(_))` when the channel is full. The main loop then discards and logs the frame, which is standard practice in real-time streaming because newer frames are always more valuable than older ones. This keeps the mio event loop responsive for Wayland event dispatch. **Do NOT use `send_blocking()`** on the mio thread: it would stall the capture pipeline whenever the transport consumer falls behind.

### 2.2 Module Dependency Graph

```
                     ┌──────────┐
                     │ main.rs  │  entry point, CLI, orchestration
                     └──┬──┬──┬─┘
         ┌──────────────┘  │  └────────────┐
         ▼                 ▼               ▼
┌─────────────────┐  ┌──────────┐  ┌──────────────┐
│    state.rs     │  │ avhw.rs  │  │ transport.rs │
│  StateMachine   │  │  HW ctx  │  │ QUIC server  │
│  CaptureSource  │  │          │  │   Sessions   │
└────┬───────────┬┘  └────────┬─┘  └──────────────┘
     │           │            │
     ▼           ▼            ▼
┌─────────┐ ┌─────────┐ ┌───────────┐
│cap_wlr_ │ │cap_ext_ │ │ filter.rs │
│screen   │ │image_   │ │ crop/     │
│copy     │ │copy     │ │ scale/    │
└─────────┘ └─────────┘ │ transpose │
                        └───────────┘

┌────────────┐  ┌──────────────┐  ┌──────────────┐
│transform.rs│  │ signaling.rs │  │ fps_limit.rs │
│ coordinate │  │ axum + embed │  │  VRR-aware   │
│ transform  │  │ Web UI serve │  │  limiter     │
└────────────┘  └──────────────┘  └──────────────┘
```

**Dependency layers** (bottom-up):

1. `transform.rs`, `fps_limit.rs` — leaf modules, zero internal dependencies
2. `avhw.rs`, `filter.rs` — FFmpeg wrapper layer
3. `cap_wlr_screencopy.rs`, `cap_ext_image_copy.rs` — capture backends, depend on state + avhw
4. `state.rs` — state machine + CaptureSource trait
5. `transport.rs`, `signaling.rs` — network layer
6. `main.rs` — orchestration

### 2.3 Project File Structure

```
wl-webrtc/
├── Cargo.toml
├── README.md
├── src/
│   ├── main.rs                # ~300 lines — CLI, startup, orchestration
│   ├── state.rs               # ~600 lines — State, EncConstructionStage, InFlightSurface
│   ├── avhw.rs                # ~450 lines — FFmpeg HW device/frame contexts
│   ├── filter.rs              # ~200 lines — FFmpeg video filter graph
│   ├── cap_wlr_screencopy.rs  # ~170 lines — wlr-screencopy backend
│   ├── cap_ext_image_copy.rs  # ~240 lines — ext-image-copy-capture backend
│   ├── transform.rs           # ~220 lines — coordinate transforms
│   ├── fps_limit.rs           # ~130 lines — VRR-aware frame rate limiter
│   ├── transport.rs           # ~400 lines — QUIC/WebTransport server
│   ├── signaling.rs           # ~200 lines — axum HTTP + WebSocket control
│   └── nalu.rs                # ~150 lines — Annex B NAL unit splitting, framing protocol
├── static/
│   ├── index.html             # Web UI shell
│   ├── player.js              # WebCodecs decoder + Canvas renderer
│   └── style.css              # Minimal styling
└── protocols/                 # Wayland protocol XML files
```

---

## 3. Data Flow

### 3.1 Zero-Copy Capture Pipeline

```
GPU Frame Pool ─alloc()→ HW Surface
      ↓
av_hwframe_map → DMA-BUF fd
      ↓
zwp_linux_dmabuf → WlBuffer (fd shared)
      ↓
Compositor writes directly to GPU Surface
      ↓
FFmpeg filter graph (filter.rs: crop/scale/transpose, RGB → YUV on GPU)
      ↓
FFmpeg VAAPI/Vulkan encode (GPU-internal)
      ↓  ← GPU→CPU copy of the coded buffer via vaMapBuffer (unavoidable)
AVPacket.data (Annex B with 00 00 00 01 start codes)
      ↓
Bytes::from(Vec) wrapper
      ↓
frame_tx.try_send(EncodedFrame)   // sync, never blocks the main thread
```

### 3.2 Transport Pipeline

```
frame_rx.recv().await → EncodedFrame
      ↓
Frame byte-splitting at MTU boundaries (not NAL-aligned)
      ↓
┌─ Keyframe    → WebTransport unidirectional stream (reliable, guaranteed delivery)
└─ Delta frame → WebTransport datagram (unreliable, low latency)
      ↓
wtransport send (stream write or datagram)
      ↓
Browser: transport.incomingUnidirectionalStreams / transport.datagrams.readable
      ↓
Frame reassembly (if fragmented)
      ↓
WebCodecs VideoDecoder.decode(EncodedVideoChunk)
      ↓
Canvas drawImage(VideoFrame)
```

### 3.3 Latency Budget

| Stage | Latency | Notes |
|-------|---------|-------|
| Wayland capture (KMS/dmabuf) | 1-3ms | Zero-copy from compositor |
| GPU encode (VAAPI H.264) | 3-8ms | Synchronous, main thread |
| vaMapBuffer CPU copy | <1ms | Unavoidable GPU→CPU |
| async_channel | <0.1ms | In-process, assuming an empty queue (each queued frame adds ~16.7ms at 60fps) |
| QUIC datagram (LAN) | 1-10ms | LAN transit, including QUIC send/receive |
| WebCodecs decode | 2-5ms | Browser hardware decode |
| Canvas render | 1-2ms | requestAnimationFrame |
| **Total (LAN)** | **8-29ms** | Sum of the rows above; well under the 50ms target |

### 3.4 EncodedFrame Structure

```rust
use std::time::Duration;

use bytes::Bytes;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum FrameType {
    Keyframe,
    Delta,
}

#[derive(Clone)]
struct EncodedFrame {
    data: Bytes,           // Annex B NALUs with start codes (refcounted, cheap to clone)
    pts_us: i64,           // Presentation timestamp (microseconds, for WebCodecs)
    duration: Duration,    // Frame duration for timestamp calculation
    frame_type: FrameType, // Keyframe or Delta (selects stream vs datagram transport)
    width: u32,            // Frame width (may differ from capture on ROI)
    height: u32,           // Frame height
}
```

**Timestamp convention**: `pts_us` is in **microseconds** (not nanoseconds), matching WebCodecs' `EncodedVideoChunk.timestamp` requirement. The server tracks a monotonic PTS starting from 0 and advancing by `1_000_000 / fps` per frame. Compute it as `frame_index * 1_000_000 / fps` rather than accumulating the truncated increment, so integer truncation (16_666µs instead of 16_666.67µs at 60fps) does not build up drift.

---

## 4. State Machine

### 4.1 EncConstructionStage

```
     App start
         │
         ▼
┌──────────────────┐
│  ProbingOutputs  │  Discover Wayland outputs,
└────────┬─────────┘  collect geometry info
         │ All outputs probed
         ▼
┌──────────────────┐
│ EverythingButFmt │  HW device ctx created,
└────────┬─────────┘  encoder initialized
         │ negotiate_format()
         ▼
┌──────────────────┐
│    Streaming     │◄──┐  Active capture + encode + transport
└──┬───────────┬───┘   │
   │           │       │ Format changed
   │           └───────┘ (rebuild encoder pipeline)
   │ Output disconnected
   ▼
┌──────────────────┐
│  OutputWentAway  │  Keep enc + transport,
└────────┬─────────┘  drop capture objects
         │ Same output reconnects
         ▼
  Streaming (new CaptureSource, reused encoder + transport)

An Intermediate placeholder variant is held during every transition (mem::replace).
```

**Key design choice**: The `Streaming` state holds both `EncState` (encoding pipeline) AND `TransportState` (active WebTransport sessions). On `OutputWentAway`, both are preserved; only capture objects are discarded.

### 4.2 InFlightSurface

```
None → AllocQueued → Allocd(Frame) → CopyQueued { surface, drm_map, frame, buffer } → None
```

A 4-state enum with `assert!(matches!(...))` runtime guards and RAII cleanup on each state transition. The single-frame-in-flight constraint prevents buffer exhaustion.

### 4.3 TransportSessionState (new)

```
┌─────────────┐   connect    ┌─────────────┐  disconnect  ┌─────────────┐
│ Listening   │ ───────────→ │   Active    │ ───────────→ │   Closed    │
│ (wtransport │              │ (sending    │              │ (cleanup)   │
│  endpoint)  │              │  frames)    │              │             │
└─────────────┘              └─────────────┘              └─────────────┘
```

Multiple sessions can be `Active` simultaneously (Phase 2). Phase 1 supports exactly one.

---

## 5. Design Patterns

The architecture uses the following established design patterns to manage complexity:

| # | Pattern | Usage in wl-webrtc |
|---|---------|-------------------|
| 1 | Strategy Trait + Generic State | `CaptureSource` trait with `CapWlrScreencopy` / `CapExtImageCopy` backends |
| 2 | Polymorphic Enum State Machine | `EncConstructionStage` — 5 variants with type-safe transitions |
| 3 | Type-Safe Frame Lifecycle | `InFlightSurface` — 4-state enum with runtime guards |
| 4 | `Pin<Box<T>>` Self-Referential | Vulkan device context — for self-referential FFmpeg structs |
| 5 | Independent Thread Pipe | tokio runtime replaces mpsc audio thread; same atomic flag pattern |
| 6 | VRR-Aware Frame Rate Control | `FpsLimit` — one-frame-buffer delay for correct drop decisions |
| 7 | Generic Dispatch 3-Layer | Wayland protocol dispatch — generic event handling |
| 8 | Three-Stage Safe Construction | Incremental resource acquisition with partial state rollback |
| 9 | Hot-Plug Auto-Recovery | `OutputWentAway` — preserve encoder/transport, rebuild capture |
| 10 | Zero-Copy GPU Pipeline | DMA-BUF capture + GPU-internal encode, minimal CPU involvement |

---

## 6. Transport Protocol Design

### 6.1 WebTransport Connection Setup

```
Server generates self-signed TLS certificate (via wtransport built-in rcgen support)
  → wtransport::Endpoint::server(server_config)   // bind address set in ServerConfig::builder()
  → Browser: new WebTransport("https://server:PORT/wt")
  → wtransport handles the full HTTP/3 + WebTransport handshake internally
  → Session established (datagrams + streams available)
```

**Transport library**: As described in §1.5, `wtransport` (v0.7, on `quinn` 0.11 and `rustls` 0.23) handles all protocol details: HTTP/3 SETTINGS, the CONNECT method with `:protocol = webtransport`, session management, and datagram framing per RFC 9297. Raw `quinn` or `h3` would require building this protocol stack manually.
### 6.2 Frame Framing Protocol

QUIC only guarantees a 1200-byte minimum UDP payload, so after QUIC and HTTP/3 datagram overhead a safe datagram payload is somewhat below 1200 bytes; the actual limit depends on the path MTU and should be read at runtime rather than hard-coded. A 1080p H.264 frame is typically 10KB-200KB, so frames need application-level framing:

```
Fragment header (prepended to every fragment, on datagrams and on the reliable stream):
┌──────────┬───────────┬───────────┬───────────┬───────────┬─────────────┐
│ type (1) │ frame_id  │ pts_us    │ seq_num   │ total     │ payload     │
│          │ (4 bytes) │ (8 bytes) │ (2 bytes) │ (2 bytes) │ (variable)  │
└──────────┴───────────┴───────────┴───────────┴───────────┴─────────────┘
Header size: 17 bytes (1 + 4 + 8 + 2 + 2).

type:    0x01 = Keyframe fragment (sent via reliable stream, not datagram)
         0x02 = Delta frame fragment (sent via datagram)
         0x03 = Keyframe complete (small enough for single datagram)
         0x04 = Delta frame complete
         0x10 = Codec config (SPS/PPS for H.264, VPS/SPS/PPS for HEVC)

pts_us:  Presentation timestamp in microseconds (i64, big-endian).
         Passed directly to WebCodecs EncodedVideoChunk.timestamp.
         For fragmented frames, every fragment carries the same pts_us.
```

**Key design decisions**:

- **Keyframes via reliable WebTransport stream**: SPS/PPS + IDR data must not be lost, so keyframes go over a unidirectional stream (in wtransport, `connection.open_uni().await?.await?`).
- **Delta frames via datagram**: Loss-tolerant. If a delta frame is lost (a gap in `frame_id`), the client stops feeding delta frames to the decoder until the next keyframe, so corruption does not accumulate on top of a missing reference frame.
- **Frame reassembly in browser**: Buffer fragments by `frame_id`, reassemble when all `total` fragments arrive, then decode the complete frame.
- **Timestamp in microseconds**: The fragment header carries `pts_us: i64` so the browser can pass it directly to `EncodedVideoChunk.timestamp`. WebCodecs requires a real timestamp here; a sequential `frame_id` counter is NOT a valid timestamp.

### 6.3 Codec Configuration Exchange

The encoder MUST emit SPS/PPS in-band with every IDR frame. The design relies on the `h264_metadata` bitstream filter (`repeat_sps=1` `repeat_pps=1`) for this; `repeat_headers=1` is a libx264-only option and does NOT exist for `h264_vaapi`. Before implementing, verify the BSF option names against the FFmpeg Bitstream Filters documentation, because the documented `h264_metadata` options do not appear to include `repeat_sps`/`repeat_pps`. The documented way to re-insert parameter sets is the `dump_extra` BSF with `freq=keyframe`, which prepends the encoder's extradata to key packets; `h264_vaapi` is also expected to emit SPS/PPS in-band on IDR frames when `AV_CODEC_FLAG_GLOBAL_HEADER` is not set.
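Whichever mechanism produces the parameter sets, the guarantee can be checked cheaply on the server before a keyframe is sent. The sketch below is a std-only Rust version of the Annex B splitting that `nalu.rs` is planned to contain, plus a check that a keyframe carries an SPS (NAL type 7), a PPS (type 8) and an IDR slice (type 5). The function names `split_annex_b` and `keyframe_has_parameter_sets` are hypothetical. The sketch accepts both 3- and 4-byte start codes. It can split on `00 00 01` without decoding emulation-prevention bytes, because those bytes guarantee the start-code pattern never appears inside a NAL payload.

```rust
/// Split an Annex B byte stream into NAL units (start codes removed).
fn split_annex_b(data: &[u8]) -> Vec<&[u8]> {
    // Index of the first byte after each 00 00 01 start code.
    let mut starts = Vec::new();
    let mut i = 0;
    while i + 3 <= data.len() {
        if data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1 {
            starts.push(i + 3);
            i += 3;
        } else {
            i += 1;
        }
    }
    let mut nals = Vec::new();
    for (k, &start) in starts.iter().enumerate() {
        // A NAL unit runs until the next start code or the end of the buffer.
        let end = if k + 1 < starts.len() { starts[k + 1] - 3 } else { data.len() };
        let mut nal = &data[start..end];
        // Strip the extra leading zero of a following 4-byte start code
        // (a NAL unit never legitimately ends in 0x00).
        while let Some((&0, rest)) = nal.split_last() {
            nal = rest;
        }
        if !nal.is_empty() {
            nals.push(nal);
        }
    }
    nals
}

/// H.264 nal_unit_type is the low 5 bits of the NAL header byte.
fn nal_type(nal: &[u8]) -> u8 {
    nal[0] & 0x1F
}

/// True if the access unit carries SPS (7), PPS (8) and an IDR slice (5).
fn keyframe_has_parameter_sets(data: &[u8]) -> bool {
    let types: Vec<u8> = split_annex_b(data).iter().map(|n| nal_type(n)).collect();
    [7u8, 8, 5].iter().all(|t| types.contains(t))
}

fn main() {
    // Synthetic access unit; the payload bytes after each NAL header are dummies.
    let keyframe = [
        0, 0, 0, 1, 0x67, 0x42, 0xE0, 0x1F, // SPS (type 7), 4-byte start code
        0, 0, 0, 1, 0x68, 0xCE, 0x3C, 0x80, // PPS (type 8)
        0, 0, 1, 0x65, 0x88, 0x84, // IDR slice (type 5), 3-byte start code
    ];
    let delta = [0, 0, 0, 1, 0x41, 0x9A, 0x02]; // non-IDR slice (type 1)

    let types: Vec<u8> = split_annex_b(&keyframe).iter().map(|n| nal_type(n)).collect();
    println!("{:?}", types); // [7, 8, 5]
    println!("{}", keyframe_has_parameter_sets(&keyframe)); // true
    println!("{}", keyframe_has_parameter_sets(&delta)); // false
}
```

Logging a warning when a keyframe fails this check catches a misconfigured bitstream filter long before the browser reports a decode error.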
The browser configures the decoder in **Annex B mode** (no `description` at `configure()` time), and SPS/PPS arrive in-band with each keyframe.

On session establishment, the server sends a codec configuration message over a reliable WebTransport stream to inform the browser of the codec and dimensions:

```json
{
  "type": "codec_config",
  "codec": "avc1.42E01F",
  "width": 1920,
  "height": 1080,
  "framerate": 60
}
```

Note that `avc1.42E01F` declares Constrained Baseline profile at level 3.1, which is below what 1080p60 requires (level 4.2, e.g. `avc1.42E02A`). The server should derive the codec string from the encoder's actual profile and level rather than hard-coding it.

The browser uses this message to configure `VideoDecoder` without `description`, which activates Annex B mode:

```javascript
decoder.configure({
  codec: config.codec,
  codedWidth: config.width,
  codedHeight: config.height,
  // NO description: Annex B mode. SPS/PPS arrive in-band with each keyframe.
});
```

**Why no AVCC description?** Per the WebCodecs AVC registration spec, providing `description` forces the decoder into AVC (length-prefixed) mode for ALL frames. Since our encoder outputs Annex B (start-code-prefixed), we must omit `description` and rely on the in-band parameter sets described above.

---

## 7. Browser-Side Design

### 7.1 Web UI (static/index.html + player.js)

Single-page application with minimal dependencies:

```
┌──────────────────────────────────────┐
│  wl-webrtc                           │
│  ┌──────────────────────────────┐    │
│  │                              │    │
│  │           (video)            │    │
│  │    WebCodecs → drawImage     │    │
│  │                              │    │
│  └──────────────────────────────┘    │
│  Status: Connected | Latency: 23ms   │
│  Resolution: 1920x1080 @ 60fps       │
│  [Fullscreen] [Disconnect]           │
└──────────────────────────────────────┘
```

### 7.2 WebCodecs Decoder Pipeline

**CRITICAL: Annex B mode only.** Per the [W3C AVC WebCodecs Registration](https://w3c.github.io/webcodecs/avc_codec_registration.html#videodecoderconfig-description), if `description` is provided at `configure()` time, ALL subsequent `EncodedVideoChunk` data must be in AVC format (4-byte length-prefixed). If `description` is **absent**, the bitstream is assumed to be in Annex B format (start-code-prefixed). Since our encoder outputs Annex B, we must NOT provide `description`.

Because the encoder emits SPS/PPS in-band with every IDR frame (§6.3), the decoder can initialize from keyframe data alone.

```javascript
// Simplified player.js flow (a module script, so top-level await is allowed).
// reassembleFrame() and readKeyframeStreams() are placeholder helpers that
// implement the §6.2 fragment reassembly; they are not shown here.
const ctx = document.querySelector("canvas").getContext("2d");

const transport = new WebTransport("https://server:PORT/wt");
await transport.ready;

const decoder = new VideoDecoder({
  output: (frame) => {
    ctx.drawImage(frame, 0, 0);
    frame.close(); // release the frame's buffer immediately
  },
  error: (e) => console.error(e),
});

// Configure WITHOUT description → Annex B mode.
// SPS/PPS are delivered in-band with each keyframe.
decoder.configure({
  codec: "avc1.42E01F",
  codedWidth: 1920,
  codedHeight: 1080,
  optimizeForLatency: true, // ask the decoder to minimize internal buffering
  // NO description field: Annex B mode
});

let haveKeyframe = false;
function decodeIfComplete(frame) {
  if (!frame.complete) return;
  if (!frame.isKeyframe && !haveKeyframe) return; // the decoder needs a keyframe first
  haveKeyframe ||= frame.isKeyframe;
  decoder.decode(new EncodedVideoChunk({
    type: frame.isKeyframe ? "key" : "delta",
    timestamp: Number(frame.ptsUs),
    data: frame.data, // Annex B, valid because no description was provided
  }));
}

// Keyframes arrive on reliable unidirectional streams (§6.2)...
readKeyframeStreams(transport.incomingUnidirectionalStreams, decodeIfComplete);

// ...and delta frames arrive as datagrams.
const reader = transport.datagrams.readable.getReader();
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  decodeIfComplete(reassembleFrame(value));
}
```

### 7.3 No Annex B → AVCC Conversion Needed

Because we configure the decoder in Annex B mode (no `description`), no format conversion is needed on the browser side. The server sends raw Annex B NAL units with start codes (`00 00 00 01`), and the decoder accepts them directly. SPS/PPS are included in every IDR frame (§6.3), so the decoder can re-initialize after any keyframe, even if it missed earlier configuration data.

---

## 8. Error Handling & Recovery

### 8.1 Display Hot-Plug

1. `wl_registry.global_remove` → set `output_went_away` flag
2. `on_copy_fail()` detects flag → transition to `OutputWentAway`
3. Preserve: encoder context, WebTransport sessions
4. Discard: Wayland protocol objects (invalidated)
5. Wait for the same-name output (e.g. "DP-1") to reappear
6. Create new `CaptureSource`, reuse old encoder, continue streaming

### 8.2 Network Disconnection

- QUIC retransmits lost stream data internally (datagrams are never retransmitted). Keep-alive pings are not sent by quinn by default and should be enabled in the endpoint's transport configuration.
- Client page refresh → new WebTransport session → server auto-starts sending the current frame stream (with a forced IDR, see §8.6)
- Server is stateless per session: no recovery needed, just reconnect

### 8.3 Dynamic Format Change

Capture format changes (resolution, rotation):

1. Rebuild: `frames_rgb`, `video_filter`, `enc_video`, `frames_yuv`
2. Preserve: `hw_device_ctx`, `transport_state`
3. Send new codec configuration to browser via reliable stream
4. Browser calls `VideoDecoder.configure()` again with the new codec string and dimensions (still without `description`); the new SPS/PPS arrive in-band with the next keyframe

### 8.4 Frame Loss Handling

- Lost delta frame → per §6.2, the client discards delta frames until the next keyframe (a brief freeze instead of propagating artifacts); it may request a keyframe to shorten the freeze
- Lost keyframe → decoder cannot continue → request keyframe from server via reliable stream
- Server receives keyframe request → sets next input frame to `AV_PICTURE_TYPE_I`

### 8.5 Graceful Shutdown

Shutdown is triggered by SIGINT/SIGTERM via `signal-hook` + `mio` integration:

1. Main loop sets `running = false` flag → stops queuing new captures
2. Wait for in-flight frame to complete (drain `InFlightSurface`)
3. Drain the encoder: `avcodec_send_frame()` with a NULL frame, then `avcodec_receive_packet()` until `AVERROR_EOF` (`avcodec_flush_buffers()` would discard buffered frames instead of draining them)
4. Send final frames through channel
5. Drop `frame_tx` sender → signals EOF to transport
6. Transport server drains pending frames, sends GOAWAY to clients
7. `Runtime::shutdown_background()` terminates remaining async tasks
8. Drop Wayland protocol objects (compositor handles cleanup)
9. FFmpeg contexts freed via `Drop` implementations

**Key concern**: Do NOT use `send_blocking()` on the main thread; use `try_send()` so the main loop never stalls during shutdown. If the channel is full, the frame is dropped (acceptable during shutdown).

**NOTE**: wayland-client 0.31 uses `Connection::connect_to_env()` and `GlobalList` instead of the old 0.29 API (`Display::connect_to_env()` / `GlobalManager::new()`). See plan Task 11 for correct API usage.
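The non-blocking hand-off from §2.1 and steps 4-5 of the shutdown sequence can be sketched with std's `sync_channel`. It stands in for `async_channel::bounded` so the example needs no crates. Both channels are bounded, and both `try_send` methods hand back the rejected frame in a `Full` error; std reports a dropped receiver as `Disconnected`, where async_channel uses `Closed`. The `offer` helper is hypothetical, and the consumer is a plain thread here, whereas in the real design it would be a tokio task calling `recv().await`.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};
use std::thread;

/// Reduced stand-in for the real EncodedFrame (only the field this sketch uses).
struct EncodedFrame {
    pts_us: i64,
}

/// Producer side (mio thread): never block; drop the frame if the queue is full.
/// Returns true if the frame was queued.
fn offer(tx: &SyncSender<EncodedFrame>, frame: EncodedFrame) -> bool {
    match tx.try_send(frame) {
        Ok(()) => true,
        Err(TrySendError::Full(dropped)) => {
            eprintln!("channel full, dropping frame pts_us={}", dropped.pts_us);
            false
        }
        // Consumer already gone (e.g. transport shut down first).
        Err(TrySendError::Disconnected(_)) => false,
    }
}

fn main() {
    let (tx, rx) = sync_channel::<EncodedFrame>(16);

    // Simulate a stalled consumer: 20 frames are offered, only 16 fit.
    let queued = (0..20)
        .filter(|&i| offer(&tx, EncodedFrame { pts_us: i * 16_667 }))
        .count();
    println!("queued {queued} of 20"); // queued 16 of 20

    // Shutdown: dropping the only sender signals EOF, but the consumer
    // still drains every frame that was already queued before it stops.
    drop(tx);
    let consumer = thread::spawn(move || rx.iter().count());
    println!("consumer drained {} frames", consumer.join().unwrap()); // 16
}
```

The receiver's iterator ends only after the sender is dropped *and* the buffer is empty. This is the property step 6 relies on to flush pending frames before closing client sessions.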
### 8.6 First Keyframe Delivery

When a new WebTransport session is established, the client needs a keyframe before it can decode any delta frames. Two strategies:

1. **Force IDR on connect**: Set `AV_PICTURE_TYPE_I` on the next encoded frame when a new session is detected
2. **Buffer the current GOP**: Store the most recent keyframe and the delta frames encoded after it in `TransportServer`, and resend them to new clients (the keyframe alone is not enough, because later delta frames reference frames the new client never received)

Phase 1 uses strategy 1 (force IDR) for simplicity. The transport server sets a shared `needs_keyframe` flag on new sessions, which the encode loop checks before submitting each frame. Because the flag is written on a tokio thread and read on the mio thread, it should be an `AtomicBool` (or equivalent) rather than a plain `bool`.

---

## 9. Dependencies

```toml
[dependencies]
# Wayland screen capture
wayland-client = "0.31"
wayland-protocols = { version = "0.32", features = ["client", "unstable", "staging"] }
wayland-protocols-wlr = { version = "0.3", features = ["client"] }
drm-fourcc = "2"

# GPU encoding
ffmpeg-next = "8"

# WebTransport (HTTP/3 + WebTransport protocol, built on quinn + rustls)
wtransport = { version = "0.7", features = ["self-signed"] }

# Web UI
axum = { version = "0.8", features = ["ws"] }
tower-http = { version = "0.6", features = ["cors"] }
rust-embed = { version = "8", features = ["mime-guess"] }

# Async runtime
tokio = { version = "1", features = ["full"] }

# Sync/async bridge (sync try_send() on mio thread, async recv() on tokio)
async-channel = "2"

# Event loop (os-poll provides Poll/Events; os-ext provides unix::SourceFd for the Wayland fd)
mio = { version = "1", features = ["os-poll", "os-ext"] }

# Utilities
clap = { version = "4", features = ["derive"] }
tracing = "0.1"
tracing-subscriber = "0.3"
anyhow = "1"
bytes = "1"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
signal-hook = { version = "0.3", features = ["iterator"] }
# SIGINT/SIGTERM as a mio event source (§8.5)
signal-hook-mio = { version = "0.2", features = ["support-v1_0"] }
base64 = "0.22"
mime_guess = "2"
```

**Encoder configuration note**: The VAAPI H.264 encoder MUST emit SPS/PPS parameter sets in-band with every IDR frame (see §6.3 for the bitstream filter configuration). This is required for WebCodecs Annex B decode mode on the browser side.
---

## 10. Implementation Phases

### Phase 1 — MVP: Screen → Browser Streaming

| # | Module | Description | Estimated Effort |
|---|--------|-------------|------------------|
| 1 | `main.rs` | CLI args, startup sequence | Small |
| 2 | `cap_*.rs` | Implement capture backends (wlr-screencopy + ext-image-copy) | Medium |
| 3 | `avhw.rs` | Implement FFmpeg HW device/frame context management | Medium |
| 4 | `filter.rs` | Implement GPU video filter graph | Small |
| 5 | `transform.rs` | Implement coordinate transforms for Wayland outputs | Small |
| 6 | `fps_limit.rs` | Implement VRR-aware frame rate limiter | Small |
| 7 | `state.rs` | State machine adapted for transport | Medium |
| 8 | `transport.rs` | QUIC server + frame distribution | Large (new code) |
| 9 | `nalu.rs` | Annex B framing protocol | Small (new code) |
| 10 | `signaling.rs` | axum server + static files | Small (new code) |
| 11 | `static/*` | Browser Web UI + WebCodecs player | Medium (new code) |

**Deliverable**: Run `wl-webrtc`, open `https://localhost:PORT` in Chrome, see live screen at <50ms latency.
### Phase 2 — Remote Input + Stability

| # | Feature | Description |
|---|---------|-------------|
| 12 | Remote input | Browser mouse/keyboard → wlr-virtual-pointer/virtual-keyboard |
| 13 | Hot-plug recovery | Display disconnect/reconnect |
| 14 | Dynamic format | Resolution/rotation change handling |
| 15 | Multi-client | Multiple simultaneous browser viewers |

### Phase 3 — Optimization + Compatibility

| # | Feature | Description |
|---|---------|-------------|
| 16 | Adaptive bitrate | Network-aware VAAPI bit_rate adjustment |
| 17 | Audio pipeline | Synchronous audio capture + encoding + transport |
| 18 | WebRTC fallback | webrtc-rs path for Firefox compatibility |
| 19 | Performance dashboard | Real-time stats in Web UI |

---

## 11. Open Questions

1. **ffmpeg-next vs direct VAAPI bindings**: ffmpeg-next adds FFI overhead but provides a mature encoding pipeline. Direct vaapi-dmabuf bindings would be more Rust-native but much more implementation work. **Decision: ffmpeg-next for Phase 1, evaluate direct bindings in Phase 3.** NOTE: the `ffmpeg-next` safe API does NOT wrap hardware contexts (`AVBufferRef`, `AVHWFramesContext`). Use raw `ffmpeg_next::ffi` directly for all HW context operations; see `wl-screenrec/src/avhw.rs` for the reference pattern.
2. **Frame fragmentation strategy**: The current design fragments large frames across QUIC datagrams at byte boundaries (not NAL-aligned). The framing protocol reassembles by `frame_id`, so a lost fragment invalidates the entire frame. Alternative: send all frames via reliable QUIC streams and accept slightly higher latency. **Decision: Start with datagrams for delta frames, measure latency, evaluate.**
3. **Self-signed certificate UX**: The browser will show an SSL warning. Options: (a) accept for LAN, (b) guide user to trust CA, (c) use HTTP/2 prior knowledge, (d) pass the certificate's SHA-256 hash through the `serverCertificateHashes` option of the browser's `WebTransport` constructor, which Chrome accepts for short-lived (at most 14-day) self-signed certificates. Unlike a page load, a WebTransport connection offers no click-through warning, so (a) alone may not be enough. **Decision: Accept for Phase 1, add CA trust guide in Phase 2.**
4. **HEVC vs H.264 default**: H.264 has universal browser support.
HEVC has better compression but spotty browser support. **Decision: H.264 default, HEVC as option flag.**
5. **WebCodecs bitstream format**: **Decision: Annex B mode (no `description` at configure time).** SPS/PPS are guaranteed in-band by the encoder's bitstream filter configuration (§6.3); `repeat_headers=1` is libx264-only and does not apply to `h264_vaapi`. Per the W3C AVC WebCodecs Registration, providing `description` forces AVC (length-prefixed) mode for ALL subsequent frames. Since our encoder outputs Annex B, we must omit `description`.
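Open question 2 depends on how fragmentation and reassembly behave under loss. A small model of the §6.2 fragment header makes this concrete. The std-only Rust sketch below rests on these assumptions: all header fields are big-endian (the text states this only for `pts_us`), `seq_num` is 0-based, `max_payload` is non-zero, and a frame needs at most 65535 fragments. `fragment` and `Reassembler` are hypothetical names, not the planned `nalu.rs` API. A frame with a missing fragment simply stays in `pending`, which matches the rule that one lost fragment invalidates the whole frame. A real implementation would also evict such stale `frame_id`s.

```rust
use std::collections::HashMap;
use std::convert::TryInto;

const HEADER_LEN: usize = 17; // type(1) + frame_id(4) + pts_us(8) + seq_num(2) + total(2)

/// Split one encoded frame into fragments, each prefixed with the §6.2 header.
fn fragment(frame_type: u8, frame_id: u32, pts_us: i64, data: &[u8], max_payload: usize) -> Vec<Vec<u8>> {
    let chunks: Vec<&[u8]> = data.chunks(max_payload).collect();
    let total = chunks.len() as u16;
    chunks
        .iter()
        .enumerate()
        .map(|(seq, chunk)| {
            let mut pkt = Vec::with_capacity(HEADER_LEN + chunk.len());
            pkt.push(frame_type);
            pkt.extend_from_slice(&frame_id.to_be_bytes());
            pkt.extend_from_slice(&pts_us.to_be_bytes());
            pkt.extend_from_slice(&(seq as u16).to_be_bytes());
            pkt.extend_from_slice(&total.to_be_bytes());
            pkt.extend_from_slice(chunk);
            pkt
        })
        .collect()
}

/// Fragments received so far for one frame, one slot per seq_num.
struct Pending {
    pts_us: i64,
    total: u16,
    parts: Vec<Option<Vec<u8>>>,
}

#[derive(Default)]
struct Reassembler {
    pending: HashMap<u32, Pending>,
}

impl Reassembler {
    /// Feed one fragment; returns (frame_id, pts_us, frame bytes) once all fragments arrived.
    fn push(&mut self, pkt: &[u8]) -> Option<(u32, i64, Vec<u8>)> {
        if pkt.len() < HEADER_LEN {
            return None; // truncated fragment
        }
        let frame_id = u32::from_be_bytes(pkt[1..5].try_into().ok()?);
        let pts_us = i64::from_be_bytes(pkt[5..13].try_into().ok()?);
        let seq = u16::from_be_bytes(pkt[13..15].try_into().ok()?) as usize;
        let total = u16::from_be_bytes(pkt[15..17].try_into().ok()?);

        let entry = self.pending.entry(frame_id).or_insert_with(|| Pending {
            pts_us,
            total,
            parts: vec![None; total as usize],
        });
        if entry.total != total || seq >= entry.parts.len() {
            return None; // inconsistent header: ignore this fragment
        }
        entry.parts[seq] = Some(pkt[HEADER_LEN..].to_vec());

        if entry.parts.iter().all(Option::is_some) {
            let done = self.pending.remove(&frame_id)?;
            let bytes: Vec<u8> = done.parts.into_iter().flatten().flatten().collect();
            return Some((frame_id, done.pts_us, bytes));
        }
        None
    }
}

fn main() {
    // A synthetic 3000-byte frame split for a 1200-byte datagram budget.
    let frame: Vec<u8> = (0..3000u32).map(|i| (i % 251) as u8).collect();
    let pkts = fragment(0x02, 7, 16_667, &frame, 1200 - HEADER_LEN);
    println!("{} fragments", pkts.len()); // 3 (1183 + 1183 + 634 payload bytes)

    // Datagrams may arrive out of order, so feed them in reverse.
    let mut r = Reassembler::default();
    let mut done = None;
    for p in pkts.iter().rev() {
        done = r.push(p).or(done);
    }
    let (id, pts, bytes) = done.expect("all fragments delivered");
    assert_eq!((id, pts), (7, 16_667));
    assert_eq!(bytes, frame);
    println!("reassembled frame {} ({} bytes)", id, bytes.len());
}
```

Dropping one element of `pkts` before the loop leaves `done` as `None`. That is the datagram cost weighed in open question 2: at ~1.2KB per fragment, a 100KB frame spans about 85 datagrams, and losing any one of them loses the frame.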