feat: Phase 1 MVP with audit fixes — Wayland screen capture + VAAPI encoding
Phase 1 MVP implementation of wl-webrtc: Wayland screen capture tool with
hardware-accelerated VAAPI H.264 encoding and WebTransport output.

Includes all 9 runtime bug fixes from code audit (fix-audit-issues plan):

CRITICAL:
- C2: h264_metadata BSF with repeat_sps/repeat_pps in encode pipeline
- C4: FpsLimit wired as timing gate in on_copy_complete

HIGH:
- C3+A2: DRM device discovery via dmabuf feedback MainDevice event,
  unified resolve_drm_path() helper (CLI > compositor > auto > fallback)
- H2: Separate physical_size (mm) from mode_size (pixels) in wl_output
- H1+A3: Multi-output warning + named-output-not-found error

MEDIUM:
- M5: tv_sec u32->u64 to avoid Y2106 timestamp truncation
- M4: Guard against SHM Buffer event (DMA-BUF only)

Key components:
- src/avhw.rs: FFmpeg VAAPI encoder + filter graph + BSF pipeline
- src/state.rs: Wayland event loop + output negotiation + screencopy
- src/cap_wlr_screencopy.rs: wlr-screencopy capture source
- src/fps_limit.rs: Frame rate limiting with configurable target
- src/transform.rs: Frame format conversion utilities
2518
docs/superpowers/plans/2026-04-03-wl-webrtc-phase1.md
Normal file
File diff suppressed because it is too large
@@ -0,0 +1,619 @@
|
||||
# wl-webrtc Architecture Design
|
||||
|
||||
**Date**: 2026-04-03
|
||||
**Status**: Draft
|
||||
**Author**: Sisyphus (AI-assisted design)
|
||||
|
||||
---
|
||||
|
||||
## 1. Overview
|
||||
|
||||
### 1.1 Problem Statement
|
||||
|
||||
Build a low-latency Wayland screen sharing server that captures the desktop via GPU, encodes with hardware acceleration (VAAPI/Vulkan), and streams to a browser for remote viewing and eventual remote control.
|
||||
|
||||
### 1.2 Goals
|
||||
|
||||
- **Glass-to-glass latency < 50ms** on LAN
|
||||
- **GPU-accelerated pipeline**: capture + encode entirely on GPU, only encoded bitstream crosses to CPU
|
||||
- **Browser-only client**: no native app installation required
|
||||
- **Single binary deployment**: embedded web UI, no external dependencies
|
||||
- **Linux Wayland only**: no cross-platform abstraction needed
|
||||
- **Annex B streaming**: encoder outputs Annex B (start-code-prefixed) NAL units with SPS/PPS injected in-band at every IDR via the `h264_metadata` bitstream filter (`repeat_sps=1` `repeat_pps=1`); the browser decodes in Annex B mode via WebCodecs. Note: `repeat_headers=1` is a libx264-only option and is NOT available for `h264_vaapi`.
|
||||
|
||||
### 1.3 Non-Goals (Phase 1)
|
||||
|
||||
- Multi-client support (Phase 2)
|
||||
- Audio streaming (Phase 3)
|
||||
- Remote input injection (Phase 2)
|
||||
- Firefox support (Phase 3 — WebRTC fallback)
|
||||
- Adaptive bitrate (Phase 3)
|
||||
|
||||
### 1.4 Technology Stack
|
||||
|
||||
| Component | Technology | Rationale |
|-----------|-----------|-----------|
| Screen capture | wayland-client + DMA-BUF | Zero-copy GPU capture via DMA-BUF |
| GPU encoding | FFmpeg (ffmpeg-next) VAAPI/Vulkan | H.264/HEVC hardware encoding |
| Transport | wtransport (WebTransport over HTTP/3) | Full HTTP/3 + WebTransport protocol, built on quinn + rustls |
| Browser decode | WebCodecs VideoDecoder | Direct decode control, no MSE buffering |
| Web UI | axum + rust-embed | Single binary, compile-time embedded static files |
| Event loop | mio | Proven with Wayland file descriptor callbacks |
| Async runtime | tokio | Required by wtransport, also powers axum |
| Sync/async bridge | async_channel | Sync `try_send()` on the mio thread and async `recv()` on tokio; bridges mio → tokio naturally |
|
||||
|
||||
### 1.5 Transport Decision: Why Not WebRTC
|
||||
|
||||
WebRTC was evaluated and rejected as the primary transport for this use case:
|
||||
|
||||
| Factor | WebRTC (webrtc-rs) | WebTransport + WebCodecs |
|--------|-------------------|-------------------------|
| Glass-to-glass latency | 30-110ms (unavoidable 20-60ms jitter buffer) | 12-38ms (no jitter buffer) |
| Rust ecosystem | webrtc-rs v0.20.0-alpha, mid-rewrite | wtransport production-grade, built on quinn |
| Protocol overhead | ICE/DTLS/SRTP/SDP, designed for P2P NAT traversal | QUIC TLS 1.3, server-to-client, simpler |
| Decode control | Browser controls jitter buffer, cannot opt out | Application controls every frame decode |
| GPU data path | `Sample { data: Bytes }`, must copy to CPU | Same copy, but shorter pipeline |
| Browser support | All browsers | Chrome/Edge only (Firefox lacks WebCodecs) |
|
||||
|
||||
**Transport library choice**: We use the `wtransport` crate (v0.7) instead of raw `quinn` + `h3`. The browser's `WebTransport` API requires a full HTTP/3 server that implements the WebTransport-over-HTTP/3 extension, which builds on HTTP Datagrams and the Capsule Protocol (RFC 9297). Raw QUIC is NOT sufficient: there is no browser API for raw QUIC connections. The `wtransport` crate provides the complete protocol stack (HTTP/3 + WebTransport) built on top of `quinn` 0.11 and `rustls` 0.23, with support for datagrams, unidirectional streams, and bidirectional streams.
|
||||
|
||||
WebRTC will be added as a Phase 3 fallback for Firefox compatibility.
|
||||
|
||||
---
|
||||
|
||||
## 2. Architecture
|
||||
|
||||
### 2.1 Thread Model
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ wl-webrtc process │
|
||||
│ │
|
||||
│ Main Thread (mio event loop) │
|
||||
│ ┌──────────────────────────────────────────────────────┐ │
|
||||
│ │ Wayland event queue dispatch │ │
|
||||
│ │ Screen capture (DMA-BUF, zero-copy from compositor) │ │
|
||||
│ │ GPU encode (FFmpeg VAAPI/Vulkan, sync calls) │ │
|
||||
│ │ State machine transitions │ │
|
||||
│ │ FPS limiting │ │
|
||||
│ └──────────────────────┬───────────────────────────────┘ │
|
||||
│ │ │
|
||||
│ async_channel::bounded<16>(EncodedFrame) │
|
||||
│ │ │
|
||||
│ Tokio Runtime Thread Pool (2+ threads) │
|
||||
│ ┌──────────────────────▼───────────────────────────────┐ │
|
||||
│ │ wtransport WebTransport server │ │
|
||||
│ │ HTTP/3 + WebTransport session management │ │
|
||||
│ │ Frame distribution to connected clients │ │
|
||||
│ │ axum HTTP server (Web UI + control API) │ │
|
||||
│ │ rust-embed static file serving │ │
|
||||
│ └──────────────────────────────────────────────────────┘ │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**Design rationale**:
|
||||
|
||||
- **Capture + encode on main thread**: GPU encoding is synchronous and takes 3-8ms per frame. Even at 60fps (16.7ms per frame) that leaves more than half of each frame interval free, so encoding does not starve the mio event loop. This avoids cross-thread synchronization for the GPU pipeline.
|
||||
- **wtransport on tokio**: wtransport is built on quinn and tokio. axum requires tokio. Both coexist naturally. Both the WebTransport server and the HTTP static file server share the same tokio runtime.
|
||||
- **async_channel::bounded(16)**: A channel capacity of 16 frames provides ~267ms of buffer at 60fps (16 / 60 s), enough to absorb transport jitter without excessive latency. The sender uses `try_send()`: if the channel is full, it returns `Err(TrySendError::Full(_))` and the main loop drops and logs the frame. This is standard practice in real-time streaming, where newer frames are always more valuable than older ones, and it keeps the main mio event loop responsive for Wayland event dispatch. **Do NOT use `send_blocking()`** on the mio thread: it would stall the capture pipeline if the transport consumer falls behind.
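
The drop-on-full policy described above can be sketched as follows. This is a minimal illustration, not the project's code: it uses the standard library's `std::sync::mpsc::sync_channel` as a stand-in for `async_channel::bounded`, since both expose a non-blocking `try_send()` that reports a full queue. The `Frame` type and the `offer_frame` and `simulate_stalled_consumer` helpers are hypothetical names introduced for the example.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical stand-in for the real `EncodedFrame`.
#[derive(Debug, Clone, PartialEq)]
pub struct Frame {
    pub id: u32,
}

/// Outcome of one non-blocking send attempt.
#[derive(Debug, PartialEq)]
pub enum SendOutcome {
    Queued,
    DroppedFull,
    ConsumerGone,
}

/// Try to enqueue a frame without ever blocking the capture thread.
/// A full channel means the consumer is behind, so the frame is dropped.
pub fn offer_frame(tx: &SyncSender<Frame>, frame: Frame) -> SendOutcome {
    match tx.try_send(frame) {
        Ok(()) => SendOutcome::Queued,
        Err(TrySendError::Full(_dropped)) => SendOutcome::DroppedFull,
        Err(TrySendError::Disconnected(_)) => SendOutcome::ConsumerGone,
    }
}

/// Offer `count` frames to a channel of `capacity` that nobody drains and
/// return `(queued, dropped)`.
pub fn simulate_stalled_consumer(capacity: usize, count: u32) -> (u32, u32) {
    // `_rx` is kept alive so the channel counts as full, not disconnected.
    let (tx, _rx): (SyncSender<Frame>, Receiver<Frame>) = sync_channel(capacity);
    let mut queued = 0;
    let mut dropped = 0;
    for id in 0..count {
        match offer_frame(&tx, Frame { id }) {
            SendOutcome::Queued => queued += 1,
            SendOutcome::DroppedFull => dropped += 1,
            SendOutcome::ConsumerGone => break,
        }
    }
    (queued, dropped)
}
```

With a capacity of 16 and a stalled consumer, the first 16 frames are queued and every later frame is dropped immediately, so the capture loop keeps running at full rate.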
|
||||
|
||||
### 2.2 Module Dependency Graph
|
||||
|
||||
```
|
||||
┌──────────┐
|
||||
│ main.rs │ entry point, CLI, orchestration
|
||||
└──┬──┬─┬──┘
|
||||
│ │ │
|
||||
┌────────────┘ │ └────────────┐
|
||||
▼ ▼ ▼
|
||||
┌──────────┐ ┌──────────┐ ┌────────────┐
|
||||
│ state.rs │ │ avhw.rs │ │ transport.rs│
|
||||
│ StateMachine │ HW ctx │ │ QUIC server │
|
||||
│ CaptureSource │ │ │ Sessions │
|
||||
└──┬───┬────┘ └────┬─────┘ └────────────┘
|
||||
│ │ │
|
||||
┌─────┘ └──────┐ │
|
||||
▼ ▼ ▼
|
||||
┌─────────┐ ┌────────────┐ ┌────────┐
|
||||
│cap_wlr_ │ │cap_ext_ │ │filter.rs│
|
||||
│screen │ │image_copy │ │ crop/ │
|
||||
│copy │ │ │ │ scale/ │
|
||||
└─────────┘ └────────────┘ │transpose│
|
||||
└────────┘
|
||||
┌────────────┐ ┌──────────────┐
|
||||
│transform.rs│ │signaling.rs │
|
||||
│ coordinate │ │ axum + embed │
|
||||
│ transform │ │ Web UI serve │
|
||||
└────────────┘ └──────────────┘
|
||||
┌────────────┐
|
||||
│fps_limit.rs│
|
||||
└────────────┘
|
||||
```
|
||||
|
||||
**Dependency layers** (bottom-up):
|
||||
|
||||
1. `transform.rs`, `fps_limit.rs` — leaf modules, zero internal dependencies
|
||||
2. `avhw.rs`, `filter.rs` — FFmpeg wrapper layer
|
||||
3. `cap_wlr_screencopy.rs`, `cap_ext_image_copy.rs` — capture backends, depend on state + avhw
|
||||
4. `state.rs` — state machine + CaptureSource trait
|
||||
5. `transport.rs`, `signaling.rs` — network layer
|
||||
6. `main.rs` — orchestration
|
||||
|
||||
### 2.3 Project File Structure
|
||||
|
||||
```
wl-webrtc/
├── Cargo.toml
├── README.md
├── src/
│   ├── main.rs                  # ~300 lines: CLI, startup, orchestration
│   ├── state.rs                 # ~600 lines: State<S>, EncConstructionStage, InFlightSurface
│   ├── avhw.rs                  # ~450 lines: FFmpeg HW device/frame contexts
│   ├── filter.rs                # ~200 lines: FFmpeg video filter graph
│   ├── cap_wlr_screencopy.rs    # ~170 lines: wlr-screencopy backend
│   ├── cap_ext_image_copy.rs    # ~240 lines: ext-image-copy-capture backend
│   ├── transform.rs             # ~220 lines: coordinate transforms
│   ├── fps_limit.rs             # ~130 lines: VRR-aware frame rate limiter
│   ├── transport.rs             # ~400 lines: QUIC/WebTransport server
│   ├── signaling.rs             # ~200 lines: axum HTTP + WebSocket control
│   └── nalu.rs                  # ~150 lines: Annex B NAL unit splitting, framing protocol
├── static/
│   ├── index.html               # Web UI shell
│   ├── player.js                # WebCodecs decoder + Canvas renderer
│   └── style.css                # Minimal styling
└── protocols/                   # Wayland protocol XML files
```
|
||||
|
||||
---
|
||||
|
||||
## 3. Data Flow
|
||||
|
||||
### 3.1 Zero-Copy Capture Pipeline
|
||||
|
||||
```
GPU Frame Pool ─alloc()→ HW Surface
        ↓
av_hwframe_map → DMA-BUF fd
        ↓
zwp_linux_dmabuf → WlBuffer (fd shared)
        ↓
Compositor writes directly to GPU Surface
        ↓
FFmpeg VAAPI/Vulkan encode (GPU-internal)
        ↓
AVPacket.data (Annex B with 00 00 00 01 start codes)
        ↓  ← GPU→CPU copy via vaMapBuffer (unavoidable)
Bytes::from(Vec<u8>) wrapper
        ↓
async_channel Sender::try_send(EncodedFrame)  // sync, non-blocking on main thread
```
|
||||
|
||||
### 3.2 Transport Pipeline
|
||||
|
||||
```
async_channel Receiver::recv(EncodedFrame)
        ↓
Frame byte-splitting at MTU boundaries (not NAL-aligned)
        ↓
┌─ Keyframe    → QUIC reliable stream (guaranteed delivery)
└─ Delta frame → QUIC datagram (unreliable, low latency)
        ↓
wtransport WebTransport send
        ↓
Browser WebTransport receive (datagrams.readable / incomingUnidirectionalStreams)
        ↓
Frame reassembly (if fragmented)
        ↓
WebCodecs VideoDecoder.decode(EncodedVideoChunk)
        ↓
Canvas drawImage(VideoFrame)
```
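
The byte-splitting step in this pipeline can be sketched as below. This is a sketch under stated assumptions: it takes the ~1200-byte datagram budget and the 17-byte fragment header from section 6.2 at face value, whereas a real implementation would query the connection for its actual maximum datagram size. `Fragment` and `fragment_frame` are hypothetical names.

```rust
use std::convert::TryFrom;

/// Practical QUIC datagram budget named in the design (bytes); an assumption,
/// the real limit depends on the negotiated connection.
pub const DATAGRAM_MTU: usize = 1200;
/// type (1) + frame_id (4) + pts_us (8) + seq_num (2) + total (2).
pub const HEADER_LEN: usize = 17;
/// Payload bytes that fit in one datagram after the header.
pub const MAX_PAYLOAD: usize = DATAGRAM_MTU - HEADER_LEN;

/// One fragment of an encoded frame, before the header is prepended.
#[derive(Debug, PartialEq)]
pub struct Fragment<'a> {
    pub seq_num: u16,
    pub total: u16,
    pub payload: &'a [u8],
}

/// Split `frame` at byte boundaries (not NAL-aligned).
/// Returns `None` for an empty frame or one needing more than `u16::MAX` fragments.
pub fn fragment_frame(frame: &[u8]) -> Option<Vec<Fragment<'_>>> {
    if frame.is_empty() {
        return None;
    }
    let count = (frame.len() + MAX_PAYLOAD - 1) / MAX_PAYLOAD;
    let total = u16::try_from(count).ok()?;
    Some(
        frame
            .chunks(MAX_PAYLOAD)
            .enumerate()
            .map(|(i, payload)| Fragment {
                seq_num: i as u16, // safe: i < count <= u16::MAX
                total,
                payload,
            })
            .collect(),
    )
}
```

A 3000-byte delta frame, for example, yields three fragments of 1183, 1183 and 634 payload bytes.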
|
||||
|
||||
### 3.3 Latency Budget
|
||||
|
||||
| Stage | Latency | Notes |
|-------|---------|-------|
| Wayland capture (KMS/dmabuf) | 1-3ms | Zero-copy from compositor |
| GPU encode (VAAPI H.264) | 3-8ms | Synchronous, main thread |
| vaMapBuffer CPU copy | <1ms | Unavoidable GPU→CPU |
| async_channel | <0.1ms | In-process |
| QUIC datagram (LAN) | 1-10ms | Includes LAN network transit |
| WebCodecs decode | 2-5ms | Browser hardware decode |
| Canvas render | 1-2ms | requestAnimationFrame |
| **Total (LAN)** | **9-29ms** | Well under the 50ms target; network transit is counted once, in the QUIC datagram row |
|
||||
|
||||
### 3.4 EncodedFrame Structure
|
||||
|
||||
```rust
use bytes::Bytes;
use std::time::Duration;

/// Frame kind; matches the keyframe/delta split used by the transport framing.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum FrameType {
    Keyframe,
    Delta,
}

#[derive(Clone)]
struct EncodedFrame {
    data: Bytes,           // Annex B NALUs with start codes
    pts_us: i64,           // Presentation timestamp (microseconds, for WebCodecs)
    duration: Duration,    // Frame duration for timestamp calculation
    frame_type: FrameType, // Keyframe or Delta (matches transport framing)
    width: u32,            // Frame width (may differ from capture on ROI)
    height: u32,           // Frame height
}
```
|
||||
|
||||
**Timestamp convention**: `pts_us` is in **microseconds** (not nanoseconds), matching WebCodecs' `EncodedVideoChunk.timestamp` requirement. The server tracks a monotonic PTS starting from 0, incrementing by `1_000_000 / fps` per frame.
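
One detail worth illustrating: `1_000_000 / fps` is not an integer at 60fps, so adding a truncated step every frame drifts slowly. A minimal sketch of one way to avoid that, assuming the server keeps a frame counter, is to derive the PTS from the frame index. Both function names are hypothetical.

```rust
/// PTS of frame `index` in microseconds at `fps`, derived from the index so
/// that rounding error does not accumulate across frames.
pub fn pts_us(index: u64, fps: u32) -> i64 {
    assert!(fps > 0, "fps must be non-zero");
    (u128::from(index) * 1_000_000 / u128::from(fps)) as i64
}

/// Naive variant for comparison: a truncated per-frame step added each frame.
pub fn pts_us_accumulated(index: u64, fps: u32) -> i64 {
    assert!(fps > 0, "fps must be non-zero");
    let step = 1_000_000 / i64::from(fps); // 16_666 at 60fps
    step * index as i64
}
```

After 60 frames at 60fps the index-based value is exactly 1,000,000 µs, while the accumulated variant reads 999,960 µs. The drift is small, but it grows with stream duration.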
|
||||
|
||||
---
|
||||
|
||||
## 4. State Machine
|
||||
|
||||
### 4.1 EncConstructionStage
|
||||
|
||||
```
|
||||
┌──────────────────┐
|
||||
App start │ ProbingOutputs │ Discover Wayland outputs,
|
||||
│ └────────┬─────────┘ collect geometry info
|
||||
▼ │ All outputs probed
|
||||
┌───────────────┐ ▼
|
||||
│ ProbingOutputs├──→ ┌──────────────────┐
|
||||
└───────────────┘ │EverythingButFmt │ HW device ctx created,
|
||||
└────────┬─────────┘ encoder initialized
|
||||
│ negotiate_format()
|
||||
▼
|
||||
┌───────────┐
|
||||
┌─────→│ Streaming │──── Active capture + encode + transport
|
||||
│ └─────┬─────┘
|
||||
│ │ Output disconnected
|
||||
│ Format │ ┌──────────────┐
|
||||
│ changed │ │OutputWentAway│ Keep enc + transport,
|
||||
│ │ └──────┬───────┘ drop capture objects
|
||||
└────────────┘ │ Same output reconnects
|
||||
←───────────────────────┘
|
||||
|
||||
Intermediate transient exists at all transition arrows (mem::replace)
|
||||
```
|
||||
|
||||
**Key design choice**: `Streaming` state holds both `EncState` (encoding pipeline) AND `TransportState` (active WebTransport sessions). On `OutputWentAway`, both are preserved — only capture objects are discarded.
|
||||
|
||||
### 4.2 InFlightSurface
|
||||
|
||||
```
|
||||
None → AllocQueued → Allocd(Frame) → CopyQueued { surface, drm_map, frame, buffer } → None
|
||||
```
|
||||
|
||||
4-state enum with `assert!(matches!(...))` runtime guards. RAII cleanup on each state transition. Single-frame-in-flight constraint prevents buffer exhaustion.
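
The lifecycle above can be sketched as a small enum with guarded transitions. This is an illustration only: `Frame` is a placeholder for the real handles (the actual `CopyQueued` variant carries `surface`, `drm_map`, `frame` and `buffer`), and the method names other than `on_copy_complete` are hypothetical.

```rust
use std::mem;

/// Placeholder for the real GPU frame handle.
#[derive(Debug, PartialEq)]
pub struct Frame(pub u32);

/// At most one frame is in flight at any time.
#[derive(Debug, PartialEq)]
pub enum InFlightSurface {
    None,
    AllocQueued,
    Allocd(Frame),
    CopyQueued { frame: Frame },
}

impl InFlightSurface {
    /// None -> AllocQueued. Panics if a frame is already in flight.
    pub fn queue_alloc(&mut self) {
        assert!(matches!(self, InFlightSurface::None), "frame already in flight");
        *self = InFlightSurface::AllocQueued;
    }

    /// AllocQueued -> Allocd(frame).
    pub fn on_alloc(&mut self, frame: Frame) {
        assert!(matches!(self, InFlightSurface::AllocQueued));
        *self = InFlightSurface::Allocd(frame);
    }

    /// Allocd(frame) -> CopyQueued { frame }, moving the frame by value.
    pub fn queue_copy(&mut self) {
        match mem::replace(self, InFlightSurface::None) {
            InFlightSurface::Allocd(frame) => *self = InFlightSurface::CopyQueued { frame },
            other => panic!("queue_copy called in state {:?}", other),
        }
    }

    /// CopyQueued -> None, handing the filled frame to the encoder.
    pub fn on_copy_complete(&mut self) -> Frame {
        match mem::replace(self, InFlightSurface::None) {
            InFlightSurface::CopyQueued { frame } => frame,
            other => panic!("on_copy_complete called in state {:?}", other),
        }
    }
}
```

Taking the old state with `mem::replace` moves the owned resources out, so anything not carried into the next state is dropped at the transition, which is where the RAII cleanup happens.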
|
||||
|
||||
### 4.3 TransportSessionState (new)
|
||||
|
||||
```
┌────────────┐    connect     ┌───────────┐   disconnect   ┌───────────┐
│ Listening  │ ─────────────→ │  Active   │ ─────────────→ │  Closed   │
│ (wtransport│                │ (sending  │                │ (cleanup) │
│  endpoint) │                │  frames)  │                │           │
└────────────┘                └───────────┘                └───────────┘
```
|
||||
|
||||
Multiple sessions can be `Active` simultaneously (Phase 2). Phase 1 supports exactly one.
|
||||
|
||||
---
|
||||
|
||||
## 5. Design Patterns
|
||||
|
||||
The architecture employs several established software design patterns for managing complexity:
|
||||
|
||||
| # | Pattern | Usage in wl-webrtc |
|---|---------|-------------------|
| 1 | Strategy Trait + Generic State | `CaptureSource` trait with `CapWlrScreencopy` / `CapExtImageCopy` backends |
| 2 | Polymorphic Enum State Machine | `EncConstructionStage`: 5 variants with type-safe transitions |
| 3 | Type-Safe Frame Lifecycle | `InFlightSurface`: 4-state enum with runtime guards |
| 4 | Pin\<Box\> Self-Referential | Vulkan device context, for self-referential FFmpeg structs |
| 5 | Independent Thread Pipe | tokio runtime replaces mpsc audio thread; same atomic flag pattern |
| 6 | VRR-Aware Frame Rate Control | `FpsLimit<T>`: one-frame-buffer delay for correct drop decisions |
| 7 | Generic Dispatch 3-Layer | Wayland protocol dispatch with generic event handling |
| 8 | Three-Stage Safe Construction | Incremental resource acquisition with partial state rollback |
| 9 | Hot-Plug Auto-Recovery | `OutputWentAway`: preserve encoder/transport, rebuild capture |
| 10 | Zero-Copy GPU Pipeline | DMA-BUF capture + GPU-internal encode, minimal CPU involvement |
|
||||
|
||||
---
|
||||
|
||||
## 6. Transport Protocol Design
|
||||
|
||||
### 6.1 WebTransport Connection Setup
|
||||
|
||||
```
|
||||
Server generates self-signed TLS certificate (via wtransport built-in rcgen support)
|
||||
→ wtransport::Endpoint::server(server_config, addr)
|
||||
→ Browser: new WebTransport("https://server:PORT/wt")
|
||||
→ wtransport handles full HTTP/3 + WebTransport handshake internally
|
||||
→ Session established (datagrams + streams available)
|
||||
```
|
||||
|
||||
**Transport library**: We use `wtransport` crate (v0.7) which provides a complete WebTransport-over-HTTP/3 server implementation built on top of `quinn` 0.11 and `rustls` 0.23. This handles all protocol details (HTTP/3 SETTINGS, CONNECT method with `:protocol = webtransport`, session management, datagram framing per RFC 9297). Raw `quinn` or `h3` would require building this protocol stack manually.
|
||||
|
||||
### 6.2 Frame Framing Protocol
|
||||
|
||||
QUIC datagrams have a practical MTU of ~1200 bytes. A 1080p H.264 frame is typically 10KB-200KB. Application-level framing:
|
||||
|
||||
```
Datagram format:
┌──────────┬───────────┬───────────┬───────────┬───────────┬─────────────┐
│ type     │ frame_id  │ pts_us    │ seq_num   │ total     │ payload     │
│ (1 byte) │ (4 bytes) │ (8 bytes) │ (2 bytes) │ (2 bytes) │ (variable)  │
└──────────┴───────────┴───────────┴───────────┴───────────┴─────────────┘

type:
  0x01 = Keyframe fragment (sent via reliable stream, not datagram)
  0x02 = Delta frame fragment (sent via datagram)
  0x03 = Keyframe complete (small enough for single datagram)
  0x04 = Delta frame complete
  0x10 = Codec config (SPS/PPS for H.264, VPS/SPS/PPS for HEVC)

pts_us: Presentation timestamp in microseconds (i64, big-endian).
        Passed directly to WebCodecs EncodedVideoChunk.timestamp.
        For fragmented frames, every fragment carries the same pts_us.
```
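
The header layout above can be sketched as a fixed 17-byte encoder and decoder. The format only states that `pts_us` is big-endian; using big-endian for `frame_id`, `seq_num` and `total` as well is an assumption made here for consistency. The field name `kind` stands in for `type`, which is a reserved word in Rust.

```rust
use std::convert::TryInto;

/// type (1) + frame_id (4) + pts_us (8) + seq_num (2) + total (2).
pub const HEADER_LEN: usize = 17;

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct FragmentHeader {
    pub kind: u8, // the `type` byte, e.g. 0x02 = delta frame fragment
    pub frame_id: u32,
    pub pts_us: i64,
    pub seq_num: u16,
    pub total: u16,
}

impl FragmentHeader {
    /// Serialize into the 17-byte wire layout (all integers big-endian; assumed).
    pub fn encode(&self) -> [u8; HEADER_LEN] {
        let mut out = [0u8; HEADER_LEN];
        out[0] = self.kind;
        out[1..5].copy_from_slice(&self.frame_id.to_be_bytes());
        out[5..13].copy_from_slice(&self.pts_us.to_be_bytes());
        out[13..15].copy_from_slice(&self.seq_num.to_be_bytes());
        out[15..17].copy_from_slice(&self.total.to_be_bytes());
        out
    }

    /// Parse a datagram into header and payload; `None` if it is too short.
    pub fn decode(buf: &[u8]) -> Option<(FragmentHeader, &[u8])> {
        if buf.len() < HEADER_LEN {
            return None;
        }
        let header = FragmentHeader {
            kind: buf[0],
            frame_id: u32::from_be_bytes(buf[1..5].try_into().ok()?),
            pts_us: i64::from_be_bytes(buf[5..13].try_into().ok()?),
            seq_num: u16::from_be_bytes(buf[13..15].try_into().ok()?),
            total: u16::from_be_bytes(buf[15..17].try_into().ok()?),
        };
        Some((header, &buf[HEADER_LEN..]))
    }
}
```

A datagram is then the encoded header followed directly by the payload bytes, and the receiver calls `decode` to split the two again.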
|
||||
|
||||
**Key design decisions**:
|
||||
- **Keyframes via reliable WebTransport stream**: SPS/PPS + IDR data must not be lost. Use `session.open_uni().await` for reliable delivery.
|
||||
- **Delta frames via datagram**: Loss-tolerant. If a delta frame is lost, the decoder waits for the next keyframe. This avoids accumulated corruption.
|
||||
- **Frame reassembly in browser**: Buffer fragments by `frame_id`, reassemble when all `total` fragments arrive, decode complete frame.
|
||||
- **Timestamp in microseconds**: The fragment header carries `pts_us: i64` (presentation timestamp in microseconds) so the browser can pass it directly to `EncodedVideoChunk.timestamp`. This is required by WebCodecs — a sequential frame_id counter is NOT a valid timestamp.
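
The reassembly rule in the list above (buffer by `frame_id`, emit when all `total` fragments are present) runs in the browser, but the logic can be sketched in Rust for illustration. This is a minimal sketch with hypothetical names (`Reassembler`, `push`, `evict_older_than`); it ignores `frame_id` wrap-around and assumes the caller has already parsed the fragment header.

```rust
use std::collections::HashMap;

/// Fragments received so far for one frame.
struct Partial {
    slots: Vec<Option<Vec<u8>>>,
    received: usize,
}

/// Buffers fragments per `frame_id` until a frame is complete.
#[derive(Default)]
pub struct Reassembler {
    pending: HashMap<u32, Partial>,
}

impl Reassembler {
    pub fn new() -> Self {
        Self::default()
    }

    /// Store one fragment; returns the whole frame once the last one arrives.
    pub fn push(&mut self, frame_id: u32, seq_num: u16, total: u16, payload: &[u8]) -> Option<Vec<u8>> {
        if total == 0 || seq_num >= total {
            return None; // malformed header
        }
        let partial = self.pending.entry(frame_id).or_insert_with(|| Partial {
            slots: vec![None; total as usize],
            received: 0,
        });
        if partial.slots.len() != total as usize {
            return None; // `total` disagrees with earlier fragments
        }
        let slot = &mut partial.slots[seq_num as usize];
        if slot.is_none() {
            *slot = Some(payload.to_vec()); // duplicates are ignored
            partial.received += 1;
        }
        if partial.received < partial.slots.len() {
            return None;
        }
        let done = self.pending.remove(&frame_id)?;
        Some(done.slots.into_iter().flatten().flatten().collect())
    }

    /// Drop incomplete frames with an id below `oldest_kept` (lost fragments).
    pub fn evict_older_than(&mut self, oldest_kept: u32) {
        self.pending.retain(|id, _| *id >= oldest_kept);
    }

    pub fn pending_frames(&self) -> usize {
        self.pending.len()
    }
}
```

Because datagrams can be lost, incomplete entries never finish on their own; evicting anything older than the frame currently being assembled keeps the buffer bounded.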
|
||||
|
||||
### 6.3 Codec Configuration Exchange
|
||||
|
||||
The encoder MUST be configured with the `h264_metadata` bitstream filter (`repeat_sps=1` `repeat_pps=1`) to guarantee SPS/PPS are injected into every IDR frame. Note: `repeat_headers=1` is a libx264-only option and does NOT exist for `h264_vaapi`. The browser configures the decoder in **Annex B mode** (no `description` at `configure()` time), and SPS/PPS arrive in-band with each keyframe.
|
||||
|
||||
On session establishment, the server sends a codec configuration message over the reliable QUIC stream to inform the browser of the codec and dimensions:
|
||||
|
||||
```json
{
  "type": "codec_config",
  "codec": "avc1.42E01F",
  "width": 1920,
  "height": 1080,
  "framerate": 60
}
```
|
||||
|
||||
Browser uses this to configure `VideoDecoder` — without `description`, which activates Annex B mode:
|
||||
|
||||
```javascript
decoder.configure({
  codec: config.codec,
  codedWidth: config.width,
  codedHeight: config.height,
  // NO description: Annex B mode. SPS/PPS arrive in-band with each keyframe.
});
```
|
||||
|
||||
**Why no AVCC description?** Per the WebCodecs AVC registration spec, providing `description` forces the decoder into AVC (length-prefixed) mode for ALL frames. Since our encoder outputs Annex B (start-code-prefixed), we must omit `description` and rely on in-band parameter sets guaranteed by the `h264_metadata` BSF (`repeat_sps=1` `repeat_pps=1`). Note: `repeat_headers=1` is a libx264-only option — it does NOT work with `h264_vaapi`.
|
||||
|
||||
**Timestamp handling**: The `FragmentHeader` carries both a `frame_id` (u32) for reassembly ordering and `pts_us` (i64) — the presentation timestamp in microseconds. The browser uses `pts_us` directly as `EncodedVideoChunk.timestamp`. This is required by WebCodecs — a sequential frame_id counter is NOT a valid timestamp. Every fragment of a frame carries the same `pts_us` value so the browser can extract it from any fragment during reassembly.
|
||||
|
||||
---
|
||||
|
||||
## 7. Browser-Side Design
|
||||
|
||||
### 7.1 Web UI (static/index.html + player.js)
|
||||
|
||||
Single-page application with minimal dependencies:
|
||||
|
||||
```
|
||||
┌──────────────────────────────────────┐
|
||||
│ wl-webrtc │
|
||||
│ ┌──────────────────────────────┐ │
|
||||
│ │ │ │
|
||||
│ │ <canvas> (video) │ │
|
||||
│ │ WebCodecs → drawImage │ │
|
||||
│ │ │ │
|
||||
│ └──────────────────────────────┘ │
|
||||
│ Status: Connected | Latency: 23ms │
|
||||
│ Resolution: 1920x1080 @ 60fps │
|
||||
│ [Fullscreen] [Disconnect] │
|
||||
└──────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### 7.2 WebCodecs Decoder Pipeline
|
||||
|
||||
**CRITICAL: Annex B mode only.** Per the [W3C AVC WebCodecs Registration](https://w3c.github.io/webcodecs/avc_codec_registration.html#videodecoderconfig-description), if `description` is provided at `configure()` time, ALL subsequent `EncodedVideoChunk` data must be in AVC format (4-byte length-prefixed). If `description` is **absent**, the bitstream is assumed to be in Annex B format (start-code-prefixed). Since our encoder outputs Annex B, we must NOT provide `description`.
|
||||
|
||||
The encoder MUST be configured with the `h264_metadata` bitstream filter (`repeat_sps=1` `repeat_pps=1`) to guarantee SPS/PPS are injected into every IDR frame. Note: `repeat_headers=1` is a libx264-only option and does NOT exist for `h264_vaapi`. This enables the decoder to initialize from keyframe data alone.
|
||||
|
||||
```javascript
// Simplified player.js flow (runs as a module, so top-level await is available)
const canvas = document.querySelector("canvas");
const ctx = canvas.getContext("2d");

const transport = new WebTransport("https://server:PORT/wt");
await transport.ready; // wait for the session before reading datagrams

const decoder = new VideoDecoder({
  output: (frame) => {
    ctx.drawImage(frame, 0, 0);
    frame.close(); // release the decoded frame promptly
  },
  error: (e) => console.error(e),
});

// Configure WITHOUT description → Annex B mode.
// SPS/PPS are delivered in-band with each keyframe (via h264_metadata BSF repeat_sps=1 repeat_pps=1 on encoder).
decoder.configure({
  codec: "avc1.42E01F",
  codedWidth: 1920,
  codedHeight: 1080,
  // NO description field: Annex B mode
});

// Receive frames. reassembleFrame() is the application's fragment buffer
// from section 6.2 (not shown here).
const reader = transport.datagrams.readable.getReader();
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  const frame = reassembleFrame(value);
  if (frame.complete) {
    decoder.decode(new EncodedVideoChunk({
      type: frame.isKeyframe ? "key" : "delta",
      timestamp: Number(frame.ptsUs),
      data: frame.data, // Annex B: valid because no description was provided
    }));
  }
}
```
|
||||
|
||||
### 7.3 No Annex B → AVCC Conversion Needed
|
||||
|
||||
Because we configure the decoder in Annex B mode (no `description`), no format conversion is needed on the browser side. The server sends raw Annex B NAL units with start codes (`00 00 00 01`), and the decoder accepts them directly.
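
Since the server is the side that handles Annex B, a short sketch of start-code scanning may help make the format concrete. It splits a buffer into NAL units and checks that a keyframe carries SPS (type 7), PPS (type 8) and an IDR slice (type 5) in-band. The function names are hypothetical and this is not the project's `nalu.rs`; emulation-prevention bytes are left untouched because only the NAL header byte is inspected.

```rust
/// Return each NAL unit in an Annex B buffer with its start code stripped.
/// Accepts both 3-byte (00 00 01) and 4-byte (00 00 00 01) start codes.
pub fn split_annex_b(data: &[u8]) -> Vec<&[u8]> {
    // (position where the start code begins, position where the NAL begins)
    let mut starts: Vec<(usize, usize)> = Vec::new();
    let mut i = 0;
    while i + 3 <= data.len() {
        if data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1 {
            // A zero byte right before belongs to a 4-byte start code.
            let code_pos = if i > 0 && data[i - 1] == 0 { i - 1 } else { i };
            starts.push((code_pos, i + 3));
            i += 3;
        } else {
            i += 1;
        }
    }
    let mut nals = Vec::with_capacity(starts.len());
    for (n, &(_, begin)) in starts.iter().enumerate() {
        // A NAL ends where the next start code begins, or at end of buffer.
        let end = starts.get(n + 1).map_or(data.len(), |&(next_code, _)| next_code);
        if begin < end {
            nals.push(&data[begin..end]);
        }
    }
    nals
}

/// H.264 nal_unit_type: the low 5 bits of the first NAL byte.
pub fn nal_type(nal: &[u8]) -> Option<u8> {
    nal.first().map(|b| b & 0x1F)
}

/// True if the access unit contains SPS (7), PPS (8) and an IDR slice (5).
pub fn is_self_contained_keyframe(data: &[u8]) -> bool {
    let types: Vec<u8> = split_annex_b(data).iter().filter_map(|n| nal_type(n)).collect();
    types.contains(&7) && types.contains(&8) && types.contains(&5)
}
```

A check like `is_self_contained_keyframe` could serve as a debug assertion on the server before a frame is tagged as a keyframe, since a keyframe without in-band parameter sets cannot initialize a decoder that was configured without `description`.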
|
||||
|
||||
The encoder MUST be configured with the `h264_metadata` bitstream filter (`repeat_sps=1` `repeat_pps=1`) to guarantee SPS/PPS are included in every IDR frame. Note: `repeat_headers=1` (and `-flags2 +repeat_headers`) are libx264-only options — they do NOT work with `h264_vaapi`. The BSF approach is encoder-agnostic and works with all FFmpeg hardware encoders. This ensures the decoder can re-initialize after any keyframe, even if it missed earlier configuration data.
|
||||
|
||||
---
|
||||
|
||||
## 8. Error Handling & Recovery
|
||||
|
||||
### 8.1 Display Hot-Plug
|
||||
|
||||
1. `wl_registry.global_remove` → set `output_went_away` flag
|
||||
2. `on_copy_fail()` detects flag → transition to `OutputWentAway`
|
||||
3. Preserve: encoder context, transport sessions (active WebTransport connections)
|
||||
4. Discard: Wayland protocol objects (invalidated)
|
||||
5. Wait for same-name output ("DP-1") to reappear
|
||||
6. Create new `CaptureSource`, reuse old encoder, continue streaming
|
||||
|
||||
### 8.2 Network Disconnection
|
||||
|
||||
- QUIC handles keepalive and retransmission internally
|
||||
- Client page refresh → new WebTransport session → server auto-starts sending current frame stream
|
||||
- Server is stateless per session — no recovery needed, just reconnect
|
||||
|
||||
### 8.3 Dynamic Format Change
|
||||
|
||||
Capture format changes (resolution, rotation):
|
||||
1. Rebuild: `frames_rgb`, `video_filter`, `enc_video`, `frames_yuv`
|
||||
2. Preserve: `hw_device_ctx`, `transport_state`
|
||||
3. Send new codec configuration to browser via reliable stream
|
||||
4. Browser reconfigures `VideoDecoder` with the new dimensions; the new SPS/PPS arrive in-band with the next keyframe
|
||||
|
||||
### 8.4 Frame Loss Handling
|
||||
|
||||
- Lost delta frame → decoder continues, minor artifact until next keyframe
|
||||
- Lost keyframe → decoder cannot continue → request keyframe from server via reliable stream
|
||||
- Server receives keyframe request → sets next input frame to `AV_PICTURE_TYPE_I`
|
||||
|
||||
### 8.5 Graceful Shutdown
|
||||
|
||||
Shutdown is triggered by SIGINT/SIGTERM via `signal-hook` + `mio` integration:
|
||||
|
||||
1. Main loop sets `running = false` flag → stops queuing new captures
|
||||
2. Wait for in-flight frame to complete (drain `InFlightSurface`)
|
||||
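3. Drain encoder: send a NULL frame (`avcodec_send_frame(ctx, NULL)`) and call `avcodec_receive_packet()` until it returns `AVERROR_EOF`. Do not use `avcodec_flush_buffers()` here, since it discards buffered data instead of emitting it.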
|
||||
4. Send final frames through channel
|
||||
5. Drop `frame_tx` sender → signals EOF to transport
|
||||
6. Transport server drains pending frames, sends GOAWAY to clients
|
||||
7. `tokio::runtime::Runtime::shutdown_background()` terminates remaining async tasks without waiting for them
|
||||
8. Drop Wayland protocol objects (compositor handles cleanup)
|
||||
9. FFmpeg contexts freed via `Drop` implementations
|
||||
|
||||
**Key concern**: Do NOT use blocking `send_blocking()` on the main thread — use `try_send()` so the main loop never stalls during shutdown. If the channel is full, the frame is dropped (acceptable during shutdown).
|
||||
|
||||
**NOTE**: wayland-client 0.31 uses `Connection::connect_to_env()` and `GlobalList` instead of the old 0.29 API (`Display::connect_to_env()` / `GlobalManager::new()`). See plan Task 11 for correct API usage.
|
||||
|
||||
### 8.6 First Keyframe Delivery
|
||||
|
||||
When a new WebTransport session is established, the client needs a keyframe before it can decode any delta frames. Two strategies:
|
||||
|
||||
1. **Force IDR on connect**: Set `AV_PICTURE_TYPE_I` on the next encoded frame when a new session is detected
|
||||
2. **Buffer last keyframe**: Store the most recent keyframe in `TransportServer`, resend to new clients
|
||||
|
||||
Phase 1 uses strategy 1 (force IDR) for simplicity. The transport server sets a `needs_keyframe: bool` flag on new sessions, which the encode loop checks.
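
The `needs_keyframe` handoff between the tokio transport task and the mio encode loop can be sketched with a shared atomic flag. This is a minimal sketch under the assumption that a single flag is enough for Phase 1's one-client limit; `KeyframeRequest` and `PictureType` are hypothetical names, with `ForceIdr` standing for setting `AV_PICTURE_TYPE_I` on the next input frame.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// What the encode loop should request for the next input frame.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PictureType {
    Auto,     // let the encoder decide
    ForceIdr, // corresponds to AV_PICTURE_TYPE_I on the next frame
}

/// Shared flag: set by the transport side, consumed by the encode loop.
#[derive(Clone, Default)]
pub struct KeyframeRequest(Arc<AtomicBool>);

impl KeyframeRequest {
    pub fn new() -> Self {
        Self::default()
    }

    /// Transport side: call on a new session or an explicit keyframe request.
    pub fn request(&self) {
        self.0.store(true, Ordering::Release);
    }

    /// Encode side: read and clear the flag in one atomic step, so several
    /// requests between two frames produce only one forced IDR.
    pub fn take(&self) -> PictureType {
        if self.0.swap(false, Ordering::AcqRel) {
            PictureType::ForceIdr
        } else {
            PictureType::Auto
        }
    }
}
```

The encode loop would call `take()` once per frame just before submitting it to the encoder; the same flag can also serve the keyframe-request path described in section 8.4.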
|
||||
|
||||
---
|
||||
|
||||
## 9. Dependencies
|
||||
|
||||
```toml
[dependencies]
# Wayland screen capture
wayland-client = "0.31"
wayland-protocols = { version = "0.32", features = ["client", "unstable", "staging"] }
wayland-protocols-wlr = { version = "0.3", features = ["client"] }
drm-fourcc = "2"

# GPU encoding
ffmpeg-next = "8"

# WebTransport (HTTP/3 + WebTransport protocol, built on quinn + rustls)
wtransport = { version = "0.7", features = ["self-signed"] }

# Web UI
axum = { version = "0.8", features = ["ws"] }
tower-http = { version = "0.6", features = ["cors"] }
rust-embed = { version = "8", features = ["mime-guess"] }

# Async runtime
tokio = { version = "1", features = ["full"] }

# Sync/async bridge (sync try_send() on mio thread, async recv() on tokio)
async-channel = "2"

# Event loop
mio = "1"

# Utilities
clap = { version = "4", features = ["derive"] }
tracing = "0.1"
tracing-subscriber = "0.3"
anyhow = "1"
bytes = "1"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
signal-hook = { version = "0.3", features = ["iterator"] }
base64 = "0.22"
mime_guess = "2"
```
|
||||
|
||||
**Encoder configuration note**: The VAAPI H.264 encoder MUST be configured with the `h264_metadata` bitstream filter (`repeat_sps=1` `repeat_pps=1`) to guarantee SPS/PPS parameter sets are emitted in-band with every IDR frame. This is required for WebCodecs Annex B decode mode on the browser side. **Important**: `repeat_headers=1` and `-flags2 +repeat_headers` are libx264-only options — they do NOT work with `h264_vaapi`. The BSF approach is encoder-agnostic and works with all FFmpeg hardware encoders.
|
||||
|
||||
---
|
||||
|
||||
## 10. Implementation Phases
|
||||
|
||||
### Phase 1 — MVP: Screen → Browser Streaming
|
||||
|
||||
| # | Module | Description | Estimated Effort |
|---|--------|-------------|------------------|
| 1 | `main.rs` | CLI args, startup sequence | Small |
| 2 | `cap_*.rs` | Implement capture backends (wlr-screencopy + ext-image-copy) | Medium |
| 3 | `avhw.rs` | Implement FFmpeg HW device/frame context management | Medium |
| 4 | `filter.rs` | Implement GPU video filter graph | Small |
| 5 | `transform.rs` | Implement coordinate transforms for Wayland outputs | Small |
| 6 | `fps_limit.rs` | Implement VRR-aware frame rate limiter | Small |
| 7 | `state.rs` | State machine adapted for transport | Medium |
| 8 | `transport.rs` | QUIC server + frame distribution | Large (new code) |
| 9 | `nalu.rs` | Annex B framing protocol | Small (new code) |
| 10 | `signaling.rs` | axum server + static files | Small (new code) |
| 11 | `static/*` | Browser Web UI + WebCodecs player | Medium (new code) |
|
||||
|
||||
**Deliverable**: Run `wl-webrtc`, open `https://localhost:PORT` in Chrome, see live screen at <50ms latency.
|
||||
|
||||
### Phase 2 — Remote Input + Stability
|
||||
|
||||
| # | Feature | Description |
|---|---------|-------------|
| 12 | Remote input | Browser mouse/keyboard → wlr-virtual-pointer/virtual-keyboard |
| 13 | Hot-plug recovery | Display disconnect/reconnect |
| 14 | Dynamic format | Resolution/rotation change handling |
| 15 | Multi-client | Multiple simultaneous browser viewers |
|
||||
|
||||
### Phase 3 — Optimization + Compatibility
|
||||
|
||||
| # | Feature | Description |
|---|---------|-------------|
| 16 | Adaptive bitrate | Network-aware VAAPI bit_rate adjustment |
| 17 | Audio pipeline | Synchronous audio capture + encoding + transport |
| 18 | WebRTC fallback | webrtc-rs path for Firefox compatibility |
| 19 | Performance dashboard | Real-time stats in Web UI |
|
||||
|
||||
---
|
||||
|
||||
## 11. Open Questions
|
||||
|
||||
1. **ffmpeg-next vs direct VAAPI bindings**: ffmpeg-next adds FFI overhead but provides mature encoding pipeline. Direct vaapi-dmabuf bindings would be more Rust-native but much more implementation work. **Decision: ffmpeg-next for Phase 1, evaluate direct bindings in Phase 3.** NOTE: `ffmpeg-next` safe API does NOT wrap hardware contexts (`AVBufferRef`, `AVHWFramesContext`). Use raw `ffmpeg_next::ffi` directly for all HW context operations — see `wl-screenrec/src/avhw.rs` for the reference pattern.
|
||||
|
||||
2. **Frame fragmentation strategy**: Current design fragments large frames across QUIC datagrams at byte boundaries (not NAL-aligned). The framing protocol reassembles by `frame_id`, so a lost fragment invalidates the entire frame. Alternative: send all frames via reliable QUIC streams and accept slightly higher latency. **Decision: Start with datagrams for delta frames, measure latency, evaluate.**
|
||||
|
||||
3. **Self-signed certificate UX**: Browser will show SSL warning. Options: (a) accept for LAN, (b) guide user to trust CA, (c) use HTTP/2 prior knowledge. **Decision: Accept for Phase 1, add CA trust guide in Phase 2.**
|
||||
|
||||
4. **HEVC vs H.264 default**: H.264 has universal browser support. HEVC has better compression but spotty browser support. **Decision: H.264 default, HEVC as option flag.**
|
||||
|
||||
5. **WebCodecs bitstream format**: **Decision: Annex B mode (no `description` at configure time).** SPS/PPS are guaranteed in-band via the `h264_metadata` BSF (`repeat_sps=1` `repeat_pps=1`). **Important**: The `repeat_headers=1` encoder option is libx264-only — it does NOT work with `h264_vaapi`. The BSF approach is encoder-agnostic and works with all FFmpeg hardware encoders. Per the W3C AVC WebCodecs Registration, providing `description` forces AVC (length-prefixed) mode for ALL subsequent frames. Since our encoder outputs Annex B, we must omit `description`.
|
||||