wl-webrtc/docs/superpowers/specs/2026-04-03-wl-webrtc-architecture-design.md
dailz 6d49222de8 feat: Phase 1 MVP with audit fixes — Wayland screen capture + VAAPI encoding
Phase 1 MVP implementation of wl-webrtc: Wayland screen capture tool
with hardware-accelerated VAAPI H.264 encoding and WebTransport output.

Includes all 9 runtime bug fixes from code audit (fix-audit-issues plan):

CRITICAL:
- C2: h264_metadata BSF with repeat_sps/repeat_pps in encode pipeline
- C4: FpsLimit wired as timing gate in on_copy_complete

HIGH:
- C3+A2: DRM device discovery via dmabuf feedback MainDevice event,
  unified resolve_drm_path() helper (CLI > compositor > auto > fallback)
- H2: Separate physical_size (mm) from mode_size (pixels) in wl_output
- H1+A3: Multi-output warning + named-output-not-found error

MEDIUM:
- M5: tv_sec u32->u64 to avoid Y2106 timestamp truncation
- M4: Guard against SHM Buffer event (DMA-BUF only)

Key components:
- src/avhw.rs: FFmpeg VAAPI encoder + filter graph + BSF pipeline
- src/state.rs: Wayland event loop + output negotiation + screencopy
- src/cap_wlr_screencopy.rs: wlr-screencopy capture source
- src/fps_limit.rs: Frame rate limiting with configurable target
- src/transform.rs: Frame format conversion utilities
2026-04-05 23:35:00 +08:00


# wl-webrtc Architecture Design
**Date**: 2026-04-03
**Status**: Draft
**Author**: Sisyphus (AI-assisted design)
---
## 1. Overview
### 1.1 Problem Statement
Build a low-latency Wayland screen sharing server that captures the desktop via GPU, encodes with hardware acceleration (VAAPI/Vulkan), and streams to a browser for remote viewing and eventual remote control.
### 1.2 Goals
- **Glass-to-glass latency < 50ms** on LAN
- **GPU-accelerated pipeline**: capture + encode entirely on GPU, only encoded bitstream crosses to CPU
- **Browser-only client**: no native app installation required
- **Single binary deployment**: embedded web UI, no external dependencies
- **Linux Wayland only**: no cross-platform abstraction needed
- **Annex B streaming**: the encoder outputs Annex B (start-code-prefixed) NAL units with SPS/PPS injected in-band at every IDR via the `h264_metadata` bitstream filter (`repeat_sps=1` `repeat_pps=1`), and the browser decodes in Annex B mode via WebCodecs. Do NOT use `repeat_headers=1`: that option is libx264-only and does not exist for `h264_vaapi`.
### 1.3 Non-Goals (Phase 1)
- Multi-client support (Phase 2)
- Audio streaming (Phase 3)
- Remote input injection (Phase 2)
- Firefox support (Phase 3 — WebRTC fallback)
- Adaptive bitrate (Phase 3)
### 1.4 Technology Stack
| Component | Technology | Rationale |
|-----------|-----------|-----------|
| Screen capture | wayland-client + DMA-BUF | Zero-copy GPU capture via DMA-BUF |
| GPU encoding | FFmpeg (ffmpeg-next) VAAPI/Vulkan | H.264/HEVC hardware encoding |
| Transport | wtransport (WebTransport over HTTP/3) | Full HTTP/3 + WebTransport protocol, built on quinn + rustls |
| Browser decode | WebCodecs VideoDecoder | Direct decode control, no MSE buffering |
| Web UI | axum + rust-embed | Single binary, compile-time embedded static files |
| Event loop | mio | Proven with Wayland file descriptor callbacks |
| Async runtime | tokio | Required by wtransport, also powers axum |
| Sync/async bridge | async_channel | Sync `try_send()` on the mio thread, async `recv()` on tokio; bridges mio → tokio naturally |
### 1.5 Transport Decision: Why Not WebRTC
WebRTC was evaluated and rejected as the primary transport for this use case:
| Factor | WebRTC (webrtc-rs) | WebTransport + WebCodecs |
|--------|-------------------|-------------------------|
| Glass-to-glass latency | 30-110ms (unavoidable 20-60ms jitter buffer) | 12-38ms (no jitter buffer) |
| Rust ecosystem | webrtc-rs v0.20.0-alpha, mid-rewrite | wtransport production-grade, built on quinn |
| Protocol overhead | ICE/DTLS/SRTP/SDP — designed for P2P NAT traversal | QUIC TLS 1.3 — server-to-client, simpler |
| Decode control | Browser controls jitter buffer, cannot opt out | Application controls every frame decode |
| GPU data path | Sample { data: Bytes }, must copy to CPU | Same copy, but shorter pipeline |
| Browser support | All browsers | Chrome/Edge only (Firefox lacks WebCodecs) |
**Transport library choice**: We use the `wtransport` crate (v0.7) instead of raw `quinn` + `h3`. The browser's `WebTransport` API requires a full HTTP/3 server with the WebTransport extension (the IETF WebTransport over HTTP/3 specification, which builds on HTTP Datagrams, RFC 9297). Raw QUIC is NOT sufficient: there is no browser API for raw QUIC connections. The `wtransport` crate provides the complete protocol stack (HTTP/3 + WebTransport) built on top of `quinn` 0.11 and `rustls` 0.23, with support for datagrams, unidirectional streams, and bidirectional streams.
WebRTC will be added as a Phase 3 fallback for Firefox compatibility.
---
## 2. Architecture
### 2.1 Thread Model
```
┌─────────────────────────────────────────────────────────────┐
│                      wl-webrtc process                      │
│                                                             │
│  Main Thread (mio event loop)                               │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ Wayland event queue dispatch                         │   │
│  │ Screen capture (DMA-BUF, zero-copy from compositor)  │   │
│  │ GPU encode (FFmpeg VAAPI/Vulkan, sync calls)         │   │
│  │ State machine transitions                            │   │
│  │ FPS limiting                                         │   │
│  └──────────────────────┬───────────────────────────────┘   │
│                         │                                   │
│      async_channel::bounded<16>(EncodedFrame)               │
│                         │                                   │
│  Tokio Runtime Thread Pool (2+ threads)                     │
│  ┌──────────────────────▼───────────────────────────────┐   │
│  │ wtransport WebTransport server                       │   │
│  │ HTTP/3 + WebTransport session management             │   │
│  │ Frame distribution to connected clients              │   │
│  │ axum HTTP server (Web UI + control API)              │   │
│  │ rust-embed static file serving                       │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
```
**Design rationale**:
- **Capture + encode on main thread**: GPU encoding is synchronous but short (3-8ms per frame). That fits inside the 16.7-33.3ms frame interval at 30-60fps, so it does not starve the mio event loop. This avoids cross-thread synchronization for the GPU pipeline.
- **wtransport on tokio**: wtransport is built on quinn and tokio, and axum also requires tokio, so the WebTransport server and the HTTP static file server share one tokio runtime.
- **async_channel::bounded(16)**: A capacity of 16 frames holds up to ~267ms of video at 60fps (16 / 60 s), which absorbs transport jitter while capping the worst-case queueing delay. The sender uses `try_send()`, which returns `Err(TrySendError::Full(_))` when the channel is full; the main loop handles this by dropping and logging the frame. Dropping frames is standard practice in real-time streaming, where stalling the pipeline is worse than skipping a frame. Note that `try_send()` discards the frame being offered (the newest one), not the oldest queued frame. This keeps the main mio event loop responsive for Wayland event dispatch. **Do NOT use `send_blocking()`** on the mio thread: it would stall the capture pipeline if the transport consumer falls behind.
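
The drop-on-full policy described above can be sketched with the standard library alone. This is a minimal illustration, not the project's code: `std::sync::mpsc::sync_channel` stands in for `async_channel::bounded` (both expose a non-blocking `try_send`), and `Frame`, `Offer`, `offer_frame`, `buffer_ms`, and `simulate_stalled_consumer` are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};

/// Hypothetical stand-in for the document's `EncodedFrame`.
#[derive(Debug, Clone, PartialEq)]
pub struct Frame {
    pub id: u32,
}

/// Outcome of offering one frame to the transport side.
#[derive(Debug, PartialEq)]
pub enum Offer {
    Queued,
    DroppedFull,
    ReceiverGone,
}

/// Non-blocking hand-off: never waits, so the capture loop cannot stall.
pub fn offer_frame(tx: &SyncSender<Frame>, frame: Frame) -> Offer {
    match tx.try_send(frame) {
        Ok(()) => Offer::Queued,
        // The frame being offered (the newest one) is the one that is lost.
        Err(TrySendError::Full(_dropped)) => Offer::DroppedFull,
        Err(TrySendError::Disconnected(_)) => Offer::ReceiverGone,
    }
}

/// Worst-case queueing delay in whole milliseconds for a full channel.
pub fn buffer_ms(capacity: u32, fps: u32) -> u32 {
    capacity * 1000 / fps
}

/// Offers `n` frames while no consumer is reading; returns (queued, dropped).
pub fn simulate_stalled_consumer(capacity: usize, n: u32) -> (u32, u32) {
    // `_rx` stays alive until the end of the function, so the channel is open.
    let (tx, _rx) = sync_channel::<Frame>(capacity);
    let (mut queued, mut dropped) = (0, 0);
    for id in 0..n {
        match offer_frame(&tx, Frame { id }) {
            Offer::Queued => queued += 1,
            _ => dropped += 1,
        }
    }
    (queued, dropped)
}
```

With a capacity of 16 and a stalled consumer, the first 16 frames are queued and every later frame is dropped without blocking the caller.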
### 2.2 Module Dependency Graph
```
                              ┌──────────┐
                              │ main.rs  │  entry point, CLI, orchestration
                              └─┬──┬───┬─┘
        ┌───────────────────────┘  │   └─────────────┐
        ▼                          ▼                 ▼
┌───────────────┐             ┌──────────┐    ┌──────────────┐
│ state.rs      │             │ avhw.rs  │    │ transport.rs │
│ StateMachine  │             │ HW ctx   │    │ QUIC server  │
│ CaptureSource │             │          │    │ Sessions     │
└────┬─────┬────┘             └────┬─────┘    └──────────────┘
     │     └──────┐                │
     ▼            ▼                ▼
┌─────────┐ ┌────────────┐    ┌─────────┐
│cap_wlr_ │ │cap_ext_    │    │filter.rs│
│screen   │ │image_copy  │    │ crop/   │
│copy     │ │            │    │ scale/  │
└─────────┘ └────────────┘    │transpose│
                              └─────────┘
┌────────────┐  ┌──────────────┐
│transform.rs│  │signaling.rs  │
│ coordinate │  │ axum + embed │
│ transform  │  │ Web UI serve │
└────────────┘  └──────────────┘
┌────────────┐
│fps_limit.rs│
└────────────┘
```
**Dependency layers** (bottom-up):
1. `transform.rs`, `fps_limit.rs` — leaf modules, zero internal dependencies
2. `avhw.rs`, `filter.rs` — FFmpeg wrapper layer
3. `cap_wlr_screencopy.rs`, `cap_ext_image_copy.rs` — capture backends, depend on state + avhw
4. `state.rs` — state machine + CaptureSource trait
5. `transport.rs`, `signaling.rs` — network layer
6. `main.rs` — orchestration
### 2.3 Project File Structure
```
wl-webrtc/
├── Cargo.toml
├── README.md
├── src/
│ ├── main.rs # ~300 lines — CLI, startup, orchestration
│ ├── state.rs # ~600 lines — State<S>, EncConstructionStage, InFlightSurface
│ ├── avhw.rs # ~450 lines — FFmpeg HW device/frame contexts
│ ├── filter.rs # ~200 lines — FFmpeg video filter graph
│ ├── cap_wlr_screencopy.rs # ~170 lines — wlr-screencopy backend
│ ├── cap_ext_image_copy.rs # ~240 lines — ext-image-copy-capture backend
│ ├── transform.rs # ~220 lines — coordinate transforms
│ ├── fps_limit.rs # ~130 lines — VRR-aware frame rate limiter
│ ├── transport.rs # ~400 lines — QUIC/WebTransport server
│ ├── signaling.rs # ~200 lines — axum HTTP + WebSocket control
│ └── nalu.rs # ~150 lines — Annex B NAL unit splitting, framing protocol
├── static/
│ ├── index.html # Web UI shell
│ ├── player.js # WebCodecs decoder + Canvas renderer
│ └── style.css # Minimal styling
└── protocols/ # Wayland protocol XML files
```
---
## 3. Data Flow
### 3.1 Zero-Copy Capture Pipeline
```
GPU Frame Pool ─alloc()→ HW Surface
av_hwframe_map → DMA-BUF fd
zwp_linux_dmabuf → WlBuffer (fd shared)
Compositor writes directly to GPU Surface
FFmpeg VAAPI/Vulkan encode (GPU-internal)
AVPacket.data (Annex B with 00 00 00 01 start codes)
↓ ← GPU→CPU copy via vaMapBuffer (unavoidable)
Bytes::from(Vec<u8>) wrapper
async_channel Sender::try_send(EncodedFrame) // non-blocking on main thread; frame dropped if channel is full
```
### 3.2 Transport Pipeline
```
async_channel::bounded::recv(EncodedFrame)
Frame byte-splitting at MTU boundaries (not NAL-aligned)
┌─ Keyframe → QUIC reliable stream (guaranteed delivery)
└─ Delta frame → QUIC datagram (unreliable, low latency)
wtransport send (unidirectional stream or datagram)
Browser reads transport.incomingUnidirectionalStreams / transport.datagrams.readable
Frame reassembly (if fragmented)
WebCodecs VideoDecoder.decode(EncodedVideoChunk)
Canvas.drawImage(VideoFrame)
```
### 3.3 Latency Budget
| Stage | Latency | Notes |
|-------|---------|-------|
| Wayland capture (KMS/dmabuf) | 1-3ms | Zero-copy from compositor |
| GPU encode (VAAPI H.264) | 3-8ms | Synchronous, main thread |
| vaMapBuffer CPU copy | <1ms | Unavoidable GPU→CPU |
| async_channel | <0.1ms | In-process |
| QUIC datagram (LAN) | 1-10ms | LAN transit, merged with network |
| WebCodecs decode | 2-5ms | Browser hardware decode |
| Canvas render | 1-2ms | requestAnimationFrame |
| **Total (LAN)** | **9-29ms** | Well under 50ms target; network transit is counted once, in the QUIC datagram row |
### 3.4 EncodedFrame Structure
```rust
use bytes::Bytes;
use std::time::Duration;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum FrameType { Keyframe, Delta }

#[derive(Clone)]
struct EncodedFrame {
    data: Bytes,           // Annex B NALUs with start codes
    pts_us: i64,           // Presentation timestamp (microseconds, for WebCodecs)
    duration: Duration,    // Frame duration for timestamp calculation
    frame_type: FrameType, // Keyframe or Delta (matches transport framing)
    width: u32,            // Frame width (may differ from capture on ROI)
    height: u32,           // Frame height
}
```
**Timestamp convention**: `pts_us` is in **microseconds** (not nanoseconds), matching WebCodecs' `EncodedVideoChunk.timestamp` requirement. The server tracks a monotonic PTS starting from 0, incrementing by `1_000_000 / fps` per frame.
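
One detail of the timestamp convention is worth illustrating: if the per-frame increment `1_000_000 / fps` is computed with integer division and then accumulated, the truncation error adds up (40 µs per second at 60fps). The sketch below, with hypothetical function names, contrasts that with deriving each timestamp from the frame index. It assumes a constant frame rate.

```rust
/// Presentation timestamp in microseconds, derived from the frame index.
/// Multiplying before dividing keeps the rounding error below 1 µs per frame
/// and prevents it from accumulating.
pub fn pts_us(frame_index: u64, fps: u32) -> i64 {
    (frame_index * 1_000_000 / fps as u64) as i64
}

/// Naive variant: a truncated per-frame step, accumulated over time.
/// Shown only to make the drift visible.
pub fn pts_us_accumulated(frame_index: u64, fps: u32) -> i64 {
    let step = 1_000_000 / fps as i64; // 16_666 at 60 fps (truncated)
    step * frame_index as i64
}
```

After one second at 60fps, the index-based version yields exactly 1,000,000 µs, while the accumulated version is 40 µs short.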
---
## 4. State Machine
### 4.1 EncConstructionStage
```
         │ App start
         ▼
┌──────────────────┐
│  ProbingOutputs  │  Discover Wayland outputs,
└────────┬─────────┘  collect geometry info
         │ All outputs probed
         ▼
┌──────────────────┐
│ EverythingButFmt │  HW device ctx created,
└────────┬─────────┘  encoder initialized
         │ negotiate_format()
         ▼
   ┌───────────┐
┌─→│ Streaming │────  Active capture + encode + transport
│  └──┬─────┬──┘
│     │     │ Output disconnected
│     │     ▼
│     │  ┌────────────────┐
│     │  │ OutputWentAway │  Keep enc + transport,
│     │  └───────┬────────┘  drop capture objects
│     │ Format   │ Same output reconnects
│     │ changed  │
└─────┴──────────┘
Intermediate transient exists at all transition arrows (mem::replace)
```
**Key design choice**: `Streaming` state holds both `EncState` (encoding pipeline) AND `TransportState` (active WebTransport sessions). On `OutputWentAway`, both are preserved — only capture objects are discarded.
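
The `Intermediate` transient mentioned in the diagram can be illustrated as follows. This is a simplified sketch under stated assumptions: the variant payloads (`pending`, `width`, `height`, `output_name`) and the `App` wrapper are hypothetical placeholders, not the real contents of `EncConstructionStage`.

```rust
use std::mem;

#[derive(Debug, PartialEq)]
pub enum Stage {
    ProbingOutputs { pending: u32 },
    EverythingButFmt,
    Streaming { width: u32, height: u32 },
    OutputWentAway { output_name: String },
    /// Placeholder left behind while the old stage is being consumed.
    Intermediate,
}

pub struct App {
    pub stage: Stage,
}

impl App {
    /// Moves the current stage out by value (leaving `Intermediate`),
    /// then installs the next stage or restores the old one.
    pub fn negotiate_format(&mut self, width: u32, height: u32) -> Result<(), String> {
        match mem::replace(&mut self.stage, Stage::Intermediate) {
            Stage::EverythingButFmt | Stage::Streaming { .. } => {
                self.stage = Stage::Streaming { width, height };
                Ok(())
            }
            other => {
                let msg = format!("cannot negotiate format from {:?}", other);
                self.stage = other; // roll back to the previous stage
                Err(msg)
            }
        }
    }

    /// Streaming -> OutputWentAway; any other stage is left unchanged.
    pub fn output_disconnected(&mut self, output_name: &str) {
        self.stage = match mem::replace(&mut self.stage, Stage::Intermediate) {
            Stage::Streaming { .. } => Stage::OutputWentAway {
                output_name: output_name.to_string(),
            },
            other => other,
        };
    }
}
```

Taking the old stage by value lets a transition move owned resources (in the real design, the encoder and transport state) into the next stage without cloning them.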
### 4.2 InFlightSurface
```
None → AllocQueued → Allocd(Frame) → CopyQueued { surface, drm_map, frame, buffer } → None
```
4-state enum with `assert!(matches!(...))` runtime guards. RAII cleanup on each state transition. Single-frame-in-flight constraint prevents buffer exhaustion.
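
A minimal sketch of such a guarded lifecycle is shown below. The real `InFlightSurface` carries GPU surfaces, DRM mappings, and Wayland buffers; here a plain `frame_id` (an assumption made for illustration) stands in for those resources, and the method names are hypothetical.

```rust
#[derive(Debug, PartialEq)]
pub enum InFlight {
    None,
    AllocQueued,
    Allocd { frame_id: u32 },
    CopyQueued { frame_id: u32 },
}

impl InFlight {
    /// None -> AllocQueued. Panics if a frame is already in flight,
    /// which enforces the single-frame-in-flight constraint.
    pub fn queue_alloc(&mut self) {
        assert!(matches!(self, InFlight::None), "a frame is already in flight");
        *self = InFlight::AllocQueued;
    }

    /// AllocQueued -> Allocd.
    pub fn on_alloc(&mut self, frame_id: u32) {
        assert!(matches!(self, InFlight::AllocQueued), "no allocation was queued");
        *self = InFlight::Allocd { frame_id };
    }

    /// Allocd -> CopyQueued.
    pub fn queue_copy(&mut self) {
        let frame_id = match *self {
            InFlight::Allocd { frame_id } => frame_id,
            _ => panic!("no allocated frame to copy into"),
        };
        *self = InFlight::CopyQueued { frame_id };
    }

    /// CopyQueued -> None; returns the completed frame to the caller.
    pub fn on_copy_complete(&mut self) -> u32 {
        match std::mem::replace(self, InFlight::None) {
            InFlight::CopyQueued { frame_id } => frame_id,
            other => panic!("copy completed in unexpected state {:?}", other),
        }
    }
}
```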
### 4.3 TransportSessionState (new)
```
┌───────────┐     connect     ┌───────────┐   disconnect    ┌───────────┐
│ Listening │ ──────────────→ │  Active   │ ──────────────→ │  Closed   │
│ (quinn    │                 │ (sending  │                 │ (cleanup) │
│  endpoint)│                 │  frames)  │                 │           │
└───────────┘                 └───────────┘                 └───────────┘
```
Multiple sessions can be `Active` simultaneously (Phase 2). Phase 1 supports exactly one.
---
## 5. Design Patterns
The architecture employs several established software design patterns for managing complexity:
| # | Pattern | Usage in wl-webrtc |
|---|---------|-------------------|
| 1 | Strategy Trait + Generic State | `CaptureSource` trait with `CapWlrScreencopy` / `CapExtImageCopy` backends |
| 2 | Polymorphic Enum State Machine | `EncConstructionStage` — 5 variants with type-safe transitions |
| 3 | Type-Safe Frame Lifecycle | `InFlightSurface` — 4-state enum with runtime guards |
| 4 | Pin\<Box\> Self-Referential | Vulkan device context — for self-referential FFmpeg structs |
| 5 | Independent Thread Pipe | tokio runtime replaces mpsc audio thread; same atomic flag pattern |
| 6 | VRR-Aware Frame Rate Control | `FpsLimit<T>` — one-frame-buffer delay for correct drop decisions |
| 7 | Generic Dispatch 3-Layer | Wayland protocol dispatch — generic event handling |
| 8 | Three-Stage Safe Construction | Incremental resource acquisition with partial state rollback |
| 9 | Hot-Plug Auto-Recovery | `OutputWentAway` — preserve encoder/transport, rebuild capture |
| 10 | Zero-Copy GPU Pipeline | DMA-BUF capture + GPU-internal encode, minimal CPU involvement |
---
## 6. Transport Protocol Design
### 6.1 WebTransport Connection Setup
```
Server generates self-signed TLS certificate (via wtransport built-in rcgen support)
→ wtransport::Endpoint::server(server_config, addr)
→ Browser: new WebTransport("https://server:PORT/wt")
→ wtransport handles full HTTP/3 + WebTransport handshake internally
→ Session established (datagrams + streams available)
```
**Transport library**: We use `wtransport` crate (v0.7) which provides a complete WebTransport-over-HTTP/3 server implementation built on top of `quinn` 0.11 and `rustls` 0.23. This handles all protocol details (HTTP/3 SETTINGS, CONNECT method with `:protocol = webtransport`, session management, datagram framing per RFC 9297). Raw `quinn` or `h3` would require building this protocol stack manually.
### 6.2 Frame Framing Protocol
QUIC datagrams have a practical MTU of ~1200 bytes. A 1080p H.264 frame is typically 10KB-200KB. Application-level framing:
```
Datagram format:
┌──────────┬──────────┬──────────┬──────────┬──────────┬─────────────┐
│ type     │ frame_id │ pts_us   │ seq_num  │ total    │ payload     │
│ (1 byte) │ (4 bytes)│ (8 bytes)│ (2 bytes)│ (2 bytes)│ (variable)  │
└──────────┴──────────┴──────────┴──────────┴──────────┴─────────────┘

type:
  0x01 = Keyframe fragment (sent via reliable stream, not datagram)
  0x02 = Delta frame fragment (sent via datagram)
  0x03 = Keyframe complete (small enough for single datagram)
  0x04 = Delta frame complete
  0x10 = Codec config (SPS/PPS for H.264, VPS/SPS/PPS for HEVC)

pts_us: Presentation timestamp in microseconds (i64, big-endian).
        Passed directly to WebCodecs EncodedVideoChunk.timestamp.
        For fragmented frames, every fragment carries the same pts_us.
```
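
The 17-byte header above could be serialized as in the following sketch. The document specifies big-endian only for `pts_us`; using big-endian for `frame_id`, `seq_num`, and `total` as well is an assumption made here for consistency, and the type and method names are hypothetical.

```rust
/// type (1) + frame_id (4) + pts_us (8) + seq_num (2) + total (2)
pub const HEADER_LEN: usize = 17;

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct FragmentHeader {
    pub kind: u8, // the `type` field (0x01..0x04, 0x10)
    pub frame_id: u32,
    pub pts_us: i64,
    pub seq_num: u16,
    pub total: u16,
}

impl FragmentHeader {
    /// Serializes the header; all multi-byte fields are big-endian.
    pub fn encode(&self) -> [u8; HEADER_LEN] {
        let mut out = [0u8; HEADER_LEN];
        out[0] = self.kind;
        out[1..5].copy_from_slice(&self.frame_id.to_be_bytes());
        out[5..13].copy_from_slice(&self.pts_us.to_be_bytes());
        out[13..15].copy_from_slice(&self.seq_num.to_be_bytes());
        out[15..17].copy_from_slice(&self.total.to_be_bytes());
        out
    }

    /// Parses a header and returns it with the remaining payload bytes.
    /// Returns `None` if the buffer is shorter than the header.
    pub fn decode(buf: &[u8]) -> Option<(FragmentHeader, &[u8])> {
        if buf.len() < HEADER_LEN {
            return None;
        }
        let mut pts = [0u8; 8];
        pts.copy_from_slice(&buf[5..13]);
        let header = FragmentHeader {
            kind: buf[0],
            frame_id: u32::from_be_bytes([buf[1], buf[2], buf[3], buf[4]]),
            pts_us: i64::from_be_bytes(pts),
            seq_num: u16::from_be_bytes([buf[13], buf[14]]),
            total: u16::from_be_bytes([buf[15], buf[16]]),
        };
        Some((header, &buf[HEADER_LEN..]))
    }
}
```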
**Key design decisions**:
- **Keyframes via reliable WebTransport stream**: SPS/PPS + IDR data must not be lost. Use `session.open_uni().await` for reliable delivery.
- **Delta frames via datagram**: Loss-tolerant. If a delta frame is lost, the decoder waits for the next keyframe. This avoids accumulated corruption.
- **Frame reassembly in browser**: Buffer fragments by `frame_id`, reassemble when all `total` fragments arrive, decode complete frame.
- **Timestamp in microseconds**: The fragment header carries `pts_us: i64` (presentation timestamp in microseconds) so the browser can pass it directly to `EncodedVideoChunk.timestamp`. This is required by WebCodecs — a sequential frame_id counter is NOT a valid timestamp.
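
To make the fragmentation and reassembly rules concrete, the sketch below splits a frame at byte boundaries and reassembles it only when every fragment is present. It assumes a 1200-byte datagram budget minus the 17-byte header (1183 payload bytes per fragment); the function names are hypothetical, and the reassembly logic is written in Rust only to illustrate what the browser-side JavaScript would do.

```rust
pub const HEADER_LEN: usize = 17;
pub const MAX_DATAGRAM: usize = 1200; // assumed practical datagram budget
pub const MAX_PAYLOAD: usize = MAX_DATAGRAM - HEADER_LEN; // 1183 bytes

/// Number of fragments needed for a frame of `frame_len` bytes.
pub fn fragment_count(frame_len: usize) -> usize {
    (frame_len + MAX_PAYLOAD - 1) / MAX_PAYLOAD
}

/// Splits a frame into (seq_num, total, payload) tuples at byte boundaries.
/// Returns `None` for an empty frame or one needing more than u16::MAX fragments.
pub fn fragment(frame: &[u8]) -> Option<Vec<(u16, u16, &[u8])>> {
    let total = fragment_count(frame.len());
    if total == 0 || total > u16::MAX as usize {
        return None;
    }
    Some(
        frame
            .chunks(MAX_PAYLOAD)
            .enumerate()
            .map(|(i, chunk)| (i as u16, total as u16, chunk))
            .collect(),
    )
}

/// Rebuilds a frame from fragments received in any order.
/// A missing or duplicated fragment invalidates the whole frame.
pub fn reassemble(mut parts: Vec<(u16, u16, Vec<u8>)>) -> Option<Vec<u8>> {
    let total = parts.first()?.1 as usize;
    if parts.len() != total {
        return None;
    }
    parts.sort_by_key(|p| p.0);
    let mut out = Vec::new();
    for (expected, (seq, t, payload)) in parts.into_iter().enumerate() {
        if seq as usize != expected || t as usize != total {
            return None;
        }
        out.extend_from_slice(&payload);
    }
    Some(out)
}
```

Under these assumptions a 200KB frame needs 170 datagrams, which is why a single lost fragment is costly and why keyframes use a reliable stream.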
### 6.3 Codec Configuration Exchange
The encoder MUST be configured with the `h264_metadata` bitstream filter (`repeat_sps=1` `repeat_pps=1`) to guarantee SPS/PPS are injected into every IDR frame. Note: `repeat_headers=1` is a libx264-only option and does NOT exist for `h264_vaapi`. The browser configures the decoder in **Annex B mode** (no `description` at `configure()` time), and SPS/PPS arrive in-band with each keyframe.
On session establishment, the server sends a codec configuration message over the reliable QUIC stream to inform the browser of the codec and dimensions:
```json
{
  "type": "codec_config",
  "codec": "avc1.42E02A",
  "width": 1920,
  "height": 1080,
  "framerate": 60
}
```
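The browser uses this to configure `VideoDecoder` without `description`, which activates Annex B mode. The `codec` string must match the profile and level in the encoder's SPS; for 1920x1080 @ 60fps that means at least level 4.2 (`2A`), because level 3.1 (`1F`) tops out at 1280x720 @ 30fps: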
```javascript
decoder.configure({
  codec: config.codec,
  codedWidth: config.width,
  codedHeight: config.height,
  // NO description: Annex B mode. SPS/PPS arrive in-band with each keyframe.
});
```
**Why no AVCC description?** Per the WebCodecs AVC registration spec, providing `description` forces the decoder into AVC (length-prefixed) mode for ALL frames. Since our encoder outputs Annex B (start-code-prefixed), we must omit `description` and rely on in-band parameter sets guaranteed by the `h264_metadata` BSF (`repeat_sps=1` `repeat_pps=1`). Note: `repeat_headers=1` is a libx264-only option — it does NOT work with `h264_vaapi`.
**Timestamp handling**: The `FragmentHeader` carries both a `frame_id` (u32) for reassembly ordering and `pts_us` (i64) — the presentation timestamp in microseconds. The browser uses `pts_us` directly as `EncodedVideoChunk.timestamp`. This is required by WebCodecs — a sequential frame_id counter is NOT a valid timestamp. Every fragment of a frame carries the same `pts_us` value so the browser can extract it from any fragment during reassembly.
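
The `codec` field of the configuration message can be derived from the stream itself rather than hardcoded. The sketch below is an illustration with hypothetical function names: it builds the `avc1.PPCCLL` string from the three bytes that follow the SPS NAL header (profile_idc, constraint flags, level_idc), and estimates the minimum H.264 level for a resolution and frame rate. The level table is a subset of the H.264 level limits and considers only frame size and macroblock rate, not bitrate or DPB limits.

```rust
/// Builds the WebCodecs codec string from an SPS NAL unit (no start code).
/// Byte 0 is the NAL header; bytes 1..=3 are profile_idc, constraint flags, level_idc.
pub fn avc_codec_string(sps_nal: &[u8]) -> Option<String> {
    if sps_nal.len() < 4 || (sps_nal[0] & 0x1F) != 7 {
        return None; // too short, or not an SPS (nal_unit_type 7)
    }
    Some(format!(
        "avc1.{:02X}{:02X}{:02X}",
        sps_nal[1], sps_nal[2], sps_nal[3]
    ))
}

/// (level_idc, max macroblocks per second, max macroblocks per frame)
const LEVEL_LIMITS: [(u8, u64, u64); 8] = [
    (30, 40_500, 1_620),
    (31, 108_000, 3_600),
    (32, 216_000, 5_120),
    (40, 245_760, 8_192),
    (42, 522_240, 8_704),
    (50, 589_824, 22_080),
    (51, 983_040, 36_864),
    (52, 2_073_600, 36_864),
];

/// Smallest level in the table whose limits cover the given format.
pub fn min_level_idc(width: u32, height: u32, fps: u32) -> Option<u8> {
    // Dimensions are rounded up to whole 16x16 macroblocks.
    let mbs = ((width as u64 + 15) / 16) * ((height as u64 + 15) / 16);
    let mbs_per_sec = mbs * fps as u64;
    LEVEL_LIMITS
        .iter()
        .find(|&&(_, max_rate, max_frame)| mbs <= max_frame && mbs_per_sec <= max_rate)
        .map(|&(level, _, _)| level)
}
```

For 1920x1080 @ 60fps this yields level_idc 42 (hex `2A`), while 1280x720 @ 30fps fits level 31 (hex `1F`).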
---
## 7. Browser-Side Design
### 7.1 Web UI (static/index.html + player.js)
Single-page application with minimal dependencies:
```
┌──────────────────────────────────────┐
│ wl-webrtc                            │
│  ┌──────────────────────────────┐    │
│  │                              │    │
│  │       <canvas> (video)       │    │
│  │    WebCodecs → drawImage     │    │
│  │                              │    │
│  └──────────────────────────────┘    │
│ Status: Connected | Latency: 23ms    │
│ Resolution: 1920x1080 @ 60fps        │
│ [Fullscreen] [Disconnect]            │
└──────────────────────────────────────┘
```
### 7.2 WebCodecs Decoder Pipeline
**CRITICAL: Annex B mode only.** Per the [W3C AVC WebCodecs Registration](https://w3c.github.io/webcodecs/avc_codec_registration.html#videodecoderconfig-description), if `description` is provided at `configure()` time, ALL subsequent `EncodedVideoChunk` data must be in AVC format (4-byte length-prefixed). If `description` is **absent**, the bitstream is assumed to be in Annex B format (start-code-prefixed). Since our encoder outputs Annex B, we must NOT provide `description`.
The encoder MUST be configured with the `h264_metadata` bitstream filter (`repeat_sps=1` `repeat_pps=1`) to guarantee SPS/PPS are injected into every IDR frame. Note: `repeat_headers=1` is a libx264-only option and does NOT exist for `h264_vaapi`. This enables the decoder to initialize from keyframe data alone.
```javascript
// Simplified player.js flow (runs inside an ES module or async function)
const ctx = document.querySelector("canvas").getContext("2d");
const transport = new WebTransport("https://server:PORT/wt");
await transport.ready;

const decoder = new VideoDecoder({
  output: (frame) => {
    ctx.drawImage(frame, 0, 0);
    frame.close(); // release the decoded frame promptly
  },
  error: (e) => console.error(e),
});

// Configure WITHOUT description → Annex B mode.
// SPS/PPS are delivered in-band with each keyframe
// (via h264_metadata BSF repeat_sps=1 repeat_pps=1 on the encoder).
decoder.configure({
  codec: "avc1.42E02A", // must match the profile/level in the encoder's SPS
  codedWidth: 1920,
  codedHeight: 1080,
  // NO description field: Annex B mode
});

// Receive delta frames from datagrams. Keyframes arrive on reliable
// unidirectional streams (transport.incomingUnidirectionalStreams) and go
// through the same reassembly and decode path (omitted here).
const reader = transport.datagrams.readable.getReader();
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  const frame = reassembleFrame(value); // fragment buffer, see section 6.2
  if (frame.complete) {
    decoder.decode(new EncodedVideoChunk({
      type: frame.isKeyframe ? "key" : "delta",
      timestamp: Number(frame.ptsUs),
      data: frame.data, // Annex B: valid because no description was provided
    }));
  }
}
```
### 7.3 No Annex B → AVCC Conversion Needed
Because we configure the decoder in Annex B mode (no `description`), no format conversion is needed on the browser side. The server sends raw Annex B NAL units with start codes (`00 00 00 01`), and the decoder accepts them directly.
The encoder MUST be configured with the `h264_metadata` bitstream filter (`repeat_sps=1` `repeat_pps=1`) to guarantee SPS/PPS are included in every IDR frame. Note: `repeat_headers=1` (and `-flags2 +repeat_headers`) are libx264-only options — they do NOT work with `h264_vaapi`. The BSF approach is encoder-agnostic and works with all FFmpeg hardware encoders. This ensures the decoder can re-initialize after any keyframe, even if it missed earlier configuration data.
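
A server-side check that a keyframe really is self-contained can be sketched as follows. This is an illustrative start-code scanner of the kind `nalu.rs` is described as providing, not its actual code; the function names are hypothetical. It relies on two H.264 properties: emulation prevention guarantees `00 00 01` never appears inside a NAL unit, and a NAL unit never ends in a zero byte.

```rust
/// Returns the NAL units (without start codes) found in an Annex B buffer.
/// Handles both 3-byte (00 00 01) and 4-byte (00 00 00 01) start codes.
pub fn split_annex_b(data: &[u8]) -> Vec<&[u8]> {
    // (position of the 00 00 01 pattern, position of the first NAL byte)
    let mut starts = Vec::new();
    let mut i = 0;
    while i + 3 <= data.len() {
        if data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1 {
            starts.push((i, i + 3));
            i += 3;
        } else {
            i += 1;
        }
    }
    let mut nals = Vec::new();
    for (k, &(_, begin)) in starts.iter().enumerate() {
        let mut end = if k + 1 < starts.len() { starts[k + 1].0 } else { data.len() };
        // Zeros before the next start code belong to that start code (or are padding).
        while end > begin && data[end - 1] == 0 {
            end -= 1;
        }
        if end > begin {
            nals.push(&data[begin..end]);
        }
    }
    nals
}

/// H.264 nal_unit_type: 5 = IDR slice, 7 = SPS, 8 = PPS.
pub fn nal_type(nal: &[u8]) -> u8 {
    nal[0] & 0x1F
}

/// True if the access unit carries SPS, PPS, and an IDR slice.
pub fn is_self_contained_keyframe(data: &[u8]) -> bool {
    let types: Vec<u8> = split_annex_b(data).iter().map(|n| nal_type(n)).collect();
    types.contains(&7) && types.contains(&8) && types.contains(&5)
}
```

Running this check on the first IDR after encoder setup is a cheap way to confirm that the bitstream filter configuration has the intended effect.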
---
## 8. Error Handling & Recovery
### 8.1 Display Hot-Plug
1. `wl_registry.global_remove` → set `output_went_away` flag
2. `on_copy_fail()` detects flag → transition to `OutputWentAway`
3. Preserve: encoder context, WebTransport sessions
4. Discard: Wayland protocol objects (invalidated)
5. Wait for same-name output ("DP-1") to reappear
6. Create new `CaptureSource`, reuse old encoder, continue streaming
### 8.2 Network Disconnection
- QUIC handles keepalive and retransmission internally
- Client page refresh → new WebTransport session → server auto-starts sending current frame stream
- Server is stateless per session — no recovery needed, just reconnect
### 8.3 Dynamic Format Change
Capture format changes (resolution, rotation):
1. Rebuild: `frames_rgb`, `video_filter`, `enc_video`, `frames_yuv`
2. Preserve: `hw_device_ctx`, `transport_state`
3. Send new codec configuration to browser via reliable stream
4. Browser reconfigures `VideoDecoder` with the new codec string and dimensions; the new SPS/PPS arrive in-band with the next keyframe
### 8.4 Frame Loss Handling
- Lost delta frame → browser drops the following delta frames until the next keyframe (see 6.2), so corruption does not accumulate
- Lost keyframe → decoder cannot continue → request keyframe from server via reliable stream
- Server receives keyframe request → sets next input frame to `AV_PICTURE_TYPE_I`
### 8.5 Graceful Shutdown
Shutdown is triggered by SIGINT/SIGTERM via `signal-hook` + `mio` integration:
1. Main loop sets `running = false` flag → stops queuing new captures
2. Wait for in-flight frame to complete (drain `InFlightSurface`)
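3. Drain encoder: send a NULL frame (`avcodec_send_frame(ctx, NULL)`), then call `avcodec_receive_packet()` until it returns `AVERROR_EOF`. Do not use `avcodec_flush_buffers()` here, since it discards buffered data instead of emitting it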
4. Send final frames through channel
5. Drop `frame_tx` sender → signals EOF to transport
6. Transport server drains pending frames, sends GOAWAY to clients
7. `Runtime::shutdown_background()` terminates the remaining async tasks without waiting for them
8. Drop Wayland protocol objects (compositor handles cleanup)
9. FFmpeg contexts freed via `Drop` implementations
**Key concern**: Do NOT use blocking `send_blocking()` on the main thread — use `try_send()` so the main loop never stalls during shutdown. If the channel is full, the frame is dropped (acceptable during shutdown).
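
Steps 4 to 6 rely on the channel reporting end-of-stream once the sender is dropped and the queue is empty. The sketch below illustrates that behavior with the standard library: `std::sync::mpsc::sync_channel` stands in for `async_channel` (which likewise delivers the remaining messages before `recv()` fails), a thread stands in for the tokio transport task, and `drain_after_shutdown` is a hypothetical name.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Sends the final frames without blocking, drops the sender to signal EOF,
/// and returns what the consumer saw before the channel closed.
pub fn drain_after_shutdown(final_frames: Vec<u32>) -> Vec<u32> {
    let (frame_tx, frame_rx) = sync_channel::<u32>(16);

    // Stand-in for the transport task: reads until the channel is closed.
    let consumer = thread::spawn(move || {
        let mut seen = Vec::new();
        while let Ok(frame) = frame_rx.recv() {
            seen.push(frame);
        }
        seen // recv() failed: all senders dropped and the queue is empty
    });

    for frame in final_frames {
        // Non-blocking; a full channel simply drops the frame during shutdown.
        let _ = frame_tx.try_send(frame);
    }
    drop(frame_tx); // step 5: signals EOF to the transport side

    consumer.join().expect("consumer thread panicked")
}
```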
**NOTE**: wayland-client 0.31 uses `Connection::connect_to_env()` and `GlobalList` instead of the old 0.29 API (`Display::connect_to_env()` / `GlobalManager::new()`). See plan Task 11 for correct API usage.
### 8.6 First Keyframe Delivery
When a new WebTransport session is established, the client needs a keyframe before it can decode any delta frames. Two strategies:
1. **Force IDR on connect**: Set `AV_PICTURE_TYPE_I` on the next encoded frame when a new session is detected
2. **Buffer last keyframe**: Store the most recent keyframe in `TransportServer`, resend to new clients
Phase 1 uses strategy 1 (force IDR) for simplicity. The transport server sets a `needs_keyframe: bool` flag on new sessions, which the encode loop checks.
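
Because the flag is written on the tokio side and read on the mio thread, it needs to be shared safely across threads. One way to sketch this, assuming an `Arc<AtomicBool>` is acceptable for Phase 1's single session (the `KeyframeRequest` and `PictureType` names are hypothetical), is an atomic swap that consumes the request exactly once:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// What the encode loop should ask of the encoder for the next frame.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum PictureType {
    Idr,  // corresponds to setting AV_PICTURE_TYPE_I on the input frame
    Auto, // let the encoder decide
}

/// Cheaply cloneable handle shared by the transport task and the encode loop.
#[derive(Clone)]
pub struct KeyframeRequest(Arc<AtomicBool>);

impl KeyframeRequest {
    pub fn new() -> Self {
        KeyframeRequest(Arc::new(AtomicBool::new(false)))
    }

    /// Called by the transport side when a new session connects
    /// (or when a client reports a lost keyframe).
    pub fn request(&self) {
        self.0.store(true, Ordering::Release);
    }

    /// Called by the encode loop once per frame. `swap` reads and clears
    /// the flag atomically, so each request forces exactly one IDR.
    pub fn next_picture_type(&self) -> PictureType {
        if self.0.swap(false, Ordering::AcqRel) {
            PictureType::Idr
        } else {
            PictureType::Auto
        }
    }
}
```

Several requests that arrive between two frames collapse into a single IDR, which keeps keyframe bursts from inflating the bitrate.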
---
## 9. Dependencies
```toml
[dependencies]
# Wayland screen capture
wayland-client = "0.31"
wayland-protocols = { version = "0.32", features = ["client", "unstable", "staging"] }
wayland-protocols-wlr = { version = "0.3", features = ["client"] }
drm-fourcc = "2"
# GPU encoding
ffmpeg-next = "8"
# WebTransport (HTTP/3 + WebTransport protocol, built on quinn + rustls)
wtransport = { version = "0.7", features = ["self-signed"] }
# Web UI
axum = { version = "0.8", features = ["ws"] }
tower-http = { version = "0.6", features = ["cors"] }
rust-embed = { version = "8", features = ["mime-guess"] }
# Async runtime
tokio = { version = "1", features = ["full"] }
# Sync/async bridge (sync try_send() on mio thread, async recv() on tokio)
async-channel = "2"
# Event loop
mio = "1"
# Utilities
clap = { version = "4", features = ["derive"] }
tracing = "0.1"
tracing-subscriber = "0.3"
anyhow = "1"
bytes = "1"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
signal-hook = { version = "0.3", features = ["iterator"] }
base64 = "0.22"
mime_guess = "2"
```
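**Encoder configuration note**: The VAAPI H.264 encoder MUST be configured with the `h264_metadata` bitstream filter (`repeat_sps=1` `repeat_pps=1`) to guarantee SPS/PPS parameter sets are emitted in-band with every IDR frame. This is required for WebCodecs Annex B decode mode on the browser side. **Important**: `repeat_headers=1` and `-flags2 +repeat_headers` are libx264-only options and do NOT work with `h264_vaapi`. The BSF approach is encoder-agnostic and works with all FFmpeg hardware encoders. The `repeat_sps`/`repeat_pps` option names should be confirmed against the bitstream filter documentation of the target FFmpeg build; FFmpeg's `dump_extra` bitstream filter (`freq=keyframe`) is a documented alternative that prepends the extradata (SPS/PPS) to keyframes.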
---
## 10. Implementation Phases
### Phase 1 — MVP: Screen → Browser Streaming
| # | Module | Description | Estimated Effort |
|---|--------|-------------|------------------|
| 1 | `main.rs` | CLI args, startup sequence | Small |
| 2 | `cap_*.rs` | Implement capture backends (wlr-screencopy + ext-image-copy) | Medium |
| 3 | `avhw.rs` | Implement FFmpeg HW device/frame context management | Medium |
| 4 | `filter.rs` | Implement GPU video filter graph | Small |
| 5 | `transform.rs` | Implement coordinate transforms for Wayland outputs | Small |
| 6 | `fps_limit.rs` | Implement VRR-aware frame rate limiter | Small |
| 7 | `state.rs` | State machine adapted for transport | Medium |
| 8 | `transport.rs` | QUIC server + frame distribution | Large (new code) |
| 9 | `nalu.rs` | Annex B framing protocol | Small (new code) |
| 10 | `signaling.rs` | axum server + static files | Small (new code) |
| 11 | `static/*` | Browser Web UI + WebCodecs player | Medium (new code) |
**Deliverable**: Run `wl-webrtc`, open `https://localhost:PORT` in Chrome, see live screen at <50ms latency.
### Phase 2 — Remote Input + Stability
| # | Feature | Description |
|---|---------|-------------|
| 12 | Remote input | Browser mouse/keyboard → wlr-virtual-pointer/virtual-keyboard |
| 13 | Hot-plug recovery | Display disconnect/reconnect |
| 14 | Dynamic format | Resolution/rotation change handling |
| 15 | Multi-client | Multiple simultaneous browser viewers |
### Phase 3 — Optimization + Compatibility
| # | Feature | Description |
|---|---------|-------------|
| 16 | Adaptive bitrate | Network-aware VAAPI bit_rate adjustment |
| 17 | Audio pipeline | Synchronous audio capture + encoding + transport |
| 18 | WebRTC fallback | webrtc-rs path for Firefox compatibility |
| 19 | Performance dashboard | Real-time stats in Web UI |
---
## 11. Open Questions
1. **ffmpeg-next vs direct VAAPI bindings**: ffmpeg-next adds FFI overhead but provides mature encoding pipeline. Direct vaapi-dmabuf bindings would be more Rust-native but much more implementation work. **Decision: ffmpeg-next for Phase 1, evaluate direct bindings in Phase 3.** NOTE: `ffmpeg-next` safe API does NOT wrap hardware contexts (`AVBufferRef`, `AVHWFramesContext`). Use raw `ffmpeg_next::ffi` directly for all HW context operations — see `wl-screenrec/src/avhw.rs` for the reference pattern.
2. **Frame fragmentation strategy**: Current design fragments large frames across QUIC datagrams at byte boundaries (not NAL-aligned). The framing protocol reassembles by `frame_id`, so a lost fragment invalidates the entire frame. Alternative: send all frames via reliable QUIC streams and accept slightly higher latency. **Decision: Start with datagrams for delta frames, measure latency, evaluate.**
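3. **Self-signed certificate UX**: Browser will show SSL warning. Options: (a) accept for LAN, (b) guide user to trust CA, (c) use HTTP/2 prior knowledge. Option (c) can only cover the Web UI, because WebTransport runs over HTTP/3, which always requires TLS. The WebTransport constructor's `serverCertificateHashes` option is a further candidate for self-signed certificates. **Decision: Accept for Phase 1, add CA trust guide in Phase 2.**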
4. **HEVC vs H.264 default**: H.264 has universal browser support. HEVC has better compression but spotty browser support. **Decision: H.264 default, HEVC as option flag.**
5. **WebCodecs bitstream format**: **Decision: Annex B mode (no `description` at configure time).** SPS/PPS are guaranteed in-band via the `h264_metadata` BSF (`repeat_sps=1` `repeat_pps=1`). **Important**: The `repeat_headers=1` encoder option is libx264-only — it does NOT work with `h264_vaapi`. The BSF approach is encoder-agnostic and works with all FFmpeg hardware encoders. Per the W3C AVC WebCodecs Registration, providing `description` forces AVC (length-prefixed) mode for ALL subsequent frames. Since our encoder outputs Annex B, we must omit `description`.