# wl-webrtc Architecture Design

**Date**: 2026-04-03
**Status**: Draft
**Author**: Sisyphus (AI-assisted design)

---

## 1. Overview

### 1.1 Problem Statement

Build a low-latency Wayland screen sharing server that captures the desktop via GPU, encodes with hardware acceleration (VAAPI/Vulkan), and streams to a browser for remote viewing and eventual remote control.

### 1.2 Goals

- **Glass-to-glass latency < 50ms** on LAN
- **GPU-accelerated pipeline**: capture + encode entirely on GPU, only encoded bitstream crosses to CPU
- **Browser-only client**: no native app installation required
- **Single binary deployment**: embedded web UI, no external dependencies
- **Linux Wayland only**: no cross-platform abstraction needed
- **Annex B streaming**: encoder outputs Annex B (start-code-prefixed) NAL units with SPS/PPS repeated in-band before every IDR frame, and the browser decodes in Annex B mode via WebCodecs. With `h264_vaapi`, the repetition comes from FFmpeg's `dump_extra` bitstream filter (`freq=keyframe`); `repeat_headers=1` is a libx264-only option and does NOT exist for `h264_vaapi`.

### 1.3 Non-Goals (Phase 1)

- Multi-client support (Phase 2)
- Audio streaming (Phase 3)
- Remote input injection (Phase 2)
- Firefox support (Phase 3 — WebRTC fallback)
- Adaptive bitrate (Phase 3)

### 1.4 Technology Stack

| Component | Technology | Rationale |
|-----------|-----------|-----------|
| Screen capture | wayland-client + DMA-BUF | Zero-copy GPU capture via DMA-BUF |
| GPU encoding | FFmpeg (ffmpeg-next) VAAPI/Vulkan | H.264/HEVC hardware encoding |
| Transport | wtransport (WebTransport over HTTP/3) | Full HTTP/3 + WebTransport protocol, built on quinn + rustls |
| Browser decode | WebCodecs VideoDecoder | Direct decode control, no MSE buffering |
| Web UI | axum + rust-embed | Single binary, compile-time embedded static files |
| Event loop | mio | Proven with Wayland file descriptor callbacks |
| Async runtime | tokio | Required by wtransport, also powers axum |
| Sync/async bridge | async_channel | Sync, non-blocking `try_send()` on the mio thread and async `recv()` on tokio; bridges mio → tokio without blocking |

### 1.5 Transport Decision: Why Not WebRTC

WebRTC was evaluated and rejected as the primary transport for this use case:

| Factor | WebRTC (webrtc-rs) | WebTransport + WebCodecs |
|--------|-------------------|-------------------------|
| Glass-to-glass latency | 30-110ms (unavoidable 20-60ms jitter buffer) | 12-38ms (no jitter buffer) |
| Rust ecosystem | webrtc-rs v0.20.0-alpha, mid-rewrite | wtransport production-grade, built on quinn |
| Protocol overhead | ICE/DTLS/SRTP/SDP — designed for P2P NAT traversal | QUIC TLS 1.3 — server-to-client, simpler |
| Decode control | Browser controls jitter buffer, cannot opt out | Application controls every frame decode |
| GPU data path | Sample { data: Bytes }, must copy to CPU | Same copy, but shorter pipeline |
| Browser support | All browsers | Chrome/Edge only (Firefox lacks WebCodecs) |

**Transport library choice**: We use the `wtransport` crate (v0.7) instead of raw `quinn` + `h3`. The browser's `WebTransport` API requires a full HTTP/3 server with the WebTransport extension (WebTransport over HTTP/3, IETF draft `draft-ietf-webtrans-http3`, which carries datagrams as HTTP Datagrams per RFC 9297). Raw QUIC is NOT sufficient: there is no browser API for raw QUIC connections. The `wtransport` crate provides the complete protocol stack (HTTP/3 + WebTransport) built on top of `quinn` 0.11 and `rustls` 0.23, with support for datagrams, unidirectional streams, and bidirectional streams.

WebRTC will be added as a Phase 3 fallback for Firefox compatibility.

---

## 2. Architecture

### 2.1 Thread Model

```
┌─────────────────────────────────────────────────────────────┐
│                      wl-webrtc process                       │
│                                                              │
│  Main Thread (mio event loop)                                │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Wayland event queue dispatch                        │   │
│  │  Screen capture (DMA-BUF, zero-copy from compositor) │   │
│  │  GPU encode (FFmpeg VAAPI/Vulkan, sync calls)        │   │
│  │  State machine transitions                           │   │
│  │  FPS limiting                                        │   │
│  └──────────────────────┬───────────────────────────────┘   │
│                         │                                    │
│              async_channel::bounded::<EncodedFrame>(16)      │
│                         │                                    │
│  Tokio Runtime Thread Pool (2+ threads)                      │
│  ┌──────────────────────▼───────────────────────────────┐   │
│  │  wtransport WebTransport server                      │   │
│  │  HTTP/3 + WebTransport session management            │   │
│  │  Frame distribution to connected clients             │   │
│  │  axum HTTP server (Web UI + control API)             │   │
│  │  rust-embed static file serving                      │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
```

**Design rationale**:

- **Capture + encode on main thread**: GPU encoding is a synchronous call that takes 3-8ms per frame. That fits inside the frame interval (16.7ms at 60fps, 33.3ms at 30fps), so the mio event loop still dispatches Wayland events in time, and keeping the whole GPU pipeline on one thread avoids cross-thread synchronization.
- **wtransport on tokio**: wtransport is built on quinn, which runs on tokio, and axum also requires tokio, so the WebTransport server and the HTTP static file server share a single tokio runtime.
- **async_channel::bounded(16)**: A capacity of 16 frames is ~267ms of video at 60fps. That absorbs transport jitter, but a full queue would add up to ~267ms of delay, far beyond the 50ms target, so the transport consumer must normally keep the queue close to empty. The sender uses `try_send()`: on a full channel it returns `Err(TrySendError::Full(_))`, and the main loop discards the frame and logs the drop. This is standard practice in real-time streaming, where newer frames are always more valuable than older ones, and it keeps the mio event loop responsive for Wayland event dispatch. **Do NOT use `send_blocking()`** on the mio thread: it would stall the capture pipeline whenever the transport consumer falls behind.
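The drop-on-full policy can be sketched as below. To keep the example dependency-free it uses `std::sync::mpsc::sync_channel` as a stand-in for `async_channel::bounded`; both offer a non-blocking `try_send()` that hands the rejected value back in `TrySendError::Full` (async_channel calls the closed case `TrySendError::Closed` rather than `Disconnected`). `EncodedFrame` is reduced to a sequence number, and `offer_frame`, `DropStats` and `make_channel` are hypothetical names, not the project's API.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Stand-in for the real EncodedFrame: only a sequence number is kept.
#[derive(Debug)]
struct EncodedFrame {
    seq: u64,
}

/// Counts frames discarded because the transport consumer lagged behind.
#[derive(Default)]
struct DropStats {
    dropped: u64,
}

/// Bounded channel between the capture thread and the transport.
fn make_channel(capacity: usize) -> (SyncSender<EncodedFrame>, Receiver<EncodedFrame>) {
    sync_channel(capacity)
}

/// Offer a frame to the transport without ever blocking the event loop.
/// Returns true if the frame was queued, false if it was dropped.
fn offer_frame(tx: &SyncSender<EncodedFrame>, frame: EncodedFrame, stats: &mut DropStats) -> bool {
    match tx.try_send(frame) {
        Ok(()) => true,
        Err(TrySendError::Full(rejected)) => {
            // Consumer is behind: discard this frame, newer frames matter more.
            stats.dropped += 1;
            eprintln!("channel full, dropped frame {}", rejected.seq); // real code: tracing::warn!
            false
        }
        // Transport side has shut down; there is nobody left to deliver to.
        Err(TrySendError::Disconnected(_)) => false,
    }
}
```

With a capacity of 2, the third frame offered before the consumer reads anything is dropped, and queuing resumes as soon as one slot frees up.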

### 2.2 Module Dependency Graph

```
                              ┌──────────┐
                              │ main.rs  │  entry point, CLI, orchestration
                              └──┬──┬──┬─┘
                                 │  │  │
            ┌────────────────────┘  │  └────────────┐
            ▼                       ▼               ▼
 ┌──────────────────────┐      ┌──────────┐  ┌──────────────┐
 │ state.rs             │      │ avhw.rs  │  │ transport.rs │
 │ StateMachine         │      │ HW ctx   │  │ QUIC server  │
 │ CaptureSource        │      │          │  │ Sessions     │
 └─────┬─────────────┬──┘      └────┬─────┘  └──────────────┘
       │             │              │
       ▼             ▼              ▼
┌────────────┐ ┌────────────┐  ┌──────────┐
│ cap_wlr_   │ │ cap_ext_   │  │ filter.rs│
│ screencopy │ │ image_copy │  │ crop/    │
└────────────┘ └────────────┘  │ scale/   │
                               │ transpose│
                               └──────────┘

┌──────────────┐  ┌───────────────┐  ┌──────────────┐
│ transform.rs │  │ signaling.rs  │  │ fps_limit.rs │
│ coordinate   │  │ axum + embed  │  │              │
│ transform    │  │ Web UI serve  │  │              │
└──────────────┘  └───────────────┘  └──────────────┘
```

**Dependency layers** (bottom-up):

1. `transform.rs`, `fps_limit.rs` — leaf modules, zero internal dependencies
2. `avhw.rs`, `filter.rs` — FFmpeg wrapper layer
3. `state.rs` — state machine + CaptureSource trait
4. `cap_wlr_screencopy.rs`, `cap_ext_image_copy.rs` — capture backends, depend on state + avhw
5. `transport.rs`, `signaling.rs` — network layer
6. `main.rs` — orchestration

### 2.3 Project File Structure

```
wl-webrtc/
├── Cargo.toml
├── README.md
├── src/
│   ├── main.rs                # ~300 lines — CLI, startup, orchestration
│   ├── state.rs               # ~600 lines — State<S>, EncConstructionStage, InFlightSurface
│   ├── avhw.rs                # ~450 lines — FFmpeg HW device/frame contexts
│   ├── filter.rs              # ~200 lines — FFmpeg video filter graph
│   ├── cap_wlr_screencopy.rs  # ~170 lines — wlr-screencopy backend
│   ├── cap_ext_image_copy.rs  # ~240 lines — ext-image-copy-capture backend
│   ├── transform.rs           # ~220 lines — coordinate transforms
│   ├── fps_limit.rs           # ~130 lines — VRR-aware frame rate limiter
│   ├── transport.rs           # ~400 lines — QUIC/WebTransport server
│   ├── signaling.rs           # ~200 lines — axum HTTP + WebSocket control
│   └── nalu.rs                # ~150 lines — Annex B NAL unit splitting, framing protocol
├── static/
│   ├── index.html             # Web UI shell
│   ├── player.js              # WebCodecs decoder + Canvas renderer
│   └── style.css              # Minimal styling
└── protocols/                 # Wayland protocol XML files
```

---

## 3. Data Flow

### 3.1 Zero-Copy Capture Pipeline

```
GPU Frame Pool ─alloc()→ HW Surface
                          ↓
               av_hwframe_map → DMA-BUF fd
                          ↓
               zwp_linux_dmabuf → WlBuffer (fd shared)
                          ↓
               Compositor writes directly to GPU Surface
                          ↓
               FFmpeg VAAPI/Vulkan encode (GPU-internal)
                          ↓
               AVPacket.data (Annex B with 00 00 00 01 start codes)
                          ↓               ← GPU→CPU copy via vaMapBuffer (unavoidable)
               Bytes::from(Vec<u8>) wrapper
                          ↓
               Sender::try_send(EncodedFrame)  // async_channel; sync, never blocks the main thread
```

### 3.2 Transport Pipeline

```
async_channel Receiver::recv().await (EncodedFrame)
                          ↓
               Byte-splitting into fragments ≤ max datagram size (not NAL-aligned)
                          ↓
               ┌─ Keyframe → WebTransport unidirectional stream (reliable, guaranteed delivery)
               └─ Delta frame → WebTransport datagram (unreliable, low latency)
                          ↓
               wtransport send (over quinn)
                          ↓
               Browser: transport.incomingUnidirectionalStreams / transport.datagrams.readable
                          ↓
               Frame reassembly (if fragmented)
                          ↓
               WebCodecs VideoDecoder.decode(EncodedVideoChunk)
                          ↓
               Canvas.drawImage(VideoFrame)
```

### 3.3 Latency Budget

| Stage | Latency | Notes |
|-------|---------|-------|
| Wayland capture (KMS/dmabuf) | 1-3ms | Zero-copy from compositor |
| GPU encode (VAAPI H.264) | 3-8ms | Synchronous, main thread |
| vaMapBuffer CPU copy | <1ms | Unavoidable GPU→CPU |
| async_channel | <0.1ms | In-process |
| QUIC datagram (LAN) | 1-10ms | Network transit on the LAN |
| WebCodecs decode | 2-5ms | Browser hardware decode |
| Canvas render | 1-2ms | requestAnimationFrame |
| **Total (LAN)** | **9-29ms** | Well under the 50ms target |

### 3.4 EncodedFrame Structure

```rust
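use bytes::Bytes;
use std::time::Duration;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum FrameType {
    Keyframe, // IDR access unit with in-band SPS/PPS; sent on a reliable stream
    Delta,    // Non-IDR frame; sent as datagram fragments
}

#[derive(Clone)]
struct EncodedFrame {
    data: Bytes,           // Annex B NALUs with start codes (refcounted, cheap to clone)
    pts_us: i64,           // Presentation timestamp (microseconds, for WebCodecs)
    duration: Duration,    // Frame duration for timestamp calculation
    frame_type: FrameType, // Keyframe or Delta (matches transport framing)
    width: u32,            // Frame width (may differ from the capture size when an ROI is used)
    height: u32,           // Frame height
}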
```

**Timestamp convention**: `pts_us` is in **microseconds** (not nanoseconds), matching WebCodecs' `EncodedVideoChunk.timestamp` requirement. The server tracks a monotonic PTS starting from 0, incrementing by `1_000_000 / fps` per frame.
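Incrementing by a precomputed `1_000_000 / fps` truncates at 60fps (16 666µs instead of 16 666.67µs), so the PTS falls behind real time by 40µs per second, about 2.4ms per minute. A minimal sketch of one way to avoid that drift, assuming a constant frame rate, is to derive each PTS from the frame index instead; both function names are hypothetical.

```rust
/// PTS in microseconds of the `frame_index`-th frame at a constant `fps`,
/// computed from the index so the rounding error stays below 1 µs and never accumulates.
fn pts_us(frame_index: u64, fps: u32) -> i64 {
    assert!(fps > 0, "fps must be non-zero");
    // u128 intermediate: frame_index * 1_000_000 cannot overflow.
    (u128::from(frame_index) * 1_000_000 / u128::from(fps)) as i64
}

/// The incremental scheme: add a truncated per-frame step for every frame.
fn pts_us_accumulated(frame_index: u64, fps: u32) -> i64 {
    let step = 1_000_000 / i64::from(fps); // 16_666 at 60 fps (truncated)
    frame_index as i64 * step
}
```

With the index-based form, frame 60 at 60fps lands exactly on one second (`1_000_000`), while the accumulated step gives `999_960`.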

---

## 4. State Machine

### 4.1 EncConstructionStage

```
     App start
         │
         ▼
┌──────────────────┐
│  ProbingOutputs  │  Discover Wayland outputs,
└────────┬─────────┘  collect geometry info
         │ all outputs probed
         ▼
┌──────────────────┐
│ EverythingButFmt │  HW device ctx created,
└────────┬─────────┘  encoder initialized
         │ negotiate_format()
         ▼
┌──────────────────┐ ◄──────┐
│    Streaming     │        │ format changed:
└──┬───────────▲───┘ ───────┘ rebuild, re-enter
   │           │
   │ output    │ same output
   │ went away │ reconnects
   ▼           │
┌──────────────┴───┐
│  OutputWentAway  │  Keep enc + transport,
└──────────────────┘  drop capture objects

Streaming = active capture + encode + transport.
An Intermediate transient state is held during every transition (mem::replace).
```

**Key design choice**: `Streaming` state holds both `EncState` (encoding pipeline) AND `TransportState` (active WebTransport sessions). On `OutputWentAway`, both are preserved — only capture objects are discarded.
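The transition pattern can be sketched in plain Rust. The payload types below are placeholders (the real ones own FFmpeg, Wayland and wtransport state) and the method names are hypothetical; the point is that `mem::replace` parks `Intermediate` in the slot so that `EncState` and `TransportState` are moved, not cloned, from `Streaming` into `OutputWentAway` and back.

```rust
use std::mem;

// Placeholder payloads for the real encoder, transport and Wayland capture state.
#[derive(Debug)]
struct EncState { width: u32, height: u32 }
#[derive(Debug)]
struct TransportState { sessions: usize }
#[derive(Debug)]
struct CaptureObjects { output_name: String }

#[derive(Debug)]
enum EncConstructionStage {
    ProbingOutputs,
    EverythingButFmt { enc: EncState },
    Streaming { enc: EncState, transport: TransportState, capture: CaptureObjects },
    OutputWentAway { enc: EncState, transport: TransportState, output_name: String },
    /// Transient placeholder, only present while a transition is in progress.
    Intermediate,
}

impl EncConstructionStage {
    /// Output disconnected: keep encoder + transport, drop the capture objects.
    fn on_output_removed(&mut self) {
        // Move the current value out so its fields can be moved into the next state.
        let old = mem::replace(self, EncConstructionStage::Intermediate);
        *self = match old {
            EncConstructionStage::Streaming { enc, transport, capture } => {
                let output_name = capture.output_name.clone();
                drop(capture); // Wayland objects are invalid after global_remove
                EncConstructionStage::OutputWentAway { enc, transport, output_name }
            }
            other => other, // not streaming: nothing to do
        };
    }

    /// An output appeared: if it is the one we lost, rebuild capture and resume.
    fn on_output_added(&mut self, name: &str) {
        let old = mem::replace(self, EncConstructionStage::Intermediate);
        *self = match old {
            EncConstructionStage::OutputWentAway { enc, transport, output_name } if output_name == name => {
                EncConstructionStage::Streaming { enc, transport, capture: CaptureObjects { output_name } }
            }
            other => other, // different output, or not waiting for one
        };
    }
}
```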

### 4.2 InFlightSurface

```
None → AllocQueued → Allocd(Frame) → CopyQueued { surface, drm_map, frame, buffer } → None
```

4-state enum with `assert!(matches!(...))` runtime guards. RAII cleanup on each state transition. Single-frame-in-flight constraint prevents buffer exhaustion.
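A minimal sketch of the four-state lifecycle, assuming placeholder resource types (`Frame`, `Surface`, `DrmMap`, `WlBuffer`) in place of the real FFmpeg and Wayland handles; the method names are hypothetical. Each transition asserts the expected previous state, and `copy_done` hands the frame to the caller while the remaining resources are dropped (RAII), which frees the single in-flight slot.

```rust
// Placeholder resources; the real ones own FFmpeg frames and Wayland buffers.
#[derive(Debug)]
struct Frame(u32);
#[derive(Debug)]
struct Surface(u32);
#[derive(Debug)]
struct DrmMap(u32);
#[derive(Debug)]
struct WlBuffer(u32);

#[derive(Debug)]
enum InFlightSurface {
    None,
    AllocQueued,
    Allocd(Frame),
    CopyQueued { surface: Surface, drm_map: DrmMap, frame: Frame, buffer: WlBuffer },
}

impl InFlightSurface {
    /// Start a capture; only legal when nothing is in flight.
    fn queue_alloc(&mut self) {
        assert!(matches!(self, InFlightSurface::None), "a frame is already in flight");
        *self = InFlightSurface::AllocQueued;
    }

    fn alloc_done(&mut self, frame: Frame) {
        assert!(matches!(self, InFlightSurface::AllocQueued), "no allocation was queued");
        *self = InFlightSurface::Allocd(frame);
    }

    fn queue_copy(&mut self, surface: Surface, drm_map: DrmMap, buffer: WlBuffer) {
        // Take ownership of the allocated frame out of the current state.
        match std::mem::replace(self, InFlightSurface::None) {
            InFlightSurface::Allocd(frame) => {
                *self = InFlightSurface::CopyQueued { surface, drm_map, frame, buffer };
            }
            other => panic!("queue_copy in state {other:?}"),
        }
    }

    /// Copy finished: return the frame for encoding. Surface, mapping and
    /// buffer are dropped here, and the slot is free again.
    fn copy_done(&mut self) -> Frame {
        match std::mem::replace(self, InFlightSurface::None) {
            InFlightSurface::CopyQueued { frame, .. } => frame,
            other => panic!("copy_done in state {other:?}"),
        }
    }
}
```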

### 4.3 TransportSessionState (new)

```
┌───────────┐     connect      ┌───────────┐     disconnect     ┌───────────┐
│ Listening │ ──────────────→  │ Active    │ ──────────────→   │ Closed    │
│ (quinn    │                  │ (sending  │                    │ (cleanup) │
│  endpoint)│                  │  frames)  │                    │           │
└───────────┘                  └───────────┘                    └───────────┘
```

Multiple sessions can be `Active` simultaneously (Phase 2). Phase 1 supports exactly one.

---

## 5. Design Patterns

The architecture employs several established software design patterns for managing complexity:

| # | Pattern | Usage in wl-webrtc |
|---|---------|-------------------|
| 1 | Strategy Trait + Generic State | `CaptureSource` trait with `CapWlrScreencopy` / `CapExtImageCopy` backends |
| 2 | Polymorphic Enum State Machine | `EncConstructionStage` — 5 variants with type-safe transitions |
| 3 | Type-Safe Frame Lifecycle | `InFlightSurface` — 4-state enum with runtime guards |
| 4 | Pin\<Box\> Self-Referential | Vulkan device context — for self-referential FFmpeg structs |
| 5 | Independent Thread Pipe | tokio runtime replaces mpsc audio thread; same atomic flag pattern |
| 6 | VRR-Aware Frame Rate Control | `FpsLimit<T>` — one-frame-buffer delay for correct drop decisions |
| 7 | Generic Dispatch 3-Layer | Wayland protocol dispatch — generic event handling |
| 8 | Three-Stage Safe Construction | Incremental resource acquisition with partial state rollback |
| 9 | Hot-Plug Auto-Recovery | `OutputWentAway` — preserve encoder/transport, rebuild capture |
| 10 | Zero-Copy GPU Pipeline | DMA-BUF capture + GPU-internal encode, minimal CPU involvement |

---

## 6. Transport Protocol Design

### 6.1 WebTransport Connection Setup

```
Server generates self-signed TLS certificate (via wtransport built-in rcgen support)
  → wtransport::Endpoint::server(server_config)  (bind address is set in ServerConfig)
  → Browser: new WebTransport("https://server:PORT/wt")
  → wtransport handles full HTTP/3 + WebTransport handshake internally
  → Session established (datagrams + streams available)
```

**Transport library**: We use `wtransport` crate (v0.7) which provides a complete WebTransport-over-HTTP/3 server implementation built on top of `quinn` 0.11 and `rustls` 0.23. This handles all protocol details (HTTP/3 SETTINGS, CONNECT method with `:protocol = webtransport`, session management, datagram framing per RFC 9297). Raw `quinn` or `h3` would require building this protocol stack manually.

### 6.2 Frame Framing Protocol

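QUIC only guarantees a 1200-byte UDP payload, and QUIC plus WebTransport framing overhead leaves less than that for each datagram, so the fragment size should come from the connection's current datagram limit rather than being hard-coded. A 1080p H.264 frame is typically 10KB-200KB, so frames need application-level framing: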

```
Fragment format (same header on datagrams and on the reliable keyframe stream):
┌──────────┬──────────┬──────────┬──────────┬──────────┬─────────────┐
│ type (1) │ frame_id │ pts_us   │ seq_num  │ total    │ payload     │
│          │ (4 bytes)│ (8 bytes)│ (2 bytes)│ (2 bytes)│ (variable)  │
└──────────┴──────────┴──────────┴──────────┴──────────┴─────────────┘

type:
  0x01 = Keyframe fragment (sent via reliable stream, not datagram)
  0x02 = Delta frame fragment (sent via datagram)
  0x03 = Keyframe complete (fits in a single fragment; still sent via reliable stream)
  0x04 = Delta frame complete (fits in a single datagram)
  0x10 = Codec config (SPS/PPS for H.264, VPS/SPS/PPS for HEVC)

pts_us: Presentation timestamp in microseconds (i64, big-endian).
        Passed directly to WebCodecs EncodedVideoChunk.timestamp.
        For fragmented frames, every fragment carries the same pts_us.
```
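A sketch of the sender side of this format. It assumes all header integers are big-endian (the table states this only for `pts_us`), that `max_fragment` is the usable payload size queried at runtime, and that a frame which fits in one fragment uses the `complete` type codes. `fragment_frame` and the constant names are hypothetical.

```rust
// Fragment type codes from the table above.
const KEY_FRAGMENT: u8 = 0x01;
const DELTA_FRAGMENT: u8 = 0x02;
const KEY_COMPLETE: u8 = 0x03;
const DELTA_COMPLETE: u8 = 0x04;

/// type(1) + frame_id(4) + pts_us(8) + seq_num(2) + total(2)
const HEADER_LEN: usize = 17;

/// Split one encoded frame into fragments of at most `max_fragment` bytes
/// (header included). Splitting is byte-level, not NAL-aligned.
fn fragment_frame(frame_id: u32, pts_us: i64, is_key: bool, data: &[u8], max_fragment: usize) -> Vec<Vec<u8>> {
    assert!(max_fragment > HEADER_LEN, "max_fragment must leave room for payload");
    let chunk = max_fragment - HEADER_LEN;
    // An empty frame still produces one (header-only) fragment.
    let pieces: Vec<&[u8]> = if data.is_empty() { vec![data] } else { data.chunks(chunk).collect() };
    let total = u16::try_from(pieces.len()).expect("too many fragments for a u16 count");
    let ty = match (is_key, total == 1) {
        (true, true) => KEY_COMPLETE,
        (true, false) => KEY_FRAGMENT,
        (false, true) => DELTA_COMPLETE,
        (false, false) => DELTA_FRAGMENT,
    };
    pieces
        .iter()
        .enumerate()
        .map(|(seq, piece)| {
            let mut out = Vec::with_capacity(HEADER_LEN + piece.len());
            out.push(ty);
            out.extend_from_slice(&frame_id.to_be_bytes());
            out.extend_from_slice(&pts_us.to_be_bytes()); // same pts_us on every fragment
            out.extend_from_slice(&(seq as u16).to_be_bytes());
            out.extend_from_slice(&total.to_be_bytes());
            out.extend_from_slice(piece);
            out
        })
        .collect()
}
```

With a 1200-byte limit each fragment carries 1183 payload bytes, so a 2500-byte delta frame becomes three datagrams (1183 + 1183 + 134 bytes of payload).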

**Key design decisions**:
- **Keyframes via reliable WebTransport stream**: SPS/PPS + IDR data must not be lost. Use `session.open_uni().await` for reliable delivery.
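- **Delta frames via datagram**: Loss-tolerant. If any fragment of a delta frame is lost, the player discards that frame and skips later deltas until the next keyframe instead of decoding them against a missing reference. This avoids accumulated corruption.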
- **Frame reassembly in browser**: Buffer fragments by `frame_id`, reassemble when all `total` fragments arrive, decode complete frame.
- **Timestamp in microseconds**: The fragment header carries `pts_us: i64` (presentation timestamp in microseconds) so the browser can pass it directly to `EncodedVideoChunk.timestamp`. This is required by WebCodecs — a sequential frame_id counter is NOT a valid timestamp.
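The reassembly step can be sketched as follows. It is written in Rust for consistency with the rest of this document; `player.js` would implement the same logic. It assumes the 17-byte big-endian header from the format above, ignores the codec-config type `0x10`, and uses hypothetical names (`Reassembler`, `push`, `evict_older_than`). Eviction compares `frame_id` values directly and therefore ignores u32 wraparound.

```rust
use std::collections::HashMap;

const HEADER_LEN: usize = 17;

#[derive(Debug, Clone, Copy)]
struct FragmentHeader { ty: u8, frame_id: u32, pts_us: i64, seq: u16, total: u16 }

fn parse_header(buf: &[u8]) -> Option<FragmentHeader> {
    if buf.len() < HEADER_LEN {
        return None;
    }
    Some(FragmentHeader {
        ty: buf[0],
        frame_id: u32::from_be_bytes(buf[1..5].try_into().ok()?),
        pts_us: i64::from_be_bytes(buf[5..13].try_into().ok()?),
        seq: u16::from_be_bytes(buf[13..15].try_into().ok()?),
        total: u16::from_be_bytes(buf[15..17].try_into().ok()?),
    })
}

/// A fully reassembled frame, ready to wrap in an EncodedVideoChunk.
#[derive(Debug)]
struct CompleteFrame { frame_id: u32, pts_us: i64, is_key: bool, data: Vec<u8> }

struct Partial { pts_us: i64, is_key: bool, pieces: Vec<Option<Vec<u8>>>, received: usize }

#[derive(Default)]
struct Reassembler { pending: HashMap<u32, Partial> }

impl Reassembler {
    /// Feed one fragment; returns the frame once all `total` fragments have arrived.
    fn push(&mut self, buf: &[u8]) -> Option<CompleteFrame> {
        let h = parse_header(buf)?;
        if h.total == 0 || h.seq >= h.total {
            return None; // malformed header
        }
        let is_key = matches!(h.ty, 0x01 | 0x03);
        let entry = self.pending.entry(h.frame_id).or_insert_with(|| Partial {
            pts_us: h.pts_us, is_key, pieces: vec![None; h.total as usize], received: 0,
        });
        if entry.pieces.len() != h.total as usize {
            return None; // inconsistent `total` for this frame_id
        }
        let slot = &mut entry.pieces[h.seq as usize];
        if slot.is_none() { // duplicates are ignored
            *slot = Some(buf[HEADER_LEN..].to_vec());
            entry.received += 1;
        }
        if entry.received < entry.pieces.len() {
            return None;
        }
        let done = self.pending.remove(&h.frame_id)?;
        let data = done.pieces.into_iter().flatten().flatten().collect();
        Some(CompleteFrame { frame_id: h.frame_id, pts_us: done.pts_us, is_key: done.is_key, data })
    }

    /// Forget partial frames older than `oldest_kept`; their lost fragments will not arrive.
    fn evict_older_than(&mut self, oldest_kept: u32) {
        self.pending.retain(|&id, _| id >= oldest_kept);
    }
}
```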

### 6.3 Codec Configuration Exchange

The encoder MUST emit SPS/PPS in-band with every IDR frame. With `h264_vaapi` this is done with FFmpeg's `dump_extra` bitstream filter (`freq=keyframe`); `repeat_headers=1` is a libx264-only option and does NOT exist for `h264_vaapi`. The browser configures the decoder in **Annex B mode** (no `description` at `configure()` time), and SPS/PPS arrive in-band with each keyframe.

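On session establishment, the server sends a codec configuration message over the reliable QUIC stream to inform the browser of the codec and dimensions. The codec string must describe the actual stream: `avc1.42E02A` is Constrained Baseline (`42E0`) at Level 4.2 (`2A`), the lowest level that allows 1920x1080 at 60fps; Level 3.1 (`1F`) tops out at 1280x720.

```json
{
  "type": "codec_config",
  "codec": "avc1.42E02A",
  "width": 1920,
  "height": 1080,
  "framerate": 60
}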
```

Browser uses this to configure `VideoDecoder` — without `description`, which activates Annex B mode:

```javascript
decoder.configure({
  codec: config.codec,
  codedWidth: config.width,
  codedHeight: config.height,
  // NO description — Annex B mode. SPS/PPS arrive in-band with each keyframe.
});
```

**Why no AVCC description?** Per the WebCodecs AVC registration spec, providing `description` forces the decoder into AVC (length-prefixed) mode for ALL frames. Since our encoder outputs Annex B (start-code-prefixed), we must omit `description` and rely on the parameter sets the encoder repeats in-band before every IDR frame.

**Timestamp handling**: Each fragment carries `frame_id` (u32) for reassembly and `pts_us` (i64, microseconds), which the browser passes directly as `EncodedVideoChunk.timestamp`. Every fragment of a frame carries the same `pts_us` value, so the browser can read it from any fragment during reassembly.
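Rather than hard-coding the `codec` string, the server could derive it from the SPS it already emits. A minimal sketch, assuming the SPS NAL unit has its start code stripped: after the one-byte NAL header, the next three bytes are `profile_idc`, the constraint-flag byte and `level_idc`, which map directly to the `avc1.PPCCLL` form. These bytes cannot contain emulation-prevention bytes, because neither the NAL header nor `profile_idc` is ever zero. The level check matters: 1920x1080 is 8,160 macroblocks per frame, and at 60fps that is 489,600 macroblocks/s, above Level 4.1's 245,760 limit and within Level 4.2's 522,240. The function name is hypothetical.

```rust
/// WebCodecs codec string `avc1.PPCCLL` from an H.264 SPS NAL unit (no start code):
/// PP = profile_idc, CC = constraint-flag byte, LL = level_idc, as uppercase hex.
fn avc_codec_string(sps: &[u8]) -> Option<String> {
    // Byte 0 is the NAL header; its low 5 bits (nal_unit_type) must be 7 for an SPS.
    if sps.len() < 4 || (sps[0] & 0x1F) != 7 {
        return None;
    }
    Some(format!("avc1.{:02X}{:02X}{:02X}", sps[1], sps[2], sps[3]))
}
```

For a Constrained Baseline, Level 4.2 SPS beginning `67 42 E0 2A`, this yields `avc1.42E02A`; a High-profile encoder would instead produce a `avc1.64....` string, which the server would forward unchanged.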

---

## 7. Browser-Side Design

### 7.1 Web UI (static/index.html + player.js)

Single-page application with minimal dependencies:

```
┌──────────────────────────────────────┐
│  wl-webrtc                           │
│  ┌──────────────────────────────┐    │
│  │                              │    │
│  │     <canvas> (video)         │    │
│  │     WebCodecs → drawImage    │    │
│  │                              │    │
│  └──────────────────────────────┘    │
│  Status: Connected | Latency: 23ms   │
│  Resolution: 1920x1080 @ 60fps       │
│  [Fullscreen] [Disconnect]           │
└──────────────────────────────────────┘
```

### 7.2 WebCodecs Decoder Pipeline

**CRITICAL: Annex B mode only.** Per the [W3C AVC WebCodecs Registration](https://w3c.github.io/webcodecs/avc_codec_registration.html#videodecoderconfig-description), if `description` is provided at `configure()` time, ALL subsequent `EncodedVideoChunk` data must be in AVC format (4-byte length-prefixed). If `description` is **absent**, the bitstream is assumed to be in Annex B format (start-code-prefixed). Since our encoder outputs Annex B, we must NOT provide `description`.

The encoder repeats SPS/PPS in-band before every IDR frame (with `h264_vaapi`, via FFmpeg's `dump_extra` bitstream filter; libx264's `repeat_headers=1` does not exist there), so the decoder can initialize from keyframe data alone.

```javascript
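// Simplified player.js flow (module script, so top-level await is allowed)
const canvas = document.querySelector("canvas");
const ctx = canvas.getContext("2d");

const transport = new WebTransport("https://server:PORT/wt");
await transport.ready; // resolves once the HTTP/3 + WebTransport session is up

const decoder = new VideoDecoder({
  output: (frame) => {
    ctx.drawImage(frame, 0, 0);
    frame.close(); // release the decoded frame immediately
  },
  error: (e) => console.error(e),
});

// Configure WITHOUT description → Annex B mode.
// SPS/PPS are delivered in-band with each keyframe (the encoder repeats them on every IDR).
decoder.configure({
  codec: "avc1.42E02A", // Constrained Baseline, Level 4.2 (covers 1920x1080 @ 60fps)
  codedWidth: 1920,
  codedHeight: 1080,
  optimizeForLatency: true, // hint: minimize frames buffered before output
  // NO description field: Annex B mode
});

// Receive delta-frame fragments from datagrams. Keyframes arrive on reliable
// unidirectional streams (transport.incomingUnidirectionalStreams), which a
// second loop reads and feeds to the same decoder.
const reader = transport.datagrams.readable.getReader();
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  // reassembleFrame(): application-level fragment buffer keyed by frame_id;
  // returns { complete: false } until all `total` fragments have arrived.
  const frame = reassembleFrame(value);
  if (frame.complete) {
    decoder.decode(new EncodedVideoChunk({
      type: frame.isKeyframe ? "key" : "delta",
      timestamp: Number(frame.ptsUs), // pts_us is an i64; exact while below 2^53
      data: frame.data, // Annex B, valid because no description was provided
    }));
  }
}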
```

### 7.3 No Annex B → AVCC Conversion Needed

Because we configure the decoder in Annex B mode (no `description`), no format conversion is needed on the browser side. The server sends raw Annex B NAL units with start codes (`00 00 00 01`), and the decoder accepts them directly.

The encoder repeats SPS/PPS in every IDR access unit (for `h264_vaapi`, via the encoder-agnostic `dump_extra` bitstream filter rather than the libx264-only `repeat_headers=1`), so the decoder can re-initialize after any keyframe, even if it missed earlier configuration data.
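A sketch of an Annex B splitter of the kind `nalu.rs` is meant to hold, plus a check that an IDR access unit really carries SPS and PPS in-band (useful as a debug assertion on the server before a keyframe is sent). The function names are hypothetical. It accepts both 3-byte and 4-byte start codes and strips trailing zero bytes, which is safe because an H.264 NAL unit never ends in `0x00`.

```rust
/// Split an Annex B byte stream into NAL units (start codes removed).
/// Accepts both 3-byte (00 00 01) and 4-byte (00 00 00 01) start codes.
fn split_annex_b(data: &[u8]) -> Vec<&[u8]> {
    // Offsets of the first byte after every 00 00 01 sequence.
    let mut starts = Vec::new();
    let mut i = 0;
    while i + 3 <= data.len() {
        if data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1 {
            starts.push(i + 3);
            i += 3;
        } else {
            i += 1;
        }
    }
    let mut nals = Vec::new();
    for (k, &start) in starts.iter().enumerate() {
        let mut end = starts.get(k + 1).map_or(data.len(), |&next| next - 3);
        // Drop trailing zeros, e.g. the extra 00 of a following 4-byte start code.
        while end > start && data[end - 1] == 0 {
            end -= 1;
        }
        if end > start {
            nals.push(&data[start..end]);
        }
    }
    nals
}

/// nal_unit_type: low 5 bits of the NAL header (1 = non-IDR slice, 5 = IDR, 7 = SPS, 8 = PPS).
fn nal_type(nal: &[u8]) -> u8 {
    nal[0] & 0x1F
}

/// True if the access unit contains an IDR slice plus in-band SPS and PPS,
/// i.e. a decoder configured without `description` can start from it.
fn is_self_contained_keyframe(access_unit: &[u8]) -> bool {
    let types: Vec<u8> = split_annex_b(access_unit).into_iter().map(nal_type).collect();
    types.contains(&5) && types.contains(&7) && types.contains(&8)
}
```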

---

## 8. Error Handling & Recovery

### 8.1 Display Hot-Plug

1. `wl_registry.global_remove` → set `output_went_away` flag
2. `on_copy_fail()` detects flag → transition to `OutputWentAway`
3. Preserve: encoder context, WebTransport sessions
4. Discard: Wayland protocol objects (invalidated)
5. Wait for same-name output ("DP-1") to reappear
6. Create new `CaptureSource`, reuse old encoder, continue streaming

### 8.2 Network Disconnection

- QUIC handles keepalive and retransmission internally
- Client page refresh → new WebTransport session → server auto-starts sending current frame stream
- Server is stateless per session — no recovery needed, just reconnect

### 8.3 Dynamic Format Change

Capture format changes (resolution, rotation):
1. Rebuild: `frames_rgb`, `video_filter`, `enc_video`, `frames_yuv`
2. Preserve: `hw_device_ctx`, `transport_state`
3. Send new codec configuration to browser via reliable stream
4. Browser reconfigures `VideoDecoder` with new SPS/PPS and dimensions

### 8.4 Frame Loss Handling

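- Lost delta frame (a fragment never arrives) → the player stops feeding delta frames and waits for the next keyframe, because later deltas reference the missing one and would spread corruption; it can shorten the wait by requesting a keyframe
- Lost keyframe → keyframes travel on reliable streams, so this happens only if the stream is reset or the session drops; the decoder cannot continue, so the browser requests a keyframe from the server via the reliable stream
- Server receives keyframe request → sets `pict_type = AV_PICTURE_TYPE_I` on the next input frame to force an IDR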
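The player-side rule from these bullets can be sketched as a small gate in front of `decode()`. It assumes a single `frame_id` counter that increases by one per encoded frame across keyframes and deltas; a reordering between the reliable keyframe stream and the datagrams is treated as loss in this sketch. `DecodeGate` and `Action` are hypothetical names, and sending the keyframe request itself is left to the caller.

```rust
/// What the player should do with a reassembled frame.
#[derive(Debug, PartialEq)]
enum Action {
    Decode,
    /// Skip the frame; `request_keyframe` is true only for the first frame after a gap.
    Skip { request_keyframe: bool },
}

/// Tracks whether the decoder's reference chain is still intact.
#[derive(Default)]
struct DecodeGate {
    next_expected: Option<u32>, // frame_id the next delta must carry
    waiting_for_key: bool,
}

impl DecodeGate {
    fn on_frame(&mut self, frame_id: u32, is_key: bool) -> Action {
        if is_key {
            // A keyframe (with in-band SPS/PPS) restarts the reference chain.
            self.waiting_for_key = false;
            self.next_expected = Some(frame_id.wrapping_add(1));
            return Action::Decode;
        }
        let in_sequence = self.next_expected == Some(frame_id);
        if in_sequence && !self.waiting_for_key {
            self.next_expected = Some(frame_id.wrapping_add(1));
            Action::Decode
        } else {
            // Gap (or no keyframe yet): skip deltas until the next keyframe.
            let first_gap = !self.waiting_for_key;
            self.waiting_for_key = true;
            Action::Skip { request_keyframe: first_gap }
        }
    }
}
```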

### 8.5 Graceful Shutdown

Shutdown is triggered by SIGINT/SIGTERM via `signal-hook` + `mio` integration:

1. Main loop sets `running = false` flag → stops queuing new captures
2. Wait for in-flight frame to complete (drain `InFlightSurface`)
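3. Drain the encoder: signal end of stream (`avcodec_send_frame` with a NULL frame, `send_eof()` in ffmpeg-next) and call `avcodec_receive_packet` until it returns `AVERROR_EOF` (`avcodec_flush_buffers` would discard the buffered frames instead of emitting them)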
4. Send final frames through channel
5. Drop `frame_tx` sender → signals EOF to transport
6. Transport server drains pending frames, sends GOAWAY to clients
7. `tokio::runtime::shutdown_background()` terminates async tasks
8. Drop Wayland protocol objects (compositor handles cleanup)
9. FFmpeg contexts freed via `Drop` implementations
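Steps 4 to 6 rely on channel close semantics: after the sender is dropped, the receiver still yields every queued frame and only then reports that the channel is closed. A runnable sketch using `std::sync::mpsc` as a stand-in; `async_channel` behaves the same way, since its `Receiver::recv().await` returns `Err(RecvError)` only once the channel is both closed and empty. Frames are reduced to `u64`, and the function names are hypothetical.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Transport side: forward frames until the channel is closed *and* empty.
fn drain_until_closed(rx: Receiver<u64>) -> Vec<u64> {
    let mut delivered = Vec::new();
    // recv() keeps yielding queued frames after the sender is dropped and
    // returns Err only once the buffer is empty.
    while let Ok(frame) = rx.recv() {
        delivered.push(frame); // real code: send to connected clients
    }
    // Channel closed and drained: real code would now close the sessions.
    delivered
}

/// Main-thread side of steps 4-5: queue the flushed frames, then drop the sender.
fn shutdown_sequence(flushed: &[u64]) -> Vec<u64> {
    let (tx, rx) = sync_channel::<u64>(16);
    let consumer = thread::spawn(move || drain_until_closed(rx));
    for &frame in flushed {
        // Non-blocking, as on the mio thread: a full channel drops the frame.
        let _ = tx.try_send(frame);
    }
    drop(tx); // EOF for the consumer
    consumer.join().expect("consumer thread panicked")
}
```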

**Key concern**: Do NOT use blocking `send_blocking()` on the main thread — use `try_send()` so the main loop never stalls during shutdown. If the channel is full, the frame is dropped (acceptable during shutdown).

**NOTE**: wayland-client 0.31 uses `Connection::connect_to_env()` and `GlobalList` instead of the old 0.29 API (`Display::connect_to_env()` / `GlobalManager::new()`). See plan Task 11 for correct API usage.

### 8.6 First Keyframe Delivery

When a new WebTransport session is established, the client needs a keyframe before it can decode any delta frames. Two strategies:

1. **Force IDR on connect**: Set `AV_PICTURE_TYPE_I` on the next encoded frame when a new session is detected
2. **Buffer last keyframe**: Store the most recent keyframe in `TransportServer`, resend to new clients

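Phase 1 uses strategy 1 (force IDR) for simplicity. The transport server raises a shared `needs_keyframe` flag when a new session is established. Because the transport runs on tokio and the encoder on the mio thread, the flag must be thread-safe (e.g. an `Arc<AtomicBool>`); the encode loop checks and clears it atomically before each frame and forces an IDR when it was set.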
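A minimal sketch of the shared flag, assuming an `Arc<AtomicBool>` wrapper; the type and method names are hypothetical. `swap(false)` reads and clears in one atomic step, so several requests that arrive between two frames collapse into a single forced IDR, and a request that lands just after the swap stays set for the following frame. `Relaxed` ordering is enough because the flag publishes no other data.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Keyframe request shared by the transport (tokio) and the encode loop (mio thread).
#[derive(Clone, Default)]
struct KeyframeRequest(Arc<AtomicBool>);

impl KeyframeRequest {
    /// Transport side: a session connected, or a client asked for a keyframe.
    fn request(&self) {
        self.0.store(true, Ordering::Relaxed);
    }

    /// Encoder side, once per frame: true (and cleared) if the next frame
    /// should be forced to an IDR (pict_type = AV_PICTURE_TYPE_I).
    fn take(&self) -> bool {
        self.0.swap(false, Ordering::Relaxed)
    }
}
```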

---

## 9. Dependencies

```toml
[dependencies]
# Wayland screen capture
wayland-client = "0.31"
wayland-protocols = { version = "0.32", features = ["client", "unstable", "staging"] }
wayland-protocols-wlr = { version = "0.3", features = ["client"] }
drm-fourcc = "2"

# GPU encoding
ffmpeg-next = "8"

# WebTransport (HTTP/3 + WebTransport protocol, built on quinn + rustls)
wtransport = { version = "0.7", features = ["self-signed"] }

# Web UI
axum = { version = "0.8", features = ["ws"] }
tower-http = { version = "0.6", features = ["cors"] }
rust-embed = { version = "8", features = ["mime-guess"] }

# Async runtime
tokio = { version = "1", features = ["full"] }

# Sync/async bridge (sync try_send() on mio thread, async recv() on tokio)
async-channel = "2"

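# Event loop (os-poll enables Poll/Registry, os-ext enables SourceFd for the Wayland fd)
mio = { version = "1", features = ["os-poll", "os-ext"] }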

# Utilities
clap = { version = "4", features = ["derive"] }
tracing = "0.1"
tracing-subscriber = "0.3"
anyhow = "1"
bytes = "1"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
signal-hook = { version = "0.3", features = ["iterator"] }
base64 = "0.22"
mime_guess = "2"
```

**Encoder configuration note**: The VAAPI H.264 encoder MUST emit SPS/PPS in-band with every IDR frame, as required for WebCodecs Annex B decoding in the browser. Apply FFmpeg's `dump_extra` bitstream filter with `freq=keyframe`, which prepends the encoder's extradata (SPS/PPS) to keyframe packets; the encoder therefore has to be opened with `AV_CODEC_FLAG_GLOBAL_HEADER` so that extradata is populated. This BSF approach is encoder-agnostic and works with all FFmpeg hardware encoders. **Important**: `repeat_headers=1` is a libx264-only setting and does NOT work with `h264_vaapi`, and the `h264_metadata` filter only rewrites header fields; it has no option to repeat parameter sets.

---

## 10. Implementation Phases

### Phase 1 — MVP: Screen → Browser Streaming

| # | Module | Description | Estimated Effort |
|---|--------|-------------|------------------|
| 1 | `main.rs` | CLI args, startup sequence | Small |
| 2 | `cap_*.rs` | Implement capture backends (wlr-screencopy + ext-image-copy) | Medium |
| 3 | `avhw.rs` | Implement FFmpeg HW device/frame context management | Medium |
| 4 | `filter.rs` | Implement GPU video filter graph | Small |
| 5 | `transform.rs` | Implement coordinate transforms for Wayland outputs | Small |
| 6 | `fps_limit.rs` | Implement VRR-aware frame rate limiter | Small |
| 7 | `state.rs` | State machine adapted for transport | Medium |
| 8 | `transport.rs` | QUIC server + frame distribution | Large (new code) |
| 9 | `nalu.rs` | Annex B framing protocol | Small (new code) |
| 10 | `signaling.rs` | axum server + static files | Small (new code) |
| 11 | `static/*` | Browser Web UI + WebCodecs player | Medium (new code) |

**Deliverable**: Run `wl-webrtc`, open `https://localhost:PORT` in Chrome, see live screen at <50ms latency.

### Phase 2 — Remote Input + Stability

| # | Feature | Description |
|---|---------|-------------|
| 12 | Remote input | Browser mouse/keyboard → wlr-virtual-pointer/virtual-keyboard |
| 13 | Hot-plug recovery | Display disconnect/reconnect |
| 14 | Dynamic format | Resolution/rotation change handling |
| 15 | Multi-client | Multiple simultaneous browser viewers |

### Phase 3 — Optimization + Compatibility

| # | Feature | Description |
|---|---------|-------------|
| 16 | Adaptive bitrate | Network-aware VAAPI bit_rate adjustment |
| 17 | Audio pipeline | Synchronous audio capture + encoding + transport |
| 18 | WebRTC fallback | webrtc-rs path for Firefox compatibility |
| 19 | Performance dashboard | Real-time stats in Web UI |

---

## 11. Open Questions

1. **ffmpeg-next vs direct VAAPI bindings**: ffmpeg-next adds FFI overhead but provides mature encoding pipeline. Direct vaapi-dmabuf bindings would be more Rust-native but much more implementation work. **Decision: ffmpeg-next for Phase 1, evaluate direct bindings in Phase 3.** NOTE: `ffmpeg-next` safe API does NOT wrap hardware contexts (`AVBufferRef`, `AVHWFramesContext`). Use raw `ffmpeg_next::ffi` directly for all HW context operations — see `wl-screenrec/src/avhw.rs` for the reference pattern.

2. **Frame fragmentation strategy**: Current design fragments large frames across QUIC datagrams at byte boundaries (not NAL-aligned). The framing protocol reassembles by `frame_id`, so a lost fragment invalidates the entire frame. Alternative: send all frames via reliable QUIC streams and accept slightly higher latency. **Decision: Start with datagrams for delta frames, measure latency, evaluate.**

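3. **Self-signed certificate UX**: The browser will reject or warn about a self-signed certificate. Options: (a) accept for LAN, (b) guide user to trust CA, (c) pin the certificate with the `serverCertificateHashes` option of the `WebTransport` constructor (SHA-256 hash; browsers accept this only for certificates valid for at most two weeks). **Decision: Accept for Phase 1, add CA trust guide in Phase 2.**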

4. **HEVC vs H.264 default**: H.264 has universal browser support. HEVC has better compression but spotty browser support. **Decision: H.264 default, HEVC as option flag.**

5. **WebCodecs bitstream format**: **Decision: Annex B mode (no `description` at configure time).** Per the W3C AVC WebCodecs Registration, providing `description` forces AVC (length-prefixed) mode for ALL subsequent frames, and our encoder outputs Annex B, so `description` must be omitted. SPS/PPS are guaranteed in-band on every IDR via the encoder-agnostic `dump_extra` BSF (`freq=keyframe`), because the libx264-only `repeat_headers=1` option is unavailable for `h264_vaapi`.