Phase 1 MVP implementation of wl-webrtc: Wayland screen capture tool with hardware-accelerated VAAPI H.264 encoding and WebTransport output.
Includes all 9 runtime bug fixes from code audit (fix-audit-issues plan):
CRITICAL:
- C2: h264_metadata BSF with repeat_sps/repeat_pps in encode pipeline
- C4: FpsLimit wired as timing gate in on_copy_complete
HIGH:
- C3+A2: DRM device discovery via dmabuf feedback MainDevice event, unified resolve_drm_path() helper (CLI > compositor > auto > fallback)
- H2: Separate physical_size (mm) from mode_size (pixels) in wl_output
- H1+A3: Multi-output warning + named-output-not-found error
MEDIUM:
- M5: tv_sec u32->u64 to avoid Y2106 timestamp truncation
- M4: Guard against SHM Buffer event (DMA-BUF only)
Key components:
- src/avhw.rs: FFmpeg VAAPI encoder + filter graph + BSF pipeline
- src/state.rs: Wayland event loop + output negotiation + screencopy
- src/cap_wlr_screencopy.rs: wlr-screencopy capture source
- src/fps_limit.rs: Frame rate limiting with configurable target
- src/transform.rs: Frame format conversion utilities
wl-webrtc Architecture Design
Date: 2026-04-03 Status: Draft Author: Sisyphus (AI-assisted design)
1. Overview
1.1 Problem Statement
Build a low-latency Wayland screen sharing server that captures the desktop via GPU, encodes with hardware acceleration (VAAPI/Vulkan), and streams to a browser for remote viewing and eventual remote control.
1.2 Goals
- Glass-to-glass latency < 50ms on LAN
- GPU-accelerated pipeline: capture + encode entirely on GPU, only encoded bitstream crosses to CPU
- Browser-only client: no native app installation required
- Single binary deployment: embedded web UI, no external dependencies
- Linux Wayland only: no cross-platform abstraction needed
- Annex B streaming: the encoder outputs Annex B (start-code-prefixed) NAL units with SPS/PPS injected into every IDR frame via the h264_metadata bitstream filter (repeat_sps=1 repeat_pps=1), and the browser decodes in Annex B mode via WebCodecs. Note: repeat_headers=1 is a libx264-only option and is NOT available for h264_vaapi.
1.3 Non-Goals (Phase 1)
- Multi-client support (Phase 2)
- Audio streaming (Phase 3)
- Remote input injection (Phase 2)
- Firefox support (Phase 3 — WebRTC fallback)
- Adaptive bitrate (Phase 3)
1.4 Technology Stack
| Component | Technology | Rationale |
|---|---|---|
| Screen capture | wayland-client + DMA-BUF | Zero-copy GPU capture via DMA-BUF |
| GPU encoding | FFmpeg (ffmpeg-next) VAAPI/Vulkan | H.264/HEVC hardware encoding |
| Transport | wtransport (WebTransport over HTTP/3) | Full HTTP/3 + WebTransport protocol, built on quinn + rustls |
| Browser decode | WebCodecs VideoDecoder | Direct decode control, no MSE buffering |
| Web UI | axum + rust-embed | Single binary, compile-time embedded static files |
| Event loop | mio | Proven with Wayland file descriptor callbacks |
| Async runtime | tokio | Required by wtransport, also powers axum |
| Sync/async bridge | async_channel | Sync try_send() and async recv(), bridges mio → tokio naturally |
1.5 Transport Decision: Why Not WebRTC
WebRTC was evaluated and rejected as the primary transport for this use case:
| Factor | WebRTC (webrtc-rs) | WebTransport + WebCodecs |
|---|---|---|
| Glass-to-glass latency | 30-110ms (unavoidable 20-60ms jitter buffer) | 12-38ms (no jitter buffer) |
| Rust ecosystem | webrtc-rs v0.20.0-alpha, mid-rewrite | wtransport production-grade, built on quinn |
| Protocol overhead | ICE/DTLS/SRTP/SDP — designed for P2P NAT traversal | QUIC TLS 1.3 — server-to-client, simpler |
| Decode control | Browser controls jitter buffer, cannot opt out | Application controls every frame decode |
| GPU data path | Sample { data: Bytes }, must copy to CPU | Same copy, but shorter pipeline |
| Browser support | All browsers | Chrome/Edge only (Firefox lacks WebCodecs) |
Transport library choice: We use the wtransport crate (v0.7) instead of raw quinn + h3. The browser's WebTransport API requires a full HTTP/3 server with the WebTransport extension (WebTransport over HTTP/3, layered on HTTP Datagrams, RFC 9297). Raw QUIC is NOT sufficient: there is no browser API for raw QUIC connections. The wtransport crate provides the complete protocol stack (HTTP/3 + WebTransport) built on top of quinn 0.11 and rustls 0.23, with support for datagrams, unidirectional streams, and bidirectional streams.
WebRTC will be added as a Phase 3 fallback for Firefox compatibility.
2. Architecture
2.1 Thread Model
┌─────────────────────────────────────────────────────────────┐
│ wl-webrtc process │
│ │
│ Main Thread (mio event loop) │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Wayland event queue dispatch │ │
│ │ Screen capture (DMA-BUF, zero-copy from compositor) │ │
│ │ GPU encode (FFmpeg VAAPI/Vulkan, sync calls) │ │
│ │ State machine transitions │ │
│ │ FPS limiting │ │
│ └──────────────────────┬───────────────────────────────┘ │
│ │ │
│ async_channel::bounded<16>(EncodedFrame) │
│ │ │
│ Tokio Runtime Thread Pool (2+ threads) │
│ ┌──────────────────────▼───────────────────────────────┐ │
│ │ wtransport WebTransport server │ │
│ │ HTTP/3 + WebTransport session management │ │
│ │ Frame distribution to connected clients │ │
│ │ axum HTTP server (Web UI + control API) │ │
│ │ rust-embed static file serving │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Design rationale:
- Capture + encode on main thread: GPU encoding is synchronous (3-8ms per frame at 30-60fps), doesn't block the mio event loop at these frame rates. This avoids cross-thread synchronization for the GPU pipeline.
- wtransport on tokio: wtransport is built on quinn and tokio. axum requires tokio. Both coexist naturally. Both the WebTransport server and the HTTP static file server share the same tokio runtime.
- async_channel::bounded(16): A channel capacity of 16 frames provides ~267ms of buffer at 60fps (16 / 60 s), enough to absorb transport jitter without excessive latency. The sender uses try_send(): if the channel is full, the frame is dropped and logged. This is standard practice in real-time streaming, where newer frames are always more valuable than older ones. try_send() returns Err(TrySendError::Full(_)) on a full channel, which the main loop handles by discarding the frame. This avoids blocking the main mio event loop, which must remain responsive for Wayland event dispatch. Do NOT use send_blocking() on the mio thread: it would stall the capture pipeline if the transport consumer falls behind.
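The drop-on-full policy can be sketched with the standard library alone. This is a minimal illustration, not the project's code: it uses std::sync::mpsc::sync_channel as a stand-in for async_channel (whose try_send() reports Full and Closed rather than Full and Disconnected), and Frame, Offer, offer_frame, buffer_ms and simulate_stalled_consumer are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical stand-in for the document's `EncodedFrame`.
#[derive(Debug, Clone, PartialEq)]
pub struct Frame {
    pub id: u32,
}

/// Outcome of offering one frame to the transport side.
#[derive(Debug, PartialEq)]
pub enum Offer {
    Queued,
    DroppedFull,
    ConsumerGone,
}

/// Never blocks: a full queue drops the frame instead of stalling the caller.
pub fn offer_frame(tx: &SyncSender<Frame>, frame: Frame) -> Offer {
    match tx.try_send(frame) {
        Ok(()) => Offer::Queued,
        Err(TrySendError::Full(_dropped)) => Offer::DroppedFull,
        Err(TrySendError::Disconnected(_)) => Offer::ConsumerGone,
    }
}

/// Worst-case queueing delay of a full channel, in milliseconds.
pub fn buffer_ms(capacity: u32, fps: u32) -> f64 {
    capacity as f64 * 1000.0 / fps as f64
}

/// Offers `n` frames to a channel of `capacity` while nothing drains it,
/// and returns (queued, dropped).
pub fn simulate_stalled_consumer(capacity: usize, n: u32) -> (u32, u32) {
    // `_rx` stays alive for the whole function, so the channel is full, not closed.
    let (tx, _rx): (SyncSender<Frame>, Receiver<Frame>) = sync_channel(capacity);
    let (mut queued, mut dropped) = (0u32, 0u32);
    for id in 0..n {
        match offer_frame(&tx, Frame { id }) {
            Offer::Queued => queued += 1,
            _ => dropped += 1,
        }
    }
    (queued, dropped)
}
```

With a capacity of 16 and a stalled consumer, the 17th and later frames are dropped while the producer keeps running, which is the behaviour the mio thread needs.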
2.2 Module Dependency Graph
┌──────────┐
│ main.rs │ entry point, CLI, orchestration
└──┬──┬─┬──┘
│ │ │
┌────────────┘ │ └────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌────────────┐
│ state.rs │ │ avhw.rs │ │ transport.rs│
│ StateMachine │ HW ctx │ │ QUIC server │
│ CaptureSource │ │ │ Sessions │
└──┬───┬────┘ └────┬─────┘ └────────────┘
│ │ │
┌─────┘ └──────┐ │
▼ ▼ ▼
┌─────────┐ ┌────────────┐ ┌────────┐
│cap_wlr_ │ │cap_ext_ │ │filter.rs│
│screen │ │image_copy │ │ crop/ │
│copy │ │ │ │ scale/ │
└─────────┘ └────────────┘ │transpose│
└────────┘
┌────────────┐ ┌──────────────┐
│transform.rs│ │signaling.rs │
│ coordinate │ │ axum + embed │
│ transform │ │ Web UI serve │
└────────────┘ └──────────────┘
┌────────────┐
│fps_limit.rs│
└────────────┘
Dependency layers (bottom-up):
- transform.rs, fps_limit.rs: leaf modules, zero internal dependencies
- avhw.rs, filter.rs: FFmpeg wrapper layer
- cap_wlr_screencopy.rs, cap_ext_image_copy.rs: capture backends, depend on state + avhw
- state.rs: state machine + CaptureSource trait
- transport.rs, signaling.rs: network layer
- main.rs: orchestration
2.3 Project File Structure
wl-webrtc/
├── Cargo.toml
├── README.md
├── src/
│ ├── main.rs # ~300 lines — CLI, startup, orchestration
│ ├── state.rs # ~600 lines — State<S>, EncConstructionStage, InFlightSurface
│ ├── avhw.rs # ~450 lines — FFmpeg HW device/frame contexts
│ ├── filter.rs # ~200 lines — FFmpeg video filter graph
│ ├── cap_wlr_screencopy.rs # ~170 lines — wlr-screencopy backend
│ ├── cap_ext_image_copy.rs # ~240 lines — ext-image-copy-capture backend
│ ├── transform.rs # ~220 lines — coordinate transforms
│ ├── fps_limit.rs # ~130 lines — VRR-aware frame rate limiter
│ ├── transport.rs # ~400 lines — QUIC/WebTransport server
│ ├── signaling.rs # ~200 lines — axum HTTP + WebSocket control
│ └── nalu.rs # ~150 lines — Annex B NAL unit splitting, framing protocol
├── static/
│ ├── index.html # Web UI shell
│ ├── player.js # WebCodecs decoder + Canvas renderer
│ └── style.css # Minimal styling
└── protocols/ # Wayland protocol XML files
3. Data Flow
3.1 Zero-Copy Capture Pipeline
GPU Frame Pool ─alloc()→ HW Surface
↓
av_hwframe_map → DMA-BUF fd
↓
zwp_linux_dmabuf → WlBuffer (fd shared)
↓
Compositor writes directly to GPU Surface
↓
FFmpeg VAAPI/Vulkan encode (GPU-internal)
↓
AVPacket.data (Annex B with 00 00 00 01 start codes)
↓ ← GPU→CPU copy via vaMapBuffer (unavoidable)
Bytes::from(Vec<u8>) wrapper
↓
async_channel try_send(EncodedFrame) // sync, non-blocking on main thread
3.2 Transport Pipeline
async_channel::bounded::recv(EncodedFrame)
↓
Frame byte-splitting at MTU boundaries (not NAL-aligned)
↓
┌─ Keyframe → QUIC reliable stream (guaranteed delivery)
└─ Delta frame → QUIC datagram (unreliable, low latency)
↓
wtransport send (quinn underneath)
↓
Browser WebTransport (datagrams.readable / incomingUnidirectionalStreams)
↓
Frame reassembly (if fragmented)
↓
WebCodecs VideoDecoder.decode(EncodedVideoChunk)
↓
Canvas.drawImage(VideoFrame)
3.3 Latency Budget
| Stage | Latency | Notes |
|---|---|---|
| Wayland capture (KMS/dmabuf) | 1-3ms | Zero-copy from compositor |
| GPU encode (VAAPI H.264) | 3-8ms | Synchronous, main thread |
| vaMapBuffer CPU copy | <1ms | Unavoidable GPU→CPU |
| async_channel | <0.1ms | In-process |
| QUIC datagram (LAN) | 1-10ms | Includes LAN network transit |
| WebCodecs decode | 2-5ms | Browser hardware decode |
| Canvas render | 1-2ms | requestAnimationFrame |
| Total (LAN) | 9-29ms | Well under the 50ms target; network transit is counted once, in the QUIC datagram row |
3.4 EncodedFrame Structure
```rust
use bytes::Bytes;
use std::time::Duration;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum FrameType { Keyframe, Delta }

#[derive(Clone)]
struct EncodedFrame {
    data: Bytes,           // Annex B NALUs with start codes
    pts_us: i64,           // Presentation timestamp (microseconds, for WebCodecs)
    duration: Duration,    // Frame duration for timestamp calculation
    frame_type: FrameType, // Keyframe or Delta (matches transport framing)
    width: u32,            // Frame width (may differ from capture on ROI)
    height: u32,           // Frame height
}
```
Timestamp convention: pts_us is in microseconds (not nanoseconds), matching WebCodecs' EncodedVideoChunk.timestamp requirement. The server tracks a monotonic PTS starting from 0, incrementing by 1_000_000 / fps per frame.
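One detail worth noting about the increment: 1_000_000 / fps is not an integer for common rates such as 60 fps, so repeatedly adding a truncated step slowly falls behind wall-clock time. The sketch below compares the two approaches; deriving the PTS from the frame index is an assumption about how the server could track it, not something the design mandates, and both function names are hypothetical.

```rust
/// PTS in microseconds for the `frame_index`-th frame (0-based) at `fps`,
/// derived from the index so truncation error never accumulates.
pub fn pts_us(frame_index: u64, fps: u32) -> i64 {
    (frame_index * 1_000_000 / fps as u64) as i64
}

/// Same PTS computed by repeatedly adding a truncated per-frame step.
/// Shown only for comparison: it drifts when `fps` does not divide 1_000_000.
pub fn pts_us_accumulated(frame_index: u64, fps: u32) -> i64 {
    let step = 1_000_000 / fps as i64; // 16_666 at 60 fps
    frame_index as i64 * step
}
```

At 60 fps the accumulated variant is 40 µs short after one second and 144 ms short after one hour (216,000 frames), while the index-based variant stays exact at whole-second boundaries.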
4. State Machine
4.1 EncConstructionStage
┌──────────────────┐
App start │ ProbingOutputs │ Discover Wayland outputs,
│ └────────┬─────────┘ collect geometry info
▼ │ All outputs probed
┌───────────────┐ ▼
│ ProbingOutputs├──→ ┌──────────────────┐
└───────────────┘ │EverythingButFmt │ HW device ctx created,
└────────┬─────────┘ encoder initialized
│ negotiate_format()
▼
┌───────────┐
┌─────→│ Streaming │──── Active capture + encode + transport
│ └─────┬─────┘
│ │ Output disconnected
│ Format │ ┌──────────────┐
│ changed │ │OutputWentAway│ Keep enc + transport,
│ │ └──────┬───────┘ drop capture objects
└────────────┘ │ Same output reconnects
←───────────────────────┘
Intermediate transient exists at all transition arrows (mem::replace)
Key design choice: Streaming state holds both EncState (encoding pipeline) AND TransportState (active WebTransport sessions). On OutputWentAway, both are preserved — only capture objects are discarded.
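The Intermediate transient and the "preserve encoder and transport, drop capture" rule can be illustrated with a small enum. This is a sketch under assumptions: the variant payloads, the placeholder structs, and the method names on_output_removed and on_output_returned are hypothetical; only the variant names come from the diagram above.

```rust
use std::mem;

// Hypothetical placeholders for the real pipeline state.
#[derive(Debug, PartialEq)]
pub struct EncState;
#[derive(Debug, PartialEq)]
pub struct TransportState;
#[derive(Debug, PartialEq)]
pub struct CaptureObjects;

#[derive(Debug, PartialEq)]
pub enum EncConstructionStage {
    ProbingOutputs,
    EverythingButFmt,
    Streaming { enc: EncState, transport: TransportState, capture: CaptureObjects },
    OutputWentAway { enc: EncState, transport: TransportState },
    /// Transient placeholder left in `self` while fields are moved out.
    Intermediate,
}

impl EncConstructionStage {
    /// Streaming → OutputWentAway: keep encoder + transport, drop capture objects.
    pub fn on_output_removed(&mut self) {
        *self = match mem::replace(self, EncConstructionStage::Intermediate) {
            EncConstructionStage::Streaming { enc, transport, capture } => {
                drop(capture); // Wayland objects are invalid once the output is gone
                EncConstructionStage::OutputWentAway { enc, transport }
            }
            other => other, // any other stage is left unchanged
        };
    }

    /// OutputWentAway → Streaming: reuse the preserved encoder and transport.
    pub fn on_output_returned(&mut self, capture: CaptureObjects) {
        *self = match mem::replace(self, EncConstructionStage::Intermediate) {
            EncConstructionStage::OutputWentAway { enc, transport } => {
                EncConstructionStage::Streaming { enc, transport, capture }
            }
            other => other,
        };
    }
}
```

mem::replace lets each transition take ownership of the old variant's fields without cloning them, which is why a placeholder variant has to exist for the duration of the match.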
4.2 InFlightSurface
None → AllocQueued → Allocd(Frame) → CopyQueued { surface, drm_map, frame, buffer } → None
4-state enum with assert!(matches!(...)) runtime guards. RAII cleanup on each state transition. Single-frame-in-flight constraint prevents buffer exhaustion.
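A reduced sketch of the single-frame-in-flight lifecycle follows. It assumes a simplified CopyQueued that carries only the frame (the surface, drm_map and buffer fields are omitted), and Frame plus the four method names are hypothetical.

```rust
use std::mem;

/// Hypothetical stand-in for a hardware frame handle.
#[derive(Debug, PartialEq)]
pub struct Frame(pub u32);

#[derive(Debug, PartialEq)]
pub enum InFlightSurface {
    None,
    AllocQueued,
    Allocd(Frame),
    CopyQueued { frame: Frame },
}

impl InFlightSurface {
    /// None → AllocQueued. Panics if a frame is already in flight.
    pub fn queue_alloc(&mut self) {
        assert!(matches!(self, InFlightSurface::None), "a frame is already in flight");
        *self = InFlightSurface::AllocQueued;
    }

    /// AllocQueued → Allocd(frame).
    pub fn alloc_done(&mut self, frame: Frame) {
        assert!(matches!(self, InFlightSurface::AllocQueued));
        *self = InFlightSurface::Allocd(frame);
    }

    /// Allocd(frame) → CopyQueued { frame }.
    pub fn queue_copy(&mut self) {
        *self = match mem::replace(self, InFlightSurface::None) {
            InFlightSurface::Allocd(frame) => InFlightSurface::CopyQueued { frame },
            other => panic!("queue_copy called in state {:?}", other),
        };
    }

    /// CopyQueued → None, handing the filled frame to the encoder.
    pub fn copy_done(&mut self) -> Frame {
        match mem::replace(self, InFlightSurface::None) {
            InFlightSurface::CopyQueued { frame } => frame,
            other => panic!("copy_done called in state {:?}", other),
        }
    }
}
```

Because every transition moves the previous variant out of the enum, anything it owned that is not carried forward is dropped at that point, which is the RAII cleanup the text refers to.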
4.3 TransportSessionState (new)
┌───────────┐ connect ┌───────────┐ disconnect ┌───────────┐
│ Listening │ ──────────────→ │ Active │ ──────────────→ │ Closed │
│ (quinn │ │ (sending │ │ (cleanup) │
│ endpoint)│ │ frames) │ │ │
└───────────┘ └───────────┘ └───────────┘
Multiple sessions can be Active simultaneously (Phase 2). Phase 1 supports exactly one.
5. Design Patterns
The architecture employs several established software design patterns for managing complexity:
| # | Pattern | Usage in wl-webrtc |
|---|---|---|
| 1 | Strategy Trait + Generic State | CaptureSource trait with CapWlrScreencopy / CapExtImageCopy backends |
| 2 | Polymorphic Enum State Machine | EncConstructionStage — 5 variants with type-safe transitions |
| 3 | Type-Safe Frame Lifecycle | InFlightSurface — 4-state enum with runtime guards |
| 4 | Pin<Box> Self-Referential | Vulkan device context — for self-referential FFmpeg structs |
| 5 | Independent Thread Pipe | tokio runtime replaces mpsc audio thread; same atomic flag pattern |
| 6 | VRR-Aware Frame Rate Control | FpsLimit<T> — one-frame-buffer delay for correct drop decisions |
| 7 | Generic Dispatch 3-Layer | Wayland protocol dispatch — generic event handling |
| 8 | Three-Stage Safe Construction | Incremental resource acquisition with partial state rollback |
| 9 | Hot-Plug Auto-Recovery | OutputWentAway — preserve encoder/transport, rebuild capture |
| 10 | Zero-Copy GPU Pipeline | DMA-BUF capture + GPU-internal encode, minimal CPU involvement |
6. Transport Protocol Design
6.1 WebTransport Connection Setup
Server generates self-signed TLS certificate (via wtransport built-in rcgen support)
→ wtransport::Endpoint::server(server_config, addr)
→ Browser: new WebTransport("https://server:PORT/wt")
→ wtransport handles full HTTP/3 + WebTransport handshake internally
→ Session established (datagrams + streams available)
Transport library: We use wtransport crate (v0.7) which provides a complete WebTransport-over-HTTP/3 server implementation built on top of quinn 0.11 and rustls 0.23. This handles all protocol details (HTTP/3 SETTINGS, CONNECT method with :protocol = webtransport, session management, datagram framing per RFC 9297). Raw quinn or h3 would require building this protocol stack manually.
6.2 Frame Framing Protocol
QUIC datagrams have a practical MTU of ~1200 bytes. A 1080p H.264 frame is typically 10KB-200KB. Application-level framing:
Datagram format:
┌──────────┬──────────┬──────────┬──────────┬──────────┬─────────────┐
│ type (1) │ frame_id │ pts_us │ seq_num │ total │ payload │
│ │ (4 bytes)│ (8 bytes)│ (2 bytes)│ (2 bytes)│ (variable) │
└──────────┴──────────┴──────────┴──────────┴──────────┴─────────────┘
type:
0x01 = Keyframe fragment (sent via reliable stream, not datagram)
0x02 = Delta frame fragment (sent via datagram)
0x03 = Keyframe complete (small enough for single datagram)
0x04 = Delta frame complete
0x10 = Codec config (SPS/PPS for H.264, VPS/SPS/PPS for HEVC)
pts_us: Presentation timestamp in microseconds (i64, big-endian).
Passed directly to WebCodecs EncodedVideoChunk.timestamp.
For fragmented frames, every fragment carries the same pts_us.
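The header layout above adds up to 17 bytes (1 + 4 + 8 + 2 + 2). A possible encoder/decoder is sketched below. The text only states that pts_us is big-endian; using big-endian for frame_id, seq_num and total as well is an assumption, as are the type and method names.

```rust
use std::convert::TryInto;

/// type (1) + frame_id (4) + pts_us (8) + seq_num (2) + total (2)
pub const HEADER_LEN: usize = 17;

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct FragmentHeader {
    pub kind: u8, // the `type` field: 0x01..0x04 or 0x10
    pub frame_id: u32,
    pub pts_us: i64,
    pub seq_num: u16,
    pub total: u16,
}

impl FragmentHeader {
    /// Serializes the header; all multi-byte fields are written big-endian.
    pub fn encode(&self) -> [u8; HEADER_LEN] {
        let mut out = [0u8; HEADER_LEN];
        out[0] = self.kind;
        out[1..5].copy_from_slice(&self.frame_id.to_be_bytes());
        out[5..13].copy_from_slice(&self.pts_us.to_be_bytes());
        out[13..15].copy_from_slice(&self.seq_num.to_be_bytes());
        out[15..17].copy_from_slice(&self.total.to_be_bytes());
        out
    }

    /// Parses a header and returns it with the remaining payload bytes.
    /// Returns `None` if the buffer is shorter than the header.
    pub fn decode(buf: &[u8]) -> Option<(FragmentHeader, &[u8])> {
        if buf.len() < HEADER_LEN {
            return None;
        }
        let header = FragmentHeader {
            kind: buf[0],
            frame_id: u32::from_be_bytes(buf[1..5].try_into().ok()?),
            pts_us: i64::from_be_bytes(buf[5..13].try_into().ok()?),
            seq_num: u16::from_be_bytes(buf[13..15].try_into().ok()?),
            total: u16::from_be_bytes(buf[15..17].try_into().ok()?),
        };
        Some((header, &buf[HEADER_LEN..]))
    }
}
```

With a ~1200-byte datagram budget, this header leaves roughly 1183 bytes of payload per fragment.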
Key design decisions:
- Keyframes via reliable WebTransport stream: SPS/PPS + IDR data must not be lost. Use session.open_uni().await for reliable delivery.
- Delta frames via datagram: Loss-tolerant. If a delta frame is lost, the decoder waits for the next keyframe. This avoids accumulated corruption.
- Frame reassembly in browser: Buffer fragments by frame_id, reassemble when all total fragments arrive, decode complete frame.
- Timestamp in microseconds: The fragment header carries pts_us: i64 (presentation timestamp in microseconds) so the browser can pass it directly to EncodedVideoChunk.timestamp. This is required by WebCodecs: a sequential frame_id counter is NOT a valid timestamp.
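Byte-boundary fragmentation and reassembly by frame_id can be sketched as follows. The real reassembly would live in player.js; it is written in Rust here only to show the logic next to the sender side. The 1200-byte datagram budget and 17-byte header come from this section, while Fragment, fragment() and Reassembler are hypothetical names, and discarding an incomplete frame as soon as a newer frame_id shows up is an assumed policy.

```rust
use std::collections::BTreeMap;

pub const MAX_DATAGRAM: usize = 1200;
pub const HEADER_LEN: usize = 17;
pub const MAX_PAYLOAD: usize = MAX_DATAGRAM - HEADER_LEN; // 1183 bytes

#[derive(Debug, Clone, PartialEq)]
pub struct Fragment {
    pub frame_id: u32,
    pub seq_num: u16,
    pub total: u16,
    pub payload: Vec<u8>,
}

/// Splits one encoded frame at byte boundaries (not NAL-aligned).
/// Returns `None` for empty input or when more than u16::MAX fragments are needed.
pub fn fragment(frame_id: u32, data: &[u8]) -> Option<Vec<Fragment>> {
    let count = (data.len() + MAX_PAYLOAD - 1) / MAX_PAYLOAD;
    if count == 0 || count > u16::MAX as usize {
        return None;
    }
    let total = count as u16;
    Some(
        data.chunks(MAX_PAYLOAD)
            .enumerate()
            .map(|(i, chunk)| Fragment { frame_id, seq_num: i as u16, total, payload: chunk.to_vec() })
            .collect(),
    )
}

/// Collects fragments of one frame at a time, keyed by `seq_num`.
#[derive(Default)]
pub struct Reassembler {
    frame_id: Option<u32>,
    parts: BTreeMap<u16, Vec<u8>>,
}

impl Reassembler {
    /// Returns the complete frame once all `total` fragments have arrived.
    pub fn push(&mut self, f: Fragment) -> Option<Vec<u8>> {
        if self.frame_id != Some(f.frame_id) {
            // A different frame started: discard any incomplete older frame.
            self.frame_id = Some(f.frame_id);
            self.parts.clear();
        }
        let total = f.total as usize;
        self.parts.insert(f.seq_num, f.payload);
        if self.parts.len() < total {
            return None;
        }
        let parts = std::mem::take(&mut self.parts);
        self.frame_id = None;
        // BTreeMap iterates in seq_num order, so arrival order does not matter.
        Some(parts.into_values().flatten().collect())
    }
}
```

A 200,000-byte frame splits into 170 fragments at this payload size, so losing any one of 170 datagrams discards the whole frame; that is the trade-off noted under Open Questions.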
6.3 Codec Configuration Exchange
The encoder MUST be configured with the h264_metadata bitstream filter (repeat_sps=1 repeat_pps=1) to guarantee SPS/PPS are injected into every IDR frame. Note: repeat_headers=1 is a libx264-only option and does NOT exist for h264_vaapi. The browser configures the decoder in Annex B mode (no description at configure() time), and SPS/PPS arrive in-band with each keyframe.
On session establishment, the server sends a codec configuration message over the reliable QUIC stream to inform the browser of the codec and dimensions:
```json
{
  "type": "codec_config",
  "codec": "avc1.42E01F",
  "width": 1920,
  "height": 1080,
  "framerate": 60
}
```
Browser uses this to configure VideoDecoder — without description, which activates Annex B mode:
```javascript
decoder.configure({
  codec: config.codec,
  codedWidth: config.width,
  codedHeight: config.height,
  // NO description: Annex B mode. SPS/PPS arrive in-band with each keyframe.
});
```
Why no AVCC description? Per the WebCodecs AVC registration spec, providing description forces the decoder into AVC (length-prefixed) mode for ALL frames. Since our encoder outputs Annex B (start-code-prefixed), we must omit description and rely on in-band parameter sets guaranteed by the h264_metadata BSF (repeat_sps=1 repeat_pps=1). Note: repeat_headers=1 is a libx264-only option — it does NOT work with h264_vaapi.
Timestamp handling: The FragmentHeader carries both a frame_id (u32) for reassembly ordering and pts_us (i64) — the presentation timestamp in microseconds. The browser uses pts_us directly as EncodedVideoChunk.timestamp. This is required by WebCodecs — a sequential frame_id counter is NOT a valid timestamp. Every fragment of a frame carries the same pts_us value so the browser can extract it from any fragment during reassembly.
7. Browser-Side Design
7.1 Web UI (static/index.html + player.js)
Single-page application with minimal dependencies:
┌──────────────────────────────────────┐
│ wl-webrtc │
│ ┌──────────────────────────────┐ │
│ │ │ │
│ │ <canvas> (video) │ │
│ │ WebCodecs → drawImage │ │
│ │ │ │
│ └──────────────────────────────┘ │
│ Status: Connected | Latency: 23ms │
│ Resolution: 1920x1080 @ 60fps │
│ [Fullscreen] [Disconnect] │
└──────────────────────────────────────┘
7.2 WebCodecs Decoder Pipeline
CRITICAL: Annex B mode only. Per the W3C AVC WebCodecs Registration, if description is provided at configure() time, ALL subsequent EncodedVideoChunk data must be in AVC format (4-byte length-prefixed). If description is absent, the bitstream is assumed to be in Annex B format (start-code-prefixed). Since our encoder outputs Annex B, we must NOT provide description.
The encoder MUST be configured with the h264_metadata bitstream filter (repeat_sps=1 repeat_pps=1) to guarantee SPS/PPS are injected into every IDR frame. Note: repeat_headers=1 is a libx264-only option and does NOT exist for h264_vaapi. This enables the decoder to initialize from keyframe data alone.
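```javascript
// Simplified player.js flow (loaded as an ES module so top-level await works)
const ctx = document.querySelector("canvas").getContext("2d");
const transport = new WebTransport("https://server:PORT/wt");
await transport.ready; // wait for the session before reading datagrams

const decoder = new VideoDecoder({
  output: (frame) => {
    ctx.drawImage(frame, 0, 0);
    frame.close(); // release the frame promptly to avoid stalling the decoder
  },
  error: (e) => console.error(e),
});

// Configure WITHOUT description → Annex B mode.
// SPS/PPS are delivered in-band with each keyframe (via h264_metadata BSF repeat_sps=1 repeat_pps=1 on encoder).
decoder.configure({
  codec: "avc1.42E01F",
  codedWidth: 1920,
  codedHeight: 1080,
  // NO description field: Annex B mode
});

// Receive delta frames from datagrams. Keyframes arrive on reliable
// unidirectional streams (transport.incomingUnidirectionalStreams) and feed
// the same reassembly path; that loop is omitted here.
// reassembleFrame() is the fragment reassembler described in section 6.2.
const reader = transport.datagrams.readable.getReader();
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  const frame = reassembleFrame(value);
  if (frame.complete) {
    decoder.decode(new EncodedVideoChunk({
      type: frame.isKeyframe ? "key" : "delta",
      timestamp: Number(frame.ptsUs),
      data: frame.data, // Annex B: valid because no description was provided
    }));
  }
}
```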
7.3 No Annex B → AVCC Conversion Needed
Because we configure the decoder in Annex B mode (no description), no format conversion is needed on the browser side. The server sends raw Annex B NAL units with start codes (00 00 00 01), and the decoder accepts them directly.
The encoder MUST be configured with the h264_metadata bitstream filter (repeat_sps=1 repeat_pps=1) to guarantee SPS/PPS are included in every IDR frame. Note: repeat_headers=1 (and -flags2 +repeat_headers) are libx264-only options — they do NOT work with h264_vaapi. The BSF approach is encoder-agnostic and works with all FFmpeg hardware encoders. This ensures the decoder can re-initialize after any keyframe, even if it missed earlier configuration data.
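Whether a keyframe really carries its parameter sets in-band can be checked on the server before sending. The sketch below splits an Annex B buffer on start codes and looks for NAL unit types 7 (SPS), 8 (PPS) and 5 (IDR slice), which are the standard H.264 type values. The function names are hypothetical, and this is an illustrative check, not the contents of nalu.rs.

```rust
/// Splits an Annex B byte stream into NAL units (start codes removed).
/// Handles both 3-byte (00 00 01) and 4-byte (00 00 00 01) start codes.
pub fn split_annex_b(data: &[u8]) -> Vec<&[u8]> {
    // Record (start_code_position, payload_start) for every 00 00 01 pattern.
    let mut starts = Vec::new();
    let mut i = 0;
    while i + 3 <= data.len() {
        if data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1 {
            starts.push((i, i + 3));
            i += 3;
        } else {
            i += 1;
        }
    }
    let mut nals = Vec::new();
    for (idx, &(_, payload_start)) in starts.iter().enumerate() {
        let mut end = starts.get(idx + 1).map_or(data.len(), |&(code_pos, _)| code_pos);
        // A NAL unit never ends in 0x00, so trailing zeros belong to the next
        // 4-byte start code (or to stream padding) and are trimmed.
        while end > payload_start && data[end - 1] == 0 {
            end -= 1;
        }
        if end > payload_start {
            nals.push(&data[payload_start..end]);
        }
    }
    nals
}

/// The H.264 nal_unit_type is the low 5 bits of the first NAL byte.
pub fn nal_type(nal: &[u8]) -> u8 {
    nal[0] & 0x1F
}

/// True if the access unit contains SPS (7), PPS (8) and an IDR slice (5).
pub fn is_self_contained_keyframe(access_unit: &[u8]) -> bool {
    let types: Vec<u8> = split_annex_b(access_unit).iter().map(|n| nal_type(n)).collect();
    types.contains(&7) && types.contains(&8) && types.contains(&5)
}
```

A debug assertion built on is_self_contained_keyframe() for every frame marked as a keyframe would surface a misconfigured bitstream filter early, before the browser decoder reports an error.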
8. Error Handling & Recovery
8.1 Display Hot-Plug
- wl_registry.global_remove → set output_went_away flag
- on_copy_fail() detects flag → transition to OutputWentAway
- Preserve: encoder context, transport sessions
- Discard: Wayland protocol objects (invalidated)
- Wait for same-name output ("DP-1") to reappear
- Create new CaptureSource, reuse old encoder, continue streaming
8.2 Network Disconnection
- QUIC handles keepalive and retransmission internally
- Client page refresh → new WebTransport session → server auto-starts sending current frame stream
- Server is stateless per session — no recovery needed, just reconnect
8.3 Dynamic Format Change
Capture format changes (resolution, rotation):
- Rebuild: frames_rgb, video_filter, enc_video, frames_yuv
- Preserve: hw_device_ctx, transport_state
- Send new codec configuration to browser via reliable stream
- Browser reconfigures VideoDecoder with the new dimensions; the new SPS/PPS arrive in-band with the next keyframe
8.4 Frame Loss Handling
- Lost delta frame → decoder continues, minor artifact until next keyframe
- Lost keyframe → decoder cannot continue → request keyframe from server via reliable stream
- Server receives keyframe request → sets next input frame to AV_PICTURE_TYPE_I
8.5 Graceful Shutdown
Shutdown is triggered by SIGINT/SIGTERM via signal-hook + mio integration:
- Main loop sets running = false flag → stops queuing new captures
- Wait for in-flight frame to complete (drain InFlightSurface)
- Drain encoder: send a NULL frame (avcodec_send_frame(ctx, NULL)) and call avcodec_receive_packet() until it returns AVERROR_EOF. Do not use avcodec_flush_buffers() for this; it discards buffered data instead of emitting it
- Send final frames through channel
- Drop frame_tx sender → signals EOF to transport
- Transport server drains pending frames, sends GOAWAY to clients
- tokio Runtime::shutdown_background() terminates async tasks
- Drop Wayland protocol objects (compositor handles cleanup)
- FFmpeg contexts freed via Drop implementations
Key concern: Do NOT use blocking send_blocking() on the main thread — use try_send() so the main loop never stalls during shutdown. If the channel is full, the frame is dropped (acceptable during shutdown).
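The "drop the sender to signal EOF" step can be demonstrated with the standard library. As before, std::sync::mpsc::sync_channel stands in for async_channel, which likewise lets a receiver drain remaining messages after all senders are dropped; run_and_shutdown is a hypothetical name and a thread replaces the tokio transport task.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};
use std::thread;

/// Produces `frames` frame ids with try_send, then shuts down by dropping the
/// sender. Returns (frames received by the consumer, frames dropped on full).
pub fn run_and_shutdown(frames: u32) -> (u32, u32) {
    let (frame_tx, frame_rx) = sync_channel::<u32>(16);

    // Stand-in for the transport task: drains until the channel reports EOF.
    let consumer = thread::spawn(move || {
        let mut received = 0u32;
        // recv() fails only once the sender is dropped AND the queue is empty,
        // so frames still queued at shutdown are delivered.
        while frame_rx.recv().is_ok() {
            received += 1;
        }
        received
    });

    let mut dropped = 0u32;
    for id in 0..frames {
        // Never block the producer, even during shutdown.
        if let Err(TrySendError::Full(_)) = frame_tx.try_send(id) {
            dropped += 1;
        }
    }

    drop(frame_tx); // EOF: the consumer loop ends after draining the queue
    let received = consumer.join().expect("consumer thread panicked");
    (received, dropped)
}
```

Every frame is either delivered or counted as dropped, and the consumer exits without a separate shutdown message.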
NOTE: wayland-client 0.31 uses Connection::connect_to_env() and GlobalList instead of the old 0.29 API (Display::connect_to_env() / GlobalManager::new()). See plan Task 11 for correct API usage.
8.6 First Keyframe Delivery
When a new WebTransport session is established, the client needs a keyframe before it can decode any delta frames. Two strategies:
- Force IDR on connect: Set AV_PICTURE_TYPE_I on the next encoded frame when a new session is detected
- Buffer last keyframe: Store the most recent keyframe in TransportServer, resend to new clients
Phase 1 uses strategy 1 (force IDR) for simplicity. The transport server sets a needs_keyframe: bool flag on new sessions, which the encode loop checks.
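One way to carry the needs_keyframe flag from the tokio side to the encode loop is a shared atomic. The sketch below is an assumption about the mechanism (the text only specifies a bool flag), and KeyframeRequest and PictureType are hypothetical names; ForceIdr stands for setting AV_PICTURE_TYPE_I on the next input frame.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// What the encode loop should request for the next input frame.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum PictureType {
    /// Let the encoder decide (normal GOP structure).
    Auto,
    /// Corresponds to setting AV_PICTURE_TYPE_I on the next input frame.
    ForceIdr,
}

/// Cloneable handle shared between the transport tasks and the encode loop.
#[derive(Clone, Default)]
pub struct KeyframeRequest(Arc<AtomicBool>);

impl KeyframeRequest {
    /// Transport side: called on a new session or an explicit client request.
    pub fn request(&self) {
        self.0.store(true, Ordering::Release);
    }

    /// Encode loop: called once per frame. Consumes the pending request so
    /// that several requests in a row still yield a single IDR frame.
    pub fn take(&self) -> PictureType {
        if self.0.swap(false, Ordering::AcqRel) {
            PictureType::ForceIdr
        } else {
            PictureType::Auto
        }
    }
}
```

The same handle could serve the keyframe-request path in section 8.4, since both cases reduce to "make the next frame an IDR".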
9. Dependencies
```toml
[dependencies]
# Wayland screen capture
wayland-client = "0.31"
wayland-protocols = { version = "0.32", features = ["client", "unstable", "staging"] }
wayland-protocols-wlr = { version = "0.3", features = ["client"] }
drm-fourcc = "2"

# GPU encoding
ffmpeg-next = "8"

# WebTransport (HTTP/3 + WebTransport protocol, built on quinn + rustls)
wtransport = { version = "0.7", features = ["self-signed"] }

# Web UI
axum = { version = "0.8", features = ["ws"] }
tower-http = { version = "0.6", features = ["cors"] }
rust-embed = { version = "8", features = ["mime-guess"] }

# Async runtime
tokio = { version = "1", features = ["full"] }

# Sync/async bridge (sync try_send() on mio thread, async recv() on tokio)
async-channel = "2"

# Event loop
mio = "1"

# Utilities
clap = { version = "4", features = ["derive"] }
tracing = "0.1"
tracing-subscriber = "0.3"
anyhow = "1"
bytes = "1"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
signal-hook = { version = "0.3", features = ["iterator"] }
base64 = "0.22"
mime_guess = "2"
```
Encoder configuration note: The VAAPI H.264 encoder MUST be configured with the h264_metadata bitstream filter (repeat_sps=1 repeat_pps=1) to guarantee SPS/PPS parameter sets are emitted in-band with every IDR frame. This is required for WebCodecs Annex B decode mode on the browser side. Important: repeat_headers=1 and -flags2 +repeat_headers are libx264-only options — they do NOT work with h264_vaapi. The BSF approach is encoder-agnostic and works with all FFmpeg hardware encoders.
10. Implementation Phases
Phase 1 — MVP: Screen → Browser Streaming
| # | Module | Description | Estimated Effort |
|---|---|---|---|
| 1 | main.rs | CLI args, startup sequence | Small |
| 2 | cap_*.rs | Implement capture backends (wlr-screencopy + ext-image-copy) | Medium |
| 3 | avhw.rs | Implement FFmpeg HW device/frame context management | Medium |
| 4 | filter.rs | Implement GPU video filter graph | Small |
| 5 | transform.rs | Implement coordinate transforms for Wayland outputs | Small |
| 6 | fps_limit.rs | Implement VRR-aware frame rate limiter | Small |
| 7 | state.rs | State machine adapted for transport | Medium |
| 8 | transport.rs | QUIC server + frame distribution | Large (new code) |
| 9 | nalu.rs | Annex B framing protocol | Small (new code) |
| 10 | signaling.rs | axum server + static files | Small (new code) |
| 11 | static/* | Browser Web UI + WebCodecs player | Medium (new code) |
Deliverable: Run wl-webrtc, open https://localhost:PORT in Chrome, see live screen at <50ms latency.
Phase 2 — Remote Input + Stability
| # | Feature | Description |
|---|---|---|
| 12 | Remote input | Browser mouse/keyboard → wlr-virtual-pointer/virtual-keyboard |
| 13 | Hot-plug recovery | Display disconnect/reconnect |
| 14 | Dynamic format | Resolution/rotation change handling |
| 15 | Multi-client | Multiple simultaneous browser viewers |
Phase 3 — Optimization + Compatibility
| # | Feature | Description |
|---|---|---|
| 16 | Adaptive bitrate | Network-aware VAAPI bit_rate adjustment |
| 17 | Audio pipeline | Synchronous audio capture + encoding + transport |
| 18 | WebRTC fallback | webrtc-rs path for Firefox compatibility |
| 19 | Performance dashboard | Real-time stats in Web UI |
11. Open Questions
- ffmpeg-next vs direct VAAPI bindings: ffmpeg-next adds FFI overhead but provides a mature encoding pipeline. Direct vaapi-dmabuf bindings would be more Rust-native but much more implementation work. Decision: ffmpeg-next for Phase 1, evaluate direct bindings in Phase 3. NOTE: the ffmpeg-next safe API does NOT wrap hardware contexts (AVBufferRef, AVHWFramesContext). Use raw ffmpeg_next::ffi directly for all HW context operations; see wl-screenrec/src/avhw.rs for the reference pattern.
- Frame fragmentation strategy: Current design fragments large frames across QUIC datagrams at byte boundaries (not NAL-aligned). The framing protocol reassembles by frame_id, so a lost fragment invalidates the entire frame. Alternative: send all frames via reliable QUIC streams and accept slightly higher latency. Decision: Start with datagrams for delta frames, measure latency, evaluate.
- Self-signed certificate UX: Browser will show SSL warning. Options: (a) accept for LAN, (b) guide user to trust CA, (c) use HTTP/2 prior knowledge. Decision: Accept for Phase 1, add CA trust guide in Phase 2.
- HEVC vs H.264 default: H.264 has universal browser support. HEVC has better compression but spotty browser support. Decision: H.264 default, HEVC as option flag.
- WebCodecs bitstream format: Decision: Annex B mode (no description at configure time). SPS/PPS are guaranteed in-band via the h264_metadata BSF (repeat_sps=1 repeat_pps=1). Important: The repeat_headers=1 encoder option is libx264-only and does NOT work with h264_vaapi. The BSF approach is encoder-agnostic and works with all FFmpeg hardware encoders. Per the W3C AVC WebCodecs Registration, providing description forces AVC (length-prefixed) mode for ALL subsequent frames. Since our encoder outputs Annex B, we must omit description.