Phase 1 MVP implementation of wl-webrtc: Wayland screen capture tool with hardware-accelerated VAAPI H.264 encoding and WebTransport output.
Includes all 9 runtime bug fixes from code audit (fix-audit-issues plan):
CRITICAL:
- C2: h264_metadata BSF with repeat_sps/repeat_pps in encode pipeline
- C4: FpsLimit wired as timing gate in on_copy_complete
HIGH:
- C3+A2: DRM device discovery via dmabuf feedback MainDevice event, unified resolve_drm_path() helper (CLI > compositor > auto > fallback)
- H2: Separate physical_size (mm) from mode_size (pixels) in wl_output
- H1+A3: Multi-output warning + named-output-not-found error
MEDIUM:
- M5: tv_sec u32->u64 to avoid Y2106 timestamp truncation
- M4: Guard against SHM Buffer event (DMA-BUF only)
Key components:
- src/avhw.rs: FFmpeg VAAPI encoder + filter graph + BSF pipeline
- src/state.rs: Wayland event loop + output negotiation + screencopy
- src/cap_wlr_screencopy.rs: wlr-screencopy capture source
- src/fps_limit.rs: Frame rate limiting with configurable target
- src/transform.rs: Frame format conversion utilities
wl-webrtc Phase 1 Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Build a working Wayland screen → browser streaming pipeline with <50ms LAN latency.
Architecture: mio event loop on main thread handles Wayland capture + GPU encode. tokio runtime handles WebTransport + Web UI. async_channel bridges the two (send_blocking() or try_send() on the mio thread, since send() itself is async; async recv() on tokio). Browser uses WebCodecs for direct decode.
Tech Stack: Rust, wayland-client, ffmpeg-next (VAAPI/Vulkan), wtransport (WebTransport over HTTP/3), axum + rust-embed (Web UI), WebCodecs (browser)
Reference: Design spec at docs/superpowers/specs/2026-04-03-wl-webrtc-architecture-design.md. Source analysis at analysis.md.
File Structure
wl-webrtc/
├── Cargo.toml
├── build.rs # FFmpeg library detection
├── src/
│ ├── main.rs # CLI + startup + mio event loop
│ ├── args.rs # CLI argument definitions (clap)
│ ├── cap_ext_image_copy.rs # ext-image-copy-capture backend
│ ├── cap_wlr_screencopy.rs # wlr-screencopy backend
│ ├── avhw.rs # FFmpeg HW device/frame contexts
│ ├── filter.rs # FFmpeg video filter graph
│ ├── transform.rs # Coordinate transforms
│ ├── fps_limit.rs # VRR-aware frame rate limiter
│ ├── state.rs # State machine + CaptureSource trait
│ ├── transport.rs # QUIC/WebTransport server (new)
│ ├── signaling.rs # axum HTTP + static files (new)
│ └── nalu.rs # Annex B NAL unit framing (new)
├── static/
│ ├── index.html # Web UI shell
│ ├── player.js # WebCodecs decoder + Canvas renderer
│ └── style.css # Minimal styling
└── tests/
├── integration_test.rs # End-to-end test harness
└── nalu_test.rs # NAL unit framing tests
Task 1: Project Scaffold
Files:
- Create: Cargo.toml
- Create: build.rs
- Create: src/main.rs
- Create: src/args.rs
- Step 1: Create Cargo.toml with all dependencies
[package]
name = "wl-webrtc"
version = "0.1.0"
edition = "2021"
description = "Low-latency Wayland screen sharing via WebTransport"
[dependencies]
# Wayland screen capture
wayland-client = "0.31"
wayland-protocols = { version = "0.32", features = ["client", "unstable", "staging"] }
wayland-protocols-wlr = { version = "0.3", features = ["client"] }
drm-fourcc = "2"
# GPU encoding — WARNING: safe API does NOT wrap HW contexts.
# Use ffmpeg_next::ffi directly for AVBufferRef, AVHWFramesContext.
# See wl-screenrec/src/avhw.rs for reference pattern.
ffmpeg-next = "8"
# WebTransport over HTTP/3 (built on quinn + rustls, self-signed cert support)
wtransport = { version = "0.7", features = ["self-signed"] }
# Web UI
axum = { version = "0.8", features = ["ws"] }
tower-http = { version = "0.6", features = ["cors"] }
rust-embed = { version = "8", features = ["mime-guess"] }
# Async runtime
tokio = { version = "1", features = ["full"] }
# Sync/async bridge (send_blocking()/try_send() on mio thread, async recv() on tokio)
async-channel = "2"
# Event loop
mio = "1"
# Utilities
clap = { version = "4", features = ["derive"] }
tracing = "0.1"
tracing-subscriber = "0.3"
anyhow = "1"
bytes = "1"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
signal-hook = { version = "0.3", features = ["iterator"] }
base64 = "0.22"
mime_guess = "2"
- Step 2: Create build.rs
```rust
fn main() {
    // No custom build steps needed.
    // ffmpeg-next's own build.rs handles library detection via pkg-config.
}
```
- Step 3: Create src/args.rs with CLI argument definitions
use clap::Parser;
/// Low-latency Wayland screen sharing via WebTransport
#[derive(Parser, Debug)]
#[command(name = "wl-webrtc", version, about)]
pub struct Args {
/// Display output to capture (e.g. "DP-1"). Defaults to first output.
#[arg(short, long)]
pub output: Option<String>,
/// Region of interest: x,y,w,h (e.g. "100,100,800,600")
#[arg(short, long)]
pub roi: Option<String>,
/// Target framerate
#[arg(short, long, default_value = "60")]
pub fps: u32,
/// Port for WebTransport + Web UI
#[arg(short, long, default_value = "8443")]
pub port: u16,
/// Bind address
#[arg(long, default_value = "0.0.0.0")]
pub bind: String,
/// Video codec: h264 or hevc
#[arg(long, default_value = "h264")]
pub codec: String,
/// Hardware encoding backend: vaapi or vulkan
#[arg(long, default_value = "vaapi")]
pub hw_accel: String,
/// DRM device path (e.g. /dev/dri/renderD128)
#[arg(long)]
pub drm_device: Option<String>,
/// Target bitrate in bits per second
#[arg(long, default_value = "8000000")]
pub bitrate: u64,
/// GOP size (keyframe interval in frames)
#[arg(long, default_value = "30")]
pub gop_size: u32,
/// Enable verbose logging
#[arg(short, long)]
pub verbose: bool,
}
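The --roi flag is stored as a raw Option<String>, so something downstream has to turn "x,y,w,h" into numbers. A minimal sketch of such a parser follows. The Roi struct and the parse_roi name are hypothetical (neither appears in the plan), and the requirement that width and height be positive is an assumption.

```rust
/// Hypothetical parsed form of the `--roi` argument (not one of the plan's types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Roi {
    pub x: i32,
    pub y: i32,
    pub w: i32,
    pub h: i32,
}

/// Parse "x,y,w,h" into a `Roi`.
/// Assumption: width and height must be strictly positive.
pub fn parse_roi(s: &str) -> Result<Roi, String> {
    let parts: Vec<&str> = s.split(',').map(str::trim).collect();
    if parts.len() != 4 {
        return Err(format!(
            "expected 4 comma-separated values, got {}",
            parts.len()
        ));
    }
    let mut nums = [0i32; 4];
    for (slot, part) in nums.iter_mut().zip(&parts) {
        *slot = part
            .parse::<i32>()
            .map_err(|e| format!("invalid number {:?}: {}", part, e))?;
    }
    let [x, y, w, h] = nums;
    if w <= 0 || h <= 0 {
        return Err(format!("width and height must be positive, got {}x{}", w, h));
    }
    Ok(Roi { x, y, w, h })
}
```

Parsing once at startup lets a malformed region fail before any Wayland or encoder setup happens.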
- Step 4: Create minimal src/main.rs that parses args and exits
This is a minimal stub. The full implementation is in Task 11. Do NOT add Wayland, transport, or nalu references here — those modules don't exist yet.
mod args;
use args::Args;
use clap::Parser;
fn main() -> anyhow::Result<()> {
let args = Args::parse();
if args.verbose {
tracing_subscriber::fmt().with_max_level(tracing::Level::DEBUG).init();
} else {
tracing_subscriber::fmt().with_max_level(tracing::Level::INFO).init();
}
tracing::info!("wl-webrtc starting");
tracing::info!("codec={}, hw={}, fps={}, bitrate={}",
args.codec, args.hw_accel, args.fps, args.bitrate);
// Full implementation is in Task 11 (main event loop wiring).
// This stub only verifies args parsing and logging work.
tracing::warn!("Stub mode — event loop not yet implemented. See Task 11.");
Ok(())
}
- Step 5: Verify project compiles
Run: cargo build
Expected: Compiles successfully (may take time for first dependency download)
- Step 6: Commit
git init && git add -A && git commit -m "feat: project scaffold with Cargo.toml, CLI args, main stub"
Task 2: Leaf Modules — transform.rs
Files:
- Create: src/transform.rs
Port coordinate transformation module. This module handles Wayland output transforms (rotation/flip) and ROI clipping.
Reference: Wayland wl_output transform enum defines 8 orientations. The transform logic is standard 2D affine math (rotation + reflection matrices). See wl_output::Transform in the Wayland protocol spec.
- Step 1: Create src/transform.rs
Implement the following coordinate transformation utilities:
- Transform enum: the 8 Wayland output transform variants (Normal, Normal90, Normal180, Normal270, Flipped, Flipped90, Flipped180, Flipped270)
- transform_basis(): maps a Transform to a 2x2 matrix (a, b, c, d)
- screen_to_frame(): transforms a rectangle from screen space to frame space using the transform matrix
- transpose_if_transform_transposed(): swaps width and height for the 90 and 270 degree rotations
- fit_inside_bounds(): clips the ROI rectangle to stay within the output bounds
The module is self-contained with no internal dependencies. Only uses standard library types and drm_fourcc::DrmFourcc if needed for format handling.
Key types:
#[derive(Debug, Clone, Copy)]
pub enum Transform {
Normal,
Normal90,
Normal180,
Normal270,
Flipped,
Flipped90,
Flipped180,
Flipped270,
}
pub struct Rect {
pub x: i32,
pub y: i32,
pub w: i32,
pub h: i32,
}
pub fn screen_to_frame(transform: Transform, rect: Rect, frame_w: i32, frame_h: i32) -> Rect;
pub fn transpose_if_transform_transposed(transform: Transform, w: i32, h: i32) -> (i32, i32);
pub fn fit_inside_bounds(rect: Rect, bounds_w: i32, bounds_h: i32) -> Rect;
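The two simplest helpers in the list can be sketched directly from their descriptions. This is an illustration rather than the ported implementation: the types are repeated so the block compiles on its own, and the behaviour for a rectangle that lies entirely outside the bounds (collapsing to zero size) is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Transform {
    Normal,
    Normal90,
    Normal180,
    Normal270,
    Flipped,
    Flipped90,
    Flipped180,
    Flipped270,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Rect {
    pub x: i32,
    pub y: i32,
    pub w: i32,
    pub h: i32,
}

/// Swap width and height for the four transforms that include a quarter turn.
pub fn transpose_if_transform_transposed(transform: Transform, w: i32, h: i32) -> (i32, i32) {
    match transform {
        Transform::Normal90
        | Transform::Normal270
        | Transform::Flipped90
        | Transform::Flipped270 => (h, w),
        _ => (w, h),
    }
}

/// Clip `rect` to the area [0, bounds_w) x [0, bounds_h).
/// Assumption: a rectangle fully outside the bounds collapses to zero size.
pub fn fit_inside_bounds(rect: Rect, bounds_w: i32, bounds_h: i32) -> Rect {
    // Guard against negative bounds so `clamp` never sees min > max.
    let bw = bounds_w.max(0);
    let bh = bounds_h.max(0);
    let x0 = rect.x.clamp(0, bw);
    let y0 = rect.y.clamp(0, bh);
    let x1 = rect.x.saturating_add(rect.w).clamp(0, bw);
    let y1 = rect.y.saturating_add(rect.h).clamp(0, bh);
    Rect {
        x: x0,
        y: y0,
        w: (x1 - x0).max(0),
        h: (y1 - y0).max(0),
    }
}
```

Clipping both corners and then recomputing the size keeps the result correct when the ROI overhangs the left or top edge, where clamping only x and y would leave the width and height too large.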
- Step 2: Add mod transform; to src/main.rs
- Step 3: Verify compilation
Run: cargo build
- Step 4: Commit
git add -A && git commit -m "feat: coordinate transform module for Wayland output transforms"
Task 3: Leaf Modules — fps_limit.rs
Files:
- Create: src/fps_limit.rs
Implement a VRR-aware frame rate limiter. Uses a one-frame-buffer delay strategy to make correct frame-drop decisions in the presence of variable refresh rate displays.
Reference: The one-frame-buffer approach is a standard technique for VRR displays — buffer the current frame, output the previous frame only if enough time has elapsed, otherwise discard the old frame.
- Step 1: Create src/fps_limit.rs
Implement the FpsLimit<T> struct:
pub struct FpsLimit<T> {
on_deck: Option<(T, std::time::Instant)>,
min_interval: std::time::Duration,
}
impl<T> FpsLimit<T> {
pub fn new(fps: u32) -> Self;
/// Feed a new frame. Returns:
/// - Some(output_frame) if a buffered frame should be displayed
/// - None if no frame is ready to output yet
/// The returned frame is the PREVIOUS frame (on_deck), not the current one.
/// The current frame is stored in on_deck for next call.
pub fn on_new_frame(&mut self, frame: T, timestamp: std::time::Instant) -> Option<T>;
/// Call at end of stream to flush the last buffered frame
pub fn flush(&mut self) -> Option<T>;
}
Logic:
- First frame: return None, store in on_deck
- Second frame onwards: check if interval between on_deck timestamp and new frame >= min_interval
  - If yes: output on_deck frame, store new frame in on_deck
  - If no: discard on_deck (too close), store new frame in on_deck
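The rules above translate almost line for line into code. The following is a sketch of one way to fill in the method bodies, not the reference implementation; treating fps = 0 as 1 is an assumption made only to avoid a division by zero.

```rust
use std::time::{Duration, Instant};

pub struct FpsLimit<T> {
    on_deck: Option<(T, Instant)>,
    min_interval: Duration,
}

impl<T> FpsLimit<T> {
    pub fn new(fps: u32) -> Self {
        // Assumption: clamp fps to at least 1 so the division is defined.
        let fps = fps.max(1);
        Self {
            on_deck: None,
            min_interval: Duration::from_secs(1) / fps,
        }
    }

    /// Store the new frame on deck and decide what happens to the old one.
    pub fn on_new_frame(&mut self, frame: T, timestamp: Instant) -> Option<T> {
        // The new frame always takes the on-deck slot.
        let previous = self.on_deck.replace((frame, timestamp));
        match previous {
            // Far enough apart: emit the previously buffered frame.
            Some((old, old_ts))
                if timestamp.saturating_duration_since(old_ts) >= self.min_interval =>
            {
                Some(old)
            }
            // First frame, or too close: nothing to emit (old frame is dropped).
            _ => None,
        }
    }

    /// Hand back the last buffered frame at end of stream.
    pub fn flush(&mut self) -> Option<T> {
        self.on_deck.take().map(|(frame, _)| frame)
    }
}
```

Because the comparison is against the on-deck frame's own timestamp, a source that always delivers frames closer together than min_interval would never emit anything under this literal reading. Comparing against the timestamp of the last emitted frame is one possible variation, but it is not part of the plan.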
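- Step 2: Add mod fps_limit; to src/main.rs
- Step 3: Add unit tests at the bottom of src/fps_limit.rs (the test module uses use super::*, which only resolves when it lives inside the module under test, not in a separate file under tests/)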
#[cfg(test)]
mod tests {
use super::*;
use std::time::{Duration, Instant};
#[test]
fn first_frame_is_buffered() {
let mut limiter: FpsLimit<u32> = FpsLimit::new(30);
let now = Instant::now();
let result = limiter.on_new_frame(1u32, now);
assert!(result.is_none());
}
#[test]
fn frames_too_close_drops_old() {
let mut limiter: FpsLimit<u32> = FpsLimit::new(30);
let now = Instant::now();
limiter.on_new_frame(1, now);
// Send second frame almost immediately
let result = limiter.on_new_frame(2, now + Duration::from_millis(1));
// Too close → old frame dropped, new frame buffered
assert!(result.is_none());
}
#[test]
fn frames_far_enough_output_old() {
let mut limiter: FpsLimit<u32> = FpsLimit::new(30);
let now = Instant::now();
limiter.on_new_frame(1, now);
// Wait long enough (33ms for 30fps)
let result = limiter.on_new_frame(2, now + Duration::from_millis(40));
assert_eq!(result, Some(1));
}
#[test]
fn flush_returns_last_buffered() {
let mut limiter: FpsLimit<u32> = FpsLimit::new(30);
let now = Instant::now();
limiter.on_new_frame(1, now);
assert_eq!(limiter.flush(), Some(1));
assert_eq!(limiter.flush(), None);
}
}
- Step 4: Run tests
Run: cargo test fps_limit
Expected: All 4 tests pass
- Step 5: Commit
git add -A && git commit -m "feat: VRR-aware FPS limiter with unit tests"
Task 4: NAL Unit Framing — nalu.rs
Files:
- Create: src/nalu.rs
- Create: tests/nalu_test.rs
New code (not ported). Handles Annex B NAL unit splitting and the framing protocol for QUIC transport.
- Step 1: Create src/nalu.rs with core types
use bytes::Bytes;
/// A single NAL unit extracted from an Annex B bitstream
pub struct NalUnit {
pub nal_type: u8,
pub data: Bytes, // NAL unit data WITHOUT start codes
}
/// Identifies frame type for transport framing
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FrameType {
Keyframe,
Delta,
}
/// A complete encoded frame ready for transport
#[derive(Clone)]
pub struct EncodedFrame {
pub data: Bytes,
pub pts_us: i64,
pub duration: std::time::Duration,
pub frame_type: FrameType,
pub width: u32,
pub height: u32,
}
/// Fragment header for QUIC datagram framing
#[derive(Debug, Clone)]
pub struct FragmentHeader {
pub frame_type: u8, // 0x01-0x04, 0x10
pub frame_id: u32,
pub pts_us: i64, // Presentation timestamp in microseconds (for WebCodecs)
pub seq_num: u16,
pub total_frags: u16,
}
impl FragmentHeader {
pub const SIZE: usize = 17; // 1 + 4 + 8 + 2 + 2
pub const TYPE_KEYFRAME_FRAG: u8 = 0x01;
pub const TYPE_DELTA_FRAG: u8 = 0x02;
pub const TYPE_KEYFRAME_COMPLETE: u8 = 0x03;
pub const TYPE_DELTA_COMPLETE: u8 = 0x04;
pub const TYPE_CODEC_CONFIG: u8 = 0x10;
pub fn encode(&self) -> [u8; Self::SIZE] {
let mut buf = [0u8; Self::SIZE];
buf[0] = self.frame_type;
buf[1..5].copy_from_slice(&self.frame_id.to_be_bytes());
buf[5..13].copy_from_slice(&self.pts_us.to_be_bytes());
buf[13..15].copy_from_slice(&self.seq_num.to_be_bytes());
buf[15..17].copy_from_slice(&self.total_frags.to_be_bytes());
buf
}
pub fn decode(data: &[u8]) -> anyhow::Result<Self> {
if data.len() < Self::SIZE {
return Err(anyhow::anyhow!("fragment header too short"));
}
Ok(Self {
frame_type: data[0],
frame_id: u32::from_be_bytes([data[1], data[2], data[3], data[4]]),
pts_us: i64::from_be_bytes([data[5], data[6], data[7], data[8], data[9], data[10], data[11], data[12]]),
seq_num: u16::from_be_bytes([data[13], data[14]]),
total_frags: u16::from_be_bytes([data[15], data[16]]),
})
}
}
/// Splits an encoded frame into fragments suitable for QUIC datagrams
pub struct FrameFragmenter {
max_payload_size: usize, // MTU - header size
frame_counter: u32,
}
impl FrameFragmenter {
pub fn new(mtu: usize) -> Self {
Self {
max_payload_size: mtu.saturating_sub(FragmentHeader::SIZE),
frame_counter: 0,
}
}
/// Get the current frame counter value (next frame_id that will be assigned)
pub fn current_frame_id(&self) -> u32 {
self.frame_counter
}
/// Fragment an encoded frame into QUIC-datagram-sized chunks.
/// Returns Vec of (FragmentHeader, payload) pairs.
/// Keyframes should be sent via reliable stream instead —
/// this method is primarily for delta frames.
pub fn fragment(&mut self, frame: &EncodedFrame) -> Vec<(FragmentHeader, Bytes)> {
let frame_id = self.frame_counter;
self.frame_counter += 1;
let frag_type = match frame.frame_type {
FrameType::Keyframe => FragmentHeader::TYPE_KEYFRAME_FRAG,
FrameType::Delta => FragmentHeader::TYPE_DELTA_FRAG,
};
if frame.data.len() <= self.max_payload_size {
// Fits in a single datagram
let complete_type = match frame.frame_type {
FrameType::Keyframe => FragmentHeader::TYPE_KEYFRAME_COMPLETE,
FrameType::Delta => FragmentHeader::TYPE_DELTA_COMPLETE,
};
return vec![(
FragmentHeader {
frame_type: complete_type,
frame_id,
pts_us: frame.pts_us,
seq_num: 0,
total_frags: 1,
},
frame.data.clone(),
)];
}
// Need fragmentation
let total_frags = (frame.data.len() + self.max_payload_size - 1) / self.max_payload_size;
let total_frags = total_frags as u16;
frame.data
.chunks(self.max_payload_size)
.enumerate()
.map(|(i, chunk)| {
(
FragmentHeader {
frame_type: frag_type,
frame_id,
pts_us: frame.pts_us,
seq_num: i as u16,
total_frags,
},
Bytes::copy_from_slice(chunk),
)
})
.collect()
}
}
/// H.264 NAL unit type constants
pub mod h264 {
pub const IDR: u8 = 5;
pub const SPS: u8 = 7;
pub const PPS: u8 = 8;
}
/// HEVC NAL unit type constants
pub mod hevc {
pub const VPS: u8 = 32;
pub const SPS: u8 = 33;
pub const PPS: u8 = 34;
pub const IDR_W_RADL: u8 = 19;
pub const IDR_N_LP: u8 = 20;
}
/// Find Annex B start code positions in a byte buffer.
/// Returns positions and lengths of start codes (3 or 4 bytes).
pub fn find_annex_b_start_codes(data: &[u8]) -> Vec<(usize, usize)> {
let mut positions = Vec::new();
let mut i = 0;
while i + 3 <= data.len() {
if data[i..i+3] == [0, 0, 1] {
if i > 0 && data[i-1] == 0 {
positions.push((i - 1, 4)); // 4-byte start code
} else {
positions.push((i, 3)); // 3-byte start code
}
i += 3;
} else {
i += 1;
}
}
positions
}
/// Determine if an Annex B encoded frame is a keyframe by checking NAL types.
/// For H.264: checks for IDR (type 5) NAL units.
/// For HEVC: checks for IDR_W_RADL (19) or IDR_N_LP (20) NAL types.
pub fn is_keyframe(data: &[u8], is_hevc: bool) -> bool {
let start_codes = find_annex_b_start_codes(data);
if start_codes.is_empty() {
return false;
}
for i in 0..start_codes.len() {
let nal_start = start_codes[i].0 + start_codes[i].1;
if nal_start >= data.len() {
continue;
}
let nal_type = if is_hevc {
(data[nal_start] >> 1) & 0x3f
} else {
data[nal_start] & 0x1f
};
if is_hevc {
if nal_type == hevc::IDR_W_RADL || nal_type == hevc::IDR_N_LP {
return true;
}
} else {
if nal_type == h264::IDR {
return true;
}
}
}
false
}
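The NalUnit type is declared in the listing, but nothing there produces one. As an illustration of how the start-code positions map to NAL payloads, here is a sketch of a splitter for H.264. The name split_h264_nal_units is hypothetical, it returns borrowed slices rather than NalUnit values to stay dependency-free, and the scanner is repeated so the block compiles on its own.

```rust
/// Same scan as `find_annex_b_start_codes` in the plan, repeated for self-containment.
fn find_annex_b_start_codes(data: &[u8]) -> Vec<(usize, usize)> {
    let mut positions = Vec::new();
    let mut i = 0;
    while i + 3 <= data.len() {
        if data[i..i + 3] == [0, 0, 1] {
            if i > 0 && data[i - 1] == 0 {
                positions.push((i - 1, 4)); // 4-byte start code
            } else {
                positions.push((i, 3)); // 3-byte start code
            }
            i += 3;
        } else {
            i += 1;
        }
    }
    positions
}

/// Hypothetical helper: split an Annex B buffer into (nal_type, payload) pairs.
/// Payloads exclude the start code and borrow from `data`.
pub fn split_h264_nal_units(data: &[u8]) -> Vec<(u8, &[u8])> {
    let codes = find_annex_b_start_codes(data);
    let mut units = Vec::with_capacity(codes.len());
    for (i, &(pos, len)) in codes.iter().enumerate() {
        let start = pos + len;
        // A payload runs until the next start code, or to the end of the buffer.
        let end = codes.get(i + 1).map_or(data.len(), |&(next, _)| next);
        if start < end {
            // H.264 keeps nal_unit_type in the low 5 bits of the first byte.
            units.push((data[start] & 0x1f, &data[start..end]));
        }
    }
    units
}
```

For an access unit made of SPS, PPS and an IDR slice this yields types 7, 8 and 5 in order, which are the same constants the h264 module defines.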
- Step 2: Add mod nalu; to src/main.rs
- Step 3: Create tests/nalu_test.rs
use bytes::Bytes; // needed for the EncodedFrame payloads built below
use wl_webrtc::nalu::*;
#[test]
fn test_find_start_codes_single_nal() {
// 4-byte start code + 3-byte NAL
let data: &[u8] = &[0x00, 0x00, 0x00, 0x01, 0x65, 0xFF, 0xFF];
let sc = find_annex_b_start_codes(data);
assert_eq!(sc.len(), 1);
assert_eq!(sc[0], (0, 4));
}
#[test]
fn test_find_start_codes_multiple_nals() {
// SPS + PPS + IDR
let data: Vec<u8> = [
&[0x00, 0x00, 0x00, 0x01][..], // start code
&[0x67][..], // SPS (type 7)
&[0x00, 0x00, 0x00, 0x01][..], // start code
&[0x68][..], // PPS (type 8)
&[0x00, 0x00, 0x00, 0x01][..], // start code
&[0x65, 0xFF][..], // IDR (type 5)
].concat();
let sc = find_annex_b_start_codes(&data);
assert_eq!(sc.len(), 3);
}
#[test]
fn test_is_keyframe_h264_idr() {
let data: Vec<u8> = [
&[0x00, 0x00, 0x00, 0x01][..],
&[0x65, 0xFF][..], // IDR NAL type 5
].concat();
assert!(is_keyframe(&data, false));
}
#[test]
fn test_is_keyframe_h264_non_idr() {
let data: Vec<u8> = [
&[0x00, 0x00, 0x00, 0x01][..],
&[0x41, 0xFF][..], // Non-IDR slice (type 1)
].concat();
assert!(!is_keyframe(&data, false));
}
#[test]
fn test_fragment_header_roundtrip() {
let header = FragmentHeader {
frame_type: FragmentHeader::TYPE_DELTA_FRAG,
frame_id: 42,
pts_us: 12345678,
seq_num: 3,
total_frags: 7,
};
let encoded = header.encode();
let decoded = FragmentHeader::decode(&encoded).unwrap();
assert_eq!(decoded.frame_type, header.frame_type);
assert_eq!(decoded.frame_id, header.frame_id);
assert_eq!(decoded.seq_num, header.seq_num);
assert_eq!(decoded.total_frags, header.total_frags);
assert_eq!(decoded.pts_us, header.pts_us);
}
#[test]
fn test_fragmenter_small_frame() {
let mut fragger = FrameFragmenter::new(1200);
let frame = EncodedFrame {
data: Bytes::from(vec![0u8; 500]),
pts_us: 0,
duration: std::time::Duration::from_secs_f64(1.0 / 60.0),
frame_type: FrameType::Delta,
width: 1920,
height: 1080,
};
let frags = fragger.fragment(&frame);
assert_eq!(frags.len(), 1);
assert_eq!(frags[0].0.frame_type, FragmentHeader::TYPE_DELTA_COMPLETE);
assert_eq!(frags[0].0.total_frags, 1);
}
#[test]
fn test_fragmenter_large_frame() {
let mut fragger = FrameFragmenter::new(1200);
let frame = EncodedFrame {
data: Bytes::from(vec![0u8; 5000]),
pts_us: 0,
duration: std::time::Duration::from_secs_f64(1.0 / 60.0),
frame_type: FrameType::Delta,
width: 1920,
height: 1080,
};
let frags = fragger.fragment(&frame);
assert!(frags.len() > 1);
// All fragments should have the same frame_id
let fid = frags[0].0.frame_id;
for (h, _) in &frags {
assert_eq!(h.frame_id, fid);
}
}
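The fragment counts these tests rely on follow from simple arithmetic: a 1200-byte MTU minus the 17-byte header leaves 1183 bytes of payload, so a 5000-byte frame needs 5 datagrams. With the default 8,000,000 bit/s at 60 fps, an average frame is about 16,667 bytes, or 15 datagrams. The sketch below reproduces that calculation and adds a receiver-side reassembly step. Both function names are hypothetical, and rejecting duplicated or missing sequence numbers is an assumption about the desired behaviour.

```rust
/// Header size, matching `FragmentHeader::SIZE` in the plan.
const HEADER_SIZE: usize = 17;

/// Number of datagrams needed for `frame_len` bytes at a given MTU.
pub fn fragment_count(frame_len: usize, mtu: usize) -> usize {
    let max_payload = mtu.saturating_sub(HEADER_SIZE);
    assert!(max_payload > 0, "MTU must exceed the header size");
    // Ceiling division; an empty frame still occupies one datagram.
    ((frame_len + max_payload - 1) / max_payload).max(1)
}

/// Hypothetical receiver-side helper: rebuild a frame from (seq_num, payload)
/// pairs that may arrive in any order. Returns None if a fragment is missing
/// or a sequence number is duplicated.
pub fn reassemble(total_frags: u16, mut frags: Vec<(u16, Vec<u8>)>) -> Option<Vec<u8>> {
    if frags.len() != total_frags as usize {
        return None;
    }
    frags.sort_by_key(|(seq, _)| *seq);
    let mut out = Vec::new();
    for (expected, (seq, payload)) in frags.into_iter().enumerate() {
        // After sorting, fragment i must carry seq_num i.
        if seq as usize != expected {
            return None;
        }
        out.extend_from_slice(&payload);
    }
    Some(out)
}
```

Since delta frames travel as unreliable datagrams, a frame that never reaches total_frags fragments simply stays incomplete and is eventually discarded by the receiver.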
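- Step 4: Add a [[test]] section to Cargo.toml and wire up tests
The test file imports wl_webrtc::nalu, so the crate needs a library target that exposes the module (for example a src/lib.rs containing pub mod nalu;); a binary-only crate cannot be imported from tests/.
Add to Cargo.toml: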
[[test]]
name = "nalu_test"
path = "tests/nalu_test.rs"
- Step 5: Run tests
Run: cargo test --test nalu_test
Expected: All 7 tests pass
- Step 6: Commit
git add -A && git commit -m "feat: NAL unit framing protocol with Annex B parser and fragmenter"
Task 5: Transport Layer — transport.rs
Files:
- Create: src/transport.rs
New code. WebTransport server using the wtransport crate (which provides full HTTP/3 + WebTransport protocol support). The browser's WebTransport API requires HTTP/3 with the WebTransport extension — wtransport handles all of this internally, including TLS certificate generation and the HTTP/3 handshake.
Key decisions:
- wtransport replaces quinn + h3 + h3-quinn: it bundles the full HTTP/3 + WebTransport server stack
- async_channel::Receiver replaces crossbeam::channel::Receiver: the async recv() method eliminates the sync→async bridge problem
- TLS is handled internally by wtransport (self-signed cert auto-generated), so no manual TlsConfig or rustls config is needed
- Keyframes → reliable WebTransport stream (open_uni()); Delta frames → unreliable datagrams (send_datagram())
- Single client session in Phase 1
Port architecture: WebTransport runs over HTTP/3 (UDP). The HTTP static files server (Task 6, axum on TCP) is separate. player.js connects to the WebTransport port. Both can share the same port number if desired (HTTP/3 is UDP, axum is TCP), but player.js must configure SERVER_URL to point to the WebTransport server's address.
- Step 1: Verify transport dependencies in Cargo.toml
Dependencies (wtransport, async-channel) should already be in Cargo.toml from Task 1. If not, add:
wtransport = { version = "0.7", features = ["self-signed"] }
async-channel = "2"
These replace the quinn, rustls, and rcgen deps (remove those if present — wtransport handles TLS internally). The self-signed feature enables Identity::self_signed() for automatic self-signed certificate generation.
- Step 2: Create src/transport.rs
use anyhow::Result;
use async_channel::Receiver;
use std::net::SocketAddr;
use crate::nalu::{EncodedFrame, FrameFragmenter, FragmentHeader, FrameType};
/// Codec configuration extracted from encoder SPS/PPS.
/// Sent to new clients upon session establishment.
pub struct CodecConfig {
pub codec: String, // e.g. "avc1.42E01F"
pub width: u32,
pub height: u32,
pub framerate: u32,
}
/// Manages WebTransport connections and frame distribution.
/// Uses wtransport crate for full HTTP/3 + WebTransport protocol support.
///
/// CRITICAL API NOTES (wtransport 0.7):
/// - `wtransport::Endpoint::server(config)` returns `Result<Endpoint<Server>>` — use `?`
/// - `endpoint.accept().await` returns `IncomingSession` (NOT a Connection)
/// - `incoming_session.await` returns `SessionRequest` (second await)
/// - `session_request.accept().await` returns `Connection` (third await)
/// - `connection.open_uni().await?.await?` — DOUBLE await for unidirectional streams
/// - `connection.send_datagram(data)` — takes `impl AsRef<[u8]>` (`Vec<u8>` works directly)
/// - `connection.receive_datagram().await` — receive datagram
/// - TLS requires `Identity::self_signed()` with `self-signed` feature enabled
pub struct TransportServer {
endpoint: wtransport::endpoint::Endpoint<wtransport::endpoint::endpoint_side::Server>,
frame_rx: Receiver<EncodedFrame>,
codec_config: Option<CodecConfig>,
last_keyframe: Option<EncodedFrame>,
}
impl TransportServer {
/// Create a new WebTransport server.
///
/// Uses Identity::self_signed() for auto-generated TLS certificate.
/// The `self-signed` feature must be enabled in Cargo.toml.
pub fn new(addr: SocketAddr, frame_rx: Receiver<EncodedFrame>) -> Result<Self> {
let identity = wtransport::Identity::self_signed(["localhost", "127.0.0.1", "::1"])
.map_err(|e| anyhow::anyhow!("Failed to generate self-signed certificate: {}", e))?;
let config = wtransport::ServerConfig::builder()
.with_bind_address(addr)
.with_identity(identity)
.keep_alive_interval(Some(std::time::Duration::from_secs(3)))
.build();
let endpoint = wtransport::Endpoint::server(config)?;
Ok(Self {
endpoint,
frame_rx,
codec_config: None,
last_keyframe: None,
})
}
/// Run the transport server loop. Blocks until shutdown.
/// Should be spawned on the tokio runtime.
pub async fn run(mut self) -> Result<()> {
tracing::info!("WebTransport server listening on {}", self.endpoint.local_addr()?);
let mut fragmenter = FrameFragmenter::new(1200);
let mut session: Option<Session> = None;
loop {
tokio::select! {
// Accept new WebTransport connections
// 3-step chain: accept() -> IncomingSession.await -> SessionRequest.accept().await
incoming = self.endpoint.accept() => {
match Self::accept_session(incoming).await {
Ok(connection) => {
tracing::info!("New WebTransport session from {:?}", connection.remote_address());
// Send codec config as first message if available
if let Some(ref config) = self.codec_config {
if let Err(e) = Self::send_codec_config(&connection, config).await {
tracing::warn!("Failed to send codec config: {}", e);
}
}
// Resend last keyframe if available (so new client can decode immediately)
if let Some(ref keyframe) = self.last_keyframe {
if let Err(e) = Self::send_keyframe_data(&connection, keyframe, &mut fragmenter).await {
tracing::warn!("Failed to send last keyframe to new client: {}", e);
}
}
session = Some(Session::new(connection));
}
Err(e) => {
tracing::warn!("Session accept failed: {}", e);
}
}
}
// Receive encoded frames from capture pipeline (async_channel)
frame = self.frame_rx.recv() => {
match frame {
Ok(frame) => {
// Cache keyframes for new client delivery (EncodedFrame derives Clone)
if frame.frame_type == FrameType::Keyframe {
self.last_keyframe = Some(frame.clone());
}
if let Some(s) = &mut session {
if let Err(e) = s.send_frame(&frame, &mut fragmenter).await {
tracing::error!("Frame send error: {}", e);
session = None;
}
}
}
Err(_) => break, // Channel closed — shutdown
}
}
}
}
Ok(())
}
/// Accept an incoming session: 3-step async chain
async fn accept_session(
incoming: wtransport::endpoint::IncomingSession,
) -> Result<wtransport::Connection> {
// Step 1: incoming_session.await -> SessionRequest
let session_request = incoming.await?;
tracing::info!(
"Session request: authority='{}', path='{}'",
session_request.authority(),
session_request.path()
);
// Step 2: session_request.accept().await -> Connection
let connection = session_request.accept().await?;
Ok(connection)
}
/// Send codec configuration to a client via reliable stream.
/// In Annex B mode, we only send codec name and dimensions.
/// SPS/PPS arrive in-band with each keyframe (via `h264_metadata` BSF with `repeat_sps=1`/`repeat_pps=1`).
async fn send_codec_config(
connection: &wtransport::Connection,
config: &CodecConfig,
) -> Result<()> {
let mut stream = connection.open_uni().await?.await?;
// Build JSON config message — NO AVCC description needed in Annex B mode.
// The browser decoder is configured WITHOUT description, which activates Annex B mode.
let config_json = serde_json::json!({
"type": "codec_config",
"codec": config.codec,
"width": config.width,
"height": config.height,
"framerate": config.framerate,
});
// Write the 17-byte fragment header (type = TYPE_CODEC_CONFIG) followed by the JSON payload
let header = FragmentHeader {
frame_type: FragmentHeader::TYPE_CODEC_CONFIG,
frame_id: 0,
pts_us: 0,
seq_num: 0,
total_frags: 1,
};
stream.write_all(&header.encode()).await?;
stream.write_all(config_json.to_string().as_bytes()).await?;
stream.finish().await?;
Ok(())
}
/// Send a cached keyframe to a new client
async fn send_keyframe_data(
connection: &wtransport::Connection,
frame: &EncodedFrame,
fragmenter: &mut FrameFragmenter,
) -> Result<()> {
let mut stream = connection.open_uni().await?.await?;
let header = FragmentHeader {
frame_type: FragmentHeader::TYPE_KEYFRAME_COMPLETE,
frame_id: fragmenter.current_frame_id(),
pts_us: frame.pts_us,
seq_num: 0,
total_frags: 1,
};
stream.write_all(&header.encode()).await?;
stream.write_all(&frame.data).await?;
stream.finish().await?;
Ok(())
}
/// Update codec configuration (called from encoder when format changes)
pub fn update_codec_config(&mut self, config: CodecConfig) {
self.codec_config = Some(config);
}
}
/// A single connected client session
struct Session {
conn: wtransport::Connection,
needs_keyframe: bool,
}
impl Session {
fn new(conn: wtransport::Connection) -> Self {
Self { conn, needs_keyframe: false }
}
async fn send_frame(
&mut self,
frame: &EncodedFrame,
fragmenter: &mut FrameFragmenter,
) -> Result<()> {
match frame.frame_type {
FrameType::Keyframe => {
// Send keyframes via reliable WebTransport stream
// NOTE: open_uni().await?.await? — double await!
let mut stream = self.conn.open_uni().await?.await?;
// Write fragment header + full data
let header = FragmentHeader {
frame_type: FragmentHeader::TYPE_KEYFRAME_COMPLETE,
frame_id: fragmenter.current_frame_id(),
pts_us: frame.pts_us,
seq_num: 0,
total_frags: 1,
};
stream.write_all(&header.encode()).await?;
stream.write_all(&frame.data).await?;
stream.finish().await?;
}
FrameType::Delta => {
// Send delta frames via datagrams (unreliable, low latency)
let fragments = fragmenter.fragment(frame);
for (header, payload) in fragments {
let mut datagram = Vec::with_capacity(FragmentHeader::SIZE + payload.len());
datagram.extend_from_slice(&header.encode());
datagram.extend_from_slice(&payload);
// send_datagram takes impl AsRef<[u8]>, so pass the Vec by reference
self.conn.send_datagram(&datagram)?;
}
}
}
Ok(())
}
}
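The codec-config message written by send_codec_config has a simple layout: the 17-byte fragment header with type 0x10, followed by the JSON text until the stream finishes. The sketch below builds and splits a message of that shape using only the standard library. Both function names are hypothetical, the JSON is assembled by hand (so key order may differ from what serde_json emits), and the codec string is assumed to contain no characters that need JSON escaping.

```rust
const HEADER_SIZE: usize = 17;
const TYPE_CODEC_CONFIG: u8 = 0x10;

/// Build a message shaped like the one `send_codec_config` writes:
/// header (type, frame_id = 0, pts_us = 0, seq_num = 0, total_frags = 1) + JSON.
pub fn build_codec_config_message(codec: &str, width: u32, height: u32, framerate: u32) -> Vec<u8> {
    let mut msg = vec![0u8; HEADER_SIZE];
    msg[0] = TYPE_CODEC_CONFIG;
    // total_frags lives in bytes 15..17, big-endian.
    msg[15..17].copy_from_slice(&1u16.to_be_bytes());
    // Assumption: `codec` needs no JSON escaping (e.g. "avc1.42E01F").
    let json = format!(
        r#"{{"type":"codec_config","codec":"{}","width":{},"height":{},"framerate":{}}}"#,
        codec, width, height, framerate
    );
    msg.extend_from_slice(json.as_bytes());
    msg
}

/// Hypothetical receiver-side check: return the JSON text if this is a
/// codec-config message, otherwise None.
pub fn split_codec_config_message(msg: &[u8]) -> Option<&str> {
    if msg.len() < HEADER_SIZE || msg[0] != TYPE_CODEC_CONFIG {
        return None;
    }
    std::str::from_utf8(&msg[HEADER_SIZE..]).ok()
}
```

Reusing the fragment header for this message means the receiver can dispatch on the first byte of every stream before deciding whether the remainder is JSON or Annex B video data.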
API notes for the implementer (wtransport 0.7 — VERIFIED against source):
- wtransport::Identity::self_signed(["localhost", "127.0.0.1", "::1"]) generates a self-signed TLS certificate (requires the self-signed feature)
- wtransport::ServerConfig::builder().with_bind_address(addr).with_identity(identity).build() creates the server config
- wtransport::Endpoint::server(config) returns Result<Endpoint<Server>>: use the ? operator
- endpoint.accept().await returns IncomingSession: this is the FIRST step
- incoming_session.await returns SessionRequest: this is the SECOND step
- session_request.accept().await returns Connection: this is the THIRD step
- connection.open_uni().await?.await? opens a unidirectional stream: DOUBLE await (first opens QUIC stream, second waits for readiness)
- connection.send_datagram(bytes) sends an unreliable datagram (takes impl AsRef<[u8]>, so a Vec<u8> or byte slice can be passed directly)
- connection.receive_datagram().await receives a datagram
- stream.write_all(&data).await? writes data to a stream
- stream.finish().await? closes the stream gracefully
- self.frame_rx.recv() is async (from async_channel), so no sync→async bridge is needed
- Step 3: Add mod transport; to src/main.rs
- Step 4: Verify compilation
Run: cargo build
- Step 5: Commit
git add -A && git commit -m "feat: WebTransport server with wtransport for browser streaming"
Task 6: Web UI — signaling.rs + static/*
Files:
- Create: src/signaling.rs
- Create: static/index.html
- Create: static/player.js
- Create: static/style.css
- Step 1: Create static/index.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>wl-webrtc</title>
<link rel="stylesheet" href="style.css">
</head>
<body>
<div id="app">
<div id="video-container">
<canvas id="video-canvas"></canvas>
<div id="status-overlay">
<span id="status">Connecting...</span>
</div>
</div>
<div id="controls">
<span id="info">--</span>
<button id="fullscreen-btn">Fullscreen</button>
</div>
</div>
<script src="player.js"></script>
</body>
</html>
- Step 2: Create static/style.css
* { margin: 0; padding: 0; box-sizing: border-box; }
body { background: #000; color: #fff; font-family: monospace; }
#app { display: flex; flex-direction: column; height: 100vh; }
#video-container { flex: 1; position: relative; display: flex; align-items: center; justify-content: center; }
#video-canvas { max-width: 100%; max-height: 100%; }
#status-overlay { position: absolute; top: 0; left: 0; right: 0; padding: 8px; background: rgba(0,0,0,0.7); text-align: center; }
#controls { display: flex; justify-content: space-between; align-items: center; padding: 8px; background: #111; }
#info { font-size: 12px; color: #aaa; }
button { background: #333; color: #fff; border: 1px solid #555; padding: 6px 16px; cursor: pointer; }
button:hover { background: #444; }
- Step 3: Create static/player.js
const canvas = document.getElementById('video-canvas');
const ctx = canvas.getContext('2d');
const statusEl = document.getElementById('status');
const infoEl = document.getElementById('info');
const fullscreenBtn = document.getElementById('fullscreen-btn');
// WebTransport server runs on HTTP/3 (UDP) on args.port.
// HTTP static file server runs on TCP on args.port + 1.
// When loaded from the HTTP server, location.port = args.port + 1.
// WebTransport port is one less than the HTTP port.
const WT_PORT = parseInt(location.port, 10) - 1;
const SERVER_URL = `https://${location.hostname}:${WT_PORT}/wt`;
let decoder = null;
let transport = null;
let frameIdCounter = 0;
let fragmentBuffer = new Map(); // frame_id -> { total, fragments: Map<seq, data> }
let stats = { frames: 0, startTime: 0, latency: 0 };
let codecConfigured = false;
// --- Fragment reassembly ---
const HEADER_SIZE = 17;
const TYPE_KEYFRAME_FRAG = 0x01;
const TYPE_DELTA_FRAG = 0x02;
const TYPE_KEYFRAME_COMPLETE = 0x03;
const TYPE_DELTA_COMPLETE = 0x04;
const TYPE_CODEC_CONFIG = 0x10;
function parseHeader(data) {
const view = new DataView(data.buffer, data.byteOffset, data.byteLength);
return {
frameType: data[0],
frameId: view.getUint32(1),
// pts_us: bytes 5-12 (i64 big-endian), read as a BigInt via DataView.getBigInt64
ptsUs: view.getBigInt64(5),
seqNum: view.getUint16(13),
totalFrags: view.getUint16(15),
};
}
function reassembleFrame(header, payload) {
if (header.totalFrags === 1) {
// Complete frame in single packet
return payload;
}
// Buffer fragment
if (!fragmentBuffer.has(header.frameId)) {
fragmentBuffer.set(header.frameId, {
total: header.totalFrags,
fragments: new Map(),
});
}
const buf = fragmentBuffer.get(header.frameId);
buf.fragments.set(header.seqNum, payload);
// Check if complete
if (buf.fragments.size === buf.total) {
const sorted = [...buf.fragments.entries()].sort((a, b) => a[0] - b[0]);
let totalLen = 0;
for (const [, d] of sorted) totalLen += d.byteLength;
const result = new Uint8Array(totalLen);
let offset = 0;
for (const [, d] of sorted) {
result.set(new Uint8Array(d.buffer, d.byteOffset, d.byteLength), offset);
offset += d.byteLength;
}
fragmentBuffer.delete(header.frameId);
return result.buffer;
}
// Cleanup: evict old incomplete frames to prevent memory leak
const MAX_BUFFER_SIZE = 32;
if (fragmentBuffer.size > MAX_BUFFER_SIZE) {
// Delete oldest entries (lowest frameId)
const sortedIds = [...fragmentBuffer.keys()].sort((a, b) => a - b);
const toDelete = sortedIds.slice(0, fragmentBuffer.size - MAX_BUFFER_SIZE);
for (const id of toDelete) {
fragmentBuffer.delete(id);
}
}
// Not complete yet
return null;
}
// --- WebCodecs Decoder ---
function initDecoder(config) {
decoder = new VideoDecoder({
output: (frame) => {
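// Resize only when the stream dimensions change; assigning width/height clears the canvas.
if (canvas.width !== frame.codedWidth || canvas.height !== frame.codedHeight) {
canvas.width = frame.codedWidth;
canvas.height = frame.codedHeight;
}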
ctx.drawImage(frame, 0, 0);
frame.close();
stats.frames++;
if (stats.startTime === 0) stats.startTime = performance.now();
const elapsed = (performance.now() - stats.startTime) / 1000;
if (elapsed > 1) {
const fps = (stats.frames / elapsed).toFixed(1);
infoEl.textContent = `${frame.codedWidth}x${frame.codedHeight} | ${fps} fps`;
}
},
error: (e) => {
console.error('Decoder error:', e);
statusEl.textContent = `Decoder error: ${e.message}`;
},
});
// CRITICAL: Configure WITHOUT description → Annex B mode.
// Per W3C AVC WebCodecs Registration, providing description forces AVC (length-prefixed)
// mode for ALL frames. Since our server sends Annex B (start-code-prefixed),
// we must omit description and rely on in-band SPS/PPS (injected via `h264_metadata` BSF with `repeat_sps=1`/`repeat_pps=1`).
decoder.configure({
codec: config.codec,
codedWidth: config.width,
codedHeight: config.height,
// NO description field — Annex B mode. SPS/PPS arrive in-band with each keyframe.
});
codecConfigured = true;
statusEl.textContent = 'Streaming';
}
// --- Connection ---
async function connect() {
statusEl.textContent = 'Connecting...';
try {
transport = new WebTransport(SERVER_URL);
await transport.ready;
statusEl.textContent = 'Connected, waiting for stream...';
// Read from datagrams for delta frames
readDatagrams();
// Read from unidirectional streams for keyframes and codec config
readStreams();
} catch (e) {
console.error('Connection failed:', e);
statusEl.textContent = `Connection failed: ${e.message}. Retrying in 3s...`;
setTimeout(connect, 3000);
}
}
async function readDatagrams() {
const reader = transport.datagrams.readable.getReader();
try {
while (true) {
const { value, done } = await reader.read();
if (done) break;
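// Datagrams arrive as Uint8Array views; parse in place and skip packets shorter than the header.
if (value.byteLength < HEADER_SIZE) continue;
const header = parseHeader(value);
const payload = value.subarray(HEADER_SIZE);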
handleFrame(header, payload);
}
} catch (e) {
console.error('Datagram read error:', e);
}
}
async function readStreams() {
try {
const reader = transport.incomingUnidirectionalStreams.getReader();
while (true) {
const { value, done } = await reader.read();
if (done) break;
const stream = value;
const data = await readAll(stream);
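// Wrap the stream bytes in a Uint8Array so stream and datagram payloads share one type.
const bytes = new Uint8Array(data);
if (bytes.byteLength < HEADER_SIZE) continue;
const header = parseHeader(bytes);
const payload = bytes.subarray(HEADER_SIZE);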
handleFrame(header, payload);
}
} catch (e) {
console.error('Stream read error:', e);
}
}
async function readAll(stream) {
// Each incoming unidirectional stream is itself a ReadableStream.
const reader = stream.getReader();
const chunks = [];
let totalLen = 0;
while (true) {
const { value, done } = await reader.read();
if (done) break;
chunks.push(value);
totalLen += value.byteLength;
}
const result = new Uint8Array(totalLen);
let offset = 0;
for (const chunk of chunks) {
result.set(new Uint8Array(chunk.buffer, chunk.byteOffset, chunk.byteLength), offset);
offset += chunk.byteLength;
}
return result.buffer;
}
function handleFrame(header, payload) {
if (header.frameType === TYPE_CODEC_CONFIG) {
// Codec config message — contains codec name and dimensions only.
// No AVCC description needed: we configure in Annex B mode (no `description` field).
// SPS/PPS arrive in-band with each keyframe (injected via `h264_metadata` BSF).
const config = JSON.parse(new TextDecoder().decode(payload));
initDecoder({
codec: config.codec,
// initDecoder reads config.width and config.height, so pass those keys.
width: config.width,
height: config.height,
});
return;
}
if (!codecConfigured || !decoder || decoder.state === 'closed') return;
const frameData = reassembleFrame(header, payload);
if (!frameData) return;
const isKeyframe = header.frameType === TYPE_KEYFRAME_COMPLETE || header.frameType === TYPE_KEYFRAME_FRAG;
const chunk = new EncodedVideoChunk({
type: isKeyframe ? 'key' : 'delta',
// i64 → Number conversion is safe: Number represents integers exactly up to 2^53 - 1.
// Our PTS starts at 0 and increments by ~16,667μs/frame (60fps).
// At microsecond resolution, 2^53 μs is roughly 285 years of continuous streaming.
timestamp: Number(header.ptsUs),
data: frameData,
});
if (decoder.state === 'configured') {
decoder.decode(chunk);
}
}
// --- Fullscreen ---
fullscreenBtn.addEventListener('click', () => {
if (document.fullscreenElement) {
document.exitFullscreen();
} else {
document.getElementById('app').requestFullscreen();
}
});
// --- Start ---
connect();
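The 17-byte header that parseHeader decodes implies a matching encoder on the Rust side. The following is a minimal sketch of that layout, assuming only the field order and big-endian encoding read by player.js above. The PacketHeader type and its method names are hypothetical and are not taken from the plan's transport module.

```rust
use std::convert::TryInto;

/// Size of the fixed packet header, matching HEADER_SIZE in player.js.
pub const HEADER_SIZE: usize = 17;

/// Hypothetical Rust mirror of the header fields parsed in player.js.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PacketHeader {
    pub frame_type: u8,   // byte 0
    pub frame_id: u32,    // bytes 1..5, big-endian
    pub pts_us: i64,      // bytes 5..13, big-endian
    pub seq_num: u16,     // bytes 13..15, big-endian
    pub total_frags: u16, // bytes 15..17, big-endian
}

impl PacketHeader {
    /// Serialize the header into its fixed 17-byte wire form.
    pub fn encode(&self) -> [u8; HEADER_SIZE] {
        let mut out = [0u8; HEADER_SIZE];
        out[0] = self.frame_type;
        out[1..5].copy_from_slice(&self.frame_id.to_be_bytes());
        out[5..13].copy_from_slice(&self.pts_us.to_be_bytes());
        out[13..15].copy_from_slice(&self.seq_num.to_be_bytes());
        out[15..17].copy_from_slice(&self.total_frags.to_be_bytes());
        out
    }

    /// Parse a header from the front of a packet; None if the packet is too short.
    pub fn decode(buf: &[u8]) -> Option<Self> {
        if buf.len() < HEADER_SIZE {
            return None;
        }
        Some(PacketHeader {
            frame_type: buf[0],
            frame_id: u32::from_be_bytes(buf[1..5].try_into().ok()?),
            pts_us: i64::from_be_bytes(buf[5..13].try_into().ok()?),
            seq_num: u16::from_be_bytes(buf[13..15].try_into().ok()?),
            total_frags: u16::from_be_bytes(buf[15..17].try_into().ok()?),
        })
    }
}
```

Keeping encode and decode next to each other makes it easier to notice when the browser-side offsets and the server-side offsets drift apart.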
- Step 4: Create src/signaling.rs
use anyhow::Result;
use axum::{
body::Body,
http::{header, StatusCode},
response::{IntoResponse, Response},
routing::get,
Router,
};
use rust_embed::Embed;
use std::net::SocketAddr;
#[derive(Embed)]
#[folder = "static/"]
struct Asset;
/// Create the axum router for serving static files
pub fn create_router() -> Router {
Router::new()
.route("/", get(index_handler))
.fallback(static_handler)
}
async fn index_handler() -> Response {
serve_asset("index.html")
}
// A fallback handler has no route parameters, so the path is read from the
// request URI rather than extracted with Path<String>.
async fn static_handler(uri: axum::http::Uri) -> Response {
serve_asset(uri.path().trim_start_matches('/'))
}
/// Look up an embedded asset and build the response, or return 404.
fn serve_asset(path: &str) -> Response {
let path = if path.is_empty() { "index.html" } else { path };
match Asset::get(path) {
Some(file) => {
let mime = mime_guess::from_path(path).first_or_octet_stream();
Response::builder()
.header(header::CONTENT_TYPE, mime.as_ref())
.body(Body::from(file.data.to_vec()))
.unwrap()
}
None => StatusCode::NOT_FOUND.into_response(),
}
}
/// Run the HTTP server for static files on the given address.
/// This should be spawned on the tokio runtime.
pub async fn serve(addr: SocketAddr) -> Result<()> {
let app = create_router();
let listener = tokio::net::TcpListener::bind(addr).await?;
tracing::info!("HTTP server listening on {}", addr);
axum::serve(listener, app).await?;
Ok(())
}
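player.js assumes the HTTP server listens one port above the WebTransport server. A small sketch of deriving both listen addresses from a single base port is shown below. The function name listen_addrs is hypothetical, and returning None when the base port is 65535 is an assumption about how the overflow case might be handled.

```rust
use std::net::{IpAddr, Ipv4Addr, SocketAddr};

/// Returns (webtransport_addr, http_addr) for a base port.
/// The HTTP port is always base + 1, mirroring the WT_PORT arithmetic in player.js.
/// Returns None when base + 1 would not fit in a u16.
pub fn listen_addrs(ip: IpAddr, wt_port: u16) -> Option<(SocketAddr, SocketAddr)> {
    let http_port = wt_port.checked_add(1)?;
    Some((SocketAddr::new(ip, wt_port), SocketAddr::new(ip, http_port)))
}

/// Convenience wrapper for loopback testing (hypothetical helper).
pub fn loopback_addrs(wt_port: u16) -> Option<(SocketAddr, SocketAddr)> {
    listen_addrs(IpAddr::V4(Ipv4Addr::new(127, 0, 0, 1)), wt_port)
}
```

The second address of the pair is what would be passed to serve(addr); the first would go to the WebTransport endpoint.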
- Step 5: Add mod signaling; to src/main.rs
- Step 6: Verify compilation
Run: cargo build
- Step 7: Commit
git add -A && git commit -m "feat: Web UI with WebCodecs player and static file serving"
Task 7: FFmpeg Hardware Context — avhw.rs
Files:
- Create: src/avhw.rs
Manages FFmpeg hardware device and frame contexts for VAAPI/Vulkan encoding.
CRITICAL WARNING: ffmpeg-next crate (v8.x) does NOT provide safe wrappers for AVBufferRef (hardware device context) or AVHWFramesContext (hardware frame context). The safe API covers codec lookup, encoder configuration, filter graph, and encode/decode flow. ALL hardware context operations MUST use raw FFI via ffmpeg_next::ffi.
Reference implementation: See wl-screenrec/src/avhw.rs (https://github.com/russelltg/wl-screenrec/blob/main/src/avhw.rs) — this is the closest real-world reference for this exact pattern.
API References:
- ffmpeg-next FFI: ffmpeg_next::ffi::* (raw C API bindings)
- Key FFI functions:
  - ffi::av_hwdevice_ctx_create(&mut ptr, device_type, device_path, opts, flags): create HW device
  - ffi::av_hwdevice_ctx_create_derived(&mut dst, type, src, flags): derive HW device (e.g. DRM → VAAPI)
  - ffi::av_hwframe_ctx_alloc(device_ref): allocate frame context on device
  - ffi::av_hwframe_ctx_init(frames_ref): commit frame context parameters
  - ffi::av_buffer_ref(ptr): increment reference count
  - ffi::av_buffer_unref(&mut ptr): decrement reference count (for Drop)
  - ffi::av_hwframe_get_buffer(frames_ref, frame, 0): allocate HW surface from pool
- FFmpeg VAAPI encoding: https://ffmpeg.org/ffmpeg-codecs.html#vaapi
- VA-API specification: https://01.org/linuxmedia/vaapi
- Step 1: Create src/avhw.rs
Implement hardware device and frame context management using ffmpeg-next FFI:
Key types:
use anyhow::Result;
use ffmpeg_next as ff;
use ffmpeg_next::ffi as ffi;
use std::ptr;
/// Hardware encoding backend selection
pub enum HwAccel {
Vaapi,
Vulkan,
}
/// Encode pixel format — drives three-way dispatch
pub enum EncodePixelFormat {
Vaapi(ff::PixelFormat),
Vulkan(ff::PixelFormat),
Sw(ff::PixelFormat),
}
/// Holds FFmpeg hardware device context.
///
/// Wraps a raw *mut AVBufferRef. The ffmpeg-next safe API does NOT expose
/// this type — we use raw FFI pointers with manual Drop.
///
/// Reference: wl-screenrec/src/avhw.rs — same pattern
pub struct AvHwDevCtx {
ptr: *mut ffi::AVBufferRef,
fmt: ff::PixelFormat,
}
// SAFETY: AVBufferRef is thread-safe (reference counted internally)
unsafe impl Send for AvHwDevCtx {}
impl AvHwDevCtx {
/// Create VAAPI device context from DRM device path.
///
/// Uses raw FFI: ffi::av_hwdevice_ctx_create()
/// Internally calls vaGetDisplayDRM → vaInitialize
pub fn new_vaapi(drm_device: &std::path::Path) -> Result<Self> {
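// Reject non-UTF-8 paths with an error instead of panicking.
let device_str = drm_device.to_str().ok_or_else(|| anyhow::anyhow!("DRM device path is not valid UTF-8"))?;
let device_cstr = std::ffi::CString::new(device_str)?;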
let mut ptr: *mut ffi::AVBufferRef = ptr::null_mut();
let ret = unsafe {
ffi::av_hwdevice_ctx_create(
&mut ptr,
ffi::AVHWDeviceType::AV_HWDEVICE_TYPE_VAAPI,
device_cstr.as_ptr(),
ptr::null_mut(),
0,
)
};
if ret < 0 {
anyhow::bail!("Failed to create VAAPI device context: error code {}", ret);
}
Ok(Self {
ptr,
fmt: ff::PixelFormat::VAAPI,
})
}
/// Create Vulkan device context from DRM device path.
///
/// Two steps plus cleanup, via raw FFI:
/// 1. ffi::av_hwdevice_ctx_create(DRM, path) → drm_ctx
/// 2. ffi::av_hwdevice_ctx_create_derived(VULKAN, drm_ctx) → vulkan_ctx
/// 3. Release drm_ctx (derived context shares the GPU device)
///
/// For self-referential FFmpeg internal structs, Vulkan requires
/// Pin<Box<>> for stable addresses (see wl-screenrec reference).
pub fn new_vulkan(drm_device: &std::path::Path) -> Result<Self> {
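// Reject non-UTF-8 paths with an error instead of panicking.
let device_str = drm_device.to_str().ok_or_else(|| anyhow::anyhow!("DRM device path is not valid UTF-8"))?;
let device_cstr = std::ffi::CString::new(device_str)?;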
// Step 1: Create DRM device context
let mut drm_ptr: *mut ffi::AVBufferRef = ptr::null_mut();
let ret = unsafe {
ffi::av_hwdevice_ctx_create(
&mut drm_ptr,
ffi::AVHWDeviceType::AV_HWDEVICE_TYPE_DRM,
device_cstr.as_ptr(),
ptr::null_mut(),
0,
)
};
if ret < 0 {
anyhow::bail!("Failed to create DRM device context: error code {}", ret);
}
// Step 2: Derive Vulkan context from DRM
let mut vk_ptr: *mut ffi::AVBufferRef = ptr::null_mut();
let ret = unsafe {
ffi::av_hwdevice_ctx_create_derived(
&mut vk_ptr,
ffi::AVHWDeviceType::AV_HWDEVICE_TYPE_VULKAN,
drm_ptr,
0,
)
};
// Release DRM context regardless of derive result
unsafe { ffi::av_buffer_unref(&mut drm_ptr) };
if ret < 0 {
anyhow::bail!("Failed to derive Vulkan device context: error code {}", ret);
}
Ok(Self {
ptr: vk_ptr,
fmt: ff::PixelFormat::VULKAN,
})
}
/// Get raw pointer for use in FFI calls (encoder, filter setup)
pub fn as_ptr(&self) -> *mut ffi::AVBufferRef {
self.ptr
}
/// Create a new reference (increment refcount)
pub fn ref_clone(&self) -> *mut ffi::AVBufferRef {
unsafe { ffi::av_buffer_ref(self.ptr) }
}
}
impl Drop for AvHwDevCtx {
fn drop(&mut self) {
if !self.ptr.is_null() {
unsafe { ffi::av_buffer_unref(&mut self.ptr) };
}
}
}
/// Holds FFmpeg hardware frame context (used for both capture and encoding).
///
/// Wraps a raw *mut AVBufferRef pointing to AVHWFramesContext.
/// Must configure via raw pointer cast before calling av_hwframe_ctx_init().
///
/// Usage pattern:
/// 1. ffi::av_hwframe_ctx_alloc(device_ctx.as_ptr()) → frames_ref
/// 2. Cast frames_ref data to *mut AVHWFramesContext, set fields
/// 3. ffi::av_hwframe_ctx_init(frames_ref) → commit
pub struct AvHwFrameCtx {
ptr: *mut ffi::AVBufferRef,
}
unsafe impl Send for AvHwFrameCtx {}
impl AvHwFrameCtx {
/// Create hardware frame context for DMA-BUF capture surfaces.
///
/// Uses raw FFI:
/// 1. ffi::av_hwframe_ctx_alloc(hw_device_ctx.as_ptr()) → frames_ref
/// 2. Dereference frames_ref → AVHWFramesContext, set:
/// - format = AV_PIX_FMT_VAAPI (or VULKAN)
/// - sw_format = AV_PIX_FMT_NV12 (or equivalent)
/// - width, height
/// - initial_pool_size = 4 (double buffer + headroom)
/// 3. ffi::av_hwframe_ctx_init(frames_ref) → commit
pub fn for_capture(
hw_device_ctx: &AvHwDevCtx,
width: u32,
height: u32,
sw_format: ff::PixelFormat,
) -> Result<Self> {
// Must be mutable: the error path below passes &mut ptr to av_buffer_unref.
let mut ptr = unsafe { ffi::av_hwframe_ctx_alloc(hw_device_ctx.as_ptr()) };
if ptr.is_null() {
anyhow::bail!("av_hwframe_ctx_alloc returned null");
}
// Configure the frames context via raw pointer
unsafe {
let frames_ctx = (*ptr).data as *mut ffi::AVHWFramesContext;
(*frames_ctx).format = hw_device_ctx.fmt.into();
(*frames_ctx).sw_format = sw_format.into();
(*frames_ctx).width = width as i32;
(*frames_ctx).height = height as i32;
(*frames_ctx).initial_pool_size = 4;
}
let ret = unsafe { ffi::av_hwframe_ctx_init(ptr) };
if ret < 0 {
unsafe { ffi::av_buffer_unref(&mut ptr as *mut _) };
anyhow::bail!("av_hwframe_ctx_init failed: error code {}", ret);
}
Ok(Self { ptr })
}
/// Create hardware frame context for encoder input surfaces.
/// Same as for_capture but with encoder-compatible format (typically NV12).
pub fn for_encode(
hw_device_ctx: &AvHwDevCtx,
width: u32,
height: u32,
sw_format: ff::PixelFormat,
) -> Result<Self> {
// Same pattern as for_capture, possibly different pool size
Self::for_capture(hw_device_ctx, width, height, sw_format)
}
/// Get raw pointer for assignment to encoder/filter hw_frames_ctx
pub fn as_ptr(&self) -> *mut ffi::AVBufferRef {
self.ptr
}
}
impl Drop for AvHwFrameCtx {
fn drop(&mut self) {
if !self.ptr.is_null() {
unsafe { ffi::av_buffer_unref(&mut self.ptr) };
}
}
}
/// Encapsulates the full encoding state
pub struct EncState {
pub enc_video: ff::codec::encoder::Video,
pub frames_rgb: crate::avhw::AvHwFrameCtx, // capture format
pub frames_yuv: crate::avhw::AvHwFrameCtx, // encode format
pub video_filter: ff::filter::Graph,
pub hw_device_ctx: crate::avhw::AvHwDevCtx,
pub starting_timestamp: Option<i64>,
}
impl EncState {
/// Create the full encoding pipeline.
///
/// Implementation steps:
/// 1. Create AvHwDevCtx (VAAPI or Vulkan based on --hw-accel flag)
/// 2. Create capture frame context (AvHwFrameCtx::for_capture)
/// 3. Create encode frame context (AvHwFrameCtx::for_encode)
/// 4. Create H.264 encoder:
/// - Find codec: `ff::encoder::find_by_name("h264_vaapi")` or `"h264_vulkan"`
/// - Create encoder from codec with hardware device context
/// - Set parameters: width, height, pixel_format (VAAPI/Vulkan), bit_rate, gop_size, time_base
/// - Open encoder
/// 5. Build video filter graph (see filter.rs Task 8)
/// 6. Wrap everything in EncState
pub fn new(
hw_accel: HwAccel,
drm_device: &std::path::Path,
width: u32,
height: u32,
bitrate: u64,
gop_size: u32,
fps: u32,
) -> Result<Self> {
// Step-by-step construction as described above
todo!("build the device context, frame contexts, encoder, and filter graph")
}
}
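The manual ref/unref bookkeeping behind AvHwDevCtx and AvHwFrameCtx can be illustrated without FFmpeg. The sketch below uses a stand-in refcounted buffer (RawBuf, raw_ref, raw_unref are hypothetical names that only mimic the shape of av_buffer_ref and av_buffer_unref) and wraps it in a type whose Clone and Drop keep the count balanced. It is a simplification: the real av_buffer_ref returns a new AVBufferRef rather than the same pointer.

```rust
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Stand-in for a C-side refcounted buffer (not FFmpeg's AVBufferRef).
pub struct RawBuf {
    refs: AtomicUsize,
}

/// Mimics a C allocator: returns an owning raw pointer with refcount 1.
pub fn raw_alloc() -> *mut RawBuf {
    Box::into_raw(Box::new(RawBuf { refs: AtomicUsize::new(1) }))
}

/// Mimics the "ref" call: bumps the count and hands back a usable pointer.
/// Safety: `p` must point to a live RawBuf.
pub unsafe fn raw_ref(p: *mut RawBuf) -> *mut RawBuf {
    unsafe { (*p).refs.fetch_add(1, Ordering::Relaxed) };
    p
}

/// Mimics the "unref" call: nulls the caller's pointer and frees on last release.
/// Safety: `*p` must be null or point to a live RawBuf.
pub unsafe fn raw_unref(p: &mut *mut RawBuf) {
    if (*p).is_null() {
        return;
    }
    let raw = std::mem::replace(p, ptr::null_mut());
    unsafe {
        if (*raw).refs.fetch_sub(1, Ordering::AcqRel) == 1 {
            drop(Box::from_raw(raw));
        }
    }
}

/// Owning wrapper: every Handle accounts for exactly one reference.
pub struct Handle {
    ptr: *mut RawBuf,
}

impl Handle {
    pub fn new() -> Self {
        Handle { ptr: raw_alloc() }
    }

    /// Current reference count, exposed only for demonstration.
    pub fn ref_count(&self) -> usize {
        unsafe { (*self.ptr).refs.load(Ordering::Acquire) }
    }
}

impl Clone for Handle {
    fn clone(&self) -> Self {
        Handle { ptr: unsafe { raw_ref(self.ptr) } }
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        unsafe { raw_unref(&mut self.ptr) }
    }
}
```

Returning an owning wrapper from a clone-style method, instead of a bare pointer as ref_clone() does, is one way to make it harder to forget the matching unref. Whether that fits depends on how the pointer is handed to the encoder and filter FFI calls.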
The module must handle:
- VAAPI device creation from DRM render node (/dev/dri/renderD128)
- Vulkan device creation (two-step: DRM → Vulkan derivation, then release DRM)
- Hardware frame context creation with correct parameters (capture vs encode)
- Vulkan self-referential Pin<Box<>> pattern for stable FFmpeg struct addresses
- EncodePixelFormat three-way dispatch for codec selection, filter building, frame context creation
- DRM device discovery from Wayland protocol data (the DRM device associated with the captured output)
Implementation notes:
- Use raw FFI via ffmpeg_next::ffi for ALL hardware context operations:
  - ffi::av_hwdevice_ctx_create() for VAAPI/DRM device creation
  - ffi::av_hwframe_ctx_alloc() + ffi::av_hwframe_ctx_init() for frame pool creation
  - ffi::av_buffer_ref() / ffi::av_buffer_unref() for reference counting
  - (*ctx_as_mut_ptr()).hw_device_ctx = ffi::av_buffer_ref(...) to assign HW context
- Use ff::codec::encoder::find_by_name("h264_vaapi") to find the VAAPI H.264 encoder
- For the encoder, set pixel_format to the hardware format (e.g. ff::PixelFormat::VAAPI, as used in AvHwDevCtx above) and hw_frames_ctx on the encoder's codec_context
- IMPORTANT: Use h264_metadata bitstream filter with repeat_sps=1 and repeat_pps=1 to guarantee SPS/PPS in every IDR frame. Note: repeat_headers=1 is libx264-only and does NOT work with h264_vaapi.
- DRM device path can be discovered from the Wayland compositor's zwp_linux_dmabuf_feedback_v1 or defaulted to /dev/dri/renderD128
- Step 2: Add mod avhw; to src/main.rs
- Step 3: Verify compilation
Run: cargo build
Note: FFmpeg linking errors are expected if FFmpeg dev libraries aren't installed. Ensure libavcodec-dev, libavformat-dev, libavfilter-dev, libavutil-dev, libswscale-dev are installed.
- Step 4: Commit
git add -A && git commit -m "feat: FFmpeg hardware device and frame context management"
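The DRM device note in this task (compositor feedback or a /dev/dri/renderD128 default) combines with the plan summary's resolve_drm_path() ordering of CLI > compositor > auto > fallback. A minimal sketch of that priority chain follows. The signature is hypothetical, and what the "auto" source actually does (for example scanning /dev/dri) is an assumption left to the caller.

```rust
use std::path::{Path, PathBuf};

/// Default render node used when no other source yields a path.
pub const FALLBACK_RENDER_NODE: &str = "/dev/dri/renderD128";

/// Pick the DRM device path by priority: CLI flag, then the device reported by
/// the compositor, then an auto-detected node, then the fixed fallback.
/// All three inputs are optional; how each one is obtained is outside this sketch.
pub fn resolve_drm_path(
    cli: Option<&Path>,
    compositor: Option<&Path>,
    auto_detected: Option<&Path>,
) -> PathBuf {
    cli.or(compositor)
        .or(auto_detected)
        .map(Path::to_path_buf)
        .unwrap_or_else(|| PathBuf::from(FALLBACK_RENDER_NODE))
}
```

Keeping the ordering in one function means the VAAPI and Vulkan constructors can both take the resolved path and never need to know where it came from.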
Task 8: FFmpeg Filter Graph — filter.rs
Files:
- Create: src/filter.rs
Builds the GPU video filter pipeline (crop → scale → transpose) using FFmpeg's libavfilter via ffmpeg-next.
API References:
- ffmpeg-next filter API: https://docs.rs/ffmpeg-next (filter::Graph, filter::Context)
- FFmpeg libavfilter docs: https://ffmpeg.org/ffmpeg-filters.html (crop, scale_vaapi, scale_vulkan, transpose_vaapi, transpose_vulkan)
- Step 1: Create src/filter.rs
Implement video filter graph construction:
use anyhow::Result;
use ffmpeg_next as ff;
use crate::avhw::EncodePixelFormat;
/// Build the video filter graph for the capture → encode pipeline.
///
/// Pipeline: buffersrc (HW) → crop → scale → [transpose] → buffersink
///
/// Implementation:
/// 1. Create `ff::filter::Graph::new()`
/// 2. Add `buffer` source filter with `hw_frames_ctx` set to the capture frame context:
/// - `graph.add_filter(&ff::filter::find("buffer").unwrap(), "in")`
/// - Set params: `video_size`, `pixel_format`, `time_base`, `frame_rate`
/// - Critical: set `hw_frames_ctx` on the buffersrc parameters to enable HW-accelerated filtering
/// 3. Add `crop` filter if ROI specified:
/// - `graph.add_filter(&ff::filter::find("crop").unwrap(), "crop")`
/// - Set params: `x`, `y`, `w`, `h`
/// - Use `exact=1` parameter to avoid rounding issues at edges
/// 4. Add `scale` filter (GPU-accelerated):
/// - VAAPI: `ff::filter::find("scale_vaapi")` — scales in GPU memory
/// - Vulkan: `ff::filter::find("scale_vulkan")` — scales in GPU memory
/// - Set `w` and `h` to encoder input dimensions
/// 5. Add `transpose` filter if output is rotated 90/270 degrees:
/// - VAAPI: `ff::filter::find("transpose_vaapi")`
/// - Vulkan: `ff::filter::find("transpose_vulkan")`
/// - Set `dir` parameter based on transform type
/// 6. Add `buffersink` output filter with `hw_frames_ctx` set to encode frame context:
/// - `graph.add_filter(&ff::filter::find("buffersink").unwrap(), "out")`
/// - Set `hw_frames_ctx` on the buffersink to ensure output is in the encoder's format
/// 7. Link all filters in sequence: `src → crop → scale → [transpose] → sink`
/// 8. Call `graph.validate()` to check the graph is valid
pub fn build_video_filter(
enc: &mut ff::codec::encoder::Video,
// ffmpeg-next has no safe hardware frame context type; use the wrapper from avhw.rs.
frames_rgb: &crate::avhw::AvHwFrameCtx,
frames_yuv: &crate::avhw::AvHwFrameCtx,
crop_rect: Option<(i32, i32, i32, i32)>,
needs_transpose: bool,
pixel_format: &EncodePixelFormat,
) -> Result<ff::filter::Graph> {
// Build graph as described above
todo!("add buffersrc, crop, scale, optional transpose, and buffersink, then validate")
}
Implementation notes:
- The filter graph operates entirely on GPU memory, with no CPU copies involved
- hw_frames_ctx must be set on BOTH buffersrc and buffersink for hardware filtering
- Scale filter names are backend-specific (scale_vaapi vs scale_vulkan), dispatch via EncodePixelFormat
- The crop filter with exact=1 avoids pixel alignment issues at ROI boundaries
- Transpose is only needed for 90/270 degree rotations (check via transform.rs helpers)
- After building, call graph.validate() to ensure correct linkage
- Step 2: Add mod filter; to src/main.rs
- Step 3: Verify compilation
Run: cargo build
- Step 4: Commit
git add -A && git commit -m "feat: GPU video filter graph for crop/scale/transpose"
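The backend-specific filter names and the optional crop and transpose stages from this task can be checked in isolation by building the chain as a filtergraph description string. This is only a sketch: the plan adds filters one by one through filter::Graph, and the Backend, Crop, and filter_chain names here are hypothetical. The option names (x, y, w, h, exact, dir) follow the FFmpeg filter documentation referenced above.

```rust
/// Which GPU filter family to use (hypothetical mirror of the VAAPI/Vulkan dispatch).
#[derive(Clone, Copy)]
pub enum Backend {
    Vaapi,
    Vulkan,
}

impl Backend {
    /// Suffix used by the backend-specific filter names, e.g. scale_vaapi.
    fn suffix(self) -> &'static str {
        match self {
            Backend::Vaapi => "vaapi",
            Backend::Vulkan => "vulkan",
        }
    }
}

/// Region of interest in pixels.
pub struct Crop {
    pub x: i32,
    pub y: i32,
    pub w: i32,
    pub h: i32,
}

/// Build the crop → scale → [transpose] chain as a comma-separated description.
/// `transpose_dir` is passed through as the `dir` option (for example "clock").
pub fn filter_chain(
    backend: Backend,
    crop: Option<&Crop>,
    out_w: u32,
    out_h: u32,
    transpose_dir: Option<&str>,
) -> String {
    let mut stages = Vec::new();
    if let Some(c) = crop {
        // exact=1 keeps the requested rectangle instead of rounding it.
        stages.push(format!("crop=x={}:y={}:w={}:h={}:exact=1", c.x, c.y, c.w, c.h));
    }
    stages.push(format!("scale_{}=w={}:h={}", backend.suffix(), out_w, out_h));
    if let Some(dir) = transpose_dir {
        stages.push(format!("transpose_{}=dir={}", backend.suffix(), dir));
    }
    stages.join(",")
}
```

A string form like this is also convenient for logging the active pipeline, even when the graph itself is assembled filter by filter.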
Task 9: Capture Backends — cap_*.rs
Files:
- Create: src/cap_ext_image_copy.rs
- Create: src/cap_wlr_screencopy.rs
Implement two Wayland screen capture backends using their respective protocol specifications. Both implement the CaptureSource trait.
API References:
- Wayland client crate: https://docs.rs/wayland-client (Connection, GlobalList, Dispatch trait)
- wlr-screencopy protocol: wayland-protocols-wlr crate (zwlr_screencopy_manager_v1, zwlr_screencopy_frame_v1)
- ext-image-copy-capture protocol: wayland-protocols crate with staging feature (ext_image_copy_capture_manager_v1, ext_image_copy_capture_session_v1, ext_image_copy_capture_frame_v1)
- DMA-BUF protocol: zwp_linux_dmabuf_v1 (in wayland-protocols crate), used for buffer creation and format negotiation
- Step 1: Define CaptureSource trait in src/state.rs (create minimal file)
use bytes::Bytes;
use wayland_client::protocol::wl_output::WlOutput;
/// Unified interface for screen capture backends
pub trait CaptureSource: Sized + 'static {
/// Frame type specific to this backend
type Frame;
/// Create a new capture source from Wayland globals and target output.
///
/// Implementation: Use `wayland_client::globals::GlobalList` to bind the protocol global,
/// then create a capture session targeting the specified `WlOutput`.
fn new(
gm: &wayland_client::globals::GlobalList,
output: &WlOutput,
output_info: &crate::OutputInfo,
) -> Result<Self, anyhow::Error>;
/// Allocate a frame for capture. Returns Some(frame) for synchronous
/// backends (ext-image-copy), None for async (wlr-screencopy).
fn alloc_frame(&mut self) -> Option<Self::Frame>;
/// Submit DMA-BUF buffer to compositor for capture.
///
/// Implementation: Create a `wl_buffer` from the DMA-BUF fd using `zwp_linux_dmabuf_v1`,
/// then pass it to the capture protocol's capture request.
fn queue_copy(&self, buffer: wayland_client::protocol::wl_buffer::WlBuffer);
/// Callback when frame is done being used
fn on_done_with_frame(&self, frame: Self::Frame);
}
- Step 2: Create src/cap_ext_image_copy.rs
Implement ext-image-copy-capture backend (~240 lines).
Protocol overview (ext-image-copy-capture-v1):
- ext_image_copy_capture_manager_v1: global for creating sessions
- ext_image_copy_capture_session_v1: session object, created for a specific output
- ext_image_copy_capture_frame_v1: per-frame capture object
- Session-level format negotiation (not per-frame): format is negotiated once and reused
- Synchronous allocation: alloc_frame() returns Some(frame) immediately
- Supports real modifier list via zwp_linux_dmabuf_feedback_v1 (not hardcoded LINEAR)
Implementation steps:
- Bind ext_image_copy_capture_manager_v1 global from GlobalList
- Create session for target output: manager.create_session(output)
- Implement Dispatch trait for ext_image_copy_capture_session_v1:
  - Handle format event: receive negotiated DMA-BUF format and modifier
  - Handle constraints event: receive size and modifier constraints
- Implement Dispatch trait for ext_image_copy_capture_frame_v1:
  - Handle ready event: frame data written to DMA-BUF
  - Handle failed event: capture failed
- alloc_frame(): allocate from hardware frame pool, return Some(frame)
- queue_copy(): create wl_buffer via zwp_linux_dmabuf_v1::create_params, pass DMA-BUF fd, submit frame capture
Key types:
- CapExtImageCopy struct holding session, format state, constraints
- ExtImageCopyFrame: the frame type for this backend
- Step 3: Create src/cap_wlr_screencopy.rs
Implement wlr-screencopy backend (~165 lines).
Protocol overview (zwlr-screencopy-unstable-v1):
- zwlr_screencopy_manager_v1: global for creating frame captures
- zwlr_screencopy_frame_v1: per-frame capture object (created fresh each frame)
- Per-frame format negotiation: format is negotiated for each capture
- Asynchronous allocation: alloc_frame() returns None; the frame object triggers allocation
- Uses hardcoded DrmModifier::LINEAR for buffer creation
Implementation steps:
- Bind zwlr_screencopy_manager_v1 global from GlobalList
- Implement Dispatch trait for zwlr_screencopy_frame_v1:
  - Handle buffer event: receive format, width, height, stride
  - Handle ready event: frame data written to buffer, receive timestamp
  - Handle failed event: capture failed
- For each frame:
  - Create new zwlr_screencopy_frame_v1 via manager.capture_output(output)
  - On buffer event: negotiate format, create DMA-BUF buffer
  - On ready event: frame is complete
- alloc_frame(): returns None (async; frame object is the trigger)
- queue_copy(): submit the DMA-BUF buffer to the frame object
Key types:
- CapWlrScreencopy struct holding manager, frame state
- WlrScreencopyFrame: the frame type
- Step 4: Add modules to src/main.rs
mod cap_ext_image_copy;
mod cap_wlr_screencopy;
mod state;
- Step 5: Verify compilation
Run: cargo build
Note: Wayland protocol code may require protocol XML files. Check wayland-protocols crate features.
- Step 6: Commit
git add -A && git commit -m "feat: Wayland capture backends (ext-image-copy + wlr-screencopy)"
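Both backends create their wl_buffer through the linux-dmabuf protocol, which identifies pixel formats by DRM fourcc code and takes the 64-bit format modifier as two 32-bit halves (modifier_hi, modifier_lo) when a plane is added. The sketch below shows those two conversions with the standard library only. The helper names are hypothetical, and the linear modifier value of 0 matches the DRM definition of DRM_FORMAT_MOD_LINEAR.

```rust
/// Pack four ASCII characters into a little-endian DRM fourcc code.
pub const fn fourcc(code: &[u8; 4]) -> u32 {
    (code[0] as u32)
        | ((code[1] as u32) << 8)
        | ((code[2] as u32) << 16)
        | ((code[3] as u32) << 24)
}

/// 32-bit XRGB, a common capture format.
pub const DRM_FORMAT_XRGB8888: u32 = fourcc(b"XR24");
/// Two-plane YUV 4:2:0, a typical encoder input format.
pub const DRM_FORMAT_NV12: u32 = fourcc(b"NV12");
/// Linear (untiled) layout.
pub const DRM_FORMAT_MOD_LINEAR: u64 = 0;

/// Split a 64-bit modifier into the (hi, lo) pair used on the wire.
pub fn split_modifier(modifier: u64) -> (u32, u32) {
    ((modifier >> 32) as u32, (modifier & 0xffff_ffff) as u32)
}

/// Rebuild a 64-bit modifier from the (hi, lo) pair reported by the compositor.
pub fn join_modifier(hi: u32, lo: u32) -> u64 {
    ((hi as u64) << 32) | (lo as u64)
}
```

With the hardcoded LINEAR modifier of the wlr-screencopy backend both halves are zero; the split only matters once real modifiers from compositor feedback are passed through.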
Task 10: State Machine — state.rs
Files:
- Modify: src/state.rs (expand from Task 9 minimal version)
Implement the full state machine with EncConstructionStage, InFlightSurface, and the State<S> struct.
Design reference: See architecture spec §4 for state diagrams and transitions.
- Step 1: Expand src/state.rs with full state machine
use crate::avhw::EncState;
use crate::nalu::EncodedFrame;
use crate::CaptureSource;
use async_channel::Sender;
/// Output information collected during probing
pub struct OutputInfo {
pub name: String,
pub transform: crate::transform::Transform,
pub width: i32,
pub height: i32,
pub scale: f64,
}
/// Partial output info during the ProbingOutputs stage
pub struct PartialOutputInfo {
pub name: Option<String>,
pub transform: Option<crate::transform::Transform>,
pub width: Option<i32>,
pub height: Option<i32>,
pub scale: Option<f64>,
pub wl_output: wayland_client::protocol::wl_output::WlOutput,
}
impl PartialOutputInfo {
pub fn is_complete(&self) -> bool {
self.name.is_some() && self.transform.is_some()
&& self.width.is_some() && self.height.is_some()
}
pub fn into_output_info(self) -> Option<OutputInfo> {
Some(OutputInfo {
name: self.name?,
transform: self.transform?,
width: self.width?,
height: self.height?,
scale: self.scale.unwrap_or(1.0),
})
}
}
/// State machine for encoder construction
pub enum EncConstructionStage<S: CaptureSource> {
/// Discovering Wayland outputs
ProbingOutputs {
outputs: Vec<PartialOutputInfo>,
},
/// Outputs known, waiting for format negotiation
EverythingButFmt {
output_info: OutputInfo,
hw_device_ctx: crate::avhw::AvHwDevCtx,
cap: S,
},
/// Fully operational — streaming
Streaming {
output_info: OutputInfo,
enc: EncState,
cap: S,
frame_tx: Sender<EncodedFrame>,
},
/// Output disconnected, waiting for reconnection
OutputWentAway {
output_name: String,
enc: EncState,
frame_tx: Sender<EncodedFrame>,
},
/// Transient state during transitions (mem::replace target)
Intermediate,
}
impl<S: CaptureSource> EncConstructionStage<S> {
/// Extract EncState, consuming self. Used during state transitions
/// that preserve the encoder while discarding capture objects.
pub fn take_enc(self) -> Option<(EncState, Sender<EncodedFrame>)> {
match self {
Self::Streaming { enc, frame_tx, .. } => Some((enc, frame_tx)),
Self::OutputWentAway { enc, frame_tx, .. } => Some((enc, frame_tx)),
_ => None,
}
}
}
/// Frame capture lifecycle — tracks the state of the single in-flight surface.
/// At most one frame is being processed at any time to prevent buffer exhaustion.
pub enum InFlightSurface<S: CaptureSource> {
/// No frame in flight — ready to start next capture
None,
/// Frame allocation requested, waiting for GPU surface
AllocQueued,
/// Surface allocated from hardware frame pool
Allocd(S::Frame),
/// Buffer submitted to compositor, waiting for capture completion
CopyQueued {
surface: ffmpeg_next::frame::Video,
drm_map: ffmpeg_next::ffi::AVDRMFrameDescriptor, // DMA-BUF mapping descriptor
frame: S::Frame,
buffer: wayland_client::protocol::wl_buffer::WlBuffer,
},
}
/// Global state parameterized by capture backend
pub struct State<S: CaptureSource> {
pub enc: EncConstructionStage<S>,
pub in_flight_surface: InFlightSurface<S>,
pub starting_timestamp: Option<i64>,
pub args: crate::args::Args,
pub frame_tx: Sender<EncodedFrame>,
pub errored: bool,
pub output_went_away: bool,
pub format_change: bool,
pub gm: wayland_client::globals::GlobalList,
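// xdg-output is defined in wayland-protocols, not in wayland-client's core protocol module.
pub xdg_output_manager: Option<wayland_protocols::xdg::xdg_output::zv1::client::zxdg_output_manager_v1::ZxdgOutputManagerV1>,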
pub fps_limit: Option<crate::fps_limit::FpsLimit<f64>>,
}
impl<S: CaptureSource> State<S> {
/// Create initial state in ProbingOutputs stage
pub fn new(
gm: wayland_client::globals::GlobalList,
args: crate::args::Args,
frame_tx: Sender<EncodedFrame>,
qh: &wayland_client::QueueHandle<State<S>>,
) -> Self {
Self {
enc: EncConstructionStage::ProbingOutputs {
outputs: Vec::new(),
},
in_flight_surface: InFlightSurface::None,
starting_timestamp: None,
args,
frame_tx,
errored: false,
output_went_away: false,
format_change: false,
gm,
xdg_output_manager: None,
fps_limit: None,
}
}
/// Frame captured successfully — run through filter + encode, send to transport.
///
/// Pipeline: captured HW frame → video_filter → encoder → EncodedFrame → channel
pub fn on_copy_complete(&mut self) -> anyhow::Result<()> {
// 1. Extract the captured frame from InFlightSurface
// - assert!(matches!(in_flight_surface, CopyQueued{..}))
// - Take ownership of surface, mark InFlightSurface::None
//
// 2. Push captured HW frame into filter graph (buffersrc)
// - enc.video_filter.get("in").source().add(&surface)
//
// 3. Pull filtered frame from filter graph (buffersink)
// - enc.video_filter.get("out").sink().frame(&mut filtered)
//
// 4. Send to encoder
// - enc.enc_video.send_frame(&filtered_frame)
//
// 5. Receive encoded packet
// - enc.enc_video.receive_packet(&mut packet)
// - Loop until receive_packet returns EAGAIN
//
// 6. Check if keyframe via NAL type inspection (crate::nalu::is_keyframe)
//
// 7. Wrap as EncodedFrame and send through frame_tx
// - IMPORTANT: Use try_send(), NOT send() or send_blocking()!
// try_send() is non-blocking: returns Err(TrySendError::Full(_)) if channel full.
// On full channel, log and drop the frame (acceptable for real-time streaming).
// Do NOT use send_blocking() — it would stall the mio capture pipeline.
// - frame_tx.try_send(EncodedFrame {
// data, pts_us, duration: 1_000_000/fps μs, frame_type, width, height
// })
//
// 8. Return InFlightSurface to None (ready for next frame)
Ok(())
}
/// Frame capture failed — handle errors.
pub fn on_copy_fail(&mut self) {
// 1. Check output_went_away flag
// - If true: transition to OutputWentAway (preserve encoder + transport, drop capture)
// 2. Check format_change flag
// - If true: rebuild filter + frame contexts (preserve device ctx + transport)
// 3. Otherwise: log error, set errored flag
}
/// Frame allocated — create DMA-BUF buffer and queue capture.
pub fn on_frame_allocd(&mut self, frame: S::Frame) {
// 1. Create wl_buffer from frame's DMA-BUF fd using zwp_linux_dmabuf_v1
// 2. Queue capture: cap.queue_copy(buffer)
// 3. Transition InFlightSurface: Allocd → CopyQueued
}
/// Initiate frame capture cycle — call when in Streaming state and no frame in flight.
pub fn queue_alloc_frame(&mut self) {
// 1. assert!(matches!(in_flight_surface, InFlightSurface::None))
//    (spell out the enum: a bare None would be read as Option::None)
// 2. Call cap.alloc_frame()
// - For ext-image-copy: returns Some(frame) → immediately call on_frame_allocd
// - For wlr-screencopy: returns None → set InFlightSurface::AllocQueued
// 3. Transition InFlightSurface: None → AllocQueued or Allocd
}
/// Negotiate DMA-BUF format with compositor.
/// Called during EverythingButFmt → Streaming transition.
pub fn negotiate_format(&mut self) -> anyhow::Result<()> {
// 1. Determine capture format from compositor feedback
// 2. Create hardware frame contexts (capture + encode)
// 3. Build video filter graph
// 4. Create encoder with negotiated parameters
// 5. Transition EverythingButFmt → Streaming
Ok(())
}
}
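The InFlightSurface transitions described in the comments above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the plan's actual type: the variant payloads are reduced to a hypothetical u32 frame id, and the method names queue_alloc, frame_allocd and copy_complete are made up to mirror queue_alloc_frame, on_frame_allocd and on_copy_complete.

```rust
/// Hypothetical stand-in for the plan's InFlightSurface; payloads are reduced to a frame id.
#[derive(Debug, PartialEq)]
enum InFlightSurface {
    None,
    AllocQueued,
    Allocd(u32),
    CopyQueued(u32),
}

impl InFlightSurface {
    /// Mirrors queue_alloc_frame: only legal when nothing is in flight.
    /// `immediate` models a backend whose alloc_frame() returns Some(frame).
    fn queue_alloc(&mut self, immediate: Option<u32>) {
        assert!(matches!(self, InFlightSurface::None), "a frame is already in flight");
        *self = match immediate {
            Some(id) => InFlightSurface::Allocd(id),
            None => InFlightSurface::AllocQueued,
        };
    }

    /// Mirrors on_frame_allocd: the buffer exists, so the copy can be queued.
    fn frame_allocd(&mut self, id: u32) {
        assert!(matches!(
            self,
            InFlightSurface::AllocQueued | InFlightSurface::Allocd(_)
        ));
        *self = InFlightSurface::CopyQueued(id);
    }

    /// Mirrors on_copy_complete: take the surface out and reset to None in one step.
    fn copy_complete(&mut self) -> u32 {
        match std::mem::replace(self, InFlightSurface::None) {
            InFlightSurface::CopyQueued(id) => id,
            other => panic!("copy completed in unexpected state {:?}", other),
        }
    }
}

fn main() {
    let mut surface = InFlightSurface::None;

    // wlr-screencopy style: allocation is asynchronous.
    surface.queue_alloc(None);
    surface.frame_allocd(7);
    assert_eq!(surface.copy_complete(), 7);
    assert_eq!(surface, InFlightSurface::None);

    // ext-image-copy style: the frame is available immediately.
    surface.queue_alloc(Some(8));
    surface.frame_allocd(8);
    assert_eq!(surface.copy_complete(), 8);
    println!("state machine ok");
}
```

Using std::mem::replace in copy_complete means the state is already None by the time the frame reaches the filter graph, so an early return on an encode error cannot leave a stale CopyQueued behind.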
- Step 2: Verify compilation
Run: cargo build
- Step 3: Commit
git add -A && git commit -m "feat: state machine with EncConstructionStage and InFlightSurface"
Task 11: Main Event Loop — main.rs
Files:
- Modify: src/main.rs (replace stub with full implementation)
Wire everything together: CLI → Wayland connection → state machine → mio event loop → tokio runtime.
- Step 1: Replace src/main.rs with full implementation
mod args;
mod avhw;
mod cap_ext_image_copy;
mod cap_wlr_screencopy;
mod filter;
mod fps_limit;
mod nalu;
mod signaling;
mod state;
mod transform;
mod transport;
use args::Args;
use cap_ext_image_copy::CapExtImageCopy;
use cap_wlr_screencopy::CapWlrScreencopy;
use clap::Parser;
use async_channel;
use mio::{Events, Interest, Poll, Token};
use state::{State, EncConstructionStage};
use std::os::unix::io::AsRawFd;
const TOKEN_WAYLAND: Token = Token(0);
const TOKEN_QUIT: Token = Token(1);
fn main() -> anyhow::Result<()> {
let args = Args::parse();
// Initialize logging
if args.verbose {
tracing_subscriber::fmt().with_max_level(tracing::Level::DEBUG).init();
} else {
tracing_subscriber::fmt().with_max_level(tracing::Level::INFO).init();
}
// Create async_channel for encoded frames (capture pipeline → transport)
// async_channel provides both sync try_send() and async recv()
let (frame_tx, frame_rx) = async_channel::bounded::<nalu::EncodedFrame>(16);
// Spawn tokio runtime for transport + web UI
let tokio_rt = tokio::runtime::Builder::new_multi_thread()
.enable_all()
.build()?;
let bind_addr = format!("{}:{}", args.bind, args.port).parse::<std::net::SocketAddr>()?;
let http_addr = format!("{}:{}", args.bind, args.port + 1).parse::<std::net::SocketAddr>()?;
// Start transport server on tokio
tokio_rt.spawn(async move {
if let Err(e) = async {
let server = transport::TransportServer::new(bind_addr, frame_rx)?;
server.run().await
}
.await
{
tracing::error!("Transport server error: {}", e);
}
});
// Start HTTP server for Web UI on tokio
tokio_rt.spawn(async move {
if let Err(e) = signaling::serve(http_addr).await {
tracing::error!("HTTP server error: {}", e);
}
});
// Connect to Wayland compositor
// wayland-client 0.31 API: Connection (not deprecated Display)
let conn = wayland_client::Connection::connect_to_env()?;
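// Use registry_queue_init for initial global discovery.
// The state type must implement Dispatch<WlRegistry, GlobalListContents>, which ()
// does not, so a local no-op probe type stands in until the backend is chosen.
struct Probe;
impl wayland_client::Dispatch<
wayland_client::protocol::wl_registry::WlRegistry,
wayland_client::globals::GlobalListContents,
> for Probe {
fn event(
_state: &mut Self,
_registry: &wayland_client::protocol::wl_registry::WlRegistry,
_event: wayland_client::protocol::wl_registry::Event,
_data: &wayland_client::globals::GlobalListContents,
_conn: &wayland_client::Connection,
_qh: &wayland_client::QueueHandle<Self>,
) {
}
}
// This returns (GlobalList, EventQueue<Probe>); the queue is only kept alive, not polled.
let (gm, _probe_queue) = wayland_client::globals::registry_queue_init::<Probe>(&conn)?;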
// Determine which capture backends are available
// Priority: ext-image-copy-capture > wlr-screencopy
// wayland-client 0.31: the advertised globals are read through GlobalList::contents().
let (has_ext, has_wlr): (bool, bool) = gm.contents().with_list(|globals| {
let has = |name: &str| globals.iter().any(|g| g.interface == name);
(
has("ext_image_copy_capture_manager_v1"),
has("zwlr_screencopy_manager_v1"),
)
});
if !has_ext && !has_wlr {
anyhow::bail!(
"No supported screen capture protocol found. \
Need ext-image-copy-capture-v1 or zwlr-screencopy-unstable-v1."
);
}
tracing::info!(
"Capture backends: ext-image-copy={}, wlr-screencopy={}",
has_ext, has_wlr
);
// Set up mio poll for Wayland fd + signals
let mut poll = Poll::new()?;
let mut events = Events::with_capacity(1024);
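// Register Wayland fd with mio
// wayland-client 0.31 exposes the socket through the AsFd impl on Connection.
use std::os::fd::AsFd;
let wl_fd = conn.as_fd().as_raw_fd();
let mut wl_event_source = mio::unix::SourceFd(&wl_fd);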
poll.registry().register(
&mut wl_event_source,
TOKEN_WAYLAND,
Interest::READABLE,
)?;
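// Set up signal handling via signal-hook's self-pipe helper + mio.
// signal_hook::iterator::Signals does not expose a pollable fd, so the low-level
// pipe helper is used instead: it writes a byte to signal_tx when a signal arrives.
let (signal_rx, signal_tx) = std::os::unix::net::UnixStream::pair()?;
signal_hook::low_level::pipe::register(signal_hook::consts::SIGINT, signal_tx.try_clone()?)?;
signal_hook::low_level::pipe::register(signal_hook::consts::SIGTERM, signal_tx)?;
let signal_fd = signal_rx.as_raw_fd();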
let mut signal_source = mio::unix::SourceFd(&signal_fd);
poll.registry().register(
&mut signal_source,
TOKEN_QUIT,
Interest::READABLE,
)?;
// Monomorphized backend selection: run the event loop with the chosen backend.
// Each branch creates a typed EventQueue<State<S>> and runs the same event loop.
// This avoids the CaptureBackend enum dispatch problem — the Wayland Dispatch trait
// requires compile-time monomorphization (State<CapExtImageCopy> vs State<CapWlrScreencopy>).
if has_ext {
run_event_loop::<CapExtImageCopy>(conn, gm, args, frame_tx, &mut poll, &mut events)?;
} else {
run_event_loop::<CapWlrScreencopy>(conn, gm, args, frame_tx, &mut poll, &mut events)?;
}
// Cleanup
tokio_rt.shutdown_background();
tracing::info!("wl-webrtc stopped");
Ok(())
}
/// Generic event loop, monomorphized per capture backend.
/// Both branches compile the same event loop code with the concrete State<S> type.
fn run_event_loop<S: state::CaptureSource>(
conn: wayland_client::Connection,
gm: wayland_client::globals::GlobalList,
args: Args,
frame_tx: async_channel::Sender<nalu::EncodedFrame>,
poll: &mut Poll,
events: &mut Events,
) -> anyhow::Result<()> {
// Create typed event queue for the actual event loop.
// wayland-client 0.31: Connection::new_event_queue::<T>() creates EventQueue<T>
let mut event_queue = conn.new_event_queue::<State<S>>();
let qh = event_queue.handle();
// Create initial state in ProbingOutputs stage
let mut state = State::<S>::new(gm, args, frame_tx, &qh);
tracing::info!("wl-webrtc event loop starting (backend: {})", std::any::type_name::<S>());
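// Main event loop
'main: loop {
// Poll with timeout to prevent blocking forever.
// A signal can interrupt the underlying wait; retry instead of bailing out.
match poll.poll(events, Some(std::time::Duration::from_millis(100))) {
Ok(()) => {}
Err(e) if e.kind() == std::io::ErrorKind::Interrupted => continue 'main,
Err(e) => return Err(e.into()),
}
for event in events.iter() {
match event.token() {
TOKEN_QUIT => {
tracing::info!("Shutdown signal received");
break 'main;
}
TOKEN_WAYLAND => {
// wayland-client 0.31: dispatch_pending takes only the state and does not
// read from the socket, so pull events into the queue first.
if let Some(guard) = event_queue.prepare_read() {
match guard.read() {
Ok(_) => {}
// A spurious wakeup leaves nothing to read; that is not an error.
Err(wayland_client::backend::WaylandError::Io(e))
if e.kind() == std::io::ErrorKind::WouldBlock => {}
Err(e) => return Err(e.into()),
}
}
// Dispatch Wayland events. This drives:
// output probing, format negotiation, frame capture callbacks
event_queue.dispatch_pending(&mut state)?;
}
_ => {}
}
}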
// After dispatch, initiate next frame capture if ready
if matches!(state.enc, EncConstructionStage::Streaming { .. })
&& matches!(state.in_flight_surface, state::InFlightSurface::None)
{
state.queue_alloc_frame();
}
// Check for fatal errors
if state.errored {
tracing::error!("Fatal error in capture pipeline, exiting");
break;
}
// Flush any pending Wayland requests
event_queue.flush()?;
}
Ok(())
}
Implementation notes:
- Wayland-client 0.31 API: Uses Connection::connect_to_env() (not deprecated Display), registry_queue_init() for global discovery, and conn.new_event_queue::<T>() for typed event queues. See wl-screenrec/src/main.rs for the reference pattern.
- Monomorphized backend selection (NOT enum dispatch): The run_event_loop::<S>() function is generic over the capture backend. At startup, we call it with either CapExtImageCopy or CapWlrScreencopy based on available Wayland globals. This is the same pattern wl-screenrec uses: the Wayland Dispatch trait requires compile-time monomorphization. The CaptureBackend enum dispatch pattern from earlier drafts was incorrect because Dispatch implementations are per-State<S> type, not per-enum-variant.
- Signal handling uses signal-hook crate with mio integration. SIGINT/SIGTERM trigger clean shutdown.
- The event loop is straightforward:
  - mio::poll blocks until Wayland fd is readable or signal arrives
  - event_queue.dispatch_pending processes all pending Wayland events (frame captures, output changes, etc.)
  - After dispatch, check if a new frame capture should be initiated
  - 100ms timeout prevents indefinite blocking if no events arrive
- Dependency: signal-hook already in Cargo.toml from Task 1.
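The monomorphized selection described in the notes above can be illustrated without any Wayland types. The sketch below is an assumption-laden reduction: the CaptureSource trait shown here, its NAME constant, the capture() method, and the select_backend helper are all hypothetical stand-ins, and only the shape (one generic function, one runtime branch that picks the type parameter) reflects the plan.

```rust
/// Hypothetical, reduced stand-in for the plan's CaptureSource trait.
trait CaptureSource {
    const NAME: &'static str;
    fn new() -> Self;
    /// Returns the id of the captured frame (illustrative only).
    fn capture(&mut self) -> u32;
}

struct ExtImageCopy { next: u32 }
struct WlrScreencopy { next: u32 }

impl CaptureSource for ExtImageCopy {
    const NAME: &'static str = "ext-image-copy";
    fn new() -> Self { ExtImageCopy { next: 0 } }
    fn capture(&mut self) -> u32 { self.next += 1; self.next }
}

impl CaptureSource for WlrScreencopy {
    const NAME: &'static str = "wlr-screencopy";
    fn new() -> Self { WlrScreencopy { next: 100 } }
    fn capture(&mut self) -> u32 { self.next += 1; self.next }
}

/// State<ExtImageCopy> and State<WlrScreencopy> are distinct types, which is
/// what allows per-type trait impls (Dispatch in the real code) to exist.
struct State<S: CaptureSource> { cap: S, last_frame: u32 }

/// One copy of this function is compiled per backend.
fn run_event_loop<S: CaptureSource>(iterations: u32) -> (&'static str, u32) {
    let mut state = State { cap: S::new(), last_frame: 0 };
    for _ in 0..iterations {
        state.last_frame = state.cap.capture();
    }
    (S::NAME, state.last_frame)
}

/// The only runtime branch: pick the type parameter once, at startup.
fn select_backend(has_ext: bool, has_wlr: bool, iterations: u32) -> Option<(&'static str, u32)> {
    if has_ext {
        Some(run_event_loop::<ExtImageCopy>(iterations))
    } else if has_wlr {
        Some(run_event_loop::<WlrScreencopy>(iterations))
    } else {
        None
    }
}

fn main() {
    assert_eq!(select_backend(true, true, 3), Some(("ext-image-copy", 3)));
    assert_eq!(select_backend(false, true, 3), Some(("wlr-screencopy", 103)));
    assert_eq!(select_backend(false, false, 3), None);
    println!("backend selection ok");
}
```

The cost of this design is that the event loop is compiled twice; the benefit is that no capture call goes through an enum match or a vtable on the hot path.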
- Step 2: Verify full project compiles
Run: cargo build
- Step 3: Commit
git add -A && git commit -m "feat: main event loop with Wayland capture and tokio transport"
Task 12: Integration Testing
Files:
- Create: tests/integration_test.rs
- Step 1: Create integration test scaffolding
/// Integration test: verify the full pipeline can start.
/// This test requires:
/// - A running Wayland compositor (or CI with weston headless)
/// - FFmpeg with VAAPI support (or software fallback)
/// - A display to capture
///
/// The test is tagged #[ignore] for CI environments without hardware.
#[cfg(test)]
mod tests {
use std::process::Command;
#[test]
#[ignore]
fn test_binary_starts() {
// Verify the binary starts and doesn't crash immediately
let output = Command::new("cargo")
.args(["run", "--", "--help"])
.output()
.expect("Failed to run wl-webrtc");
assert!(output.status.success());
let stdout = String::from_utf8_lossy(&output.stdout);
assert!(stdout.contains("wl-webrtc"));
assert!(stdout.contains("port"));
assert!(stdout.contains("codec"));
}
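#[test]
fn test_transport_server_binds() {
// Placeholder until the QUIC server can be bound without Wayland:
// exercises the sync-send / async-recv channel bridge that feeds it.
// Vec<u8> stands in for EncodedFrame, which a binary crate does not
// export to tests/; the explicit type also lets the channel type be inferred.
let (tx, rx) = async_channel::bounded::<Vec<u8>>(16);
let rt = tokio::runtime::Runtime::new().unwrap();
// Sync side (mio thread in the real pipeline)
tx.try_send(vec![0, 0, 0, 1]).expect("channel has capacity");
// Async side (tokio transport in the real pipeline)
let received = rt.block_on(async { rx.recv().await });
assert_eq!(received.unwrap(), vec![0, 0, 0, 1]);
}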
}
- Step 2: Run unit tests
Run: cargo test
Expected: fps_limit tests pass, nalu tests pass, test_binary_starts is skipped (#[ignore]), test_transport_server_binds passes without a compositor
- Step 3: Verify full build in release mode
Run: cargo build --release
Expected: Compiles successfully
- Step 4: Final commit
git add -A && git commit -m "feat: integration test scaffolding and release build verification"
Self-Review Checklist
- Spec coverage: Every module in spec Section 2.3 has a corresponding task
- Placeholder scan: No TBD/TODO in step descriptions — all steps have concrete code or explicit implementation guidance referencing official protocol specs and public API documentation
- Known gaps:
  - Task 7 (avhw.rs) and Task 8 (filter.rs) require deep FFmpeg API knowledge. Code shown is structural, not complete implementation. The actual implementation requires working with raw FFI bindings via ffmpeg_next::ffi for hardware context management (AVBufferRef, AVHWFramesContext). See wl-screenrec/src/avhw.rs for reference pattern.
  - Task 11 (main.rs) uses monomorphized run_event_loop::<S>() generic function, one branch per capture backend. This avoids the CaptureBackend enum dispatch problem (Wayland Dispatch trait requires compile-time monomorphization).
  - wtransport crate handles HTTP/3 + WebTransport protocol internally. Browser new WebTransport() works directly; no manual h3 integration needed.
Implementation complexity notes:
- ffmpeg-next crate (v8.1.0) does NOT provide safe wrappers for AVBufferRef (hardware device context) or AVHWFramesContext (hardware frame context). These operations require using the raw FFI bindings via ffmpeg_next::ffi. Specifically:
  - ffi::av_hwdevice_ctx_create(), ffi::av_hwframe_ctx_alloc(), ffi::av_hwframe_ctx_init(), ffi::av_hwframe_transfer_data(), which transfers between DMA-BUF and VAAPI surfaces.
  - (*ctx.as_mut_ptr()).hw_device_ctx = ffi::av_buffer_ref(...) to assign HW context to encoder/filter.
  - The safe API covers: codec lookup, encoder configuration, filter graph, and encode/decode flow.
- Encoder MUST use h264_metadata BSF with repeat_sps=1 and repeat_pps=1 to guarantee SPS/PPS in every IDR frame. Note: repeat_headers=1 is libx264-only and does NOT work with h264_vaapi. Required for WebCodecs Annex B mode on the browser side.
- Task 11 (main.rs) uses monomorphized run_event_loop::<S>(), NOT enum dispatch. Wayland's Dispatch trait requires compile-time monomorphization. See wl-screenrec for the reference pattern.
- async_channel::bounded(16) provides both sync try_send() (from mio main thread) and async recv() (from tokio transport). Main thread uses try_send(): if channel is full, the frame is dropped and logged. This prevents GPU pipeline stalls. Do NOT use send_blocking(); it would stall the mio capture pipeline.
- wtransport crate (v0.7) provides complete WebTransport-over-HTTP/3 server. No raw quinn or h3 integration needed. Browser new WebTransport(url) works directly.
- WebCodecs Annex B mode: Browser configures VideoDecoder WITHOUT description field. Per W3C AVC WebCodecs Registration, omitting description activates Annex B mode: all EncodedVideoChunk data must use start-code-prefixed NAL units (which our encoder outputs). The description field forces AVC (length-prefixed) mode, so do NOT provide it.
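The drop-on-full rule for the frame channel can be sketched with the standard library alone. Note the substitution: std::sync::mpsc::sync_channel stands in for async_channel here so the example has no dependencies, and its disconnect variant is named Disconnected where async_channel uses Closed. The EncodedFrame fields, SendOutcome and offer_frame are hypothetical names for illustration.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};

/// Hypothetical, reduced EncodedFrame (the plan's struct has more fields).
#[derive(Debug)]
struct EncodedFrame {
    pts_us: u64,
    data: Vec<u8>,
}

#[derive(Debug, PartialEq)]
enum SendOutcome {
    Sent,
    Dropped,
    Disconnected,
}

/// Never blocks the capture thread: a full channel costs one frame, not a stall.
fn offer_frame(tx: &SyncSender<EncodedFrame>, frame: EncodedFrame) -> SendOutcome {
    match tx.try_send(frame) {
        Ok(()) => SendOutcome::Sent,
        Err(TrySendError::Full(dropped)) => {
            eprintln!("channel full, dropping frame pts={}us", dropped.pts_us);
            SendOutcome::Dropped
        }
        Err(TrySendError::Disconnected(_)) => SendOutcome::Disconnected,
    }
}

fn main() {
    // Capacity 2 instead of 16 so the full case is easy to reach.
    let (tx, rx) = sync_channel::<EncodedFrame>(2);

    let outcomes: Vec<SendOutcome> = (0..3u64)
        .map(|i| {
            let frame = EncodedFrame { pts_us: i * 16_667, data: vec![0, 0, 0, 1] };
            offer_frame(&tx, frame)
        })
        .collect();
    assert_eq!(
        outcomes,
        vec![SendOutcome::Sent, SendOutcome::Sent, SendOutcome::Dropped]
    );

    // The consumer still sees the oldest frame first.
    let first = rx.recv().unwrap();
    assert_eq!((first.pts_us, first.data.len()), (0, 4));

    // Once the receiver is gone, the producer learns about it without blocking.
    drop(rx);
    let late = EncodedFrame { pts_us: 50_000, data: Vec::new() };
    assert_eq!(offer_frame(&tx, late), SendOutcome::Disconnected);
}
```

Dropping the newest frame keeps ordering intact for the consumer. If the dropped frame was a keyframe, the real pipeline would presumably also want to request a fresh IDR, which this sketch does not model.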
Implementation Gap Notes
Warning: The following tasks require significant implementation work not fully specified in the code blocks above. An implementing agent MUST consult the referenced source code for complete patterns.
Task 7 (avhw.rs) — FFmpeg FFI Completeness
The EncState::new() method is described structurally but not fully implemented. The implementing agent MUST:
- Provide complete EncState::new() with all FFI calls verified against wl-screenrec/src/avhw.rs
- Implement the AvHwDevCtx → encoder hw_frames_ctx assignment via raw FFI
- Implement the Vulkan Pin<Box<>> self-referential pattern
- Add h264_metadata BSF to the encoding pipeline for SPS/PPS injection
- Verify VAAPI output at runtime by test-encoding one frame and inspecting NAL units
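The last item, inspecting NAL units of a test-encoded frame, could look roughly like the sketch below. It assumes Annex B framing (00 00 01 or 00 00 00 01 start codes) and uses the H.264 NAL unit type values 5 (IDR slice), 7 (SPS) and 8 (PPS). The function names are hypothetical and are not the plan's crate::nalu API.

```rust
const NAL_IDR: u8 = 5;
const NAL_SPS: u8 = 7;
const NAL_PPS: u8 = 8;

/// Collect the NAL unit type (low 5 bits of the header byte) that follows
/// each start code. A 4-byte start code contains the 3-byte one at offset 1,
/// so scanning for 00 00 01 covers both forms.
fn nal_unit_types(data: &[u8]) -> Vec<u8> {
    let mut types = Vec::new();
    let mut i = 0;
    while i + 3 <= data.len() {
        if data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1 {
            if let Some(&header) = data.get(i + 3) {
                types.push(header & 0x1F);
            }
            i += 3;
        } else {
            i += 1;
        }
    }
    types
}

/// A packet counts as a keyframe if it contains an IDR slice.
fn contains_idr(data: &[u8]) -> bool {
    nal_unit_types(data).contains(&NAL_IDR)
}

/// True when SPS and PPS both appear ahead of the IDR slice in this packet.
fn idr_has_parameter_sets(data: &[u8]) -> bool {
    let types = nal_unit_types(data);
    match types.iter().position(|&t| t == NAL_IDR) {
        Some(idr) => types[..idr].contains(&NAL_SPS) && types[..idr].contains(&NAL_PPS),
        None => false,
    }
}

fn main() {
    // Synthetic packet: SPS (0x67), PPS (0x68), IDR slice (0x65); payloads are dummy bytes.
    let keyframe = [
        0, 0, 0, 1, 0x67, 0x42, // SPS, 4-byte start code
        0, 0, 0, 1, 0x68, 0xCE, // PPS, 4-byte start code
        0, 0, 1, 0x65, 0x88, // IDR, 3-byte start code
    ];
    // Synthetic non-IDR slice (type 1).
    let delta = [0, 0, 0, 1, 0x41, 0x9A];
    // IDR without parameter sets: the case the BSF requirement is meant to rule out.
    let bare_idr = [0, 0, 0, 1, 0x65, 0x88];

    assert_eq!(nal_unit_types(&keyframe), vec![7, 8, 5]);
    assert!(contains_idr(&keyframe) && idr_has_parameter_sets(&keyframe));
    assert!(!contains_idr(&delta));
    assert!(contains_idr(&bare_idr) && !idr_has_parameter_sets(&bare_idr));
    println!("NAL inspection ok");
}
```

Running a check like idr_has_parameter_sets on the first encoded keyframe at startup would catch a missing or misconfigured bitstream filter before any browser connects.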
Task 9 (cap_*.rs) — Wayland Dispatch Implementations
The capture backend descriptions omit the Dispatch trait implementations (~400 lines total). The implementing agent MUST:
- Provide complete Dispatch<M, UserData> implementations for all Wayland protocol objects
- For ext-image-copy: Dispatch for ExtImageCopyCaptureSessionV1, ExtImageCopyCaptureFrameV1, ZwpLinuxDmabufFeedbackV1
- For wlr-screencopy: Dispatch for ZwlrScreencopyFrameV1
- Dispatch for WlOutput and XdgOutputManagerV1/XdgOutputV1 for output probing
- Reference wl-screenrec for the complete pattern
Task Order Note
The plan implements Tasks 5-6 (transport/signaling) before Tasks 7-8 (FFmpeg). This is correct because transport.rs depends only on nalu.rs, NOT on avhw.rs. The channel-based architecture decouples the two pipelines, so transport and encoding can be developed in parallel.