# Wayland → WebRTC Remote Desktop Backend
## Technical Design Document

## Table of Contents
1. [System Architecture](#system-architecture)
2. [Technology Stack](#technology-stack)
3. [Key Components Design](#key-components-design)
4. [Data Flow Optimization](#data-flow-optimization)
5. [Low Latency Optimization](#low-latency-optimization)
6. [Implementation Roadmap](#implementation-roadmap)
7. [Potential Challenges & Solutions](#potential-challenges--solutions)

---

## System Architecture

### High-Level Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                           Client Browser                            │
│                          (WebRTC Receiver)                          │
└─────────────────────────────┬───────────────────────────────────────┘
                              │ WebRTC (UDP/TCP)
                              │ Signaling (WebSocket/HTTP)
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│                          Signaling Server                           │
│                    (WebSocket/WebSocket Secure)                     │
│  - Session Management                                               │
│  - SDP Exchange                                                     │
│  - ICE Candidates                                                   │
└─────────────────────────────┬───────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│                         Rust Backend Server                         │
├─────────────────────────────────────────────────────────────────────┤
│   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐          │
│   │   Capture    │    │   Encoder    │    │    WebRTC    │          │
│   │   Manager    │───▶│   Pipeline   │───▶│  Transport   │          │
│   └──────────────┘    └──────────────┘    └──────────────┘          │
│           │                   │                   │                 │
│           ▼                   ▼                   ▼                 │
│   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐          │
│   │   PipeWire   │    │    Video     │    │     Data     │          │
│   │    Portal    │    │   Encoder    │    │   Channels   │          │
│   │    (xdg-     │    │   (H.264/    │    │   (Input/    │          │
│   │   desktop-   │    │  H.265/VP9)  │    │   Control)   │          │
│   │   portal)    │    └──────────────┘    └──────────────┘          │
│   └──────────────┘                                                  │
│                                                                     │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │                  Zero-Copy Buffer Manager                   │   │
│   │  - DMA-BUF Import/Export                                    │   │
│   │  - Shared Memory Pools                                      │   │
│   │  - Memory Ownership Tracking                                │   │
│   └─────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│                         Wayland Compositor                          │
│                      (PipeWire Screen Sharing)                      │
└─────────────────────────────────────────────────────────────────────┘
```

### Component Breakdown

#### 1. Capture Manager

**Responsibilities:**
- Interface with PipeWire through xdg-desktop-portal
- Request screen capture permissions
- Receive DMA-BUF frames
- Manage frame buffer lifecycle

**Key Technologies:**
- `pipewire` crate for PipeWire protocol
- `wayland-client` for Wayland protocol
- `ashpd` for desktop portals

```rust
pub struct CaptureManager {
    pipewire_connection: Rc<pw::Core>,         // type parameter assumed
    stream_handle: Option<pw::stream::Stream>, // type parameter assumed
    frame_sender: async_channel::Sender<CapturedFrame>,
    config: CaptureConfig,
}

pub struct CaptureConfig {
    pub frame_rate: u32,
    pub quality: QualityLevel,
    pub screen_region: Option<ScreenRegion>,
}

pub enum QualityLevel {
    Low,
    Medium,
    High,
    Ultra,
}

pub struct CapturedFrame {
    pub dma_buf: DmaBufHandle,
    pub width: u32,
    pub height: u32,
    pub format: PixelFormat,
    pub timestamp: u64,
}
```

#### 2. Encoder Pipeline

**Responsibilities:**
- Receive raw frames from capture
- Encode to H.264/H.265/VP9
- Hardware acceleration (VA-API, NVENC)
- Bitrate adaptation

**Zero-Copy Strategy:**
- Direct DMA-BUF to encoder (no CPU copies)
- Encoder outputs to memory-mapped buffers
- WebRTC consumes encoded buffers directly

```rust
pub struct EncoderPipeline {
    encoder: Box<dyn VideoEncoder>,
    config: EncoderConfig,
    stats: EncoderStats,
}

pub trait VideoEncoder: Send + Sync {
    fn encode_frame(&mut self, frame: CapturedFrame) -> Result<EncodedFrame, EncoderError>;
    fn set_bitrate(&mut self, bitrate: u32) -> Result<(), EncoderError>;
    fn request_keyframe(&mut self) -> Result<(), EncoderError>;
}

pub struct EncodedFrame {
    pub data: Bytes, // Zero-copy Bytes wrapper
    pub is_keyframe: bool,
    pub timestamp: u64,
    pub sequence_number: u64,
}
```

#### 3. WebRTC Transport

**Responsibilities:**
- WebRTC peer connection management
- Media track (video) and data channels
- RTP packetization
- ICE/STUN/TURN handling
- Congestion control

**Libraries:**
- `webrtc` crate (webrtc-rs) or custom WebRTC implementation

```rust
pub struct WebRtcTransport {
    peer_connection: RTCPeerConnection,
    video_track: RTCVideoTrack,
    data_channel: Option<Arc<RTCDataChannel>>,
    config: WebRtcConfig,
}

pub struct WebRtcConfig {
    pub stun_servers: Vec<String>,
    pub turn_servers: Vec<TurnServer>,
    pub ice_transport_policy: IceTransportPolicy,
}

pub struct TurnServer {
    pub urls: Vec<String>,
    pub username: String,
    pub credential: String,
}
```

#### 4. Zero-Copy Buffer Manager

**Responsibilities:**
- Manage DMA-BUF lifecycle
- Pool pre-allocated memory
- Track ownership via Rust types
- Coordinate with PipeWire memory pools

```rust
pub struct BufferManager {
    dma_buf_pool: Pool<DmaBufHandle>,
    encoded_buffer_pool: Pool<Bytes>,
    max_buffers: usize,
}

impl BufferManager {
    pub fn acquire_dma_buf(&self) -> Option<DmaBufHandle> {
        self.dma_buf_pool.acquire()
    }

    pub fn release_dma_buf(&self, handle: DmaBufHandle) {
        self.dma_buf_pool.release(handle)
    }

    pub fn acquire_encoded_buffer(&self, size: usize) -> Option<Bytes> {
        self.encoded_buffer_pool.acquire_with_size(size)
    }
}
```

### Data Flow

```
Wayland Compositor
        │
        │ DMA-BUF (GPU memory)
        ▼
PipeWire Portal
        │
        │ DMA-BUF file descriptor
        ▼
Capture Manager
        │
        │ CapturedFrame { dma_buf, ... }
        │ (Zero-copy ownership transfer)
        ▼
Buffer Manager
        │
        │ DmaBufHandle (moved, not copied)
        ▼
Encoder Pipeline
        │
        │ EncodedFrame { data: Bytes, ... }
        │ (Zero-copy Bytes wrapper)
        ▼
WebRTC Transport
        │
        │ RTP Packets (reference to Bytes)
        ▼
Network (UDP/TCP)
        │
        ▼
Client Browser
```

---

## Technology Stack

### Core Dependencies

```toml
[dependencies]
# Async Runtime
tokio = { version = "1.35", features = ["full", "rt-multi-thread"] }
async-trait = "0.1"
async-channel = "2" # Used by the capture and IPC sketches below

# Wayland & PipeWire
wayland-client = "0.31"
wayland-protocols = "0.31"
pipewire = "0.8"
ashpd = "0.8"

# Video Encoding (Low Latency)
openh264 = { version = "0.6", optional = true }
x264 = { version = "0.4", optional = true }
nvenc = { version = "0.1", optional = true }
vpx = { version = "0.1", optional = true }

# Hardware Acceleration (Low Latency)
libva = { version = "0.14", optional = true } # VA-API
nvidia-encode = { version = "0.5", optional = true } # NVENC

# WebRTC (Low Latency Configuration)
# Optional so that the `webrtc-rs` feature below can enable it
webrtc = { version = "0.11", optional = true } # webrtc-rs

# Memory & Zero-Copy
bytes = "1.5"
memmap2 = "0.9"
shared_memory = "0.12"
libc = "0.2" # mmap/munmap/close in the DMA-BUF sketches below

# Lock-free data structures for minimal contention
crossbeam = { version = "0.8", features = ["std"] }
crossbeam-channel = "0.5"
crossbeam-queue = "0.3"
parking_lot = "0.12" # Faster mutexes

# Serialization
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"

# Logging & Tracing
tracing = "0.1"
tracing-subscriber = "0.3"
# Optional so that the `monitoring` feature below can enable it
tracing-opentelemetry = { version = "0.22", optional = true } # For latency monitoring

# Metrics & Monitoring
prometheus = { version = "0.13", optional = true }
metrics = "0.21"

# Error Handling
anyhow = "1.0"
thiserror = "1.0"

# Utilities
regex = "1.10"
uuid = { version = "1.6", features = ["v4", "serde", "fast-rng"] }
instant = "0.1" # High-precision timing

[features]
default = ["software-encoder", "webrtc-rs"]

# Encoder Options
software-encoder = ["x264", "openh264"]
hardware-vaapi = ["libva"]
hardware-nvenc = ["nvidia-encode"]
all-encoders = ["software-encoder", "hardware-vaapi", "hardware-nvenc"]

# WebRTC Implementation
webrtc-rs = ["webrtc"]
custom-webrtc = []

# Low Latency Features
low-latency = []
ultra-low-latency = ["low-latency", "all-encoders"]

# Monitoring
monitoring = ["prometheus", "tracing-opentelemetry"]

# Development
dev = ["monitoring", "all-encoders"]
```

### Encoder Options

| Encoder | Hardware | Performance | Quality | License | Use Case |
|---------|----------|-------------|---------|---------|----------|
| H.264 (x264) | CPU | Medium | High | GPL | Fallback |
| H.264 (VA-API) | GPU | High | Medium | Open Source | Linux Intel/AMD |
| H.264 (NVENC) | GPU (NVIDIA) | Very High | High | Proprietary | NVIDIA GPUs |
| H.265 (HEVC) | GPU | High | Very High | Mixed | Bandwidth-constrained |
| VP9 | CPU/GPU | Medium | High | BSD | Open Web |
| AV1 | GPU | Medium | Very High | Open Source | Future-proof |

**Recommended Primary:** VA-API H.264 (Intel/AMD), NVENC H.264 (NVIDIA)

**Recommended Fallback:** x264 H.264 (software)

### WebRTC Libraries

**Option 1: webrtc-rs** (Recommended)
- Pure Rust implementation
- Active development
- Good WebRTC spec compliance
- Zero-copy support for media

**Option 2: Custom Implementation**
- Use `webrtc` crate as base
- Add specialized zero-copy optimizations
- Tighter integration with encoder pipeline

---

## Key Components Design

### 1.
Wayland Screen Capture Module

```rust
// src/capture/mod.rs
use std::os::unix::io::RawFd;
use std::ptr;

use async_channel::{Receiver, Sender};
use pipewire as pw;
use pipewire::properties;
use pipewire::spa::param::format::Format;
use pipewire::stream::StreamFlags;

pub struct WaylandCapture {
    core: pw::Core,
    context: pw::Context,
    main_loop: pw::MainLoop,
    stream: pw::stream::Stream,
    frame_sender: Sender<CapturedFrame>,
    frame_receiver: Receiver<CapturedFrame>,
}

impl WaylandCapture {
    // Error type assumed: anyhow::Result
    pub async fn new(config: CaptureConfig) -> anyhow::Result<Self> {
        let main_loop = pw::MainLoop::new()?;
        let context = pw::Context::new(&main_loop)?;
        let core = context.connect(None)?;

        // Request screen capture via xdg-desktop-portal
        let portal = Portal::new().await?;
        let session = portal.create_session(ScreenCaptureType::Monitor).await?;
        let _sources = portal.request_sources(&session).await?;

        let (sender, receiver) = async_channel::bounded(30);

        // Create the stream before `context` is moved into the struct
        let stream = Self::create_stream(&context, &session, sender.clone())?;

        Ok(Self {
            core,
            context,
            main_loop,
            stream,
            frame_sender: sender,
            frame_receiver: receiver,
        })
    }

    fn create_stream(
        context: &pw::Context,
        session: &Session,
        sender: Sender<CapturedFrame>,
    ) -> anyhow::Result<pw::stream::Stream> {
        let mut stream = pw::stream::Stream::new(
            context,
            "wl-webrtc-capture",
            properties! {
                *pw::keys::MEDIA_TYPE => "Video",
                *pw::keys::MEDIA_CATEGORY => "Capture",
                *pw::keys::MEDIA_ROLE => "Screen",
            },
        )?;

        stream.connect(
            pw::spa::direction::Direction::Input,
            None,
            StreamFlags::AUTOCONNECT | StreamFlags::MAP_BUFFERS,
        )?;

        // Set up callback for new frames (zero-copy DMA-BUF)
        let listener = stream.add_local_listener()?;
        listener
            .register(pw::stream::events::Events::param_done, |data| {
                // Handle stream parameter changes
            })
            .register(pw::stream::events::Events::process, move |data| {
                // Process new frame - DMA-BUF is already mapped
                Self::process_frame(data, sender.clone());
            })?;

        Ok(stream)
    }

    fn process_frame(stream: &pw::stream::Stream, sender: Sender<CapturedFrame>) {
        // Get buffer without copying - DMA-BUF is in GPU memory
        let buffer = stream.dequeue_buffer().expect("no buffer");
        let format = stream.format().unwrap();
        let size = format.size();

        // Create zero-copy frame
        let frame = CapturedFrame {
            dma_buf: DmaBufHandle::from_buffer(&buffer),
            width: size.width,
            height: size.height,
            format: PixelFormat::from_spa_format(&format),
            timestamp: timestamp_ns(),
        };

        // Send frame (ownership transferred via move); dropped if the queue is full
        let _ = sender.try_send(frame);
    }

    pub async fn next_frame(&self) -> CapturedFrame {
        self.frame_receiver.recv().await.unwrap()
    }
}

// Zero-copy DMA-BUF handle
pub struct DmaBufHandle {
    pub fd: RawFd,
    pub size: usize,
    pub stride: u32,
    pub offset: u32,
}

impl DmaBufHandle {
    pub fn from_buffer(buffer: &pw::buffer::Buffer) -> Self {
        let data = &buffer.datas()[0];
        Self {
            // Duplicate the FD: PipeWire keeps ownership of the original,
            // and this handle closes its own copy on drop
            fd: unsafe { libc::dup(data.fd().unwrap()) },
            size: data.chunk().size() as usize,
            stride: data.chunk().stride(),
            offset: data.chunk().offset(),
        }
    }

    /// Maps the DMA-BUF for CPU reads. The caller owns the mapping and must
    /// release it with `munmap`, for example by wrapping it in `DmaBufPtr`.
    pub unsafe fn as_ptr(&self) -> *mut u8 {
        // Memory map the DMA-BUF (the offset must be page-aligned)
        let ptr = libc::mmap(
            ptr::null_mut(),
            self.size,
            libc::PROT_READ,
            libc::MAP_SHARED,
            self.fd,
            self.offset as libc::off_t,
        );

        if ptr == libc::MAP_FAILED {
            panic!("Failed to mmap DMA-BUF");
        }

        ptr as *mut u8
    }
}

impl Drop for DmaBufHandle {
    fn drop(&mut self) {
        // Close the FD when the handle is dropped; mappings created by
        // as_ptr() are released by whoever owns them
        unsafe {
            libc::close(self.fd);
        }
    }
}
```

### 2. Frame Buffer Management (Zero-Copy)

```rust
// src/buffer/mod.rs
use bytes::Bytes;
use std::collections::VecDeque;
use std::marker::PhantomData;

pub struct FrameBufferPool {
    dma_bufs: VecDeque<DmaBufHandle>,
    encoded_buffers: VecDeque<Bytes>,
    max_dma_bufs: usize,
    max_encoded: usize,
}

impl FrameBufferPool {
    pub fn new(max_dma_bufs: usize, max_encoded: usize) -> Self {
        Self {
            dma_bufs: VecDeque::with_capacity(max_dma_bufs),
            encoded_buffers: VecDeque::with_capacity(max_encoded),
            max_dma_bufs,
            max_encoded,
        }
    }

    pub fn acquire_dma_buf(&mut self) -> Option<DmaBufHandle> {
        self.dma_bufs.pop_front()
    }

    pub fn release_dma_buf(&mut self, buf: DmaBufHandle) {
        if self.dma_bufs.len() < self.max_dma_bufs {
            self.dma_bufs.push_back(buf);
        }
        // Else: Drop the buffer, let OS reclaim DMA-BUF
    }

    pub fn acquire_encoded_buffer(&mut self, size: usize) -> Bytes {
        // Try to reuse existing buffer
        if let Some(mut buf) = self.encoded_buffers.pop_front() {
            if buf.len() >= size {
                // Slice to requested size (zero-copy view)
                return buf.split_to(size);
            }
        }

        // Allocate new buffer if needed
        Bytes::from(vec![0u8; size])
    }

    pub fn release_encoded_buffer(&mut self, buf: Bytes) {
        if self.encoded_buffers.len() < self.max_encoded {
            self.encoded_buffers.push_back(buf);
        }
        // Else: Drop the buffer, memory freed
    }
}

// Zero-copy frame wrapper
pub struct ZeroCopyFrame {
    pub data: Bytes, // Reference-counted, no copying
    pub metadata: FrameMetadata,
}

pub struct FrameMetadata {
    pub width: u32,
    pub height: u32,
    pub format: PixelFormat,
    pub timestamp: u64,
    pub is_keyframe: bool,
}

// Smart pointer that owns a CPU mapping of a DMA-BUF
pub struct DmaBufPtr {
    ptr: *mut u8,
    len: usize,
    _marker: PhantomData<&'static mut [u8]>,
}

impl DmaBufPtr {
    /// `ptr` must come from a successful `mmap` of `len` bytes.
    pub unsafe fn new(ptr: *mut u8, len: usize) -> Self {
        Self {
            ptr,
            len,
            _marker: PhantomData,
        }
    }

    pub fn as_slice(&self) -> &[u8] {
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

unsafe impl Send for DmaBufPtr {}
unsafe impl Sync for DmaBufPtr {}

impl Drop for DmaBufPtr {
    fn drop(&mut self) {
        // Release the mapping created by DmaBufHandle::as_ptr()
        unsafe {
            libc::munmap(self.ptr as *mut libc::c_void, self.len);
        }
    }
}
```

### 3. Video Encoder Integration

```rust
// src/encoder/mod.rs
use async_trait::async_trait;
use bytes::Bytes;

use crate::buffer::DmaBufPtr;

#[allow(non_camel_case_types)]
pub enum EncoderType {
    H264_VAAPI,
    H264_NVENC,
    H264_X264,
    VP9_VAAPI,
}

pub struct EncoderConfig {
    pub encoder_type: EncoderType,
    pub bitrate: u32,
    pub keyframe_interval: u32,
    pub preset: EncodePreset,
}

#[derive(Clone, Copy)]
pub enum EncodePreset {
    Ultrafast,
    Superfast,
    Veryfast,
    Faster,
    Fast,
    Medium,
    Slow,
    Slower,
    Veryslow,
}

#[async_trait]
pub trait VideoEncoder: Send + Sync {
    async fn encode(&mut self, frame: CapturedFrame) -> Result<EncodedFrame, EncoderError>;
    async fn reconfigure(&mut self, config: EncoderConfig) -> Result<(), EncoderError>;
    async fn request_keyframe(&mut self) -> Result<(), EncoderError>;
}

pub struct VaapiEncoder {
    display: va::Display,
    context: va::Context,
    config: EncoderConfig,
    sequence_number: u64,
}

impl VaapiEncoder {
    pub fn new(config: EncoderConfig) -> Result<Self, EncoderError> {
        let display = va::Display::open(None)?;
        let context = va::Context::new(&display)?;

        Ok(Self {
            display,
            context,
            config,
            sequence_number: 0,
        })
    }
}

#[async_trait]
impl VideoEncoder for VaapiEncoder {
    async fn encode(&mut self, frame: CapturedFrame) -> Result<EncodedFrame, EncoderError> {
        // Zero-copy: Import DMA-BUF directly into VA-API surface
        let surface = unsafe {
            self.context.import_dma_buf(
                frame.dma_buf.fd,
                frame.width,
                frame.height,
                frame.format.as_va_format(),
            )?
        };

        // Encode frame (hardware accelerated); borrow the surface so it can
        // still be queried afterwards
        let encoded_data = self.context.encode_surface(&surface)?;

        // Create zero-copy Bytes wrapper
        let bytes = Bytes::from(encoded_data);

        self.sequence_number += 1;

        Ok(EncodedFrame {
            data: bytes,
            is_keyframe: surface.is_keyframe(),
            timestamp: frame.timestamp,
            sequence_number: self.sequence_number,
        })
    }

    async fn reconfigure(&mut self, config: EncoderConfig) -> Result<(), EncoderError> {
        // Apply the new settings first, then store the config (avoids use after move)
        self.context.set_bitrate(config.bitrate)?;
        self.context.set_preset(config.preset)?;
        self.config = config;
        Ok(())
    }

    async fn request_keyframe(&mut self) -> Result<(), EncoderError> {
        self.context.force_keyframe()?;
        Ok(())
    }
}

// Fallback software encoder
pub struct X264Encoder {
    encoder: x264::Encoder,
    config: EncoderConfig,
    sequence_number: u64,
}

impl X264Encoder {
    pub fn new(config: EncoderConfig) -> Result<Self, EncoderError> {
        let mut params = x264::Params::default();
        params.set_width(1920);
        params.set_height(1080);
        params.set_fps(60, 1);
        params.set_bitrate(config.bitrate);
        params.set_preset(config.preset);
        params.set_tune("zerolatency");

        let encoder = x264::Encoder::open(&params)?;

        Ok(Self {
            encoder,
            config,
            sequence_number: 0,
        })
    }
}

#[async_trait]
impl VideoEncoder for X264Encoder {
    async fn encode(&mut self, frame: CapturedFrame) -> Result<EncodedFrame, EncoderError> {
        // Map the DMA-BUF into CPU memory; unmapped when `mapping` is dropped
        let mapping = unsafe { DmaBufPtr::new(frame.dma_buf.as_ptr(), frame.dma_buf.size) };

        // Convert to YUV if needed (the one CPU copy on the software path)
        let yuv_frame = self.convert_to_yuv(mapping.as_slice(), frame.width, frame.height)?;

        // Encode frame
        let encoded_data = self.encoder.encode(&yuv_frame)?;

        self.sequence_number += 1;

        Ok(EncodedFrame {
            data: Bytes::from(encoded_data),
            is_keyframe: self.encoder.is_keyframe(),
            timestamp: frame.timestamp,
            sequence_number: self.sequence_number,
        })
    }

    async fn reconfigure(&mut self, config: EncoderConfig) -> Result<(), EncoderError> {
        self.config = config;
        // Reopen encoder with new params
        Ok(())
    }

    async fn request_keyframe(&mut self) -> Result<(), EncoderError> {
        self.encoder.force_keyframe();
        Ok(())
    }
}
```

### 4. WebRTC Signaling and Data Transport

```rust
// src/webrtc/mod.rs
use std::collections::HashMap;
use std::sync::Arc;
use std::time::Duration;

use bytes::Bytes;
use serde::{Deserialize, Serialize};
use tokio::sync::Mutex;
use webrtc::{
    api::{APIBuilder, API},
    data_channel::{data_channel_message::DataChannelMessage, RTCDataChannel},
    ice_transport::ice_server::RTCIceServer,
    peer_connection::{
        configuration::RTCConfiguration,
        peer_connection_state::RTCPeerConnectionState,
        sdp::session_description::RTCSessionDescription,
        RTCPeerConnection,
    },
    rtp_transceiver::rtp_codec::RTCRtpCodecCapability,
    track::track_local::{track_local_static_sample::TrackLocalStaticSample, TrackLocal},
};

pub struct WebRtcServer {
    api: API,
    peer_connections: Arc<Mutex<HashMap<String, PeerConnection>>>,
    signaling_server: SignalingServer,
}

impl WebRtcServer {
    pub async fn new(config: WebRtcConfig) -> Result<Self, WebRtcError> {
        let api = APIBuilder::new().build();
        let signaling_server = SignalingServer::new(config.signaling_addr).await?;

        Ok(Self {
            api,
            peer_connections: Arc::new(Mutex::new(HashMap::new())),
            signaling_server,
        })
    }

    pub async fn create_peer_connection(
        &self,
        session_id: String,
        video_track: TrackLocalStaticSample,
    ) -> Result<String, WebRtcError> {
        let config = RTCConfiguration {
            ice_servers: vec![RTCIceServer {
                urls: self.signaling_server.stun_servers(),
                ..Default::default()
            }],
            ..Default::default()
        };

        let pc = self.api.new_peer_connection(config).await?;

        // Add video track
        let _rtp_sender = pc.add_track(Arc::new(video_track)).await?;

        // Set ICE candidate handler (handler setters are assumed to be
        // synchronous, as in recent webrtc-rs releases)
        let peer_connections = self.peer_connections.clone();
        pc.on_ice_candidate(Box::new(move |candidate| {
            let peer_connections = peer_connections.clone();
            Box::pin(async move {
                if let Some(candidate) = candidate {
                    // Send candidate to signaling server
                    // ...
                }
            })
        }));

        // Store peer connection
        self.peer_connections
            .lock()
            .await
            .insert(session_id.clone(), PeerConnection::new(pc));

        Ok(session_id)
    }

    pub async fn send_video_frame(
        &self,
        session_id: &str,
        frame: EncodedFrame,
    ) -> Result<(), WebRtcError> {
        let peer_connections = self.peer_connections.lock().await;

        if let Some(peer) = peer_connections.get(session_id) {
            peer.video_track
                .write_sample(&webrtc::media::Sample {
                    // Bytes is moved into the sample, not copied
                    data: frame.data,
                    // Sample duration is the frame interval, not the capture
                    // timestamp (60 FPS assumed here)
                    duration: Duration::from_secs(1) / 60,
                    ..Default::default()
                })
                .await?;
        }

        Ok(())
    }
}

pub struct PeerConnection {
    pc: RTCPeerConnection,
    video_track: Arc<TrackLocalStaticSample>,
    data_channel: Option<Arc<RTCDataChannel>>,
}

impl PeerConnection {
    pub async fn create_offer(&mut self) -> Result<RTCSessionDescription, WebRtcError> {
        let offer = self.pc.create_offer(None).await?;
        self.pc.set_local_description(offer.clone()).await?;
        Ok(offer)
    }

    pub async fn set_remote_description(
        &mut self,
        desc: RTCSessionDescription,
    ) -> Result<(), WebRtcError> {
        self.pc.set_remote_description(desc).await?;
        Ok(())
    }

    pub async fn create_answer(&mut self) -> Result<RTCSessionDescription, WebRtcError> {
        let answer = self.pc.create_answer(None).await?;
        self.pc.set_local_description(answer.clone()).await?;
        Ok(answer)
    }
}

// Data channel for input/control
pub struct DataChannelManager {
    channels: HashMap<String, Arc<RTCDataChannel>>,
}

impl DataChannelManager {
    pub async fn send_input(&self, channel_id: &str, input: InputEvent) -> anyhow::Result<()> {
        if let Some(channel) = self.channels.get(channel_id) {
            // RTCDataChannel::send takes &Bytes
            let data = Bytes::from(serde_json::to_vec(&input)?);
            channel.send(&data).await?;
        }
        Ok(())
    }

    pub fn on_input<F>(&mut self, channel_id: String, callback: F)
    where
        F: Fn(InputEvent) + Send + Sync + 'static,
    {
        if let Some(channel) = self.channels.get(&channel_id) {
            channel.on_message(Box::new(move |msg: DataChannelMessage| {
                if let Ok(input) = serde_json::from_slice::<InputEvent>(&msg.data) {
                    callback(input);
                }
                // The handler must return a future
                Box::pin(async {})
            }));
        }
    }
}

#[derive(Debug, Serialize, Deserialize)]
pub enum InputEvent {
    MouseMove { x: f32, y: f32 },
    MouseClick { button: MouseButton },
    KeyPress { key: String },
    KeyRelease { key: String },
}

#[derive(Debug, Serialize, Deserialize)]
pub enum MouseButton {
    Left,
    Right,
    Middle,
}
```

### 5. IPC Layer (Optional)

```rust
// src/ipc/mod.rs
use serde::{Deserialize, Serialize};
use tokio::io::AsyncReadExt;
use tokio::net::UnixListener;

pub struct IpcServer {
    listener: UnixListener,
}

impl IpcServer {
    pub async fn new(socket_path: &str) -> std::io::Result<Self> {
        // Remove existing socket if present
        let _ = std::fs::remove_file(socket_path);

        let listener = UnixListener::bind(socket_path)?;
        Ok(Self { listener })
    }

    pub async fn run(&self, sender: async_channel::Sender<IpcMessage>) {
        loop {
            match self.listener.accept().await {
                Ok((mut stream, _)) => {
                    let sender = sender.clone();
                    tokio::spawn(async move {
                        let mut buf = [0; 1024];
                        loop {
                            match stream.read(&mut buf).await {
                                Ok(0) => break,
                                Ok(n) => {
                                    // Assumes each read yields one complete JSON message
                                    if let Ok(msg) =
                                        serde_json::from_slice::<IpcMessage>(&buf[..n])
                                    {
                                        let _ = sender.send(msg).await;
                                    }
                                }
                                Err(_) => break,
                            }
                        }
                    });
                }
                Err(_) => continue,
            }
        }
    }
}

// QualityLevel must also derive Serialize and Deserialize
#[derive(Debug, Serialize, Deserialize)]
pub enum IpcMessage {
    StartCapture { session_id: String },
    StopCapture { session_id: String },
    SetQuality { level: QualityLevel },
    GetStatus,
}
```

---

## Data Flow Optimization

### Zero-Copy Pipeline Stages

```
Stage 1: Capture
  Input:  Wayland Compositor (GPU memory)
  Output: DMA-BUF file descriptor
  Copy:   None (Zero-copy)

Stage 2: Buffer Manager
  Input:  DMA-BUF FD
  Output: DmaBufHandle (RAII wrapper)
  Copy:   None (Zero-copy ownership transfer)

Stage 3: Encoder
  Input:  DmaBufHandle
  Output: Bytes (reference-counted)
  Copy:   None (DMA-BUF imported directly to GPU encoder)

Stage 4: WebRTC
  Input:  Bytes
  Output: RTP packets (references to Bytes)
  Copy:   None (Zero-copy to socket buffers)

Stage 5: Network
  Input:  RTP packets
  Output: UDP datagrams
  Copy:   Minimal (kernel space only)
```

### Memory Ownership Transfer

```rust
// Example: Ownership transfer through pipeline
async fn process_frame_pipeline(
    capture: WaylandCapture,
    mut encoder: VaapiEncoder,
    webrtc: WebRtcServer,
) -> anyhow::Result<()> {
    loop {
        // Stage 1: Capture (ownership moves from PipeWire to our code)
        let frame = capture.next_frame().await; // CapturedFrame owns DmaBufHandle

        // Stage 2: Encode (ownership moved, not copied)
        let encoded = encoder.encode(frame).await?; // EncodedFrame owns Bytes

        // Stage 3: Send (Bytes is reference-counted, no copy)
        webrtc.send_video_frame("session-123", encoded).await?;

        // Ownership transferred all the way without copying
    }
}
```

### Buffer Sharing Mechanisms

#### 1. DMA-BUF (Primary)
- GPU memory buffers
- Exported as file descriptors
- Zero-copy to hardware encoders
- Limited to same GPU/driver

```rust
pub fn export_dma_buf(surface: &va::Surface) -> Result<DmaBufHandle> {
    let fd = surface.export_dma_buf()?;
    Ok(DmaBufHandle {
        fd,
        size: surface.size(),
        stride: surface.stride(),
        offset: 0,
    })
}
```

#### 2. Shared Memory (Fallback)
- POSIX shared memory (shm_open)
- For software encoding path
- Copy from DMA-BUF to shared memory

```rust
pub fn create_shared_buffer(size: usize) -> Result<SharedBuffer> {
    let name = format!("/wl-webrtc-{}", uuid::Uuid::new_v4());
    // Octal mode: owner read/write only
    let fd = shm_open(&name, O_CREAT | O_RDWR, 0o600)?;
    ftruncate(fd, size as i64)?;

    let ptr = unsafe {
        mmap(ptr::null_mut(), size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0)
    }?;

    Ok(SharedBuffer {
        ptr: ptr as *mut u8,
        size,
        fd,
        name,
    })
}
```

#### 3. Memory-Mapped Files (Alternative)
- For persistent caching
- Cross-process communication
- Used for frame buffering

```rust
pub struct MappedFile {
    file: File,
    ptr: *mut u8,
    size: usize,
}

impl MappedFile {
    pub fn new(path: &Path, size: usize) -> Result<Self> {
        let file = OpenOptions::new()
            .read(true)
            .write(true)
            .create(true)
            .open(path)?;

        file.set_len(size as u64)?;

        let ptr = unsafe {
            mmap(
                ptr::null_mut(),
                size,
                PROT_READ | PROT_WRITE,
                MAP_SHARED,
                file.as_raw_fd(),
                0,
            )
        }?;

        Ok(Self {
            file,
            ptr: ptr as *mut u8,
            size,
        })
    }
}
```

### Pipeline Optimization Strategies

#### 1.
Parallel Encoding

```rust
// Keep one encoder per quality level and route each frame to the one that
// matches the current bandwidth
pub struct AdaptiveEncoder {
    encoders: Vec<Box<dyn VideoEncoder>>,
    active_encoder: usize,
    bandwidth_monitor: BandwidthMonitor,
}

impl AdaptiveEncoder {
    pub async fn encode_adaptive(
        &mut self,
        frame: CapturedFrame,
    ) -> Result<EncodedFrame, EncoderError> {
        let bandwidth = self.bandwidth_monitor.current_bandwidth();

        // Switch encoder based on bandwidth
        let new_encoder = match bandwidth {
            b if b < 500_000 => 0,   // Low bitrate
            b if b < 2_000_000 => 1, // Medium bitrate
            _ => 2,                  // High bitrate
        };

        if new_encoder != self.active_encoder {
            self.active_encoder = new_encoder;
            // The receiver holds no reference frames from the newly selected
            // encoder, so its first frame must be a keyframe
            self.encoders[new_encoder].request_keyframe().await?;
        }

        self.encoders[self.active_encoder].encode(frame).await
    }
}
```

#### 2. Frame Skipping

```rust
use std::time::{Duration, Instant};

pub struct FrameSkipper {
    target_fps: u32,
    last_frame_time: Instant,
    skip_count: u32,
}

impl FrameSkipper {
    pub fn should_skip(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_frame_time);
        // Duration arithmetic keeps sub-millisecond precision
        // (integer milliseconds would truncate 16.67 ms to 16 ms at 60 FPS)
        let frame_interval = Duration::from_secs(1) / self.target_fps;

        if elapsed < frame_interval {
            self.skip_count += 1;
            return true;
        }

        self.last_frame_time = now;
        self.skip_count = 0;
        false
    }
}
```

#### 3. Region of Interest (ROI)

```rust
pub struct RegionEncoder {
    full_encoder: Box<dyn VideoEncoder>,
    roi_encoder: Box<dyn VideoEncoder>,
    current_region: Option<ScreenRegion>,
}

impl RegionEncoder {
    pub async fn encode_roi(
        &mut self,
        frame: CapturedFrame,
        roi: Option<ScreenRegion>,
    ) -> Result<EncodedFrame, EncoderError> {
        if let Some(region) = roi {
            // Encode only ROI with higher quality
            let cropped = self.crop_frame(frame, region)?;
            self.roi_encoder.encode(cropped).await
        } else {
            // Encode full frame
            self.full_encoder.encode(frame).await
        }
    }

    fn crop_frame(
        &self,
        mut frame: CapturedFrame,
        region: ScreenRegion,
    ) -> Result<CapturedFrame, EncoderError> {
        // Shrink the frame to the region. The DMA-BUF offset for
        // (region.x, region.y) still has to be applied for a real crop.
        frame.width = region.width;
        frame.height = region.height;
        Ok(frame)
    }
}
```

---

## Low Latency Optimization

### Design Philosophy

To achieve 15-25ms latency on local networks, we prioritize:

1. **Speed over completeness**: Fast, low-latency delivery is more important than perfect reliability
2. **Minimize buffering**: Small buffers at every stage
3. **Zero-copy everywhere**: Eliminate CPU memory copies
4. **Hardware acceleration**: Use GPU for all intensive operations
5. **Predictive timing**: Reduce wait times with accurate timing

### 1. Encoder Optimization

#### Hardware Encoder Configuration

```rust
pub struct LowLatencyEncoderConfig {
    // Codec settings
    pub codec: VideoCodec,

    // Low-latency specific
    pub gop_size: u32,     // Small GOP: 8-15 frames
    pub b_frames: u32,     // Zero B-frames for minimal latency
    pub max_b_frames: u32, // Always 0 for low latency
    pub lookahead: u32,    // Minimal lookahead: 0-2 frames

    // Rate control
    pub rc_mode: RateControlMode, // CBR or VBR with strict constraints
    pub bitrate: u32,             // Adaptive bitrate
    pub max_bitrate: u32,         // Tight max constraint
    pub min_bitrate: u32,
    pub vbv_buffer_size: u32,     // Very small VBV buffer
    pub vbv_max_rate: u32,        // Close to bitrate

    // Timing
    pub fps: u32,          // Target FPS (30-60)
    pub intra_period: u32, // Keyframe interval

    // Quality vs Latency trade-offs
    pub preset: EncoderPreset, // Ultrafast/Fast
    pub tune: EncoderTune,     // zerolatency
    pub quality: u8,           // Constant quality (CRF) or CQ
}

pub enum VideoCodec {
    H264, // Best compatibility, good latency
    H265, // Better compression, slightly higher latency
    VP9,  // Open alternative
}

pub enum RateControlMode {
    CBR, // Constant Bitrate - predictable
    VBR, // Variable Bitrate - better quality
    CQP, // Constant Quantizer - lowest latency
}

pub enum EncoderPreset {
    Ultrafast, // Lowest latency, lower quality
    Superfast,
    Veryfast,  // Recommended for 15-25ms
    Faster,
}

pub enum EncoderTune {
    Zerolatency, // Mandatory for low latency
    Film,
    Animation,
}
```

#### Recommended Encoder Settings

**VA-API (Intel/AMD) - For 15-25ms latency:**

```c
// libva-specific low-latency settings
VAConfigAttrib attribs[] = {
    {VAConfigAttribRTFormat, VA_RT_FORMAT_YUV420},
    {VAConfigAttribRateControl, VA_RC_CBR},
    {VAConfigAttribEncMaxRefFrames, 1}, // Min reference frames
    {VAConfigAttribEncPackedHeaders, VA_ENC_PACKED_HEADER_SEQUENCE},
};

VAEncSequenceParameterBufferH264 seq_param = {
    .intra_period = 15,        // Short GOP
    .ip_period = 1,            // No B-frames
    .bits_per_second = 4000000,
    .max_num_ref_frames = 1,   // Minimal references
    .time_scale = 90000,
    .num_units_in_tick = 750,  // 60 FPS: time_scale / (2 * num_units_in_tick)
};

VAEncPictureParameterBufferH264 pic_param = {
    .ReferenceFrames = {
        {0, VA_FRAME_PICTURE}, // Single reference
    },
    .num_ref_idx_l0_active_minus1 = 0,
    .num_ref_idx_l1_active_minus1 = 0,
    .pic_fields.bits.idr_pic_flag = 0,
    .pic_fields.bits.reference_pic_flag = 1,
};

VAEncSliceParameterBufferH264 slice_param = {
    .num_ref_idx_l0_active_minus1 = 0,
    .num_ref_idx_l1_active_minus1 = 0,
    .disable_deblocking_filter_idc = 1, // Faster
};
```

**NVENC (NVIDIA) - For 15-25ms latency:**

```rust
// NVENC low-latency configuration
let mut create_params = NV_ENC_INITIALIZE_PARAMS::default();
create_params.encodeGUID = NV_ENC_CODEC_H264_GUID;
create_params.presetGUID = NV_ENC_PRESET_P4_GUID; // Balanced speed/quality preset
create_params.tuningInfo = NV_ENC_TUNING_INFO_ULTRA_LOW_LATENCY; // Latency is set by the tuning info

let mut config = NV_ENC_CONFIG::default();
config.profileGUID = NV_ENC_H264_PROFILE_BASELINE_GUID; // Faster encoding
config.rcParams.rateControlMode = NV_ENC_PARAMS_RC_CBR;
config.rcParams.averageBitRate = 4000000;
config.rcParams.maxBitRate = 4000000;
config.rcParams.vbvBufferSize = 4000000 / 60; // About one frame at 60 FPS
config.rcParams.vbvInitialDelay = 0;          // Minimal delay

// Borrow the union member mutably so the changes land in `config`
let h264_config = unsafe { &mut config.encodeCodecConfig.h264Config };
h264_config.enableIntraRefresh = 1;
h264_config.repeatSPSPPS = 1;
h264_config.enableConstrainedEncoding = 1;
h264_config.frameNumD = 0;
h264_config.sliceMode = NV_ENC_SLICE_MODE_AUTOSELECT;

// Low-latency specific
h264_config.maxNumRefFrames = 1; // Minimal references
h264_config.idrPeriod = 15;      // Short GOP: keyframe every 15 frames
```

**x264 (Software) - For 50-100ms latency:**

```rust
// x264 parameters for low latency (raw libx264 FFI bindings assumed)
let mut param: x264_param_t = unsafe { std::mem::zeroed() };

// Load preset and tune first: x264_param_default_preset resets every field
unsafe {
    x264_param_default_preset(
        &mut param,
        b"superfast\0".as_ptr().cast(),
        b"zerolatency\0".as_ptr().cast(),
    );
}

param.i_width = 1920;
param.i_height = 1080;
param.i_fps_num = 60;
param.i_fps_den = 1;

// Rate control
param.rc.i_bitrate = 4000;      // kbit/s (4 Mbps)
param.i_keyint_max = 15;        // Short GOP
param.b_intra_refresh = 1;      // Low latency
param.b_repeat_headers = 1;
param.b_annexb = 1;
param.i_scenecut_threshold = 0; // Disable scene detection

// No B-frames for latency
param.i_bframe = 0;
param.i_bframe_adaptive = 0;
param.i_bframe_pyramid = 0;

// References
param.i_frame_reference = 1; // Minimal references
```

#### Dynamic Bitrate vs Latency Trade-offs

```rust
pub struct AdaptiveBitrateController {
    target_latency_ms: u32,
    current_bitrate: u32,
    frame_rate: u32,
    network_quality: NetworkQuality,
    buffer_depth_ms: u32,
}

pub struct NetworkQuality {
    bandwidth_mbps: f64,
    latency_ms: u32,
    packet_loss_rate: f64,
    jitter_ms: u32,
}

impl AdaptiveBitrateController {
    pub fn update_target_bitrate(&mut self, measured_latency_ms: u32) -> u32 {
        let latency_ratio = measured_latency_ms as f64 / self.target_latency_ms as f64;

        if latency_ratio > 1.5 {
            // Latency too high - reduce bitrate aggressively
            self.current_bitrate = (self.current_bitrate as f64 * 0.7) as u32;
        } else if latency_ratio > 1.2 {
            // Moderately high - reduce bitrate
            self.current_bitrate = (self.current_bitrate as f64 * 0.85) as u32;
        } else if latency_ratio < 0.8 {
            // Can increase bitrate
            self.current_bitrate = (self.current_bitrate as f64 * 1.1) as u32;
        }

        // Clamp to reasonable bounds (1-8 Mbps)
        self.current_bitrate = self.current_bitrate.clamp(1_000_000, 8_000_000);
        self.current_bitrate
    }
}
```

### 2.
Capture Optimization

#### PipeWire DMA-BUF Zero-Copy

```rust
use std::collections::VecDeque;

pub struct LowLatencyCaptureConfig {
    pub frame_rate: u32,         // 30-60 FPS
    pub zero_copy: bool,         // Always true
    pub track_damage: bool,      // Enable damage tracking
    pub partial_updates: bool,   // Encode only damaged regions
    pub buffer_pool_size: usize, // Small pool: 3-5 buffers
}

pub struct DamageTracker {
    damaged_regions: VecDeque<ScreenRegion>,
    last_frame: Option<DmaBufHandle>,
    threshold: u32, // Minimum change size to encode
}

impl DamageTracker {
    pub fn update(&mut self, new_frame: &CapturedFrame) -> Vec<ScreenRegion> {
        match &self.last_frame {
            Some(last) => {
                let regions = self.compute_damage_regions(last, new_frame);
                self.last_frame = Some(new_frame.dma_buf.clone());
                regions
            }
            None => {
                // First frame: the whole screen counts as damaged
                self.last_frame = Some(new_frame.dma_buf.clone());
                vec![ScreenRegion {
                    x: 0,
                    y: 0,
                    width: new_frame.width,
                    height: new_frame.height,
                }]
            }
        }
    }

    fn compute_damage_regions(&self, last: &DmaBufHandle, new: &CapturedFrame) -> Vec<ScreenRegion> {
        // Compare frames and find changed regions
        // This can be done efficiently with GPU
        // For MVP, we can use a simple block-based comparison

        // Block size for comparison (e.g., 16x16 pixels)
        let block_size = 16;
        let blocks_x = (new.width as usize + block_size - 1) / block_size;
        let blocks_y = (new.height as usize + block_size - 1) / block_size;

        // Merge adjacent damaged blocks into regions
        // ...

        // Placeholder: an empty list makes RegionEncoder fall back to a full-frame encode
        vec![]
    }
}
```

#### Partial Region Encoding

```rust
pub struct RegionEncoder {
    full_encoder: Box<dyn VideoEncoder>,
    tile_encoder: Box<dyn VideoEncoder>,
    current_regions: Vec<ScreenRegion>,
}

impl RegionEncoder {
    pub async fn encode_with_regions(
        &mut self,
        frame: CapturedFrame,
        regions: Vec<ScreenRegion>,
    ) -> Result<Vec<EncodedTile>> {
        let mut encoded_tiles = Vec::new();

        if regions.is_empty() || regions.len() > 4 {
            // Too many regions or no changes - encode full frame.
            // Read the dimensions first: `encode` takes ownership of `frame`.
            let (width, height) = (frame.width, frame.height);
            let encoded = self.full_encoder.encode(frame).await?;
            encoded_tiles.push(EncodedTile {
                region: ScreenRegion { x: 0, y: 0, width, height },
                data: encoded.data,
                is_keyframe: encoded.is_keyframe,
            });
        } else {
            // Encode each damaged region separately
            for region in regions {
                let cropped = self.crop_frame(&frame, &region)?;
                let encoded = self.tile_encoder.encode(cropped).await?;
                encoded_tiles.push(EncodedTile {
                    region,
                    data: encoded.data,
                    is_keyframe: encoded.is_keyframe,
                });
            }
        }

        Ok(encoded_tiles)
    }

    fn crop_frame(&self, frame: &CapturedFrame, region: &ScreenRegion) -> Result<CapturedFrame> {
        // Adjust DMA-BUF offsets for the region
        // This is a zero-copy operation - just metadata changes
        Ok(CapturedFrame {
            dma_buf: DmaBufHandle::from_region(&frame.dma_buf, region)?,
            width: region.width,
            height: region.height,
            format: frame.format,
            timestamp: frame.timestamp,
        })
    }
}
```

### 3.
WebRTC Transport Layer Optimization

#### Low-Latency WebRTC Configuration

```rust
pub struct LowLatencyWebRtcConfig {
    // ICE and transport
    pub ice_transport_policy: IceTransportPolicy,
    pub ice_servers: Vec<RTCIceServer>, // webrtc-rs ICE server type (assumed)

    // Media settings
    pub video_codecs: Vec<VideoCodecConfig>,
    pub max_bitrate: u32,
    pub min_bitrate: u32,
    pub start_bitrate: u32,

    // Buffering - minimize for low latency
    pub playout_delay_min_ms: u16, // 0-10ms (default 50ms)
    pub playout_delay_max_ms: u16, // 10-20ms (default 200ms)

    // Packetization
    pub rtp_payload_size: u16, // Smaller packets: 1200 bytes
    pub packetization_mode: PacketizationMode,

    // Feedback and retransmission
    pub nack_enabled: bool,         // Limited NACK
    pub fec_enabled: bool,          // Disable FEC for latency
    pub transport_cc_enabled: bool, // Congestion control

    // RTCP settings
    pub rtcp_report_interval_ms: u32, // Frequent: 50-100ms
}

pub struct VideoCodecConfig {
    pub name: String,
    pub clock_rate: u32,
    pub num_channels: u16,
    pub parameters: CodecParameters,
}

impl LowLatencyWebRtcConfig {
    pub fn for_ultra_low_latency() -> Self {
        Self {
            ice_transport_policy: IceTransportPolicy::All,
            ice_servers: vec![],
            video_codecs: vec![
                VideoCodecConfig {
                    name: "H264".to_string(),
                    clock_rate: 90000,
                    num_channels: 1,
                    parameters: CodecParameters {
                        // Constrained Baseline, level 3.1 (nominally 720p30;
                        // 1080p60 corresponds to level 4.2, i.e. "42e02a")
                        profile_level_id: "42e01f".to_string(),
                        packetization_mode: 1,
                        level_asymmetry_allowed: 1,
                    },
                },
            ],
            max_bitrate: 8000000,   // 8 Mbps max
            min_bitrate: 500000,    // 500 Kbps min
            start_bitrate: 4000000, // 4 Mbps start

            // Critical: Minimal playout delay
            playout_delay_min_ms: 0,  // No minimum
            playout_delay_max_ms: 20, // 20ms maximum

            // Smaller packets for lower serialization latency
            rtp_payload_size: 1200,
            packetization_mode: PacketizationMode::NonInterleaved,

            // Limited retransmission
            nack_enabled: true, // But limit retransmission window
            fec_enabled: false, // Disable FEC - adds latency
            transport_cc_enabled: true,

            // More frequent RTCP feedback
            rtcp_report_interval_ms: 50,
        }
    }
}
```

#### Packet Loss Handling Strategy

```rust
pub enum LossHandlingStrategy {
    PreferLatency, // Drop late frames, prioritize low latency
    PreferQuality, // Retransmit, prioritize quality
    Balanced,      // Adaptive based on network conditions
}

pub struct PacketLossHandler {
    strategy: LossHandlingStrategy,
    max_retransmission_delay_ms: u32,
    nack_window_size: u32,
}

impl PacketLossHandler {
    /// `detected_at_ms` is when the loss was first noticed; the packet's age
    /// (not the absolute clock value) is compared against the delay limit.
    pub fn handle_packet_loss(
        &mut self,
        sequence_number: u16,
        detected_at_ms: u64,
        now_ms: u64,
    ) -> RetransmissionDecision {
        match self.strategy {
            LossHandlingStrategy::PreferLatency => {
                // Don't retransmit if too old
                let age_ms = now_ms.saturating_sub(detected_at_ms);
                if age_ms > self.max_retransmission_delay_ms as u64 {
                    RetransmissionDecision::Drop
                } else {
                    RetransmissionDecision::Request(sequence_number)
                }
            }
            LossHandlingStrategy::PreferQuality => {
                // Always try to retransmit
                RetransmissionDecision::Request(sequence_number)
            }
            LossHandlingStrategy::Balanced => {
                // Adaptive based on loss rate
                RetransmissionDecision::None // Placeholder
            }
        }
    }
}

pub enum RetransmissionDecision {
    Request(u16),
    Drop,
    None,
}
```

#### NACK vs FEC Selection

**Recommendation for 15-25ms latency:**

- **Primary**: Limited NACK
  - NACK window: 1-2 frames (16-33ms at 60fps)
  - Max retransmission delay: 20ms
  - Only retransmit keyframes or critical packets

- **Avoid FEC**:
  - Forward Error Correction adds bandwidth overhead, and recovery has to wait for the repair packets
  - With low-loss LAN, FEC overhead outweighs benefits
  - Use NACK selectively instead

```rust
use std::collections::VecDeque;

pub struct NackController {
    window_size_ms: u32, // 20ms window
    max_nack_packets_per_second: u32,
    nack_list: VecDeque<(u16, u64)>, // (seq_num, timestamp_ms)
}

impl NackController {
    pub fn should_send_nack(&self, seq_num: u16, now_ms: u64) -> bool {
        // Check whether this packet (not just the oldest entry) is still
        // within the NACK window; untracked packets are treated as fresh.
        match self.nack_list.iter().find(|(seq, _)| *seq == seq_num) {
            Some(&(_, first_seen_ms)) => {
                now_ms.saturating_sub(first_seen_ms) <= self.window_size_ms as u64
            }
            None => true,
        }
    }
}
```

### 4.
Frame Rate and Buffer Strategy #### Dynamic Frame Rate Adjustment ```rust pub struct FrameRateController { target_fps: u32, // Desired FPS (30-60) current_fps: u32, frame_times: VecDeque, last_frame_time: Instant, min_interval: Duration, // 1 / max_fps } impl FrameRateController { pub fn new(target_fps: u32) -> Self { let min_interval = Duration::from_micros(1_000_000 / 60); // Max 60 FPS Self { target_fps, current_fps: 30, frame_times: VecDeque::with_capacity(60), last_frame_time: Instant::now(), min_interval, } } pub fn should_capture(&mut self) -> bool { let now = Instant::now(); let elapsed = now.duration_since(self.last_frame_time); if elapsed < self.min_interval { return false; // Too soon } // Update frame rate based on conditions self.adjust_fps_based_on_conditions(); self.last_frame_time = now; true } pub fn adjust_fps_based_on_conditions(&mut self) { // Check system load, network conditions, etc. let system_load = self.get_system_load(); let network_quality = self.get_network_quality(); if system_load > 0.8 || network_quality.is_poor() { self.current_fps = 30; // Reduce frame rate } else if system_load < 0.5 && network_quality.is_excellent() { self.current_fps = 60; // Increase frame rate } else { self.current_fps = 45; // Balanced } } } ``` #### Fast Frame Dropping Strategy ```rust pub struct FrameDropper { target_fps: u32, adaptive_drop_threshold_ms: u32, consecutive_drops: u32, max_consecutive_drops: u32, } impl FrameDropper { pub fn should_drop(&mut self, queue_latency_ms: u32) -> bool { if queue_latency_ms > self.adaptive_drop_threshold_ms { if self.consecutive_drops < self.max_consecutive_drops { self.consecutive_drops += 1; return true; } } self.consecutive_drops = 0; false } pub fn get_drop_interval(&self) -> u32 { // Calculate how many frames to drop match self.target_fps { 60 => 1, // Drop 1 out of every 2 30 => 1, // Drop 1 out of every 2 _ => 0, } } } ``` #### Minimal Buffering **Sender Side:** ```rust pub struct SenderBuffer { 
max_size_frames: usize, // Very small: 1-2 frames queue: VecDeque, target_latency_ms: u32, } impl SenderBuffer { pub fn new() -> Self { Self { max_size_frames: 1, // Single frame buffer queue: VecDeque::with_capacity(2), target_latency_ms: 5, // 5ms target } } pub fn push(&mut self, frame: EncodedFrame) -> Result<()> { if self.queue.len() >= self.max_size_frames { // Drop oldest frame to maintain low latency self.queue.pop_front(); } self.queue.push_back(frame); Ok(()) } pub fn pop(&mut self) -> Option { self.queue.pop_front() } } ``` **Receiver Side (Jitter Buffer):** ```rust pub struct MinimalJitterBuffer { target_delay_ms: u32, // 0-10ms min_delay_ms: u32, // 0ms max_delay_ms: u32, // 10-20ms packets: VecDeque, } impl MinimalJitterBuffer { pub fn new() -> Self { Self { target_delay_ms: 5, // 5ms target min_delay_ms: 0, // No minimum max_delay_ms: 10, // 10ms maximum packets: VecDeque::with_capacity(10), } } pub fn push(&mut self, packet: RtpPacket) { if self.packets.len() < self.max_delay_ms as usize / 2 { self.packets.push_back(packet); } else { // Buffer full - drop oldest self.packets.pop_front(); self.packets.push_back(packet); } } pub fn pop(&mut self) -> Option { self.packets.pop_front() } } ``` ### 5. 
Architecture Adjustments

#### Single-Threaded vs Multi-Threaded

**Recommendation: Hybrid Approach**

- **Capture Thread**: Dedicated thread for PipeWire
- **Encoder Thread**: Per-session encoder thread
- **Network Thread**: WebRTC transport thread
- **Coordination**: Lock-free channels for data passing

```rust
use std::thread::JoinHandle;

pub struct PipelineArchitecture {
    capture_thread: JoinHandle<()>,
    encoder_threads: Vec<JoinHandle<()>>,
    network_thread: JoinHandle<()>,

    // Lock-free communication
    capture_to_encoder: async_channel::Sender<CapturedFrame>,
    encoder_to_network: async_channel::Sender<EncodedFrame>,
}
```

#### Lock Contention Minimization

```rust
// Use lock-free data structures where possible
use crossbeam::queue::SegQueue;

pub struct LockFreeFrameQueue {
    queue: SegQueue<CapturedFrame>,
    max_size: usize,
}

impl LockFreeFrameQueue {
    pub fn push(&self, frame: CapturedFrame) -> Result<()> {
        // The length check and the push are not one atomic step, so `max_size`
        // is a soft limit; crossbeam's ArrayQueue provides a hard bound.
        if self.queue.len() >= self.max_size {
            return Err(Error::QueueFull);
        }
        self.queue.push(frame);
        Ok(())
    }

    pub fn pop(&self) -> Option<CapturedFrame> {
        self.queue.pop()
    }
}
```

#### Async Task Scheduling

```rust
pub struct LowLatencyScheduler {
    capture_priority: TaskPriority,
    encode_priority: TaskPriority,
    network_priority: TaskPriority,
}

impl LowLatencyScheduler {
    pub async fn schedule_pipeline(&self) {
        // Tokio has no built-in task priorities, so the priority fields are
        // advisory: keep the critical path on its own task (or a dedicated
        // runtime / OS thread) and keep background work off it.
        tokio::spawn(async move {
            // Critical path: capture -> encode -> send
        });

        tokio::spawn(async move {
            // Background tasks: statistics, logging
        });
    }
}
```

### 6.
Technology Stack Adjustments #### Encoder Selection for Latency | Encoder | Setup Latency | Per-Frame Latency | Quality | Recommendation | |---------|--------------|------------------|---------|----------------| | VA-API H.264 | 1-2ms | 2-3ms | Medium | Primary (Linux) | | NVENC H.264 | 1-2ms | 1-2ms | High | Primary (NVIDIA) | | x264 (ultrafast) | 0ms | 5-8ms | Low | Fallback | | x264 (superfast) | 0ms | 8-12ms | Medium | Fallback | **Recommendation:** - **Primary**: VA-API or NVENC H.264 with ultrafast preset - **Fallback**: x264 with ultrafast preset (accept 30-50ms latency) #### Direct Wayland vs PipeWire **Use PipeWire (recommended):** - Better DMA-BUF support - Hardware acceleration integration - Zero-copy through ecosystem **Direct Wayland (if needed):** - Lower-level control - Potentially lower capture latency (0.5-1ms) - More complex implementation - No portal integration (security issue) **Recommendation:** Stick with PipeWire for MVP. Consider direct Wayland only if PipeWire latency is unacceptable. #### webrtc-rs Latency Characteristics **Pros:** - Pure Rust, predictable behavior - Good zero-copy support - Customizable buffering **Cons:** - May have default buffer settings optimized for reliability - Need manual configuration for ultra-low latency **Custom WebRTC Layer (advanced):** - Full control over buffering and timing - Can inline packetization - More complex implementation **Recommendation:** Use webrtc-rs with low-latency configuration. Only consider custom layer if webrtc-rs cannot achieve targets. ### 7. Implementation Priority #### P0 (Must-Have for MVP) 1. **Hardware Encoder Integration** - VA-API H.264 with low-latency settings - No B-frames, small GOP (15 frames) - Ultrafast preset 2. **DMA-BUF Zero-Copy** - PipeWire DMA-BUF import - Direct encoder feed - No CPU copies 3. **Minimal Buffering** - Single frame sender buffer - 0-5ms jitter buffer - Fast frame dropping 4. 
**Low-Latency WebRTC Config** - playout_delay_min: 0ms - playout_delay_max: 20ms - Disable FEC #### P1 (Important for 15-25ms) 1. **Damage Tracking** - Partial region updates - Reduced encoding load 2. **Dynamic Frame Rate** - 30-60 FPS adaptation - Network-aware 3. **NACK Control** - Limited retransmission window (20ms) - Selective NACK #### P2 (Nice-to-Have) 1. **Direct Wayland Capture** - If PipeWire latency insufficient 2. **Custom WebRTC Layer** - If webrtc-rs insufficient 3. **Advanced Congestion Control** - SCReAM or Google Congestion Control ### 8. Testing and Validation #### End-to-End Latency Measurement ```rust pub struct LatencyMeter { timestamps: VecDeque<(u64, LatencyStage)>, } pub enum LatencyStage { Capture, EncodeStart, EncodeEnd, Packetize, NetworkSend, NetworkReceive, Depacketize, DecodeStart, DecodeEnd, Display, } impl LatencyMeter { pub fn mark(&mut self, stage: LatencyStage) { let now = timestamp_ns(); self.timestamps.push_back((now, stage)); } pub fn calculate_total_latency(&self) -> Duration { if self.timestamps.len() < 2 { return Duration::ZERO; } let first = self.timestamps.front().unwrap().0; let last = self.timestamps.back().unwrap().0; Duration::from_nanos(last - first) } } ``` **Measurement Method:** 1. **Timestamp Injection** - Inject frame ID at capture (visible timestamp on screen) - Capture at client with camera - Compare timestamps to calculate round-trip - Divide by 2 for one-way latency 2. **Network Timestamping** - Add frame capture time in RTP header extension - Measure at receiver - Account for clock skew 3. 
**Hardware Timestamping**
   - Use kernel packet timestamps (SO_TIMESTAMPING)
   - Hardware NIC timestamps if available

#### Performance Benchmarking

```rust
// `#[bench]` requires nightly Rust (`#![feature(test)]` and `extern crate test;`)
use std::time::{Duration, Instant};
use test::Bencher;

#[bench]
fn bench_full_pipeline_latency(b: &mut Bencher) {
    // `config` is assumed to be constructed elsewhere
    let mut pipeline = LowLatencyPipeline::new(config).unwrap();
    let mut latencies = Vec::new();

    b.iter(|| {
        let start = Instant::now();

        let frame = pipeline.capture().unwrap();
        let encoded = pipeline.encode(frame).unwrap();
        pipeline.send(encoded).unwrap();

        latencies.push(start.elapsed());
    });

    let avg_latency = latencies.iter().sum::<Duration>() / latencies.len() as u32;
    println!("Average latency: {:?}", avg_latency);
}
```

**Target Benchmarks:**

| Metric | Target | Acceptable |
|--------|--------|------------|
| Capture latency | 2-3ms | <5ms |
| Encode latency | 3-5ms | <8ms |
| Packetize latency | 1-2ms | <3ms |
| Network (LAN) | 0.5-1ms | <2ms |
| Decode latency | 1-2ms | <4ms |
| Display latency | 1-2ms | <4ms |
| **Total** | **15-25ms** | **<30ms** |

#### Tuning Strategy

1. **Baseline Measurement**
   - Measure each stage individually
   - Identify bottlenecks

2. **Iterative Tuning**
   - Tune one parameter at a time
   - Measure impact on total latency
   - Trade off quality if needed

3. **Validation**
   - Test under various network conditions
   - Test under system load
   - Test with different content (static, dynamic)

4.
**Continuous Monitoring** - Track latency in production - Alert on degradation - Adaptive adjustments --- ## Implementation Roadmap (Updated for Low Latency) ### Phase 1: MVP (Minimum Viable Product) - 4-6 weeks **Goal:** Basic screen capture and WebRTC streaming **Week 1-2: Core Infrastructure** - [ ] Project setup (Cargo.toml, directory structure) - [ ] Tokio async runtime setup - [ ] Error handling framework (anyhow/thiserror) - [ ] Logging setup (tracing) - [ ] Configuration management **Week 2-3: Wayland Capture** - [ ] PipeWire xdg-desktop-portal integration - [ ] Basic screen capture (single monitor) - [ ] DMA-BUF import/export - [ ] Frame receiver channel **Week 3-4: Simple Encoding** - [ ] x264 software encoder (fallback) - [ ] Basic frame pipeline (capture → encode) - [ ] Frame rate limiting **Week 4-5: WebRTC Transport** - [ ] webrtc-rs integration - [ ] Basic peer connection - [ ] Video track setup - [ ] Simple signaling (WebSocket) **Week 5-6: Testing & Integration** - [ ] End-to-end test (Wayland → WebRTC → Browser) - [ ] Performance benchmarking - [ ] Bug fixes **MVP Deliverables:** - Working screen capture - WebRTC streaming to browser - 15-30 FPS at 720p - x264 encoding (software) --- ### Phase 2: Hardware Acceleration - 3-4 weeks **Goal:** GPU-accelerated encoding for better performance **Week 1-2: VA-API Integration** - [ ] VA-API encoder implementation - [ ] DMA-BUF to VA-API surface import - [ ] H.264 encoding - [ ] Intel/AMD GPU support **Week 2-3: NVENC Integration** - [ ] NVENC encoder for NVIDIA GPUs - [ ] CUDA memory management - [ ] NVENC H.264 encoding **Week 3-4: Encoder Selection** - [ ] Encoder detection and selection - [ ] Fallback chain (NVENC → VA-API → x264) - [ ] Encoder switching at runtime **Phase 2 Deliverables:** - GPU-accelerated encoding - 30-60 FPS at 1080p - Lower CPU usage - Adaptive encoder selection --- ### Phase 3: Low Latency Optimization - 4-5 weeks **Goal:** Achieve 25-35ms latency on local networks **Week 1: 
Encoder Low-Latency Configuration (P0)** - [ ] Configure VA-API/NVENC for <5ms encoding - [ ] Disable B-frames, set GOP to 15 frames - [ ] Implement CBR rate control with small VBV buffer - [ ] Tune encoder preset (ultrafast/superfast) - [ ] Measure encoder latency independently **Week 2: Minimal Buffering (P0)** - [ ] Reduce sender buffer to 1 frame - [ ] Implement 0-10ms jitter buffer - [ ] Configure WebRTC playout delay (0-20ms) - [ ] Disable FEC for latency - [ ] Test end-to-end latency **Week 3: Damage Tracking & Partial Updates (P1)** - [ ] Implement region change detection - [ ] Add partial region encoding - [ ] Optimize for static content - [ ] Benchmark latency improvements **Week 4: Dynamic Frame Rate & Quality (P1)** - [ ] Implement adaptive frame rate (30-60fps) - [ ] Network quality detection - [ ] Dynamic bitrate vs latency trade-off - [ ] Fast frame dropping under load **Week 5: Advanced Optimizations (P1/P2)** - [ ] Limited NACK window (20ms) - [ ] Selective packet retransmission - [ ] RTCP fine-tuning (50ms intervals) - [ ] Performance profiling - [ ] Final latency tuning **Phase 3 Deliverables:** - 25-35ms latency on LAN - Zero-copy DMA-BUF pipeline - Hardware encoder with low-latency config - Minimal buffering throughout pipeline - Adaptive quality based on conditions --- ### Phase 4: Production Ready with Ultra Low Latency - 5-7 weeks **Goal:** Achieve 15-25ms latency while ensuring security, reliability, and deployment readiness **Week 1-2: Ultra Low Latency Tuning (P1/P2)** - [ ] Direct Wayland capture evaluation (if needed) - [ ] Custom WebRTC layer evaluation (if needed) - [ ] Advanced congestion control (SCReAM/Google CC) - [ ] Kernel bypass optimization (DPDK/AF_XDP if needed) - [ ] Final latency optimization and tuning **Week 2-3: Security** - [ ] Authentication (JWT, OAuth) - [ ] Encryption (TLS, DTLS) - [ ] Session management - [ ] Access control - [ ] Security audit and penetration testing **Week 3-4: Reliability** - [ ] Error recovery 
- [ ] Connection health monitoring - [ ] Automatic reconnection - [ ] Graceful degradation with latency awareness - [ ] Failover mechanisms **Week 4-5: Monitoring & Debugging** - [ ] Real-time latency metrics collection - [ ] Per-stage latency tracking - [ ] Logging improvements - [ ] Debug mode with frame inspection - [ ] Performance dashboard with latency visualization - [ ] Alerting for latency degradation **Week 5-6: Deployment** - [ ] Docker containerization - [ ] Systemd service - [ ] Configuration file with low-latency presets - [ ] Installation scripts - [ ] Performance tuning documentation **Week 6-7: Testing** - [ ] Integration tests - [ ] Load testing with latency monitoring - [ ] Cross-browser testing - [ ] Long-running stability tests - [ ] Latency regression tests - [ ] Automated performance benchmarks **Phase 4 Deliverables:** - 15-25ms latency on LAN - Production-ready deployment - Security features - Monitoring and observability - Comprehensive testing - Latency regression testing --- ### Phase 5: Advanced Features (Optional) - Ongoing **Potential Features:** - [ ] Audio capture and streaming - [ ] Bidirectional input (mouse, keyboard) - [ ] Clipboard sharing - [ ] File transfer - [ ] Recording (save sessions) - [ ] Multi-user sessions - [ ] Mobile client support - [ ] WebRTC data channels for control - [ ] WebRTC insertable streams (client-side effects) - [ ] Adaptive resolution - [ ] H.265/HEVC encoding - [ ] AV1 encoding - [ ] Screen region selection - [ ] Virtual display support - [ ] Wayland virtual pointer protocol --- ### Testing Strategy #### Unit Tests ```rust #[cfg(test)] mod tests { use super::*; #[tokio::test] async fn test_dma_buf_lifecycle() { let handle = DmaBufHandle::new(/* ... 
*/); assert_eq!(handle.ref_count(), 1); let handle2 = handle.clone(); assert_eq!(handle.ref_count(), 2); drop(handle); assert_eq!(handle2.ref_count(), 1); drop(handle2); // Buffer freed } #[tokio::test] async fn test_encoder_pipeline() { let config = EncoderConfig { encoder_type: EncoderType::H264_X264, bitrate: 2_000_000, keyframe_interval: 30, preset: EncodePreset::Fast, }; let mut encoder = X264Encoder::new(config).unwrap(); let frame = create_test_frame(1920, 1080); let encoded = encoder.encode(frame).await.unwrap(); assert!(!encoded.data.is_empty()); assert!(encoded.is_keyframe); } } ``` #### Integration Tests ```rust #[tokio::test] async fn test_full_pipeline() { // Setup let capture = WaylandCapture::new(CaptureConfig::default()).await.unwrap(); let encoder = VaapiEncoder::new(EncoderConfig::default()).unwrap(); let webrtc = WebRtcServer::new(WebRtcConfig::default()).await.unwrap(); // Run pipeline for 100 frames for _ in 0..100 { let frame = capture.next_frame().await; let encoded = encoder.encode(frame).await.unwrap(); webrtc.send_video_frame("test-session", encoded).await.unwrap(); } // Verify assert_eq!(webrtc.frames_sent(), 100); } ``` #### Load Testing ```bash # Simulate 10 concurrent connections for i in {1..10}; do cargo test test_full_pipeline --release & done wait ``` #### Performance Benchmarks ```rust #[bench] fn bench_encode_frame(b: &mut Bencher) { let mut encoder = X264Encoder::new(config).unwrap(); let frame = create_test_frame(1920, 1080); b.iter(|| { encoder.encode(frame.clone()).unwrap() }); } ``` --- ## Potential Challenges & Solutions ### 1. Wayland Protocol Limitations **Challenge:** Wayland's security model restricts screen capture **Solution:** - Use xdg-desktop-portal for permission management - Implement user prompts for capture authorization - Support multiple portal backends (GNOME, KDE, etc.) 
```rust
pub async fn request_capture_permission() -> Result<bool> {
    let portal = Portal::new().await?;
    let session = portal.create_session(ScreenCaptureType::Monitor).await?;

    // User will see a dialog asking for permission
    let sources = portal.request_sources(&session).await?;

    Ok(!sources.is_empty())
}
```

**Alternative:** Use PipeWire directly with proper authentication

### 2. Hardware Acceleration Compatibility

**Challenge:** Different GPUs require different APIs (VA-API, NVENC, etc.)

**Solution:**
- Implement multiple encoder backends
- Runtime detection of available encoders
- Graceful fallback to software encoding

```rust
pub fn detect_best_encoder() -> EncoderType {
    // Try NVENC first (NVIDIA)
    if nvenc::is_available() {
        return EncoderType::H264_NVENC;
    }

    // Try VA-API (Intel/AMD)
    if vaapi::is_available() {
        return EncoderType::H264_VAAPI;
    }

    // Fallback to software
    EncoderType::H264_X264
}
```

### 3. Cross-Browser WebRTC Compatibility

**Challenge:** Different browsers have different WebRTC implementations

**Solution:**
- Use standardized codecs (H.264, VP8, VP9)
- Implement codec negotiation
- Provide fallback options

```rust
pub fn get_supported_codecs() -> Vec<RTCRtpCodecCapability> {
    vec![
        RTCRtpCodecCapability {
            mime_type: "video/H264".to_string(),
            clock_rate: 90000,
            ..Default::default()
        },
        RTCRtpCodecCapability {
            mime_type: "video/VP8".to_string(),
            clock_rate: 90000,
            ..Default::default()
        },
        RTCRtpCodecCapability {
            mime_type: "video/VP9".to_string(),
            clock_rate: 90000,
            ..Default::default()
        },
    ]
}
```

### 4.
Security and Authentication

**Challenge:** Secure remote access without exposing desktop to unauthorized users

**Solution:**
- Implement JWT-based authentication
- Use DTLS-SRTP for media encryption (mandatory in WebRTC)
- Add rate limiting and access control

```rust
pub struct AuthManager {
    secret: String,
    // Session store; the key and value types are assumed here
    sessions: Arc<Mutex<HashMap<String, Session>>>,
}

impl AuthManager {
    pub fn create_token(&self, user_id: &str) -> Result<String, jsonwebtoken::errors::Error> {
        let claims = Claims {
            sub: user_id.to_string(),
            // `exp` must be a numeric Unix timestamp for validation to work
            exp: (Utc::now() + chrono::Duration::hours(1)).timestamp() as usize,
        };

        encode(&Header::default(), &claims, &EncodingKey::from_secret(self.secret.as_ref()))
    }

    pub fn validate_token(&self, token: &str) -> Result<Claims, AuthError> {
        decode::<Claims>(
            token,
            &DecodingKey::from_secret(self.secret.as_ref()),
            &Validation::default(),
        )
        .map(|data| data.claims)
        .map_err(|_| AuthError::InvalidToken)
    }
}
```

### 5. Memory Management

**Challenge:** Avoid memory leaks with DMA-BUF and shared memory

**Solution:**
- Use Rust's ownership system
- RAII patterns for resource cleanup
- Buffer pools with limits

```rust
pub struct ScopedDmaBuf {
    handle: DmaBufHandle,
}

impl Drop for ScopedDmaBuf {
    fn drop(&mut self) {
        // Automatically release DMA-BUF
        // File descriptor closed
        // GPU memory freed
    }
}

// Usage ensures cleanup
{
    let buf = ScopedDmaBuf::new(/* ... */);
    // Use buffer
} // Automatically dropped here
```

### 6. Latency Optimization

**Challenge:** Minimize end-to-end latency

**Solution:**
- Zero-copy pipeline
- Hardware acceleration
- Adaptive quality
- Frame skipping

```rust
pub struct LatencyOptimizer {
    target_latency_ms: u32,
    current_latency_ms: u32,
}

impl LatencyOptimizer {
    pub fn adjust_parameters(&mut self) {
        if self.current_latency_ms > self.target_latency_ms {
            // Reduce quality to improve latency
            self.reduce_bitrate();
            self.increase_frame_skipping();
        } else {
            // Increase quality
            self.increase_bitrate();
        }
    }
}
```

### 7.
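Network Conditions

**Challenge:** Varying bandwidth and network conditions

**Solution:**
- Adaptive bitrate streaming
- Multiple quality presets
- Congestion control

```rust
use std::collections::VecDeque;
use std::time::Duration;

pub struct BandwidthMonitor {
    measurements: VecDeque<u32>, // bits per second
    window_size: usize,
}

impl BandwidthMonitor {
    pub fn update(&mut self, bytes_sent: u32, duration: Duration) {
        // Use fractional seconds: `as_secs()` truncates sub-second intervals
        // to 0, which would divide by zero.
        let secs = duration.as_secs_f64();
        if secs <= 0.0 {
            return;
        }
        let bandwidth = (bytes_sent as f64 * 8.0 / secs) as u32;
        self.measurements.push_back(bandwidth);

        if self.measurements.len() > self.window_size {
            self.measurements.pop_front();
        }
    }

    pub fn average_bandwidth(&self) -> u32 {
        if self.measurements.is_empty() {
            return 0;
        }

        // Sum in u64 so a full window of high readings cannot overflow
        let total: u64 = self.measurements.iter().map(|&b| b as u64).sum();
        (total / self.measurements.len() as u64) as u32
    }
}
```

### 8. Cross-Platform Compatibility

**Challenge:** Support different Linux distributions and desktop environments

**Solution:**
- Containerize application
- Detect available technologies at runtime
- Provide fallback options

```rust
pub fn detect_desktop_environment() -> DesktopEnvironment {
    // Prefer the running session's XDG_CURRENT_DESKTOP (a colon-separated list);
    // an installed binary does not mean that desktop is the one in use.
    let current = std::env::var("XDG_CURRENT_DESKTOP")
        .unwrap_or_default()
        .to_uppercase();

    if current.split(':').any(|d| d == "GNOME") {
        DesktopEnvironment::GNOME
    } else if current.split(':').any(|d| d == "KDE") {
        DesktopEnvironment::KDE
    } else if std::path::Path::new("/usr/bin/gnome-shell").exists() {
        // Fallback: guess from installed binaries
        DesktopEnvironment::GNOME
    } else if std::path::Path::new("/usr/bin/plasmashell").exists() {
        DesktopEnvironment::KDE
    } else {
        DesktopEnvironment::Other
    }
}

pub fn configure_portal_for_env(env: DesktopEnvironment) -> PortalConfig {
    match env {
        DesktopEnvironment::GNOME => PortalConfig::gnome(),
        DesktopEnvironment::KDE => PortalConfig::kde(),
        DesktopEnvironment::Other => PortalConfig::generic(),
    }
}
```

### 9.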
Debugging and Troubleshooting **Challenge:** Debugging complex pipeline with multiple components **Solution:** - Comprehensive logging - Metrics collection - Debug mode with frame inspection ```rust pub struct DebugLogger { enabled: bool, output: DebugOutput, } pub enum DebugOutput { Console, File(PathBuf), Both, } impl DebugLogger { pub fn log_frame(&self, frame: &CapturedFrame) { if !self.enabled { return; } tracing::debug!( "Frame: {}x{}, format: {:?}, timestamp: {}", frame.width, frame.height, frame.format, frame.timestamp ); } pub fn log_encoding(&self, encoded: &EncodedFrame) { if !self.enabled { return; } tracing::debug!( "Encoded: {} bytes, keyframe: {}, seq: {}", encoded.data.len(), encoded.is_keyframe, encoded.sequence_number ); } } ``` ### 10. Resource Limits **Challenge:** Prevent resource exhaustion (CPU, memory, GPU) **Solution:** - Limit concurrent sessions - Monitor resource usage - Implement graceful degradation ```rust pub struct ResourceManager { max_sessions: usize, active_sessions: Arc>>, cpu_threshold: f32, memory_threshold: u64, } impl ResourceManager { pub async fn can_create_session(&self) -> bool { let sessions = self.active_sessions.lock().await; if sessions.len() >= self.max_sessions { return false; } if self.cpu_usage() > self.cpu_threshold { return false; } if self.memory_usage() > self.memory_threshold { return false; } true } } ``` --- ## Code Examples ### Main Application Entry Point ```rust // src/main.rs mod capture; mod encoder; mod webrtc; mod buffer; mod ipc; mod config; use anyhow::Result; use tracing::{info, error}; use tracing_subscriber; #[tokio::main] async fn main() -> Result<()> { // Initialize logging tracing_subscriber::fmt() .with_max_level(tracing::Level::INFO) .init(); // Load configuration let config = config::load_config("config.toml")?; info!("Starting Wayland WebRTC Backend"); info!("Configuration: {:?}", config); // Initialize components let capture = capture::WaylandCapture::new(config.capture).await?; let 
encoder = encoder::create_encoder(config.encoder)?; let webrtc = webrtc::WebRtcServer::new(config.webrtc).await?; // Create video track let video_track = webrtc::create_video_track()?; // Run capture pipeline let session_id = uuid::Uuid::new_v4().to_string(); webrtc.create_peer_connection(session_id.clone(), video_track).await?; // Main loop loop { match capture.next_frame().await { Ok(frame) => { match encoder.encode(frame).await { Ok(encoded) => { if let Err(e) = webrtc.send_video_frame(&session_id, encoded).await { error!("Failed to send frame: {}", e); } } Err(e) => { error!("Failed to encode frame: {}", e); } } } Err(e) => { error!("Failed to capture frame: {}", e); } } } } ``` ### Configuration Example ```toml # config.toml - Low Latency Configuration [server] address = "0.0.0.0:8080" max_sessions = 10 log_level = "info" [capture] frame_rate = 60 quality = "high" # screen_region is optional; omit the key to capture the full screen (TOML has no null value) # screen_region = { x = 0, y = 0, width = 1920, height = 1080 } # Low-latency capture settings zero_copy = true # Always use DMA-BUF zero-copy track_damage = true # Enable damage tracking partial_updates = true # Encode only damaged regions buffer_pool_size = 3 # Small buffer pool for low latency [encoder] type = "auto" # Options: "vaapi", "nvenc", "x264", "auto" bitrate = 4000000 keyframe_interval = 15 # Short GOP for low latency preset = "ultrafast" # Ultrafast for minimal latency # Low-latency specific settings [encoder.low_latency] gop_size = 15 # Short GOP b_frames = 0 # No B-frames for low latency lookahead = 0 # Minimal lookahead rc_mode = "CBR" # Constant bitrate for predictable latency vbv_buffer_size = 4000000 # 1 second VBV buffer max_bitrate = 4000000 # Tight max constraint intra_period = 15 # Keyframe interval # Quality vs latency presets [encoder.quality_presets] low = { bitrate = 1000000, preset = "ultrafast", gop_size = 30 } medium = { bitrate = 2000000, preset = "ultrafast", gop_size = 20 } high = { bitrate = 4000000, preset = "ultrafast", gop_size = 15 } ultra =
{ bitrate = 8000000, preset = "veryfast", gop_size = 15 }

[webrtc]
stun_servers = ["stun:stun.l.google.com:19302"]
turn_servers = []

[webrtc.low_latency]
# Critical: Minimal playout delay
playout_delay_min_ms = 0  # No minimum delay
playout_delay_max_ms = 20  # 20ms maximum delay

# Bitrate settings
max_bitrate = 8000000  # 8 Mbps max
min_bitrate = 500000  # 500 Kbps min
start_bitrate = 4000000  # 4 Mbps start

# Packetization
rtp_payload_size = 1200  # Stays under the typical path MTU to avoid IP fragmentation
packetization_mode = "non_interleaved"

# Retransmission settings
nack_enabled = true  # Enable NACK
nack_window_size_ms = 20  # Only request retransmission for recent packets
max_nack_packets_per_second = 50
fec_enabled = false  # Disable FEC for low latency

# Congestion control
transport_cc_enabled = true  # Enable transport-wide congestion control

# RTCP settings
rtcp_report_interval_ms = 50  # Frequent RTCP feedback

[webrtc.ice]
transport_policy = "all"
candidate_network_types = ["udp", "tcp"]

[buffer]
# Minimal buffering for low latency
dma_buf_pool_size = 3  # Small pool
encoded_buffer_pool_size = 5  # Very small pool
sender_buffer_max_frames = 1  # Single frame sender buffer
jitter_buffer_target_delay_ms = 5  # 5ms target jitter buffer
jitter_buffer_max_delay_ms = 10  # 10ms maximum jitter buffer

[latency]
# Latency targets
target_latency_ms = 25  # Target end-to-end latency
max_acceptable_latency_ms = 50  # Maximum acceptable latency

# Adaptive settings
adaptive_frame_rate = true
adaptive_bitrate = true
fast_frame_drop = true

# Frame dropping strategy
consecutive_drop_limit = 3  # Max consecutive drops
drop_threshold_ms = 5  # Drop if queue latency exceeds this

[monitoring]
enable_metrics = true
enable_latency_tracking = true
metrics_port = 9090
```

---

## Performance Targets

### Latency Targets

#### Local Network (LAN)

| Scenario | Target | Acceptable |
|----------|--------|------------|
| Core Target | 25-35ms | 15-25ms (Excellent) |
| Minimum | <50ms | 15-25ms (Excellent) |

Perceived user experience by end-to-end latency:

| Latency | User Experience |
|---------|-----------------|
| <16ms | Imperceptible |
| 16-33ms | Very Smooth |
| 33-50ms | Good |
| 50-100ms | Acceptable |

#### Internet

| Scenario | Target | Acceptable |
|----------|--------|------------|
| Excellent | 40-60ms | <80ms |
| Good | 60-80ms | <100ms |

### Performance Metrics by Phase

| Metric | MVP | Phase 2 | Phase 3 | Phase 4 |
|--------|-----|---------|---------|---------|
| FPS (LAN) | 30 | 60 | 60 | 60 |
| FPS (Internet) | 15-20 | 30 | 30-60 | 60 |
| Resolution | 720p | 1080p | 1080p/4K | 1080p/4K |
| Latency (LAN) | <100ms | <50ms | 25-35ms | 15-25ms |
| Latency (Internet) | <200ms | <100ms | 60-80ms | 40-60ms |
| CPU Usage | 20-30% | 10-15% | 5-10% | <5% |
| Memory Usage | 150MB | 250MB | 400MB | <400MB |
| Bitrate | 2-4 Mbps | 4-8 Mbps | Adaptive | Adaptive |
| Concurrent Sessions | 1 | 3-5 | 5-10 | 10+ |

### Latency Budget Allocation (15-25ms Target)

| Component | Time (ms) | Percentage | Optimization Strategy |
|-----------|-----------|------------|-----------------------|
| Wayland Capture | 2-3 | 12-15% | DMA-BUF zero-copy, partial update |
| Encoder | 3-5 | 20-25% | Hardware encoder, no B-frames |
| Packetization | 1-2 | 6-10% | Inline RTP, minimal buffering |
| Network (LAN) | 0.5-1 | 3-5% | UDP direct path, kernel bypass |
| Jitter Buffer | 0-2 | 0-10% | Minimal buffer, predictive jitter |
| Decoder | 1-2 | 6-10% | Hardware acceleration |
| Display | 1-2 | 6-10% | vsync bypass, direct scanout |
| **Total (listed components)** | **8.5-17** | | |

Percentages are approximate shares of the 15-25ms target. The listed components sum to 8.5-17ms, so the remainder of the 15-25ms target is not allocated to any component in this table.

---

## Conclusion

This design provides a blueprint for building an ultra-low-latency Wayland → WebRTC remote desktop backend in Rust. Key highlights:

1. **Zero-Copy Architecture**: Minimizes CPU copies through DMA-BUF and reference-counted buffers, targeting <5ms copy overhead
2. **Hardware Acceleration**: VA-API/NVENC encoders configured for <5ms encoding latency
3. **Minimal Buffering**: Single-frame sender buffer and a 0-10ms jitter buffer
4. **Low-Latency WebRTC**: Custom configuration with 0-20ms playout delay, no FEC, limited NACK
5. **Performance**: Targets 15-25ms latency on local networks at 60 FPS
6. **Adaptive Quality**: Dynamic frame rate (30-60fps) and bitrate adjustment based on network conditions
7. **Damage Tracking**: Partial region updates for static content to reduce encoding load

**Latency Budget Breakdown (15-25ms target):**

- Capture: 2-3ms (DMA-BUF zero-copy)
- Encoder: 3-5ms (hardware, no B-frames)
- Packetization: 1-2ms (inline RTP)
- Network: 0.5-1ms (LAN)
- Jitter Buffer: 0-2ms (minimal)
- Decoder: 1-2ms (hardware)
- Display: 1-2ms (vsync bypass)

The phased implementation approach allows for incremental development and testing:

- **Phase 1 (4-6 weeks)**: MVP with <100ms latency
- **Phase 2 (3-4 weeks)**: Hardware acceleration, <50ms latency
- **Phase 3 (4-5 weeks)**: Low-latency optimizations, 25-35ms latency
- **Phase 4 (5-7 weeks)**: Ultra-low latency tuning, 15-25ms latency

Critical P0 optimizations for achieving 15-25ms latency:

1. Hardware encoder with zero B-frames, 15-frame GOP
2. DMA-BUF zero-copy capture pipeline
3. Minimal buffering (1 frame sender, 0-10ms jitter)
4. WebRTC low-latency configuration (0-20ms playout delay)

---

## Additional Resources

### Wayland & PipeWire

- [Wayland Protocol](https://wayland.freedesktop.org/docs/html/)
- [PipeWire Documentation](https://docs.pipewire.org/)
- [xdg-desktop-portal](https://flatpak.github.io/xdg-desktop-portal/)

### WebRTC

- [WebRTC Specifications](https://www.w3.org/TR/webrtc/)
- [webrtc-rs](https://github.com/webrtc-rs/webrtc)
- [WebRTC for the Curious](https://webrtcforthecurious.com/)

### Video Encoding

- [VA-API](https://github.com/intel/libva)
- [NVENC](https://developer.nvidia.com/nvidia-video-codec-sdk)
- [x264](https://www.videolan.org/developers/x264.html)

### Rust

- [Tokio](https://tokio.rs/)
- [Bytes](https://docs.rs/bytes/)
- [Async Rust Book](https://rust-lang.github.io/async-book/)
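
### Example: Frame-Drop Policy Sketch

The `[latency]` configuration names two frame-dropping keys, `drop_threshold_ms` ("drop if queue latency exceeds this") and `consecutive_drop_limit` ("max consecutive drops"). One way these two settings could interact is sketched below. The type names `DropConfig`, `Decision`, and `FrameDropPolicy` are hypothetical and do not appear elsewhere in this design; the rule that a sent frame resets the drop counter is also an assumption, not something the configuration specifies.

```rust
use std::time::Duration;

/// Hypothetical mirror of the frame-dropping keys in the `[latency]` section.
#[derive(Debug, Clone, Copy)]
pub struct DropConfig {
    /// Corresponds to `drop_threshold_ms`: a frame that has waited in the
    /// queue longer than this is considered stale.
    pub drop_threshold: Duration,
    /// Corresponds to `consecutive_drop_limit`: never drop more than this
    /// many frames in a row, so the viewer always gets periodic updates.
    pub consecutive_drop_limit: u32,
}

/// Outcome for a single queued frame.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Send,
    Drop,
}

/// Tracks how many frames in a row have been dropped (assumed design).
#[derive(Debug)]
pub struct FrameDropPolicy {
    config: DropConfig,
    consecutive_drops: u32,
}

impl FrameDropPolicy {
    pub fn new(config: DropConfig) -> Self {
        Self { config, consecutive_drops: 0 }
    }

    /// Decide what to do with a frame that has waited `queue_latency`.
    ///
    /// A stale frame is dropped unless the consecutive-drop limit has been
    /// reached, in which case it is sent anyway and the counter resets.
    pub fn decide(&mut self, queue_latency: Duration) -> Decision {
        let stale = queue_latency > self.config.drop_threshold;
        let may_drop = self.consecutive_drops < self.config.consecutive_drop_limit;
        if stale && may_drop {
            self.consecutive_drops += 1;
            Decision::Drop
        } else {
            self.consecutive_drops = 0;
            Decision::Send
        }
    }
}
```

With the values from the configuration (`drop_threshold_ms = 5`, `consecutive_drop_limit = 3`), a run of stale frames would be dropped three times and then the fourth would be sent regardless of its queue latency. A real pipeline would likely also request a keyframe or skip to the newest frame after a forced send; that is outside this sketch.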