
Wayland → WebRTC Remote Desktop Backend

Technical Design Document

Table of Contents

  1. System Architecture
  2. Technology Stack
  3. Key Components Design
  4. Data Flow Optimization
  5. Low Latency Optimization
  6. Implementation Roadmap
  7. Potential Challenges & Solutions

System Architecture

High-Level Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                          Client Browser                              │
│                    (WebRTC Receiver)                                │
└─────────────────────────────┬───────────────────────────────────────┘
                              │ WebRTC (UDP/TCP)
                              │ Signaling (WebSocket/HTTP)
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│                        Signaling Server                              │
│                    (WebSocket/WebSocket Secure)                     │
│                      - Session Management                           │
│                      - SDP Exchange                                  │
│                      - ICE Candidates                                │
└─────────────────────────────┬───────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    Rust Backend Server                               │
├─────────────────────────────────────────────────────────────────────┤
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐          │
│  │   Capture    │    │   Encoder    │    │  WebRTC      │          │
│  │   Manager    │───▶│  Pipeline    │───▶│  Transport   │          │
│  └──────────────┘    └──────────────┘    └──────────────┘          │
│         │                  │                  │                      │
│         ▼                  ▼                  ▼                      │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐          │
│  │   PipeWire   │    │   Video      │    │  Data        │          │
│  │   Portal     │    │   Encoder    │    │  Channels    │          │
│  │   (xdg-      │    │   (H.264/    │    │  (Input/     │          │
│  │   desktop-   │    │   H.265/VP9) │    │   Control)   │          │
│  │   portal)    │    └──────────────┘    └──────────────┘          │
│  └──────────────┘                                                     │
│                                                                       │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │                Zero-Copy Buffer Manager                     │    │
│  │           - DMA-BUF Import/Export                            │    │
│  │           - Shared Memory Pools                              │    │
│  │           - Memory Ownership Tracking                         │    │
│  └─────────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    Wayland Compositor                                │
│                  (PipeWire Screen Sharing)                           │
└─────────────────────────────────────────────────────────────────────┘
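The signaling server in the diagram above needs only a small message vocabulary. A minimal sketch of those messages, assuming JSON over WebSocket as the wire format (the exact schema is an assumption at this stage, not part of the design):

use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum SignalingMessage {
    // Session management
    CreateSession { session_id: String },
    CloseSession { session_id: String },
    // SDP exchange
    Offer { session_id: String, sdp: String },
    Answer { session_id: String, sdp: String },
    // Trickle ICE
    IceCandidate { session_id: String, candidate: String },
}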

Component Breakdown

1. Capture Manager

Responsibilities:

  • Interface with PipeWire xdg-desktop-portal
  • Request screen capture permissions
  • Receive DMA-BUF frames
  • Manage frame buffer lifecycle

Key Technologies:

  • pipewire crate for PipeWire protocol
  • wayland-client for Wayland protocol
  • ashpd for desktop portals
pub struct CaptureManager {
    pipewire_connection: Rc<PipewireConnection>,
    stream_handle: Option<StreamHandle>,
    frame_sender: async_channel::Sender<CapturedFrame>,
    config: CaptureConfig,
}

pub struct CaptureConfig {
    pub frame_rate: u32,
    pub quality: QualityLevel,
    pub screen_region: Option<ScreenRegion>,
}

pub enum QualityLevel {
    Low,
    Medium,
    High,
    Ultra,
}

pub struct CapturedFrame {
    pub dma_buf: DmaBufHandle,
    pub width: u32,
    pub height: u32,
    pub format: PixelFormat,
    pub timestamp: u64,
}

2. Encoder Pipeline

Responsibilities:

  • Receive raw frames from capture
  • Encode to H.264/H.265/VP9
  • Hardware acceleration (VA-API, NVENC)
  • Bitrate adaptation

Zero-Copy Strategy:

  • Direct DMA-BUF to encoder (no CPU copies)
  • Encoder outputs to memory-mapped buffers
  • WebRTC consumes encoded buffers directly
pub struct EncoderPipeline {
    encoder: Box<dyn VideoEncoder>,
    config: EncoderConfig,
    stats: EncoderStats,
}

pub trait VideoEncoder: Send + Sync {
    fn encode_frame(
        &mut self,
        frame: CapturedFrame,
    ) -> Result<EncodedFrame, EncoderError>;
    
    fn set_bitrate(&mut self, bitrate: u32) -> Result<(), EncoderError>;
    
    fn request_keyframe(&mut self) -> Result<(), EncoderError>;
}

pub struct EncodedFrame {
    pub data: Bytes, // Zero-copy Bytes wrapper
    pub is_keyframe: bool,
    pub timestamp: u64,
    pub sequence_number: u64,
}

3. WebRTC Transport

Responsibilities:

  • WebRTC peer connection management
  • Media track (video) and data channels
  • RTP packetization
  • ICE/STUN/TURN handling
  • Congestion control

Libraries:

  • webrtc crate (webrtc-rs) or custom WebRTC implementation
pub struct WebRtcTransport {
    peer_connection: RTCPeerConnection,
    video_track: RTCVideoTrack,
    data_channel: Option<RTCDataChannel>,
    config: WebRtcConfig,
}

pub struct WebRtcConfig {
    pub stun_servers: Vec<String>,
    pub turn_servers: Vec<TurnServer>,
    pub ice_transport_policy: IceTransportPolicy,
}

pub struct TurnServer {
    pub urls: Vec<String>,
    pub username: String,
    pub credential: String,
}

4. Zero-Copy Buffer Manager

Responsibilities:

  • Manage DMA-BUF lifecycle
  • Pool pre-allocated memory
  • Track ownership via Rust types
  • Coordinate with PipeWire memory pools
pub struct BufferManager {
    dma_buf_pool: Pool<DmaBufHandle>,
    encoded_buffer_pool: Pool<Bytes>,
    max_buffers: usize,
}

impl BufferManager {
    pub fn acquire_dma_buf(&self) -> Option<DmaBufHandle> {
        self.dma_buf_pool.acquire()
    }
    
    pub fn release_dma_buf(&self, handle: DmaBufHandle) {
        self.dma_buf_pool.release(handle)
    }
    
    pub fn acquire_encoded_buffer(&self, size: usize) -> Option<Bytes> {
        self.encoded_buffer_pool.acquire_with_size(size)
    }
}

Data Flow

Wayland Compositor
        │
        │ DMA-BUF (GPU memory)
        ▼
PipeWire Portal
        │
        │ DMA-BUF file descriptor
        ▼
Capture Manager
        │
        │ CapturedFrame { dma_buf, ... }
        │ (Zero-copy ownership transfer)
        ▼
Buffer Manager
        │
        │ DmaBufHandle (moved, not copied)
        ▼
Encoder Pipeline
        │
        │ EncodedFrame { data: Bytes, ... }
        │ (Zero-copy Bytes wrapper)
        ▼
WebRTC Transport
        │
        │ RTP Packets (reference to Bytes)
        ▼
Network (UDP/TCP)
        │
        ▼
Client Browser

Technology Stack

Core Dependencies

[dependencies]
# Async Runtime
tokio = { version = "1.35", features = ["full", "rt-multi-thread"] }
async-trait = "0.1"

# Wayland & PipeWire
wayland-client = "0.31"
wayland-protocols = "0.31"
pipewire = "0.8"
ashpd = "0.8"

# Video Encoding (Low Latency)
openh264 = { version = "0.6", optional = true }
x264 = { version = "0.4", optional = true }
nvenc = { version = "0.1", optional = true }
vpx = { version = "0.1", optional = true }

# Hardware Acceleration (Low Latency)
libva = { version = "0.14", optional = true }     # VA-API
nvidia-encode = { version = "0.5", optional = true }  # NVENC

# WebRTC (Low Latency Configuration)
webrtc = "0.11"  # webrtc-rs

# Memory & Zero-Copy
bytes = "1.5"
memmap2 = "0.9"
shared_memory = "0.12"

# Lock-free data structures for minimal contention
crossbeam = { version = "0.8", features = ["std"] }
crossbeam-channel = "0.5"
crossbeam-queue = "0.3"
parking_lot = "0.12"  # Faster mutexes

# Serialization
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"

# Logging & Tracing
tracing = "0.1"
tracing-subscriber = "0.3"
tracing-opentelemetry = "0.22"  # For latency monitoring

# Metrics & Monitoring
prometheus = { version = "0.13", optional = true }
metrics = "0.21"

# Error Handling
anyhow = "1.0"
thiserror = "1.0"

# Utilities
regex = "1.10"
uuid = { version = "1.6", features = ["v4", "serde", "fast-rng"] }
instant = "0.1"  # Cross-platform Instant wrapper

[features]
default = ["software-encoder", "webrtc-rs"]

# Encoder Options
software-encoder = ["x264", "openh264"]
hardware-vaapi = ["libva"]
hardware-nvenc = ["nvidia-encode"]
all-encoders = ["software-encoder", "hardware-vaapi", "hardware-nvenc"]

# WebRTC Implementation
webrtc-rs = ["webrtc"]
custom-webrtc = []

# Low Latency Features
low-latency = []
ultra-low-latency = ["low-latency", "all-encoders"]

# Monitoring
monitoring = ["prometheus", "tracing-opentelemetry"]

# Development
dev = ["monitoring", "all-encoders"]
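The encoder features above can gate backends at compile time. A sketch of how the crate might consume them (the module layout is an assumption for illustration):

// src/encoder/backends.rs (hypothetical module layout)
#[cfg(feature = "hardware-vaapi")]
pub mod vaapi;
#[cfg(feature = "hardware-nvenc")]
pub mod nvenc;
#[cfg(feature = "software-encoder")]
pub mod x264;

// Report which backends were compiled in; useful for startup logging
pub fn available_backends() -> Vec<&'static str> {
    let mut backends = Vec::new();
    if cfg!(feature = "hardware-vaapi") {
        backends.push("vaapi");
    }
    if cfg!(feature = "hardware-nvenc") {
        backends.push("nvenc");
    }
    if cfg!(feature = "software-encoder") {
        backends.push("x264");
    }
    backends
}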

Encoder Options

| Encoder | Hardware | Performance | Quality | License | Use Case |
|---|---|---|---|---|---|
| H.264 (x264) | CPU | Medium | High | GPL | Fallback |
| H.264 (VA-API) | GPU | High | Medium | Open Source | Linux Intel/AMD |
| H.264 (NVENC) | GPU (NVIDIA) | Very High | High | Proprietary | NVIDIA GPUs |
| H.265 (HEVC) | GPU | High | Very High | Mixed | Bandwidth-constrained |
| VP9 | CPU/GPU | Medium | High | BSD | Open Web |
| AV1 | GPU | Medium | Very High | Open Source | Future-proof |

Recommended Primary: VA-API H.264 (Linux), NVENC H.264 (NVIDIA)
Recommended Fallback: x264 H.264 (software); a runtime selection sketch follows below
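A minimal sketch of that selection order with fallback. VaapiEncoder and X264Encoder are sketched later in this document; NvencEncoder and a Clone impl on EncoderConfig are assumptions made here for illustration:

pub fn select_encoder(config: EncoderConfig) -> Box<dyn VideoEncoder> {
    // Prefer GPU encoders; fall back to software only if both probes fail
    if let Ok(enc) = VaapiEncoder::new(config.clone()) {
        return Box::new(enc); // Intel/AMD via VA-API
    }
    if let Ok(enc) = NvencEncoder::new(config.clone()) {
        return Box::new(enc); // NVIDIA via NVENC (hypothetical type)
    }
    // Software fallback with zerolatency tuning
    Box::new(X264Encoder::new(config).expect("x264 software encoder should always open"))
}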

WebRTC Libraries

Option 1: webrtc-rs (Recommended)

  • Pure Rust implementation
  • Active development
  • Good WebRTC spec compliance
  • Zero-copy support for media

Option 2: Custom Implementation

  • Use webrtc crate as base
  • Add specialized zero-copy optimizations
  • Tighter integration with encoder pipeline

Key Components Design

1. Wayland Screen Capture Module

// src/capture/mod.rs
use pipewire as pw;
use pipewire::properties;
use pipewire::spa::param::format::Format;
use pipewire::stream::StreamFlags;
use async_channel::{Sender, Receiver};

pub struct WaylandCapture {
    core: pw::Core,
    context: pw::Context,
    main_loop: pw::MainLoop,
    stream: pw::stream::Stream,
    frame_sender: Sender<CapturedFrame>,
    frame_receiver: Receiver<CapturedFrame>,
}

impl WaylandCapture {
    pub async fn new(config: CaptureConfig) -> Result<Self, CaptureError> {
        let main_loop = pw::MainLoop::new()?;
        let context = pw::Context::new(&main_loop)?;
        let core = context.connect(None)?;
        
        // Request screen capture via xdg-desktop-portal
        let portal = Portal::new().await?;
        let session = portal.create_session(ScreenCaptureType::Monitor).await?;
        let sources = portal.request_sources(&session).await?;
        
        let (sender, receiver) = async_channel::bounded(30);
        
        // Create the stream before moving `context` into the struct
        // (borrowing `context` inside the struct literal after it has been
        // moved would not compile)
        let stream = Self::create_stream(&context, &session, sender.clone())?;
        
        Ok(Self {
            core,
            context,
            main_loop,
            stream,
            frame_sender: sender,
            frame_receiver: receiver,
        })
    }
    
    fn create_stream(
        context: &pw::Context,
        session: &Session,
        sender: Sender<CapturedFrame>,
    ) -> Result<pw::stream::Stream, pw::Error> {
        let mut stream = pw::stream::Stream::new(
            context,
            "wl-webrtc-capture",
            properties! {
                *pw::keys::MEDIA_TYPE => "Video",
                *pw::keys::MEDIA_CATEGORY => "Capture",
                *pw::keys::MEDIA_ROLE => "Screen",
            },
        )?;
        
        stream.connect(
            pw::spa::direction::Direction::Input,
            None,
            StreamFlags::AUTOCONNECT | StreamFlags::MAP_BUFFERS,
        )?;
        
        // Set up callbacks for new frames (zero-copy DMA-BUF). pipewire-rs
        // uses a builder: callbacks are attached, then register() is called.
        // The returned listener must be kept alive for the callbacks to fire.
        let _listener = stream
            .add_local_listener_with_user_data(sender)
            .param_changed(|_stream, _sender, _id, _param| {
                // Handle stream parameter changes
            })
            .process(|stream, sender| {
                // Process new frame - the DMA-BUF is already mapped
                Self::process_frame(stream, sender.clone());
            })
            .register()?;
        
        Ok(stream)
    }
    
    fn process_frame(
        stream: &pw::stream::StreamRef,
        sender: Sender<CapturedFrame>,
    ) {
        // Get buffer without copying - the DMA-BUF stays in GPU memory
        let Some(buffer) = stream.dequeue_buffer() else {
            return; // No buffer ready
        };
        
        // Query the negotiated format once per frame
        let format = stream.format().expect("stream format not negotiated");
        
        // Create zero-copy frame
        let frame = CapturedFrame {
            dma_buf: DmaBufHandle::from_buffer(&buffer),
            width: format.size().width,
            height: format.size().height,
            format: PixelFormat::from_spa_format(&format),
            timestamp: timestamp_ns(),
        };
        
        // Send frame (ownership transferred via move); drop it if the channel is full
        let _ = sender.try_send(frame);
    }
    
    pub async fn next_frame(&self) -> CapturedFrame {
        self.frame_receiver.recv().await.unwrap()
    }
}

// Zero-copy DMA-BUF handle (fields are public so the encoders can import the FD)
pub struct DmaBufHandle {
    pub fd: RawFd,
    pub size: usize,
    pub stride: u32,
    pub offset: u32,
}

impl DmaBufHandle {
    pub fn from_buffer(buffer: &pw::buffer::Buffer) -> Self {
        let data = &buffer.datas()[0];
        Self {
            fd: data.fd().unwrap(),
            size: data.chunk().size() as usize,
            stride: data.chunk().stride(),
            offset: data.chunk().offset(),
        }
    }
    
    pub unsafe fn as_ptr(&self) -> *mut u8 {
        // Memory map the DMA-BUF
        let ptr = libc::mmap(
            ptr::null_mut(),
            self.size,
            libc::PROT_READ,
            libc::MAP_SHARED,
            self.fd,
            self.offset as i64,
        );
        
        if ptr == libc::MAP_FAILED {
            panic!("Failed to mmap DMA-BUF");
        }
        
        ptr as *mut u8
    }
}

impl Drop for DmaBufHandle {
    fn drop(&mut self) {
        // Close the FD when the handle is dropped. Any mapping created via
        // as_ptr() must be munmap'ed by the caller that created it; the
        // handle does not record the mapped address.
        unsafe {
            libc::close(self.fd);
        }
    }
}

2. Frame Buffer Management (Zero-Copy)

// src/buffer/mod.rs
use bytes::Bytes;
use std::sync::Arc;
use std::collections::VecDeque;

pub struct FrameBufferPool {
    dma_bufs: VecDeque<DmaBufHandle>,
    encoded_buffers: VecDeque<Bytes>,
    max_dma_bufs: usize,
    max_encoded: usize,
}

impl FrameBufferPool {
    pub fn new(max_dma_bufs: usize, max_encoded: usize) -> Self {
        Self {
            dma_bufs: VecDeque::with_capacity(max_dma_bufs),
            encoded_buffers: VecDeque::with_capacity(max_encoded),
            max_dma_bufs,
            max_encoded,
        }
    }
    
    pub fn acquire_dma_buf(&mut self) -> Option<DmaBufHandle> {
        self.dma_bufs.pop_front()
    }
    
    pub fn release_dma_buf(&mut self, buf: DmaBufHandle) {
        if self.dma_bufs.len() < self.max_dma_bufs {
            self.dma_bufs.push_back(buf);
        }
        // Else: Drop the buffer, let OS reclaim DMA-BUF
    }
    
    pub fn acquire_encoded_buffer(&mut self, size: usize) -> Bytes {
        // Try to reuse an existing buffer. Note: `Bytes` is immutable, so a
        // production pool would hold `BytesMut` for rewritable buffers;
        // `Bytes` is kept here to match the rest of the pipeline types.
        if let Some(mut buf) = self.encoded_buffers.pop_front() {
            if buf.len() >= size {
                // Slice to the requested size (zero-copy view)
                return buf.split_to(size);
            }
        }
        
        // Allocate a new buffer if none can be reused
        Bytes::from(vec![0u8; size])
    }
    
    pub fn release_encoded_buffer(&mut self, buf: Bytes) {
        if self.encoded_buffers.len() < self.max_encoded {
            self.encoded_buffers.push_back(buf);
        }
        // Else: Drop the buffer, memory freed
    }
}

// Zero-copy frame wrapper
pub struct ZeroCopyFrame {
    pub data: Bytes, // Reference-counted, no copying
    pub metadata: FrameMetadata,
}

pub struct FrameMetadata {
    pub width: u32,
    pub height: u32,
    pub format: PixelFormat,
    pub timestamp: u64,
    pub is_keyframe: bool,
}

// Smart pointer for DMA-BUF
pub struct DmaBufPtr {
    ptr: *mut u8,
    len: usize,
    _marker: PhantomData<&'static mut [u8]>,
}

impl DmaBufPtr {
    pub unsafe fn new(ptr: *mut u8, len: usize) -> Self {
        Self {
            ptr,
            len,
            _marker: PhantomData,
        }
    }
    
    pub fn as_slice(&self) -> &[u8] {
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

unsafe impl Send for DmaBufPtr {}
unsafe impl Sync for DmaBufPtr {}

impl Drop for DmaBufPtr {
    fn drop(&mut self) {
        // Intentionally empty: this type is only a typed view. The owner
        // that mmap'ed the region is responsible for munmap.
    }
}

3. Video Encoder Integration

// src/encoder/mod.rs
use async_trait::async_trait;

pub enum EncoderType {
    H264_VAAPI,
    H264_NVENC,
    H264_X264,
    VP9_VAAPI,
}

pub struct EncoderConfig {
    pub encoder_type: EncoderType,
    pub bitrate: u32,
    pub keyframe_interval: u32,
    pub preset: EncodePreset,
}

pub enum EncodePreset {
    Ultrafast,
    Superfast,
    Veryfast,
    Faster,
    Fast,
    Medium,
    Slow,
    Slower,
    Veryslow,
}

#[async_trait]
pub trait VideoEncoder: Send + Sync {
    async fn encode(&mut self, frame: CapturedFrame) -> Result<EncodedFrame, EncoderError>;
    async fn reconfigure(&mut self, config: EncoderConfig) -> Result<(), EncoderError>;
    async fn request_keyframe(&mut self) -> Result<(), EncoderError>;
}

pub struct VaapiEncoder {
    display: va::Display,
    context: va::Context,
    config: EncoderConfig,
    sequence_number: u64,
}

impl VaapiEncoder {
    pub fn new(config: EncoderConfig) -> Result<Self, EncoderError> {
        let display = va::Display::open(None)?;
        let context = va::Context::new(&display)?;
        
        Ok(Self {
            display,
            context,
            config,
            sequence_number: 0,
        })
    }
}

#[async_trait]
impl VideoEncoder for VaapiEncoder {
    async fn encode(&mut self, frame: CapturedFrame) -> Result<EncodedFrame, EncoderError> {
        // Zero-copy: Import DMA-BUF directly into VA-API surface
        let surface = unsafe {
            self.context.import_dma_buf(
                frame.dma_buf.fd,
                frame.width,
                frame.height,
                frame.format.as_va_format(),
            )?
        };
        
        // Encode frame (hardware accelerated)
        let encoded_data = self.context.encode_surface(surface)?;
        
        // Create zero-copy Bytes wrapper
        let bytes = Bytes::from(encoded_data);
        
        self.sequence_number += 1;
        
        Ok(EncodedFrame {
            data: bytes,
            is_keyframe: surface.is_keyframe(),
            timestamp: frame.timestamp,
            sequence_number: self.sequence_number,
        })
    }
    
    async fn reconfigure(&mut self, config: EncoderConfig) -> Result<(), EncoderError> {
        self.config = config;
        self.context.set_bitrate(config.bitrate)?;
        self.context.set_preset(config.preset)?;
        Ok(())
    }
    
    async fn request_keyframe(&mut self) -> Result<(), EncoderError> {
        self.context.force_keyframe()?;
        Ok(())
    }
}

// Fallback software encoder
pub struct X264Encoder {
    encoder: x264::Encoder,
    config: EncoderConfig,
    sequence_number: u64,
}

impl X264Encoder {
    pub fn new(config: EncoderConfig) -> Result<Self, EncoderError> {
        let mut params = x264::Params::default();
        params.set_width(1920);
        params.set_height(1080);
        params.set_fps(60, 1);
        params.set_bitrate(config.bitrate);
        params.set_preset(config.preset);
        params.set_tune("zerolatency");
        
        let encoder = x264::Encoder::open(&params)?;
        
        Ok(Self {
            encoder,
            config,
            sequence_number: 0,
        })
    }
}

#[async_trait]
impl VideoEncoder for X264Encoder {
    async fn encode(&mut self, frame: CapturedFrame) -> Result<EncodedFrame, EncoderError> {
        // Map DMA-BUF to CPU memory (one-time copy)
        let ptr = unsafe { frame.dma_buf.as_ptr() };
        let slice = unsafe { std::slice::from_raw_parts(ptr, frame.dma_buf.size) };
        
        // Convert to YUV if needed (convert_to_yuv is an assumed helper)
        let yuv_frame = self.convert_to_yuv(slice, frame.width, frame.height)?;
        
        // Encode frame
        let encoded_data = self.encoder.encode(&yuv_frame)?;
        
        self.sequence_number += 1;
        
        Ok(EncodedFrame {
            data: Bytes::from(encoded_data),
            is_keyframe: self.encoder.is_keyframe(),
            timestamp: frame.timestamp,
            sequence_number: self.sequence_number,
        })
    }
    
    async fn reconfigure(&mut self, config: EncoderConfig) -> Result<(), EncoderError> {
        self.config = config;
        // Reopen encoder with new params
        Ok(())
    }
    
    async fn request_keyframe(&mut self) -> Result<(), EncoderError> {
        self.encoder.force_keyframe();
        Ok(())
    }
}

4. WebRTC Signaling and Data Transport

// src/webrtc/mod.rs
use std::collections::HashMap;
use std::sync::Arc;
use serde::{Deserialize, Serialize};
use tokio::sync::Mutex;
use webrtc::{
    api::{APIBuilder, API},
    data_channel::RTCDataChannel,
    ice_transport::ice_server::RTCIceServer,
    peer_connection::{
        configuration::RTCConfiguration,
        sdp::session_description::RTCSessionDescription,
        RTCPeerConnection,
    },
    track::track_local::{track_local_static_sample::TrackLocalStaticSample, TrackLocal},
};

pub struct WebRtcServer {
    api: API,
    peer_connections: Arc<Mutex<HashMap<String, PeerConnection>>>,
    signaling_server: SignalingServer,
}

impl WebRtcServer {
    pub async fn new(config: WebRtcConfig) -> Result<Self, WebRtcError> {
        let api = APIBuilder::new().build();
        
        let signaling_server = SignalingServer::new(config.signaling_addr).await?;
        
        Ok(Self {
            api,
            peer_connections: Arc::new(Mutex::new(HashMap::new())),
            signaling_server,
        })
    }
    
    pub async fn create_peer_connection(
        &self,
        session_id: String,
        video_track: TrackLocalStaticSample,
    ) -> Result<String, WebRtcError> {
        let config = RTCConfiguration {
            ice_servers: vec![RTCIceServer {
                urls: self.signaling_server.stun_servers(),
                ..Default::default()
            }],
            ..Default::default()
        };
        
        let pc = self.api.new_peer_connection(config).await?;
        
        // Add video track
        let _rtp_sender = pc
            .add_track(Arc::new(video_track))
            .await?;
        
        // Set ICE candidate handler (registration is synchronous in webrtc-rs)
        let peer_connections = self.peer_connections.clone();
        pc.on_ice_candidate(Box::new(move |candidate| {
            let _peer_connections = peer_connections.clone();
            Box::pin(async move {
                if let Some(_candidate) = candidate {
                    // Send candidate to signaling server
                    // ...
                }
            })
        }));
        
        // Store peer connection
        self.peer_connections
            .lock()
            .await
            .insert(session_id.clone(), PeerConnection::new(pc));
        
        Ok(session_id)
    }
    
    pub async fn send_video_frame(
        &self,
        session_id: &str,
        frame: EncodedFrame,
    ) -> Result<(), WebRtcError> {
        let peer_connections = self.peer_connections.lock().await;
        
        if let Some(peer) = peer_connections.get(session_id) {
            peer.video_track.write_sample(&webrtc::media::Sample {
                data: frame.data.clone(), // Bytes clone is a cheap refcount bump
                duration: std::time::Duration::from_millis(16), // ~one frame at 60 FPS
                ..Default::default()
            }).await?;
        }
        
        Ok(())
    }
}

pub struct PeerConnection {
    pc: RTCPeerConnection,
    video_track: Arc<TrackLocalStaticSample>,
    data_channel: Option<Arc<RTCDataChannel>>,
}

impl PeerConnection {
    pub async fn create_offer(&mut self) -> Result<RTCSessionDescription, WebRtcError> {
        let offer = self.pc.create_offer(None).await?;
        self.pc.set_local_description(offer.clone()).await?;
        Ok(offer)
    }
    
    pub async fn set_remote_description(
        &mut self,
        desc: RTCSessionDescription,
    ) -> Result<(), WebRtcError> {
        self.pc.set_remote_description(desc).await?;
        Ok(())
    }
    
    pub async fn create_answer(&mut self) -> Result<RTCSessionDescription, WebRtcError> {
        let answer = self.pc.create_answer(None).await?;
        self.pc.set_local_description(answer.clone()).await?;
        Ok(answer)
    }
}

// Data channel for input/control
pub struct DataChannelManager {
    channels: HashMap<String, Arc<RTCDataChannel>>,
}

impl DataChannelManager {
    pub async fn send_input(&self, channel_id: &str, input: InputEvent) -> Result<()> {
        if let Some(channel) = self.channels.get(channel_id) {
            let data = serde_json::to_vec(&input)?;
            channel.send(&data).await?;
        }
        Ok(())
    }
    
    pub fn on_input<F>(&mut self, channel_id: String, callback: F)
    where
        F: Fn(InputEvent) + Send + Sync + 'static,
    {
        if let Some(channel) = self.channels.get(&channel_id) {
            channel.on_message(Box::new(move |msg| {
                if let Ok(input) = serde_json::from_slice::<InputEvent>(&msg.data) {
                    callback(input);
                }
                // The handler must return a boxed future
                Box::pin(async {})
            }));
        }
    }
}

#[derive(Debug, Serialize, Deserialize)]
pub enum InputEvent {
    MouseMove { x: f32, y: f32 },
    MouseClick { button: MouseButton },
    KeyPress { key: String },
    KeyRelease { key: String },
}

#[derive(Debug, Serialize, Deserialize)]
pub enum MouseButton {
    Left,
    Right,
    Middle,
}

5. IPC Layer (Optional)

// src/ipc/mod.rs
use tokio::net::UnixListener;
use tokio::io::{AsyncReadExt, AsyncWriteExt};

pub struct IpcServer {
    listener: UnixListener,
}

impl IpcServer {
    pub async fn new(socket_path: &str) -> Result<Self> {
        // Remove existing socket if present
        let _ = std::fs::remove_file(socket_path);
        
        let listener = UnixListener::bind(socket_path)?;
        
        Ok(Self { listener })
    }
    
    pub async fn run(&self, sender: async_channel::Sender<IpcMessage>) {
        loop {
            match self.listener.accept().await {
                Ok((mut stream, _)) => {
                    let sender = sender.clone();
                    tokio::spawn(async move {
                        // NOTE: assumes one JSON message per read; real code
                        // should use length-prefixed or newline-delimited framing
                        let mut buf = [0; 1024];
                        loop {
                            match stream.read(&mut buf).await {
                                Ok(0) => break,
                                Ok(n) => {
                                    if let Ok(msg) = serde_json::from_slice::<IpcMessage>(&buf[..n]) {
                                        let _ = sender.send(msg).await;
                                    }
                                }
                                Err(_) => break,
                            }
                        }
                    });
                }
                Err(_) => continue,
            }
        }
    }
}

#[derive(Debug, Serialize, Deserialize)]
pub enum IpcMessage {
    StartCapture { session_id: String },
    StopCapture { session_id: String },
    SetQuality { level: QualityLevel },
    GetStatus,
}

Data Flow Optimization

Zero-Copy Pipeline Stages

Stage 1: Capture
  Input: Wayland Compositor (GPU memory)
  Output: DMA-BUF file descriptor
  Copy: None (Zero-copy)
  
Stage 2: Buffer Manager
  Input: DMA-BUF FD
  Output: DmaBufHandle (RAII wrapper)
  Copy: None (Zero-copy ownership transfer)
  
Stage 3: Encoder
  Input: DmaBufHandle
  Output: Bytes (reference-counted)
  Copy: None (DMA-BUF imported directly to GPU encoder)
  
Stage 4: WebRTC
  Input: Bytes
  Output: RTP packets (references to Bytes)
  Copy: None (Zero-copy to socket buffers)
  
Stage 5: Network
  Input: RTP packets
  Output: UDP datagrams
  Copy: Minimal (kernel space only)

Memory Ownership Transfer

// Example: Ownership transfer through pipeline
async fn process_frame_pipeline(
    mut capture: WaylandCapture,
    mut encoder: VaapiEncoder,
    mut webrtc: WebRtcServer,
) -> Result<()> {
    loop {
        // Stage 1: Capture (ownership moves from PipeWire to our code)
        let frame = capture.next_frame().await; // CapturedFrame owns DmaBufHandle
        
        // Stage 2: Encode (ownership moved, not copied)
        let encoded = encoder.encode(frame).await?; // EncodedFrame owns Bytes
        
        // Stage 3: Send (Bytes is reference-counted, no copy)
        webrtc.send_video_frame("session-123", encoded).await?;
        
        // Ownership transferred all the way without copying
    }
}

Buffer Sharing Mechanisms

1. DMA-BUF (Primary)

  • GPU memory buffers
  • Exported as file descriptors
  • Zero-copy to hardware encoders
  • Limited to same GPU/driver
pub fn export_dma_buf(surface: &va::Surface) -> Result<DmaBufHandle> {
    let fd = surface.export_dma_buf()?;
    Ok(DmaBufHandle {
        fd,
        size: surface.size(),
        stride: surface.stride(),
        offset: 0,
    })
}

2. Shared Memory (Fallback)

  • POSIX shared memory (shm_open)
  • For software encoding path
  • Copy from DMA-BUF to shared memory
pub fn create_shared_buffer(size: usize) -> Result<SharedBuffer> {
    let name = format!("/wl-webrtc-{}", uuid::Uuid::new_v4());
    let fd = shm_open(&name, O_CREAT | O_RDWR, 0o666)?;
    ftruncate(fd, size as i64)?;
    
    let ptr = unsafe { mmap(ptr::null_mut(), size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0) };
    
    Ok(SharedBuffer {
        ptr,
        size,
        fd,
        name,
    })
}

3. Memory-Mapped Files (Alternative)

  • For persistent caching
  • Cross-process communication
  • Used for frame buffering
pub struct MappedFile {
    file: File,
    ptr: *mut u8,
    size: usize,
}

impl MappedFile {
    pub fn new(path: &Path, size: usize) -> Result<Self> {
        let file = OpenOptions::new()
            .read(true)
            .write(true)
            .create(true)
            .open(path)?;
        
        file.set_len(size as u64)?;
        
        let ptr = unsafe {
            mmap(
                ptr::null_mut(),
                size,
                PROT_READ | PROT_WRITE,
                MAP_SHARED,
                file.as_raw_fd(),
                0,
            )
        };
        
        // mmap signals failure with a sentinel pointer, not a Result
        if ptr == libc::MAP_FAILED {
            return Err(anyhow::anyhow!("failed to mmap {}", path.display()));
        }
        
        Ok(Self { file, ptr: ptr as *mut u8, size })
    }
    }
}

Pipeline Optimization Strategies

1. Parallel Encoding

// Run multiple encoders in parallel for different quality levels
pub struct AdaptiveEncoder {
    encoders: Vec<Box<dyn VideoEncoder>>,
    active_encoder: usize,
    bandwidth_monitor: BandwidthMonitor,
}

impl AdaptiveEncoder {
    pub async fn encode_adaptive(&mut self, frame: CapturedFrame) -> Result<EncodedFrame> {
        let bandwidth = self.bandwidth_monitor.current_bandwidth();
        
        // Switch encoder based on bandwidth
        let new_encoder = match bandwidth {
            b if b < 500_000 => 0,  // Low bitrate
            b if b < 2_000_000 => 1, // Medium bitrate
            _ => 2,                  // High bitrate
        };
        
        if new_encoder != self.active_encoder {
            self.active_encoder = new_encoder;
        }
        
        self.encoders[self.active_encoder].encode(frame).await
    }
}

2. Frame Skipping

pub struct FrameSkipper {
    target_fps: u32,
    last_frame_time: Instant,
    skip_count: u32,
}

impl FrameSkipper {
    pub fn should_skip(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_frame_time).as_millis();
        let frame_interval = 1000 / self.target_fps as u128;
        
        if elapsed < frame_interval {
            self.skip_count += 1;
            return true;
        }
        
        self.last_frame_time = now;
        self.skip_count = 0;
        false
    }
}

3. Region of Interest (ROI)

pub struct RegionEncoder {
    full_encoder: Box<dyn VideoEncoder>,
    roi_encoder: Box<dyn VideoEncoder>,
    current_region: Option<ScreenRegion>,
}

impl RegionEncoder {
    pub async fn encode_roi(
        &mut self,
        frame: CapturedFrame,
        roi: Option<ScreenRegion>,
    ) -> Result<EncodedFrame> {
        if let Some(region) = roi {
            // Encode only ROI with higher quality
            let cropped = self.crop_frame(frame, region)?;
            self.roi_encoder.encode(cropped).await
        } else {
            // Encode full frame
            self.full_encoder.encode(frame).await
        }
    }
    
    fn crop_frame(&self, mut frame: CapturedFrame, region: ScreenRegion) -> Result<CapturedFrame> {
        // Adjust DMA-BUF offsets for region
        frame.width = region.width;
        frame.height = region.height;
        Ok(frame)
    }
}

Low Latency Optimization

Design Philosophy

To achieve 15-25ms latency on local networks, we prioritize:

  1. Speed over completeness: Fast, low-latency delivery is more important than perfect reliability
  2. Minimize buffering: Small buffers at every stage
  3. Zero-copy everywhere: Eliminate CPU memory copies
  4. Hardware acceleration: Use GPU for all intensive operations
  5. Predictive timing: Reduce wait times with accurate timing

1. Encoder Optimization

Hardware Encoder Configuration

pub struct LowLatencyEncoderConfig {
    // Codec settings
    pub codec: VideoCodec,
    
    // Low-latency specific
    pub gop_size: u32,           // Small GOP: 8-15 frames
    pub b_frames: u32,           // Zero B-frames for minimal latency
    pub max_b_frames: u32,       // Always 0 for low latency
    pub lookahead: u32,           // Minimal lookahead: 0-2 frames
    
    // Rate control
    pub rc_mode: RateControlMode, // CBR or VBR with strict constraints
    pub bitrate: u32,            // Adaptive bitrate
    pub max_bitrate: u32,        // Tight max constraint
    pub min_bitrate: u32,
    pub vbv_buffer_size: u32,    // Very small VBV buffer
    pub vbv_max_rate: u32,       // Close to bitrate
    
    // Timing
    pub fps: u32,                // Target FPS (30-60)
    pub intra_period: u32,       // Keyframe interval
    
    // Quality vs Latency trade-offs
    pub preset: EncoderPreset,   // Ultrafast/Fast
    pub tune: EncoderTune,       // zerolatency
    pub quality: u8,             // Constant quality (CRF) or CQ
}

pub enum VideoCodec {
    H264,    // Best compatibility, good latency
    H265,    // Better compression, slightly higher latency
    VP9,     // Open alternative
}

pub enum RateControlMode {
    CBR,     // Constant Bitrate - predictable
    VBR,     // Variable Bitrate - better quality
    CQP,     // Constant Quantizer - lowest latency
}

pub enum EncoderPreset {
    Ultrafast,  // Lowest latency, lower quality
    Superfast,
    Veryfast,   // Recommended for 15-25ms
    Faster,
}

pub enum EncoderTune {
    Zerolatency, // Mandatory for low latency
    Film,
    Animation,
}

VA-API (Intel/AMD) - For 15-25ms latency:

// libva-specific low-latency settings
VAConfigAttrib attribs[] = {
    {VAConfigAttribRTFormat, VA_RT_FORMAT_YUV420},
    {VAConfigAttribRateControl, VA_RC_CBR},
    {VAConfigAttribEncMaxRefFrames, {1, 0}},  // Min reference frames
    {VAConfigAttribEncPackedHeaders, VA_ENC_PACKED_HEADER_SEQUENCE},
};

VAEncSequenceParameterBufferH264 seq_param = {
    .intra_period = 15,           // Short GOP
    .ip_period = 1,               // No B-frames
    .bits_per_second = 4000000,
    .max_num_ref_frames = 1,     // Minimal references
    .time_scale = 90000,
    .num_units_in_tick = 1500,   // 60 FPS
};

VAEncPictureParameterBufferH264 pic_param = {
    .reference_frames = {
        {0, VA_FRAME_PICTURE},    // Single reference
    },
    .num_ref_idx_l0_active_minus1 = 0,
    .num_ref_idx_l1_active_minus1 = 0,
    .pic_fields.bits.idr_pic_flag = 0,
    .pic_fields.bits.reference_pic_flag = 1,
};

VAEncSliceParameterBufferH264 slice_param = {
    .num_ref_idx_l0_active_minus1 = 0,
    .num_ref_idx_l1_active_minus1 = 0,
    .disable_deblocking_filter_idc = 1,  // Faster
};

NVENC (NVIDIA) - For 15-25ms latency:

// NVENC low-latency configuration
let mut create_params = NV_ENC_INITIALIZE_PARAMS::default();
create_params.encodeGUID = NV_ENC_CODEC_H264_GUID;
create_params.presetGUID = NV_ENC_PRESET_P4_GUID;  // Low latency

let mut config = NV_ENC_CONFIG::default();
config.profileGUID = NV_ENC_H264_PROFILE_BASELINE_GUID;  // Faster encoding
config.rcParams.rateControlMode = NV_ENC_PARAMS_RC_CBR;
config.rcParams.averageBitRate = 4000000;
config.rcParams.maxBitRate = 4000000;
config.rcParams.vbvBufferSize = 4000000 / 60;  // ~1 frame of VBV; a large buffer adds latency
config.rcParams.vbvInitialDelay = 0;      // Minimal delay

let mut h264_config = unsafe { config.encodeCodecConfig.h264Config };
h264_config.enableIntraRefresh = 1;
h264_config.repeatSPSPPS = 1;
h264_config.enableConstrainedEncoding = 1;
h264_config.frameNumD = 0;
h264_config.sliceMode = NV_ENC_SLICE_MODE_AUTOSELECT;

// Low-latency specific
h264_config.maxNumRefFrames = 1;          // Minimal references
h264_config.idrPeriod = 15;               // Short GOP: keyframe every 15 frames

x264 (Software) - For 50-100ms latency:

// x264 parameters for low latency
let param = x264_param_t {
    i_width: 1920,
    i_height: 1080,
    i_fps_num: 60,
    i_fps_den: 1,
    
    // Rate control
    i_bitrate: 4000,                      // 4 Mbps
    i_keyint_max: 15,                     // Short GOP
    b_intra_refresh: 1,
    
    // Low latency
    b_repeat_headers: 1,
    b_annexb: 1,
    i_scenecut_threshold: 0,             // Disable scene detection
    
    // No B-frames for latency
    i_bframe: 0,
    i_bframe_adaptive: 0,
    i_bframe_pyramid: 0,
    
    // References
    i_frame_reference: 1,                 // Minimal references
    
    // Preset: ultrafast or superfast
    // This is set via preset function
};

// Apply preset and tune; note that x264_param_apply_preset() resets most
// fields, so real code must call it before the custom settings above, not after
x264_param_apply_preset(&param, "superfast");
x264_param_apply_tune(&param, "zerolatency");

Dynamic Bitrate vs Latency Trade-offs

pub struct AdaptiveBitrateController {
    target_latency_ms: u32,
    current_bitrate: u32,
    frame_rate: u32,
    network_quality: NetworkQuality,
    buffer_depth_ms: u32,
}

pub struct NetworkQuality {
    bandwidth_mbps: f64,
    latency_ms: u32,
    packet_loss_rate: f64,
    jitter_ms: u32,
}

impl AdaptiveBitrateController {
    pub fn update_target_bitrate(&mut self, measured_latency_ms: u32) -> u32 {
        let latency_ratio = measured_latency_ms as f64 / self.target_latency_ms as f64;
        
        if latency_ratio > 1.5 {
            // Latency too high - reduce bitrate aggressively
            self.current_bitrate = (self.current_bitrate as f64 * 0.7) as u32;
        } else if latency_ratio > 1.2 {
            // Moderately high - reduce bitrate
            self.current_bitrate = (self.current_bitrate as f64 * 0.85) as u32;
        } else if latency_ratio < 0.8 {
            // Can increase bitrate
            self.current_bitrate = (self.current_bitrate as f64 * 1.1) as u32;
        }
        
        // Clamp to reasonable bounds
        self.current_bitrate = self.current_bitrate.clamp(1000000, 8000000);
        
        self.current_bitrate
    }
}

2. Capture Optimization

PipeWire DMA-BUF Zero-Copy

pub struct LowLatencyCaptureConfig {
    pub frame_rate: u32,           // 30-60 FPS
    pub zero_copy: bool,           // Always true
    pub track_damage: bool,        // Enable damage tracking
    pub partial_updates: bool,     // Encode only damaged regions
    pub buffer_pool_size: usize,   // Small pool: 3-5 buffers
}

pub struct DamageTracker {
    damaged_regions: VecDeque<ScreenRegion>,
    last_frame: Option<DmaBufHandle>,
    threshold: u32,                // Minimum change size to encode
}

impl DamageTracker {
    pub fn update(&mut self, new_frame: &CapturedFrame) -> Vec<ScreenRegion> {
        match &self.last_frame {
            Some(last) => {
                let regions = self.compute_damage_regions(last, new_frame);
                // Assumes DmaBufHandle has a cheap clone (e.g., dup()'ing the FD)
                self.last_frame = Some(new_frame.dma_buf.clone());
                regions
            }
            None => {
                self.last_frame = Some(new_frame.dma_buf.clone());
                vec![ScreenRegion {
                    x: 0,
                    y: 0,
                    width: new_frame.width,
                    height: new_frame.height,
                }]
            }
        }
    }
    
    fn compute_damage_regions(&self, last: &DmaBufHandle, new: &CapturedFrame) -> Vec<ScreenRegion> {
        // Compare frames and find changed regions
        // This can be done efficiently with GPU
        // For MVP, we can use a simple block-based comparison
        
        // Block size for comparison (e.g., 16x16 pixels)
        let block_size = 16;
        let blocks_x = (new.width as usize + block_size - 1) / block_size;
        let blocks_y = (new.height as usize + block_size - 1) / block_size;
        
        // Merge adjacent damaged blocks into regions
        // ...
        
        vec![]  // Placeholder
    }
}
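A minimal CPU-side sketch of that block comparison, assuming both frames are mapped into CPU memory with the same stride (bytes per row) and a 4-byte pixel format; a real implementation would run this on the GPU:

pub fn diff_blocks(
    last: &[u8],
    new: &[u8],
    width: usize,
    height: usize,
    stride: usize,
    block: usize,
) -> Vec<ScreenRegion> {
    let mut damaged = Vec::new();
    for by in (0..height).step_by(block) {
        for bx in (0..width).step_by(block) {
            // Compare one row of each block as a cheap proxy for the whole block
            let row = by * stride + bx * 4; // 4 bytes per pixel assumed
            let len = block.min(width - bx) * 4;
            if last[row..row + len] != new[row..row + len] {
                damaged.push(ScreenRegion {
                    x: bx as u32,
                    y: by as u32,
                    width: block.min(width - bx) as u32,
                    height: block.min(height - by) as u32,
                });
            }
        }
    }
    damaged
}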

Partial Region Encoding

pub struct RegionEncoder {
    full_encoder: Box<dyn VideoEncoder>,
    tile_encoder: Box<dyn VideoEncoder>,
    current_regions: Vec<ScreenRegion>,
}

impl RegionEncoder {
    pub async fn encode_with_regions(
        &mut self,
        frame: CapturedFrame,
        regions: Vec<ScreenRegion>,
    ) -> Result<Vec<EncodedTile>> {
        let mut encoded_tiles = Vec::new();
        
        if regions.is_empty() || regions.len() > 4 {
            // Too many regions or no changes - encode the full frame.
            // Record dimensions before `frame` is moved into the encoder.
            let (width, height) = (frame.width, frame.height);
            let encoded = self.full_encoder.encode(frame).await?;
            encoded_tiles.push(EncodedTile {
                region: ScreenRegion {
                    x: 0,
                    y: 0,
                    width,
                    height,
                },
                data: encoded.data,
                is_keyframe: encoded.is_keyframe,
            });
        } else {
            // Encode each damaged region separately
            for region in regions {
                let cropped = self.crop_frame(&frame, &region)?;
                let encoded = self.tile_encoder.encode(cropped).await?;
                
                encoded_tiles.push(EncodedTile {
                    region,
                    data: encoded.data,
                    is_keyframe: encoded.is_keyframe,
                });
            }
        }
        
        Ok(encoded_tiles)
    }
    
    fn crop_frame(&self, frame: &CapturedFrame, region: &ScreenRegion) -> Result<CapturedFrame> {
        // Adjust DMA-BUF offsets for the region
        // This is a zero-copy operation - just metadata changes
        
        Ok(CapturedFrame {
            dma_buf: DmaBufHandle::from_region(&frame.dma_buf, region)?,
            width: region.width,
            height: region.height,
            format: frame.format,
            timestamp: frame.timestamp,
        })
    }
}

3. WebRTC Transport Layer Optimization

Low-Latency WebRTC Configuration

pub struct LowLatencyWebRtcConfig {
    // ICE and transport
    pub ice_transport_policy: IceTransportPolicy,
    pub ice_servers: Vec<IceServer>,
    
    // Media settings
    pub video_codecs: Vec<VideoCodecConfig>,
    pub max_bitrate: u32,
    pub min_bitrate: u32,
    pub start_bitrate: u32,
    
    // Buffering - minimize for low latency
    pub playout_delay_min_ms: u16,    // 0-10ms (default 50ms)
    pub playout_delay_max_ms: u16,    // 10-20ms (default 200ms)
    
    // Packetization
    pub rtp_payload_size: u16,        // Smaller packets: 1200 bytes
    pub packetization_mode: PacketizationMode,
    
    // Feedback and retransmission
    pub nack_enabled: bool,           // Limited NACK
    pub fec_enabled: bool,            // Disable FEC for latency
    pub transport_cc_enabled: bool,   // Congestion control
    
    // RTCP settings
    pub rtcp_report_interval_ms: u32,  // Frequent: 50-100ms
}

pub struct VideoCodecConfig {
    pub name: String,
    pub clock_rate: u32,
    pub num_channels: u16,
    pub parameters: CodecParameters,
}

impl LowLatencyWebRtcConfig {
    pub fn for_ultra_low_latency() -> Self {
        Self {
            ice_transport_policy: IceTransportPolicy::All,
            ice_servers: vec![],
            
            video_codecs: vec![
                VideoCodecConfig {
                    name: "H264".to_string(),
                    clock_rate: 90000,
                    num_channels: 1,
                    parameters: CodecParameters {
                        profile_level_id: "42e01f".to_string(),  // Baseline profile
                        packetization_mode: 1,
                        level_asymmetry_allowed: 1,
                    },
                },
            ],
            
            max_bitrate: 8000000,      // 8 Mbps max
            min_bitrate: 500000,       // 500 Kbps min
            start_bitrate: 4000000,    // 4 Mbps start
            
            // Critical: Minimal playout delay
            playout_delay_min_ms: 0,   // No minimum
            playout_delay_max_ms: 20,  // 20ms maximum
            
            // Smaller packets for lower serialization latency
            rtp_payload_size: 1200,
            packetization_mode: PacketizationMode::NonInterleaved,
            
            // Limited retransmission
            nack_enabled: true,        // But limit retransmission window
            fec_enabled: false,        // Disable FEC - adds latency
            transport_cc_enabled: true,
            
            // More frequent RTCP feedback
            rtcp_report_interval_ms: 50,
        }
    }
}

Packet Loss Handling Strategy

pub enum LossHandlingStrategy {
    PreferLatency,    // Drop late frames, prioritize low latency
    PreferQuality,     // Retransmit, prioritize quality
    Balanced,          // Adaptive based on network conditions
}

pub struct PacketLossHandler {
    strategy: LossHandlingStrategy,
    max_retransmission_delay_ms: u32,
    nack_window_size: u32,
}

impl PacketLossHandler {
    pub fn handle_packet_loss(
        &mut self,
        sequence_number: u16,
        packet_ts_ms: u64,
        now_ms: u64,
    ) -> RetransmissionDecision {
        match self.strategy {
            LossHandlingStrategy::PreferLatency => {
                // Retransmit only while the packet is young enough to still arrive in time
                if now_ms.saturating_sub(packet_ts_ms) > self.max_retransmission_delay_ms as u64 {
                    RetransmissionDecision::Drop
                } else {
                    RetransmissionDecision::Request(sequence_number)
                }
            }
            LossHandlingStrategy::PreferQuality => {
                // Always try to retransmit
                RetransmissionDecision::Request(sequence_number)
            }
            LossHandlingStrategy::Balanced => {
                // Adaptive based on loss rate
                RetransmissionDecision::None  // Placeholder
            }
        }
    }
}

pub enum RetransmissionDecision {
    Request(u16),
    Drop,
    None,
}

NACK vs FEC Selection

Recommendation for 15-25ms latency:

  • Primary: Limited NACK

    • NACK window: 1-2 frames (16-33ms at 60fps)
    • Max retransmission delay: 20ms
    • Only retransmit keyframes or critical packets
  • Avoid FEC:

    • Forward Error Correction adds significant latency
    • With low-loss LAN, FEC overhead outweighs benefits
    • Use NACK selectively instead
pub struct NackController {
    window_size_ms: u32,           // 20ms window
    max_nack_packets_per_second: u32,
    nack_list: VecDeque<(u16, u64)>, // (seq_num, timestamp_ms)
}

impl NackController {
    pub fn should_send_nack(&self, seq_num: u16, now_ms: u64) -> bool {
        // Check if packet is within NACK window
        if let Some(&(_, oldest_ts)) = self.nack_list.front() {
            if now_ms - oldest_ts > self.window_size_ms as u64 {
                return false;  // Too old
            }
        }
        
        true
    }
}
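A sketch of the "only retransmit keyframes or critical packets" policy from the recommendation above; is_keyframe_packet stands in for payload inspection (e.g., checking the H.264 NAL unit type), which is assumed here:

pub fn should_retransmit(packet_age_ms: u64, is_keyframe_packet: bool, window_ms: u64) -> bool {
    // Past the retransmission window the frame is already late; within it,
    // spend the retransmission budget only on the packets that matter most
    packet_age_ms <= window_ms && is_keyframe_packet
}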

4. Frame Rate and Buffer Strategy

Dynamic Frame Rate Adjustment

pub struct FrameRateController {
    target_fps: u32,              // Desired FPS (30-60)
    current_fps: u32,
    frame_times: VecDeque<Instant>,
    last_frame_time: Instant,
    min_interval: Duration,       // 1 / max_fps
}

impl FrameRateController {
    pub fn new(target_fps: u32) -> Self {
        let min_interval = Duration::from_micros(1_000_000 / 60);  // Max 60 FPS
        
        Self {
            target_fps,
            current_fps: 30,
            frame_times: VecDeque::with_capacity(60),
            last_frame_time: Instant::now(),
            min_interval,
        }
    }
    
    pub fn should_capture(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_frame_time);
        
        if elapsed < self.min_interval {
            return false;  // Too soon
        }
        
        // Update frame rate based on conditions
        self.adjust_fps_based_on_conditions();
        
        self.last_frame_time = now;
        true
    }
    
    pub fn adjust_fps_based_on_conditions(&mut self) {
        // Check system load, network conditions, etc.
        let system_load = self.get_system_load();
        let network_quality = self.get_network_quality();
        
        if system_load > 0.8 || network_quality.is_poor() {
            self.current_fps = 30;  // Reduce frame rate
        } else if system_load < 0.5 && network_quality.is_excellent() {
            self.current_fps = 60;  // Increase frame rate
        } else {
            self.current_fps = 45;  // Balanced
        }
    }
}

Fast Frame Dropping Strategy

pub struct FrameDropper {
    target_fps: u32,
    adaptive_drop_threshold_ms: u32,
    consecutive_drops: u32,
    max_consecutive_drops: u32,
}

impl FrameDropper {
    pub fn should_drop(&mut self, queue_latency_ms: u32) -> bool {
        if queue_latency_ms > self.adaptive_drop_threshold_ms {
            if self.consecutive_drops < self.max_consecutive_drops {
                self.consecutive_drops += 1;
                return true;
            }
        }
        
        self.consecutive_drops = 0;
        false
    }
    
    pub fn get_drop_interval(&self) -> u32 {
        // Calculate how many frames to drop
        match self.target_fps {
            60 => 1,   // Drop 1 out of every 2
            30 => 1,   // Drop 1 out of every 2
            _ => 0,
        }
    }
}

Minimal Buffering

Sender Side:

pub struct SenderBuffer {
    max_size_frames: usize,        // Very small: 1-2 frames
    queue: VecDeque<EncodedFrame>,
    target_latency_ms: u32,
}

impl SenderBuffer {
    pub fn new() -> Self {
        Self {
            max_size_frames: 1,      // Single frame buffer
            queue: VecDeque::with_capacity(2),
            target_latency_ms: 5,    // 5ms target
        }
    }
    
    pub fn push(&mut self, frame: EncodedFrame) -> Result<()> {
        if self.queue.len() >= self.max_size_frames {
            // Drop oldest frame to maintain low latency
            self.queue.pop_front();
        }
        
        self.queue.push_back(frame);
        Ok(())
    }
    
    pub fn pop(&mut self) -> Option<EncodedFrame> {
        self.queue.pop_front()
    }
}

Receiver Side (Jitter Buffer):

pub struct MinimalJitterBuffer {
    target_delay_ms: u32,           // 0-10ms
    min_delay_ms: u32,             // 0ms
    max_delay_ms: u32,             // 10-20ms
    packets: VecDeque<RtpPacket>,
}

impl MinimalJitterBuffer {
    pub fn new() -> Self {
        Self {
            target_delay_ms: 5,     // 5ms target
            min_delay_ms: 0,        // No minimum
            max_delay_ms: 10,      // 10ms maximum
            packets: VecDeque::with_capacity(10),
        }
    }
    
    pub fn push(&mut self, packet: RtpPacket) {
        // Rough heuristic: treat half the max delay (in ms) as a packet budget
        if self.packets.len() < self.max_delay_ms as usize / 2 {
            self.packets.push_back(packet);
        } else {
            // Buffer full - drop oldest
            self.packets.pop_front();
            self.packets.push_back(packet);
        }
    }
    
    pub fn pop(&mut self) -> Option<RtpPacket> {
        self.packets.pop_front()
    }
}

5. Architecture Adjustments

Single-Threaded vs Multi-Threaded

Recommendation: Hybrid Approach

  • Capture Thread: Dedicated thread for PipeWire
  • Encoder Thread: Per-session encoder thread
  • Network Thread: WebRTC transport thread
  • Coordination: Lock-free channels for data passing
pub struct PipelineArchitecture {
    capture_thread: JoinHandle<()>,
    encoder_threads: Vec<JoinHandle<()>>,
    network_thread: JoinHandle<()>,
    
    // Lock-free communication
    capture_to_encoder: async_channel::Sender<CapturedFrame>,
    encoder_to_network: async_channel::Sender<EncodedFrame>,
}

Lock Competition Minimization

// Use lock-free data structures where possible
use crossbeam::queue::SegQueue;
use crossbeam::channel::{bounded, unbounded};

pub struct LockFreeFrameQueue {
    queue: SegQueue<CapturedFrame>,
    max_size: usize,
}

impl LockFreeFrameQueue {
    pub fn push(&self, frame: CapturedFrame) -> Result<()> {
        if self.queue.len() >= self.max_size {
            return Err(Error::QueueFull);
        }
        self.queue.push(frame);
        Ok(())
    }
    
    pub fn pop(&self) -> Option<CapturedFrame> {
        self.queue.pop()
    }
}

Async Task Scheduling

pub struct LowLatencyScheduler {
    // Tokio has no built-in task priorities, so the critical path gets a
    // dedicated runtime whose worker threads background work cannot steal
    critical_rt: tokio::runtime::Runtime,
    background_rt: tokio::runtime::Runtime,
}

impl LowLatencyScheduler {
    pub fn schedule_pipeline(&self) {
        self.critical_rt.spawn(async move {
            // Critical path: capture -> encode -> send
        });
        
        self.background_rt.spawn(async move {
            // Background tasks: statistics, logging
        });
    }
}

6. Technology Stack Adjustments

Encoder Selection for Latency

| Encoder | Setup Latency | Per-Frame Latency | Quality | Recommendation |
|---|---|---|---|---|
| VA-API H.264 | 1-2ms | 2-3ms | Medium | Primary (Linux) |
| NVENC H.264 | 1-2ms | 1-2ms | High | Primary (NVIDIA) |
| x264 (ultrafast) | 0ms | 5-8ms | Low | Fallback |
| x264 (superfast) | 0ms | 8-12ms | Medium | Fallback |

Recommendation:

  • Primary: VA-API or NVENC H.264 with ultrafast preset
  • Fallback: x264 with ultrafast preset (accept 30-50ms latency)

Direct Wayland vs PipeWire

Use PipeWire (recommended):

  • Better DMA-BUF support
  • Hardware acceleration integration
  • Zero-copy through ecosystem

Direct Wayland (if needed):

  • Lower-level control
  • Potentially lower capture latency (0.5-1ms)
  • More complex implementation
  • No portal integration (security issue)

Recommendation: Stick with PipeWire for MVP. Consider direct Wayland only if PipeWire latency is unacceptable.

webrtc-rs Latency Characteristics

Pros:

  • Pure Rust, predictable behavior
  • Good zero-copy support
  • Customizable buffering

Cons:

  • May have default buffer settings optimized for reliability
  • Need manual configuration for ultra-low latency

Custom WebRTC Layer (advanced):

  • Full control over buffering and timing
  • Can inline packetization
  • More complex implementation

Recommendation: Use webrtc-rs with low-latency configuration. Only consider custom layer if webrtc-rs cannot achieve targets.

7. Implementation Priority

P0 (Must-Have for MVP)

  1. Hardware Encoder Integration

    • VA-API H.264 with low-latency settings
    • No B-frames, small GOP (15 frames)
    • Ultrafast preset
  2. DMA-BUF Zero-Copy

    • PipeWire DMA-BUF import
    • Direct encoder feed
    • No CPU copies
  3. Minimal Buffering

    • Single frame sender buffer
    • 0-5ms jitter buffer
    • Fast frame dropping
  4. Low-Latency WebRTC Config

    • playout_delay_min: 0ms
    • playout_delay_max: 20ms
    • Disable FEC

P1 (Important for 15-25ms)

  1. Damage Tracking

    • Partial region updates
    • Reduced encoding load
  2. Dynamic Frame Rate

    • 30-60 FPS adaptation
    • Network-aware
  3. NACK Control

    • Limited retransmission window (20ms)
    • Selective NACK
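A minimal sketch of the damage-tracking idea behind P1 item 1, assuming a hypothetical `Rect` type: collapse the damage rectangles reported for a frame into one dirty region, and skip the frame entirely when nothing changed.

```rust
#[derive(Clone, Copy)]
pub struct Rect {
    pub x: u32,
    pub y: u32,
    pub w: u32,
    pub h: u32,
}

impl Rect {
    /// Bounding box of two rectangles.
    fn union(self, o: Rect) -> Rect {
        let x1 = self.x.min(o.x);
        let y1 = self.y.min(o.y);
        let x2 = (self.x + self.w).max(o.x + o.w);
        let y2 = (self.y + self.h).max(o.y + o.h);
        Rect { x: x1, y: y1, w: x2 - x1, h: y2 - y1 }
    }
}

/// Single dirty region for one frame; `None` means the frame is unchanged
/// and can be dropped without encoding.
pub fn dirty_region(damage: &[Rect]) -> Option<Rect> {
    damage.iter().copied().reduce(Rect::union)
}
```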

P2 (Nice-to-Have)

  1. Direct Wayland Capture

    • If PipeWire latency insufficient
  2. Custom WebRTC Layer

    • If webrtc-rs insufficient
  3. Advanced Congestion Control

    • SCReAM or Google Congestion Control

8. Testing and Validation

End-to-End Latency Measurement

```rust
use std::collections::VecDeque;
use std::time::Duration;

pub struct LatencyMeter {
    timestamps: VecDeque<(u64, LatencyStage)>,
}

#[derive(Clone, Copy, Debug)]
pub enum LatencyStage {
    Capture,
    EncodeStart,
    EncodeEnd,
    Packetize,
    NetworkSend,
    NetworkReceive,
    Depacketize,
    DecodeStart,
    DecodeEnd,
    Display,
}

impl LatencyMeter {
    pub fn mark(&mut self, stage: LatencyStage) {
        // timestamp_ns(): monotonic clock in nanoseconds (CLOCK_MONOTONIC).
        let now = timestamp_ns();
        self.timestamps.push_back((now, stage));
    }

    pub fn calculate_total_latency(&self) -> Duration {
        if self.timestamps.len() < 2 {
            return Duration::ZERO;
        }

        let first = self.timestamps.front().unwrap().0;
        let last = self.timestamps.back().unwrap().0;
        Duration::from_nanos(last - first)
    }
}
```
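Total latency alone does not say where the budget went. A small extension, assuming the `LatencyMeter` above, that reports the delta between consecutive marks per stage:

```rust
impl LatencyMeter {
    /// Per-stage deltas between consecutive marks, for pinpointing which
    /// pipeline stage is consuming the budget.
    pub fn stage_breakdown(&self) -> Vec<(LatencyStage, Duration)> {
        self.timestamps
            .iter()
            .zip(self.timestamps.iter().skip(1))
            .map(|(&(t0, _), &(t1, stage))| (stage, Duration::from_nanos(t1 - t0)))
            .collect()
    }
}
```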

Measurement Method:

  1. Timestamp Injection

    • Inject frame ID at capture (visible timestamp on screen)
    • Capture at client with camera
    • Compare timestamps to calculate round-trip
    • Divide by 2 to approximate one-way latency (assumes symmetric paths)
  2. Network Timestamping

    • Add frame capture time in RTP header extension
    • Measure at receiver
    • Account for clock skew (see the sketch after this list)
  3. Hardware Timestamping

    • Use kernel packet timestamps (SO_TIMESTAMPING)
    • Hardware NIC timestamps if available
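A sketch of method 2's arithmetic: the sender's capture timestamp travels in an RTP header extension, and the receiver converts it into its own clock domain before differencing. The clock-offset estimate is assumed to come from some external synchronization step (e.g. NTP-style probing), which this sketch does not implement.

```rust
/// One-way latency in nanoseconds, given a capture timestamp taken on the
/// sender's clock and a receive timestamp taken on the receiver's clock.
/// `clock_offset_ns` is the estimated (sender - receiver) clock offset;
/// without it, the difference mixes latency with clock skew.
pub fn one_way_latency_ns(
    capture_ts_sender_ns: u64,
    receive_ts_receiver_ns: u64,
    clock_offset_ns: i64,
) -> i64 {
    // Translate the sender timestamp into the receiver's clock domain.
    let capture_in_receiver_clock = capture_ts_sender_ns as i64 - clock_offset_ns;
    receive_ts_receiver_ns as i64 - capture_in_receiver_clock
}
```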

Performance Benchmarking

```rust
// Requires the nightly-only `test` crate for #[bench].
#[bench]
fn bench_full_pipeline_latency(b: &mut Bencher) {
    let mut pipeline = LowLatencyPipeline::new(config).unwrap();
    let mut latencies = Vec::new();

    b.iter(|| {
        let start = Instant::now();

        let frame = pipeline.capture().unwrap();
        let encoded = pipeline.encode(frame).unwrap();
        pipeline.send(encoded).unwrap();

        latencies.push(start.elapsed());
    });

    let avg_latency = latencies.iter().sum::<Duration>() / latencies.len() as u32;
    println!("Average latency: {:?}", avg_latency);
}
```

Target Benchmarks:

| Metric | Target | Acceptable |
| --- | --- | --- |
| Capture latency | 2-3ms | <5ms |
| Encode latency | 3-5ms | <8ms |
| Packetize latency | 1-2ms | <3ms |
| Network (LAN) | 0.5-1ms | <2ms |
| Decode latency | 1-2ms | <4ms |
| Display latency | 1-2ms | <4ms |
| Total | 15-25ms | <30ms |
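These budgets are most useful when enforced automatically. A sketch of a regression test against the "Acceptable" column, with `measure_stage` as a hypothetical harness hook:

```rust
use std::time::Duration;

#[derive(Clone, Copy)]
enum Stage {
    Capture,
    Encode,
    Total,
}

/// Hypothetical harness hook: averaged latency of one stage over a warm run,
/// e.g. aggregated from LatencyMeter marks.
fn measure_stage(_stage: Stage) -> Duration {
    Duration::ZERO // placeholder
}

#[test]
fn latency_budgets_hold() {
    assert!(measure_stage(Stage::Capture) <= Duration::from_millis(5));
    assert!(measure_stage(Stage::Encode) <= Duration::from_millis(8));
    assert!(measure_stage(Stage::Total) <= Duration::from_millis(30));
}
```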

Tuning Strategy

  1. Baseline Measurement

    • Measure each stage individually
    • Identify bottlenecks
  2. Iterative Tuning

    • Tune one parameter at a time
    • Measure impact on total latency
    • Trade off quality if needed
  3. Validation

    • Test under various network conditions
    • Test under system load
    • Test with different content (static, dynamic)
  4. Continuous Monitoring

    • Track latency in production
    • Alert on degradation
    • Adaptive adjustments

Implementation Roadmap (Updated for Low Latency)

Phase 1: MVP (Minimum Viable Product) - 4-6 weeks

Goal: Basic screen capture and WebRTC streaming

Week 1-2: Core Infrastructure

  • Project setup (Cargo.toml, directory structure)
  • Tokio async runtime setup
  • Error handling framework (anyhow/thiserror)
  • Logging setup (tracing)
  • Configuration management

Week 2-3: Wayland Capture

  • PipeWire xdg-desktop-portal integration
  • Basic screen capture (single monitor)
  • DMA-BUF import/export
  • Frame receiver channel

Week 3-4: Simple Encoding

  • x264 software encoder (fallback)
  • Basic frame pipeline (capture → encode)
  • Frame rate limiting

Week 4-5: WebRTC Transport

  • webrtc-rs integration
  • Basic peer connection
  • Video track setup
  • Simple signaling (WebSocket)

Week 5-6: Testing & Integration

  • End-to-end test (Wayland → WebRTC → Browser)
  • Performance benchmarking
  • Bug fixes

MVP Deliverables:

  • Working screen capture
  • WebRTC streaming to browser
  • 15-30 FPS at 720p
  • x264 encoding (software)

Phase 2: Hardware Acceleration - 3-4 weeks

Goal: GPU-accelerated encoding for better performance

Week 1-2: VA-API Integration

  • VA-API encoder implementation
  • DMA-BUF to VA-API surface import
  • H.264 encoding
  • Intel/AMD GPU support

Week 2-3: NVENC Integration

  • NVENC encoder for NVIDIA GPUs
  • CUDA memory management
  • NVENC H.264 encoding

Week 3-4: Encoder Selection

  • Encoder detection and selection
  • Fallback chain (NVENC → VA-API → x264)
  • Encoder switching at runtime

Phase 2 Deliverables:

  • GPU-accelerated encoding
  • 30-60 FPS at 1080p
  • Lower CPU usage
  • Adaptive encoder selection

Phase 3: Low Latency Optimization - 4-5 weeks

Goal: Achieve 25-35ms latency on local networks

Week 1: Encoder Low-Latency Configuration (P0)

  • Configure VA-API/NVENC for <5ms encoding
  • Disable B-frames, set GOP to 15 frames
  • Implement CBR rate control with small VBV buffer
  • Tune encoder preset (ultrafast/superfast)
  • Measure encoder latency independently

Week 2: Minimal Buffering (P0)

  • Reduce sender buffer to 1 frame
  • Implement 0-10ms jitter buffer
  • Configure WebRTC playout delay (0-20ms)
  • Disable FEC for latency
  • Test end-to-end latency

Week 3: Damage Tracking & Partial Updates (P1)

  • Implement region change detection
  • Add partial region encoding
  • Optimize for static content
  • Benchmark latency improvements

Week 4: Dynamic Frame Rate & Quality (P1)

  • Implement adaptive frame rate (30-60fps)
  • Network quality detection
  • Dynamic bitrate vs latency trade-off
  • Fast frame dropping under load

Week 5: Advanced Optimizations (P1/P2)

  • Limited NACK window (20ms)
  • Selective packet retransmission
  • RTCP fine-tuning (50ms intervals)
  • Performance profiling
  • Final latency tuning

Phase 3 Deliverables:

  • 25-35ms latency on LAN
  • Zero-copy DMA-BUF pipeline
  • Hardware encoder with low-latency config
  • Minimal buffering throughout pipeline
  • Adaptive quality based on conditions

Phase 4: Production Ready with Ultra Low Latency - 5-7 weeks

Goal: Achieve 15-25ms latency while ensuring security, reliability, and deployment readiness

Week 1-2: Ultra Low Latency Tuning (P1/P2)

  • Direct Wayland capture evaluation (if needed)
  • Custom WebRTC layer evaluation (if needed)
  • Advanced congestion control (SCReAM/Google CC)
  • Kernel bypass optimization (DPDK/AF_XDP if needed)
  • Final latency optimization and tuning

Week 2-3: Security

  • Authentication (JWT, OAuth)
  • Encryption (TLS, DTLS)
  • Session management
  • Access control
  • Security audit and penetration testing

Week 3-4: Reliability

  • Error recovery
  • Connection health monitoring
  • Automatic reconnection
  • Graceful degradation with latency awareness
  • Failover mechanisms

Week 4-5: Monitoring & Debugging

  • Real-time latency metrics collection
  • Per-stage latency tracking
  • Logging improvements
  • Debug mode with frame inspection
  • Performance dashboard with latency visualization
  • Alerting for latency degradation

Week 5-6: Deployment

  • Docker containerization
  • Systemd service
  • Configuration file with low-latency presets
  • Installation scripts
  • Performance tuning documentation

Week 6-7: Testing

  • Integration tests
  • Load testing with latency monitoring
  • Cross-browser testing
  • Long-running stability tests
  • Latency regression tests
  • Automated performance benchmarks

Phase 4 Deliverables:

  • 15-25ms latency on LAN
  • Production-ready deployment
  • Security features
  • Monitoring and observability
  • Comprehensive testing
  • Latency regression testing

Phase 5: Advanced Features (Optional) - Ongoing

Potential Features:

  • Audio capture and streaming
  • Bidirectional input (mouse, keyboard)
  • Clipboard sharing
  • File transfer
  • Recording (save sessions)
  • Multi-user sessions
  • Mobile client support
  • WebRTC data channels for control
  • WebRTC insertable streams (client-side effects)
  • Adaptive resolution
  • H.265/HEVC encoding
  • AV1 encoding
  • Screen region selection
  • Virtual display support
  • Wayland virtual pointer protocol

Testing Strategy

Unit Tests

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[tokio::test]
    async fn test_dma_buf_lifecycle() {
        let handle = DmaBufHandle::new(/* ... */);
        assert_eq!(handle.ref_count(), 1);

        let handle2 = handle.clone();
        assert_eq!(handle.ref_count(), 2);

        drop(handle);
        assert_eq!(handle2.ref_count(), 1);

        drop(handle2);
        // Buffer freed
    }

    #[tokio::test]
    async fn test_encoder_pipeline() {
        let config = EncoderConfig {
            encoder_type: EncoderType::H264_X264,
            bitrate: 2_000_000,
            keyframe_interval: 30,
            preset: EncodePreset::Fast,
        };

        let mut encoder = X264Encoder::new(config).unwrap();

        let frame = create_test_frame(1920, 1080);
        let encoded = encoder.encode(frame).await.unwrap();

        assert!(!encoded.data.is_empty());
        assert!(encoded.is_keyframe);
    }
}
```

Integration Tests

```rust
#[tokio::test]
async fn test_full_pipeline() {
    // Setup
    let capture = WaylandCapture::new(CaptureConfig::default()).await.unwrap();
    let encoder = VaapiEncoder::new(EncoderConfig::default()).unwrap();
    let webrtc = WebRtcServer::new(WebRtcConfig::default()).await.unwrap();

    // Run pipeline for 100 frames
    for _ in 0..100 {
        let frame = capture.next_frame().await.unwrap();
        let encoded = encoder.encode(frame).await.unwrap();
        webrtc.send_video_frame("test-session", encoded).await.unwrap();
    }

    // Verify
    assert_eq!(webrtc.frames_sent(), 100);
}
```

Load Testing

```bash
# Simulate 10 concurrent connections
for i in {1..10}; do
    cargo test test_full_pipeline --release &
done
wait
```

Performance Benchmarks

```rust
// Requires the nightly-only `test` crate for #[bench]. The blocking encode
// variant is used here because #[bench] closures are synchronous.
#[bench]
fn bench_encode_frame(b: &mut Bencher) {
    let mut encoder = X264Encoder::new(config).unwrap();
    let frame = create_test_frame(1920, 1080);

    b.iter(|| {
        encoder.encode(frame.clone()).unwrap()
    });
}
```

Potential Challenges & Solutions

1. Wayland Protocol Limitations

Challenge: Wayland's security model restricts screen capture

Solution:

  • Use xdg-desktop-portal for permission management
  • Implement user prompts for capture authorization
  • Support multiple portal backends (GNOME, KDE, etc.)
```rust
pub async fn request_capture_permission() -> Result<bool> {
    let portal = Portal::new().await?;
    let session = portal.create_session(ScreenCaptureType::Monitor).await?;

    // User will see a dialog asking for permission
    let sources = portal.request_sources(&session).await?;

    Ok(!sources.is_empty())
}
```

Alternative: Use PipeWire directly with proper authentication

2. Hardware Acceleration Compatibility

Challenge: Different GPUs require different APIs (VA-API, NVENC, etc.)

Solution:

  • Implement multiple encoder backends
  • Runtime detection of available encoders
  • Graceful fallback to software encoding
```rust
pub fn detect_best_encoder() -> EncoderType {
    // Try NVENC first (NVIDIA)
    if nvenc::is_available() {
        return EncoderType::H264_NVENC;
    }

    // Try VA-API (Intel/AMD)
    if vaapi::is_available() {
        return EncoderType::H264_VAAPI;
    }

    // Fallback to software
    EncoderType::H264_X264
}
```

3. Cross-Browser WebRTC Compatibility

Challenge: Different browsers have different WebRTC implementations

Solution:

  • Use standardized codecs (H.264, VP8, VP9)
  • Implement codec negotiation
  • Provide fallback options
```rust
pub fn get_supported_codecs() -> Vec<RTCRtpCodecCapability> {
    vec![
        RTCRtpCodecCapability {
            mime_type: "video/H264".to_string(),
            clock_rate: 90000,
            ..Default::default()
        },
        RTCRtpCodecCapability {
            mime_type: "video/VP9".to_string(),
            clock_rate: 90000,
            ..Default::default()
        },
    ]
}
```

4. Security and Authentication

Challenge: Secure remote access without exposing desktop to unauthorized users

Solution:

  • Implement JWT-based authentication
  • Use DTLS for media encryption
  • Add rate limiting and access control
```rust
use std::collections::HashMap;
use std::sync::Arc;

use chrono::Utc;
use jsonwebtoken::{decode, encode, DecodingKey, EncodingKey, Header, Validation};
use serde::{Deserialize, Serialize};
use tokio::sync::Mutex;

#[derive(Serialize, Deserialize)]
pub struct Claims {
    sub: String,
    exp: i64, // expiry as a Unix timestamp, as jsonwebtoken expects
}

pub struct AuthManager {
    secret: String,
    sessions: Arc<Mutex<HashMap<String, Session>>>,
}

impl AuthManager {
    pub fn create_token(&self, user_id: &str) -> Result<String> {
        let claims = Claims {
            sub: user_id.to_string(),
            exp: (Utc::now() + chrono::Duration::hours(1)).timestamp(),
        };

        Ok(encode(&Header::default(), &claims, &EncodingKey::from_secret(self.secret.as_ref()))?)
    }

    pub fn validate_token(&self, token: &str) -> Result<Claims> {
        decode::<Claims>(
            token,
            &DecodingKey::from_secret(self.secret.as_ref()),
            &Validation::default(),
        )
        .map(|data| data.claims)
        .map_err(|_| AuthError::InvalidToken) // AuthError: this crate's error type
    }
}
```

5. Memory Management

Challenge: Avoid memory leaks with DMA-BUF and shared memory

Solution:

  • Use Rust's ownership system
  • RAII patterns for resource cleanup
  • Buffer pools with limits
```rust
pub struct ScopedDmaBuf {
    handle: DmaBufHandle,
}

impl Drop for ScopedDmaBuf {
    fn drop(&mut self) {
        // Automatically release DMA-BUF
        // File descriptor closed
        // GPU memory freed
    }
}

// Usage ensures cleanup
{
    let buf = ScopedDmaBuf::new(/* ... */);
    // Use buffer
} // Automatically dropped here
```
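A more concrete take on the same RAII pattern, using the standard library's `OwnedFd` so the DMA-BUF file descriptor is closed automatically on drop; the kernel releases the GPU buffer once all descriptors and mappings are gone. The `size` field is illustrative metadata.

```rust
use std::os::fd::{AsFd, BorrowedFd, OwnedFd};

pub struct DmaBufFd {
    fd: OwnedFd, // closed automatically when DmaBufFd is dropped
    size: usize,
}

impl DmaBufFd {
    pub fn new(fd: OwnedFd, size: usize) -> Self {
        Self { fd, size }
    }

    /// Borrow the descriptor, e.g. to hand it to an encoder import call.
    pub fn as_fd(&self) -> BorrowedFd<'_> {
        self.fd.as_fd()
    }

    pub fn size(&self) -> usize {
        self.size
    }
}
// No manual Drop impl needed: OwnedFd's Drop closes the fd.
```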

6. Latency Optimization

Challenge: Minimize end-to-end latency

Solution:

  • Zero-copy pipeline
  • Hardware acceleration
  • Adaptive quality
  • Frame skipping
```rust
pub struct LatencyOptimizer {
    target_latency_ms: u32,
    current_latency_ms: u32,
}

impl LatencyOptimizer {
    pub fn adjust_parameters(&mut self) {
        if self.current_latency_ms > self.target_latency_ms {
            // Reduce quality to improve latency
            self.reduce_bitrate();
            self.increase_frame_skipping();
        } else {
            // Increase quality
            self.increase_bitrate();
        }
    }
}
```

7. Network Conditions

Challenge: Varying bandwidth and network conditions

Solution:

  • Adaptive bitrate streaming
  • Multiple quality presets
  • Congestion control
```rust
use std::collections::VecDeque;
use std::time::Duration;

pub struct BandwidthMonitor {
    measurements: VecDeque<u32>,
    window_size: usize,
}

impl BandwidthMonitor {
    pub fn update(&mut self, bytes_sent: u32, duration: Duration) {
        // Use fractional seconds: per-frame intervals are well under 1s, so
        // integer as_secs() would truncate to zero and divide by zero here.
        let secs = duration.as_secs_f64().max(1e-9);
        let bandwidth = (bytes_sent as f64 * 8.0 / secs) as u32; // bits/sec
        self.measurements.push_back(bandwidth);

        if self.measurements.len() > self.window_size {
            self.measurements.pop_front();
        }
    }

    pub fn average_bandwidth(&self) -> u32 {
        if self.measurements.is_empty() {
            return 0;
        }

        // Sum in u64 to avoid overflow at high bitrates.
        (self.measurements.iter().map(|&b| b as u64).sum::<u64>()
            / self.measurements.len() as u64) as u32
    }
}
```

8. Cross-Platform Compatibility

Challenge: Support different Linux distributions and desktop environments

Solution:

  • Containerize application
  • Detect available technologies at runtime
  • Provide fallback options
```rust
pub fn detect_desktop_environment() -> DesktopEnvironment {
    if std::path::Path::new("/usr/bin/gnome-shell").exists() {
        DesktopEnvironment::GNOME
    } else if std::path::Path::new("/usr/bin/plasmashell").exists() {
        DesktopEnvironment::KDE
    } else {
        DesktopEnvironment::Other
    }
}

pub fn configure_portal_for_env(env: DesktopEnvironment) -> PortalConfig {
    match env {
        DesktopEnvironment::GNOME => PortalConfig::gnome(),
        DesktopEnvironment::KDE => PortalConfig::kde(),
        DesktopEnvironment::Other => PortalConfig::generic(),
    }
}
```

9. Debugging and Troubleshooting

Challenge: Debugging complex pipeline with multiple components

Solution:

  • Comprehensive logging
  • Metrics collection
  • Debug mode with frame inspection
```rust
use std::path::PathBuf;

pub struct DebugLogger {
    enabled: bool,
    output: DebugOutput,
}

pub enum DebugOutput {
    Console,
    File(PathBuf),
    Both,
}

impl DebugLogger {
    pub fn log_frame(&self, frame: &CapturedFrame) {
        if !self.enabled {
            return;
        }

        tracing::debug!(
            "Frame: {}x{}, format: {:?}, timestamp: {}",
            frame.width,
            frame.height,
            frame.format,
            frame.timestamp
        );
    }

    pub fn log_encoding(&self, encoded: &EncodedFrame) {
        if !self.enabled {
            return;
        }

        tracing::debug!(
            "Encoded: {} bytes, keyframe: {}, seq: {}",
            encoded.data.len(),
            encoded.is_keyframe,
            encoded.sequence_number
        );
    }
}
```

10. Resource Limits

Challenge: Prevent resource exhaustion (CPU, memory, GPU)

Solution:

  • Limit concurrent sessions
  • Monitor resource usage
  • Implement graceful degradation
```rust
use std::collections::HashSet;
use std::sync::Arc;
use tokio::sync::Mutex;

pub struct ResourceManager {
    max_sessions: usize,
    active_sessions: Arc<Mutex<HashSet<String>>>,
    cpu_threshold: f32,
    memory_threshold: u64,
}

impl ResourceManager {
    pub async fn can_create_session(&self) -> bool {
        let sessions = self.active_sessions.lock().await;

        if sessions.len() >= self.max_sessions {
            return false;
        }

        // cpu_usage()/memory_usage() are hypothetical sampling helpers
        // (e.g. backed by /proc or the sysinfo crate).
        if self.cpu_usage() > self.cpu_threshold {
            return false;
        }

        if self.memory_usage() > self.memory_threshold {
            return false;
        }

        true
    }
}
```

Code Examples

Main Application Entry Point

```rust
// src/main.rs
mod capture;
mod encoder;
mod webrtc;
mod buffer;
mod ipc;
mod config;

use anyhow::Result;
use tracing::{info, error};

#[tokio::main]
async fn main() -> Result<()> {
    // Initialize logging
    tracing_subscriber::fmt()
        .with_max_level(tracing::Level::INFO)
        .init();

    // Load configuration
    let config = config::load_config("config.toml")?;

    info!("Starting Wayland WebRTC Backend");
    info!("Configuration: {:?}", config);

    // Initialize components
    let capture = capture::WaylandCapture::new(config.capture).await?;
    let encoder = encoder::create_encoder(config.encoder)?;
    let webrtc = webrtc::WebRtcServer::new(config.webrtc).await?;

    // Create video track
    let video_track = webrtc::create_video_track()?;

    // Run capture pipeline
    let session_id = uuid::Uuid::new_v4().to_string();
    webrtc.create_peer_connection(session_id.clone(), video_track).await?;

    // Main loop
    loop {
        match capture.next_frame().await {
            Ok(frame) => {
                match encoder.encode(frame).await {
                    Ok(encoded) => {
                        if let Err(e) = webrtc.send_video_frame(&session_id, encoded).await {
                            error!("Failed to send frame: {}", e);
                        }
                    }
                    Err(e) => {
                        error!("Failed to encode frame: {}", e);
                    }
                }
            }
            Err(e) => {
                error!("Failed to capture frame: {}", e);
            }
        }
    }
}
```

Configuration Example

```toml
# config.toml - Low Latency Configuration

[server]
address = "0.0.0.0:8080"
max_sessions = 10
log_level = "info"

[capture]
frame_rate = 60
quality = "high"
# Omit screen_region to capture the full screen (TOML has no null value)
# screen_region = { x = 0, y = 0, width = 1920, height = 1080 }

# Low-latency capture settings
zero_copy = true  # Always use DMA-BUF zero-copy
track_damage = true  # Enable damage tracking
partial_updates = true  # Encode only damaged regions
buffer_pool_size = 3  # Small buffer pool for low latency

[encoder]
type = "auto"
# Options: "vaapi", "nvenc", "x264", "auto"
bitrate = 4000000
keyframe_interval = 15  # Short GOP for low latency
preset = "ultrafast"  # Ultrafast for minimal latency

# Low-latency specific settings
[encoder.low_latency]
gop_size = 15  # Short GOP
b_frames = 0  # No B-frames for low latency
lookahead = 0  # Minimal lookahead
rc_mode = "CBR"  # Constant bitrate for predictable latency
vbv_buffer_size = 400000  # ~100ms at 4 Mbps; shrink toward one frame for stricter latency
max_bitrate = 4000000  # Tight max constraint
intra_period = 15  # Keyframe interval

# Quality vs latency presets
[encoder.quality_presets]
low = { bitrate = 1000000, preset = "ultrafast", gop_size = 30 }
medium = { bitrate = 2000000, preset = "ultrafast", gop_size = 20 }
high = { bitrate = 4000000, preset = "ultrafast", gop_size = 15 }
ultra = { bitrate = 8000000, preset = "veryfast", gop_size = 15 }

[webrtc]
stun_servers = ["stun:stun.l.google.com:19302"]
turn_servers = []

[webrtc.low_latency]
# Critical: Minimal playout delay
playout_delay_min_ms = 0  # No minimum delay
playout_delay_max_ms = 20  # 20ms maximum delay

# Bitrate settings
max_bitrate = 8000000  # 8 Mbps max
min_bitrate = 500000  # 500 Kbps min
start_bitrate = 4000000  # 4 Mbps start

# Packetization
rtp_payload_size = 1200  # Smaller packets for lower latency
packetization_mode = "non_interleaved"

# Retransmission settings
nack_enabled = true  # Enable NACK
nack_window_size_ms = 20  # Only request retransmission for recent packets
max_nack_packets_per_second = 50
fec_enabled = false  # Disable FEC for low latency

# Congestion control
transport_cc_enabled = true  # Enable transport-wide congestion control

# RTCP settings
rtcp_report_interval_ms = 50  # Frequent RTCP feedback

[webrtc.ice]
transport_policy = "all"
candidate_network_types = ["udp", "tcp"]

[buffer]
# Minimal buffering for low latency
dma_buf_pool_size = 3  # Small pool
encoded_buffer_pool_size = 5  # Very small pool
sender_buffer_max_frames = 1  # Single frame sender buffer
jitter_buffer_target_delay_ms = 5  # 5ms target jitter buffer
jitter_buffer_max_delay_ms = 10  # 10ms maximum jitter buffer

[latency]
# Latency targets
target_latency_ms = 25  # Target end-to-end latency
max_acceptable_latency_ms = 50  # Maximum acceptable latency

# Adaptive settings
adaptive_frame_rate = true
adaptive_bitrate = true
fast_frame_drop = true

# Frame dropping strategy
consecutive_drop_limit = 3  # Max consecutive drops
drop_threshold_ms = 5  # Drop if queue latency exceeds this

[monitoring]
enable_metrics = true
enable_latency_tracking = true
metrics_port = 9090
```
Performance Targets

Latency Targets

Local Network (LAN)

| Scenario | Target | Stretch Goal |
| --- | --- | --- |
| Core target | 25-35ms | 15-25ms (excellent) |
| Minimum | <50ms | 15-25ms (excellent) |

Perceived user experience by latency:

| Latency | Experience |
| --- | --- |
| <16ms | Imperceptible |
| 16-33ms | Very smooth |
| 33-50ms | Good |
| 50-100ms | Acceptable |

Internet

| Scenario | Target | Acceptable |
| --- | --- | --- |
| Excellent | 40-60ms | <80ms |
| Good | 60-80ms | <100ms |

Performance Metrics by Phase

| Metric | MVP | Phase 2 | Phase 3 | Phase 4 |
| --- | --- | --- | --- | --- |
| FPS (LAN) | 30 | 60 | 60 | 60 |
| FPS (Internet) | 15-20 | 30 | 30-60 | 60 |
| Resolution | 720p | 1080p | 1080p/4K | 1080p/4K |
| Latency (LAN) | <100ms | <50ms | 25-35ms | 15-25ms |
| Latency (Internet) | <200ms | <100ms | 60-80ms | 40-60ms |
| CPU Usage | 20-30% | 10-15% | 5-10% | <5% |
| Memory Usage | 150MB | 250MB | 400MB | <400MB |
| Bitrate | 2-4 Mbps | 4-8 Mbps | Adaptive | Adaptive |
| Concurrent Sessions | 1 | 3-5 | 5-10 | 10+ |

Latency Budget Allocation (15-25ms Target)

| Component | Time (ms) | Percentage | Optimization Strategy |
| --- | --- | --- | --- |
| Wayland Capture | 2-3 | 12-15% | DMA-BUF zero-copy, partial updates |
| Encoder | 3-5 | 20-25% | Hardware encoder, no B-frames |
| Packetization | 1-2 | 6-10% | Inline RTP, minimal buffering |
| Network (LAN) | 0.5-1 | 3-5% | Direct UDP path, kernel bypass |
| Jitter Buffer | 0-2 | 0-10% | Minimal buffer, predictive jitter |
| Decoder | 1-2 | 6-10% | Hardware acceleration |
| Display | 1-2 | 6-10% | Vsync bypass, direct scanout |
| Total | 15-25 | 100% | |

Conclusion

This design provides a comprehensive blueprint for building an ultra-low-latency Wayland → WebRTC remote desktop backend in Rust. Key highlights:

  1. Zero-Copy Architecture: Minimizes CPU copies through DMA-BUF and reference-counted buffers, achieving <5ms copy overhead
  2. Hardware Acceleration: VA-API/NVENC encoders configured for <5ms encoding latency
  3. Minimal Buffering: Single-frame sender buffer and 0-10ms jitter buffer throughout pipeline
  4. Low-Latency WebRTC: Custom configuration with 0-20ms playout delay, no FEC, limited NACK
  5. Performance: Targets 15-25ms latency on local networks at 60 FPS
  6. Adaptive Quality: Dynamic frame rate (30-60fps) and bitrate adjustment based on network conditions
  7. Damage Tracking: Partial region updates for static content to reduce encoding load

Latency Budget Breakdown (15-25ms target):

  • Capture: 2-3ms (DMA-BUF zero-copy)
  • Encoder: 3-5ms (hardware, no B-frames)
  • Packetization: 1-2ms (inline RTP)
  • Network: 0.5-1ms (LAN)
  • Jitter Buffer: 0-2ms (minimal)
  • Decoder: 1-2ms (hardware)
  • Display: 1-2ms (vsync bypass)

The phased implementation approach allows for incremental development and testing:

  • Phase 1 (4-6 weeks): MVP with <100ms latency
  • Phase 2 (3-4 weeks): Hardware acceleration, <50ms latency
  • Phase 3 (4-5 weeks): Low-latency optimizations, 25-35ms latency
  • Phase 4 (5-7 weeks): Ultra-low latency tuning, 15-25ms latency

Critical P0 optimizations for achieving 15-25ms latency:

  1. Hardware encoder with zero B-frames, 15-frame GOP
  2. DMA-BUF zero-copy capture pipeline
  3. Minimal buffering (1 frame sender, 0-10ms jitter)
  4. WebRTC low-latency configuration (0-20ms playout delay)
