
Wayland → WebRTC Remote Desktop Backend

Technical Design Document

Table of Contents

  1. System Architecture
  2. Technology Stack
  3. Key Components Design
  4. Data Flow Optimization
  5. Low Latency Optimization
  6. Implementation Roadmap
  7. Potential Challenges & Solutions

System Architecture

High-Level Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                          Client Browser                              │
│                    (WebRTC Receiver)                                │
└─────────────────────────────┬───────────────────────────────────────┘
                              │ WebRTC (UDP/TCP)
                              │ Signaling (WebSocket/HTTP)
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│                        Signaling Server                              │
│                    (WebSocket/WebSocket Secure)                     │
│                      - Session Management                           │
│                      - SDP Exchange                                  │
│                      - ICE Candidates                                │
└─────────────────────────────┬───────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    Rust Backend Server                               │
├─────────────────────────────────────────────────────────────────────┤
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐          │
│  │   Capture    │    │   Encoder    │    │  WebRTC      │          │
│  │   Manager    │───▶│  Pipeline    │───▶│  Transport   │          │
│  └──────────────┘    └──────────────┘    └──────────────┘          │
│         │                  │                  │                      │
│         ▼                  ▼                  ▼                      │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐          │
│  │   PipeWire   │    │   Video      │    │  Data        │          │
│  │   Portal     │    │   Encoder    │    │  Channels    │          │
│  │   (xdg-      │    │   (H.264/    │    │  (Input/     │          │
│  │   desktop-   │    │   H.265/VP9) │    │   Control)   │          │
│  │   portal)    │    └──────────────┘    └──────────────┘          │
│  └──────────────┘                                                     │
│                                                                       │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │                Zero-Copy Buffer Manager                     │    │
│  │           - DMA-BUF Import/Export                            │    │
│  │           - Shared Memory Pools                              │    │
│  │           - Memory Ownership Tracking                         │    │
│  └─────────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    Wayland Compositor                                │
│                  (PipeWire Screen Sharing)                           │
└─────────────────────────────────────────────────────────────────────┘
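The signaling server in the diagram above needs only a small message vocabulary. A minimal sketch of those messages, assuming JSON over WebSocket as the wire format (the exact schema is an assumption at this stage, not part of the design):

use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum SignalingMessage {
    // Session management
    CreateSession { session_id: String },
    CloseSession { session_id: String },
    // SDP exchange
    Offer { session_id: String, sdp: String },
    Answer { session_id: String, sdp: String },
    // Trickle ICE
    IceCandidate { session_id: String, candidate: String },
}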

Component Breakdown

1. Capture Manager

Responsibilities:

  • Interface with PipeWire xdg-desktop-portal
  • Request screen capture permissions
  • Receive DMA-BUF frames
  • Manage frame buffer lifecycle

Key Technologies:

  • pipewire crate for PipeWire protocol
  • wayland-client for Wayland protocol
  • ashpd for desktop portals
pub struct CaptureManager {
    pipewire_connection: Rc<PipewireConnection>,
    stream_handle: Option<StreamHandle>,
    frame_sender: async_channel::Sender<CapturedFrame>,
    config: CaptureConfig,
}

pub struct CaptureConfig {
    pub frame_rate: u32,
    pub quality: QualityLevel,
    pub screen_region: Option<ScreenRegion>,
}

pub enum QualityLevel {
    Low,
    Medium,
    High,
    Ultra,
}

pub struct CapturedFrame {
    pub dma_buf: DmaBufHandle,
    pub width: u32,
    pub height: u32,
    pub format: PixelFormat,
    pub timestamp: u64,
}

2. Encoder Pipeline

Responsibilities:

  • Receive raw frames from capture
  • Encode to H.264/H.265/VP9
  • Hardware acceleration (VA-API, NVENC)
  • Bitrate adaptation

Zero-Copy Strategy:

  • Direct DMA-BUF to encoder (no CPU copies)
  • Encoder outputs to memory-mapped buffers
  • WebRTC consumes encoded buffers directly
pub struct EncoderPipeline {
    encoder: Box<dyn VideoEncoder>,
    config: EncoderConfig,
    stats: EncoderStats,
}

pub trait VideoEncoder: Send + Sync {
    fn encode_frame(
        &mut self,
        frame: CapturedFrame,
    ) -> Result<EncodedFrame, EncoderError>;
    
    fn set_bitrate(&mut self, bitrate: u32) -> Result<(), EncoderError>;
    
    fn request_keyframe(&mut self) -> Result<(), EncoderError>;
}

pub struct EncodedFrame {
    pub data: Bytes, // Zero-copy Bytes wrapper
    pub is_keyframe: bool,
    pub timestamp: u64,
    pub sequence_number: u64,
}

3. WebRTC Transport

Responsibilities:

  • WebRTC peer connection management
  • Media track (video) and data channels
  • RTP packetization
  • ICE/STUN/TURN handling
  • Congestion control

Libraries:

  • webrtc crate (webrtc-rs) or custom WebRTC implementation
pub struct WebRtcTransport {
    peer_connection: RTCPeerConnection,
    video_track: RTCVideoTrack,
    data_channel: Option<RTCDataChannel>,
    config: WebRtcConfig,
}

pub struct WebRtcConfig {
    pub stun_servers: Vec<String>,
    pub turn_servers: Vec<TurnServer>,
    pub ice_transport_policy: IceTransportPolicy,
}

pub struct TurnServer {
    pub urls: Vec<String>,
    pub username: String,
    pub credential: String,
}

4. Zero-Copy Buffer Manager

Responsibilities:

  • Manage DMA-BUF lifecycle
  • Pool pre-allocated memory
  • Track ownership via Rust types
  • Coordinate with PipeWire memory pools
pub struct BufferManager {
    dma_buf_pool: Pool<DmaBufHandle>,
    encoded_buffer_pool: Pool<Bytes>,
    max_buffers: usize,
}

impl BufferManager {
    pub fn acquire_dma_buf(&self) -> Option<DmaBufHandle> {
        self.dma_buf_pool.acquire()
    }
    
    pub fn release_dma_buf(&self, handle: DmaBufHandle) {
        self.dma_buf_pool.release(handle)
    }
    
    pub fn acquire_encoded_buffer(&self, size: usize) -> Option<Bytes> {
        self.encoded_buffer_pool.acquire_with_size(size)
    }
}

Data Flow

Wayland Compositor
        │
        │ DMA-BUF (GPU memory)
        ▼
PipeWire Portal
        │
        │ DMA-BUF file descriptor
        ▼
Capture Manager
        │
        │ CapturedFrame { dma_buf, ... }
        │ (Zero-copy ownership transfer)
        ▼
Buffer Manager
        │
        │ DmaBufHandle (moved, not copied)
        ▼
Encoder Pipeline
        │
        │ EncodedFrame { data: Bytes, ... }
        │ (Zero-copy Bytes wrapper)
        ▼
WebRTC Transport
        │
        │ RTP Packets (reference to Bytes)
        ▼
Network (UDP/TCP)
        │
        ▼
Client Browser

Technology Stack

Core Dependencies

[dependencies]
# Async Runtime
tokio = { version = "1.35", features = ["full", "rt-multi-thread"] }
async-trait = "0.1"

# Wayland & PipeWire
wayland-client = "0.31"
wayland-protocols = "0.31"
pipewire = "0.8"
ashpd = "0.8"

# Video Encoding (Low Latency)
openh264 = { version = "0.6", optional = true }
x264 = { version = "0.4", optional = true }
nvenc = { version = "0.1", optional = true }
vpx = { version = "0.1", optional = true }

# Hardware Acceleration (Low Latency)
libva = { version = "0.14", optional = true }     # VA-API
nvidia-encode = { version = "0.5", optional = true }  # NVENC

# WebRTC (Low Latency Configuration)
webrtc = "0.11"  # webrtc-rs

# Memory & Zero-Copy
bytes = "1.5"
memmap2 = "0.9"
shared_memory = "0.12"

# Lock-free data structures for minimal contention
crossbeam = { version = "0.8", features = ["std"] }
crossbeam-channel = "0.5"
crossbeam-queue = "0.3"
parking_lot = "0.12"  # Faster mutexes

# Serialization
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"

# Logging & Tracing
tracing = "0.1"
tracing-subscriber = "0.3"
tracing-opentelemetry = "0.22"  # For latency monitoring

# Metrics & Monitoring
prometheus = { version = "0.13", optional = true }
metrics = "0.21"

# Error Handling
anyhow = "1.0"
thiserror = "1.0"

# Utilities
regex = "1.10"
uuid = { version = "1.6", features = ["v4", "serde", "fast-rng"] }
instant = "0.1"  # Cross-platform Instant wrapper

[features]
default = ["software-encoder", "webrtc-rs"]

# Encoder Options
software-encoder = ["x264", "openh264"]
hardware-vaapi = ["libva"]
hardware-nvenc = ["nvidia-encode"]
all-encoders = ["software-encoder", "hardware-vaapi", "hardware-nvenc"]

# WebRTC Implementation
webrtc-rs = ["webrtc"]
custom-webrtc = []

# Low Latency Features
low-latency = []
ultra-low-latency = ["low-latency", "all-encoders"]

# Monitoring
monitoring = ["prometheus", "tracing-opentelemetry"]

# Development
dev = ["monitoring", "all-encoders"]
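The encoder features above can gate backends at compile time. A sketch of how the crate might consume them (the module layout is an assumption for illustration):

// src/encoder/backends.rs (hypothetical module layout)
#[cfg(feature = "hardware-vaapi")]
pub mod vaapi;
#[cfg(feature = "hardware-nvenc")]
pub mod nvenc;
#[cfg(feature = "software-encoder")]
pub mod x264;

// Report which backends were compiled in; useful for startup logging
pub fn available_backends() -> Vec<&'static str> {
    let mut backends = Vec::new();
    if cfg!(feature = "hardware-vaapi") {
        backends.push("vaapi");
    }
    if cfg!(feature = "hardware-nvenc") {
        backends.push("nvenc");
    }
    if cfg!(feature = "software-encoder") {
        backends.push("x264");
    }
    backends
}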

Encoder Options

| Encoder | Hardware | Performance | Quality | License | Use Case |
|---|---|---|---|---|---|
| H.264 (x264) | CPU | Medium | High | GPL | Fallback |
| H.264 (VA-API) | GPU | High | Medium | Open Source | Linux Intel/AMD |
| H.264 (NVENC) | GPU (NVIDIA) | Very High | High | Proprietary | NVIDIA GPUs |
| H.265 (HEVC) | GPU | High | Very High | Mixed | Bandwidth-constrained |
| VP9 | CPU/GPU | Medium | High | BSD | Open Web |
| AV1 | GPU | Medium | Very High | Open Source | Future-proof |

Recommended Primary: VA-API H.264 (Linux), NVENC H.264 (NVIDIA)
Recommended Fallback: x264 H.264 (software); a runtime selection sketch follows below
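A minimal sketch of that selection order with fallback. VaapiEncoder and X264Encoder are sketched later in this document; NvencEncoder and a Clone impl on EncoderConfig are assumptions made here for illustration:

pub fn select_encoder(config: EncoderConfig) -> Box<dyn VideoEncoder> {
    // Prefer GPU encoders; fall back to software only if both probes fail
    if let Ok(enc) = VaapiEncoder::new(config.clone()) {
        return Box::new(enc); // Intel/AMD via VA-API
    }
    if let Ok(enc) = NvencEncoder::new(config.clone()) {
        return Box::new(enc); // NVIDIA via NVENC (hypothetical type)
    }
    // Software fallback with zerolatency tuning
    Box::new(X264Encoder::new(config).expect("x264 software encoder should always open"))
}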

WebRTC Libraries

Option 1: webrtc-rs (Recommended)

  • Pure Rust implementation
  • Active development
  • Good WebRTC spec compliance
  • Zero-copy support for media

Option 2: Custom Implementation

  • Use webrtc crate as base
  • Add specialized zero-copy optimizations
  • Tighter integration with encoder pipeline

Key Components Design

1. Wayland Screen Capture Module

// src/capture/mod.rs
use pipewire as pw;
use pipewire::properties;
use pipewire::spa::param::format::Format;
use pipewire::stream::StreamFlags;
use async_channel::{Sender, Receiver};

pub struct WaylandCapture {
    core: pw::Core,
    context: pw::Context,
    main_loop: pw::MainLoop,
    stream: pw::stream::Stream,
    frame_sender: Sender<CapturedFrame>,
    frame_receiver: Receiver<CapturedFrame>,
}

impl WaylandCapture {
    pub async fn new(config: CaptureConfig) -> Result<Self, CaptureError> {
        let main_loop = pw::MainLoop::new()?;
        let context = pw::Context::new(&main_loop)?;
        let core = context.connect(None)?;
        
        // Request screen capture via xdg-desktop-portal
        let portal = Portal::new().await?;
        let session = portal.create_session(ScreenCaptureType::Monitor).await?;
        let sources = portal.request_sources(&session).await?;
        
        let (sender, receiver) = async_channel::bounded(30);
        
        // Create the stream before moving `context` into the struct
        // (borrowing `context` inside the struct literal after it has been
        // moved would not compile)
        let stream = Self::create_stream(&context, &session, sender.clone())?;
        
        Ok(Self {
            core,
            context,
            main_loop,
            stream,
            frame_sender: sender,
            frame_receiver: receiver,
        })
    }
    
    fn create_stream(
        context: &pw::Context,
        session: &Session,
        sender: Sender<CapturedFrame>,
    ) -> Result<pw::stream::Stream, pw::Error> {
        let mut stream = pw::stream::Stream::new(
            context,
            "wl-webrtc-capture",
            properties! {
                *pw::keys::MEDIA_TYPE => "Video",
                *pw::keys::MEDIA_CATEGORY => "Capture",
                *pw::keys::MEDIA_ROLE => "Screen",
            },
        )?;
        
        stream.connect(
            pw::spa::direction::Direction::Input,
            None,
            StreamFlags::AUTOCONNECT | StreamFlags::MAP_BUFFERS,
        )?;
        
        // Set up callbacks for new frames (zero-copy DMA-BUF). pipewire-rs
        // uses a builder: callbacks are attached, then register() is called.
        // The returned listener must be kept alive for the callbacks to fire.
        let _listener = stream
            .add_local_listener_with_user_data(sender)
            .param_changed(|_stream, _sender, _id, _param| {
                // Handle stream parameter changes
            })
            .process(|stream, sender| {
                // Process new frame - the DMA-BUF is already mapped
                Self::process_frame(stream, sender.clone());
            })
            .register()?;
        
        Ok(stream)
    }
    
    fn process_frame(
        stream: &pw::stream::StreamRef,
        sender: Sender<CapturedFrame>,
    ) {
        // Get buffer without copying - the DMA-BUF stays in GPU memory
        let Some(buffer) = stream.dequeue_buffer() else {
            return; // No buffer ready
        };
        
        // Query the negotiated format once per frame
        let format = stream.format().expect("stream format not negotiated");
        
        // Create zero-copy frame
        let frame = CapturedFrame {
            dma_buf: DmaBufHandle::from_buffer(&buffer),
            width: format.size().width,
            height: format.size().height,
            format: PixelFormat::from_spa_format(&format),
            timestamp: timestamp_ns(),
        };
        
        // Send frame (ownership transferred via move); drop it if the channel is full
        let _ = sender.try_send(frame);
    }
    
    pub async fn next_frame(&self) -> CapturedFrame {
        self.frame_receiver.recv().await.unwrap()
    }
}

// Zero-copy DMA-BUF handle (fields are public so the encoders can import the FD)
pub struct DmaBufHandle {
    pub fd: RawFd,
    pub size: usize,
    pub stride: u32,
    pub offset: u32,
}

impl DmaBufHandle {
    pub fn from_buffer(buffer: &pw::buffer::Buffer) -> Self {
        let data = &buffer.datas()[0];
        Self {
            fd: data.fd().unwrap(),
            size: data.chunk().size() as usize,
            stride: data.chunk().stride(),
            offset: data.chunk().offset(),
        }
    }
    
    pub unsafe fn as_ptr(&self) -> *mut u8 {
        // Memory map the DMA-BUF
        let ptr = libc::mmap(
            ptr::null_mut(),
            self.size,
            libc::PROT_READ,
            libc::MAP_SHARED,
            self.fd,
            self.offset as i64,
        );
        
        if ptr == libc::MAP_FAILED {
            panic!("Failed to mmap DMA-BUF");
        }
        
        ptr as *mut u8
    }
}

impl Drop for DmaBufHandle {
    fn drop(&mut self) {
        // Close the FD when the handle is dropped. Any mapping created via
        // as_ptr() must be munmap'ed by the caller that created it; the
        // handle does not record the mapped address.
        unsafe {
            libc::close(self.fd);
        }
    }
}

2. Frame Buffer Management (Zero-Copy)

// src/buffer/mod.rs
use bytes::Bytes;
use std::sync::Arc;
use std::collections::VecDeque;

pub struct FrameBufferPool {
    dma_bufs: VecDeque<DmaBufHandle>,
    encoded_buffers: VecDeque<Bytes>,
    max_dma_bufs: usize,
    max_encoded: usize,
}

impl FrameBufferPool {
    pub fn new(max_dma_bufs: usize, max_encoded: usize) -> Self {
        Self {
            dma_bufs: VecDeque::with_capacity(max_dma_bufs),
            encoded_buffers: VecDeque::with_capacity(max_encoded),
            max_dma_bufs,
            max_encoded,
        }
    }
    
    pub fn acquire_dma_buf(&mut self) -> Option<DmaBufHandle> {
        self.dma_bufs.pop_front()
    }
    
    pub fn release_dma_buf(&mut self, buf: DmaBufHandle) {
        if self.dma_bufs.len() < self.max_dma_bufs {
            self.dma_bufs.push_back(buf);
        }
        // Else: Drop the buffer, let OS reclaim DMA-BUF
    }
    
    pub fn acquire_encoded_buffer(&mut self, size: usize) -> Bytes {
        // Try to reuse an existing buffer. Note: `Bytes` is immutable, so a
        // production pool would hold `BytesMut` for rewritable buffers;
        // `Bytes` is kept here to match the rest of the pipeline types.
        if let Some(mut buf) = self.encoded_buffers.pop_front() {
            if buf.len() >= size {
                // Slice to the requested size (zero-copy view)
                return buf.split_to(size);
            }
        }
        
        // Allocate a new buffer if none can be reused
        Bytes::from(vec![0u8; size])
    }
    
    pub fn release_encoded_buffer(&mut self, buf: Bytes) {
        if self.encoded_buffers.len() < self.max_encoded {
            self.encoded_buffers.push_back(buf);
        }
        // Else: Drop the buffer, memory freed
    }
}

// Zero-copy frame wrapper
pub struct ZeroCopyFrame {
    pub data: Bytes, // Reference-counted, no copying
    pub metadata: FrameMetadata,
}

pub struct FrameMetadata {
    pub width: u32,
    pub height: u32,
    pub format: PixelFormat,
    pub timestamp: u64,
    pub is_keyframe: bool,
}

// Smart pointer for DMA-BUF
pub struct DmaBufPtr {
    ptr: *mut u8,
    len: usize,
    _marker: PhantomData<&'static mut [u8]>,
}

impl DmaBufPtr {
    pub unsafe fn new(ptr: *mut u8, len: usize) -> Self {
        Self {
            ptr,
            len,
            _marker: PhantomData,
        }
    }
    
    pub fn as_slice(&self) -> &[u8] {
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

unsafe impl Send for DmaBufPtr {}
unsafe impl Sync for DmaBufPtr {}

impl Drop for DmaBufPtr {
    fn drop(&mut self) {
        // Intentionally empty: this type is only a typed view. The owner
        // that mmap'ed the region is responsible for munmap.
    }
}

3. Video Encoder Integration

// src/encoder/mod.rs
use async_trait::async_trait;

pub enum EncoderType {
    H264_VAAPI,
    H264_NVENC,
    H264_X264,
    VP9_VAAPI,
}

pub struct EncoderConfig {
    pub encoder_type: EncoderType,
    pub bitrate: u32,
    pub keyframe_interval: u32,
    pub preset: EncodePreset,
}

pub enum EncodePreset {
    Ultrafast,
    Superfast,
    Veryfast,
    Faster,
    Fast,
    Medium,
    Slow,
    Slower,
    Veryslow,
}

#[async_trait]
pub trait VideoEncoder: Send + Sync {
    async fn encode(&mut self, frame: CapturedFrame) -> Result<EncodedFrame, EncoderError>;
    async fn reconfigure(&mut self, config: EncoderConfig) -> Result<(), EncoderError>;
    async fn request_keyframe(&mut self) -> Result<(), EncoderError>;
}

pub struct VaapiEncoder {
    display: va::Display,
    context: va::Context,
    config: EncoderConfig,
    sequence_number: u64,
}

impl VaapiEncoder {
    pub fn new(config: EncoderConfig) -> Result<Self, EncoderError> {
        let display = va::Display::open(None)?;
        let context = va::Context::new(&display)?;
        
        Ok(Self {
            display,
            context,
            config,
            sequence_number: 0,
        })
    }
}

#[async_trait]
impl VideoEncoder for VaapiEncoder {
    async fn encode(&mut self, frame: CapturedFrame) -> Result<EncodedFrame, EncoderError> {
        // Zero-copy: Import DMA-BUF directly into VA-API surface
        let surface = unsafe {
            self.context.import_dma_buf(
                frame.dma_buf.fd,
                frame.width,
                frame.height,
                frame.format.as_va_format(),
            )?
        };
        
        // Encode frame (hardware accelerated)
        let encoded_data = self.context.encode_surface(surface)?;
        
        // Create zero-copy Bytes wrapper
        let bytes = Bytes::from(encoded_data);
        
        self.sequence_number += 1;
        
        Ok(EncodedFrame {
            data: bytes,
            is_keyframe: surface.is_keyframe(),
            timestamp: frame.timestamp,
            sequence_number: self.sequence_number,
        })
    }
    
    async fn reconfigure(&mut self, config: EncoderConfig) -> Result<(), EncoderError> {
        self.config = config;
        self.context.set_bitrate(config.bitrate)?;
        self.context.set_preset(config.preset)?;
        Ok(())
    }
    
    async fn request_keyframe(&mut self) -> Result<(), EncoderError> {
        self.context.force_keyframe()?;
        Ok(())
    }
}

// Fallback software encoder
pub struct X264Encoder {
    encoder: x264::Encoder,
    config: EncoderConfig,
    sequence_number: u64,
}

impl X264Encoder {
    pub fn new(config: EncoderConfig) -> Result<Self, EncoderError> {
        let mut params = x264::Params::default();
        params.set_width(1920);
        params.set_height(1080);
        params.set_fps(60, 1);
        params.set_bitrate(config.bitrate);
        params.set_preset(config.preset);
        params.set_tune("zerolatency");
        
        let encoder = x264::Encoder::open(&params)?;
        
        Ok(Self {
            encoder,
            config,
            sequence_number: 0,
        })
    }
}

#[async_trait]
impl VideoEncoder for X264Encoder {
    async fn encode(&mut self, frame: CapturedFrame) -> Result<EncodedFrame, EncoderError> {
        // Map DMA-BUF to CPU memory (one-time copy)
        let ptr = unsafe { frame.dma_buf.as_ptr() };
        let slice = unsafe { std::slice::from_raw_parts(ptr, frame.dma_buf.size) };
        
        // Convert to YUV if needed (convert_to_yuv is an assumed helper)
        let yuv_frame = self.convert_to_yuv(slice, frame.width, frame.height)?;
        
        // Encode frame
        let encoded_data = self.encoder.encode(&yuv_frame)?;
        
        self.sequence_number += 1;
        
        Ok(EncodedFrame {
            data: Bytes::from(encoded_data),
            is_keyframe: self.encoder.is_keyframe(),
            timestamp: frame.timestamp,
            sequence_number: self.sequence_number,
        })
    }
    
    async fn reconfigure(&mut self, config: EncoderConfig) -> Result<(), EncoderError> {
        self.config = config;
        // Reopen encoder with new params
        Ok(())
    }
    
    async fn request_keyframe(&mut self) -> Result<(), EncoderError> {
        self.encoder.force_keyframe();
        Ok(())
    }
}

4. WebRTC Signaling and Data Transport

// src/webrtc/mod.rs
use std::collections::HashMap;
use std::sync::Arc;
use serde::{Deserialize, Serialize};
use tokio::sync::Mutex;
use webrtc::{
    api::{APIBuilder, API},
    data_channel::RTCDataChannel,
    ice_transport::ice_server::RTCIceServer,
    peer_connection::{
        configuration::RTCConfiguration,
        sdp::session_description::RTCSessionDescription,
        RTCPeerConnection,
    },
    track::track_local::{track_local_static_sample::TrackLocalStaticSample, TrackLocal},
};

pub struct WebRtcServer {
    api: API,
    peer_connections: Arc<Mutex<HashMap<String, PeerConnection>>>,
    signaling_server: SignalingServer,
}

impl WebRtcServer {
    pub async fn new(config: WebRtcConfig) -> Result<Self, WebRtcError> {
        let api = APIBuilder::new().build();
        
        let signaling_server = SignalingServer::new(config.signaling_addr).await?;
        
        Ok(Self {
            api,
            peer_connections: Arc::new(Mutex::new(HashMap::new())),
            signaling_server,
        })
    }
    
    pub async fn create_peer_connection(
        &self,
        session_id: String,
        video_track: TrackLocalStaticSample,
    ) -> Result<String, WebRtcError> {
        let config = RTCConfiguration {
            ice_servers: vec![RTCIceServer {
                urls: self.signaling_server.stun_servers(),
                ..Default::default()
            }],
            ..Default::default()
        };
        
        let pc = self.api.new_peer_connection(config).await?;
        
        // Add video track
        let _rtp_sender = pc
            .add_track(Arc::new(video_track))
            .await?;
        
        // Set ICE candidate handler (registration is synchronous in webrtc-rs)
        let peer_connections = self.peer_connections.clone();
        pc.on_ice_candidate(Box::new(move |candidate| {
            let _peer_connections = peer_connections.clone();
            Box::pin(async move {
                if let Some(_candidate) = candidate {
                    // Send candidate to signaling server
                    // ...
                }
            })
        }));
        
        // Store peer connection
        self.peer_connections
            .lock()
            .await
            .insert(session_id.clone(), PeerConnection::new(pc));
        
        Ok(session_id)
    }
    
    pub async fn send_video_frame(
        &self,
        session_id: &str,
        frame: EncodedFrame,
    ) -> Result<(), WebRtcError> {
        let peer_connections = self.peer_connections.lock().await;
        
        if let Some(peer) = peer_connections.get(session_id) {
            peer.video_track.write_sample(&webrtc::media::Sample {
                data: frame.data.clone(), // Bytes clone is a cheap refcount bump
                duration: std::time::Duration::from_millis(16), // ~one frame at 60 FPS
                ..Default::default()
            }).await?;
        }
        
        Ok(())
    }
}

pub struct PeerConnection {
    pc: RTCPeerConnection,
    video_track: Arc<TrackLocalStaticSample>,
    data_channel: Option<Arc<RTCDataChannel>>,
}

impl PeerConnection {
    pub async fn create_offer(&mut self) -> Result<RTCSessionDescription, WebRtcError> {
        let offer = self.pc.create_offer(None).await?;
        self.pc.set_local_description(offer.clone()).await?;
        Ok(offer)
    }
    
    pub async fn set_remote_description(
        &mut self,
        desc: RTCSessionDescription,
    ) -> Result<(), WebRtcError> {
        self.pc.set_remote_description(desc).await?;
        Ok(())
    }
    
    pub async fn create_answer(&mut self) -> Result<RTCSessionDescription, WebRtcError> {
        let answer = self.pc.create_answer(None).await?;
        self.pc.set_local_description(answer.clone()).await?;
        Ok(answer)
    }
}

// Data channel for input/control
pub struct DataChannelManager {
    channels: HashMap<String, Arc<RTCDataChannel>>,
}

impl DataChannelManager {
    pub async fn send_input(&self, channel_id: &str, input: InputEvent) -> Result<()> {
        if let Some(channel) = self.channels.get(channel_id) {
            let data = serde_json::to_vec(&input)?;
            channel.send(&data).await?;
        }
        Ok(())
    }
    
    pub fn on_input<F>(&mut self, channel_id: String, callback: F)
    where
        F: Fn(InputEvent) + Send + Sync + 'static,
    {
        if let Some(channel) = self.channels.get(&channel_id) {
            channel.on_message(Box::new(move |msg| {
                if let Ok(input) = serde_json::from_slice::<InputEvent>(&msg.data) {
                    callback(input);
                }
                // The handler must return a boxed future
                Box::pin(async {})
            }));
        }
    }
}

#[derive(Debug, Serialize, Deserialize)]
pub enum InputEvent {
    MouseMove { x: f32, y: f32 },
    MouseClick { button: MouseButton },
    KeyPress { key: String },
    KeyRelease { key: String },
}

#[derive(Debug, Serialize, Deserialize)]
pub enum MouseButton {
    Left,
    Right,
    Middle,
}

5. IPC Layer (Optional)

// src/ipc/mod.rs
use tokio::net::UnixListener;
use tokio::io::{AsyncReadExt, AsyncWriteExt};

pub struct IpcServer {
    listener: UnixListener,
}

impl IpcServer {
    pub async fn new(socket_path: &str) -> Result<Self> {
        // Remove existing socket if present
        let _ = std::fs::remove_file(socket_path);
        
        let listener = UnixListener::bind(socket_path)?;
        
        Ok(Self { listener })
    }
    
    pub async fn run(&self, sender: async_channel::Sender<IpcMessage>) {
        loop {
            match self.listener.accept().await {
                Ok((mut stream, _)) => {
                    let sender = sender.clone();
                    tokio::spawn(async move {
                        // NOTE: assumes one JSON message per read; real code
                        // should use length-prefixed or newline-delimited framing
                        let mut buf = [0; 1024];
                        loop {
                            match stream.read(&mut buf).await {
                                Ok(0) => break,
                                Ok(n) => {
                                    if let Ok(msg) = serde_json::from_slice::<IpcMessage>(&buf[..n]) {
                                        let _ = sender.send(msg).await;
                                    }
                                }
                                Err(_) => break,
                            }
                        }
                    });
                }
                Err(_) => continue,
            }
        }
    }
}

#[derive(Debug, Serialize, Deserialize)]
pub enum IpcMessage {
    StartCapture { session_id: String },
    StopCapture { session_id: String },
    SetQuality { level: QualityLevel },
    GetStatus,
}

Data Flow Optimization

Zero-Copy Pipeline Stages

Stage 1: Capture
  Input: Wayland Compositor (GPU memory)
  Output: DMA-BUF file descriptor
  Copy: None (Zero-copy)
  
Stage 2: Buffer Manager
  Input: DMA-BUF FD
  Output: DmaBufHandle (RAII wrapper)
  Copy: None (Zero-copy ownership transfer)
  
Stage 3: Encoder
  Input: DmaBufHandle
  Output: Bytes (reference-counted)
  Copy: None (DMA-BUF imported directly to GPU encoder)
  
Stage 4: WebRTC
  Input: Bytes
  Output: RTP packets (references to Bytes)
  Copy: None (Zero-copy to socket buffers)
  
Stage 5: Network
  Input: RTP packets
  Output: UDP datagrams
  Copy: Minimal (kernel space only)

Memory Ownership Transfer

// Example: Ownership transfer through pipeline
async fn process_frame_pipeline(
    mut capture: WaylandCapture,
    mut encoder: VaapiEncoder,
    mut webrtc: WebRtcServer,
) -> Result<()> {
    loop {
        // Stage 1: Capture (ownership moves from PipeWire to our code)
        let frame = capture.next_frame().await; // CapturedFrame owns DmaBufHandle
        
        // Stage 2: Encode (ownership moved, not copied)
        let encoded = encoder.encode(frame).await?; // EncodedFrame owns Bytes
        
        // Stage 3: Send (Bytes is reference-counted, no copy)
        webrtc.send_video_frame("session-123", encoded).await?;
        
        // Ownership transferred all the way without copying
    }
}

Buffer Sharing Mechanisms

1. DMA-BUF (Primary)

  • GPU memory buffers
  • Exported as file descriptors
  • Zero-copy to hardware encoders
  • Limited to same GPU/driver
pub fn export_dma_buf(surface: &va::Surface) -> Result<DmaBufHandle> {
    let fd = surface.export_dma_buf()?;
    Ok(DmaBufHandle {
        fd,
        size: surface.size(),
        stride: surface.stride(),
        offset: 0,
    })
}

2. Shared Memory (Fallback)

  • POSIX shared memory (shm_open)
  • For software encoding path
  • Copy from DMA-BUF to shared memory
pub fn create_shared_buffer(size: usize) -> Result<SharedBuffer> {
    let name = format!("/wl-webrtc-{}", uuid::Uuid::new_v4());
    let fd = shm_open(&name, O_CREAT | O_RDWR, 0o666)?;
    ftruncate(fd, size as i64)?;
    
    let ptr = unsafe { mmap(ptr::null_mut(), size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0) };
    
    Ok(SharedBuffer {
        ptr,
        size,
        fd,
        name,
    })
}

3. Memory-Mapped Files (Alternative)

  • For persistent caching
  • Cross-process communication
  • Used for frame buffering
pub struct MappedFile {
    file: File,
    ptr: *mut u8,
    size: usize,
}

impl MappedFile {
    pub fn new(path: &Path, size: usize) -> Result<Self> {
        let file = OpenOptions::new()
            .read(true)
            .write(true)
            .create(true)
            .open(path)?;
        
        file.set_len(size as u64)?;
        
        let ptr = unsafe {
            mmap(
                ptr::null_mut(),
                size,
                PROT_READ | PROT_WRITE,
                MAP_SHARED,
                file.as_raw_fd(),
                0,
            )
        };
        
        // mmap signals failure with a sentinel pointer, not a Result
        if ptr == libc::MAP_FAILED {
            return Err(anyhow::anyhow!("failed to mmap {}", path.display()));
        }
        
        Ok(Self { file, ptr: ptr as *mut u8, size })
    }
    }
}

Pipeline Optimization Strategies

1. Parallel Encoding

// Run multiple encoders in parallel for different quality levels
pub struct AdaptiveEncoder {
    encoders: Vec<Box<dyn VideoEncoder>>,
    active_encoder: usize,
    bandwidth_monitor: BandwidthMonitor,
}

impl AdaptiveEncoder {
    pub async fn encode_adaptive(&mut self, frame: CapturedFrame) -> Result<EncodedFrame> {
        let bandwidth = self.bandwidth_monitor.current_bandwidth();
        
        // Switch encoder based on bandwidth
        let new_encoder = match bandwidth {
            b if b < 500_000 => 0,  // Low bitrate
            b if b < 2_000_000 => 1, // Medium bitrate
            _ => 2,                  // High bitrate
        };
        
        if new_encoder != self.active_encoder {
            self.active_encoder = new_encoder;
        }
        
        self.encoders[self.active_encoder].encode(frame).await
    }
}

2. Frame Skipping

pub struct FrameSkipper {
    target_fps: u32,
    last_frame_time: Instant,
    skip_count: u32,
}

impl FrameSkipper {
    pub fn should_skip(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_frame_time).as_millis();
        let frame_interval = 1000 / self.target_fps as u128;
        
        if elapsed < frame_interval {
            self.skip_count += 1;
            return true;
        }
        
        self.last_frame_time = now;
        self.skip_count = 0;
        false
    }
}

3. Region of Interest (ROI)

pub struct RegionEncoder {
    full_encoder: Box<dyn VideoEncoder>,
    roi_encoder: Box<dyn VideoEncoder>,
    current_region: Option<ScreenRegion>,
}

impl RegionEncoder {
    pub async fn encode_roi(
        &mut self,
        frame: CapturedFrame,
        roi: Option<ScreenRegion>,
    ) -> Result<EncodedFrame> {
        if let Some(region) = roi {
            // Encode only ROI with higher quality
            let cropped = self.crop_frame(frame, region)?;
            self.roi_encoder.encode(cropped).await
        } else {
            // Encode full frame
            self.full_encoder.encode(frame).await
        }
    }
    
    fn crop_frame(&self, mut frame: CapturedFrame, region: ScreenRegion) -> Result<CapturedFrame> {
        // Adjust DMA-BUF offsets for region
        frame.width = region.width;
        frame.height = region.height;
        Ok(frame)
    }
}

Low Latency Optimization

Design Philosophy

To achieve 15-25ms latency on local networks, we prioritize:

  1. Speed over completeness: Fast, low-latency delivery is more important than perfect reliability
  2. Minimize buffering: Small buffers at every stage
  3. Zero-copy everywhere: Eliminate CPU memory copies
  4. Hardware acceleration: Use GPU for all intensive operations
  5. Predictive timing: Reduce wait times with accurate timing

1. Encoder Optimization

Hardware Encoder Configuration

pub struct LowLatencyEncoderConfig {
    // Codec settings
    pub codec: VideoCodec,
    
    // Low-latency specific
    pub gop_size: u32,           // Small GOP: 8-15 frames
    pub b_frames: u32,           // Zero B-frames for minimal latency
    pub max_b_frames: u32,       // Always 0 for low latency
    pub lookahead: u32,           // Minimal lookahead: 0-2 frames
    
    // Rate control
    pub rc_mode: RateControlMode, // CBR or VBR with strict constraints
    pub bitrate: u32,            // Adaptive bitrate
    pub max_bitrate: u32,        // Tight max constraint
    pub min_bitrate: u32,
    pub vbv_buffer_size: u32,    // Very small VBV buffer
    pub vbv_max_rate: u32,       // Close to bitrate
    
    // Timing
    pub fps: u32,                // Target FPS (30-60)
    pub intra_period: u32,       // Keyframe interval
    
    // Quality vs Latency trade-offs
    pub preset: EncoderPreset,   // Ultrafast/Fast
    pub tune: EncoderTune,       // zerolatency
    pub quality: u8,             // Constant quality (CRF) or CQ
}

pub enum VideoCodec {
    H264,    // Best compatibility, good latency
    H265,    // Better compression, slightly higher latency
    VP9,     // Open alternative
}

pub enum RateControlMode {
    CBR,     // Constant Bitrate - predictable
    VBR,     // Variable Bitrate - better quality
    CQP,     // Constant Quantizer - lowest latency
}

pub enum EncoderPreset {
    Ultrafast,  // Lowest latency, lower quality
    Superfast,
    Veryfast,   // Recommended for 15-25ms
    Faster,
}

pub enum EncoderTune {
    Zerolatency, // Mandatory for low latency
    Film,
    Animation,
}

VA-API (Intel/AMD) - For 15-25ms latency:

// libva-specific low-latency settings
VAConfigAttrib attribs[] = {
    {VAConfigAttribRTFormat, VA_RT_FORMAT_YUV420},
    {VAConfigAttribRateControl, VA_RC_CBR},
    {VAConfigAttribEncMaxRefFrames, {1, 0}},  // Min reference frames
    {VAConfigAttribEncPackedHeaders, VA_ENC_PACKED_HEADER_SEQUENCE},
};

VAEncSequenceParameterBufferH264 seq_param = {
    .intra_period = 15,           // Short GOP
    .ip_period = 1,               // No B-frames
    .bits_per_second = 4000000,
    .max_num_ref_frames = 1,     // Minimal references
    .time_scale = 90000,
    .num_units_in_tick = 1500,   // 60 FPS
};

VAEncPictureParameterBufferH264 pic_param = {
    .reference_frames = {
        {0, VA_FRAME_PICTURE},    // Single reference
    },
    .num_ref_idx_l0_active_minus1 = 0,
    .num_ref_idx_l1_active_minus1 = 0,
    .pic_fields.bits.idr_pic_flag = 0,
    .pic_fields.bits.reference_pic_flag = 1,
};

VAEncSliceParameterBufferH264 slice_param = {
    .num_ref_idx_l0_active_minus1 = 0,
    .num_ref_idx_l1_active_minus1 = 0,
    .disable_deblocking_filter_idc = 1,  // Faster
};

NVENC (NVIDIA) - For 15-25ms latency:

// NVENC low-latency configuration
let mut create_params = NV_ENC_INITIALIZE_PARAMS::default();
create_params.encodeGUID = NV_ENC_CODEC_H264_GUID;
create_params.presetGUID = NV_ENC_PRESET_P4_GUID;  // Low latency

let mut config = NV_ENC_CONFIG::default();
config.profileGUID = NV_ENC_H264_PROFILE_BASELINE_GUID;  // Faster encoding
config.rcParams.rateControlMode = NV_ENC_PARAMS_RC_CBR;
config.rcParams.averageBitRate = 4000000;
config.rcParams.maxBitRate = 4000000;
config.rcParams.vbvBufferSize = 4000000 / 60;  // ~1 frame of VBV; a large buffer adds latency
config.rcParams.vbvInitialDelay = 0;      // Minimal delay

let mut h264_config = unsafe { config.encodeCodecConfig.h264Config };
h264_config.enableIntraRefresh = 1;
h264_config.repeatSPSPPS = 1;
h264_config.enableConstrainedEncoding = 1;
h264_config.frameNumD = 0;
h264_config.sliceMode = NV_ENC_SLICE_MODE_AUTOSELECT;

// Low-latency specific
h264_config.maxNumRefFrames = 1;          // Minimal references
h264_config.idrPeriod = 15;               // Short GOP: keyframe every 15 frames

x264 (Software) - For 50-100ms latency:

// x264 parameters for low latency
let param = x264_param_t {
    i_width: 1920,
    i_height: 1080,
    i_fps_num: 60,
    i_fps_den: 1,
    
    // Rate control
    i_bitrate: 4000,                      // 4 Mbps
    i_keyint_max: 15,                     // Short GOP
    b_intra_refresh: 1,
    
    // Low latency
    b_repeat_headers: 1,
    b_annexb: 1,
    i_scenecut_threshold: 0,             // Disable scene detection
    
    // No B-frames for latency
    i_bframe: 0,
    i_bframe_adaptive: 0,
    i_bframe_pyramid: 0,
    
    // References
    i_frame_reference: 1,                 // Minimal references
    
    // Preset: ultrafast or superfast
    // This is set via preset function
};

// Apply preset and tune; note that x264_param_apply_preset() resets most
// fields, so real code must call it before the custom settings above, not after
x264_param_apply_preset(&param, "superfast");
x264_param_apply_tune(&param, "zerolatency");

Dynamic Bitrate vs Latency Trade-offs

pub struct AdaptiveBitrateController {
    target_latency_ms: u32,
    current_bitrate: u32,
    frame_rate: u32,
    network_quality: NetworkQuality,
    buffer_depth_ms: u32,
}

pub struct NetworkQuality {
    bandwidth_mbps: f64,
    latency_ms: u32,
    packet_loss_rate: f64,
    jitter_ms: u32,
}

impl AdaptiveBitrateController {
    pub fn update_target_bitrate(&mut self, measured_latency_ms: u32) -> u32 {
        let latency_ratio = measured_latency_ms as f64 / self.target_latency_ms as f64;
        
        if latency_ratio > 1.5 {
            // Latency too high - reduce bitrate aggressively
            self.current_bitrate = (self.current_bitrate as f64 * 0.7) as u32;
        } else if latency_ratio > 1.2 {
            // Moderately high - reduce bitrate
            self.current_bitrate = (self.current_bitrate as f64 * 0.85) as u32;
        } else if latency_ratio < 0.8 {
            // Can increase bitrate
            self.current_bitrate = (self.current_bitrate as f64 * 1.1) as u32;
        }
        
        // Clamp to reasonable bounds
        self.current_bitrate = self.current_bitrate.clamp(1000000, 8000000);
        
        self.current_bitrate
    }
}

2. Capture Optimization

PipeWire DMA-BUF Zero-Copy

pub struct LowLatencyCaptureConfig {
    pub frame_rate: u32,           // 30-60 FPS
    pub zero_copy: bool,           // Always true
    pub track_damage: bool,        // Enable damage tracking
    pub partial_updates: bool,     // Encode only damaged regions
    pub buffer_pool_size: usize,   // Small pool: 3-5 buffers
}

pub struct DamageTracker {
    damaged_regions: VecDeque<ScreenRegion>,
    last_frame: Option<DmaBufHandle>,
    threshold: u32,                // Minimum change size to encode
}

impl DamageTracker {
    pub fn update(&mut self, new_frame: &CapturedFrame) -> Vec<ScreenRegion> {
        match &self.last_frame {
            Some(last) => {
                let regions = self.compute_damage_regions(last, new_frame);
                // Assumes DmaBufHandle has a cheap clone (e.g., dup()'ing the FD)
                self.last_frame = Some(new_frame.dma_buf.clone());
                regions
            }
            None => {
                self.last_frame = Some(new_frame.dma_buf.clone());
                vec![ScreenRegion {
                    x: 0,
                    y: 0,
                    width: new_frame.width,
                    height: new_frame.height,
                }]
            }
        }
    }
    
    fn compute_damage_regions(&self, last: &DmaBufHandle, new: &CapturedFrame) -> Vec<ScreenRegion> {
        // Compare frames and find changed regions
        // This can be done efficiently with GPU
        // For MVP, we can use a simple block-based comparison
        
        // Block size for comparison (e.g., 16x16 pixels)
        let block_size = 16;
        let blocks_x = (new.width as usize + block_size - 1) / block_size;
        let blocks_y = (new.height as usize + block_size - 1) / block_size;
        
        // Merge adjacent damaged blocks into regions
        // ...
        
        vec![]  // Placeholder
    }
}
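A minimal CPU-side sketch of that block comparison, assuming both frames are mapped into CPU memory with the same stride (bytes per row) and a 4-byte pixel format; a real implementation would run this on the GPU:

pub fn diff_blocks(
    last: &[u8],
    new: &[u8],
    width: usize,
    height: usize,
    stride: usize,
    block: usize,
) -> Vec<ScreenRegion> {
    let mut damaged = Vec::new();
    for by in (0..height).step_by(block) {
        for bx in (0..width).step_by(block) {
            // Compare one row of each block as a cheap proxy for the whole block
            let row = by * stride + bx * 4; // 4 bytes per pixel assumed
            let len = block.min(width - bx) * 4;
            if last[row..row + len] != new[row..row + len] {
                damaged.push(ScreenRegion {
                    x: bx as u32,
                    y: by as u32,
                    width: block.min(width - bx) as u32,
                    height: block.min(height - by) as u32,
                });
            }
        }
    }
    damaged
}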

Partial Region Encoding

pub struct RegionEncoder {
    full_encoder: Box<dyn VideoEncoder>,
    tile_encoder: Box<dyn VideoEncoder>,
    current_regions: Vec<ScreenRegion>,
}

impl RegionEncoder {
    pub async fn encode_with_regions(
        &mut self,
        frame: CapturedFrame,
        regions: Vec<ScreenRegion>,
    ) -> Result<Vec<EncodedTile>> {
        let mut encoded_tiles = Vec::new();
        
        if regions.is_empty() || regions.len() > 4 {
            // Too many regions or no changes - encode the full frame.
            // Record dimensions before `frame` is moved into the encoder.
            let (width, height) = (frame.width, frame.height);
            let encoded = self.full_encoder.encode(frame).await?;
            encoded_tiles.push(EncodedTile {
                region: ScreenRegion {
                    x: 0,
                    y: 0,
                    width,
                    height,
                },
                data: encoded.data,
                is_keyframe: encoded.is_keyframe,
            });
        } else {
            // Encode each damaged region separately
            for region in regions {
                let cropped = self.crop_frame(&frame, &region)?;
                let encoded = self.tile_encoder.encode(cropped).await?;
                
                encoded_tiles.push(EncodedTile {
                    region,
                    data: encoded.data,
                    is_keyframe: encoded.is_keyframe,
                });
            }
        }
        
        Ok(encoded_tiles)
    }
    
    fn crop_frame(&self, frame: &CapturedFrame, region: &ScreenRegion) -> Result<CapturedFrame> {
        // Adjust DMA-BUF offsets for the region
        // This is a zero-copy operation - just metadata changes
        
        Ok(CapturedFrame {
            dma_buf: DmaBufHandle::from_region(&frame.dma_buf, region)?,
            width: region.width,
            height: region.height,
            format: frame.format,
            timestamp: frame.timestamp,
        })
    }
}

3. WebRTC Transport Layer Optimization

Low-Latency WebRTC Configuration

pub struct LowLatencyWebRtcConfig {
    // ICE and transport
    pub ice_transport_policy: IceTransportPolicy,
    pub ice_servers: Vec<IceServer>,
    
    // Media settings
    pub video_codecs: Vec<VideoCodecConfig>,
    pub max_bitrate: u32,
    pub min_bitrate: u32,
    pub start_bitrate: u32,
    
    // Buffering - minimize for low latency
    pub playout_delay_min_ms: u16,    // 0-10ms (default 50ms)
    pub playout_delay_max_ms: u16,    // 10-20ms (default 200ms)
    
    // Packetization
    pub rtp_payload_size: u16,        // Smaller packets: 1200 bytes
    pub packetization_mode: PacketizationMode,
    
    // Feedback and retransmission
    pub nack_enabled: bool,           // Limited NACK
    pub fec_enabled: bool,            // Disable FEC for latency
    pub transport_cc_enabled: bool,   // Congestion control
    
    // RTCP settings
    pub rtcp_report_interval_ms: u32,  // Frequent: 50-100ms
}

pub struct VideoCodecConfig {
    pub name: String,
    pub clock_rate: u32,
    pub num_channels: u16,
    pub parameters: CodecParameters,
}

impl LowLatencyWebRtcConfig {
    pub fn for_ultra_low_latency() -> Self {
        Self {
            ice_transport_policy: IceTransportPolicy::All,
            ice_servers: vec![],
            
            video_codecs: vec![
                VideoCodecConfig {
                    name: "H264".to_string(),
                    clock_rate: 90000,
                    num_channels: 1,
                    parameters: CodecParameters {
                        profile_level_id: "42e01f".to_string(),  // Baseline profile
                        packetization_mode: 1,
                        level_asymmetry_allowed: 1,
                    },
                },
            ],
            
            max_bitrate: 8000000,      // 8 Mbps max
            min_bitrate: 500000,       // 500 Kbps min
            start_bitrate: 4000000,    // 4 Mbps start
            
            // Critical: Minimal playout delay
            playout_delay_min_ms: 0,   // No minimum
            playout_delay_max_ms: 20,  // 20ms maximum
            
            // Smaller packets for lower serialization latency
            rtp_payload_size: 1200,
            packetization_mode: PacketizationMode::NonInterleaved,
            
            // Limited retransmission
            nack_enabled: true,        // But limit retransmission window
            fec_enabled: false,        // Disable FEC - adds latency
            transport_cc_enabled: true,
            
            // More frequent RTCP feedback
            rtcp_report_interval_ms: 50,
        }
    }
}

Packet Loss Handling Strategy

pub enum LossHandlingStrategy {
    PreferLatency,    // Drop late frames, prioritize low latency
    PreferQuality,     // Retransmit, prioritize quality
    Balanced,          // Adaptive based on network conditions
}

pub struct PacketLossHandler {
    strategy: LossHandlingStrategy,
    max_retransmission_delay_ms: u32,
    nack_window_size: u32,
}

impl PacketLossHandler {
    pub fn handle_packet_loss(
        &mut self,
        sequence_number: u16,
        packet_ts_ms: u64,
        now_ms: u64,
    ) -> RetransmissionDecision {
        match self.strategy {
            LossHandlingStrategy::PreferLatency => {
                // Retransmit only while the packet is young enough to still arrive in time
                if now_ms.saturating_sub(packet_ts_ms) > self.max_retransmission_delay_ms as u64 {
                    RetransmissionDecision::Drop
                } else {
                    RetransmissionDecision::Request(sequence_number)
                }
            }
            LossHandlingStrategy::PreferQuality => {
                // Always try to retransmit
                RetransmissionDecision::Request(sequence_number)
            }
            LossHandlingStrategy::Balanced => {
                // Adaptive based on loss rate
                RetransmissionDecision::None  // Placeholder
            }
        }
    }
}

pub enum RetransmissionDecision {
    Request(u16),
    Drop,
    None,
}

NACK vs FEC Selection

Recommendation for 15-25ms latency:

  • Primary: Limited NACK

    • NACK window: 1-2 frames (16-33ms at 60fps)
    • Max retransmission delay: 20ms
    • Only retransmit keyframes or critical packets
  • Avoid FEC:

    • Forward Error Correction adds significant latency
    • With low-loss LAN, FEC overhead outweighs benefits
    • Use NACK selectively instead
pub struct NackController {
    window_size_ms: u32,           // 20ms window
    max_nack_packets_per_second: u32,
    nack_list: VecDeque<(u16, u64)>, // (seq_num, timestamp_ms)
}

impl NackController {
    pub fn should_send_nack(&self, seq_num: u16, now_ms: u64) -> bool {
        // Check if packet is within NACK window
        if let Some(&(_, oldest_ts)) = self.nack_list.front() {
            if now_ms - oldest_ts > self.window_size_ms as u64 {
                return false;  // Too old
            }
        }
        
        true
    }
}
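A sketch of the "only retransmit keyframes or critical packets" policy from the recommendation above; is_keyframe_packet stands in for payload inspection (e.g., checking the H.264 NAL unit type), which is assumed here:

pub fn should_retransmit(packet_age_ms: u64, is_keyframe_packet: bool, window_ms: u64) -> bool {
    // Past the retransmission window the frame is already late; within it,
    // spend the retransmission budget only on the packets that matter most
    packet_age_ms <= window_ms && is_keyframe_packet
}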

4. Frame Rate and Buffer Strategy

Dynamic Frame Rate Adjustment

pub struct FrameRateController {
    target_fps: u32,              // Desired FPS (30-60)
    current_fps: u32,
    frame_times: VecDeque<Instant>,
    last_frame_time: Instant,
    min_interval: Duration,       // 1 / max_fps
}

impl FrameRateController {
    pub fn new(target_fps: u32) -> Self {
        let min_interval = Duration::from_micros(1_000_000 / 60);  // Max 60 FPS
        
        Self {
            target_fps,
            current_fps: 30,
            frame_times: VecDeque::with_capacity(60),
            last_frame_time: Instant::now(),
            min_interval,
        }
    }
    
    pub fn should_capture(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_frame_time);
        
        if elapsed < self.min_interval {
            return false;  // Too soon
        }
        
        // Update frame rate based on conditions
        self.adjust_fps_based_on_conditions();
        
        self.last_frame_time = now;
        true
    }
    
    pub fn adjust_fps_based_on_conditions(&mut self) {
        // Check system load, network conditions, etc.
        let system_load = self.get_system_load();
        let network_quality = self.get_network_quality();
        
        if system_load > 0.8 || network_quality.is_poor() {
            self.current_fps = 30;  // Reduce frame rate
        } else if system_load < 0.5 && network_quality.is_excellent() {
            self.current_fps = 60;  // Increase frame rate
        } else {
            self.current_fps = 45;  // Balanced
        }
    }
}

Fast Frame Dropping Strategy

pub struct FrameDropper {
    target_fps: u32,
    adaptive_drop_threshold_ms: u32,
    consecutive_drops: u32,
    max_consecutive_drops: u32,
}

impl FrameDropper {
    pub fn should_drop(&mut self, queue_latency_ms: u32) -> bool {
        if queue_latency_ms > self.adaptive_drop_threshold_ms {
            if self.consecutive_drops < self.max_consecutive_drops {
                self.consecutive_drops += 1;
                return true;
            }
        }
        
        self.consecutive_drops = 0;
        false
    }
    
    pub fn get_drop_interval(&self) -> u32 {
        // Calculate how many frames to drop
        match self.target_fps {
            60 => 1,   // Drop 1 out of every 2
            30 => 1,   // Drop 1 out of every 2
            _ => 0,
        }
    }
}

Minimal Buffering

Sender Side:

pub struct SenderBuffer {
    max_size_frames: usize,        // Very small: 1-2 frames
    queue: VecDeque<EncodedFrame>,
    target_latency_ms: u32,
}

impl SenderBuffer {
    pub fn new() -> Self {
        Self {
            max_size_frames: 1,      // Single frame buffer
            queue: VecDeque::with_capacity(2),
            target_latency_ms: 5,    // 5ms target
        }
    }
    
    pub fn push(&mut self, frame: EncodedFrame) -> Result<()> {
        if self.queue.len() >= self.max_size_frames {
            // Drop oldest frame to maintain low latency
            self.queue.pop_front();
        }
        
        self.queue.push_back(frame);
        Ok(())
    }
    
    pub fn pop(&mut self) -> Option<EncodedFrame> {
        self.queue.pop_front()
    }
}

Receiver Side (Jitter Buffer):

pub struct MinimalJitterBuffer {
    target_delay_ms: u32,           // 0-10ms
    min_delay_ms: u32,             // 0ms
    max_delay_ms: u32,             // 10-20ms
    packets: VecDeque<RtpPacket>,
}

impl MinimalJitterBuffer {
    pub fn new() -> Self {
        Self {
            target_delay_ms: 5,     // 5ms target
            min_delay_ms: 0,        // No minimum
            max_delay_ms: 10,      // 10ms maximum
            packets: VecDeque::with_capacity(10),
        }
    }
    
    pub fn push(&mut self, packet: RtpPacket) {
        // Rough heuristic: treat half the max delay (in ms) as a packet budget
        if self.packets.len() < self.max_delay_ms as usize / 2 {
            self.packets.push_back(packet);
        } else {
            // Buffer full - drop oldest
            self.packets.pop_front();
            self.packets.push_back(packet);
        }
    }
    
    pub fn pop(&mut self) -> Option<RtpPacket> {
        self.packets.pop_front()
    }
}

5. Architecture Adjustments

Single-Threaded vs Multi-Threaded

Recommendation: Hybrid Approach

  • Capture Thread: Dedicated thread for PipeWire
  • Encoder Thread: Per-session encoder thread
  • Network Thread: WebRTC transport thread
  • Coordination: Lock-free channels for data passing
pub struct PipelineArchitecture {
    capture_thread: JoinHandle<()>,
    encoder_threads: Vec<JoinHandle<()>>,
    network_thread: JoinHandle<()>,
    
    // Lock-free communication
    capture_to_encoder: async_channel::Sender<CapturedFrame>,
    encoder_to_network: async_channel::Sender<EncodedFrame>,
}

Lock Competition Minimization

// Use lock-free data structures where possible
use crossbeam::queue::SegQueue;
use crossbeam::channel::{bounded, unbounded};

pub struct LockFreeFrameQueue {
    queue: SegQueue<CapturedFrame>,
    max_size: usize,
}

impl LockFreeFrameQueue {
    pub fn push(&self, frame: CapturedFrame) -> Result<()> {
        if self.queue.len() >= self.max_size {
            return Err(Error::QueueFull);
        }
        self.queue.push(frame);
        Ok(())
    }
    
    pub fn pop(&self) -> Option<CapturedFrame> {
        self.queue.pop()
    }
}

Async Task Scheduling

pub struct LowLatencyScheduler {
    // Tokio has no built-in task priorities, so the critical path gets a
    // dedicated runtime whose worker threads background work cannot steal
    critical_rt: tokio::runtime::Runtime,
    background_rt: tokio::runtime::Runtime,
}

impl LowLatencyScheduler {
    pub fn schedule_pipeline(&self) {
        self.critical_rt.spawn(async move {
            // Critical path: capture -> encode -> send
        });
        
        self.background_rt.spawn(async move {
            // Background tasks: statistics, logging
        });
    }
}

6. Technology Stack Adjustments

Encoder Selection for Latency

| Encoder | Setup Latency | Per-Frame Latency | Quality | Recommendation |
|---|---|---|---|---|
| VA-API H.264 | 1-2ms | 2-3ms | Medium | Primary (Linux) |
| NVENC H.264 | 1-2ms | 1-2ms | High | Primary (NVIDIA) |
| x264 (ultrafast) | 0ms | 5-8ms | Low | Fallback |
| x264 (superfast) | 0ms | 8-12ms | Medium | Fallback |

Recommendation:

  • Primary: VA-API or NVENC H.264 with ultrafast preset
  • Fallback: x264 with ultrafast preset (accept 30-50ms latency)

Direct Wayland vs PipeWire

Use PipeWire (recommended):

  • Better DMA-BUF support
  • Hardware acceleration integration
  • Zero-copy through ecosystem

Direct Wayland (if needed):

  • Lower-level control
  • Potentially lower capture latency (0.5-1ms)
  • More complex implementation
  • No portal integration (security issue)

Recommendation: Stick with PipeWire for MVP. Consider direct Wayland only if PipeWire latency is unacceptable.

webrtc-rs Latency Characteristics

Pros:

  • Pure Rust, predictable behavior
  • Good zero-copy support
  • Customizable buffering

Cons:

  • May have default buffer settings optimized for reliability
  • Need manual configuration for ultra-low latency

Custom WebRTC Layer (advanced):

  • Full control over buffering and timing
  • Can inline packetization
  • More complex implementation

Recommendation: Use webrtc-rs with low-latency configuration. Only consider custom layer if webrtc-rs cannot achieve targets.

7. Implementation Priority

P0 (Must-Have for MVP)

  1. Hardware Encoder Integration

    • VA-API H.264 with low-latency settings
    • No B-frames, small GOP (15 frames)
    • Ultrafast preset
  2. DMA-BUF Zero-Copy

    • PipeWire DMA-BUF import
    • Direct encoder feed
    • No CPU copies
  3. Minimal Buffering

    • Single frame sender buffer
    • 0-5ms jitter buffer
    • Fast frame dropping
  4. Low-Latency WebRTC Config

    • playout_delay_min: 0ms
    • playout_delay_max: 20ms
    • Disable FEC

P1 (Important for 15-25ms)

  1. Damage Tracking

    • Partial region updates
    • Reduced encoding load
  2. Dynamic Frame Rate

    • 30-60 FPS adaptation
    • Network-aware
  3. NACK Control

    • Limited retransmission window (20ms)
    • Selective NACK
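A minimal sketch of the damage-tracking idea behind P1 item 1, assuming a hypothetical `Rect` type: collapse the damage rectangles reported for a frame into one dirty region, and skip the frame entirely when nothing changed.

```rust
#[derive(Clone, Copy)]
pub struct Rect {
    pub x: u32,
    pub y: u32,
    pub w: u32,
    pub h: u32,
}

impl Rect {
    /// Bounding box of two rectangles.
    fn union(self, o: Rect) -> Rect {
        let x1 = self.x.min(o.x);
        let y1 = self.y.min(o.y);
        let x2 = (self.x + self.w).max(o.x + o.w);
        let y2 = (self.y + self.h).max(o.y + o.h);
        Rect { x: x1, y: y1, w: x2 - x1, h: y2 - y1 }
    }
}

/// Single dirty region for one frame; `None` means the frame is unchanged
/// and can be dropped without encoding.
pub fn dirty_region(damage: &[Rect]) -> Option<Rect> {
    damage.iter().copied().reduce(Rect::union)
}
```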

P2 (Nice-to-Have)

  1. Direct Wayland Capture

    • If PipeWire latency insufficient
  2. Custom WebRTC Layer

    • If webrtc-rs insufficient
  3. Advanced Congestion Control

    • SCReAM or Google Congestion Control

8. Testing and Validation

End-to-End Latency Measurement

```rust
use std::collections::VecDeque;
use std::time::Duration;

pub struct LatencyMeter {
    timestamps: VecDeque<(u64, LatencyStage)>,
}

#[derive(Clone, Copy, Debug)]
pub enum LatencyStage {
    Capture,
    EncodeStart,
    EncodeEnd,
    Packetize,
    NetworkSend,
    NetworkReceive,
    Depacketize,
    DecodeStart,
    DecodeEnd,
    Display,
}

impl LatencyMeter {
    pub fn mark(&mut self, stage: LatencyStage) {
        // timestamp_ns(): monotonic clock in nanoseconds (CLOCK_MONOTONIC).
        let now = timestamp_ns();
        self.timestamps.push_back((now, stage));
    }

    pub fn calculate_total_latency(&self) -> Duration {
        if self.timestamps.len() < 2 {
            return Duration::ZERO;
        }

        let first = self.timestamps.front().unwrap().0;
        let last = self.timestamps.back().unwrap().0;
        Duration::from_nanos(last - first)
    }
}
```
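Total latency alone does not say where the budget went. A small extension, assuming the `LatencyMeter` above, that reports the delta between consecutive marks per stage:

```rust
impl LatencyMeter {
    /// Per-stage deltas between consecutive marks, for pinpointing which
    /// pipeline stage is consuming the budget.
    pub fn stage_breakdown(&self) -> Vec<(LatencyStage, Duration)> {
        self.timestamps
            .iter()
            .zip(self.timestamps.iter().skip(1))
            .map(|(&(t0, _), &(t1, stage))| (stage, Duration::from_nanos(t1 - t0)))
            .collect()
    }
}
```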

Measurement Method:

  1. Timestamp Injection

    • Inject frame ID at capture (visible timestamp on screen)
    • Capture at client with camera
    • Compare timestamps to calculate round-trip
    • Divide by 2 to approximate one-way latency (assumes symmetric paths)
  2. Network Timestamping

    • Add frame capture time in RTP header extension
    • Measure at receiver
    • Account for clock skew (see the sketch after this list)
  3. Hardware Timestamping

    • Use kernel packet timestamps (SO_TIMESTAMPING)
    • Hardware NIC timestamps if available
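A sketch of method 2's arithmetic: the sender's capture timestamp travels in an RTP header extension, and the receiver converts it into its own clock domain before differencing. The clock-offset estimate is assumed to come from some external synchronization step (e.g. NTP-style probing), which this sketch does not implement.

```rust
/// One-way latency in nanoseconds, given a capture timestamp taken on the
/// sender's clock and a receive timestamp taken on the receiver's clock.
/// `clock_offset_ns` is the estimated (sender - receiver) clock offset;
/// without it, the difference mixes latency with clock skew.
pub fn one_way_latency_ns(
    capture_ts_sender_ns: u64,
    receive_ts_receiver_ns: u64,
    clock_offset_ns: i64,
) -> i64 {
    // Translate the sender timestamp into the receiver's clock domain.
    let capture_in_receiver_clock = capture_ts_sender_ns as i64 - clock_offset_ns;
    receive_ts_receiver_ns as i64 - capture_in_receiver_clock
}
```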

Performance Benchmarking

```rust
// Requires the nightly-only `test` crate for #[bench].
#[bench]
fn bench_full_pipeline_latency(b: &mut Bencher) {
    let mut pipeline = LowLatencyPipeline::new(config).unwrap();
    let mut latencies = Vec::new();

    b.iter(|| {
        let start = Instant::now();

        let frame = pipeline.capture().unwrap();
        let encoded = pipeline.encode(frame).unwrap();
        pipeline.send(encoded).unwrap();

        latencies.push(start.elapsed());
    });

    let avg_latency = latencies.iter().sum::<Duration>() / latencies.len() as u32;
    println!("Average latency: {:?}", avg_latency);
}
```

Target Benchmarks:

| Metric | Target | Acceptable |
| --- | --- | --- |
| Capture latency | 2-3ms | <5ms |
| Encode latency | 3-5ms | <8ms |
| Packetize latency | 1-2ms | <3ms |
| Network (LAN) | 0.5-1ms | <2ms |
| Decode latency | 1-2ms | <4ms |
| Display latency | 1-2ms | <4ms |
| Total | 15-25ms | <30ms |
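These budgets are most useful when enforced automatically. A sketch of a regression test against the "Acceptable" column, with `measure_stage` as a hypothetical harness hook:

```rust
use std::time::Duration;

#[derive(Clone, Copy)]
enum Stage {
    Capture,
    Encode,
    Total,
}

/// Hypothetical harness hook: averaged latency of one stage over a warm run,
/// e.g. aggregated from LatencyMeter marks.
fn measure_stage(_stage: Stage) -> Duration {
    Duration::ZERO // placeholder
}

#[test]
fn latency_budgets_hold() {
    assert!(measure_stage(Stage::Capture) <= Duration::from_millis(5));
    assert!(measure_stage(Stage::Encode) <= Duration::from_millis(8));
    assert!(measure_stage(Stage::Total) <= Duration::from_millis(30));
}
```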

Tuning Strategy

  1. Baseline Measurement

    • Measure each stage individually
    • Identify bottlenecks
  2. Iterative Tuning

    • Tune one parameter at a time
    • Measure impact on total latency
    • Trade off quality if needed
  3. Validation

    • Test under various network conditions
    • Test under system load
    • Test with different content (static, dynamic)
  4. Continuous Monitoring

    • Track latency in production
    • Alert on degradation
    • Adaptive adjustments

Implementation Roadmap (Updated for Low Latency)

Phase 1: MVP (Minimum Viable Product) - 4-6 weeks

Goal: Basic screen capture and WebRTC streaming

Week 1-2: Core Infrastructure

  • Project setup (Cargo.toml, directory structure)
  • Tokio async runtime setup
  • Error handling framework (anyhow/thiserror)
  • Logging setup (tracing)
  • Configuration management

Week 2-3: Wayland Capture

  • PipeWire xdg-desktop-portal integration
  • Basic screen capture (single monitor)
  • DMA-BUF import/export
  • Frame receiver channel

Week 3-4: Simple Encoding

  • x264 software encoder (fallback)
  • Basic frame pipeline (capture → encode)
  • Frame rate limiting

Week 4-5: WebRTC Transport

  • webrtc-rs integration
  • Basic peer connection
  • Video track setup
  • Simple signaling (WebSocket)

Week 5-6: Testing & Integration

  • End-to-end test (Wayland → WebRTC → Browser)
  • Performance benchmarking
  • Bug fixes

MVP Deliverables:

  • Working screen capture
  • WebRTC streaming to browser
  • 15-30 FPS at 720p
  • x264 encoding (software)

Phase 2: Hardware Acceleration - 3-4 weeks

Goal: GPU-accelerated encoding for better performance

Week 1-2: VA-API Integration

  • VA-API encoder implementation
  • DMA-BUF to VA-API surface import
  • H.264 encoding
  • Intel/AMD GPU support

Week 2-3: NVENC Integration

  • NVENC encoder for NVIDIA GPUs
  • CUDA memory management
  • NVENC H.264 encoding

Week 3-4: Encoder Selection

  • Encoder detection and selection
  • Fallback chain (NVENC → VA-API → x264)
  • Encoder switching at runtime

Phase 2 Deliverables:

  • GPU-accelerated encoding
  • 30-60 FPS at 1080p
  • Lower CPU usage
  • Adaptive encoder selection

Phase 3: Low Latency Optimization - 4-5 weeks

Goal: Achieve 25-35ms latency on local networks

Week 1: Encoder Low-Latency Configuration (P0)

  • Configure VA-API/NVENC for <5ms encoding
  • Disable B-frames, set GOP to 15 frames
  • Implement CBR rate control with small VBV buffer
  • Tune encoder preset (ultrafast/superfast)
  • Measure encoder latency independently

Week 2: Minimal Buffering (P0)

  • Reduce sender buffer to 1 frame
  • Implement 0-10ms jitter buffer
  • Configure WebRTC playout delay (0-20ms)
  • Disable FEC for latency
  • Test end-to-end latency

Week 3: Damage Tracking & Partial Updates (P1)

  • Implement region change detection
  • Add partial region encoding
  • Optimize for static content
  • Benchmark latency improvements

Week 4: Dynamic Frame Rate & Quality (P1)

  • Implement adaptive frame rate (30-60fps)
  • Network quality detection
  • Dynamic bitrate vs latency trade-off
  • Fast frame dropping under load

Week 5: Advanced Optimizations (P1/P2)

  • Limited NACK window (20ms)
  • Selective packet retransmission
  • RTCP fine-tuning (50ms intervals)
  • Performance profiling
  • Final latency tuning

Phase 3 Deliverables:

  • 25-35ms latency on LAN
  • Zero-copy DMA-BUF pipeline
  • Hardware encoder with low-latency config
  • Minimal buffering throughout pipeline
  • Adaptive quality based on conditions

Phase 4: Production Ready with Ultra Low Latency - 5-7 weeks

Goal: Achieve 15-25ms latency while ensuring security, reliability, and deployment readiness

Week 1-2: Ultra Low Latency Tuning (P1/P2)

  • Direct Wayland capture evaluation (if needed)
  • Custom WebRTC layer evaluation (if needed)
  • Advanced congestion control (SCReAM/Google CC)
  • Kernel bypass optimization (DPDK/AF_XDP if needed)
  • Final latency optimization and tuning

Week 2-3: Security

  • Authentication (JWT, OAuth)
  • Encryption (TLS, DTLS)
  • Session management
  • Access control
  • Security audit and penetration testing

Week 3-4: Reliability

  • Error recovery
  • Connection health monitoring
  • Automatic reconnection
  • Graceful degradation with latency awareness
  • Failover mechanisms

Week 4-5: Monitoring & Debugging

  • Real-time latency metrics collection
  • Per-stage latency tracking
  • Logging improvements
  • Debug mode with frame inspection
  • Performance dashboard with latency visualization
  • Alerting for latency degradation

Week 5-6: Deployment

  • Docker containerization
  • Systemd service
  • Configuration file with low-latency presets
  • Installation scripts
  • Performance tuning documentation

Week 6-7: Testing

  • Integration tests
  • Load testing with latency monitoring
  • Cross-browser testing
  • Long-running stability tests
  • Latency regression tests
  • Automated performance benchmarks

Phase 4 Deliverables:

  • 15-25ms latency on LAN
  • Production-ready deployment
  • Security features
  • Monitoring and observability
  • Comprehensive testing
  • Latency regression testing

Phase 5: Advanced Features (Optional) - Ongoing

Potential Features:

  • Audio capture and streaming
  • Bidirectional input (mouse, keyboard)
  • Clipboard sharing
  • File transfer
  • Recording (save sessions)
  • Multi-user sessions
  • Mobile client support
  • WebRTC data channels for control
  • WebRTC insertable streams (client-side effects)
  • Adaptive resolution
  • H.265/HEVC encoding
  • AV1 encoding
  • Screen region selection
  • Virtual display support
  • Wayland virtual pointer protocol

Testing Strategy

Unit Tests

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[tokio::test]
    async fn test_dma_buf_lifecycle() {
        let handle = DmaBufHandle::new(/* ... */);
        assert_eq!(handle.ref_count(), 1);

        let handle2 = handle.clone();
        assert_eq!(handle.ref_count(), 2);

        drop(handle);
        assert_eq!(handle2.ref_count(), 1);

        drop(handle2);
        // Buffer freed
    }

    #[tokio::test]
    async fn test_encoder_pipeline() {
        let config = EncoderConfig {
            encoder_type: EncoderType::H264_X264,
            bitrate: 2_000_000,
            keyframe_interval: 30,
            preset: EncodePreset::Fast,
        };

        let mut encoder = X264Encoder::new(config).unwrap();

        let frame = create_test_frame(1920, 1080);
        let encoded = encoder.encode(frame).await.unwrap();

        assert!(!encoded.data.is_empty());
        assert!(encoded.is_keyframe);
    }
}
```

Integration Tests

```rust
#[tokio::test]
async fn test_full_pipeline() {
    // Setup
    let capture = WaylandCapture::new(CaptureConfig::default()).await.unwrap();
    let encoder = VaapiEncoder::new(EncoderConfig::default()).unwrap();
    let webrtc = WebRtcServer::new(WebRtcConfig::default()).await.unwrap();

    // Run pipeline for 100 frames
    for _ in 0..100 {
        let frame = capture.next_frame().await.unwrap();
        let encoded = encoder.encode(frame).await.unwrap();
        webrtc.send_video_frame("test-session", encoded).await.unwrap();
    }

    // Verify
    assert_eq!(webrtc.frames_sent(), 100);
}
```

Load Testing

```bash
# Simulate 10 concurrent connections
for i in {1..10}; do
    cargo test test_full_pipeline --release &
done
wait
```

Performance Benchmarks

```rust
// Requires the nightly-only `test` crate for #[bench]. The blocking encode
// variant is used here because #[bench] closures are synchronous.
#[bench]
fn bench_encode_frame(b: &mut Bencher) {
    let mut encoder = X264Encoder::new(config).unwrap();
    let frame = create_test_frame(1920, 1080);

    b.iter(|| {
        encoder.encode(frame.clone()).unwrap()
    });
}
```

Potential Challenges & Solutions

1. Wayland Protocol Limitations

Challenge: Wayland's security model restricts screen capture

Solution:

  • Use xdg-desktop-portal for permission management
  • Implement user prompts for capture authorization
  • Support multiple portal backends (GNOME, KDE, etc.)
```rust
pub async fn request_capture_permission() -> Result<bool> {
    let portal = Portal::new().await?;
    let session = portal.create_session(ScreenCaptureType::Monitor).await?;

    // User will see a dialog asking for permission
    let sources = portal.request_sources(&session).await?;

    Ok(!sources.is_empty())
}
```

Alternative: Use PipeWire directly with proper authentication

2. Hardware Acceleration Compatibility

Challenge: Different GPUs require different APIs (VA-API, NVENC, etc.)

Solution:

  • Implement multiple encoder backends
  • Runtime detection of available encoders
  • Graceful fallback to software encoding
```rust
pub fn detect_best_encoder() -> EncoderType {
    // Try NVENC first (NVIDIA)
    if nvenc::is_available() {
        return EncoderType::H264_NVENC;
    }

    // Try VA-API (Intel/AMD)
    if vaapi::is_available() {
        return EncoderType::H264_VAAPI;
    }

    // Fallback to software
    EncoderType::H264_X264
}
```

3. Cross-Browser WebRTC Compatibility

Challenge: Different browsers have different WebRTC implementations

Solution:

  • Use standardized codecs (H.264, VP8, VP9)
  • Implement codec negotiation
  • Provide fallback options
```rust
pub fn get_supported_codecs() -> Vec<RTCRtpCodecCapability> {
    vec![
        RTCRtpCodecCapability {
            mime_type: "video/H264".to_string(),
            clock_rate: 90000,
            ..Default::default()
        },
        RTCRtpCodecCapability {
            mime_type: "video/VP9".to_string(),
            clock_rate: 90000,
            ..Default::default()
        },
    ]
}
```

4. Security and Authentication

Challenge: Secure remote access without exposing desktop to unauthorized users

Solution:

  • Implement JWT-based authentication
  • Use DTLS for media encryption
  • Add rate limiting and access control
```rust
use std::collections::HashMap;
use std::sync::Arc;

use chrono::Utc;
use jsonwebtoken::{decode, encode, DecodingKey, EncodingKey, Header, Validation};
use serde::{Deserialize, Serialize};
use tokio::sync::Mutex;

#[derive(Serialize, Deserialize)]
pub struct Claims {
    sub: String,
    exp: i64, // expiry as a Unix timestamp, as jsonwebtoken expects
}

pub struct AuthManager {
    secret: String,
    sessions: Arc<Mutex<HashMap<String, Session>>>,
}

impl AuthManager {
    pub fn create_token(&self, user_id: &str) -> Result<String> {
        let claims = Claims {
            sub: user_id.to_string(),
            exp: (Utc::now() + chrono::Duration::hours(1)).timestamp(),
        };

        Ok(encode(&Header::default(), &claims, &EncodingKey::from_secret(self.secret.as_ref()))?)
    }

    pub fn validate_token(&self, token: &str) -> Result<Claims> {
        decode::<Claims>(
            token,
            &DecodingKey::from_secret(self.secret.as_ref()),
            &Validation::default(),
        )
        .map(|data| data.claims)
        .map_err(|_| AuthError::InvalidToken) // AuthError: this crate's error type
    }
}
```

5. Memory Management

Challenge: Avoid memory leaks with DMA-BUF and shared memory

Solution:

  • Use Rust's ownership system
  • RAII patterns for resource cleanup
  • Buffer pools with limits
```rust
pub struct ScopedDmaBuf {
    handle: DmaBufHandle,
}

impl Drop for ScopedDmaBuf {
    fn drop(&mut self) {
        // Automatically release DMA-BUF
        // File descriptor closed
        // GPU memory freed
    }
}

// Usage ensures cleanup
{
    let buf = ScopedDmaBuf::new(/* ... */);
    // Use buffer
} // Automatically dropped here
```
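A more concrete take on the same RAII pattern, using the standard library's `OwnedFd` so the DMA-BUF file descriptor is closed automatically on drop; the kernel releases the GPU buffer once all descriptors and mappings are gone. The `size` field is illustrative metadata.

```rust
use std::os::fd::{AsFd, BorrowedFd, OwnedFd};

pub struct DmaBufFd {
    fd: OwnedFd, // closed automatically when DmaBufFd is dropped
    size: usize,
}

impl DmaBufFd {
    pub fn new(fd: OwnedFd, size: usize) -> Self {
        Self { fd, size }
    }

    /// Borrow the descriptor, e.g. to hand it to an encoder import call.
    pub fn as_fd(&self) -> BorrowedFd<'_> {
        self.fd.as_fd()
    }

    pub fn size(&self) -> usize {
        self.size
    }
}
// No manual Drop impl needed: OwnedFd's Drop closes the fd.
```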

6. Latency Optimization

Challenge: Minimize end-to-end latency

Solution:

  • Zero-copy pipeline
  • Hardware acceleration
  • Adaptive quality
  • Frame skipping
```rust
pub struct LatencyOptimizer {
    target_latency_ms: u32,
    current_latency_ms: u32,
}

impl LatencyOptimizer {
    pub fn adjust_parameters(&mut self) {
        if self.current_latency_ms > self.target_latency_ms {
            // Reduce quality to improve latency
            self.reduce_bitrate();
            self.increase_frame_skipping();
        } else {
            // Increase quality
            self.increase_bitrate();
        }
    }
}
```

7. Network Conditions

Challenge: Varying bandwidth and network conditions

Solution:

  • Adaptive bitrate streaming
  • Multiple quality presets
  • Congestion control
```rust
use std::collections::VecDeque;
use std::time::Duration;

pub struct BandwidthMonitor {
    measurements: VecDeque<u32>,
    window_size: usize,
}

impl BandwidthMonitor {
    pub fn update(&mut self, bytes_sent: u32, duration: Duration) {
        // Use fractional seconds: per-frame intervals are well under 1s, so
        // integer as_secs() would truncate to zero and divide by zero here.
        let secs = duration.as_secs_f64().max(1e-9);
        let bandwidth = (bytes_sent as f64 * 8.0 / secs) as u32; // bits/sec
        self.measurements.push_back(bandwidth);

        if self.measurements.len() > self.window_size {
            self.measurements.pop_front();
        }
    }

    pub fn average_bandwidth(&self) -> u32 {
        if self.measurements.is_empty() {
            return 0;
        }

        // Sum in u64 to avoid overflow at high bitrates.
        (self.measurements.iter().map(|&b| b as u64).sum::<u64>()
            / self.measurements.len() as u64) as u32
    }
}
```

8. Cross-Platform Compatibility

Challenge: Support different Linux distributions and desktop environments

Solution:

  • Containerize application
  • Detect available technologies at runtime
  • Provide fallback options
```rust
pub fn detect_desktop_environment() -> DesktopEnvironment {
    if std::path::Path::new("/usr/bin/gnome-shell").exists() {
        DesktopEnvironment::GNOME
    } else if std::path::Path::new("/usr/bin/plasmashell").exists() {
        DesktopEnvironment::KDE
    } else {
        DesktopEnvironment::Other
    }
}

pub fn configure_portal_for_env(env: DesktopEnvironment) -> PortalConfig {
    match env {
        DesktopEnvironment::GNOME => PortalConfig::gnome(),
        DesktopEnvironment::KDE => PortalConfig::kde(),
        DesktopEnvironment::Other => PortalConfig::generic(),
    }
}
```

9. Debugging and Troubleshooting

Challenge: Debugging complex pipeline with multiple components

Solution:

  • Comprehensive logging
  • Metrics collection
  • Debug mode with frame inspection
```rust
use std::path::PathBuf;

pub struct DebugLogger {
    enabled: bool,
    output: DebugOutput,
}

pub enum DebugOutput {
    Console,
    File(PathBuf),
    Both,
}

impl DebugLogger {
    pub fn log_frame(&self, frame: &CapturedFrame) {
        if !self.enabled {
            return;
        }

        tracing::debug!(
            "Frame: {}x{}, format: {:?}, timestamp: {}",
            frame.width,
            frame.height,
            frame.format,
            frame.timestamp
        );
    }

    pub fn log_encoding(&self, encoded: &EncodedFrame) {
        if !self.enabled {
            return;
        }

        tracing::debug!(
            "Encoded: {} bytes, keyframe: {}, seq: {}",
            encoded.data.len(),
            encoded.is_keyframe,
            encoded.sequence_number
        );
    }
}
```

10. Resource Limits

Challenge: Prevent resource exhaustion (CPU, memory, GPU)

Solution:

  • Limit concurrent sessions
  • Monitor resource usage
  • Implement graceful degradation
```rust
use std::collections::HashSet;
use std::sync::Arc;
use tokio::sync::Mutex;

pub struct ResourceManager {
    max_sessions: usize,
    active_sessions: Arc<Mutex<HashSet<String>>>,
    cpu_threshold: f32,
    memory_threshold: u64,
}

impl ResourceManager {
    pub async fn can_create_session(&self) -> bool {
        let sessions = self.active_sessions.lock().await;

        if sessions.len() >= self.max_sessions {
            return false;
        }

        // cpu_usage()/memory_usage() are hypothetical sampling helpers
        // (e.g. backed by /proc or the sysinfo crate).
        if self.cpu_usage() > self.cpu_threshold {
            return false;
        }

        if self.memory_usage() > self.memory_threshold {
            return false;
        }

        true
    }
}
```

Code Examples

Main Application Entry Point

```rust
// src/main.rs
mod capture;
mod encoder;
mod webrtc;
mod buffer;
mod ipc;
mod config;

use anyhow::Result;
use tracing::{info, error};

#[tokio::main]
async fn main() -> Result<()> {
    // Initialize logging
    tracing_subscriber::fmt()
        .with_max_level(tracing::Level::INFO)
        .init();

    // Load configuration
    let config = config::load_config("config.toml")?;

    info!("Starting Wayland WebRTC Backend");
    info!("Configuration: {:?}", config);

    // Initialize components
    let capture = capture::WaylandCapture::new(config.capture).await?;
    let encoder = encoder::create_encoder(config.encoder)?;
    let webrtc = webrtc::WebRtcServer::new(config.webrtc).await?;

    // Create video track
    let video_track = webrtc::create_video_track()?;

    // Run capture pipeline
    let session_id = uuid::Uuid::new_v4().to_string();
    webrtc.create_peer_connection(session_id.clone(), video_track).await?;

    // Main loop
    loop {
        match capture.next_frame().await {
            Ok(frame) => {
                match encoder.encode(frame).await {
                    Ok(encoded) => {
                        if let Err(e) = webrtc.send_video_frame(&session_id, encoded).await {
                            error!("Failed to send frame: {}", e);
                        }
                    }
                    Err(e) => {
                        error!("Failed to encode frame: {}", e);
                    }
                }
            }
            Err(e) => {
                error!("Failed to capture frame: {}", e);
            }
        }
    }
}
```

Configuration Example

```toml
# config.toml - Low Latency Configuration

[server]
address = "0.0.0.0:8080"
max_sessions = 10
log_level = "info"

[capture]
frame_rate = 60
quality = "high"
# Omit screen_region to capture the full screen (TOML has no null value)
# screen_region = { x = 0, y = 0, width = 1920, height = 1080 }

# Low-latency capture settings
zero_copy = true  # Always use DMA-BUF zero-copy
track_damage = true  # Enable damage tracking
partial_updates = true  # Encode only damaged regions
buffer_pool_size = 3  # Small buffer pool for low latency

[encoder]
type = "auto"
# Options: "vaapi", "nvenc", "x264", "auto"
bitrate = 4000000
keyframe_interval = 15  # Short GOP for low latency
preset = "ultrafast"  # Ultrafast for minimal latency

# Low-latency specific settings
[encoder.low_latency]
gop_size = 15  # Short GOP
b_frames = 0  # No B-frames for low latency
lookahead = 0  # Minimal lookahead
rc_mode = "CBR"  # Constant bitrate for predictable latency
vbv_buffer_size = 400000  # ~100ms at 4 Mbps; shrink toward one frame for stricter latency
max_bitrate = 4000000  # Tight max constraint
intra_period = 15  # Keyframe interval

# Quality vs latency presets
[encoder.quality_presets]
low = { bitrate = 1000000, preset = "ultrafast", gop_size = 30 }
medium = { bitrate = 2000000, preset = "ultrafast", gop_size = 20 }
high = { bitrate = 4000000, preset = "ultrafast", gop_size = 15 }
ultra = { bitrate = 8000000, preset = "veryfast", gop_size = 15 }

[webrtc]
stun_servers = ["stun:stun.l.google.com:19302"]
turn_servers = []

[webrtc.low_latency]
# Critical: Minimal playout delay
playout_delay_min_ms = 0  # No minimum delay
playout_delay_max_ms = 20  # 20ms maximum delay

# Bitrate settings
max_bitrate = 8000000  # 8 Mbps max
min_bitrate = 500000  # 500 Kbps min
start_bitrate = 4000000  # 4 Mbps start

# Packetization
rtp_payload_size = 1200  # Smaller packets for lower latency
packetization_mode = "non_interleaved"

# Retransmission settings
nack_enabled = true  # Enable NACK
nack_window_size_ms = 20  # Only request retransmission for recent packets
max_nack_packets_per_second = 50
fec_enabled = false  # Disable FEC for low latency

# Congestion control
transport_cc_enabled = true  # Enable transport-wide congestion control

# RTCP settings
rtcp_report_interval_ms = 50  # Frequent RTCP feedback

[webrtc.ice]
transport_policy = "all"
candidate_network_types = ["udp", "tcp"]

[buffer]
# Minimal buffering for low latency
dma_buf_pool_size = 3  # Small pool
encoded_buffer_pool_size = 5  # Very small pool
sender_buffer_max_frames = 1  # Single frame sender buffer
jitter_buffer_target_delay_ms = 5  # 5ms target jitter buffer
jitter_buffer_max_delay_ms = 10  # 10ms maximum jitter buffer

[latency]
# Latency targets
target_latency_ms = 25  # Target end-to-end latency
max_acceptable_latency_ms = 50  # Maximum acceptable latency

# Adaptive settings
adaptive_frame_rate = true
adaptive_bitrate = true
fast_frame_drop = true

# Frame dropping strategy
consecutive_drop_limit = 3  # Max consecutive drops
drop_threshold_ms = 5  # Drop if queue latency exceeds this

[monitoring]
enable_metrics = true
enable_latency_tracking = true
metrics_port = 9090
```
Performance Targets

Latency Targets

Local Network (LAN)

| Scenario | Target | Stretch Goal |
| --- | --- | --- |
| Core target | 25-35ms | 15-25ms (excellent) |
| Minimum | <50ms | 15-25ms (excellent) |

Perceived user experience by latency:

| Latency | Experience |
| --- | --- |
| <16ms | Imperceptible |
| 16-33ms | Very smooth |
| 33-50ms | Good |
| 50-100ms | Acceptable |

Internet

| Scenario | Target | Acceptable |
| --- | --- | --- |
| Excellent | 40-60ms | <80ms |
| Good | 60-80ms | <100ms |

Performance Metrics by Phase

| Metric | MVP | Phase 2 | Phase 3 | Phase 4 |
| --- | --- | --- | --- | --- |
| FPS (LAN) | 30 | 60 | 60 | 60 |
| FPS (Internet) | 15-20 | 30 | 30-60 | 60 |
| Resolution | 720p | 1080p | 1080p/4K | 1080p/4K |
| Latency (LAN) | <100ms | <50ms | 25-35ms | 15-25ms |
| Latency (Internet) | <200ms | <100ms | 60-80ms | 40-60ms |
| CPU Usage | 20-30% | 10-15% | 5-10% | <5% |
| Memory Usage | 150MB | 250MB | 400MB | <400MB |
| Bitrate | 2-4 Mbps | 4-8 Mbps | Adaptive | Adaptive |
| Concurrent Sessions | 1 | 3-5 | 5-10 | 10+ |

Latency Budget Allocation (15-25ms Target)

| Component | Time (ms) | Percentage | Optimization Strategy |
| --- | --- | --- | --- |
| Wayland Capture | 2-3 | 12-15% | DMA-BUF zero-copy, partial updates |
| Encoder | 3-5 | 20-25% | Hardware encoder, no B-frames |
| Packetization | 1-2 | 6-10% | Inline RTP, minimal buffering |
| Network (LAN) | 0.5-1 | 3-5% | Direct UDP path, kernel bypass |
| Jitter Buffer | 0-2 | 0-10% | Minimal buffer, predictive jitter |
| Decoder | 1-2 | 6-10% | Hardware acceleration |
| Display | 1-2 | 6-10% | Vsync bypass, direct scanout |
| Total | 15-25 | 100% | |

Conclusion

This design provides a comprehensive blueprint for building an ultra-low-latency Wayland → WebRTC remote desktop backend in Rust. Key highlights:

  1. Zero-Copy Architecture: Minimizes CPU copies through DMA-BUF and reference-counted buffers, achieving <5ms copy overhead
  2. Hardware Acceleration: VA-API/NVENC encoders configured for <5ms encoding latency
  3. Minimal Buffering: Single-frame sender buffer and 0-10ms jitter buffer throughout pipeline
  4. Low-Latency WebRTC: Custom configuration with 0-20ms playout delay, no FEC, limited NACK
  5. Performance: Targets 15-25ms latency on local networks at 60 FPS
  6. Adaptive Quality: Dynamic frame rate (30-60fps) and bitrate adjustment based on network conditions
  7. Damage Tracking: Partial region updates for static content to reduce encoding load

Latency Budget Breakdown (15-25ms target):

  • Capture: 2-3ms (DMA-BUF zero-copy)
  • Encoder: 3-5ms (hardware, no B-frames)
  • Packetization: 1-2ms (inline RTP)
  • Network: 0.5-1ms (LAN)
  • Jitter Buffer: 0-2ms (minimal)
  • Decoder: 1-2ms (hardware)
  • Display: 1-2ms (vsync bypass)

The phased implementation approach allows for incremental development and testing:

  • Phase 1 (4-6 weeks): MVP with <100ms latency
  • Phase 2 (3-4 weeks): Hardware acceleration, <50ms latency
  • Phase 3 (4-5 weeks): Low-latency optimizations, 25-35ms latency
  • Phase 4 (5-7 weeks): Ultra-low latency tuning, 15-25ms latency

Critical P0 optimizations for achieving 15-25ms latency:

  1. Hardware encoder with zero B-frames, 15-frame GOP
  2. DMA-BUF zero-copy capture pipeline
  3. Minimal buffering (1 frame sender, 0-10ms jitter)
  4. WebRTC low-latency configuration (0-20ms playout delay)
