# Wayland → WebRTC Remote Desktop Backend
## Technical Design Document

## Table of Contents
1. [System Architecture](#system-architecture)
2. [Technology Stack](#technology-stack)
3. [Key Components Design](#key-components-design)
4. [Data Flow Optimization](#data-flow-optimization)
5. [Low Latency Optimization](#low-latency-optimization)
6. [Implementation Roadmap](#implementation-roadmap)
7. [Potential Challenges & Solutions](#potential-challenges--solutions)

---

## System Architecture

### High-Level Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                          Client Browser                              │
│                    (WebRTC Receiver)                                │
└─────────────────────────────┬───────────────────────────────────────┘
                              │ WebRTC (UDP/TCP)
                              │ Signaling (WebSocket/HTTP)
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│                        Signaling Server                              │
│                    (WebSocket/WebSocket Secure)                     │
│                      - Session Management                           │
│                      - SDP Exchange                                  │
│                      - ICE Candidates                                │
└─────────────────────────────┬───────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    Rust Backend Server                               │
├─────────────────────────────────────────────────────────────────────┤
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐          │
│  │   Capture    │    │   Encoder    │    │  WebRTC      │          │
│  │   Manager    │───▶│  Pipeline    │───▶│  Transport   │          │
│  └──────────────┘    └──────────────┘    └──────────────┘          │
│         │                  │                  │                      │
│         ▼                  ▼                  ▼                      │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐          │
│  │   PipeWire   │    │   Video      │    │  Data        │          │
│  │   Portal     │    │   Encoder    │    │  Channels    │          │
│  │   (xdg-      │    │   (H.264/    │    │  (Input/     │          │
│  │   desktop-   │    │   H.265/VP9) │    │   Control)   │          │
│  │   portal)    │    └──────────────┘    └──────────────┘          │
│  └──────────────┘                                                     │
│                                                                       │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │                Zero-Copy Buffer Manager                     │    │
│  │           - DMA-BUF Import/Export                            │    │
│  │           - Shared Memory Pools                              │    │
│  │           - Memory Ownership Tracking                         │    │
│  └─────────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    Wayland Compositor                                │
│                  (PipeWire Screen Sharing)                           │
└─────────────────────────────────────────────────────────────────────┘
```

### Component Breakdown

#### 1. Capture Manager
**Responsibilities:**
- Interface with PipeWire xdg-desktop-portal
- Request screen capture permissions
- Receive DMA-BUF frames
- Manage frame buffer lifecycle

**Key Technologies:**
- `pipewire` crate for PipeWire protocol
- `wayland-client` for Wayland protocol
- `ashpd` for desktop portals

```rust
pub struct CaptureManager {
    pipewire_connection: Rc<PipewireConnection>,
    stream_handle: Option<StreamHandle>,
    frame_sender: async_channel::Sender<CapturedFrame>,
    config: CaptureConfig,
}

pub struct CaptureConfig {
    pub frame_rate: u32,
    pub quality: QualityLevel,
    pub screen_region: Option<ScreenRegion>,
}

pub enum QualityLevel {
    Low,
    Medium,
    High,
    Ultra,
}

pub struct CapturedFrame {
    pub dma_buf: DmaBufHandle,
    pub width: u32,
    pub height: u32,
    pub format: PixelFormat,
    pub timestamp: u64,
}
```

#### 2. Encoder Pipeline
**Responsibilities:**
- Receive raw frames from capture
- Encode to H.264/H.265/VP9
- Hardware acceleration (VA-API, NVENC, VideoToolbox)
- Bitrate adaptation

**Zero-Copy Strategy:**
- Direct DMA-BUF to encoder (no CPU copies)
- Encoder outputs to memory-mapped buffers
- WebRTC consumes encoded buffers directly

```rust
pub struct EncoderPipeline {
    encoder: Box<dyn VideoEncoder>,
    config: EncoderConfig,
    stats: EncoderStats,
}

pub trait VideoEncoder: Send + Sync {
    fn encode_frame(
        &mut self,
        frame: CapturedFrame,
    ) -> Result<EncodedFrame, EncoderError>;
    
    fn set_bitrate(&mut self, bitrate: u32) -> Result<(), EncoderError>;
    
    fn request_keyframe(&mut self) -> Result<(), EncoderError>;
}

pub struct EncodedFrame {
    pub data: Bytes, // Zero-copy Bytes wrapper
    pub is_keyframe: bool,
    pub timestamp: u64,
    pub sequence_number: u64,
}
```

#### 3. WebRTC Transport
**Responsibilities:**
- WebRTC peer connection management
- Media track (video) and data channels
- RTP packetization
- ICE/STUN/TURN handling
- Congestion control

**Libraries:**
- `webrtc` crate (webrtc-rs) or custom WebRTC implementation

```rust
pub struct WebRtcTransport {
    peer_connection: RTCPeerConnection,
    video_track: RTCVideoTrack,
    data_channel: Option<RTCDataChannel>,
    config: WebRtcConfig,
}

pub struct WebRtcConfig {
    pub stun_servers: Vec<String>,
    pub turn_servers: Vec<TurnServer>,
    pub ice_transport_policy: IceTransportPolicy,
}

pub struct TurnServer {
    pub urls: Vec<String>,
    pub username: String,
    pub credential: String,
}
```

#### 4. Zero-Copy Buffer Manager
**Responsibilities:**
- Manage DMA-BUF lifecycle
- Pool pre-allocated memory
- Track ownership via Rust types
- Coordinate with PipeWire memory pools

```rust
pub struct BufferManager {
    dma_buf_pool: Pool<DmaBufHandle>,
    encoded_buffer_pool: Pool<Bytes>,
    max_buffers: usize,
}

impl BufferManager {
    pub fn acquire_dma_buf(&self) -> Option<DmaBufHandle> {
        self.dma_buf_pool.acquire()
    }
    
    pub fn release_dma_buf(&self, handle: DmaBufHandle) {
        self.dma_buf_pool.release(handle)
    }
    
    pub fn acquire_encoded_buffer(&self, size: usize) -> Option<Bytes> {
        self.encoded_buffer_pool.acquire_with_size(size)
    }
}
```

### Data Flow

```
Wayland Compositor
        │
        │ DMA-BUF (GPU memory)
        ▼
PipeWire Portal
        │
        │ DMA-BUF file descriptor
        ▼
Capture Manager
        │
        │ CapturedFrame { dma_buf, ... }
        │ (Zero-copy ownership transfer)
        ▼
Buffer Manager
        │
        │ DmaBufHandle (moved, not copied)
        ▼
Encoder Pipeline
        │
        │ EncodedFrame { data: Bytes, ... }
        │ (Zero-copy Bytes wrapper)
        ▼
WebRTC Transport
        │
        │ RTP Packets (reference to Bytes)
        ▼
Network (UDP/TCP)
        │
        ▼
Client Browser
```

---

## Technology Stack

### Core Dependencies

```toml
[dependencies]
# Async Runtime
tokio = { version = "1.35", features = ["full", "rt-multi-thread"] }
async-trait = "0.1"

# Wayland & PipeWire
wayland-client = "0.31"
wayland-protocols = "0.31"
pipewire = "0.8"
ashpd = "0.8"

# Video Encoding (Low Latency)
openh264 = { version = "0.6", optional = true }
x264 = { version = "0.4", optional = true }
nvenc = { version = "0.1", optional = true }
vpx = { version = "0.1", optional = true }

# Hardware Acceleration (Low Latency)
libva = { version = "0.14", optional = true }     # VA-API
nvidia-encode = { version = "0.5", optional = true }  # NVENC

# WebRTC (Low Latency Configuration)
webrtc = "0.11"  # webrtc-rs

# Memory & Zero-Copy
bytes = "1.5"
memmap2 = "0.9"
shared_memory = "0.12"

# Lock-free data structures for minimal contention
crossbeam = { version = "0.8", features = ["std"] }
crossbeam-channel = "0.5"
crossbeam-queue = "0.3"
parking_lot = "0.12"  # Faster mutexes

# Serialization
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"

# Logging & Tracing
tracing = "0.1"
tracing-subscriber = "0.3"
tracing-opentelemetry = "0.22"  # For latency monitoring

# Metrics & Monitoring
prometheus = { version = "0.13", optional = true }
metrics = "0.21"

# Error Handling
anyhow = "1.0"
thiserror = "1.0"

# Utilities
regex = "1.10"
uuid = { version = "1.6", features = ["v4", "serde", "fast-rng"] }
instant = "0.1"  # High-precision timing

[features]
default = ["software-encoder", "webrtc-rs"]

# Encoder Options
software-encoder = ["x264", "openh264"]
hardware-vaapi = ["libva"]
hardware-nvenc = ["nvidia-encode"]
all-encoders = ["software-encoder", "hardware-vaapi", "hardware-nvenc"]

# WebRTC Implementation
webrtc-rs = ["webrtc"]
custom-webrtc = []

# Low Latency Features
low-latency = []
ultra-low-latency = ["low-latency", "all-encoders"]

# Monitoring
monitoring = ["prometheus", "tracing-opentelemetry"]

# Development
dev = ["monitoring", "all-encoders"]
```

### Encoder Options

| Encoder | Hardware | Performance | Quality | License | Use Case |
|---------|----------|-------------|---------|---------|----------|
| H.264 (x264) | CPU | Medium | High | GPL | Fallback |
| H.264 (VA-API) | GPU | High | Medium | Open Source | Linux Intel/AMD |
| H.264 (NVENC) | GPU (NVIDIA) | Very High | High | Proprietary | NVIDIA GPUs |
| H.265 (HEVC) | GPU | High | Very High | Mixed | Bandwidth-constrained |
| VP9 | CPU/GPU | Medium | High | BSD | Open Web |
| AV1 | GPU | Medium | Very High | Open Source | Future-proof |

**Recommended Primary:** VA-API H.264 (Linux), NVENC H.264 (NVIDIA)
**Recommended Fallback:** x264 H.264 (software)

### WebRTC Libraries

**Option 1: webrtc-rs** (Recommended)
- Pure Rust implementation
- Active development
- Good WebRTC spec compliance
- Zero-copy support for media

**Option 2: Custom Implementation**
- Use `webrtc` crate as base
- Add specialized zero-copy optimizations
- Tighter integration with encoder pipeline

---

## Key Components Design

### 1. Wayland Screen Capture Module

```rust
// src/capture/mod.rs
use pipewire as pw;
use pipewire::properties;
use pipewire::spa::param::format::Format;
use pipewire::stream::StreamFlags;
use async_channel::{Sender, Receiver};

pub struct WaylandCapture {
    core: pw::Core,
    context: pw::Context,
    main_loop: pw::MainLoop,
    stream: pw::stream::Stream,
    frame_sender: Sender<CapturedFrame>,
    frame_receiver: Receiver<CapturedFrame>,
}

impl WaylandCapture {
    pub async fn new(config: CaptureConfig) -> Result<Self, CaptureError> {
        let main_loop = pw::MainLoop::new()?;
        let context = pw::Context::new(&main_loop)?;
        let core = context.connect(None)?;
        
        // Request screen capture via xdg-desktop-portal
        let portal = Portal::new().await?;
        let session = portal.create_session(ScreenCaptureType::Monitor).await?;
        let sources = portal.request_sources(&session).await?;
        
        let (sender, receiver) = async_channel::bounded(30);
        
        Ok(Self {
            core,
            context,
            main_loop,
            stream: Self::create_stream(&context, &session, sender.clone())?,
            frame_sender: sender,
            frame_receiver: receiver,
        })
    }
    
    fn create_stream(
        context: &pw::Context,
        session: &Session,
        sender: Sender<CapturedFrame>,
    ) -> Result<pw::stream::Stream, pw::Error> {
        let mut stream = pw::stream::Stream::new(
            context,
            "wl-webrtc-capture",
            properties! {
                *pw::keys::MEDIA_TYPE => "Video",
                *pw::keys::MEDIA_CATEGORY => "Capture",
                *pw::keys::MEDIA_ROLE => "Screen",
            },
        )?;
        
        stream.connect(
            pw::spa::direction::Direction::Input,
            None,
            StreamFlags::AUTOCONNECT | StreamFlags::MAP_BUFFERS,
        )?;
        
        // Set up callback for new frames (zero-copy DMA-BUF)
        let listener = stream.add_local_listener()?;
        listener
            .register(pw::stream::events::Events::param_done, |data| {
                // Handle stream parameter changes
            })
            .register(pw::stream::events::Events::process, |data| {
                // Process new frame - DMA-BUF is already mapped
                Self::process_frame(data, sender.clone());
            })?;
        
        Ok(stream)
    }
    
    fn process_frame(
        stream: &pw::stream::Stream,
        sender: Sender<CapturedFrame>,
    ) {
        // Get buffer without copying - DMA-BUF is in GPU memory
        let buffer = stream.dequeue_buffer().expect("no buffer");
        let datas = buffer.datas();
        let data = &datas[0];
        
        // Create zero-copy frame
        let frame = CapturedFrame {
            dma_buf: DmaBufHandle::from_buffer(buffer),
            width: stream.format().unwrap().size().width,
            height: stream.format().unwrap().size().height,
            format: PixelFormat::from_spa_format(&stream.format().unwrap()),
            timestamp: timestamp_ns(),
        };
        
        // Send frame (ownership transferred via move)
        let _ = sender.try_send(frame);
    }
    
    pub async fn next_frame(&self) -> CapturedFrame {
        self.frame_receiver.recv().await.unwrap()
    }
}

// Zero-copy DMA-BUF handle
pub struct DmaBufHandle {
    fd: RawFd,
    size: usize,
    stride: u32,
    offset: u32,
}

impl DmaBufHandle {
    pub fn from_buffer(buffer: &pw::buffer::Buffer) -> Self {
        let data = &buffer.datas()[0];
        Self {
            fd: data.fd().unwrap(),
            size: data.chunk().size() as usize,
            stride: data.chunk().stride(),
            offset: data.chunk().offset(),
        }
    }
    
    pub unsafe fn as_ptr(&self) -> *mut u8 {
        // Memory map the DMA-BUF
        let ptr = libc::mmap(
            ptr::null_mut(),
            self.size,
            libc::PROT_READ,
            libc::MAP_SHARED,
            self.fd,
            self.offset as i64,
        );
        
        if ptr == libc::MAP_FAILED {
            panic!("Failed to mmap DMA-BUF");
        }
        
        ptr as *mut u8
    }
}

impl Drop for DmaBufHandle {
    fn drop(&mut self) {
        // Unmap and close FD when handle is dropped
        unsafe {
            libc::munmap(ptr::null_mut(), self.size);
            libc::close(self.fd);
        }
    }
}
```

### 2. Frame Buffer Management (Zero-Copy)

```rust
// src/buffer/mod.rs
use bytes::Bytes;
use std::sync::Arc;
use std::collections::VecDeque;

pub struct FrameBufferPool {
    dma_bufs: VecDeque<DmaBufHandle>,
    encoded_buffers: VecDeque<Bytes>,
    max_dma_bufs: usize,
    max_encoded: usize,
}

impl FrameBufferPool {
    pub fn new(max_dma_bufs: usize, max_encoded: usize) -> Self {
        Self {
            dma_bufs: VecDeque::with_capacity(max_dma_bufs),
            encoded_buffers: VecDeque::with_capacity(max_encoded),
            max_dma_bufs,
            max_encoded,
        }
    }
    
    pub fn acquire_dma_buf(&mut self) -> Option<DmaBufHandle> {
        self.dma_bufs.pop_front()
    }
    
    pub fn release_dma_buf(&mut self, buf: DmaBufHandle) {
        if self.dma_bufs.len() < self.max_dma_bufs {
            self.dma_bufs.push_back(buf);
        }
        // Else: Drop the buffer, let OS reclaim DMA-BUF
    }
    
    pub fn acquire_encoded_buffer(&mut self, size: usize) -> Bytes {
        // Try to reuse existing buffer
        if let Some(mut buf) = self.encoded_buffers.pop_front() {
            if buf.len() >= size {
                // Slice to requested size (zero-copy view)
                return buf.split_to(size);
            }
        }
        
        // Allocate new buffer if needed
        Bytes::from(vec![0u8; size])
    }
    
    pub fn release_encoded_buffer(&mut self, buf: Bytes) {
        if self.encoded_buffers.len() < self.max_encoded {
            self.encoded_buffers.push_back(buf);
        }
        // Else: Drop the buffer, memory freed
    }
}

// Zero-copy frame wrapper
pub struct ZeroCopyFrame {
    pub data: Bytes, // Reference-counted, no copying
    pub metadata: FrameMetadata,
}

pub struct FrameMetadata {
    pub width: u32,
    pub height: u32,
    pub format: PixelFormat,
    pub timestamp: u64,
    pub is_keyframe: bool,
}

// Smart pointer for DMA-BUF
pub struct DmaBufPtr {
    ptr: *mut u8,
    len: usize,
    _marker: PhantomData<&'static mut [u8]>,
}

impl DmaBufPtr {
    pub unsafe fn new(ptr: *mut u8, len: usize) -> Self {
        Self {
            ptr,
            len,
            _marker: PhantomData,
        }
    }
    
    pub fn as_slice(&self) -> &[u8] {
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

unsafe impl Send for DmaBufPtr {}
unsafe impl Sync for DmaBufPtr {}

impl Drop for DmaBufPtr {
    fn drop(&mut self) {
        // Memory will be unmapped by DmaBufHandle's Drop
    }
}
```

### 3. Video Encoder Integration

```rust
// src/encoder/mod.rs
use async_trait::async_trait;

pub enum EncoderType {
    H264_VAAPI,
    H264_NVENC,
    H264_X264,
    VP9_VAAPI,
}

pub struct EncoderConfig {
    pub encoder_type: EncoderType,
    pub bitrate: u32,
    pub keyframe_interval: u32,
    pub preset: EncodePreset,
}

pub enum EncodePreset {
    Ultrafast,
    Superfast,
    Veryfast,
    Faster,
    Fast,
    Medium,
    Slow,
    Slower,
    Veryslow,
}

#[async_trait]
pub trait VideoEncoder: Send + Sync {
    async fn encode(&mut self, frame: CapturedFrame) -> Result<EncodedFrame, EncoderError>;
    async fn reconfigure(&mut self, config: EncoderConfig) -> Result<(), EncoderError>;
    async fn request_keyframe(&mut self) -> Result<(), EncoderError>;
}

pub struct VaapiEncoder {
    display: va::Display,
    context: va::Context,
    config: EncoderConfig,
    sequence_number: u64,
}

impl VaapiEncoder {
    pub fn new(config: EncoderConfig) -> Result<Self, EncoderError> {
        let display = va::Display::open(None)?;
        let context = va::Context::new(&display)?;
        
        Ok(Self {
            display,
            context,
            config,
            sequence_number: 0,
        })
    }
}

#[async_trait]
impl VideoEncoder for VaapiEncoder {
    async fn encode(&mut self, frame: CapturedFrame) -> Result<EncodedFrame, EncoderError> {
        // Zero-copy: Import DMA-BUF directly into VA-API surface
        let surface = unsafe {
            self.context.import_dma_buf(
                frame.dma_buf.fd,
                frame.width,
                frame.height,
                frame.format.as_va_format(),
            )?
        };
        
        // Encode frame (hardware accelerated)
        let encoded_data = self.context.encode_surface(surface)?;
        
        // Create zero-copy Bytes wrapper
        let bytes = Bytes::from(encoded_data);
        
        self.sequence_number += 1;
        
        Ok(EncodedFrame {
            data: bytes,
            is_keyframe: surface.is_keyframe(),
            timestamp: frame.timestamp,
            sequence_number: self.sequence_number,
        })
    }
    
    async fn reconfigure(&mut self, config: EncoderConfig) -> Result<(), EncoderError> {
        self.config = config;
        self.context.set_bitrate(config.bitrate)?;
        self.context.set_preset(config.preset)?;
        Ok(())
    }
    
    async fn request_keyframe(&mut self) -> Result<(), EncoderError> {
        self.context.force_keyframe()?;
        Ok(())
    }
}

// Fallback software encoder
pub struct X264Encoder {
    encoder: x264::Encoder,
    config: EncoderConfig,
    sequence_number: u64,
}

impl X264Encoder {
    pub fn new(config: EncoderConfig) -> Result<Self, EncoderError> {
        let params = x264::Params::default();
        params.set_width(1920);
        params.set_height(1080);
        params.set_fps(60, 1);
        params.set_bitrate(config.bitrate);
        params.set_preset(config.preset);
        params.set_tune("zerolatency");
        
        let encoder = x264::Encoder::open(&params)?;
        
        Ok(Self {
            encoder,
            config,
            sequence_number: 0,
        })
    }
}

#[async_trait]
impl VideoEncoder for X264Encoder {
    async fn encode(&mut self, frame: CapturedFrame) -> Result<EncodedFrame, EncoderError> {
        // Map DMA-BUF to CPU memory (one-time copy)
        let ptr = unsafe { frame.dma_buf.as_ptr() };
        let slice = unsafe { std::slice::from_raw_parts(ptr, frame.dma_buf.size) };
        
        // Convert to YUV if needed
        let yuv_frame = self.convert_to_yuv(slice, frame.width, frame.height)?;
        
        // Encode frame
        let encoded_data = self.encoder.encode(&yuv_frame)?;
        
        self.sequence_number += 1;
        
        Ok(EncodedFrame {
            data: Bytes::from(encoded_data),
            is_keyframe: self.encoder.is_keyframe(),
            timestamp: frame.timestamp,
            sequence_number: self.sequence_number,
        })
    }
    
    async fn reconfigure(&mut self, config: EncoderConfig) -> Result<(), EncoderError> {
        self.config = config;
        // Reopen encoder with new params
        Ok(())
    }
    
    async fn request_keyframe(&mut self) -> Result<(), EncoderError> {
        self.encoder.force_keyframe();
        Ok(())
    }
}
```

### 4. WebRTC Signaling and Data Transport

```rust
// src/webrtc/mod.rs
use webrtc::{
    api::APIBuilder,
    ice_transport::ice_server::RTCIceServer,
    media_track::{track_local::track_local_static_sample::TrackLocalStaticSample, TrackLocal},
    peer_connection::{
        configuration::RTCConfiguration,
        peer_connection_state::RTCPeerConnectionState,
        sdp::session_description::RTCSessionDescription,
        RTCPeerConnection,
    },
    rtp_transceiver::rtp_codec::RTCRtpCodecCapability,
};

pub struct WebRtcServer {
    api: webrtc::API,
    peer_connections: Arc<Mutex<HashMap<String, PeerConnection>>>,
    signaling_server: SignalingServer,
}

impl WebRtcServer {
    pub async fn new(config: WebRtcConfig) -> Result<Self, WebRtcError> {
        let mut api = APIBuilder::new().build();
        
        let signaling_server = SignalingServer::new(config.signaling_addr).await?;
        
        Ok(Self {
            api,
            peer_connections: Arc::new(Mutex::new(HashMap::new())),
            signaling_server,
        })
    }
    
    pub async fn create_peer_connection(
        &self,
        session_id: String,
        video_track: TrackLocalStaticSample,
    ) -> Result<String, WebRtcError> {
        let config = RTCConfiguration {
            ice_servers: vec![RTCIceServer {
                urls: self.signaling_server.stun_servers(),
                ..Default::default()
            }],
            ..Default::default()
        };
        
        let pc = self.api.new_peer_connection(config).await?;
        
        // Add video track
        let rtp_transceiver = pc
            .add_track(Arc::new(video_track))
            .await?;
        
        // Set ICE candidate handler
        let peer_connections = self.peer_connections.clone();
        pc.on_ice_candidate(Box::new(move |candidate| {
            let peer_connections = peer_connections.clone();
            Box::pin(async move {
                if let Some(candidate) = candidate {
                    // Send candidate to signaling server
                    // ...
                }
            })
        }))
        .await;
        
        // Store peer connection
        self.peer_connections
            .lock()
            .await
            .insert(session_id.clone(), PeerConnection::new(pc));
        
        Ok(session_id)
    }
    
    pub async fn send_video_frame(
        &self,
        session_id: &str,
        frame: EncodedFrame,
    ) -> Result<(), WebRtcError> {
        let peer_connections = self.peer_connections.lock().await;
        
        if let Some(peer) = peer_connections.get(session_id) {
            peer.video_track.write_sample(&webrtc::media::Sample {
                data: frame.data.to_vec(),
                duration: std::time::Duration::from_nanos(frame.timestamp),
                ..Default::default()
            }).await?;
        }
        
        Ok(())
    }
}

pub struct PeerConnection {
    pc: RTCPeerConnection,
    video_track: Arc<TrackLocalStaticSample>,
    data_channel: Option<Arc<RTCDataChannel>>,
}

impl PeerConnection {
    pub async fn create_offer(&mut self) -> Result<RTCSessionDescription, WebRtcError> {
        let offer = self.pc.create_offer(None).await?;
        self.pc.set_local_description(offer.clone()).await?;
        Ok(offer)
    }
    
    pub async fn set_remote_description(
        &mut self,
        desc: RTCSessionDescription,
    ) -> Result<(), WebRtcError> {
        self.pc.set_remote_description(desc).await
    }
    
    pub async fn create_answer(&mut self) -> Result<RTCSessionDescription, WebRtcError> {
        let answer = self.pc.create_answer(None).await?;
        self.pc.set_local_description(answer.clone()).await?;
        Ok(answer)
    }
}

// Data channel for input/control
pub struct DataChannelManager {
    channels: HashMap<String, Arc<RTCDataChannel>>,
}

impl DataChannelManager {
    pub async fn send_input(&self, channel_id: &str, input: InputEvent) -> Result<()> {
        if let Some(channel) = self.channels.get(channel_id) {
            let data = serde_json::to_vec(&input)?;
            channel.send(&data).await?;
        }
        Ok(())
    }
    
    pub fn on_input<F>(&mut self, channel_id: String, callback: F)
    where
        F: Fn(InputEvent) + Send + Sync + 'static,
    {
        if let Some(channel) = self.channels.get(&channel_id) {
            channel.on_message(Box::new(move |msg| {
                if let Ok(input) = serde_json::from_slice::<InputEvent>(&msg.data) {
                    callback(input);
                }
            })).unwrap();
        }
    }
}

#[derive(Debug, Serialize, Deserialize)]
pub enum InputEvent {
    MouseMove { x: f32, y: f32 },
    MouseClick { button: MouseButton },
    KeyPress { key: String },
    KeyRelease { key: String },
}

#[derive(Debug, Serialize, Deserialize)]
pub enum MouseButton {
    Left,
    Right,
    Middle,
}
```

### 5. IPC Layer (Optional)

```rust
// src/ipc/mod.rs
use tokio::net::UnixListener;
use tokio::io::{AsyncReadExt, AsyncWriteExt};

pub struct IpcServer {
    listener: UnixListener,
}

impl IpcServer {
    pub async fn new(socket_path: &str) -> Result<Self> {
        // Remove existing socket if present
        let _ = std::fs::remove_file(socket_path);
        
        let listener = UnixListener::bind(socket_path)?;
        
        Ok(Self { listener })
    }
    
    pub async fn run(&self, sender: async_channel::Sender<IpcMessage>) {
        loop {
            match self.listener.accept().await {
                Ok((mut stream, _)) => {
                    let sender = sender.clone();
                    tokio::spawn(async move {
                        let mut buf = [0; 1024];
                        loop {
                            match stream.read(&mut buf).await {
                                Ok(0) => break,
                                Ok(n) => {
                                    if let Ok(msg) = serde_json::from_slice::<IpcMessage>(&buf[..n]) {
                                        let _ = sender.send(msg).await;
                                    }
                                }
                                Err(_) => break,
                            }
                        }
                    });
                }
                Err(_) => continue,
            }
        }
    }
}

#[derive(Debug, Serialize, Deserialize)]
pub enum IpcMessage {
    StartCapture { session_id: String },
    StopCapture { session_id: String },
    SetQuality { level: QualityLevel },
    GetStatus,
}
```

---

## Data Flow Optimization

### Zero-Copy Pipeline Stages

```
Stage 1: Capture
  Input: Wayland Compositor (GPU memory)
  Output: DMA-BUF file descriptor
  Copy: None (Zero-copy)
  
Stage 2: Buffer Manager
  Input: DMA-BUF FD
  Output: DmaBufHandle (RAII wrapper)
  Copy: None (Zero-copy ownership transfer)
  
Stage 3: Encoder
  Input: DmaBufHandle
  Output: Bytes (reference-counted)
  Copy: None (DMA-BUF imported directly to GPU encoder)
  
Stage 4: WebRTC
  Input: Bytes
  Output: RTP packets (references to Bytes)
  Copy: None (Zero-copy to socket buffers)
  
Stage 5: Network
  Input: RTP packets
  Output: UDP datagrams
  Copy: Minimal (kernel space only)
```

### Memory Ownership Transfer

```rust
// Example: Ownership transfer through pipeline
async fn process_frame_pipeline(
    mut capture: WaylandCapture,
    mut encoder: VaapiEncoder,
    mut webrtc: WebRtcServer,
) -> Result<()> {
    loop {
        // Stage 1: Capture (ownership moves from PipeWire to our code)
        let frame = capture.next_frame().await; // CapturedFrame owns DmaBufHandle
        
        // Stage 2: Encode (ownership moved, not copied)
        let encoded = encoder.encode(frame).await?; // EncodedFrame owns Bytes
        
        // Stage 3: Send (Bytes is reference-counted, no copy)
        webrtc.send_video_frame("session-123", encoded).await?;
        
        // Ownership transferred all the way without copying
    }
}
```

### Buffer Sharing Mechanisms

#### 1. DMA-BUF (Primary)
- GPU memory buffers
- Exported as file descriptors
- Zero-copy to hardware encoders
- Limited to same GPU/driver

```rust
pub fn export_dma_buf(surface: &va::Surface) -> Result<DmaBufHandle> {
    let fd = surface.export_dma_buf()?;
    Ok(DmaBufHandle {
        fd,
        size: surface.size(),
        stride: surface.stride(),
        offset: 0,
    })
}
```

#### 2. Shared Memory (Fallback)
- POSIX shared memory (shm_open)
- For software encoding path
- Copy from DMA-BUF to shared memory

```rust
pub fn create_shared_buffer(size: usize) -> Result<SharedBuffer> {
    let name = format!("/wl-webrtc-{}", uuid::Uuid::new_v4());
    let fd = shm_open(&name, O_CREAT | O_RDWR, 0666)?;
    ftruncate(fd, size as i64)?;
    
    let ptr = unsafe { mmap(ptr::null_mut(), size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0) };
    
    Ok(SharedBuffer {
        ptr,
        size,
        fd,
        name,
    })
}
```

#### 3. Memory-Mapped Files (Alternative)
- For persistent caching
- Cross-process communication
- Used for frame buffering

```rust
pub struct MappedFile {
    file: File,
    ptr: *mut u8,
    size: usize,
}

impl MappedFile {
    pub fn new(path: &Path, size: usize) -> Result<Self> {
        let file = OpenOptions::new()
            .read(true)
            .write(true)
            .create(true)
            .open(path)?;
        
        file.set_len(size as u64)?;
        
        let ptr = unsafe {
            mmap(
                ptr::null_mut(),
                size,
                PROT_READ | PROT_WRITE,
                MAP_SHARED,
                file.as_raw_fd(),
                0,
            )
        }?;
        
        Ok(Self { file, ptr, size })
    }
}
```

### Pipeline Optimization Strategies

#### 1. Parallel Encoding
```rust
// Run multiple encoders in parallel for different quality levels
pub struct AdaptiveEncoder {
    encoders: Vec<Box<dyn VideoEncoder>>,
    active_encoder: usize,
    bandwidth_monitor: BandwidthMonitor,
}

impl AdaptiveEncoder {
    pub async fn encode_adaptive(&mut self, frame: CapturedFrame) -> Result<EncodedFrame> {
        let bandwidth = self.bandwidth_monitor.current_bandwidth();
        
        // Switch encoder based on bandwidth
        let new_encoder = match bandwidth {
            b if b < 500_000 => 0,  // Low bitrate
            b if b < 2_000_000 => 1, // Medium bitrate
            _ => 2,                  // High bitrate
        };
        
        if new_encoder != self.active_encoder {
            self.active_encoder = new_encoder;
        }
        
        self.encoders[self.active_encoder].encode(frame).await
    }
}
```

#### 2. Frame Skipping
```rust
pub struct FrameSkipper {
    target_fps: u32,
    last_frame_time: Instant,
    skip_count: u32,
}

impl FrameSkipper {
    pub fn should_skip(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_frame_time).as_millis();
        let frame_interval = 1000 / self.target_fps as u128;
        
        if elapsed < frame_interval {
            self.skip_count += 1;
            return true;
        }
        
        self.last_frame_time = now;
        self.skip_count = 0;
        false
    }
}
```

#### 3. Region of Interest (ROI)
```rust
pub struct RegionEncoder {
    full_encoder: Box<dyn VideoEncoder>,
    roi_encoder: Box<dyn VideoEncoder>,
    current_region: Option<ScreenRegion>,
}

impl RegionEncoder {
    pub async fn encode_roi(
        &mut self,
        frame: CapturedFrame,
        roi: Option<ScreenRegion>,
    ) -> Result<EncodedFrame> {
        if let Some(region) = roi {
            // Encode only ROI with higher quality
            let cropped = self.crop_frame(frame, region)?;
            self.roi_encoder.encode(cropped).await
        } else {
            // Encode full frame
            self.full_encoder.encode(frame).await
        }
    }
    
    fn crop_frame(&self, mut frame: CapturedFrame, region: ScreenRegion) -> Result<CapturedFrame> {
        // Adjust DMA-BUF offsets for region
        frame.width = region.width;
        frame.height = region.height;
        Ok(frame)
    }
}
```

---

## Low Latency Optimization

### Design Philosophy

To achieve 15-25ms latency on local networks, we prioritize:
1. **Speed over completeness**: Fast, low-latency delivery is more important than perfect reliability
2. **Minimize buffering**: Small buffers at every stage
3. **Zero-copy everywhere**: Eliminate CPU memory copies
4. **Hardware acceleration**: Use GPU for all intensive operations
5. **Predictive timing**: Reduce wait times with accurate timing

### 1. Encoder Optimization

#### Hardware Encoder Configuration

```rust
pub struct LowLatencyEncoderConfig {
    // Codec settings
    pub codec: VideoCodec,
    
    // Low-latency specific
    pub gop_size: u32,           // Small GOP: 8-15 frames
    pub b_frames: u32,           // Zero B-frames for minimal latency
    pub max_b_frames: u32,       // Always 0 for low latency
    pub lookahead: u32,           // Minimal lookahead: 0-2 frames
    
    // Rate control
    pub rc_mode: RateControlMode, // CBR or VBR with strict constraints
    pub bitrate: u32,            // Adaptive bitrate
    pub max_bitrate: u32,        // Tight max constraint
    pub min_bitrate: u32,
    pub vbv_buffer_size: u32,    // Very small VBV buffer
    pub vbv_max_rate: u32,       // Close to bitrate
    
    // Timing
    pub fps: u32,                // Target FPS (30-60)
    pub intra_period: u32,       // Keyframe interval
    
    // Quality vs Latency trade-offs
    pub preset: EncoderPreset,   // Ultrafast/Fast
    pub tune: EncoderTune,       // zerolatency
    pub quality: u8,             // Constant quality (CRF) or CQ
}

pub enum VideoCodec {
    H264,    // Best compatibility, good latency
    H265,    // Better compression, slightly higher latency
    VP9,     // Open alternative
}

pub enum RateControlMode {
    CBR,     // Constant Bitrate - predictable
    VBR,     // Variable Bitrate - better quality
    CQP,     // Constant Quantizer - lowest latency
}

pub enum EncoderPreset {
    Ultrafast,  // Lowest latency, lower quality
    Superfast,
    Veryfast,   // Recommended for 15-25ms
    Faster,
}

pub enum EncoderTune {
    Zerolatency, // Mandatory for low latency
    Film,
    Animation,
}
```

#### Recommended Encoder Settings

**VA-API (Intel/AMD) - For 15-25ms latency:**

```c
// libva-specific low-latency settings
VAConfigAttrib attribs[] = {
    {VAConfigAttribRTFormat, VA_RT_FORMAT_YUV420},
    {VAConfigAttribRateControl, VA_RC_CBR},
    {VAConfigAttribEncMaxRefFrames, {1, 0}},  // Min reference frames
    {VAConfigAttribEncPackedHeaders, VA_ENC_PACKED_HEADER_SEQUENCE},
};

VAEncSequenceParameterBufferH264 seq_param = {
    .intra_period = 15,           // Short GOP
    .ip_period = 1,               // No B-frames
    .bits_per_second = 4000000,
    .max_num_ref_frames = 1,     // Minimal references
    .time_scale = 90000,
    .num_units_in_tick = 1500,   // 60 FPS
};

VAEncPictureParameterBufferH264 pic_param = {
    .reference_frames = {
        {0, VA_FRAME_PICTURE},    // Single reference
    },
    .num_ref_idx_l0_active_minus1 = 0,
    .num_ref_idx_l1_active_minus1 = 0,
    .pic_fields.bits.idr_pic_flag = 0,
    .pic_fields.bits.reference_pic_flag = 1,
};

VAEncSliceParameterBufferH264 slice_param = {
    .num_ref_idx_l0_active_minus1 = 0,
    .num_ref_idx_l1_active_minus1 = 0,
    .disable_deblocking_filter_idc = 1,  // Faster
};
```

**NVENC (NVIDIA) - For 15-25ms latency:**

```rust
// NVENC low-latency configuration
let mut create_params = NV_ENC_INITIALIZE_PARAMS::default();
create_params.encodeGUID = NV_ENC_CODEC_H264_GUID;
create_params.presetGUID = NV_ENC_PRESET_P4_GUID;  // Low latency

let mut config = NV_ENC_CONFIG::default();
config.profileGUID = NV_ENC_H264_PROFILE_BASELINE_GUID;  // Faster encoding
config.rcParams.rateControlMode = NV_ENC_PARAMS_RC_CBR;
config.rcParams.averageBitRate = 4000000;
config.rcParams.maxBitRate = 4000000;
config.rcParams.vbvBufferSize = 4000000;  // 1 second buffer
config.rcParams.vbvInitialDelay = 0;      // Minimal delay

let mut h264_config = unsafe { config.encodeCodecConfig.h264Config };
h264_config.enableIntraRefresh = 1;
h264_config.idrPeriod = 30;                // Keyframe every 30 frames
h264_config.repeatSPSPPS = 1;
h264_config.enableConstrainedEncoding = 1;
h264_config.frameNumD = 0;
h264_config.sliceMode = NV_ENC_SLICE_MODE_AUTOSELECT;

// Low-latency specific
h264_config.maxNumRefFrames = 1;          // Minimal references
h264_config.idrPeriod = 15;               // Shorter GOP
```

**x264 (Software) - For 50-100ms latency:**

```rust
// x264 parameters for low latency
let param = x264_param_t {
    i_width: 1920,
    i_height: 1080,
    i_fps_num: 60,
    i_fps_den: 1,
    
    // Rate control
    i_bitrate: 4000,                      // 4 Mbps
    i_keyint_max: 15,                     // Short GOP
    b_intra_refresh: 1,
    
    // Low latency
    b_repeat_headers: 1,
    b_annexb: 1,
    i_scenecut_threshold: 0,             // Disable scene detection
    
    // No B-frames for latency
    i_bframe: 0,
    i_bframe_adaptive: 0,
    i_bframe_pyramid: 0,
    
    // References
    i_frame_reference: 1,                 // Minimal references
    
    // Preset: ultrafast or superfast
    // This is set via preset function
};

// Apply preset
x264_param_apply_preset(&param, "superfast");
x264_param_apply_tune(&param, "zerolatency");
```

#### Dynamic Bitrate vs Latency Trade-offs

```rust
pub struct AdaptiveBitrateController {
    target_latency_ms: u32,
    current_bitrate: u32,
    frame_rate: u32,
    network_quality: NetworkQuality,
    buffer_depth_ms: u32,
}

pub struct NetworkQuality {
    bandwidth_mbps: f64,
    latency_ms: u32,
    packet_loss_rate: f64,
    jitter_ms: u32,
}

impl AdaptiveBitrateController {
    pub fn update_target_bitrate(&mut self, measured_latency_ms: u32) -> u32 {
        let latency_ratio = measured_latency_ms as f64 / self.target_latency_ms as f64;
        
        if latency_ratio > 1.5 {
            // Latency too high - reduce bitrate aggressively
            self.current_bitrate = (self.current_bitrate as f64 * 0.7) as u32;
        } else if latency_ratio > 1.2 {
            // Moderately high - reduce bitrate
            self.current_bitrate = (self.current_bitrate as f64 * 0.85) as u32;
        } else if latency_ratio < 0.8 {
            // Can increase bitrate
            self.current_bitrate = (self.current_bitrate as f64 * 1.1) as u32;
        }
        
        // Clamp to reasonable bounds
        self.current_bitrate = self.current_bitrate.clamp(1000000, 8000000);
        
        self.current_bitrate
    }
}
```

### 2. Capture Optimization

#### PipeWire DMA-BUF Zero-Copy

```rust
pub struct LowLatencyCaptureConfig {
    pub frame_rate: u32,           // 30-60 FPS
    pub zero_copy: bool,           // Always true
    pub track_damage: bool,        // Enable damage tracking
    pub partial_updates: bool,     // Encode only damaged regions
    pub buffer_pool_size: usize,   // Small pool: 3-5 buffers
}

pub struct DamageTracker {
    damaged_regions: VecDeque<ScreenRegion>,
    last_frame: Option<DmaBufHandle>,
    threshold: u32,                // Minimum change size to encode
}

impl DamageTracker {
    pub fn update(&mut self, new_frame: &CapturedFrame) -> Vec<ScreenRegion> {
        match &self.last_frame {
            Some(last) => {
                let regions = self.compute_damage_regions(last, new_frame);
                self.last_frame = Some(new_frame.dma_buf.clone());
                regions
            }
            None => {
                self.last_frame = Some(new_frame.dma_buf.clone());
                vec![ScreenRegion {
                    x: 0,
                    y: 0,
                    width: new_frame.width,
                    height: new_frame.height,
                }]
            }
        }
    }
    
    fn compute_damage_regions(&self, last: &DmaBufHandle, new: &CapturedFrame) -> Vec<ScreenRegion> {
        // Compare frames and find changed regions
        // This can be done efficiently with GPU
        // For MVP, we can use a simple block-based comparison
        
        // Block size for comparison (e.g., 16x16 pixels)
        let block_size = 16;
        let blocks_x = (new.width as usize + block_size - 1) / block_size;
        let blocks_y = (new.height as usize + block_size - 1) / block_size;
        
        // Merge adjacent damaged blocks into regions
        // ...
        
        vec![]  // Placeholder
    }
}
```

#### Partial Region Encoding

```rust
pub struct RegionEncoder {
    full_encoder: Box<dyn VideoEncoder>,
    tile_encoder: Box<dyn VideoEncoder>,
    current_regions: Vec<ScreenRegion>,
}

impl RegionEncoder {
    pub async fn encode_with_regions(
        &mut self,
        frame: CapturedFrame,
        regions: Vec<ScreenRegion>,
    ) -> Result<Vec<EncodedTile>> {
        let mut encoded_tiles = Vec::new();
        
        if regions.is_empty() || regions.len() > 4 {
            // Too many regions or no changes - encode full frame
            let encoded = self.full_encoder.encode(frame).await?;
            encoded_tiles.push(EncodedTile {
                region: ScreenRegion {
                    x: 0,
                    y: 0,
                    width: frame.width,
                    height: frame.height,
                },
                data: encoded.data,
                is_keyframe: encoded.is_keyframe,
            });
        } else {
            // Encode each damaged region separately
            for region in regions {
                let cropped = self.crop_frame(&frame, &region)?;
                let encoded = self.tile_encoder.encode(cropped).await?;
                
                encoded_tiles.push(EncodedTile {
                    region,
                    data: encoded.data,
                    is_keyframe: encoded.is_keyframe,
                });
            }
        }
        
        Ok(encoded_tiles)
    }
    
    fn crop_frame(&self, frame: &CapturedFrame, region: &ScreenRegion) -> Result<CapturedFrame> {
        // Adjust DMA-BUF offsets for the region
        // This is a zero-copy operation - just metadata changes
        
        Ok(CapturedFrame {
            dma_buf: DmaBufHandle::from_region(&frame.dma_buf, region)?,
            width: region.width,
            height: region.height,
            format: frame.format,
            timestamp: frame.timestamp,
        })
    }
}
```

### 3. WebRTC Transport Layer Optimization

#### Low-Latency WebRTC Configuration

```rust
pub struct LowLatencyWebRtcConfig {
    // ICE and transport
    pub ice_transport_policy: IceTransportPolicy,
    pub ice_servers: Vec<IceServer>,
    
    // Media settings
    pub video_codecs: Vec<VideoCodecConfig>,
    pub max_bitrate: u32,
    pub min_bitrate: u32,
    pub start_bitrate: u32,
    
    // Buffering - minimize for low latency
    pub playout_delay_min_ms: u16,    // 0-10ms (default 50ms)
    pub playout_delay_max_ms: u16,    // 10-20ms (default 200ms)
    
    // Packetization
    pub rtp_payload_size: u16,        // Smaller packets: 1200 bytes
    pub packetization_mode: PacketizationMode,
    
    // Feedback and retransmission
    pub nack_enabled: bool,           // Limited NACK
    pub fec_enabled: bool,            // Disable FEC for latency
    pub transport_cc_enabled: bool,   // Congestion control
    
    // RTCP settings
    pub rtcp_report_interval_ms: u32,  // Frequent: 50-100ms
}

pub struct VideoCodecConfig {
    pub name: String,
    pub clock_rate: u32,
    pub num_channels: u16,
    pub parameters: CodecParameters,
}

impl LowLatencyWebRtcConfig {
    pub fn for_ultra_low_latency() -> Self {
        Self {
            ice_transport_policy: IceTransportPolicy::All,
            ice_servers: vec![],
            
            video_codecs: vec![
                VideoCodecConfig {
                    name: "H264".to_string(),
                    clock_rate: 90000,
                    num_channels: 1,
                    parameters: CodecParameters {
                        profile_level_id: "42e01f".to_string(),  // Baseline profile
                        packetization_mode: 1,
                        level_asymmetry_allowed: 1,
                    },
                },
            ],
            
            max_bitrate: 8000000,      // 8 Mbps max
            min_bitrate: 500000,       // 500 Kbps min
            start_bitrate: 4000000,    // 4 Mbps start
            
            // Critical: Minimal playout delay
            playout_delay_min_ms: 0,   // No minimum
            playout_delay_max_ms: 20,  // 20ms maximum
            
            // Smaller packets for lower serialization latency
            rtp_payload_size: 1200,
            packetization_mode: PacketizationMode::NonInterleaved,
            
            // Limited retransmission
            nack_enabled: true,        // But limit retransmission window
            fec_enabled: false,        // Disable FEC - adds latency
            transport_cc_enabled: true,
            
            // More frequent RTCP feedback
            rtcp_report_interval_ms: 50,
        }
    }
}
```

#### Packet Loss Handling Strategy

```rust
pub enum LossHandlingStrategy {
    PreferLatency,    // Drop late frames, prioritize low latency
    PreferQuality,     // Retransmit, prioritize quality
    Balanced,          // Adaptive based on network conditions
}

pub struct PacketLossHandler {
    strategy: LossHandlingStrategy,
    max_retransmission_delay_ms: u32,
    nack_window_size: u32,
}

impl PacketLossHandler {
    pub fn handle_packet_loss(
        &mut self,
        sequence_number: u16,
        now_ms: u64,
    ) -> RetransmissionDecision {
        match self.strategy {
            LossHandlingStrategy::PreferLatency => {
                // Don't retransmit if too old
                if now_ms > self.max_retransmission_delay_ms as u64 {
                    RetransmissionDecision::Drop
                } else {
                    RetransmissionDecision::None
                }
            }
            LossHandlingStrategy::PreferQuality => {
                // Always try to retransmit
                RetransmissionDecision::Request(sequence_number)
            }
            LossHandlingStrategy::Balanced => {
                // Adaptive based on loss rate
                RetransmissionDecision::None  // Placeholder
            }
        }
    }
}

pub enum RetransmissionDecision {
    Request(u16),
    Drop,
    None,
}
```

#### NACK vs FEC Selection

**Recommendation for 15-25ms latency:**

- **Primary**: Limited NACK
  - NACK window: 1-2 frames (16-33ms at 60fps)
  - Max retransmission delay: 20ms
  - Only retransmit keyframes or critical packets

- **Avoid FEC**: 
  - Forward Error Correction adds significant latency
  - With low-loss LAN, FEC overhead outweighs benefits
  - Use NACK selectively instead

```rust
pub struct NackController {
    window_size_ms: u32,           // 20ms window
    max_nack_packets_per_second: u32,
    nack_list: VecDeque<(u16, u64)>, // (seq_num, timestamp_ms)
}

impl NackController {
    pub fn should_send_nack(&self, seq_num: u16, now_ms: u64) -> bool {
        // Check if packet is within NACK window
        if let Some(&(_, oldest_ts)) = self.nack_list.front() {
            if now_ms - oldest_ts > self.window_size_ms as u64 {
                return false;  // Too old
            }
        }
        
        true
    }
}
```

### 4. Frame Rate and Buffer Strategy

#### Dynamic Frame Rate Adjustment

```rust
pub struct FrameRateController {
    target_fps: u32,              // Desired FPS (30-60)
    current_fps: u32,
    frame_times: VecDeque<Instant>,
    last_frame_time: Instant,
    min_interval: Duration,       // 1 / max_fps
}

impl FrameRateController {
    pub fn new(target_fps: u32) -> Self {
        let min_interval = Duration::from_micros(1_000_000 / 60);  // Max 60 FPS
        
        Self {
            target_fps,
            current_fps: 30,
            frame_times: VecDeque::with_capacity(60),
            last_frame_time: Instant::now(),
            min_interval,
        }
    }
    
    pub fn should_capture(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_frame_time);
        
        if elapsed < self.min_interval {
            return false;  // Too soon
        }
        
        // Update frame rate based on conditions
        self.adjust_fps_based_on_conditions();
        
        self.last_frame_time = now;
        true
    }
    
    pub fn adjust_fps_based_on_conditions(&mut self) {
        // Check system load, network conditions, etc.
        let system_load = self.get_system_load();
        let network_quality = self.get_network_quality();
        
        if system_load > 0.8 || network_quality.is_poor() {
            self.current_fps = 30;  // Reduce frame rate
        } else if system_load < 0.5 && network_quality.is_excellent() {
            self.current_fps = 60;  // Increase frame rate
        } else {
            self.current_fps = 45;  // Balanced
        }
    }
}
```

#### Fast Frame Dropping Strategy

```rust
pub struct FrameDropper {
    target_fps: u32,
    adaptive_drop_threshold_ms: u32,
    consecutive_drops: u32,
    max_consecutive_drops: u32,
}

impl FrameDropper {
    pub fn should_drop(&mut self, queue_latency_ms: u32) -> bool {
        if queue_latency_ms > self.adaptive_drop_threshold_ms {
            if self.consecutive_drops < self.max_consecutive_drops {
                self.consecutive_drops += 1;
                return true;
            }
        }
        
        self.consecutive_drops = 0;
        false
    }
    
    pub fn get_drop_interval(&self) -> u32 {
        // Calculate how many frames to drop
        match self.target_fps {
            60 => 1,   // Drop 1 out of every 2
            30 => 1,   // Drop 1 out of every 2
            _ => 0,
        }
    }
}
```

#### Minimal Buffering

**Sender Side:**

```rust
pub struct SenderBuffer {
    max_size_frames: usize,        // Very small: 1-2 frames
    queue: VecDeque<EncodedFrame>,
    target_latency_ms: u32,
}

impl SenderBuffer {
    pub fn new() -> Self {
        Self {
            max_size_frames: 1,      // Single frame buffer
            queue: VecDeque::with_capacity(2),
            target_latency_ms: 5,    // 5ms target
        }
    }
    
    pub fn push(&mut self, frame: EncodedFrame) -> Result<()> {
        if self.queue.len() >= self.max_size_frames {
            // Drop oldest frame to maintain low latency
            self.queue.pop_front();
        }
        
        self.queue.push_back(frame);
        Ok(())
    }
    
    pub fn pop(&mut self) -> Option<EncodedFrame> {
        self.queue.pop_front()
    }
}
```

**Receiver Side (Jitter Buffer):**

```rust
pub struct MinimalJitterBuffer {
    target_delay_ms: u32,           // 0-10ms
    min_delay_ms: u32,             // 0ms
    max_delay_ms: u32,             // 10-20ms
    packets: VecDeque<RtpPacket>,
}

impl MinimalJitterBuffer {
    pub fn new() -> Self {
        Self {
            target_delay_ms: 5,     // 5ms target
            min_delay_ms: 0,        // No minimum
            max_delay_ms: 10,      // 10ms maximum
            packets: VecDeque::with_capacity(10),
        }
    }
    
    pub fn push(&mut self, packet: RtpPacket) {
        if self.packets.len() < self.max_delay_ms as usize / 2 {
            self.packets.push_back(packet);
        } else {
            // Buffer full - drop oldest
            self.packets.pop_front();
            self.packets.push_back(packet);
        }
    }
    
    pub fn pop(&mut self) -> Option<RtpPacket> {
        self.packets.pop_front()
    }
}
```

### 5. Architecture Adjustments

#### Single-Threaded vs Multi-Threaded

**Recommendation: Hybrid Approach**

- **Capture Thread**: Dedicated thread for PipeWire
- **Encoder Thread**: Per-session encoder thread
- **Network Thread**: WebRTC transport thread
- **Coordination**: Lock-free channels for data passing

```rust
pub struct PipelineArchitecture {
    capture_thread: JoinHandle<()>,
    encoder_threads: Vec<JoinHandle<()>>,
    network_thread: JoinHandle<()>,
    
    // Lock-free communication
    capture_to_encoder: async_channel::Sender<CapturedFrame>,
    encoder_to_network: async_channel::Sender<EncodedFrame>,
}
```

#### Lock Competition Minimization

```rust
// Use lock-free data structures where possible
use crossbeam::queue::SegQueue;
use crossbeam::channel::{bounded, unbounded};

pub struct LockFreeFrameQueue {
    queue: SegQueue<CapturedFrame>,
    max_size: usize,
}

impl LockFreeFrameQueue {
    pub fn push(&self, frame: CapturedFrame) -> Result<()> {
        if self.queue.len() >= self.max_size {
            return Err(Error::QueueFull);
        }
        self.queue.push(frame);
        Ok(())
    }
    
    pub fn pop(&self) -> Option<CapturedFrame> {
        self.queue.pop()
    }
}
```

#### Async Task Scheduling

```rust
pub struct LowLatencyScheduler {
    capture_priority: TaskPriority,
    encode_priority: TaskPriority,
    network_priority: TaskPriority,
}

impl LowLatencyScheduler {
    pub async fn schedule_pipeline(&self) {
        tokio::spawn_with_priority(TaskPriority::High, async move {
            // Critical path: capture -> encode -> send
        });
        
        tokio::spawn_with_priority(TaskPriority::Medium, async move {
            // Background tasks: statistics, logging
        });
    }
}
```

### 6. Technology Stack Adjustments

#### Encoder Selection for Latency

| Encoder | Setup Latency | Per-Frame Latency | Quality | Recommendation |
|---------|--------------|------------------|---------|----------------|
| VA-API H.264 | 1-2ms | 2-3ms | Medium | Primary (Linux) |
| NVENC H.264 | 1-2ms | 1-2ms | High | Primary (NVIDIA) |
| x264 (ultrafast) | 0ms | 5-8ms | Low | Fallback |
| x264 (superfast) | 0ms | 8-12ms | Medium | Fallback |

**Recommendation:**
- **Primary**: VA-API or NVENC H.264 with ultrafast preset
- **Fallback**: x264 with ultrafast preset (accept 30-50ms latency)

#### Direct Wayland vs PipeWire

**Use PipeWire (recommended):**
- Better DMA-BUF support
- Hardware acceleration integration
- Zero-copy through ecosystem

**Direct Wayland (if needed):**
- Lower-level control
- Potentially lower capture latency (0.5-1ms)
- More complex implementation
- No portal integration (security issue)

**Recommendation:** Stick with PipeWire for MVP. Consider direct Wayland only if PipeWire latency is unacceptable.

#### webrtc-rs Latency Characteristics

**Pros:**
- Pure Rust, predictable behavior
- Good zero-copy support
- Customizable buffering

**Cons:**
- May have default buffer settings optimized for reliability
- Need manual configuration for ultra-low latency

**Custom WebRTC Layer (advanced):**
- Full control over buffering and timing
- Can inline packetization
- More complex implementation

**Recommendation:** Use webrtc-rs with low-latency configuration. Only consider custom layer if webrtc-rs cannot achieve targets.

### 7. Implementation Priority

#### P0 (Must-Have for MVP)

1. **Hardware Encoder Integration**
   - VA-API H.264 with low-latency settings
   - No B-frames, small GOP (15 frames)
   - Ultrafast preset

2. **DMA-BUF Zero-Copy**
   - PipeWire DMA-BUF import
   - Direct encoder feed
   - No CPU copies

3. **Minimal Buffering**
   - Single frame sender buffer
   - 0-5ms jitter buffer
   - Fast frame dropping

4. **Low-Latency WebRTC Config**
   - playout_delay_min: 0ms
   - playout_delay_max: 20ms
   - Disable FEC

#### P1 (Important for 15-25ms)

1. **Damage Tracking**
   - Partial region updates
   - Reduced encoding load

2. **Dynamic Frame Rate**
   - 30-60 FPS adaptation
   - Network-aware

3. **NACK Control**
   - Limited retransmission window (20ms)
   - Selective NACK

#### P2 (Nice-to-Have)

1. **Direct Wayland Capture**
   - If PipeWire latency insufficient

2. **Custom WebRTC Layer**
   - If webrtc-rs insufficient

3. **Advanced Congestion Control**
   - SCReAM or Google Congestion Control

### 8. Testing and Validation

#### End-to-End Latency Measurement

```rust
pub struct LatencyMeter {
    timestamps: VecDeque<(u64, LatencyStage)>,
}

pub enum LatencyStage {
    Capture,
    EncodeStart,
    EncodeEnd,
    Packetize,
    NetworkSend,
    NetworkReceive,
    Depacketize,
    DecodeStart,
    DecodeEnd,
    Display,
}

impl LatencyMeter {
    pub fn mark(&mut self, stage: LatencyStage) {
        let now = timestamp_ns();
        self.timestamps.push_back((now, stage));
    }
    
    pub fn calculate_total_latency(&self) -> Duration {
        if self.timestamps.len() < 2 {
            return Duration::ZERO;
        }
        
        let first = self.timestamps.front().unwrap().0;
        let last = self.timestamps.back().unwrap().0;
        Duration::from_nanos(last - first)
    }
}
```

**Measurement Method:**

1. **Timestamp Injection**
   - Inject frame ID at capture (visible timestamp on screen)
   - Capture at client with camera
   - Compare timestamps to calculate round-trip
   - Divide by 2 for one-way latency

2. **Network Timestamping**
   - Add frame capture time in RTP header extension
   - Measure at receiver
   - Account for clock skew

3. **Hardware Timestamping**
   - Use kernel packet timestamps (SO_TIMESTAMPING)
   - Hardware NIC timestamps if available

#### Performance Benchmarking

```rust
#[bench]
fn bench_full_pipeline_latency(b: &mut Bencher) {
    let mut pipeline = LowLatencyPipeline::new(config).unwrap();
    let mut latencies = Vec::new();
    
    b.iter(|| {
        let start = Instant::now();
        
        let frame = pipeline.capture().unwrap();
        let encoded = pipeline.encode(frame).unwrap();
        pipeline.send(encoded).unwrap();
        
        latencies.push(start.elapsed());
    });
    
    let avg_latency = latencies.iter().sum::<Duration>() / latencies.len() as u32;
    println!("Average latency: {:?}", avg_latency);
}
```

**Target Benchmarks:**

| Metric | Target | Acceptable |
|--------|--------|------------|
| Capture latency | 2-3ms | <5ms |
| Encode latency | 3-5ms | <8ms |
| Packetize latency | 1-2ms | <3ms |
| Network (LAN) | 0.5-1ms | <2ms |
| Decode latency | 1-2ms | <4ms |
| Display latency | 1-2ms | <4ms |
| **Total** | **15-25ms** | **<30ms** |

#### Tuning Strategy

1. **Baseline Measurement**
   - Measure each stage individually
   - Identify bottlenecks

2. **Iterative Tuning**
   - Tune one parameter at a time
   - Measure impact on total latency
   - Trade off quality if needed

3. **Validation**
   - Test under various network conditions
   - Test under system load
   - Test with different content (static, dynamic)

4. **Continuous Monitoring**
   - Track latency in production
   - Alert on degradation
   - Adaptive adjustments

---

## Implementation Roadmap (Updated for Low Latency)

### Phase 1: MVP (Minimum Viable Product) - 4-6 weeks

**Goal:** Basic screen capture and WebRTC streaming

**Week 1-2: Core Infrastructure**
- [ ] Project setup (Cargo.toml, directory structure)
- [ ] Tokio async runtime setup
- [ ] Error handling framework (anyhow/thiserror)
- [ ] Logging setup (tracing)
- [ ] Configuration management

**Week 2-3: Wayland Capture**
- [ ] PipeWire xdg-desktop-portal integration
- [ ] Basic screen capture (single monitor)
- [ ] DMA-BUF import/export
- [ ] Frame receiver channel

**Week 3-4: Simple Encoding**
- [ ] x264 software encoder (fallback)
- [ ] Basic frame pipeline (capture → encode)
- [ ] Frame rate limiting

**Week 4-5: WebRTC Transport**
- [ ] webrtc-rs integration
- [ ] Basic peer connection
- [ ] Video track setup
- [ ] Simple signaling (WebSocket)

**Week 5-6: Testing & Integration**
- [ ] End-to-end test (Wayland → WebRTC → Browser)
- [ ] Performance benchmarking
- [ ] Bug fixes

**MVP Deliverables:**
- Working screen capture
- WebRTC streaming to browser
- 15-30 FPS at 720p
- x264 encoding (software)

---

### Phase 2: Hardware Acceleration - 3-4 weeks

**Goal:** GPU-accelerated encoding for better performance

**Week 1-2: VA-API Integration**
- [ ] VA-API encoder implementation
- [ ] DMA-BUF to VA-API surface import
- [ ] H.264 encoding
- [ ] Intel/AMD GPU support

**Week 2-3: NVENC Integration**
- [ ] NVENC encoder for NVIDIA GPUs
- [ ] CUDA memory management
- [ ] NVENC H.264 encoding

**Week 3-4: Encoder Selection**
- [ ] Encoder detection and selection
- [ ] Fallback chain (NVENC → VA-API → x264)
- [ ] Encoder switching at runtime

**Phase 2 Deliverables:**
- GPU-accelerated encoding
- 30-60 FPS at 1080p
- Lower CPU usage
- Adaptive encoder selection

---

### Phase 3: Low Latency Optimization - 4-5 weeks

**Goal:** Achieve 25-35ms latency on local networks

**Week 1: Encoder Low-Latency Configuration (P0)**
- [ ] Configure VA-API/NVENC for <5ms encoding
- [ ] Disable B-frames, set GOP to 15 frames
- [ ] Implement CBR rate control with small VBV buffer
- [ ] Tune encoder preset (ultrafast/superfast)
- [ ] Measure encoder latency independently

**Week 2: Minimal Buffering (P0)**
- [ ] Reduce sender buffer to 1 frame
- [ ] Implement 0-10ms jitter buffer
- [ ] Configure WebRTC playout delay (0-20ms)
- [ ] Disable FEC for latency
- [ ] Test end-to-end latency

**Week 3: Damage Tracking & Partial Updates (P1)**
- [ ] Implement region change detection
- [ ] Add partial region encoding
- [ ] Optimize for static content
- [ ] Benchmark latency improvements

**Week 4: Dynamic Frame Rate & Quality (P1)**
- [ ] Implement adaptive frame rate (30-60fps)
- [ ] Network quality detection
- [ ] Dynamic bitrate vs latency trade-off
- [ ] Fast frame dropping under load

**Week 5: Advanced Optimizations (P1/P2)**
- [ ] Limited NACK window (20ms)
- [ ] Selective packet retransmission
- [ ] RTCP fine-tuning (50ms intervals)
- [ ] Performance profiling
- [ ] Final latency tuning

**Phase 3 Deliverables:**
- 25-35ms latency on LAN
- Zero-copy DMA-BUF pipeline
- Hardware encoder with low-latency config
- Minimal buffering throughout pipeline
- Adaptive quality based on conditions

---

### Phase 4: Production Ready with Ultra Low Latency - 5-7 weeks

**Goal:** Achieve 15-25ms latency while ensuring security, reliability, and deployment readiness

**Week 1-2: Ultra Low Latency Tuning (P1/P2)**
- [ ] Direct Wayland capture evaluation (if needed)
- [ ] Custom WebRTC layer evaluation (if needed)
- [ ] Advanced congestion control (SCReAM/Google CC)
- [ ] Kernel bypass optimization (DPDK/AF_XDP if needed)
- [ ] Final latency optimization and tuning

**Week 2-3: Security**
- [ ] Authentication (JWT, OAuth)
- [ ] Encryption (TLS, DTLS)
- [ ] Session management
- [ ] Access control
- [ ] Security audit and penetration testing

**Week 3-4: Reliability**
- [ ] Error recovery
- [ ] Connection health monitoring
- [ ] Automatic reconnection
- [ ] Graceful degradation with latency awareness
- [ ] Failover mechanisms

**Week 4-5: Monitoring & Debugging**
- [ ] Real-time latency metrics collection
- [ ] Per-stage latency tracking
- [ ] Logging improvements
- [ ] Debug mode with frame inspection
- [ ] Performance dashboard with latency visualization
- [ ] Alerting for latency degradation

**Week 5-6: Deployment**
- [ ] Docker containerization
- [ ] Systemd service
- [ ] Configuration file with low-latency presets
- [ ] Installation scripts
- [ ] Performance tuning documentation

**Week 6-7: Testing**
- [ ] Integration tests
- [ ] Load testing with latency monitoring
- [ ] Cross-browser testing
- [ ] Long-running stability tests
- [ ] Latency regression tests
- [ ] Automated performance benchmarks

**Phase 4 Deliverables:**
- 15-25ms latency on LAN
- Production-ready deployment
- Security features
- Monitoring and observability
- Comprehensive testing
- Latency regression testing

---

### Phase 5: Advanced Features (Optional) - Ongoing

**Potential Features:**
- [ ] Audio capture and streaming
- [ ] Bidirectional input (mouse, keyboard)
- [ ] Clipboard sharing
- [ ] File transfer
- [ ] Recording (save sessions)
- [ ] Multi-user sessions
- [ ] Mobile client support
- [ ] WebRTC data channels for control
- [ ] WebRTC insertable streams (client-side effects)
- [ ] Adaptive resolution
- [ ] H.265/HEVC encoding
- [ ] AV1 encoding
- [ ] Screen region selection
- [ ] Virtual display support
- [ ] Wayland virtual pointer protocol

---

### Testing Strategy

#### Unit Tests
```rust
#[cfg(test)]
mod tests {
    use super::*;
    
    #[tokio::test]
    async fn test_dma_buf_lifecycle() {
        let handle = DmaBufHandle::new(/* ... */);
        assert_eq!(handle.ref_count(), 1);
        
        let handle2 = handle.clone();
        assert_eq!(handle.ref_count(), 2);
        
        drop(handle);
        assert_eq!(handle2.ref_count(), 1);
        
        drop(handle2);
        // Buffer freed
    }
    
    #[tokio::test]
    async fn test_encoder_pipeline() {
        let config = EncoderConfig {
            encoder_type: EncoderType::H264_X264,
            bitrate: 2_000_000,
            keyframe_interval: 30,
            preset: EncodePreset::Fast,
        };
        
        let mut encoder = X264Encoder::new(config).unwrap();
        
        let frame = create_test_frame(1920, 1080);
        let encoded = encoder.encode(frame).await.unwrap();
        
        assert!(!encoded.data.is_empty());
        assert!(encoded.is_keyframe);
    }
}
```

#### Integration Tests
```rust
#[tokio::test]
async fn test_full_pipeline() {
    // Setup
    let capture = WaylandCapture::new(CaptureConfig::default()).await.unwrap();
    let encoder = VaapiEncoder::new(EncoderConfig::default()).unwrap();
    let webrtc = WebRtcServer::new(WebRtcConfig::default()).await.unwrap();
    
    // Run pipeline for 100 frames
    for _ in 0..100 {
        let frame = capture.next_frame().await;
        let encoded = encoder.encode(frame).await.unwrap();
        webrtc.send_video_frame("test-session", encoded).await.unwrap();
    }
    
    // Verify
    assert_eq!(webrtc.frames_sent(), 100);
}
```

#### Load Testing
```bash
# Simulate 10 concurrent connections
for i in {1..10}; do
    cargo test test_full_pipeline --release &
done
wait
```

#### Performance Benchmarks
```rust
#[bench]
fn bench_encode_frame(b: &mut Bencher) {
    let mut encoder = X264Encoder::new(config).unwrap();
    let frame = create_test_frame(1920, 1080);
    
    b.iter(|| {
        encoder.encode(frame.clone()).unwrap()
    });
}
```

---

## Potential Challenges & Solutions

### 1. Wayland Protocol Limitations

**Challenge:** Wayland's security model restricts screen capture

**Solution:**
- Use xdg-desktop-portal for permission management
- Implement user prompts for capture authorization
- Support multiple portal backends (GNOME, KDE, etc.)

```rust
pub async fn request_capture_permission() -> Result<bool> {
    let portal = Portal::new().await?;
    let session = portal.create_session(ScreenCaptureType::Monitor).await?;
    
    // User will see a dialog asking for permission
    let sources = portal.request_sources(&session).await?;
    
    Ok(!sources.is_empty())
}
```

**Alternative:** Use PipeWire directly with proper authentication

### 2. Hardware Acceleration Compatibility

**Challenge:** Different GPUs require different APIs (VA-API, NVENC, etc.)

**Solution:**
- Implement multiple encoder backends
- Runtime detection of available encoders
- Graceful fallback to software encoding

```rust
pub fn detect_best_encoder() -> EncoderType {
    // Try NVENC first (NVIDIA)
    if nvenc::is_available() {
        return EncoderType::H264_NVENC;
    }
    
    // Try VA-API (Intel/AMD)
    if vaapi::is_available() {
        return EncoderType::H264_VAAPI;
    }
    
    // Fallback to software
    EncoderType::H264_X264
}
```

### 3. Cross-Browser WebRTC Compatibility

**Challenge:** Different browsers have different WebRTC implementations

**Solution:**
- Use standardized codecs (H.264, VP8, VP9)
- Implement codec negotiation
- Provide fallback options

```rust
pub fn get_supported_codecs() -> Vec<RTCRtpCodecCapability> {
    vec![
        RTCRtpCodecCapability {
            mime_type: "video/H264".to_string(),
            clock_rate: 90000,
            ..Default::default()
        },
        RTCRtpCodecCapability {
            mime_type: "video/VP9".to_string(),
            clock_rate: 90000,
            ..Default::default()
        },
    ]
}
```

### 4. Security and Authentication

**Challenge:** Secure remote access without exposing desktop to unauthorized users

**Solution:**
- Implement JWT-based authentication
- Use DTLS for media encryption
- Add rate limiting and access control

```rust
pub struct AuthManager {
    secret: String,
    sessions: Arc<Mutex<HashMap<String, Session>>>,
}

impl AuthManager {
    pub fn create_token(&self, user_id: &str) -> Result<String> {
        let claims = Claims {
            sub: user_id.to_string(),
            exp: Utc::now() + chrono::Duration::hours(1),
        };
        
        encode(&Header::default(), &claims, &EncodingKey::from_secret(self.secret.as_ref()))
    }
    
    pub fn validate_token(&self, token: &str) -> Result<Claims> {
        decode::<Claims>(
            token,
            &DecodingKey::from_secret(self.secret.as_ref()),
            &Validation::default(),
        )
        .map(|data| data.claims)
        .map_err(|_| AuthError::InvalidToken)
    }
}
```

### 5. Memory Management

**Challenge:** Avoid memory leaks with DMA-BUF and shared memory

**Solution:**
- Use Rust's ownership system
- RAII patterns for resource cleanup
- Buffer pools with limits

```rust
pub struct ScopedDmaBuf {
    handle: DmaBufHandle,
}

impl Drop for ScopedDmaBuf {
    fn drop(&mut self) {
        // Automatically release DMA-BUF
        // File descriptor closed
        // GPU memory freed
    }
}

// Usage ensures cleanup
{
    let buf = ScopedDmaBuf::new(/* ... */);
    // Use buffer
} // Automatically dropped here
```

### 6. Latency Optimization

**Challenge:** Minimize end-to-end latency

**Solution:**
- Zero-copy pipeline
- Hardware acceleration
- Adaptive quality
- Frame skipping

```rust
pub struct LatencyOptimizer {
    target_latency_ms: u32,
    current_latency_ms: u32,
}

impl LatencyOptimizer {
    pub fn adjust_parameters(&mut self) {
        if self.current_latency_ms > self.target_latency_ms {
            // Reduce quality to improve latency
            self.reduce_bitrate();
            self.increase_frame_skipping();
        } else {
            // Increase quality
            self.increase_bitrate();
        }
    }
}
```

### 7. Network Conditions

**Challenge:** Varying bandwidth and network conditions

**Solution:**
- Adaptive bitrate streaming
- Multiple quality presets
- Congestion control

```rust
pub struct BandwidthMonitor {
    measurements: VecDeque<u32>,
    window_size: usize,
}

impl BandwidthMonitor {
    pub fn update(&mut self, bytes_sent: u32, duration: Duration) {
        let bandwidth = bytes_sent * 8 / duration.as_secs() as u32;
        self.measurements.push_back(bandwidth);
        
        if self.measurements.len() > self.window_size {
            self.measurements.pop_front();
        }
    }
    
    pub fn average_bandwidth(&self) -> u32 {
        if self.measurements.is_empty() {
            return 0;
        }
        
        self.measurements.iter().sum::<u32>() / self.measurements.len() as u32
    }
}
```

### 8. Cross-Platform Compatibility

**Challenge:** Support different Linux distributions and desktop environments

**Solution:**
- Containerize application
- Detect available technologies at runtime
- Provide fallback options

```rust
pub fn detect_desktop_environment() -> DesktopEnvironment {
    if std::path::Path::new("/usr/bin/gnome-shell").exists() {
        DesktopEnvironment::GNOME
    } else if std::path::Path::new("/usr/bin/plasmashell").exists() {
        DesktopEnvironment::KDE
    } else {
        DesktopEnvironment::Other
    }
}

pub fn configure_portal_for_env(env: DesktopEnvironment) -> PortalConfig {
    match env {
        DesktopEnvironment::GNOME => PortalConfig::gnome(),
        DesktopEnvironment::KDE => PortalConfig::kde(),
        DesktopEnvironment::Other => PortalConfig::generic(),
    }
}
```

### 9. Debugging and Troubleshooting

**Challenge:** Debugging complex pipeline with multiple components

**Solution:**
- Comprehensive logging
- Metrics collection
- Debug mode with frame inspection

```rust
pub struct DebugLogger {
    enabled: bool,
    output: DebugOutput,
}

pub enum DebugOutput {
    Console,
    File(PathBuf),
    Both,
}

impl DebugLogger {
    pub fn log_frame(&self, frame: &CapturedFrame) {
        if !self.enabled {
            return;
        }
        
        tracing::debug!(
            "Frame: {}x{}, format: {:?}, timestamp: {}",
            frame.width,
            frame.height,
            frame.format,
            frame.timestamp
        );
    }
    
    pub fn log_encoding(&self, encoded: &EncodedFrame) {
        if !self.enabled {
            return;
        }
        
        tracing::debug!(
            "Encoded: {} bytes, keyframe: {}, seq: {}",
            encoded.data.len(),
            encoded.is_keyframe,
            encoded.sequence_number
        );
    }
}
```

### 10. Resource Limits

**Challenge:** Prevent resource exhaustion (CPU, memory, GPU)

**Solution:**
- Limit concurrent sessions
- Monitor resource usage
- Implement graceful degradation

```rust
pub struct ResourceManager {
    max_sessions: usize,
    active_sessions: Arc<Mutex<HashSet<String>>>,
    cpu_threshold: f32,
    memory_threshold: u64,
}

impl ResourceManager {
    pub async fn can_create_session(&self) -> bool {
        let sessions = self.active_sessions.lock().await;
        
        if sessions.len() >= self.max_sessions {
            return false;
        }
        
        if self.cpu_usage() > self.cpu_threshold {
            return false;
        }
        
        if self.memory_usage() > self.memory_threshold {
            return false;
        }
        
        true
    }
}
```

---

## Code Examples

### Main Application Entry Point

```rust
// src/main.rs
mod capture;
mod encoder;
mod webrtc;
mod buffer;
mod ipc;
mod config;

use anyhow::Result;
use tracing::{info, error};
use tracing_subscriber;

#[tokio::main]
async fn main() -> Result<()> {
    // Initialize logging
    tracing_subscriber::fmt()
        .with_max_level(tracing::Level::INFO)
        .init();
    
    // Load configuration
    let config = config::load_config("config.toml")?;
    
    info!("Starting Wayland WebRTC Backend");
    info!("Configuration: {:?}", config);
    
    // Initialize components
    let capture = capture::WaylandCapture::new(config.capture).await?;
    let encoder = encoder::create_encoder(config.encoder)?;
    let webrtc = webrtc::WebRtcServer::new(config.webrtc).await?;
    
    // Create video track
    let video_track = webrtc::create_video_track()?;
    
    // Run capture pipeline
    let session_id = uuid::Uuid::new_v4().to_string();
    webrtc.create_peer_connection(session_id.clone(), video_track).await?;
    
    // Main loop
    loop {
        match capture.next_frame().await {
            Ok(frame) => {
                match encoder.encode(frame).await {
                    Ok(encoded) => {
                        if let Err(e) = webrtc.send_video_frame(&session_id, encoded).await {
                            error!("Failed to send frame: {}", e);
                        }
                    }
                    Err(e) => {
                        error!("Failed to encode frame: {}", e);
                    }
                }
            }
            Err(e) => {
                error!("Failed to capture frame: {}", e);
            }
        }
    }
}
```

### Configuration Example

```toml
# config.toml - Low Latency Configuration

[server]
address = "0.0.0.0:8080"
max_sessions = 10
log_level = "info"

[capture]
frame_rate = 60
quality = "high"
screen_region = null
# screen_region = { x = 0, y = 0, width = 1920, height = 1080 }

# Low-latency capture settings
zero_copy = true  # Always use DMA-BUF zero-copy
track_damage = true  # Enable damage tracking
partial_updates = true  # Encode only damaged regions
buffer_pool_size = 3  # Small buffer pool for low latency

[encoder]
type = "auto"
# Options: "vaapi", "nvenc", "x264", "auto"
bitrate = 4000000
keyframe_interval = 15  # Short GOP for low latency
preset = "ultrafast"  # Ultrafast for minimal latency

# Low-latency specific settings
[encoder.low_latency]
gop_size = 15  # Short GOP
b_frames = 0  # No B-frames for low latency
lookahead = 0  # Minimal lookahead
rc_mode = "CBR"  # Constant bitrate for predictable latency
vbv_buffer_size = 4000000  # 1 second VBV buffer
max_bitrate = 4000000  # Tight max constraint
intra_period = 15  # Keyframe interval

# Quality vs latency presets
[encoder.quality_presets]
low = { bitrate = 1000000, preset = "ultrafast", gop_size = 30 }
medium = { bitrate = 2000000, preset = "ultrafast", gop_size = 20 }
high = { bitrate = 4000000, preset = "ultrafast", gop_size = 15 }
ultra = { bitrate = 8000000, preset = "veryfast", gop_size = 15 }

[webrtc]
stun_servers = ["stun:stun.l.google.com:19302"]
turn_servers = []

[webrtc.low_latency]
# Critical: Minimal playout delay
playout_delay_min_ms = 0  # No minimum delay
playout_delay_max_ms = 20  # 20ms maximum delay

# Bitrate settings
max_bitrate = 8000000  # 8 Mbps max
min_bitrate = 500000  # 500 Kbps min
start_bitrate = 4000000  # 4 Mbps start

# Packetization
rtp_payload_size = 1200  # Smaller packets for lower latency
packetization_mode = "non_interleaved"

# Retransmission settings
nack_enabled = true  # Enable NACK
nack_window_size_ms = 20  # Only request retransmission for recent packets
max_nack_packets_per_second = 50
fec_enabled = false  # Disable FEC for low latency

# Congestion control
transport_cc_enabled = true  # Enable transport-wide congestion control

# RTCP settings
rtcp_report_interval_ms = 50  # Frequent RTCP feedback

[webrtc.ice]
transport_policy = "all"
candidate_network_types = ["udp", "tcp"]

[buffer]
# Minimal buffering for low latency
dma_buf_pool_size = 3  # Small pool
encoded_buffer_pool_size = 5  # Very small pool
sender_buffer_max_frames = 1  # Single frame sender buffer
jitter_buffer_target_delay_ms = 5  # 5ms target jitter buffer
jitter_buffer_max_delay_ms = 10  # 10ms maximum jitter buffer

[latency]
# Latency targets
target_latency_ms = 25  # Target end-to-end latency
max_acceptable_latency_ms = 50  # Maximum acceptable latency

# Adaptive settings
adaptive_frame_rate = true
adaptive_bitrate = true
fast_frame_drop = true

# Frame dropping strategy
consecutive_drop_limit = 3  # Max consecutive drops
drop_threshold_ms = 5  # Drop if queue latency exceeds this

[monitoring]
enable_metrics = true
enable_latency_tracking = true
metrics_port = 9090
```

---

## Performance Targets

### Latency Targets

#### Local Network (LAN)
| Scenario | Target | Acceptable |
|----------|--------|------------|
| Core Target | 25-35ms | 15-25ms (Excellent) |
| Minimum | <50ms | 15-25ms (Excellent) |
| User Experience | <16ms: Imperceptible | 16-33ms: Very Smooth |
| User Experience | 33-50ms: Good | 50-100ms: Acceptable |

#### Internet
| Scenario | Target | Acceptable |
|----------|--------|------------|
| Excellent | 40-60ms | <80ms |
| Good | 60-80ms | <100ms |

### Performance Metrics by Phase

| Metric | MVP | Phase 2 | Phase 3 | Phase 4 |
|--------|-----|---------|---------|---------|
| FPS (LAN) | 30 | 60 | 60 | 60 |
| FPS (Internet) | 15-20 | 30 | 30-60 | 60 |
| Resolution | 720p | 1080p | 1080p/4K | 1080p/4K |
| Latency (LAN) | <100ms | <50ms | 25-35ms | 15-25ms |
| Latency (Internet) | <200ms | <100ms | 60-80ms | 40-60ms |
| CPU Usage | 20-30% | 10-15% | 5-10% | <5% |
| Memory Usage | 150MB | 250MB | 400MB | <400MB |
| Bitrate | 2-4 Mbps | 4-8 Mbps | Adaptive | Adaptive |
| Concurrent Sessions | 1 | 3-5 | 5-10 | 10+ |

### Latency Budget Allocation (15-25ms Target)

| Component | Time (ms) | Percentage | Optimization Strategy |
|-----------|-----------|------------|---------------------|
| Wayland Capture | 2-3 | 12-15% | DMA-BUF zero-copy, partial update |
| Encoder | 3-5 | 20-25% | Hardware encoder, no B-frames |
| Packetization | 1-2 | 6-10% | Inline RTP, minimal buffering |
| Network (LAN) | 0.5-1 | 3-5% | UDP direct path, kernel bypass |
| Jitter Buffer | 0-2 | 0-10% | Minimal buffer, predictive jitter |
| Decoder | 1-2 | 6-10% | Hardware acceleration |
| Display | 1-2 | 6-10% | vsync bypass, direct scanout |
| **Total** | **15-25** | **100%** |  |

---

## Conclusion

This design provides a comprehensive blueprint for building an ultra-low-latency Wayland → WebRTC remote desktop backend in Rust. Key highlights:

1. **Zero-Copy Architecture**: Minimizes CPU copies through DMA-BUF and reference-counted buffers, achieving <5ms copy overhead
2. **Hardware Acceleration**: VA-API/NVENC encoders configured for <5ms encoding latency
3. **Minimal Buffering**: Single-frame sender buffer and 0-10ms jitter buffer throughout pipeline
4. **Low-Latency WebRTC**: Custom configuration with 0-20ms playout delay, no FEC, limited NACK
5. **Performance**: Targets 15-25ms latency on local networks at 60 FPS
6. **Adaptive Quality**: Dynamic frame rate (30-60fps) and bitrate adjustment based on network conditions
7. **Damage Tracking**: Partial region updates for static content to reduce encoding load

**Latency Budget Breakdown (15-25ms target):**
- Capture: 2-3ms (DMA-BUF zero-copy)
- Encoder: 3-5ms (hardware, no B-frames)
- Packetization: 1-2ms (inline RTP)
- Network: 0.5-1ms (LAN)
- Jitter Buffer: 0-2ms (minimal)
- Decoder: 1-2ms (hardware)
- Display: 1-2ms (vsync bypass)

The phased implementation approach allows for incremental development and testing:
- **Phase 1 (4-6 weeks)**: MVP with <100ms latency
- **Phase 2 (3-4 weeks)**: Hardware acceleration, <50ms latency
- **Phase 3 (4-5 weeks)**: Low-latency optimizations, 25-35ms latency
- **Phase 4 (5-7 weeks)**: Ultra-low latency tuning, 15-25ms latency

Critical P0 optimizations for achieving 15-25ms latency:
1. Hardware encoder with zero B-frames, 15-frame GOP
2. DMA-BUF zero-copy capture pipeline
3. Minimal buffering (1 frame sender, 0-10ms jitter)
4. WebRTC low-latency configuration (0-20ms playout delay)

---

## Additional Resources

### Wayland & PipeWire
- [Wayland Protocol](https://wayland.freedesktop.org/docs/html/)
- [PipeWire Documentation](https://docs.pipewire.org/)
- [xdg-desktop-portal](https://flatpak.github.io/xdg-desktop-portal/)

### WebRTC
- [WebRTC Specifications](https://www.w3.org/TR/webrtc/)
- [webrtc-rs](https://github.com/webrtc-rs/webrtc)
- [WebRTC for the Curious](https://webrtcforthecurious.com/)

### Video Encoding
- [VA-API](https://github.com/intel/libva)
- [NVENC](https://developer.nvidia.com/nvidia-video-codec-sdk)
- [x264](https://www.videolan.org/developers/x264.html)

### Rust
- [Tokio](https://tokio.rs/)
- [Bytes](https://docs.rs/bytes/)
- [Async Rust Book](https://rust-lang.github.io/async-book/)