wl-webrtc/DESIGN.md — 2026-02-03 11:14:25 +08:00

# Wayland → WebRTC Remote Desktop Backend
## Technical Design Document
## Table of Contents
1. [System Architecture](#system-architecture)
2. [Technology Stack](#technology-stack)
3. [Key Components Design](#key-components-design)
4. [Data Flow Optimization](#data-flow-optimization)
5. [Low Latency Optimization](#low-latency-optimization)
6. [Implementation Roadmap](#implementation-roadmap)
7. [Potential Challenges & Solutions](#potential-challenges--solutions)
---
## System Architecture
### High-Level Architecture
```
┌─────────────────────────────────────────────────────────────────────┐
│ Client Browser │
│ (WebRTC Receiver) │
└─────────────────────────────┬───────────────────────────────────────┘
│ WebRTC (UDP/TCP)
│ Signaling (WebSocket/HTTP)
┌─────────────────────────────────────────────────────────────────────┐
│ Signaling Server │
│ (WebSocket/WebSocket Secure) │
│ - Session Management │
│ - SDP Exchange │
│ - ICE Candidates │
└─────────────────────────────┬───────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ Rust Backend Server │
├─────────────────────────────────────────────────────────────────────┤
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Capture │ │ Encoder │ │ WebRTC │ │
│ │ Manager │───▶│ Pipeline │───▶│ Transport │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ PipeWire │ │ Video │ │ Data │ │
│ │ Portal │ │ Encoder │ │ Channels │ │
│ │ (xdg- │ │ (H.264/ │ │ (Input/ │ │
│ │ desktop- │ │ H.265/VP9) │ │ Control) │ │
│ │ portal) │ └──────────────┘ └──────────────┘ │
│ └──────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Zero-Copy Buffer Manager │ │
│ │ - DMA-BUF Import/Export │ │
│ │ - Shared Memory Pools │ │
│ │ - Memory Ownership Tracking │ │
│ └─────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ Wayland Compositor │
│ (PipeWire Screen Sharing) │
└─────────────────────────────────────────────────────────────────────┘
```
### Component Breakdown
#### 1. Capture Manager
**Responsibilities:**
- Interface with PipeWire xdg-desktop-portal
- Request screen capture permissions
- Receive DMA-BUF frames
- Manage frame buffer lifecycle
**Key Technologies:**
- `pipewire` crate for PipeWire protocol
- `wayland-client` for Wayland protocol
- `ashpd` for desktop portals
```rust
pub struct CaptureManager {
    pipewire_connection: Rc<PipewireConnection>,
    stream_handle: Option<StreamHandle>,
    frame_sender: async_channel::Sender<CapturedFrame>,
    config: CaptureConfig,
}

pub struct CaptureConfig {
    pub frame_rate: u32,
    pub quality: QualityLevel,
    pub screen_region: Option<ScreenRegion>,
}

pub enum QualityLevel {
    Low,
    Medium,
    High,
    Ultra,
}

pub struct CapturedFrame {
    pub dma_buf: DmaBufHandle,
    pub width: u32,
    pub height: u32,
    pub format: PixelFormat,
    pub timestamp: u64,
}
```
#### 2. Encoder Pipeline
**Responsibilities:**
- Receive raw frames from capture
- Encode to H.264/H.265/VP9
- Hardware acceleration (VA-API, NVENC, VideoToolbox)
- Bitrate adaptation
**Zero-Copy Strategy:**
- Direct DMA-BUF to encoder (no CPU copies)
- Encoder outputs to memory-mapped buffers
- WebRTC consumes encoded buffers directly
```rust
pub struct EncoderPipeline {
    encoder: Box<dyn VideoEncoder>,
    config: EncoderConfig,
    stats: EncoderStats,
}

pub trait VideoEncoder: Send + Sync {
    fn encode_frame(
        &mut self,
        frame: CapturedFrame,
    ) -> Result<EncodedFrame, EncoderError>;
    fn set_bitrate(&mut self, bitrate: u32) -> Result<(), EncoderError>;
    fn request_keyframe(&mut self) -> Result<(), EncoderError>;
}

pub struct EncodedFrame {
    pub data: Bytes, // Zero-copy Bytes wrapper
    pub is_keyframe: bool,
    pub timestamp: u64,
    pub sequence_number: u64,
}
```
#### 3. WebRTC Transport
**Responsibilities:**
- WebRTC peer connection management
- Media track (video) and data channels
- RTP packetization
- ICE/STUN/TURN handling
- Congestion control
**Libraries:**
- `webrtc` crate (webrtc-rs) or custom WebRTC implementation
```rust
pub struct WebRtcTransport {
    peer_connection: RTCPeerConnection,
    video_track: RTCVideoTrack,
    data_channel: Option<RTCDataChannel>,
    config: WebRtcConfig,
}

pub struct WebRtcConfig {
    pub stun_servers: Vec<String>,
    pub turn_servers: Vec<TurnServer>,
    pub ice_transport_policy: IceTransportPolicy,
}

pub struct TurnServer {
    pub urls: Vec<String>,
    pub username: String,
    pub credential: String,
}
```
#### 4. Zero-Copy Buffer Manager
**Responsibilities:**
- Manage DMA-BUF lifecycle
- Pool pre-allocated memory
- Track ownership via Rust types
- Coordinate with PipeWire memory pools
```rust
pub struct BufferManager {
    dma_buf_pool: Pool<DmaBufHandle>,
    encoded_buffer_pool: Pool<Bytes>,
    max_buffers: usize,
}

impl BufferManager {
    pub fn acquire_dma_buf(&self) -> Option<DmaBufHandle> {
        self.dma_buf_pool.acquire()
    }

    pub fn release_dma_buf(&self, handle: DmaBufHandle) {
        self.dma_buf_pool.release(handle)
    }

    pub fn acquire_encoded_buffer(&self, size: usize) -> Option<Bytes> {
        self.encoded_buffer_pool.acquire_with_size(size)
    }
}
```
### Data Flow
```
Wayland Compositor
│ DMA-BUF (GPU memory)
PipeWire Portal
│ DMA-BUF file descriptor
Capture Manager
│ CapturedFrame { dma_buf, ... }
│ (Zero-copy ownership transfer)
Buffer Manager
│ DmaBufHandle (moved, not copied)
Encoder Pipeline
│ EncodedFrame { data: Bytes, ... }
│ (Zero-copy Bytes wrapper)
WebRTC Transport
│ RTP Packets (reference to Bytes)
Network (UDP/TCP)
Client Browser
```
---
## Technology Stack
### Core Dependencies
```toml
[dependencies]
# Async Runtime
tokio = { version = "1.35", features = ["full", "rt-multi-thread"] }
async-trait = "0.1"
# Wayland & PipeWire
wayland-client = "0.31"
wayland-protocols = "0.31"
pipewire = "0.8"
ashpd = "0.8"
# Video Encoding (Low Latency)
openh264 = { version = "0.6", optional = true }
x264 = { version = "0.4", optional = true }
nvenc = { version = "0.1", optional = true }
vpx = { version = "0.1", optional = true }
# Hardware Acceleration (Low Latency)
libva = { version = "0.14", optional = true } # VA-API
nvidia-encode = { version = "0.5", optional = true } # NVENC
# WebRTC (Low Latency Configuration)
webrtc = "0.11" # webrtc-rs
# Memory & Zero-Copy
bytes = "1.5"
memmap2 = "0.9"
shared_memory = "0.12"
# Lock-free data structures for minimal contention
crossbeam = { version = "0.8", features = ["std"] }
crossbeam-channel = "0.5"
crossbeam-queue = "0.3"
parking_lot = "0.12" # Faster mutexes
# Serialization
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
# Logging & Tracing
tracing = "0.1"
tracing-subscriber = "0.3"
tracing-opentelemetry = "0.22" # For latency monitoring
# Metrics & Monitoring
prometheus = { version = "0.13", optional = true }
metrics = "0.21"
# Error Handling
anyhow = "1.0"
thiserror = "1.0"
# Utilities
regex = "1.10"
uuid = { version = "1.6", features = ["v4", "serde", "fast-rng"] }
instant = "0.1" # Cross-platform Instant shim (also works on wasm)
[features]
default = ["software-encoder", "webrtc-rs"]
# Encoder Options
software-encoder = ["x264", "openh264"]
hardware-vaapi = ["libva"]
hardware-nvenc = ["nvidia-encode"]
all-encoders = ["software-encoder", "hardware-vaapi", "hardware-nvenc"]
# WebRTC Implementation
webrtc-rs = ["webrtc"]
custom-webrtc = []
# Low Latency Features
low-latency = []
ultra-low-latency = ["low-latency", "all-encoders"]
# Monitoring
monitoring = ["prometheus", "tracing-opentelemetry"]
# Development
dev = ["monitoring", "all-encoders"]
```
### Encoder Options
| Encoder | Hardware | Performance | Quality | License | Use Case |
|---------|----------|-------------|---------|---------|----------|
| H.264 (x264) | CPU | Medium | High | GPL | Fallback |
| H.264 (VA-API) | GPU | High | Medium | Open Source | Linux Intel/AMD |
| H.264 (NVENC) | GPU (NVIDIA) | Very High | High | Proprietary | NVIDIA GPUs |
| H.265 (HEVC) | GPU | High | Very High | Mixed | Bandwidth-constrained |
| VP9 | CPU/GPU | Medium | High | BSD | Open Web |
| AV1 | GPU | Medium | Very High | Open Source | Future-proof |
**Recommended Primary:** VA-API H.264 (Linux), NVENC H.264 (NVIDIA)
**Recommended Fallback:** x264 H.264 (software)
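
The recommendation above implies a probe-and-fall-back order at startup. A minimal sketch of that selection logic; the `probe_*` functions are hypothetical stand-ins for real capability checks (opening a VA display, loading the NVENC runtime):

```rust
// Hypothetical encoder selection: try hardware encoders in preference
// order, fall back to software x264 if neither is usable.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SelectedEncoder {
    H264Nvenc, // NVIDIA hardware path
    H264Vaapi, // Intel/AMD hardware path
    H264X264,  // software fallback, always available
}

fn probe_nvenc() -> bool {
    false // stub: would load libnvidia-encode and query session support
}

fn probe_vaapi() -> bool {
    false // stub: would open /dev/dri/renderD* and check encode entrypoints
}

pub fn select_encoder() -> SelectedEncoder {
    if probe_nvenc() {
        SelectedEncoder::H264Nvenc
    } else if probe_vaapi() {
        SelectedEncoder::H264Vaapi
    } else {
        SelectedEncoder::H264X264
    }
}

fn main() {
    // With both probes stubbed to false, the software fallback is chosen
    println!("{:?}", select_encoder());
}
```

The same function is the natural place to honor the `hardware-vaapi` / `hardware-nvenc` cargo features: compile a probe out and it simply returns `false`.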
### WebRTC Libraries
**Option 1: webrtc-rs** (Recommended)
- Pure Rust implementation
- Active development
- Good WebRTC spec compliance
- Zero-copy support for media
**Option 2: Custom Implementation**
- Use `webrtc` crate as base
- Add specialized zero-copy optimizations
- Tighter integration with encoder pipeline
---
## Key Components Design
### 1. Wayland Screen Capture Module
```rust
// src/capture/mod.rs
use pipewire as pw;
use pipewire::properties;
use pipewire::spa::param::format::Format;
use pipewire::stream::StreamFlags;
use async_channel::{Sender, Receiver};

pub struct WaylandCapture {
    core: pw::Core,
    context: pw::Context,
    main_loop: pw::MainLoop,
    stream: pw::stream::Stream,
    frame_sender: Sender<CapturedFrame>,
    frame_receiver: Receiver<CapturedFrame>,
}
impl WaylandCapture {
    pub async fn new(config: CaptureConfig) -> Result<Self, CaptureError> {
        let main_loop = pw::MainLoop::new()?;
        let context = pw::Context::new(&main_loop)?;
        let core = context.connect(None)?;

        // Request screen capture via xdg-desktop-portal
        let portal = Portal::new().await?;
        let session = portal.create_session(ScreenCaptureType::Monitor).await?;
        let _sources = portal.request_sources(&session).await?;

        let (sender, receiver) = async_channel::bounded(30);

        Ok(Self {
            core,
            context,
            main_loop,
            stream: Self::create_stream(&context, &session, sender.clone())?,
            frame_sender: sender,
            frame_receiver: receiver,
        })
    }
    fn create_stream(
        context: &pw::Context,
        session: &Session,
        sender: Sender<CapturedFrame>,
    ) -> Result<pw::stream::Stream, pw::Error> {
        let mut stream = pw::stream::Stream::new(
            context,
            "wl-webrtc-capture",
            properties! {
                *pw::keys::MEDIA_TYPE => "Video",
                *pw::keys::MEDIA_CATEGORY => "Capture",
                *pw::keys::MEDIA_ROLE => "Screen",
            },
        )?;

        // Register callbacks before connecting so no early frames are
        // missed. pipewire-rs builds listeners with a chained builder
        // (param_changed/process) rather than an `Events` enum; shown
        // schematically here.
        let _listener = stream
            .add_local_listener()
            .param_changed(|_stream, _id, _param| {
                // Handle stream parameter changes (format negotiation)
            })
            .process(move |stream, _| {
                // Process new frame - the DMA-BUF is already mapped
                Self::process_frame(stream, sender.clone());
            })
            .register()?;

        stream.connect(
            pw::spa::direction::Direction::Input,
            None,
            StreamFlags::AUTOCONNECT | StreamFlags::MAP_BUFFERS,
        )?;

        Ok(stream)
    }
    fn process_frame(
        stream: &pw::stream::Stream,
        sender: Sender<CapturedFrame>,
    ) {
        // Dequeue the buffer without copying - the DMA-BUF stays in GPU memory
        let buffer = match stream.dequeue_buffer() {
            Some(buffer) => buffer,
            None => return, // out of buffers this cycle; skip the frame
        };

        // Negotiated video info (delivered via the param_changed callback;
        // fetched once here rather than unwrapped per field)
        let format = stream.format().expect("stream format not negotiated");

        // Create zero-copy frame
        let frame = CapturedFrame {
            dma_buf: DmaBufHandle::from_buffer(&buffer),
            width: format.size().width,
            height: format.size().height,
            format: PixelFormat::from_spa_format(&format),
            timestamp: timestamp_ns(),
        };

        // Send frame (ownership transferred via move); if the channel is
        // full, the frame is dropped rather than blocking the capture loop
        let _ = sender.try_send(frame);
    }

    pub async fn next_frame(&self) -> CapturedFrame {
        self.frame_receiver
            .recv()
            .await
            .expect("capture channel closed")
    }
}
// Zero-copy DMA-BUF handle
pub struct DmaBufHandle {
    fd: RawFd,
    size: usize,
    stride: u32,
    offset: u32,
    mapping: Cell<*mut u8>, // lazy CPU mapping; null until as_ptr() is called
}

impl DmaBufHandle {
    pub fn from_buffer(buffer: &pw::buffer::Buffer) -> Self {
        let data = &buffer.datas()[0];
        Self {
            fd: data.fd().unwrap(),
            size: data.chunk().size() as usize,
            stride: data.chunk().stride(),
            offset: data.chunk().offset(),
            mapping: Cell::new(ptr::null_mut()),
        }
    }

    /// Map the DMA-BUF into CPU address space (software-encoder path only;
    /// the hardware path imports the FD directly and never maps it).
    pub unsafe fn as_ptr(&self) -> *mut u8 {
        if self.mapping.get().is_null() {
            let ptr = libc::mmap(
                ptr::null_mut(),
                self.size,
                libc::PROT_READ,
                libc::MAP_SHARED,
                self.fd,
                self.offset as libc::off_t,
            );
            if ptr == libc::MAP_FAILED {
                panic!("Failed to mmap DMA-BUF");
            }
            self.mapping.set(ptr as *mut u8);
        }
        self.mapping.get()
    }
}

impl Drop for DmaBufHandle {
    fn drop(&mut self) {
        // Unmap (only if a CPU mapping was actually created) and close the FD
        unsafe {
            let ptr = self.mapping.get();
            if !ptr.is_null() {
                libc::munmap(ptr as *mut libc::c_void, self.size);
            }
            libc::close(self.fd);
        }
    }
}
### 2. Frame Buffer Management (Zero-Copy)
```rust
// src/buffer/mod.rs
use bytes::{Bytes, BytesMut};
use std::collections::VecDeque;
use std::marker::PhantomData;

pub struct FrameBufferPool {
    dma_bufs: VecDeque<DmaBufHandle>,
    encoded_buffers: VecDeque<BytesMut>,
    max_dma_bufs: usize,
    max_encoded: usize,
}

impl FrameBufferPool {
    pub fn new(max_dma_bufs: usize, max_encoded: usize) -> Self {
        Self {
            dma_bufs: VecDeque::with_capacity(max_dma_bufs),
            encoded_buffers: VecDeque::with_capacity(max_encoded),
            max_dma_bufs,
            max_encoded,
        }
    }

    pub fn acquire_dma_buf(&mut self) -> Option<DmaBufHandle> {
        self.dma_bufs.pop_front()
    }

    pub fn release_dma_buf(&mut self, buf: DmaBufHandle) {
        if self.dma_bufs.len() < self.max_dma_bufs {
            self.dma_bufs.push_back(buf);
        }
        // Else: drop the buffer, let the OS reclaim the DMA-BUF
    }

    pub fn acquire_encoded_buffer(&mut self, size: usize) -> BytesMut {
        // Try to reuse an existing buffer. The pool holds BytesMut, since a
        // frozen `Bytes` is immutable and cannot be written into again.
        if let Some(mut buf) = self.encoded_buffers.pop_front() {
            if buf.capacity() >= size {
                buf.clear();
                buf.resize(size, 0);
                return buf;
            }
            // Too small for this frame: keep it pooled for smaller frames
            self.encoded_buffers.push_back(buf);
        }
        // Allocate a new buffer if needed
        BytesMut::zeroed(size)
    }

    pub fn release_encoded_buffer(&mut self, buf: Bytes) {
        // Reclaim only if we hold the last reference; otherwise the WebRTC
        // layer still shares the allocation and it cannot be reused
        if self.encoded_buffers.len() < self.max_encoded {
            if let Ok(buf) = buf.try_into_mut() {
                self.encoded_buffers.push_back(buf);
            }
        }
        // Else: drop the buffer, memory freed
    }
}

// Zero-copy frame wrapper
pub struct ZeroCopyFrame {
    pub data: Bytes, // Reference-counted, no copying
    pub metadata: FrameMetadata,
}

pub struct FrameMetadata {
    pub width: u32,
    pub height: u32,
    pub format: PixelFormat,
    pub timestamp: u64,
    pub is_keyframe: bool,
}

// Smart pointer for DMA-BUF
pub struct DmaBufPtr {
    ptr: *mut u8,
    len: usize,
    _marker: PhantomData<&'static mut [u8]>,
}

impl DmaBufPtr {
    pub unsafe fn new(ptr: *mut u8, len: usize) -> Self {
        Self {
            ptr,
            len,
            _marker: PhantomData,
        }
    }

    pub fn as_slice(&self) -> &[u8] {
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

unsafe impl Send for DmaBufPtr {}
unsafe impl Sync for DmaBufPtr {}

impl Drop for DmaBufPtr {
    fn drop(&mut self) {
        // Memory will be unmapped by DmaBufHandle's Drop
    }
}
```
### 3. Video Encoder Integration
```rust
// src/encoder/mod.rs
use async_trait::async_trait;

pub enum EncoderType {
    H264Vaapi,
    H264Nvenc,
    H264X264,
    Vp9Vaapi,
}

pub struct EncoderConfig {
    pub encoder_type: EncoderType,
    pub bitrate: u32,
    pub keyframe_interval: u32,
    pub preset: EncodePreset,
}

pub enum EncodePreset {
    Ultrafast,
    Superfast,
    Veryfast,
    Faster,
    Fast,
    Medium,
    Slow,
    Slower,
    Veryslow,
}
#[async_trait]
pub trait VideoEncoder: Send + Sync {
    async fn encode(&mut self, frame: CapturedFrame) -> Result<EncodedFrame, EncoderError>;
    async fn reconfigure(&mut self, config: EncoderConfig) -> Result<(), EncoderError>;
    async fn request_keyframe(&mut self) -> Result<(), EncoderError>;
}

pub struct VaapiEncoder {
    display: va::Display,
    context: va::Context,
    config: EncoderConfig,
    sequence_number: u64,
}

impl VaapiEncoder {
    pub fn new(config: EncoderConfig) -> Result<Self, EncoderError> {
        let display = va::Display::open(None)?;
        let context = va::Context::new(&display)?;
        Ok(Self {
            display,
            context,
            config,
            sequence_number: 0,
        })
    }
}
#[async_trait]
impl VideoEncoder for VaapiEncoder {
    async fn encode(&mut self, frame: CapturedFrame) -> Result<EncodedFrame, EncoderError> {
        // Zero-copy: import the DMA-BUF directly into a VA-API surface
        let surface = unsafe {
            self.context.import_dma_buf(
                frame.dma_buf.fd,
                frame.width,
                frame.height,
                frame.format.as_va_format(),
            )?
        };

        // Encode frame (hardware accelerated)
        let encoded_data = self.context.encode_surface(&surface)?;

        // Create zero-copy Bytes wrapper
        let bytes = Bytes::from(encoded_data);

        self.sequence_number += 1;
        Ok(EncodedFrame {
            data: bytes,
            is_keyframe: surface.is_keyframe(),
            timestamp: frame.timestamp,
            sequence_number: self.sequence_number,
        })
    }

    async fn reconfigure(&mut self, config: EncoderConfig) -> Result<(), EncoderError> {
        // Store the config first, then apply from `self`, so `config` is
        // not used after it has been moved
        self.config = config;
        self.context.set_bitrate(self.config.bitrate)?;
        self.context.set_preset(&self.config.preset)?;
        Ok(())
    }

    async fn request_keyframe(&mut self) -> Result<(), EncoderError> {
        self.context.force_keyframe()?;
        Ok(())
    }
}
// Fallback software encoder
pub struct X264Encoder {
    encoder: x264::Encoder,
    config: EncoderConfig,
    sequence_number: u64,
}

impl X264Encoder {
    pub fn new(config: EncoderConfig) -> Result<Self, EncoderError> {
        let mut params = x264::Params::default();
        // Resolution/FPS hardcoded here for brevity; in practice taken
        // from the negotiated capture format
        params.set_width(1920);
        params.set_height(1080);
        params.set_fps(60, 1);
        params.set_bitrate(config.bitrate);
        params.set_preset(&config.preset);
        params.set_tune("zerolatency");
        let encoder = x264::Encoder::open(&params)?;
        Ok(Self {
            encoder,
            config,
            sequence_number: 0,
        })
    }
}
#[async_trait]
impl VideoEncoder for X264Encoder {
    async fn encode(&mut self, frame: CapturedFrame) -> Result<EncodedFrame, EncoderError> {
        // Map the DMA-BUF into CPU memory (the one unavoidable copy on
        // the software path)
        let ptr = unsafe { frame.dma_buf.as_ptr() };
        let slice = unsafe { std::slice::from_raw_parts(ptr, frame.dma_buf.size) };

        // Convert to YUV if needed
        let yuv_frame = self.convert_to_yuv(slice, frame.width, frame.height)?;

        // Encode frame
        let encoded_data = self.encoder.encode(&yuv_frame)?;

        self.sequence_number += 1;
        Ok(EncodedFrame {
            data: Bytes::from(encoded_data),
            is_keyframe: self.encoder.is_keyframe(),
            timestamp: frame.timestamp,
            sequence_number: self.sequence_number,
        })
    }

    async fn reconfigure(&mut self, config: EncoderConfig) -> Result<(), EncoderError> {
        self.config = config;
        // Reopen the encoder with the new params
        Ok(())
    }

    async fn request_keyframe(&mut self) -> Result<(), EncoderError> {
        self.encoder.force_keyframe();
        Ok(())
    }
}
```
### 4. WebRTC Signaling and Data Transport
```rust
// src/webrtc/mod.rs
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::Mutex;
use webrtc::{
    api::APIBuilder,
    data_channel::RTCDataChannel,
    ice_transport::ice_server::RTCIceServer,
    peer_connection::{
        configuration::RTCConfiguration,
        peer_connection_state::RTCPeerConnectionState,
        sdp::session_description::RTCSessionDescription,
        RTCPeerConnection,
    },
    rtp_transceiver::rtp_codec::RTCRtpCodecCapability,
    track::track_local::{
        track_local_static_sample::TrackLocalStaticSample,
        TrackLocal,
    },
};

pub struct WebRtcServer {
    api: webrtc::api::API,
    peer_connections: Arc<Mutex<HashMap<String, PeerConnection>>>,
    signaling_server: SignalingServer,
}
impl WebRtcServer {
    pub async fn new(config: WebRtcConfig) -> Result<Self, WebRtcError> {
        let api = APIBuilder::new().build();
        let signaling_server = SignalingServer::new(config.signaling_addr).await?;
        Ok(Self {
            api,
            peer_connections: Arc::new(Mutex::new(HashMap::new())),
            signaling_server,
        })
    }
    pub async fn create_peer_connection(
        &self,
        session_id: String,
        video_track: TrackLocalStaticSample,
    ) -> Result<String, WebRtcError> {
        let config = RTCConfiguration {
            ice_servers: vec![RTCIceServer {
                urls: self.signaling_server.stun_servers(),
                ..Default::default()
            }],
            ..Default::default()
        };

        let pc = self.api.new_peer_connection(config).await?;

        // Add video track
        let _rtp_sender = pc.add_track(Arc::new(video_track)).await?;

        // Forward ICE candidates to the signaling server as they are gathered
        pc.on_ice_candidate(Box::new(move |candidate| {
            Box::pin(async move {
                if let Some(_candidate) = candidate {
                    // Send candidate to signaling server
                    // ...
                }
            })
        }));

        // Store peer connection
        self.peer_connections
            .lock()
            .await
            .insert(session_id.clone(), PeerConnection::new(pc));

        Ok(session_id)
    }
    pub async fn send_video_frame(
        &self,
        session_id: &str,
        frame: EncodedFrame,
    ) -> Result<(), WebRtcError> {
        let peer_connections = self.peer_connections.lock().await;
        if let Some(peer) = peer_connections.get(session_id) {
            peer.video_track
                .write_sample(&webrtc::media::Sample {
                    // Sample.data is a Bytes, so this clone only bumps the
                    // reference count - no copy of the encoded frame
                    data: frame.data.clone(),
                    // Duration of this frame (not its capture timestamp)
                    duration: std::time::Duration::from_millis(1000 / 60),
                    ..Default::default()
                })
                .await?;
        }
        Ok(())
    }
}
pub struct PeerConnection {
    pc: RTCPeerConnection,
    video_track: Arc<TrackLocalStaticSample>,
    data_channel: Option<Arc<RTCDataChannel>>,
}

impl PeerConnection {
    pub async fn create_offer(&mut self) -> Result<RTCSessionDescription, WebRtcError> {
        let offer = self.pc.create_offer(None).await?;
        self.pc.set_local_description(offer.clone()).await?;
        Ok(offer)
    }

    pub async fn set_remote_description(
        &mut self,
        desc: RTCSessionDescription,
    ) -> Result<(), WebRtcError> {
        self.pc.set_remote_description(desc).await?;
        Ok(())
    }

    pub async fn create_answer(&mut self) -> Result<RTCSessionDescription, WebRtcError> {
        let answer = self.pc.create_answer(None).await?;
        self.pc.set_local_description(answer.clone()).await?;
        Ok(answer)
    }
}
// Data channel for input/control
pub struct DataChannelManager {
    channels: HashMap<String, Arc<RTCDataChannel>>,
}

impl DataChannelManager {
    pub async fn send_input(&self, channel_id: &str, input: InputEvent) -> Result<()> {
        if let Some(channel) = self.channels.get(channel_id) {
            let data = Bytes::from(serde_json::to_vec(&input)?);
            channel.send(&data).await?;
        }
        Ok(())
    }

    pub fn on_input<F>(&mut self, channel_id: String, callback: F)
    where
        F: Fn(InputEvent) + Send + Sync + 'static,
    {
        if let Some(channel) = self.channels.get(&channel_id) {
            // on_message takes an async handler and does not return a
            // Result, so there is nothing to unwrap here
            channel.on_message(Box::new(move |msg| {
                if let Ok(input) = serde_json::from_slice::<InputEvent>(&msg.data) {
                    callback(input);
                }
                Box::pin(async {})
            }));
        }
    }
}
#[derive(Debug, Serialize, Deserialize)]
pub enum InputEvent {
    MouseMove { x: f32, y: f32 },
    MouseClick { button: MouseButton },
    KeyPress { key: String },
    KeyRelease { key: String },
}

#[derive(Debug, Serialize, Deserialize)]
pub enum MouseButton {
    Left,
    Right,
    Middle,
}
```
### 5. IPC Layer (Optional)
```rust
// src/ipc/mod.rs
use tokio::net::UnixListener;
use tokio::io::{AsyncReadExt, AsyncWriteExt};

pub struct IpcServer {
    listener: UnixListener,
}

impl IpcServer {
    pub async fn new(socket_path: &str) -> Result<Self> {
        // Remove existing socket if present
        let _ = std::fs::remove_file(socket_path);
        let listener = UnixListener::bind(socket_path)?;
        Ok(Self { listener })
    }

    pub async fn run(&self, sender: async_channel::Sender<IpcMessage>) {
        loop {
            match self.listener.accept().await {
                Ok((mut stream, _)) => {
                    let sender = sender.clone();
                    tokio::spawn(async move {
                        let mut buf = [0; 1024];
                        loop {
                            match stream.read(&mut buf).await {
                                Ok(0) => break,
                                Ok(n) => {
                                    if let Ok(msg) =
                                        serde_json::from_slice::<IpcMessage>(&buf[..n])
                                    {
                                        let _ = sender.send(msg).await;
                                    }
                                }
                                Err(_) => break,
                            }
                        }
                    });
                }
                Err(_) => continue,
            }
        }
    }
}

#[derive(Debug, Serialize, Deserialize)]
pub enum IpcMessage {
    StartCapture { session_id: String },
    StopCapture { session_id: String },
    SetQuality { level: QualityLevel },
    GetStatus,
}
```
---
## Data Flow Optimization
### Zero-Copy Pipeline Stages
```
Stage 1: Capture
Input: Wayland Compositor (GPU memory)
Output: DMA-BUF file descriptor
Copy: None (Zero-copy)
Stage 2: Buffer Manager
Input: DMA-BUF FD
Output: DmaBufHandle (RAII wrapper)
Copy: None (Zero-copy ownership transfer)
Stage 3: Encoder
Input: DmaBufHandle
Output: Bytes (reference-counted)
Copy: None (DMA-BUF imported directly to GPU encoder)
Stage 4: WebRTC
Input: Bytes
Output: RTP packets (references to Bytes)
Copy: None (Zero-copy to socket buffers)
Stage 5: Network
Input: RTP packets
Output: UDP datagrams
Copy: Minimal (kernel space only)
```
### Memory Ownership Transfer
```rust
// Example: Ownership transfer through pipeline
async fn process_frame_pipeline(
    mut capture: WaylandCapture,
    mut encoder: VaapiEncoder,
    mut webrtc: WebRtcServer,
) -> Result<()> {
    loop {
        // Stage 1: Capture (ownership moves from PipeWire to our code)
        let frame = capture.next_frame().await; // CapturedFrame owns DmaBufHandle

        // Stage 2: Encode (ownership moved, not copied)
        let encoded = encoder.encode(frame).await?; // EncodedFrame owns Bytes

        // Stage 3: Send (Bytes is reference-counted, no copy)
        webrtc.send_video_frame("session-123", encoded).await?;

        // Ownership transferred all the way without copying
    }
}
```
### Buffer Sharing Mechanisms
#### 1. DMA-BUF (Primary)
- GPU memory buffers
- Exported as file descriptors
- Zero-copy to hardware encoders
- Limited to same GPU/driver
```rust
pub fn export_dma_buf(surface: &va::Surface) -> Result<DmaBufHandle> {
    let fd = surface.export_dma_buf()?;
    Ok(DmaBufHandle {
        fd,
        size: surface.size(),
        stride: surface.stride(),
        offset: 0,
    })
}
```
#### 2. Shared Memory (Fallback)
- POSIX shared memory (shm_open)
- For software encoding path
- Copy from DMA-BUF to shared memory
```rust
pub fn create_shared_buffer(size: usize) -> Result<SharedBuffer> {
    let name = format!("/wl-webrtc-{}", uuid::Uuid::new_v4());
    // Note the octal literal: a plain `0666` in Rust would be decimal
    let fd = shm_open(&name, O_CREAT | O_RDWR, 0o666)?;
    ftruncate(fd, size as i64)?;
    let ptr = unsafe { mmap(ptr::null_mut(), size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0) };
    Ok(SharedBuffer {
        ptr,
        size,
        fd,
        name,
    })
}
```
#### 3. Memory-Mapped Files (Alternative)
- For persistent caching
- Cross-process communication
- Used for frame buffering
```rust
pub struct MappedFile {
    file: File,
    ptr: *mut u8,
    size: usize,
}

impl MappedFile {
    pub fn new(path: &Path, size: usize) -> Result<Self> {
        let file = OpenOptions::new()
            .read(true)
            .write(true)
            .create(true)
            .open(path)?;
        file.set_len(size as u64)?;

        let ptr = unsafe {
            mmap(
                ptr::null_mut(),
                size,
                PROT_READ | PROT_WRITE,
                MAP_SHARED,
                file.as_raw_fd(),
                0,
            )
        };
        // Raw mmap signals failure via MAP_FAILED, not a Result
        if ptr == MAP_FAILED {
            return Err(anyhow::anyhow!("mmap failed"));
        }
        Ok(Self {
            file,
            ptr: ptr as *mut u8,
            size,
        })
    }
}
```
### Pipeline Optimization Strategies
#### 1. Parallel Encoding
```rust
// Run multiple encoders in parallel for different quality levels
pub struct AdaptiveEncoder {
    encoders: Vec<Box<dyn VideoEncoder>>,
    active_encoder: usize,
    bandwidth_monitor: BandwidthMonitor,
}

impl AdaptiveEncoder {
    pub async fn encode_adaptive(&mut self, frame: CapturedFrame) -> Result<EncodedFrame> {
        let bandwidth = self.bandwidth_monitor.current_bandwidth();

        // Switch encoder based on bandwidth
        let new_encoder = match bandwidth {
            b if b < 500_000 => 0,   // Low bitrate
            b if b < 2_000_000 => 1, // Medium bitrate
            _ => 2,                  // High bitrate
        };

        if new_encoder != self.active_encoder {
            self.active_encoder = new_encoder;
        }

        self.encoders[self.active_encoder].encode(frame).await
    }
}
```
#### 2. Frame Skipping
```rust
pub struct FrameSkipper {
    target_fps: u32,
    last_frame_time: Instant,
    skip_count: u32,
}

impl FrameSkipper {
    pub fn should_skip(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_frame_time).as_millis();
        let frame_interval = 1000 / self.target_fps as u128;

        if elapsed < frame_interval {
            self.skip_count += 1;
            return true;
        }

        self.last_frame_time = now;
        self.skip_count = 0;
        false
    }
}
```
#### 3. Region of Interest (ROI)
```rust
pub struct RegionEncoder {
    full_encoder: Box<dyn VideoEncoder>,
    roi_encoder: Box<dyn VideoEncoder>,
    current_region: Option<ScreenRegion>,
}

impl RegionEncoder {
    pub async fn encode_roi(
        &mut self,
        frame: CapturedFrame,
        roi: Option<ScreenRegion>,
    ) -> Result<EncodedFrame> {
        if let Some(region) = roi {
            // Encode only the ROI with higher quality
            let cropped = self.crop_frame(frame, region)?;
            self.roi_encoder.encode(cropped).await
        } else {
            // Encode full frame
            self.full_encoder.encode(frame).await
        }
    }

    fn crop_frame(&self, mut frame: CapturedFrame, region: ScreenRegion) -> Result<CapturedFrame> {
        // Adjust DMA-BUF offsets for region
        frame.width = region.width;
        frame.height = region.height;
        Ok(frame)
    }
}
```
---
## Low Latency Optimization
### Design Philosophy
To achieve 15-25ms latency on local networks, we prioritize:
1. **Speed over completeness**: Fast, low-latency delivery is more important than perfect reliability
2. **Minimize buffering**: Small buffers at every stage
3. **Zero-copy everywhere**: Eliminate CPU memory copies
4. **Hardware acceleration**: Use GPU for all intensive operations
5. **Predictive timing**: Reduce wait times with accurate timing
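
Principle 2 (minimize buffering) can be pushed to its limit with a single-slot "mailbox" between capture and encode: each new frame overwrites the stale one, so a slow consumer adds at most one frame of queueing delay. A std-only illustration (the real pipeline uses the bounded async channel from the capture module; `Frame` here is a stand-in for `CapturedFrame`):

```rust
use std::sync::{Arc, Mutex};

struct Frame(u64); // stand-in for CapturedFrame

fn main() {
    // Single-slot mailbox shared by producer and consumer
    let slot: Arc<Mutex<Option<Frame>>> = Arc::new(Mutex::new(None));

    // Producer (capture): each store overwrites the previous frame, so
    // stale frames are dropped instead of queueing up latency
    for seq in 0..5 {
        *slot.lock().unwrap() = Some(Frame(seq));
    }

    // Consumer (encoder): always takes the newest frame
    let newest = slot.lock().unwrap().take().expect("no frame captured");
    println!("{}", newest.0); // prints 4: frames 0-3 were dropped as stale
}
```

The trade-off is deliberate: dropped frames cost one capture interval of smoothness, while a deep queue costs every viewer a constant added delay.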
### 1. Encoder Optimization
#### Hardware Encoder Configuration
```rust
pub struct LowLatencyEncoderConfig {
    // Codec settings
    pub codec: VideoCodec,

    // Low-latency specific
    pub gop_size: u32,     // Small GOP: 8-15 frames
    pub b_frames: u32,     // Zero B-frames for minimal latency
    pub max_b_frames: u32, // Always 0 for low latency
    pub lookahead: u32,    // Minimal lookahead: 0-2 frames

    // Rate control
    pub rc_mode: RateControlMode, // CBR or VBR with strict constraints
    pub bitrate: u32,             // Adaptive bitrate
    pub max_bitrate: u32,         // Tight max constraint
    pub min_bitrate: u32,
    pub vbv_buffer_size: u32,     // Very small VBV buffer
    pub vbv_max_rate: u32,        // Close to bitrate

    // Timing
    pub fps: u32,          // Target FPS (30-60)
    pub intra_period: u32, // Keyframe interval

    // Quality vs latency trade-offs
    pub preset: EncoderPreset, // Ultrafast/Fast
    pub tune: EncoderTune,     // zerolatency
    pub quality: u8,           // Constant quality (CRF) or CQ
}

pub enum VideoCodec {
    H264, // Best compatibility, good latency
    H265, // Better compression, slightly higher latency
    VP9,  // Open alternative
}

pub enum RateControlMode {
    CBR, // Constant Bitrate - predictable
    VBR, // Variable Bitrate - better quality
    CQP, // Constant Quantizer - lowest latency
}

pub enum EncoderPreset {
    Ultrafast, // Lowest latency, lower quality
    Superfast,
    Veryfast, // Recommended for 15-25ms
    Faster,
}

pub enum EncoderTune {
    Zerolatency, // Mandatory for low latency
    Film,
    Animation,
}
```
#### Recommended Encoder Settings
**VA-API (Intel/AMD) - For 15-25ms latency:**
```c
// libva-specific low-latency settings
VAConfigAttrib attribs[] = {
    {VAConfigAttribRTFormat, VA_RT_FORMAT_YUV420},
    {VAConfigAttribRateControl, VA_RC_CBR},
    {VAConfigAttribEncMaxRefFrames, 1}, // Minimal reference frames
    {VAConfigAttribEncPackedHeaders, VA_ENC_PACKED_HEADER_SEQUENCE},
};

VAEncSequenceParameterBufferH264 seq_param = {
    .intra_period = 15,        // Short GOP
    .ip_period = 1,            // No B-frames
    .bits_per_second = 4000000,
    .max_num_ref_frames = 1,   // Minimal references
    .time_scale = 90000,
    .num_units_in_tick = 1500, // 90000 / 1500 = 60 FPS
};

VAEncPictureParameterBufferH264 pic_param = {
    .ReferenceFrames = {
        {0, VA_FRAME_PICTURE}, // Single reference
    },
    .num_ref_idx_l0_active_minus1 = 0,
    .num_ref_idx_l1_active_minus1 = 0,
    .pic_fields.bits.idr_pic_flag = 0,
    .pic_fields.bits.reference_pic_flag = 1,
};

VAEncSliceParameterBufferH264 slice_param = {
    .num_ref_idx_l0_active_minus1 = 0,
    .num_ref_idx_l1_active_minus1 = 0,
    .disable_deblocking_filter_idc = 1, // Faster
};
```
**NVENC (NVIDIA) - For 15-25ms latency:**
```rust
// NVENC low-latency configuration
let mut create_params = NV_ENC_INITIALIZE_PARAMS::default();
create_params.encodeGUID = NV_ENC_CODEC_H264_GUID;
create_params.presetGUID = NV_ENC_PRESET_P4_GUID; // Low latency
let mut config = NV_ENC_CONFIG::default();
config.profileGUID = NV_ENC_H264_PROFILE_BASELINE_GUID; // Faster encoding
config.rcParams.rateControlMode = NV_ENC_PARAMS_RC_CBR;
config.rcParams.averageBitRate = 4000000;
config.rcParams.maxBitRate = 4000000;
config.rcParams.vbvBufferSize = 4000000; // 1 second buffer
config.rcParams.vbvInitialDelay = 0; // Minimal delay
let mut h264_config = unsafe { config.encodeCodecConfig.h264Config };
h264_config.enableIntraRefresh = 1;
h264_config.idrPeriod = 30; // Keyframe every 30 frames
h264_config.repeatSPSPPS = 1;
h264_config.enableConstrainedEncoding = 1;
h264_config.frameNumD = 0;
h264_config.sliceMode = NV_ENC_SLICE_MODE_AUTOSELECT;
// Low-latency specific
h264_config.maxNumRefFrames = 1; // Minimal references
h264_config.idrPeriod = 15; // Shorter GOP
```
**x264 (Software) - For 50-100ms latency:**
```rust
// x264 parameters for low latency
let param = x264_param_t {
i_width: 1920,
i_height: 1080,
i_fps_num: 60,
i_fps_den: 1,
// Rate control
i_bitrate: 4000, // 4 Mbps
i_keyint_max: 15, // Short GOP
b_intra_refresh: 1,
// Low latency
b_repeat_headers: 1,
b_annexb: 1,
i_scenecut_threshold: 0, // Disable scene detection
// No B-frames for latency
i_bframe: 0,
i_bframe_adaptive: 0,
i_bframe_pyramid: 0,
// References
i_frame_reference: 1, // Minimal references
// Preset: ultrafast or superfast
// This is set via preset function
};
// Apply preset
x264_param_apply_preset(&param, "superfast");
x264_param_apply_tune(&param, "zerolatency");
```
#### Dynamic Bitrate vs Latency Trade-offs
```rust
pub struct AdaptiveBitrateController {
    target_latency_ms: u32,
    current_bitrate: u32,
    frame_rate: u32,
    network_quality: NetworkQuality,
    buffer_depth_ms: u32,
}

pub struct NetworkQuality {
    bandwidth_mbps: f64,
    latency_ms: u32,
    packet_loss_rate: f64,
    jitter_ms: u32,
}

impl AdaptiveBitrateController {
    pub fn update_target_bitrate(&mut self, measured_latency_ms: u32) -> u32 {
        let latency_ratio = measured_latency_ms as f64 / self.target_latency_ms as f64;

        if latency_ratio > 1.5 {
            // Latency far too high - reduce bitrate aggressively
            self.current_bitrate = (self.current_bitrate as f64 * 0.7) as u32;
        } else if latency_ratio > 1.2 {
            // Moderately high - reduce bitrate
            self.current_bitrate = (self.current_bitrate as f64 * 0.85) as u32;
        } else if latency_ratio < 0.8 {
            // Headroom available - increase bitrate
            self.current_bitrate = (self.current_bitrate as f64 * 1.1) as u32;
        }

        // Clamp to reasonable bounds (1-8 Mbps)
        self.current_bitrate = self.current_bitrate.clamp(1_000_000, 8_000_000);
        self.current_bitrate
    }
}
```
### 2. Capture Optimization
#### PipeWire DMA-BUF Zero-Copy
```rust
pub struct LowLatencyCaptureConfig {
pub frame_rate: u32, // 30-60 FPS
pub zero_copy: bool, // Always true
pub track_damage: bool, // Enable damage tracking
pub partial_updates: bool, // Encode only damaged regions
pub buffer_pool_size: usize, // Small pool: 3-5 buffers
}
pub struct DamageTracker {
damaged_regions: VecDeque<ScreenRegion>,
last_frame: Option<DmaBufHandle>,
threshold: u32, // Minimum change size to encode
}
impl DamageTracker {
pub fn update(&mut self, new_frame: &CapturedFrame) -> Vec<ScreenRegion> {
match &self.last_frame {
Some(last) => {
let regions = self.compute_damage_regions(last, new_frame);
self.last_frame = Some(new_frame.dma_buf.clone());
regions
}
None => {
self.last_frame = Some(new_frame.dma_buf.clone());
vec![ScreenRegion {
x: 0,
y: 0,
width: new_frame.width,
height: new_frame.height,
}]
}
}
}
fn compute_damage_regions(&self, last: &DmaBufHandle, new: &CapturedFrame) -> Vec<ScreenRegion> {
// Compare frames and find changed regions
// This can be done efficiently with GPU
// For MVP, we can use a simple block-based comparison
// Block size for comparison (e.g., 16x16 pixels)
let block_size = 16;
let blocks_x = (new.width as usize + block_size - 1) / block_size;
let blocks_y = (new.height as usize + block_size - 1) / block_size;
// Merge adjacent damaged blocks into regions
// ...
vec![] // Placeholder
}
}
```
#### Partial Region Encoding
```rust
pub struct RegionEncoder {
full_encoder: Box<dyn VideoEncoder>,
tile_encoder: Box<dyn VideoEncoder>,
current_regions: Vec<ScreenRegion>,
}
impl RegionEncoder {
pub async fn encode_with_regions(
&mut self,
frame: CapturedFrame,
regions: Vec<ScreenRegion>,
) -> Result<Vec<EncodedTile>> {
let mut encoded_tiles = Vec::new();
        if regions.is_empty() || regions.len() > 4 {
            // No damage info, or too many regions - encode the full frame.
            // Read the dimensions before `frame` is moved into the encoder.
            let (width, height) = (frame.width, frame.height);
            let encoded = self.full_encoder.encode(frame).await?;
            encoded_tiles.push(EncodedTile {
                region: ScreenRegion { x: 0, y: 0, width, height },
                data: encoded.data,
                is_keyframe: encoded.is_keyframe,
            });
} else {
// Encode each damaged region separately
for region in regions {
let cropped = self.crop_frame(&frame, &region)?;
let encoded = self.tile_encoder.encode(cropped).await?;
encoded_tiles.push(EncodedTile {
region,
data: encoded.data,
is_keyframe: encoded.is_keyframe,
});
}
}
Ok(encoded_tiles)
}
fn crop_frame(&self, frame: &CapturedFrame, region: &ScreenRegion) -> Result<CapturedFrame> {
// Adjust DMA-BUF offsets for the region
// This is a zero-copy operation - just metadata changes
Ok(CapturedFrame {
dma_buf: DmaBufHandle::from_region(&frame.dma_buf, region)?,
width: region.width,
height: region.height,
format: frame.format,
timestamp: frame.timestamp,
})
}
}
```
### 3. WebRTC Transport Layer Optimization
#### Low-Latency WebRTC Configuration
```rust
pub struct LowLatencyWebRtcConfig {
// ICE and transport
pub ice_transport_policy: IceTransportPolicy,
pub ice_servers: Vec<IceServer>,
// Media settings
pub video_codecs: Vec<VideoCodecConfig>,
pub max_bitrate: u32,
pub min_bitrate: u32,
pub start_bitrate: u32,
// Buffering - minimize for low latency
pub playout_delay_min_ms: u16, // 0-10ms (default 50ms)
pub playout_delay_max_ms: u16, // 10-20ms (default 200ms)
// Packetization
pub rtp_payload_size: u16, // Smaller packets: 1200 bytes
pub packetization_mode: PacketizationMode,
// Feedback and retransmission
pub nack_enabled: bool, // Limited NACK
pub fec_enabled: bool, // Disable FEC for latency
pub transport_cc_enabled: bool, // Congestion control
// RTCP settings
pub rtcp_report_interval_ms: u32, // Frequent: 50-100ms
}
pub struct VideoCodecConfig {
pub name: String,
pub clock_rate: u32,
pub num_channels: u16,
pub parameters: CodecParameters,
}
impl LowLatencyWebRtcConfig {
pub fn for_ultra_low_latency() -> Self {
Self {
ice_transport_policy: IceTransportPolicy::All,
ice_servers: vec![],
video_codecs: vec![
VideoCodecConfig {
name: "H264".to_string(),
clock_rate: 90000,
num_channels: 1,
parameters: CodecParameters {
profile_level_id: "42e01f".to_string(), // Baseline profile
packetization_mode: 1,
level_asymmetry_allowed: 1,
},
},
],
max_bitrate: 8000000, // 8 Mbps max
min_bitrate: 500000, // 500 Kbps min
start_bitrate: 4000000, // 4 Mbps start
// Critical: Minimal playout delay
playout_delay_min_ms: 0, // No minimum
playout_delay_max_ms: 20, // 20ms maximum
// Smaller packets for lower serialization latency
rtp_payload_size: 1200,
packetization_mode: PacketizationMode::NonInterleaved,
// Limited retransmission
nack_enabled: true, // But limit retransmission window
fec_enabled: false, // Disable FEC - adds latency
transport_cc_enabled: true,
// More frequent RTCP feedback
rtcp_report_interval_ms: 50,
}
}
}
```
#### Packet Loss Handling Strategy
```rust
pub enum LossHandlingStrategy {
PreferLatency, // Drop late frames, prioritize low latency
PreferQuality, // Retransmit, prioritize quality
Balanced, // Adaptive based on network conditions
}
pub struct PacketLossHandler {
strategy: LossHandlingStrategy,
max_retransmission_delay_ms: u32,
nack_window_size: u32,
}
impl PacketLossHandler {
    pub fn handle_packet_loss(
        &mut self,
        sequence_number: u16,
        packet_age_ms: u64, // elapsed time since the lost packet was sent
    ) -> RetransmissionDecision {
        match self.strategy {
            LossHandlingStrategy::PreferLatency => {
                // Retransmit only while the packet can still arrive in time
                if packet_age_ms > self.max_retransmission_delay_ms as u64 {
                    RetransmissionDecision::Drop
                } else {
                    RetransmissionDecision::Request(sequence_number)
                }
            }
LossHandlingStrategy::PreferQuality => {
// Always try to retransmit
RetransmissionDecision::Request(sequence_number)
}
LossHandlingStrategy::Balanced => {
// Adaptive based on loss rate
RetransmissionDecision::None // Placeholder
}
}
}
}
pub enum RetransmissionDecision {
Request(u16),
Drop,
None,
}
```
#### NACK vs FEC Selection
**Recommendation for 15-25ms latency:**
- **Primary**: Limited NACK
- NACK window: 1-2 frames (16-33ms at 60fps)
- Max retransmission delay: 20ms
- Only retransmit keyframes or critical packets
- **Avoid FEC**:
- Forward Error Correction adds significant latency
- With low-loss LAN, FEC overhead outweighs benefits
- Use NACK selectively instead
```rust
pub struct NackController {
window_size_ms: u32, // 20ms window
max_nack_packets_per_second: u32,
nack_list: VecDeque<(u16, u64)>, // (seq_num, timestamp_ms)
}
impl NackController {
    pub fn should_send_nack(&self, seq_num: u16, now_ms: u64) -> bool {
        // Retransmit only while this loss is still inside the NACK window;
        // once the window has passed, the frame deadline is gone and the
        // retransmission would arrive too late to be useful.
        match self.nack_list.iter().find(|&&(s, _)| s == seq_num) {
            Some(&(_, first_seen_ms)) => {
                now_ms.saturating_sub(first_seen_ms) <= self.window_size_ms as u64
            }
            None => true, // Newly detected loss - eligible for NACK
        }
    }
}
```
### 4. Frame Rate and Buffer Strategy
#### Dynamic Frame Rate Adjustment
```rust
pub struct FrameRateController {
target_fps: u32, // Desired FPS (30-60)
current_fps: u32,
frame_times: VecDeque<Instant>,
last_frame_time: Instant,
min_interval: Duration, // 1 / max_fps
}
impl FrameRateController {
pub fn new(target_fps: u32) -> Self {
let min_interval = Duration::from_micros(1_000_000 / 60); // Max 60 FPS
Self {
target_fps,
current_fps: 30,
frame_times: VecDeque::with_capacity(60),
last_frame_time: Instant::now(),
min_interval,
}
}
pub fn should_capture(&mut self) -> bool {
let now = Instant::now();
let elapsed = now.duration_since(self.last_frame_time);
if elapsed < self.min_interval {
return false; // Too soon
}
// Update frame rate based on conditions
self.adjust_fps_based_on_conditions();
self.last_frame_time = now;
true
}
pub fn adjust_fps_based_on_conditions(&mut self) {
// Check system load, network conditions, etc.
let system_load = self.get_system_load();
let network_quality = self.get_network_quality();
if system_load > 0.8 || network_quality.is_poor() {
self.current_fps = 30; // Reduce frame rate
} else if system_load < 0.5 && network_quality.is_excellent() {
self.current_fps = 60; // Increase frame rate
} else {
self.current_fps = 45; // Balanced
}
}
}
```
#### Fast Frame Dropping Strategy
```rust
pub struct FrameDropper {
target_fps: u32,
adaptive_drop_threshold_ms: u32,
consecutive_drops: u32,
max_consecutive_drops: u32,
}
impl FrameDropper {
pub fn should_drop(&mut self, queue_latency_ms: u32) -> bool {
if queue_latency_ms > self.adaptive_drop_threshold_ms {
if self.consecutive_drops < self.max_consecutive_drops {
self.consecutive_drops += 1;
return true;
}
}
self.consecutive_drops = 0;
false
}
    pub fn get_drop_interval(&self) -> u32 {
        // Frames to drop per frame kept once dropping kicks in
        match self.target_fps {
            60 => 1, // Halve to ~30 FPS
            30 => 1, // Halve to ~15 FPS
            _ => 0,  // Don't drop
        }
    }
}
```
#### Minimal Buffering
**Sender Side:**
```rust
pub struct SenderBuffer {
max_size_frames: usize, // Very small: 1-2 frames
queue: VecDeque<EncodedFrame>,
target_latency_ms: u32,
}
impl SenderBuffer {
pub fn new() -> Self {
Self {
max_size_frames: 1, // Single frame buffer
queue: VecDeque::with_capacity(2),
target_latency_ms: 5, // 5ms target
}
}
pub fn push(&mut self, frame: EncodedFrame) -> Result<()> {
if self.queue.len() >= self.max_size_frames {
// Drop oldest frame to maintain low latency
self.queue.pop_front();
}
self.queue.push_back(frame);
Ok(())
}
pub fn pop(&mut self) -> Option<EncodedFrame> {
self.queue.pop_front()
}
}
```
**Receiver Side (Jitter Buffer):**
```rust
pub struct MinimalJitterBuffer {
target_delay_ms: u32, // 0-10ms
min_delay_ms: u32, // 0ms
max_delay_ms: u32, // 10-20ms
packets: VecDeque<RtpPacket>,
}
impl MinimalJitterBuffer {
pub fn new() -> Self {
Self {
target_delay_ms: 5, // 5ms target
min_delay_ms: 0, // No minimum
max_delay_ms: 10, // 10ms maximum
packets: VecDeque::with_capacity(10),
}
}
    pub fn push(&mut self, packet: RtpPacket) {
        // Capacity heuristic (assumed here): roughly 2ms of media per packet,
        // so max_delay_ms / 2 packets approximates the delay budget.
        if self.packets.len() < self.max_delay_ms as usize / 2 {
            self.packets.push_back(packet);
        } else {
            // Buffer full - drop oldest to keep delay bounded
            self.packets.pop_front();
            self.packets.push_back(packet);
        }
    }
pub fn pop(&mut self) -> Option<RtpPacket> {
self.packets.pop_front()
}
}
```
### 5. Architecture Adjustments
#### Single-Threaded vs Multi-Threaded
**Recommendation: Hybrid Approach**
- **Capture Thread**: Dedicated thread for PipeWire
- **Encoder Thread**: Per-session encoder thread
- **Network Thread**: WebRTC transport thread
- **Coordination**: Lock-free channels for data passing
```rust
pub struct PipelineArchitecture {
capture_thread: JoinHandle<()>,
encoder_threads: Vec<JoinHandle<()>>,
network_thread: JoinHandle<()>,
// Lock-free communication
capture_to_encoder: async_channel::Sender<CapturedFrame>,
encoder_to_network: async_channel::Sender<EncodedFrame>,
}
```
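The struct above only names the pieces; the sketch below shows the wiring with std primitives (frame types reduced to integers for illustration). Bounded capacity-1 channels keep queueing latency at a single frame between stages:

```rust
use std::sync::mpsc;
use std::thread;

// Capture and encoder stages on dedicated threads, joined by bounded
// (capacity-1) channels so at most one frame queues between stages.
fn run_pipeline() -> Vec<u32> {
    let (cap_tx, cap_rx) = mpsc::sync_channel::<u32>(1);
    let (enc_tx, enc_rx) = mpsc::sync_channel::<u32>(1);

    let capture = thread::spawn(move || {
        for frame_id in 0..3 {
            cap_tx.send(frame_id).unwrap(); // "captured" frame
        }
        // cap_tx dropped here, closing the channel
    });
    let encoder = thread::spawn(move || {
        while let Ok(frame) = cap_rx.recv() {
            enc_tx.send(frame * 10).unwrap(); // "encoded" frame
        }
    });

    // Network stage: drain encoded frames until the encoder exits
    let sent: Vec<u32> = enc_rx.iter().collect();
    capture.join().unwrap();
    encoder.join().unwrap();
    sent
}
```

In the real pipeline the channels would carry `CapturedFrame`/`EncodedFrame`, and the `async_channel` senders from the struct above replace `mpsc` so the network stage can stay async.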
#### Lock Competition Minimization
```rust
// Use lock-free data structures where possible
use crossbeam::queue::SegQueue;
pub struct LockFreeFrameQueue {
queue: SegQueue<CapturedFrame>,
max_size: usize,
}
impl LockFreeFrameQueue {
pub fn push(&self, frame: CapturedFrame) -> Result<()> {
if self.queue.len() >= self.max_size {
return Err(Error::QueueFull);
}
self.queue.push(frame);
Ok(())
}
pub fn pop(&self) -> Option<CapturedFrame> {
self.queue.pop()
}
}
```
#### Async Task Scheduling
```rust
// Tokio has no built-in task priorities, so approximate them with two
// runtimes: the critical path runs on a dedicated runtime whose worker
// threads can be given elevated OS scheduling priority at startup.
pub struct LowLatencyScheduler {
    critical: tokio::runtime::Runtime,   // capture -> encode -> send
    background: tokio::runtime::Runtime, // statistics, logging
}

impl LowLatencyScheduler {
    pub fn schedule_pipeline(&self) {
        self.critical.spawn(async move {
            // Critical path: capture -> encode -> send
        });
        self.background.spawn(async move {
            // Background tasks: statistics, logging
        });
    }
}
```
### 6. Technology Stack Adjustments
#### Encoder Selection for Latency
| Encoder | Setup Latency | Per-Frame Latency | Quality | Recommendation |
|---------|--------------|------------------|---------|----------------|
| VA-API H.264 | 1-2ms | 2-3ms | Medium | Primary (Linux) |
| NVENC H.264 | 1-2ms | 1-2ms | High | Primary (NVIDIA) |
| x264 (ultrafast) | 0ms | 5-8ms | Low | Fallback |
| x264 (superfast) | 0ms | 8-12ms | Medium | Fallback |
**Recommendation:**
- **Primary**: VA-API or NVENC H.264 with low-latency tuning (the hardware analogue of x264's ultrafast preset)
- **Fallback**: x264 with the ultrafast preset (accept 30-50ms end-to-end latency)
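The table can be turned into a selection rule; a minimal sketch, where the enum names and thresholds are illustrative rather than a real API:

```rust
#[derive(Debug, PartialEq)]
enum EncoderChoice {
    Nvenc,         // ~1-2ms per frame
    Vaapi,         // ~2-3ms per frame
    X264Ultrafast, // ~5-8ms per frame
    X264Superfast, // ~8-12ms per frame
}

// Prefer hardware encoders whenever present; among the software presets,
// pick superfast only if the per-frame budget can absorb its latency.
fn choose_encoder(per_frame_budget_ms: u32, has_nvenc: bool, has_vaapi: bool) -> EncoderChoice {
    if has_nvenc {
        EncoderChoice::Nvenc
    } else if has_vaapi {
        EncoderChoice::Vaapi
    } else if per_frame_budget_ms >= 8 {
        EncoderChoice::X264Superfast
    } else {
        EncoderChoice::X264Ultrafast
    }
}
```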
#### Direct Wayland vs PipeWire
**Use PipeWire (recommended):**
- Better DMA-BUF support
- Hardware acceleration integration
- Zero-copy through ecosystem
**Direct Wayland (if needed):**
- Lower-level control
- Potentially lower capture latency (0.5-1ms)
- More complex implementation
- Bypasses xdg-desktop-portal permission prompts (a security concern)
**Recommendation:** Stick with PipeWire for MVP. Consider direct Wayland only if PipeWire latency is unacceptable.
#### webrtc-rs Latency Characteristics
**Pros:**
- Pure Rust, predictable behavior
- Good zero-copy support
- Customizable buffering
**Cons:**
- May have default buffer settings optimized for reliability
- Need manual configuration for ultra-low latency
**Custom WebRTC Layer (advanced):**
- Full control over buffering and timing
- Can inline packetization
- More complex implementation
**Recommendation:** Use webrtc-rs with low-latency configuration. Only consider custom layer if webrtc-rs cannot achieve targets.
### 7. Implementation Priority
#### P0 (Must-Have for MVP)
1. **Hardware Encoder Integration**
- VA-API H.264 with low-latency settings
- No B-frames, small GOP (15 frames)
- Ultrafast preset
2. **DMA-BUF Zero-Copy**
- PipeWire DMA-BUF import
- Direct encoder feed
- No CPU copies
3. **Minimal Buffering**
- Single frame sender buffer
- 0-5ms jitter buffer
- Fast frame dropping
4. **Low-Latency WebRTC Config**
- playout_delay_min: 0ms
- playout_delay_max: 20ms
- Disable FEC
#### P1 (Important for 15-25ms)
1. **Damage Tracking**
- Partial region updates
- Reduced encoding load
2. **Dynamic Frame Rate**
- 30-60 FPS adaptation
- Network-aware
3. **NACK Control**
- Limited retransmission window (20ms)
- Selective NACK
#### P2 (Nice-to-Have)
1. **Direct Wayland Capture**
- If PipeWire latency insufficient
2. **Custom WebRTC Layer**
- If webrtc-rs insufficient
3. **Advanced Congestion Control**
- SCReAM or Google Congestion Control
### 8. Testing and Validation
#### End-to-End Latency Measurement
```rust
pub struct LatencyMeter {
timestamps: VecDeque<(u64, LatencyStage)>,
}
pub enum LatencyStage {
Capture,
EncodeStart,
EncodeEnd,
Packetize,
NetworkSend,
NetworkReceive,
Depacketize,
DecodeStart,
DecodeEnd,
Display,
}
impl LatencyMeter {
pub fn mark(&mut self, stage: LatencyStage) {
let now = timestamp_ns();
self.timestamps.push_back((now, stage));
}
pub fn calculate_total_latency(&self) -> Duration {
if self.timestamps.len() < 2 {
return Duration::ZERO;
}
let first = self.timestamps.front().unwrap().0;
let last = self.timestamps.back().unwrap().0;
Duration::from_nanos(last - first)
}
}
```
**Measurement Method:**
1. **Timestamp Injection**
- Inject frame ID at capture (visible timestamp on screen)
- Capture at client with camera
- Compare timestamps to calculate round-trip
- Divide by 2 for one-way latency
2. **Network Timestamping**
- Add frame capture time in RTP header extension
- Measure at receiver
- Account for clock skew
3. **Hardware Timestamping**
- Use kernel packet timestamps (SO_TIMESTAMPING)
- Hardware NIC timestamps if available
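For method 2, the clock-skew correction can use the classic NTP offset formula. A sketch with illustrative millisecond timestamps (t1 = request sent, t2 = request received, t3 = reply sent, t4 = reply received):

```rust
// Returns (offset, rtt) in ms. `offset` is how far the receiver's clock is
// ahead of the sender's; subtract it from receiver-side RTP-extension
// timestamps before computing one-way latency.
fn estimate_clock_offset_ms(t1: i64, t2: i64, t3: i64, t4: i64) -> (i64, i64) {
    let offset = ((t2 - t1) + (t3 - t4)) / 2; // symmetric-path assumption
    let rtt = (t4 - t1) - (t3 - t2);          // round trip minus remote processing
    (offset, rtt)
}
```

For example, with a receiver clock 500ms ahead, a 10ms one-way path, and 5ms of remote processing, `estimate_clock_offset_ms(1000, 1510, 1515, 1025)` yields an offset of 500ms and an RTT of 20ms.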
#### Performance Benchmarking
```rust
#[bench]
fn bench_full_pipeline_latency(b: &mut Bencher) {
let mut pipeline = LowLatencyPipeline::new(config).unwrap();
let mut latencies = Vec::new();
b.iter(|| {
let start = Instant::now();
let frame = pipeline.capture().unwrap();
let encoded = pipeline.encode(frame).unwrap();
pipeline.send(encoded).unwrap();
latencies.push(start.elapsed());
});
let avg_latency = latencies.iter().sum::<Duration>() / latencies.len() as u32;
println!("Average latency: {:?}", avg_latency);
}
```
**Target Benchmarks:**
| Metric | Target | Acceptable |
|--------|--------|------------|
| Capture latency | 2-3ms | <5ms |
| Encode latency | 3-5ms | <8ms |
| Packetize latency | 1-2ms | <3ms |
| Network (LAN) | 0.5-1ms | <2ms |
| Decode latency | 1-2ms | <4ms |
| Display latency | 1-2ms | <4ms |
| **Total** | **15-25ms** | **<30ms** |
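As a sanity check, the upper bounds of the per-stage targets sum to 15ms, the low end of the end-to-end budget. A trivial helper makes this checkable in latency regression tests:

```rust
// Per-stage upper-bound targets from the table, in ms:
// capture, encode, packetize, network (LAN), decode, display.
const STAGE_TARGETS_MS: [u32; 6] = [3, 5, 2, 1, 2, 2];

fn total_pipeline_latency_ms(stages: &[u32]) -> u32 {
    stages.iter().sum()
}

fn within_budget(stages: &[u32], budget_ms: u32) -> bool {
    total_pipeline_latency_ms(stages) <= budget_ms
}
```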
#### Tuning Strategy
1. **Baseline Measurement**
- Measure each stage individually
- Identify bottlenecks
2. **Iterative Tuning**
- Tune one parameter at a time
- Measure impact on total latency
- Trade off quality if needed
3. **Validation**
- Test under various network conditions
- Test under system load
- Test with different content (static, dynamic)
4. **Continuous Monitoring**
- Track latency in production
- Alert on degradation
- Adaptive adjustments
---
## Implementation Roadmap (Updated for Low Latency)
### Phase 1: MVP (Minimum Viable Product) - 4-6 weeks
**Goal:** Basic screen capture and WebRTC streaming
**Week 1-2: Core Infrastructure**
- [ ] Project setup (Cargo.toml, directory structure)
- [ ] Tokio async runtime setup
- [ ] Error handling framework (anyhow/thiserror)
- [ ] Logging setup (tracing)
- [ ] Configuration management
**Week 2-3: Wayland Capture**
- [ ] PipeWire xdg-desktop-portal integration
- [ ] Basic screen capture (single monitor)
- [ ] DMA-BUF import/export
- [ ] Frame receiver channel
**Week 3-4: Simple Encoding**
- [ ] x264 software encoder (fallback)
- [ ] Basic frame pipeline (capture → encode)
- [ ] Frame rate limiting
**Week 4-5: WebRTC Transport**
- [ ] webrtc-rs integration
- [ ] Basic peer connection
- [ ] Video track setup
- [ ] Simple signaling (WebSocket)
**Week 5-6: Testing & Integration**
- [ ] End-to-end test (Wayland → WebRTC → Browser)
- [ ] Performance benchmarking
- [ ] Bug fixes
**MVP Deliverables:**
- Working screen capture
- WebRTC streaming to browser
- 15-30 FPS at 720p
- x264 encoding (software)
---
### Phase 2: Hardware Acceleration - 3-4 weeks
**Goal:** GPU-accelerated encoding for better performance
**Week 1-2: VA-API Integration**
- [ ] VA-API encoder implementation
- [ ] DMA-BUF to VA-API surface import
- [ ] H.264 encoding
- [ ] Intel/AMD GPU support
**Week 2-3: NVENC Integration**
- [ ] NVENC encoder for NVIDIA GPUs
- [ ] CUDA memory management
- [ ] NVENC H.264 encoding
**Week 3-4: Encoder Selection**
- [ ] Encoder detection and selection
- [ ] Fallback chain (NVENC → VA-API → x264)
- [ ] Encoder switching at runtime
**Phase 2 Deliverables:**
- GPU-accelerated encoding
- 30-60 FPS at 1080p
- Lower CPU usage
- Adaptive encoder selection
---
### Phase 3: Low Latency Optimization - 4-5 weeks
**Goal:** Achieve 25-35ms latency on local networks
**Week 1: Encoder Low-Latency Configuration (P0)**
- [ ] Configure VA-API/NVENC for <5ms encoding
- [ ] Disable B-frames, set GOP to 15 frames
- [ ] Implement CBR rate control with small VBV buffer
- [ ] Tune encoder preset (ultrafast/superfast)
- [ ] Measure encoder latency independently
**Week 2: Minimal Buffering (P0)**
- [ ] Reduce sender buffer to 1 frame
- [ ] Implement 0-10ms jitter buffer
- [ ] Configure WebRTC playout delay (0-20ms)
- [ ] Disable FEC for latency
- [ ] Test end-to-end latency
**Week 3: Damage Tracking & Partial Updates (P1)**
- [ ] Implement region change detection
- [ ] Add partial region encoding
- [ ] Optimize for static content
- [ ] Benchmark latency improvements
**Week 4: Dynamic Frame Rate & Quality (P1)**
- [ ] Implement adaptive frame rate (30-60fps)
- [ ] Network quality detection
- [ ] Dynamic bitrate vs latency trade-off
- [ ] Fast frame dropping under load
**Week 5: Advanced Optimizations (P1/P2)**
- [ ] Limited NACK window (20ms)
- [ ] Selective packet retransmission
- [ ] RTCP fine-tuning (50ms intervals)
- [ ] Performance profiling
- [ ] Final latency tuning
**Phase 3 Deliverables:**
- 25-35ms latency on LAN
- Zero-copy DMA-BUF pipeline
- Hardware encoder with low-latency config
- Minimal buffering throughout pipeline
- Adaptive quality based on conditions
---
### Phase 4: Production Ready with Ultra Low Latency - 5-7 weeks
**Goal:** Achieve 15-25ms latency while ensuring security, reliability, and deployment readiness
**Week 1-2: Ultra Low Latency Tuning (P1/P2)**
- [ ] Direct Wayland capture evaluation (if needed)
- [ ] Custom WebRTC layer evaluation (if needed)
- [ ] Advanced congestion control (SCReAM/Google CC)
- [ ] Kernel bypass optimization (DPDK/AF_XDP if needed)
- [ ] Final latency optimization and tuning
**Week 2-3: Security**
- [ ] Authentication (JWT, OAuth)
- [ ] Encryption (TLS, DTLS)
- [ ] Session management
- [ ] Access control
- [ ] Security audit and penetration testing
**Week 3-4: Reliability**
- [ ] Error recovery
- [ ] Connection health monitoring
- [ ] Automatic reconnection
- [ ] Graceful degradation with latency awareness
- [ ] Failover mechanisms
**Week 4-5: Monitoring & Debugging**
- [ ] Real-time latency metrics collection
- [ ] Per-stage latency tracking
- [ ] Logging improvements
- [ ] Debug mode with frame inspection
- [ ] Performance dashboard with latency visualization
- [ ] Alerting for latency degradation
**Week 5-6: Deployment**
- [ ] Docker containerization
- [ ] Systemd service
- [ ] Configuration file with low-latency presets
- [ ] Installation scripts
- [ ] Performance tuning documentation
**Week 6-7: Testing**
- [ ] Integration tests
- [ ] Load testing with latency monitoring
- [ ] Cross-browser testing
- [ ] Long-running stability tests
- [ ] Latency regression tests
- [ ] Automated performance benchmarks
**Phase 4 Deliverables:**
- 15-25ms latency on LAN
- Production-ready deployment
- Security features
- Monitoring and observability
- Comprehensive testing
- Latency regression testing
---
### Phase 5: Advanced Features (Optional) - Ongoing
**Potential Features:**
- [ ] Audio capture and streaming
- [ ] Bidirectional input (mouse, keyboard)
- [ ] Clipboard sharing
- [ ] File transfer
- [ ] Recording (save sessions)
- [ ] Multi-user sessions
- [ ] Mobile client support
- [ ] WebRTC data channels for control
- [ ] WebRTC insertable streams (client-side effects)
- [ ] Adaptive resolution
- [ ] H.265/HEVC encoding
- [ ] AV1 encoding
- [ ] Screen region selection
- [ ] Virtual display support
- [ ] Wayland virtual pointer protocol
---
### Testing Strategy
#### Unit Tests
```rust
#[cfg(test)]
mod tests {
use super::*;
#[tokio::test]
async fn test_dma_buf_lifecycle() {
let handle = DmaBufHandle::new(/* ... */);
assert_eq!(handle.ref_count(), 1);
let handle2 = handle.clone();
assert_eq!(handle.ref_count(), 2);
drop(handle);
assert_eq!(handle2.ref_count(), 1);
drop(handle2);
// Buffer freed
}
#[tokio::test]
async fn test_encoder_pipeline() {
let config = EncoderConfig {
encoder_type: EncoderType::H264_X264,
bitrate: 2_000_000,
keyframe_interval: 30,
preset: EncodePreset::Fast,
};
let mut encoder = X264Encoder::new(config).unwrap();
let frame = create_test_frame(1920, 1080);
let encoded = encoder.encode(frame).await.unwrap();
assert!(!encoded.data.is_empty());
assert!(encoded.is_keyframe);
}
}
```
#### Integration Tests
```rust
#[tokio::test]
async fn test_full_pipeline() {
// Setup
let capture = WaylandCapture::new(CaptureConfig::default()).await.unwrap();
let encoder = VaapiEncoder::new(EncoderConfig::default()).unwrap();
let webrtc = WebRtcServer::new(WebRtcConfig::default()).await.unwrap();
// Run pipeline for 100 frames
for _ in 0..100 {
        let frame = capture.next_frame().await.unwrap();
let encoded = encoder.encode(frame).await.unwrap();
webrtc.send_video_frame("test-session", encoded).await.unwrap();
}
// Verify
assert_eq!(webrtc.frames_sent(), 100);
}
```
#### Load Testing
```bash
# Simulate 10 concurrent connections
for i in {1..10}; do
cargo test test_full_pipeline --release &
done
wait
```
#### Performance Benchmarks
```rust
#[bench]
fn bench_encode_frame(b: &mut Bencher) {
let mut encoder = X264Encoder::new(config).unwrap();
let frame = create_test_frame(1920, 1080);
b.iter(|| {
encoder.encode(frame.clone()).unwrap()
});
}
```
---
## Potential Challenges & Solutions
### 1. Wayland Protocol Limitations
**Challenge:** Wayland's security model restricts screen capture
**Solution:**
- Use xdg-desktop-portal for permission management
- Implement user prompts for capture authorization
- Support multiple portal backends (GNOME, KDE, etc.)
```rust
pub async fn request_capture_permission() -> Result<bool> {
let portal = Portal::new().await?;
let session = portal.create_session(ScreenCaptureType::Monitor).await?;
// User will see a dialog asking for permission
let sources = portal.request_sources(&session).await?;
Ok(!sources.is_empty())
}
```
**Alternative:** Use PipeWire directly with proper authentication
### 2. Hardware Acceleration Compatibility
**Challenge:** Different GPUs require different APIs (VA-API, NVENC, etc.)
**Solution:**
- Implement multiple encoder backends
- Runtime detection of available encoders
- Graceful fallback to software encoding
```rust
pub fn detect_best_encoder() -> EncoderType {
// Try NVENC first (NVIDIA)
if nvenc::is_available() {
return EncoderType::H264_NVENC;
}
// Try VA-API (Intel/AMD)
if vaapi::is_available() {
return EncoderType::H264_VAAPI;
}
// Fallback to software
EncoderType::H264_X264
}
```
### 3. Cross-Browser WebRTC Compatibility
**Challenge:** Different browsers have different WebRTC implementations
**Solution:**
- Use standardized codecs (H.264, VP8, VP9)
- Implement codec negotiation
- Provide fallback options
```rust
pub fn get_supported_codecs() -> Vec<RTCRtpCodecCapability> {
vec![
RTCRtpCodecCapability {
mime_type: "video/H264".to_string(),
clock_rate: 90000,
..Default::default()
},
RTCRtpCodecCapability {
mime_type: "video/VP9".to_string(),
clock_rate: 90000,
..Default::default()
},
]
}
```
### 4. Security and Authentication
**Challenge:** Secure remote access without exposing desktop to unauthorized users
**Solution:**
- Implement JWT-based authentication
- Use DTLS for media encryption
- Add rate limiting and access control
```rust
pub struct AuthManager {
secret: String,
sessions: Arc<Mutex<HashMap<String, Session>>>,
}
impl AuthManager {
    pub fn create_token(&self, user_id: &str) -> Result<String> {
        let claims = Claims {
            sub: user_id.to_string(),
            // jsonwebtoken expects a numeric `exp`, not a DateTime
            exp: (Utc::now() + chrono::Duration::hours(1)).timestamp() as usize,
        };
        Ok(encode(
            &Header::default(),
            &claims,
            &EncodingKey::from_secret(self.secret.as_ref()),
        )?)
    }
pub fn validate_token(&self, token: &str) -> Result<Claims> {
decode::<Claims>(
token,
&DecodingKey::from_secret(self.secret.as_ref()),
&Validation::default(),
)
.map(|data| data.claims)
.map_err(|_| AuthError::InvalidToken)
}
}
```
### 5. Memory Management
**Challenge:** Avoid memory leaks with DMA-BUF and shared memory
**Solution:**
- Use Rust's ownership system
- RAII patterns for resource cleanup
- Buffer pools with limits
```rust
pub struct ScopedDmaBuf {
handle: DmaBufHandle,
}
impl Drop for ScopedDmaBuf {
fn drop(&mut self) {
// Automatically release DMA-BUF
// File descriptor closed
// GPU memory freed
}
}
// Usage ensures cleanup
{
let buf = ScopedDmaBuf::new(/* ... */);
// Use buffer
} // Automatically dropped here
```
### 6. Latency Optimization
**Challenge:** Minimize end-to-end latency
**Solution:**
- Zero-copy pipeline
- Hardware acceleration
- Adaptive quality
- Frame skipping
```rust
pub struct LatencyOptimizer {
target_latency_ms: u32,
current_latency_ms: u32,
}
impl LatencyOptimizer {
pub fn adjust_parameters(&mut self) {
if self.current_latency_ms > self.target_latency_ms {
// Reduce quality to improve latency
self.reduce_bitrate();
self.increase_frame_skipping();
} else {
// Increase quality
self.increase_bitrate();
}
}
}
```
### 7. Network Conditions
**Challenge:** Varying bandwidth and network conditions
**Solution:**
- Adaptive bitrate streaming
- Multiple quality presets
- Congestion control
```rust
pub struct BandwidthMonitor {
measurements: VecDeque<u32>,
window_size: usize,
}
impl BandwidthMonitor {
    pub fn update(&mut self, bytes_sent: u32, duration: Duration) {
        // Use fractional seconds: `as_secs()` truncates, so a sub-second
        // measurement window would divide by zero.
        let secs = duration.as_secs_f64().max(1e-6);
        let bandwidth = (bytes_sent as f64 * 8.0 / secs) as u32; // bits per second
        self.measurements.push_back(bandwidth);
        if self.measurements.len() > self.window_size {
            self.measurements.pop_front();
        }
    }
pub fn average_bandwidth(&self) -> u32 {
if self.measurements.is_empty() {
return 0;
}
self.measurements.iter().sum::<u32>() / self.measurements.len() as u32
}
}
```
### 8. Cross-Platform Compatibility
**Challenge:** Support different Linux distributions and desktop environments
**Solution:**
- Containerize application
- Detect available technologies at runtime
- Provide fallback options
```rust
pub fn detect_desktop_environment() -> DesktopEnvironment {
if std::path::Path::new("/usr/bin/gnome-shell").exists() {
DesktopEnvironment::GNOME
} else if std::path::Path::new("/usr/bin/plasmashell").exists() {
DesktopEnvironment::KDE
} else {
DesktopEnvironment::Other
}
}
pub fn configure_portal_for_env(env: DesktopEnvironment) -> PortalConfig {
match env {
DesktopEnvironment::GNOME => PortalConfig::gnome(),
DesktopEnvironment::KDE => PortalConfig::kde(),
DesktopEnvironment::Other => PortalConfig::generic(),
}
}
```
### 9. Debugging and Troubleshooting
**Challenge:** Debugging complex pipeline with multiple components
**Solution:**
- Comprehensive logging
- Metrics collection
- Debug mode with frame inspection
```rust
pub struct DebugLogger {
enabled: bool,
output: DebugOutput,
}
pub enum DebugOutput {
Console,
File(PathBuf),
Both,
}
impl DebugLogger {
pub fn log_frame(&self, frame: &CapturedFrame) {
if !self.enabled {
return;
}
tracing::debug!(
"Frame: {}x{}, format: {:?}, timestamp: {}",
frame.width,
frame.height,
frame.format,
frame.timestamp
);
}
pub fn log_encoding(&self, encoded: &EncodedFrame) {
if !self.enabled {
return;
}
tracing::debug!(
"Encoded: {} bytes, keyframe: {}, seq: {}",
encoded.data.len(),
encoded.is_keyframe,
encoded.sequence_number
);
}
}
```
### 10. Resource Limits
**Challenge:** Prevent resource exhaustion (CPU, memory, GPU)
**Solution:**
- Limit concurrent sessions
- Monitor resource usage
- Implement graceful degradation
```rust
pub struct ResourceManager {
max_sessions: usize,
active_sessions: Arc<Mutex<HashSet<String>>>,
cpu_threshold: f32,
memory_threshold: u64,
}
impl ResourceManager {
pub async fn can_create_session(&self) -> bool {
let sessions = self.active_sessions.lock().await;
if sessions.len() >= self.max_sessions {
return false;
}
if self.cpu_usage() > self.cpu_threshold {
return false;
}
if self.memory_usage() > self.memory_threshold {
return false;
}
true
}
}
```
---
## Code Examples
### Main Application Entry Point
```rust
// src/main.rs
mod capture;
mod encoder;
mod webrtc;
mod buffer;
mod ipc;
mod config;
use anyhow::Result;
use tracing::{info, error};
use tracing_subscriber;
#[tokio::main]
async fn main() -> Result<()> {
// Initialize logging
tracing_subscriber::fmt()
.with_max_level(tracing::Level::INFO)
.init();
// Load configuration
let config = config::load_config("config.toml")?;
info!("Starting Wayland WebRTC Backend");
info!("Configuration: {:?}", config);
// Initialize components
let capture = capture::WaylandCapture::new(config.capture).await?;
let encoder = encoder::create_encoder(config.encoder)?;
let webrtc = webrtc::WebRtcServer::new(config.webrtc).await?;
// Create video track
let video_track = webrtc::create_video_track()?;
// Run capture pipeline
let session_id = uuid::Uuid::new_v4().to_string();
webrtc.create_peer_connection(session_id.clone(), video_track).await?;
// Main loop
loop {
match capture.next_frame().await {
Ok(frame) => {
match encoder.encode(frame).await {
Ok(encoded) => {
if let Err(e) = webrtc.send_video_frame(&session_id, encoded).await {
error!("Failed to send frame: {}", e);
}
}
Err(e) => {
error!("Failed to encode frame: {}", e);
}
}
}
            Err(e) => {
                error!("Failed to capture frame: {}", e);
                // Back off briefly so a persistent capture failure
                // does not spin this loop at full speed
                tokio::time::sleep(std::time::Duration::from_millis(10)).await;
            }
}
}
}
```
### Configuration Example
```toml
# config.toml - Low Latency Configuration
[server]
address = "0.0.0.0:8080"
max_sessions = 10
log_level = "info"
[capture]
frame_rate = 60
quality = "high"
# Omit screen_region to capture the full screen (TOML has no null value)
# screen_region = { x = 0, y = 0, width = 1920, height = 1080 }
# Low-latency capture settings
zero_copy = true # Always use DMA-BUF zero-copy
track_damage = true # Enable damage tracking
partial_updates = true # Encode only damaged regions
buffer_pool_size = 3 # Small buffer pool for low latency
[encoder]
type = "auto"
# Options: "vaapi", "nvenc", "x264", "auto"
bitrate = 4000000
keyframe_interval = 15 # Short GOP for low latency
preset = "ultrafast" # Ultrafast for minimal latency
# Low-latency specific settings
[encoder.low_latency]
gop_size = 15 # Short GOP
b_frames = 0 # No B-frames for low latency
lookahead = 0 # Minimal lookahead
rc_mode = "CBR" # Constant bitrate for predictable latency
vbv_buffer_size = 4000000 # 1 second VBV buffer
max_bitrate = 4000000 # Tight max constraint
intra_period = 15 # Keyframe interval
# Quality vs latency presets
[encoder.quality_presets]
low = { bitrate = 1000000, preset = "ultrafast", gop_size = 30 }
medium = { bitrate = 2000000, preset = "ultrafast", gop_size = 20 }
high = { bitrate = 4000000, preset = "ultrafast", gop_size = 15 }
ultra = { bitrate = 8000000, preset = "veryfast", gop_size = 15 }
[webrtc]
stun_servers = ["stun:stun.l.google.com:19302"]
turn_servers = []
[webrtc.low_latency]
# Critical: Minimal playout delay
playout_delay_min_ms = 0 # No minimum delay
playout_delay_max_ms = 20 # 20ms maximum delay
# Bitrate settings
max_bitrate = 8000000 # 8 Mbps max
min_bitrate = 500000 # 500 Kbps min
start_bitrate = 4000000 # 4 Mbps start
# Packetization
rtp_payload_size = 1200 # Smaller packets for lower latency
packetization_mode = "non_interleaved"
# Retransmission settings
nack_enabled = true # Enable NACK
nack_window_size_ms = 20 # Only request retransmission for recent packets
max_nack_packets_per_second = 50
fec_enabled = false # Disable FEC for low latency
# Congestion control
transport_cc_enabled = true # Enable transport-wide congestion control
# RTCP settings
rtcp_report_interval_ms = 50 # Frequent RTCP feedback
[webrtc.ice]
transport_policy = "all"
candidate_network_types = ["udp", "tcp"]
[buffer]
# Minimal buffering for low latency
dma_buf_pool_size = 3 # Small pool
encoded_buffer_pool_size = 5 # Very small pool
sender_buffer_max_frames = 1 # Single frame sender buffer
jitter_buffer_target_delay_ms = 5 # 5ms target jitter buffer
jitter_buffer_max_delay_ms = 10 # 10ms maximum jitter buffer
[latency]
# Latency targets
target_latency_ms = 25 # Target end-to-end latency
max_acceptable_latency_ms = 50 # Maximum acceptable latency
# Adaptive settings
adaptive_frame_rate = true
adaptive_bitrate = true
fast_frame_drop = true
# Frame dropping strategy
consecutive_drop_limit = 3 # Max consecutive drops
drop_threshold_ms = 5 # Drop if queue latency exceeds this
[monitoring]
enable_metrics = true
enable_latency_tracking = true
metrics_port = 9090
```
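The `[latency]` frame-dropping knobs above (`drop_threshold_ms`, `consecutive_drop_limit`) imply a small stateful policy: drop frames whose queueing latency exceeds the threshold, but never more than the limit in a row, so the stream always keeps advancing. A sketch of that policy (type and method names here are illustrative, not from the codebase):

```rust
/// Drops frames whose queueing latency exceeds a threshold,
/// but never more than `consecutive_drop_limit` in a row.
pub struct FrameDropper {
    drop_threshold_ms: u64,      // [latency] drop_threshold_ms
    consecutive_drop_limit: u32, // [latency] consecutive_drop_limit
    consecutive_drops: u32,
}

impl FrameDropper {
    pub fn new(drop_threshold_ms: u64, consecutive_drop_limit: u32) -> Self {
        Self { drop_threshold_ms, consecutive_drop_limit, consecutive_drops: 0 }
    }

    /// Returns true if the frame should be dropped instead of encoded/sent.
    pub fn should_drop(&mut self, queue_latency_ms: u64) -> bool {
        if queue_latency_ms > self.drop_threshold_ms
            && self.consecutive_drops < self.consecutive_drop_limit
        {
            self.consecutive_drops += 1;
            true
        } else {
            // Either the queue is healthy, or the limit forces one frame through
            self.consecutive_drops = 0;
            false
        }
    }
}
```

Forcing a frame through after the limit caps the worst-case visual stall (3 drops at 60 FPS is ~50ms), while the threshold keeps the send queue near the 5ms target.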
---
## Performance Targets
### Latency Targets
#### Local Network (LAN)
| Scenario | Latency |
|----------|---------|
| Excellent | 15-25ms |
| Core Target | 25-35ms |
| Minimum Acceptable | <50ms |

User experience reference:

| Latency | Perception |
|---------|------------|
| <16ms | Imperceptible |
| 16-33ms | Very Smooth |
| 33-50ms | Good |
| 50-100ms | Acceptable |
#### Internet
| Scenario | Target | Acceptable |
|----------|--------|------------|
| Excellent | 40-60ms | <80ms |
| Good | 60-80ms | <100ms |
### Performance Metrics by Phase
| Metric | MVP | Phase 2 | Phase 3 | Phase 4 |
|--------|-----|---------|---------|---------|
| FPS (LAN) | 30 | 60 | 60 | 60 |
| FPS (Internet) | 15-20 | 30 | 30-60 | 60 |
| Resolution | 720p | 1080p | 1080p/4K | 1080p/4K |
| Latency (LAN) | <100ms | <50ms | 25-35ms | 15-25ms |
| Latency (Internet) | <200ms | <100ms | 60-80ms | 40-60ms |
| CPU Usage | 20-30% | 10-15% | 5-10% | <5% |
| Memory Usage | 150MB | 250MB | 400MB | <400MB |
| Bitrate | 2-4 Mbps | 4-8 Mbps | Adaptive | Adaptive |
| Concurrent Sessions | 1 | 3-5 | 5-10 | 10+ |
### Latency Budget Allocation (15-25ms Target)
| Component | Time (ms) | Percentage | Optimization Strategy |
|-----------|-----------|------------|---------------------|
| Wayland Capture | 2-3 | 12-15% | DMA-BUF zero-copy, partial update |
| Encoder | 3-5 | 20-25% | Hardware encoder, no B-frames |
| Packetization | 1-2 | 6-10% | Inline RTP, minimal buffering |
| Network (LAN) | 0.5-1 | 3-5% | UDP direct path, kernel bypass |
| Jitter Buffer | 0-2 | 0-10% | Minimal buffer, predictive jitter |
| Decoder | 1-2 | 6-10% | Hardware acceleration |
| Display | 1-2 | 6-10% | vsync bypass, direct scanout |
| **Total** | **15-25** | **100%** | |
---
## Conclusion
This design provides a comprehensive blueprint for building an ultra-low-latency Wayland → WebRTC remote desktop backend in Rust. Key highlights:
1. **Zero-Copy Architecture**: Minimizes CPU copies through DMA-BUF and reference-counted buffers, achieving <5ms copy overhead
2. **Hardware Acceleration**: VA-API/NVENC encoders configured for <5ms encoding latency
3. **Minimal Buffering**: Single-frame sender buffer and 0-10ms jitter buffer throughout pipeline
4. **Low-Latency WebRTC**: Custom configuration with 0-20ms playout delay, no FEC, limited NACK
5. **Performance**: Targets 15-25ms latency on local networks at 60 FPS
6. **Adaptive Quality**: Dynamic frame rate (30-60fps) and bitrate adjustment based on network conditions
7. **Damage Tracking**: Partial region updates for static content to reduce encoding load
**Latency Budget Breakdown (15-25ms target):**
- Capture: 2-3ms (DMA-BUF zero-copy)
- Encoder: 3-5ms (hardware, no B-frames)
- Packetization: 1-2ms (inline RTP)
- Network: 0.5-1ms (LAN)
- Jitter Buffer: 0-2ms (minimal)
- Decoder: 1-2ms (hardware)
- Display: 1-2ms (vsync bypass)
The phased implementation approach allows for incremental development and testing:
- **Phase 1 (4-6 weeks)**: MVP with <100ms latency
- **Phase 2 (3-4 weeks)**: Hardware acceleration, <50ms latency
- **Phase 3 (4-5 weeks)**: Low-latency optimizations, 25-35ms latency
- **Phase 4 (5-7 weeks)**: Ultra-low latency tuning, 15-25ms latency
Critical P0 optimizations for achieving 15-25ms latency:
1. Hardware encoder with zero B-frames, 15-frame GOP
2. DMA-BUF zero-copy capture pipeline
3. Minimal buffering (1 frame sender, 0-10ms jitter)
4. WebRTC low-latency configuration (0-20ms playout delay)
---
## Additional Resources
### Wayland & PipeWire
- [Wayland Protocol](https://wayland.freedesktop.org/docs/html/)
- [PipeWire Documentation](https://docs.pipewire.org/)
- [xdg-desktop-portal](https://flatpak.github.io/xdg-desktop-portal/)
### WebRTC
- [WebRTC Specifications](https://www.w3.org/TR/webrtc/)
- [webrtc-rs](https://github.com/webrtc-rs/webrtc)
- [WebRTC for the Curious](https://webrtcforthecurious.com/)
### Video Encoding
- [VA-API](https://github.com/intel/libva)
- [NVENC](https://developer.nvidia.com/nvidia-video-codec-sdk)
- [x264](https://www.videolan.org/developers/x264.html)
### Rust
- [Tokio](https://tokio.rs/)
- [Bytes](https://docs.rs/bytes/)
- [Async Rust Book](https://rust-lang.github.io/async-book/)